Vectorisation in NumPy
Martin McBride, 2021-02-27
Tags arrays data types vectorisation
Vectorisation is the secret sauce of NumPy. It allows you to perform element-wise operations on NumPy arrays without using Python loops. Behind the scenes, the processing is done by optimised C code.
This can allow many array operations to be written in simple Python but execute almost as fast as C code. This can make a huge difference when processing a large array, such as an array of image data.
Performing simple maths on an array
Here is a simple example of vectorisation:
import numpy as np
a = np.array([1.1, 3.6, 4.0, 8.2])
b = a + 1
We first create an array with content [1.1, 3.6, 4.0, 8.2]. Then we execute:
b = a + 1
Because a is a NumPy array, Python uses the NumPy version of the + operator. This operator adds 1 to each element in the array, resulting in this:
b = [2.1 4.6 5. 9.2]
That is vectorisation in a nutshell. You can apply pretty much any Python maths operator to an array, and it will automatically be applied to every element of that array, at lightning speed!
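As a rough sketch of the point above, the following applies a few other arithmetic operators to the same array. The variable names are chosen purely for illustration:

```python
import numpy as np

a = np.array([1.1, 3.6, 4.0, 8.2])

# Each operator is applied to every element of the array
difference = a - 1   # subtract 1 from each element
product = a * 3      # multiply each element by 3
quotient = a / 2     # divide each element by 2
squares = a ** 2     # square each element
floored = a // 1     # floor-divide each element by 1

print(difference)
print(product)
print(quotient)
print(squares)
print(floored)
```

In every case the result is a new array of the same length as a, and a itself is left unchanged.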
Here is the equivalent code to do a similar thing with a Python list:
a = [1.1, 3.6, 4.0, 8.2]
b = []
for x in a:
    b.append(x + 1)
Or if you prefer to use a list comprehension:
a = [1.1, 3.6, 4.0, 8.2]
b = [x + 1 for x in a]
As you can see, not only is the NumPy version faster, it is also shorter and more readable!
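If you want to check the speed difference on your own machine, one possible approach is to time both versions with the standard timeit module. This is only a sketch: the array size and repeat count are arbitrary choices, and the timings will vary from system to system.

```python
import timeit
import numpy as np

n = 100_000  # array size chosen arbitrarily for this sketch
values_list = [float(i) for i in range(n)]
values_array = np.array(values_list)

def add_one_list():
    # Pure Python: a list comprehension visits each element in turn
    return [x + 1 for x in values_list]

def add_one_numpy():
    # Vectorised: NumPy performs the loop internally
    return values_array + 1

# Both approaches produce the same values
same_result = np.array_equal(np.array(add_one_list()), add_one_numpy())
print("Same result:", same_result)

list_time = timeit.timeit(add_one_list, number=20)
numpy_time = timeit.timeit(add_one_numpy, number=20)
print(f"List comprehension: {list_time:.4f} s")
print(f"NumPy vectorised:   {numpy_time:.4f} s")
```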
Vectorisation with other data types
In this case we will use the dtype parameter to create an array of 16-bit integers:
a = np.array([1, 3, 4, 8], dtype=np.int16)
b = a * 2
This time we are multiplying a by 2. Again, we are using the NumPy * operator. If a were a list, the multiply operator would do something very different, of course! But for a NumPy array, it simply doubles every element:
[ 2 6 8 16]
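To make the contrast with lists concrete, here is a small sketch that applies * 2 to a list and to an array holding the same values. The names as_list and as_array are just illustrative:

```python
import numpy as np

values = [1, 3, 4, 8]

as_list = values * 2                             # list repetition
as_array = np.array(values, dtype=np.int16) * 2  # element-wise doubling

print(as_list)   # [1, 3, 4, 8, 1, 3, 4, 8]
print(as_array)  # [ 2  6  8 16]
```

For a list, the operator repeats the contents, giving a list twice as long. For an array, the length is unchanged and each value is doubled.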
If you check the dtype of b, you will find it is also int16. The new array takes the type of the original array.
In fact, that isn't quite true. Similar to normal Python arithmetic, ints can be converted to floats automatically when required. So for example:
a = np.array([1, 3, 4, 8], dtype=np.int16)
b = a * 2.1
Because we are multiplying an int by a float, the result is automatically a float, so we get this:
[ 2.1 6.3 8.4 16.8]
The result is a float array.
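You can confirm the result types by inspecting the dtype attribute of each array. A minimal sketch, using the illustrative names b_int and b_float for the two results:

```python
import numpy as np

a = np.array([1, 3, 4, 8], dtype=np.int16)

b_int = a * 2      # int16 array times a Python int
b_float = a * 2.1  # int16 array times a Python float

print(b_int.dtype)    # int16
print(b_float.dtype)  # float64
```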
Vectorisation with multi-dimensional arrays
We can apply vectorisation to a multi-dimensional array. NumPy simply applies the operation to every element in the array, so we don't need to do anything special:
a = np.array([[7, 5, 3], [2, 4, 6]], dtype=np.int16)
b = a*2 - 5
This gives:
[[ 9 5 1]
 [-1 3 7]]
Notice also that we have used a compound expression - we are multiplying a by 2, then subtracting 5.
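For comparison, here is a sketch of what the vectorised expression saves us from writing. The nested loop (with the illustrative name b_loop) produces the same values, but it is included only to show the equivalence, not as a recommended approach:

```python
import numpy as np

a = np.array([[7, 5, 3], [2, 4, 6]], dtype=np.int16)

# Vectorised version
b = a * 2 - 5

# Equivalent nested-loop version, for comparison only
b_loop = np.empty_like(a)
for row in range(a.shape[0]):
    for col in range(a.shape[1]):
        b_loop[row, col] = a[row, col] * 2 - 5

print(b.shape)                    # (2, 3)
print(np.array_equal(b, b_loop))  # True
```

The result has the same shape as the original array.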
Expressions using two arrays
You can use more than one array in a NumPy expression. For example:
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])
c = a + b
This performs an element by element addition of the two arrays a and b, to create the result in c. This gives:
c = [11 22 33 44]
This result is obtained by adding each corresponding element in a and b. The first element of each array (1 and 10) add to give the first element of the result (11). The second element of each array (2 and 20) add to give the second element of the result (22). And so on.
Arrays must have compatible shapes to be combined in this way. Two arrays that have the exact same shape will always be compatible. However, NumPy arrays also support broadcasting. This allows two arrays of different shapes to be matched under specific circumstances, by replicating elements to make them the same shape. This is covered in a later chapter.
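The sketch below illustrates what compatible means in practice. The array named short is a hypothetical example with a different length, added here only to trigger the error:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])
short = np.array([10, 20, 30])  # hypothetical array with a different length

print(a + b)  # [11 22 33 44]

# Shapes (4,) and (3,) are not compatible, so NumPy raises ValueError
try:
    a + short
    shapes_compatible = True
except ValueError as err:
    shapes_compatible = False
    print("Incompatible shapes:", err)

# A single-element array is broadcast across all four elements
broadcast_sum = a + np.array([100])
print(broadcast_sum)  # [101 102 103 104]
```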
Expressions using two multi-dimensional arrays
You can use multi-dimensional arrays in a NumPy expression. For example, here we use 2 by 2 arrays:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
c = a * b
This performs an element by element multiplication of the two arrays a and b, to create the result in c. This gives:
c = [[ 5 12]
     [21 32]]
This result is obtained by multiplying each corresponding element in a and b. The element [0, 0] of each array (1 and 5) multiply to give the element [0, 0] of the result (5). The element [0, 1] of each array (2 and 6) multiply to give the element [0, 1] of the result (12). And so on.
If you are familiar with matrix multiplication, note that NumPy array multiplication doesn't work in the same way. You can use the NumPy dot function to perform matrix multiplication, but the * operator always performs element by element multiplication.
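As a brief illustration of the difference, the following sketch computes both products for the same pair of 2 by 2 arrays. The names elementwise and matrix are illustrative:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b    # multiplies corresponding elements
matrix = np.dot(a, b)  # matrix product (rows of a by columns of b)

print(elementwise)
print(matrix)
```

The element-wise product is [[5, 12], [21, 32]], whereas the matrix product is [[19, 22], [43, 50]]. For 2-dimensional arrays like these, a @ b gives the same result as np.dot(a, b).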
More complex expressions
You can, of course, use expressions that include multiple arrays, for example:
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
c = np.array([10, 11, 12, 13])
d = np.array([14, 15, 16, 17])
e = a * b + c**2 + c**3 + 2*d
As always, the arrays must be compatible, and broadcasting can be used.
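One way to convince yourself that a longer expression really is evaluated element by element is to repeat the calculation on plain Python integers and compare. This is a sketch only, and the list named expected is introduced just for the check:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
c = np.array([10, 11, 12, 13])
d = np.array([14, 15, 16, 17])

# Vectorised expression
e = a * b + c**2 + c**3 + 2*d

# The same calculation, one element at a time, on plain Python ints
expected = [w * x + y**2 + y**3 + 2 * z
            for w, x, y, z in zip(a.tolist(), b.tolist(), c.tolist(), d.tolist())]

print(e)         # [1133 1494 1925 2432]
print(expected)  # [1133, 1494, 1925, 2432]
```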
Using conditional operators
You can vectorise conditional operators:
a = np.array([1, 6, 9, 4, 2, 8, 7])
b = a > 5
What will this give us? Well, a regular conditional expression returns a bool value, so a vectorised conditional expression will give us an array of NumPy bools:
b = [False True True False False True True]
b is true for every element of a that is greater than 5, false otherwise.
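Here is a short sketch of what you might do with such a boolean array. Counting the True values uses np.count_nonzero. The last line uses boolean indexing, which is not covered in this article, so treat it as an aside:

```python
import numpy as np

a = np.array([1, 6, 9, 4, 2, 8, 7])
b = a > 5

print(b.dtype)              # bool
print(np.count_nonzero(b))  # 4

# Boolean indexing: keep only the elements where b is True
selected = a[b]
print(selected)             # [6 9 8 7]
```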
Combining conditional operators
You cannot use and or or directly with a NumPy array. That is to say, the following are not allowed:
a = np.array([1, 2, 3, 4, 5, 6])
b = a > 2 and a < 5   # Not allowed
c = a < 3 or a > 4    # Not allowed
Instead you must use special NumPy universal functions. For example, np.logical_and is used in place of and. This is covered further in the universal functions article.
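As a minimal sketch, here are the two disallowed expressions rewritten with np.logical_and and np.logical_or, along with a check that the plain Python and really does fail. The flag and_failed exists only for this demonstration:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# Python's "and" tries to reduce each array to a single truth value,
# which NumPy refuses to do for arrays with more than one element
try:
    a > 2 and a < 5
    and_failed = False
except ValueError as err:
    and_failed = True
    print("Python 'and' fails:", err)

b = np.logical_and(a > 2, a < 5)  # element-wise "and"
c = np.logical_or(a < 3, a > 4)   # element-wise "or"

print(b)  # [False False  True  True False False]
print(c)  # [ True  True False False  True  True]
```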