Vectorisation in NumPy

By Martin McBride, 2021-02-27
Tags: arrays data types vectorisation
Categories: numpy

Vectorisation is the secret sauce of NumPy. It allows you to perform element-wise operations on NumPy arrays without using Python loops. Behind the scenes, the processing is done by optimised C code.

As a result, many array operations can be written in simple Python yet execute almost as fast as C code. This makes a huge difference when processing a large array, such as an array of image data.

Performing simple maths on an array

Here is a simple example of vectorisation:

import numpy as np

a = np.array([1.1, 3.6, 4.0, 8.2])
b = a + 1

We first create an array with content [1.1, 3.6, 4.0, 8.2]. Then we execute:

b = a + 1

Now because a is a NumPy array, Python uses the NumPy version of the + operator. This operator adds 1 to each element in the array, resulting in this:

b = [2.1 4.6 5.  9.2]

That is vectorisation in a nutshell. You can apply pretty much any Python maths operator to an array, and it will automatically be applied to every element of that array, at lightning speed!
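As a quick illustration of that claim, here is a minimal sketch that applies several of the ordinary Python arithmetic operators to one array. The array values and variable names are chosen only for this example.

```python
import numpy as np

a = np.array([1.0, 2.0, 4.0, 8.0])

# Each operator is applied to every element of a in turn.
minus_one = a - 1    # 0, 1, 3, 7
times_two = a * 2    # 2, 4, 8, 16
halved = a / 2       # 0.5, 1, 2, 4
squared = a ** 2     # 1, 4, 16, 64
negated = -a         # -1, -2, -4, -8

print(minus_one, times_two, halved, squared, negated)
```

In each case the result is a new array of the same length, and a itself is left unchanged.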

Here is the equivalent code to do a similar thing with a Python list:

a = [1.1, 3.6, 4.0, 8.2]
b = []

for x in a:
    b.append(x + 1)

Or if you prefer to use a list comprehension:

a = [1.1, 3.6, 4.0, 8.2]
b = [x + 1 for x in a]

As you can see, the NumPy version is shorter and more readable, as well as being faster!
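The speed difference is easiest to appreciate by measuring it. The following is a rough sketch using the standard timeit module. The array size, the repeat count and the helper function names are arbitrary choices for this illustration, and the actual timings will depend on your machine.

```python
import timeit

import numpy as np

# 100,000 values, held both as a NumPy array and as a plain Python list.
values = np.arange(100_000, dtype=np.float64)
values_list = values.tolist()


def add_one_loop(items):
    """Add 1 to every item using a list comprehension."""
    return [x + 1 for x in items]


def add_one_vectorised(arr):
    """Add 1 to every element using NumPy vectorisation."""
    return arr + 1


# Time 10 runs of each approach.
loop_time = timeit.timeit(lambda: add_one_loop(values_list), number=10)
vector_time = timeit.timeit(lambda: add_one_vectorised(values), number=10)

print(f"list comprehension: {loop_time:.4f} s")
print(f"vectorised:         {vector_time:.4f} s")
```

Both functions produce the same values, so any difference in the printed times comes from how the work is done rather than what is computed.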

Vectorisation with other data types

In this case we will use the dtype parameter to create an array of 16 bit integers:

a = np.array([1, 3, 4, 8], dtype=np.int16)
b = a * 2

This time we are multiplying a by 2. Again, we are using the NumPy * operator. If a were a list, the multiply operator would do something very different, of course: it would repeat the list rather than multiply its elements. But for a NumPy array, it simply doubles every element:

[ 2  6  8 16]

If you check the dtype of b, you will find it is also int16. The new array takes the type of the original array.

In fact, that isn't quite true. Similar to normal Python arithmetic, ints can be converted to floats automatically when required. So for example:

a = np.array([1, 3, 4, 8], dtype=np.int16)
b = a * 2.1

Because we are multiplying an int by a float, the result is automatically a float, so we get this:

[ 2.1  6.3  8.4 16.8]

Which is a float array.
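You can confirm both behaviours by inspecting the dtype attribute of each result. This is a small sketch using the same array as above. The names doubled and scaled are just for illustration, and the exact float type reported may depend on your NumPy version, so the code only checks that it is a floating-point type.

```python
import numpy as np

a = np.array([1, 3, 4, 8], dtype=np.int16)

doubled = a * 2    # int16 array multiplied by a Python int
scaled = a * 2.1   # int16 array multiplied by a Python float

print(doubled.dtype)  # int16, the same type as a
print(scaled.dtype)   # a floating-point type

# np.issubdtype tests whether a dtype belongs to a general family of types.
print(np.issubdtype(scaled.dtype, np.floating))
```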

Vectorisation with multi-dimensional arrays

We can apply vectorisation to a multi-dimensional array. NumPy simply applies the operation to every element in the array, so we don't need to do anything special:

a = np.array([[7, 5, 3],
              [2, 4, 6]], dtype=np.int16)
b = a*2 - 5

This gives:

[[ 9  5  1]
 [-1  3  7]]

Notice also that we have used a compound expression - we are multiplying a by 2, then subtracting 5.
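For comparison, here is a sketch of what the same calculation looks like with nested Python lists. The nested list comprehension is only an illustration of the work that NumPy is doing for us.

```python
import numpy as np

a = np.array([[7, 5, 3],
              [2, 4, 6]], dtype=np.int16)

# Vectorised: one expression, applied to all six elements.
b = a*2 - 5

# Equivalent nested list comprehension over a plain list of lists.
rows = a.tolist()
b_list = [[x*2 - 5 for x in row] for row in rows]

print(b.shape)   # (2, 3), the same shape as a
print(b_list)    # [[9, 5, 1], [-1, 3, 7]]
```

The vectorised result has the same shape as the original array, and the code does not change if a gains more rows, columns or dimensions.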

Expressions using two arrays

You can use more than one array in a NumPy expression. For example:

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

c = a + b

This performs an element by element addition of the two arrays a and b, to create the result in c. This gives:

c = [11 22 33 44]

This result is obtained by adding each corresponding element in a and b. The first element of each array (1 and 10) add to give the first element of the result (11). The second element of each array (2 and 20) add to give the second element of the result (22). And so on.

Arrays must have compatible shapes to be combined in this way. Two arrays that have the exact same shape will always be compatible. However, NumPy arrays also support broadcasting. This allows two arrays of different shapes to be matched under specific circumstances, by replicating elements to make them the same shape. This is covered in a later chapter.
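To make the idea of compatible shapes concrete, here is a brief sketch. The arrays short, grid and row are made up for this example, and the broadcasting case is only a taste of a topic that is covered properly elsewhere.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])
c = a + b   # both have shape (4,), so they are compatible

# Arrays of length 4 and length 3 cannot be matched up element by element.
short = np.array([10, 20, 30])
shapes_rejected = False
try:
    a + short
except ValueError as err:
    shapes_rejected = True
    print("ValueError:", err)

# Broadcasting: a (2, 3) array combined with a (3,) array.
# The row is reused for each row of the grid.
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
row = np.array([10, 20, 30])
total = grid + row

print(total.tolist())   # [[11, 22, 33], [14, 25, 36]]
```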

Expressions using two multi-dimensional arrays

You can use multi-dimensional arrays in a NumPy expression. For example here we use 2 by 2 arrays:

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

c = a * b

This performs an element by element multiplication of the two arrays a and b, to create the result in c. This gives:

c = [[ 5 12]
     [21 32]]

This result is obtained by multiplying each corresponding element in a and b. The element [0, 0] of each array (1 and 5) multiply to give the element [0, 0] of the result (5). The element [0, 1] of each array (2 and 6) multiply to give the element [0, 1] of the result (12). And so on.

If you are familiar with matrix multiplication, note that NumPy array multiplication doesn't work in the same way. You can use the NumPy dot function to perform matrix multiplication, but the * operator always performs element by element multiplication.
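The difference is easy to see by running both operations on the same pair of arrays. This sketch uses np.dot as mentioned above, and also the @ operator, which Python provides for matrix multiplication and NumPy arrays support.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b       # multiplies matching elements
matrix = np.dot(a, b)     # rows of a combined with columns of b
matrix_op = a @ b         # same matrix product, written as an operator

print(elementwise.tolist())   # [[5, 12], [21, 32]]
print(matrix.tolist())        # [[19, 22], [43, 50]]
```

For example, element [0, 0] of the matrix product is 1*5 + 2*7 = 19, whereas element [0, 0] of the element by element product is just 1*5 = 5.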

More complex expressions

You can, of course, use expressions that include multiple arrays, for example:

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
c = np.array([10, 11, 12, 13])
d = np.array([14, 15, 16, 17])

e = a*b + c**2 + c**3 + 2*d

As always, the arrays must be compatible, and broadcasting can be used.
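To check what this expression produces, here is a sketch that evaluates it and compares it with an equivalent calculation on plain Python lists. The list version using zip is just an illustration of the element by element behaviour.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
c = np.array([10, 11, 12, 13])
d = np.array([14, 15, 16, 17])

# Normal Python operator precedence applies: ** first, then *, then +.
e = a*b + c**2 + c**3 + 2*d

# The same calculation, one element at a time, using plain lists.
e_list = [w*x + y**2 + y**3 + 2*z
          for w, x, y, z in zip(a.tolist(), b.tolist(), c.tolist(), d.tolist())]

print(e.tolist())   # [1133, 1494, 1925, 2432]
```

The first element, for example, is 1*5 + 10**2 + 10**3 + 2*14 = 5 + 100 + 1000 + 28 = 1133.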

Using conditional operators

You can vectorise conditional operators:

a = np.array([1, 6, 9, 4, 2, 8, 7])
b = a > 5

What will this give us? Well, a regular conditional expression returns a bool value, so a vectorised conditional expression will give us an array of NumPy bools:

b = [False  True  True False False  True  True]

The array b is True wherever the corresponding element of a is greater than 5, and False otherwise.
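The result is an ordinary NumPy array, so you can inspect it like any other. This short sketch checks its dtype and counts how many elements passed the test. np.count_nonzero is used here because True counts as non-zero.

```python
import numpy as np

a = np.array([1, 6, 9, 4, 2, 8, 7])
b = a > 5

print(b.dtype)               # bool
print(b.shape)               # (7,), the same shape as a
print(np.count_nonzero(b))   # 4 elements of a are greater than 5
```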

Combining conditional operators

You cannot use the Python keywords "and" and "or" directly with a NumPy array. That is to say, the following are not allowed:

a = np.array([1, 2, 3, 4, 5, 6])

b = a > 2 and a < 5  # Not allowed
c = a < 3 or a > 4   # Not allowed

Instead, you must use special NumPy universal functions. For example, np.logical_and is used in place of and. This is covered further in the universal functions article.
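As a sketch of how this looks in practice, the code below first shows the error raised by the plain and keyword, then computes the two intended results with np.logical_and and np.logical_or. The & and | operators shown at the end are an alternative spelling that NumPy also supports for bool arrays. Note the parentheses, which are needed because & and | bind more tightly than the comparisons.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# The plain keyword fails, because the truth value of a whole array is ambiguous.
and_rejected = False
try:
    b = a > 2 and a < 5
except ValueError as err:
    and_rejected = True
    print("ValueError:", err)

# Element by element versions.
b = np.logical_and(a > 2, a < 5)
c = np.logical_or(a < 3, a > 4)

print(b.tolist())   # [False, False, True, True, False, False]
print(c.tolist())   # [True, True, False, False, True, True]

# Equivalent operator forms for bool arrays.
b_alt = (a > 2) & (a < 5)
c_alt = (a < 3) | (a > 4)
```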

