Martin McBride, 2021-09-21
NumPy is a Python package that allows you to efficiently store and process large arrays of numerical data. Obvious examples of this type of data are sound data and image data, but NumPy can also be used anywhere you have large data sets to process.
Part of the attraction of NumPy is that it uses simple, familiar Python syntax to perform complex operations on arrays, which simplifies your code. The other benefit is that NumPy is highly efficient, both in speed and in memory usage. These two factors are related: NumPy provides high-level operations on whole arrays, and those operations are efficient because, under the hood, the loop over the elements runs in compiled C code rather than in the Python interpreter.
In this tutorial, we will take a quick tour of NumPy arrays.
Before you start, you will need to install NumPy. The official numpy.org site will point you at the latest version, with instructions for installing the package.
NumPy for data science
NumPy is a key library for handling large numerical data sets in Python. It often acts as the common interface between other data science libraries such as Pandas, Matplotlib and SciPy.
First, of course, you must import the NumPy package. It is common practice to import numpy as np, so that you can use the short name np in your code. You don't have to, but most people who use NumPy do, and will recognise the convention:
>>> import numpy as np
Creating NumPy arrays
There are several ways to create NumPy arrays. We will just look at a couple of methods here.
You can create an array of zeros using the zeros function, supplying the required array length:
>>> a = np.zeros(5)
>>> print(a)
[0. 0. 0. 0. 0.]
>>> m = np.zeros((3, 4))
>>> print(m)
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
As you can see, we can also create a 2-dimensional array by passing in a tuple such as (3, 4) to specify the number of rows and columns. You can create a 3-dimensional array by passing in a tuple with 3 values, and so on. You can have as many dimensions as you need, up to a generous limit built into NumPy.
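To check the dimensions of an array you have created, you can use its ndim, shape and size attributes. Here is a short sketch using a 3-dimensional array of zeros (the sizes 2, 3 and 4 are arbitrary):

```python
import numpy as np

# A 3-dimensional array: 2 "layers", each with 3 rows and 4 columns
t = np.zeros((2, 3, 4))

print(t.ndim)   # number of dimensions: 3
print(t.shape)  # size along each dimension: (2, 3, 4)
print(t.size)   # total number of elements, 2 * 3 * 4: 24
```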
You can also initialise an array from the values in a list, using the array function:
>>> a = np.array([2, 4, 6, 8])
>>> print(a)
[2 4 6 8]
A multidimensional list will create a multidimensional NumPy array:
>>> m = np.array([[1, 2], [3, 4], [5, 6]])
>>> print(m)
[[1 2]
 [3 4]
 [5 6]]
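The shape of the resulting array follows the nesting of the list: the outer list supplies the rows, and each inner list supplies the values in one row. You then read an element by giving one index per dimension, as in this short sketch:

```python
import numpy as np

m = np.array([[1, 2], [3, 4], [5, 6]])

print(m.shape)   # (3, 2): 3 rows, 2 columns
print(m[2, 1])   # row 2, column 1 (counting from 0): 6
print(m[0])      # the whole first row: [1 2]
```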
When you apply arithmetic operations to NumPy arrays, they are automatically applied to each element individually. This is called vectorisation. Here is a simple example:
>>> x = np.array([1, 3, 5, 7])
>>> y = np.array([0, 1, 2, 3])
>>> z = x * y
>>> print(z)
[ 0  3 10 21]
Each element of z is calculated by multiplying together the corresponding elements of x and y:
- x is 1, y is 0, so z is 0
- x is 3, y is 1, so z is 3
- x is 5, y is 2, so z is 10, etc.
This makes your code a lot neater, and it is usually faster too. The implicit loop runs in NumPy's compiled C code, which is usually much faster than the equivalent Python for loop.
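For comparison, here is a sketch of the same multiplication written as an explicit Python loop, alongside the vectorised form. Both produce the same array. The last line shows that a plain number can also be combined with an array, in which case it is applied to every element (NumPy calls this broadcasting).

```python
import numpy as np

x = np.array([1, 3, 5, 7])
y = np.array([0, 1, 2, 3])

# Explicit Python loop: one interpreted iteration per element
z_loop = np.zeros(len(x), dtype=x.dtype)
for i in range(len(x)):
    z_loop[i] = x[i] * y[i]

# Vectorised version: the loop runs inside NumPy
z_vec = x * y

print(np.array_equal(z_loop, z_vec))  # True

# A plain number is applied to every element as well
print(x * 10)  # [10 30 50 70]
```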
NumPy has its own versions of common maths functions such as sin, cos and exp, which are applied to each element individually. For example:
>>> a = np.array([1, 4, 9, 16])
>>> b = np.sqrt(a)
>>> print(b)
[1. 2. 3. 4.]
This code applies the square root function to every element of a and creates a new NumPy array with the results. As with vectorised operators, the implicit loop is performed very efficiently.
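The other maths functions work the same way. As a brief sketch, here are np.sin and np.exp applied to small arrays (the input values are arbitrary). Floating-point results are approximate, so sin(pi) comes out as a tiny number rather than exactly zero, which is why the sine results are rounded before printing.

```python
import numpy as np

angles = np.array([0.0, np.pi / 2, np.pi])

# Each function is applied element by element and returns a new array
s = np.sin(angles)
e = np.exp(np.array([0.0, 1.0]))

print(np.round(s, 3))   # [0. 1. 0.]
print(e)                # approximately 1 and 2.71828 (the value of e)
```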
You can slice NumPy arrays just like lists, and you can also slice multidimensional arrays. For example, this code copies a 2-row by 4-column array into the middle two rows of a 4 by 4 array:
>>> a = np.zeros((4, 4))
>>> b = np.array([[1., 2., 3., 4.], [5., 6., 7., 8.]])
>>> a[1:3] = b
>>> print(a)
[[0. 0. 0. 0.]
 [1. 2. 3. 4.]
 [5. 6. 7. 8.]
 [0. 0. 0. 0.]]
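The value being assigned does not always have to match the shape of the slice exactly. When it doesn't, NumPy tries to broadcast it: a single row, or a single number, is repeated to fill the slice. Shapes that cannot be broadcast raise a ValueError. A sketch:

```python
import numpy as np

a = np.zeros((4, 4))

# A single row of 4 values is repeated into both selected rows
a[1:3] = np.array([1., 2., 3., 4.])

# A single number fills every element of the selected row
a[3] = 9.

print(a)
# [[0. 0. 0. 0.]
#  [1. 2. 3. 4.]
#  [1. 2. 3. 4.]
#  [9. 9. 9. 9.]]
```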
You can also copy a 4 by 2 array into the middle two columns of a 4 by 4 array:
>>> a = np.zeros((4, 4))
>>> b = np.array([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])
>>> a[:,1:3] = b
>>> print(a)
[[0. 1. 2. 0.]
 [0. 3. 4. 0.]
 [0. 5. 6. 0.]
 [0. 7. 8. 0.]]
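The : on its own selects every row, so a[:,1:3] means "all rows, columns 1 and 2". Using a plain index instead of a slice drops that dimension, which is a common source of shape surprises. A small sketch, starting from the same array as the result above:

```python
import numpy as np

a = np.array([[0., 1., 2., 0.],
              [0., 3., 4., 0.],
              [0., 5., 6., 0.],
              [0., 7., 8., 0.]])

col = a[:, 1]      # a single index: column 1 as a 1-D array
block = a[:, 1:2]  # a slice: column 1 kept as a 4x1 2-D array

print(col)          # [1. 3. 5. 7.]
print(col.shape)    # (4,)
print(block.shape)  # (4, 1)
```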
You can slice in more than one dimension at once, and copy a slice of one array into a slice of another. This code copies the middle 4 elements of b (its rows 1 and 2) into the bottom right corner of a:
>>> a = np.zeros((4, 4))
>>> b = np.array([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])
>>> a[2:4, 2:4] = b[1:3]
>>> print(a)
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 3. 4.]
 [0. 0. 5. 6.]]
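One important difference from lists: slicing a NumPy array does not copy the data. The slice is a view onto the original array, so changing the slice changes the original. This sketch contrasts the two, and uses copy() where an independent array is needed:

```python
import numpy as np

# Slicing a list makes a copy...
lst = [1, 2, 3, 4]
lst_part = lst[1:3]
lst_part[0] = 99
print(lst)   # [1, 2, 3, 4] - unchanged

# ...but slicing a NumPy array gives a view of the same data
arr = np.array([1, 2, 3, 4])
arr_part = arr[1:3]
arr_part[0] = 99
print(arr)   # [ 1 99  3  4] - changed through the view

# Use copy() when you need independent data
arr_copy = arr[1:3].copy()
arr_copy[0] = -1
print(arr)   # [ 1 99  3  4] - unchanged by the copy
```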
NumPy arrays are homogeneous: all the elements must be the same type. This is different from a Python list, where different elements of the same list can have different types.
By default, when you create an array with the zeros function (or its counterpart ones, which fills the array with 1s), it will contain floating-point values. You can choose a different type using the dtype parameter. For example, this creates an array of 16-bit integer values:
>>> a = np.ones(4, dtype=np.int16)
>>> print(a)
[1 1 1 1]
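The dtype controls both how much memory each element takes and what range of values it can hold. A sketch: the itemsize and nbytes attributes report the memory used, and the last lines show that fixed-size integer arrays silently wrap around when a result falls outside their range (the value 30000 is just an example chosen to overflow int16).

```python
import numpy as np

a = np.ones(4, dtype=np.int16)
print(a.dtype)     # int16
print(a.itemsize)  # bytes per element: 2
print(a.nbytes)    # bytes for all elements: 8

# The int16 range is -32768 to 32767, so 30000 * 2 overflows
# and wraps around to a negative number.
big = np.array([30000], dtype=np.int16)
print(big * 2)     # [-5536]
```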
If you create an array using the array function, the data type depends on the types in the source list. If the source list contains only integers, the NumPy array will contain integers. If it contains any floats, even mixed with integers, the array will contain only floats, with the integer values converted to float. Once again, you can use the dtype parameter to override this.
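A short sketch of these rules. The exact size of the default integer type depends on the platform, so the first check looks only at the kind of type ('i' for signed integer):

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])
forced = np.array([1, 2, 3], dtype=np.float32)

print(ints.dtype.kind)   # i (signed integer; its size is platform-dependent)
print(mixed.dtype)       # float64
print(mixed)             # [1.  2.5 3. ]
print(forced.dtype)      # float32
```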
Arrays filled with a value range
You can use the arange function to fill an array with a range of values:
>>> a = np.arange(5)
>>> print(a)
[0 1 2 3 4]
Like the standard range function, arange accepts optional start and step arguments. Unlike range, it also accepts float values:
>>> a = np.arange(1.0, 3.0, .3)
>>> print(a)
[1.  1.3 1.6 1.9 2.2 2.5 2.8]
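As with range, the stop value is never included, and the step can be negative. A sketch with a few integer ranges (the particular numbers are arbitrary). For a float step, the number of elements is ceil((stop - start) / step), which is 7 for the example above:

```python
import numpy as np

# start=2, stop=10, step=3: like range, the stop value is excluded
print(np.arange(2, 10, 3))     # [2 5 8]

# A negative step counts down
print(np.arange(5, 0, -1))     # [5 4 3 2 1]

# With a float step, the length is ceil((3.0 - 1.0) / 0.3) = 7
print(len(np.arange(1.0, 3.0, 0.3)))  # 7
```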
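With a non-integer step, floating-point rounding can make it hard to predict exactly which values arange will produce. An alternative function, linspace, lets you specify the exact start and end values and the exact number of elements, and calculates the increment between the values for you: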
>>> a = np.linspace(1.0, 3.0, 5)
>>> print(a)
[1.  1.5 2.  2.5 3. ]
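linspace also has a couple of useful optional parameters. In this sketch, retstep=True returns the calculated increment along with the array, and endpoint=False leaves out the end value, so the increment becomes (3.0 - 1.0) / 5:

```python
import numpy as np

# retstep=True also returns the calculated increment
values, step = np.linspace(1.0, 3.0, 5, retstep=True)
print(step)     # 0.5

# endpoint=False excludes 3.0, so the increment is 2.0 / 5 = 0.4
no_end = np.linspace(1.0, 3.0, 5, endpoint=False)
print(no_end)   # [1.  1.4 1.8 2.2 2.6]
```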
This has just been a quick introduction to NumPy arrays. You can learn more by following the more detailed articles in the rest of this tutorial, or by visiting the numpy.org site.