Creating data series in numpy
In this article, we will look at how to create numpy arrays initialised with data series. There are three main ways to do this:
- arange creates a data series based on a start value, an end value, and a step value.
- linspace creates a data series based on a start value, an end value, and the required array length.
- Vectorisation can be used to create a more complex series.
Why use a data series?
Typically, in Python, if we want to perform a repeated operation on a sequence of numbers, we might use a for loop, something like this:
r = []
for i in range(4): # Loop over 0, 1, 2, 3
    r.append(i*2)
print(r) # [0, 2, 4, 6]
This code loops over values of i from 0 to 3, and for each value of i it appends i*2 to the list r. This gives the result [0, 2, 4, 6].
When we use NumPy, we try to avoid using explicit loops. It is much more efficient to use vectorisation to process an entire NumPy array in one operation. So the equivalent code would be:
from numpy import arange
a = arange(4) # a = [0 1 2 3]
r = a*2
print(r) # [0 2 4 6]
Here, we have used arange instead of a loop to create the array of input values, a. When we multiply the NumPy array a by 2, this is a vectorised operation: the whole array is processed by optimised C code in the NumPy library. If our array contained millions of elements, that would be a lot faster than the Python loop.
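A rough way to check the speed claim is to time both versions with the standard timeit module. This is only a sketch: the array size N and the number of repetitions are arbitrary choices, and the timings will vary from machine to machine, so the code prints them rather than assuming particular values.

```python
import timeit

import numpy as np

N = 1_000_000  # arbitrary size, large enough for the difference to show


def with_loop():
    # Plain Python: build the list one element at a time
    r = []
    for i in range(N):
        r.append(i * 2)
    return r


def with_numpy():
    # NumPy: create the series and double it in one vectorised operation
    return np.arange(N) * 2


# Both versions produce the same values
assert with_loop() == with_numpy().tolist()

loop_time = timeit.timeit(with_loop, number=3)
numpy_time = timeit.timeit(with_numpy, number=3)
print(f"loop:  {loop_time:.3f} s")
print(f"numpy: {numpy_time:.3f} s")
```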
arange
As we have seen, arange works similarly to the built-in range function, except that it creates a numpy array. The other difference is that it can work with floating-point values:
r1 = arange(4.9)
print(r1) # [0. 1. 2. 3. 4.]
r2 = arange(.5, 4.9)
print(r2) # [0.5 1.5 2.5 3.5 4.5]
r3 = arange(.5, 4.9, 1.3)
print(r3) # [0.5 1.8 3.1 4.4]
r1 uses the default start and step values. It counts from 0.0 up to but not including 4.9, in steps of 1.0:
[0. 1. 2. 3. 4.]
r2 uses the default step value. It counts from 0.5 up to but not including 4.9, in steps of 1.0:
[0.5 1.5 2.5 3.5 4.5]
r3 counts from 0.5 up to but not including 4.9, in steps of 1.3:
[0.5 1.8 3.1 4.4]
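The NumPy documentation gives the length of an arange result for floating-point arguments as ceil((stop - start)/step). The sketch below uses a hypothetical helper, predicted_length, to apply that formula to the three examples above and compare it with the actual array lengths.

```python
import math

import numpy as np


def predicted_length(start, stop, step):
    # Hypothetical helper: ceil((stop - start) / step), never less than zero
    return max(0, math.ceil((stop - start) / step))


examples = [(0.0, 4.9, 1.0), (0.5, 4.9, 1.0), (0.5, 4.9, 1.3)]
for start, stop, step in examples:
    r = np.arange(start, stop, step)
    # Print the arguments, the predicted length and the actual length
    print(start, stop, step, predicted_length(start, stop, step), len(r))
```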
Setting the type
You can set the data type of the array using the dtype parameter of arange:
r = arange(5, dtype='int8')  # dtype=numpy.int8 also works
print(r)
This creates an array of 8-bit integers:
[0 1 2 3 4]
All the functions described in this article support dtype. The available types are described in the data types article.
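If dtype is left out, NumPy infers the type from the arguments. This short sketch prints the dtype of a few arrays: integer arguments give an integer array (its exact width depends on the platform), any floating-point argument gives float64, and an explicit dtype overrides the inferred type for both arange and linspace.

```python
import numpy as np

a = np.arange(5)                               # integer arguments: integer dtype
b = np.arange(4.9)                             # a float argument: float64
c = np.arange(5, dtype=np.int8)                # explicit 8-bit integers
d = np.linspace(0, 10, 5, dtype=np.float32)    # linspace accepts dtype too

for name, arr in [("a", a), ("b", b), ("c", c), ("d", d)]:
    print(name, arr.dtype, arr)
```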
arange and rounding errors
There is a potential problem with arange when using floating point values. Consider this:
r = arange(10.0, 10.4, 0.05)
print(r)
We would expect this to create an array containing values from 10.0, up to but not including 10.4, in steps of 0.05. We would expect this:
[10. 10.05 10.1 10.15 10.2 10.25 10.3 10.35]
But, in fact, we get this:
[10. 10.05 10.1 10.15 10.2 10.25 10.3 10.35 10.4]
This array includes all the expected values, but also an extra value of 10.4.
What is going on here? arange works out how many elements to create by dividing the range (10.4 - 10.0) by the step (0.05) and rounding up. We would expect that division to give exactly 8. But computers store floating-point numbers in binary, and neither 10.4 nor 0.05 can be represented exactly, so there are tiny errors in the calculation. The result comes out very slightly more than 8.
So the arange function rounds it up and creates 9 elements instead of 8.
To make things even more confusing, the extra element is so close to 10.4 that it prints as 10.4, even though 10.4 is the stop value that should have been excluded.
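One way to see where the extra element comes from is to repeat the length calculation that arange performs (the number of steps, (stop - start)/step, rounded up) using ordinary Python floats. The exact digits printed depend on IEEE 754 double-precision arithmetic, so the comments only describe them roughly.

```python
import math

import numpy as np

start, stop, step = 10.0, 10.4, 0.05

# How many steps fit between start and stop, calculated in floating point
steps = (stop - start) / step
print(repr(steps))       # very slightly more than 8
print(math.ceil(steps))  # rounding up gives 9 rather than 8

r = np.arange(start, stop, step)
print(len(r))            # arange also creates 9 elements
```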
That means that, in some cases, the length of the array can change because of tiny rounding errors. One way around this is to make sure we specify an end value that is not close to any value in the series. For example:
r = arange(10.0, 10.39, 0.05)
print(r)
The previous end value of 10.4 was unreliable because 10.4 is a value in the series. But if we use 10.39 instead, that is safe. The value 10.4 will always test as greater than 10.39, even with rounding errors, because rounding errors are very small.
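Another common workaround, not covered in the original text, is to count in whole numbers, which arange handles exactly, and then scale the result. Here the series 10.0, 10.05, ..., 10.35 is written as 200/20 up to 207/20; the values 200 and 20 are specific to this example.

```python
import numpy as np

# 200/20 = 10.0 and 207/20 = 10.35; the integer stop value 208 is excluded
r = np.arange(200, 208) / 20
print(len(r))  # 8
print(r)
```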
Often, a better solution is to use linspace, which we will look at next.
linspace
linspace creates a series of equally spaced numbers, in a similar way to arange. The difference is that linspace specifies the start and end points, plus the required number of values in the array:
from numpy import linspace
k = linspace(0, 10, 5)
print(k)
This prints:
[ 0. 2.5 5. 7.5 10. ]
That is, 5 equally spaced values between 0 and 10, inclusive. Unlike arange, the start and end values will be exactly correct (exactly 0 and 10) because they are specified rather than being calculated. You will also get exactly the required number of elements in the array.
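As a sketch, here is the series from the rounding example (10.0 up to and including 10.35, which is 8 values) created with linspace instead. Because we ask for 8 elements, the length cannot change because of rounding, and the first and last values are exactly the ones we specified.

```python
import numpy as np

k = np.linspace(10.0, 10.35, 8)
print(len(k))                        # 8
print(k)
print(k[0] == 10.0, k[-1] == 10.35)  # True True
```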
endpoint parameter for linspace
endpoint can be set to False to alter the behaviour of linspace (to make it a bit more like arange):
k = linspace(0, 10, 4, endpoint=False)
print(k)
In this case, linspace works with the same 5 equally spaced values as the previous example. However, because endpoint is False, it doesn't return the final value, so only the first 4 values are returned:
[0. 2.5 5. 7.5]
Notice that the third parameter has a value of 4, rather than the value 5 which was used before. This can be a little confusing. What this parameter is requesting is that the function should return 4 values, but those values should not include the endpoint.
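A quick way to check this description is to compare the endpoint=False result with a normal linspace call that asks for one extra value, and then slice off the last value. The sketch below does that for the example above and for a second, arbitrarily chosen range.

```python
import numpy as np

for start, stop, num in [(0, 10, 4), (0, 1, 3)]:
    without_end = np.linspace(start, stop, num, endpoint=False)
    # Ask for one more value with the end point included, then drop it
    sliced = np.linspace(start, stop, num + 1)[:-1]
    print(without_end, sliced, np.allclose(without_end, sliced))
```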
retstep parameter for linspace
retstep can be set to True to obtain the step size used by linspace. The sample array and the step are returned as a tuple:
k, step = linspace(0, 10, 5, retstep=True)
print(k)
print(step)
This prints:
[ 0. 2.5 5. 7.5 10. ] # samples
2.5 # step size
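The step size that retstep reports follows directly from the other arguments: (stop - start)/(num - 1) when the end point is included, and (stop - start)/num when endpoint is False. A small sketch checking both cases, using an arbitrary range of 0 to 1:

```python
import math

import numpy as np

samples, step_incl = np.linspace(0, 1, 5, retstep=True)
print(samples, step_incl)  # step is 1/4 = 0.25

samples, step_excl = np.linspace(0, 1, 5, endpoint=False, retstep=True)
print(samples, step_excl)  # step is 1/5 = 0.2

# Compare each reported step with the formula
print(math.isclose(step_incl, (1 - 0) / (5 - 1)), math.isclose(step_excl, (1 - 0) / 5))
```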
Using vectorisation
If you need a non-standard data series, it will usually be most efficient to use vectorisation if possible.
For example, to create a series containing the cubes of each number: 0, 1, 8, 27, 64... you could do this:
cubes = arange(10)**3
print(cubes)
This will normally be a lot quicker than using a Python loop. In this case, the result is:
[ 0 1 8 27 64 125 216 343 512 729]
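The same idea extends to other series. In the sketch below, which uses arbitrary examples, a single arange call provides the index values and ordinary arithmetic turns them into powers of two and odd numbers; a linspace series can be passed through a NumPy function such as np.sin in the same way.

```python
import numpy as np

n = np.arange(8)

powers_of_two = 2 ** n   # 1, 2, 4, ..., 128
odd_numbers = 2 * n + 1  # 1, 3, 5, ..., 15

# One cycle of a sine wave, sampled at 9 equally spaced points
x = np.linspace(0, 2 * np.pi, 9)
sine = np.sin(x)

print(powers_of_two)
print(odd_numbers)
print(np.round(sine, 3))
```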