Creating data series in numpy
Martin McBride, 2019-09-15
Tags arrays data types arange linspace vectorisation
In this section we will look at how to create numpy arrays initialised with data series.
arange works in a similar way to the built-in range function, except that it creates a numpy array. The other difference is that it can work with floating point values:
import numpy as np
r1 = np.arange(4.9)
print(r1)
r2 = np.arange(.5, 4.9)
print(r2)
r3 = np.arange(.5, 4.9, 1.3)
print(r3)
r1 uses the default start and step values. It counts from 0.0 up to but not including 4.9, in steps of 1.0:
[0. 1. 2. 3. 4.]
r2 uses the default step value. It counts from 0.5 up to but not including 4.9, in steps of 1.0:
[0.5 1.5 2.5 3.5 4.5]
r3 counts from 0.5 up to but not including 4.9, in steps of 1.3:
[0.5 1.8 3.1 4.4]
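To make the comparison with the built-in range concrete, here is a minimal sketch. The variable names are illustrative only and do not come from the examples above:

```python
import numpy as np

# Built-in range produces a lazy range object of integers
py_values = list(range(5))

# arange returns a numpy array directly
np_values = np.arange(5)

# range rejects a floating point argument, whereas arange accepts one
try:
    range(4.9)
    range_accepts_float = True
except TypeError:
    range_accepts_float = False

float_values = np.arange(0.5, 4.9)

print(py_values)
print(np_values)
print(range_accepts_float)
print(float_values)
```

The try/except is only there to show that range raises a TypeError for a float, so the script keeps running.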
Setting the type
You can set the type of the array using the dtype parameter of arange:
i1 = np.arange(5, dtype=np.int8)
print(i1)
This creates an array of 8 bit integers:
[0 1 2 3 4]
All the functions described in this section support dtype. The types available are described in data types.
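As a quick check, the resulting type can be inspected through the array's dtype attribute. This is a small sketch with illustrative variable names, assuming numpy is imported as np:

```python
import numpy as np

small_ints = np.arange(5, dtype=np.int8)         # 8 bit signed integers
floats = np.arange(5, dtype=np.float32)          # 32 bit floats
spaced = np.linspace(0, 1, 3, dtype=np.float32)  # linspace also accepts dtype

print(small_ints.dtype, small_ints.itemsize)  # itemsize is the size of one element in bytes
print(floats.dtype)
print(spaced.dtype)
```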
arange and rounding errors
There is a potential problem with arange when using floating point values. Consider this:
r2 = np.arange(0, 6, 1.2)
This creates an array:
[0. 1.2 2.4 3.6 4.8]
This is as you would expect. The next element would be 6.0, and since arange counts up to but not including 6.0, the array has only 5 elements.
A problem could occur if a rounding error caused the final calculation to be very slightly wrong, for example 5.999999999999999. Since that is less than 6.0, the final element would be included in the array, so it would now have 6 elements.
That means that in some cases the length of the array could change depending on tiny rounding errors. A possible solution is linspace.
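The effect can be reproduced with a different set of values. The sketch below is not taken from the original text. It assumes standard IEEE 754 double precision, where 1.3 - 1 divided by 0.1 comes out very slightly above 3, so arange produces one more element than the arithmetic suggests. The second part shows one common workaround (an assumption on our part, not the article's method): count with integers, then scale.

```python
import numpy as np

# Mathematically only 1.0, 1.1 and 1.2 lie below 1.3, suggesting 3 elements,
# but rounding in the length calculation adds a fourth.
surprising = np.arange(1, 1.3, 0.1)
print(len(surprising))
print(surprising)

# Workaround sketch: fix the element count as an integer, then scale.
# The length can no longer depend on floating point rounding.
count = 5
scaled = np.arange(count) * 1.2
print(len(scaled))
print(scaled)
```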
linspace creates a series of equally spaced numbers, in a similar way to arange. The difference is that linspace specifies the start and end points, plus the required number of values:
k5 = np.linspace(0, 10, 5)
print(k5)
[ 0. 2.5 5. 7.5 10. ]
That is, 5 equally spaced values between 0 and 10, inclusive. Unlike arange, the start and end values will be exactly correct (exactly 0 and 10) because they are specified rather than being calculated. You will also get exactly the required number of elements in the array.
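Both guarantees can be checked directly. The following sketch uses an awkward end value of 1.3 as an illustration (our choice, not from the original) and confirms that the element count and both end points come out as requested:

```python
import numpy as np

points = np.linspace(1, 1.3, 4)

print(len(points))   # the number of elements requested
print(points[0])     # the start value, as given
print(points[-1])    # the end value, as given rather than accumulated
```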
endpoint parameter for linspace
endpoint can be set to False to alter the behaviour of linspace (to make it a bit more like arange):
k5 = np.linspace(0, 10, 5, endpoint=False)
print(k5)
In this case, linspace creates 6 equally spaced values, but doesn't return the final value (so the result still has 5 elements). Here is the result:
[0. 2. 4. 6. 8.]
As you can see, the range is now divided into intervals of 2.0 (rather than 2.5), and the final element is 8.0 rather than 10.0.
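One use of endpoint=False is as a replacement for a floating point arange call when the number of elements is known in advance. The sketch below, with illustrative names, rebuilds the 0 to 6 series in steps of 1.2 with the element count fixed at 5:

```python
import numpy as np

# Half-open series: 5 values starting at 0, with 6 excluded
fixed_count = np.linspace(0, 6, 5, endpoint=False)

# The arange version, whose length is derived from a floating point division
from_arange = np.arange(0, 6, 1.2)

print(fixed_count)
print(np.allclose(fixed_count, from_arange))
```

Here the two calls agree, but only the linspace version states the length explicitly.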
retstep parameter for linspace
retstep can be set to True to obtain the step size used by linspace. The sample array and the step are returned as a tuple:
k5, step = np.linspace(0, 10, 5, retstep=True)
print(k5)
print(step)
[ 0. 2.5 5. 7.5 10. ] # samples
2.5 # step size
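The returned step can be compared with the gaps between neighbouring samples. This short sketch uses np.diff for that, and also shows the step reported when endpoint is False. Variable names are illustrative:

```python
import numpy as np

samples, step = np.linspace(0, 10, 5, retstep=True)
gaps = np.diff(samples)  # differences between consecutive samples
print(step, gaps)

# With endpoint=False the same range is split into 5 intervals instead of 4
half_open, half_open_step = np.linspace(0, 10, 5, endpoint=False, retstep=True)
print(half_open_step)
```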
If you need a non-standard data series, it will usually be most efficient to use vectorisation if possible.
For example, to create a series containing the cubes of each number: 0, 1, 8, 27, 64... you could do this:
cubes = np.arange(10)**3
print(cubes)
This will normally be a lot quicker than using a Python loop.
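A rough way to see the difference is to time both approaches with the standard timeit module. The helper functions cubes_loop and cubes_vectorised are hypothetical names introduced for this sketch, and the timings will vary by machine, so no particular speed-up is claimed here:

```python
import timeit
import numpy as np

def cubes_loop(n):
    # Python-level loop: each cube is computed by the interpreter
    return np.array([i**3 for i in range(n)])

def cubes_vectorised(n):
    # Vectorised: the cube is applied to the whole array inside numpy
    return np.arange(n)**3

n = 1_000  # kept small so the cubes fit comfortably in a 32 bit integer
loop_time = timeit.timeit(lambda: cubes_loop(n), number=20)
vec_time = timeit.timeit(lambda: cubes_vectorised(n), number=20)

print(f"loop: {loop_time:.4f}s, vectorised: {vec_time:.4f}s")
```

Both functions return the same values, so the comparison only measures how the work is done.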