Creating data series in numpy

By Martin McBride, 2026-01-10

Tags: arrays data types arange linspace vectorisation
Categories: numpy

In this article, we will look at how to create numpy arrays initialised with data series. There are three main ways to do this:

arange creates a data series based on a start value, an end value, and a step value.
linspace creates a data series based on a start value, an end value, and the required array length.
Vectorisation can be used to create a more complex series.

Why use a data series?

Typically, in Python, if we want to perform a repeated operation on a sequence of numbers, we might use a for loop, something like this:

r = []
for i in range(4):   # Loop over 0, 1, 2, 3
    r.append(i*2)

print(r)             # [0, 2, 4, 6]

This code loops over values of i from 0 to 3, and for each value of i it appends i*2 to the list r. This gives the result [0, 2, 4, 6].

When we use NumPy, we try to avoid using explicit loops. It is much more efficient to use vectorisation to process an entire NumPy array in one operation. So the equivalent code would be:

from numpy import arange

a = arange(4)    # a = [0 1 2 3]
r = a*2
print(r)            # [0 2 4 6]

Here, we have used arange instead of a loop to create the array of input values in a. Multiplying the NumPy array a by 2 is a vectorised operation: the whole array is processed by optimised C code in the NumPy library. If our array contained millions of elements, that would be a lot faster than the Python loop.
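As a rough illustration of that claim, here is a minimal sketch that checks the two approaches give the same values and then times them with the standard timeit module. The array size n and the repeat count are arbitrary choices made for this example, and the timings will vary from machine to machine, so no figures are shown.

```python
from timeit import timeit
from numpy import arange

n = 100_000   # array size chosen for illustration only

def with_loop():
    # Plain Python loop, as in the first example
    r = []
    for i in range(n):
        r.append(i*2)
    return r

def with_numpy():
    # Vectorised equivalent
    return arange(n)*2

# Both approaches produce the same values
same = with_loop() == with_numpy().tolist()
print(same)    # True

# Total time for 3 runs of each version (results depend on the machine)
loop_time = timeit(with_loop, number=3)
numpy_time = timeit(with_numpy, number=3)
print(loop_time, numpy_time)
```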

arange

As we have seen, arange works similarly to the built-in range function, except that it creates a numpy array. The other difference is that it can work with floating-point values:

r1 = arange(4.9)
print(r1)                     # [0. 1. 2. 3. 4.]
r2 = arange(.5, 4.9)
print(r2)                     # [0.5 1.5 2.5 3.5 4.5]
r3 = arange(.5, 4.9, 1.3)
print(r3)                     # [0.5 1.8 3.1 4.4]

r1 uses the default start and step values. It counts from 0.0 up to but not including 4.9, in steps of 1.0:

[0. 1. 2. 3. 4.]

r2 uses the default step value. It counts from 0.5 up to but not including 4.9, in steps of 1.0:

[0.5 1.5 2.5 3.5 4.5]

r3 counts from 0.5 up to but not including 4.9, in steps of 1.3:

[0.5 1.8 3.1 4.4]
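With integer arguments, arange should produce the same values as range, including when the step is negative. The following short sketch compares the two. The particular start, stop and step values are just examples.

```python
from numpy import arange

# Same start, stop and step passed to range and arange
from_range = list(range(2, 10, 3))
from_arange = arange(2, 10, 3)
print(from_range)     # [2, 5, 8]
print(from_arange)    # [2 5 8]

# A negative step counts downwards, stopping before the end value
down = arange(10, 0, -4)
print(down)           # [10  6  2]
```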

Setting the type

You can set the data type of the array using the dtype parameter of arange:

r = arange(5, dtype='int8')
print(r)

This creates an array of 8-bit integers:

[0 1 2 3 4]

All the functions described in this section support dtype. The types available are described in data types.
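To confirm which type an array actually has, you can inspect its dtype and itemsize attributes. This is a small sketch. The float example is included to show the type that results when a floating-point argument is passed and no dtype is given.

```python
from numpy import arange

a = arange(5, dtype='int8')
print(a.dtype)       # int8
print(a.itemsize)    # 1 (bytes per element)

# A floating-point argument gives a float array
b = arange(5.0)
print(b.dtype)       # float64
```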

arange and rounding errors

There is a potential problem with arange when using floating point values. Consider this:

r = arange(10.0, 10.4, 0.05)
print(r)

We would expect this to create an array containing values from 10.0, up to but not including 10.4, in steps of 0.05:

[10. 10.05 10.1 10.15 10.2 10.25 10.3 10.35]

But, in fact, we get this:

[10. 10.05 10.1 10.15 10.2 10.25 10.3 10.35 10.4]

This array includes all the expected values, but also an extra value of 10.4.

What is going on here? Well, if we start with 10.0, and then add 0.05 to it 8 times, we would expect to get 10.4. But when computers add floating-point numbers, there are sometimes very small errors due to the way floating-point numbers are represented. When we take 10.0 and add 0.05 8 times, we don't get exactly 10.4. Instead, we get something like 10.3999999999999999.

So the arange function decides that it hasn't quite reached 10.4 yet, and therefore adds the extra value to the list.

To make things even more confusing, when Python prints the number 10.3999999999999999, the print function rounds it up to 10.4, hiding the original problem.
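The NumPy documentation gives the length of the result as ceil((stop - start)/step), so the same kind of rounding error can be seen by evaluating that expression directly. The sketch below does this for the example values. The variable names are chosen for this illustration only.

```python
from math import ceil
from numpy import arange

start, stop, step = 10.0, 10.4, 0.05

# In exact arithmetic this ratio would be 8, but rounding
# errors make it very slightly larger
ratio = (stop - start)/step
print(ratio > 8)      # True

# Rounding up therefore gives 9 elements rather than 8
print(ceil(ratio))    # 9

r = arange(start, stop, step)
print(len(r))         # 9
```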

That means that in some cases, the length of the array could change depending on tiny rounding errors. One way around this is to make sure we specify a final value that is not close to a valid value. For example:

r = arange(10.0, 10.39, 0.05)
print(r)

The previous endpoint of 10.4 was unreliable because 10.4 is a value in the series. But if we use 10.39 instead, that is safe. The value 10.4 will always test as greater than 10.39, even with rounding errors, because rounding errors are very small.
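Another approach that is sometimes used, and which is not part of the method described above, is to build the series from an integer arange and then scale it. The number of elements is then set by an integer count, so it cannot be changed by rounding. This is a minimal sketch that assumes we want the 8 values from 10.0 to 10.35.

```python
from numpy import arange

# arange(8) always has exactly 8 elements: 0, 1, ... 7
# Scaling and shifting changes the values but not the length
r = 10.0 + arange(8)*0.05
print(len(r))    # 8
print(r)
```

The individual values may still carry tiny floating-point errors, but the length of the array is fixed.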

Often, a better solution is to use linspace, which we will look at next.

linspace

linspace creates a series of equally spaced numbers, in a similar way to arange. The difference is that linspace specifies the start and end points, plus the required number of values, rather than a step size:

from numpy import linspace

k = linspace(0, 10, 5)
print(k)

This prints:

[ 0.   2.5  5.   7.5 10. ]

That is, 5 equally spaced values between 0 and 10, inclusive. Unlike arange, the start and end values will be exactly correct (exactly 0 and 10) because they are specified rather than being calculated. You will also get exactly the required number of elements in the array.
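As an illustration, the series from the rounding errors section can be created with linspace by asking for 8 values from 10.0 to 10.35. This is a short sketch of that idea. It checks the length and the two end values.

```python
from numpy import linspace

# 8 values, from 10.0 to 10.35 inclusive
k = linspace(10.0, 10.35, 8)
print(len(k))            # 8
print(k[0] == 10.0)      # True
print(k[-1] == 10.35)    # True
```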

endpoint parameter for linspace

endpoint can be set to False to alter the behaviour of linspace (to make it a bit more like arange):

k = linspace(0, 10, 4, endpoint=False)
print(k)

In this case, linspace creates 5 equally spaced values, just like the previous example. However, it doesn't return the final value because endpoint is False. So only the first 4 values are returned.

[0.  2.5 5.  7.5]

Notice that the third parameter has a value of 4, rather than the value 5 which was used before. This can be a little confusing. What this parameter is requesting is that the function should return 4 values, but those values should not include the endpoint.
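One way to check this relationship is to compare the endpoint=False result with the 5-value series with its last element removed. The sketch below uses allclose for the comparison so that it tolerates any tiny floating-point differences.

```python
from numpy import linspace, allclose

with_end = linspace(0, 10, 5)                      # 5 values, including 10
without_end = linspace(0, 10, 4, endpoint=False)   # 4 values, excluding 10

# Dropping the last element of the first array gives the second array
print(allclose(without_end, with_end[:-1]))    # True
```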

retstep parameter for linspace

retstep can be set to True to obtain the step size used by linspace. The sample array and the step are returned as a tuple:

k, step = linspace(0, 10, 5, retstep=True)
print(k)
print(step)

This prints:

[ 0.   2.5  5.   7.5 10. ]  # samples
2.5                         # step size
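The returned step should match the gap between neighbouring samples. The following sketch checks this using diff, which returns the differences between consecutive elements. It also shows that the step changes when endpoint is False, because the same range is then divided into 5 intervals rather than 4.

```python
from numpy import linspace, diff, allclose

k, step = linspace(0, 10, 5, retstep=True)
gaps = diff(k)                 # differences between consecutive samples
print(gaps)                    # [2.5 2.5 2.5 2.5]
print(allclose(gaps, step))    # True

# With endpoint=False, the 5 samples are 0, 2, 4, 6, 8
k2, step2 = linspace(0, 10, 5, endpoint=False, retstep=True)
print(step2)                   # 2.0
```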

Using vectorisation

If you need a non-standard data series, it will usually be most efficient to use vectorisation if possible.

For example, to create a series containing the cubes of each number: 0, 1, 8, 27, 64... you could do this:

cubes = arange(10)**3
print(cubes)

This will normally be a lot quicker than using a Python loop. In this case, the result is:

[  0   1   8  27  64 125 216 343 512 729]
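The same idea extends to any series that can be written as a formula on the index. As a further sketch, this creates the triangular numbers using the formula n(n+1)/2. The choice of series is just an example.

```python
from numpy import arange

n = arange(6)                 # index values 0 to 5
triangular = n*(n + 1)//2     # formula applied to the whole array at once
print(triangular)             # [ 0  1  3  6 10 15]
```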