Martin McBride, 2021-09-21
Tags: arrays, efficiency
When we talk about the efficiency of NumPy arrays, what exactly do we mean? There are two separate issues:
- Efficiency of memory usage.
- Processing efficiency.
NumPy is written in C, which is quite a low-level language. Values are stored directly in memory. If you want to store an 8-bit integer using C, it takes exactly one byte.
In Python, numbers are stored as objects. Every Python object has to store information about its type, and also a reference count (to tell Python when the object is no longer being used, so it can be automatically deleted), in addition to the actual value of the object. The exact amount of memory used can depend on the type of computer you have (whether it is a 32-bit or 64-bit system) and which Python implementation you are using. On my 64 bit computer, it takes 28 bytes to store an 8-bit integer.
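You can check this overhead yourself with `sys.getsizeof` (the exact figure varies with your Python version and platform):

```python
import sys

# A small Python int carries type information and a reference count
# on top of its actual value, so it is far larger than one byte.
# On a typical 64-bit CPython build this reports 28 bytes.
print(sys.getsizeof(7))
```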
Now consider an array of 1,000 8-bit numbers. In C, the elements of an array are stored one after the other in a single block of memory. So an array of 1000 8-bit numbers takes exactly 1,000 bytes.
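Since NumPy stores its data the same way C does, we can see this directly: the `nbytes` attribute of an array reports the size of its data buffer (excluding a small fixed overhead for the array object itself):

```python
import numpy as np

# 1,000 8-bit integers stored in one contiguous block of memory.
a = np.zeros(1000, dtype=np.uint8)
print(a.nbytes)  # 1000 -- exactly one byte per element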
In Python, a list is an object, and each of its elements (the numbers) is another separate object. The list contains an array of references, which point to the element objects. On a 64-bit computer, each reference is 8 bytes long. With various other information the list has to store, the list object itself is a little over 8,000 bytes.
But don't forget, that is just the list! Each element of the list (each number) requires 28 bytes. So 1,000 elements take over 36,000 bytes of storage.
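We can verify these figures with `sys.getsizeof`. Note that `getsizeof` on a list measures only the list object and its array of references, not the element objects, so the element cost has to be added separately:

```python
import sys

numbers = [0] * 1000

# The list object itself: a small header plus 8 bytes per reference,
# so a little over 8,000 bytes on a 64-bit system.
print(sys.getsizeof(numbers))

# Each element is a full int object on top of that, so the total
# comes to roughly 36,000 bytes for 1,000 numbers.
per_element = sys.getsizeof(numbers[0])
print(sys.getsizeof(numbers) + 1000 * per_element)
```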
If you have a 2-dimensional array of 1000 by 1000 elements, again C stores these elements one after the other in a single block of memory. It stores the 1000 bytes of the first row, followed immediately by the 1000 bytes of the second row, etc. So the whole array takes exactly 1,000,000 bytes (1,000 x 1,000).
In Python, a 2D list is implemented as a list of lists. Each row is a separate list of 1000 elements (taking about 36,000 bytes). There are 1,000 of these lists, one for each row. There is also the main list, the list of all the rows - but with so many lists already, one extra doesn't make much difference to the total amount of space used. The total space used would be a little over 36,000,000 bytes.
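The same check for the 2-dimensional case - the whole NumPy array is one contiguous buffer of a million bytes:

```python
import numpy as np

# A 1000 x 1000 array of 8-bit values: one byte per element,
# stored row after row in a single block of memory.
grid = np.zeros((1000, 1000), dtype=np.uint8)
print(grid.nbytes)  # 1000000
```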
Example of memory usage
A high-quality digital image of about 10 million pixels would require about 30 MBytes to store in a NumPy array but would take about a gigabyte to store as a standard Python list.
If you were writing an image editor, you might need to store several versions of the image in memory (for example, if you wanted to be able to undo the most recent changes), which would take several gigabytes. If you wanted to edit several images at the same time, you would need tens of gigabytes, which starts to get unmanageable. Using NumPy arrays requires a fraction of the memory.
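As a rough sketch of the image figures above (assuming an RGB image of 2,500 by 4,000 pixels with one byte per colour channel - the exact dimensions are illustrative):

```python
import numpy as np

# 10 million pixels, 3 colour channels, one byte each.
image = np.zeros((2500, 4000, 3), dtype=np.uint8)
print(image.nbytes)  # 30,000,000 bytes, about 30 MB

# The same values as nested Python lists would need very roughly
# 36 bytes per value (reference plus int object), around a gigabyte.
print(image.size * 36)
```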
Suppose we wanted to take an existing NumPy array `a`, and use it to create a new NumPy array `b`, where each element of `b` is one greater than the corresponding element of `a`. That is, if `a` contains:

```
[1, 3, 8, 0]
```

we would create `b` with elements:

```
[2, 4, 9, 1]
```

The code to do this is (assuming `a` contains a NumPy array):

```python
b = a + 1
```
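Putting that together as a runnable snippet:

```python
import numpy as np

a = np.array([1, 3, 8, 0])
b = a + 1   # NumPy adds 1 to every element in a single vectorised operation
print(b)    # [2 4 9 1]
```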
The whole process of looping over the values in `a`, adding 1 to each value, and storing the result in `b`, is executed in highly efficient code in the NumPy library. Here is what happens:
The arrays `a` and `b` are each stored contiguously, in separate areas of memory. A pointer `p1` points to the current element in `a`, and a pointer `p2` points to the current element in `b`. Here is what the main loop does (assuming `n` is the length of `a`), once for each element:

- Read the value from memory location `p1`.
- Add 1 to the value.
- Store the new value in memory location `p2`.
Each of these operations is very simple and might be performed by a single CPU instruction. The whole loop might take just a handful of instructions, and run very quickly indeed.
Now let's do the same thing with normal Python lists. The code would look like this:

```python
b = []
for x in a:
    b.append(x + 1)
```
or if you prefer to use list comprehensions:
```python
b = [x + 1 for x in a]
```
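Either version produces the same result as the NumPy expression, just element by element:

```python
a = [1, 3, 8, 0]

# Explicit loop version.
b_loop = []
for x in a:
    b_loop.append(x + 1)

# List comprehension version.
b_comp = [x + 1 for x in a]

print(b_loop)  # [2, 4, 9, 1]
print(b_comp)  # [2, 4, 9, 1]
```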
Here is what Python needs to do to execute this loop:

- Create an iterator to loop over `a`.
- Get the next value of `a`, and store it in `x`.
- Calculate `x + 1` and create a new number object to hold the result.
- Append the new value to list `b`.
This code isn't horribly inefficient; it will still run quite quickly, and Python itself may optimise some of these steps. But it will be a lot slower than the highly optimised code in the NumPy library, for several reasons:
- It uses an iterator to get the values from `a`, rather than a simple pointer.
- Values in the list are Python int objects, rather than simple byte values.
- Numbers in Python are immutable, so to add 1 to `x` a new number object must be created.
- The result has to be appended to list `b`, rather than just written to a memory location.
This might take a few dozen instructions, rather than the handful that NumPy takes, and it might run 10 times slower (of course, a lot depends on exactly what you are doing, what type of processor you are running on, and how much optimisation Python can perform).
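You can get a feel for the difference with `timeit`. This is a sketch, not a rigorous benchmark - the exact ratio depends on your machine, your Python version, and the size of the array:

```python
import timeit

import numpy as np

n = 100_000
a_list = list(range(n))
a_arr = np.arange(n)

# Time 20 runs of each approach.
t_list = timeit.timeit(lambda: [x + 1 for x in a_list], number=20)
t_arr = timeit.timeit(lambda: a_arr + 1, number=20)

print(f"list comprehension: {t_list:.4f} s")
print(f"numpy a + 1:        {t_arr:.4f} s")
print(f"speedup: {t_list / t_arr:.1f}x")
```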