Arrays (data structures)

By Martin McBride, 2020-03-14

Tags: array bytearray bytes array numpy mutability abstract data type
Categories: data structures

Arrays are an abstract data type. Python doesn't make extensive use of arrays, lists are usually used instead. But there are a few special cases when arrays are used, which we will discuss at the end of the article.

What is an array?

An array is an abstract data type that holds a collection of values that can be referenced by an index.

The index is typically an integer that starts at 0, so for example in an array of 10 values, the values would be numbered 0 to 9. This is very similar to a Python list.

The main differences between an array an a list are:

An array usually has a fixed length, set when the array is created, and it isn't possible to grow or shrink the array afterwards.
An array usually stores primitive data types, with all elements in the array being the same type.

Primitive data types

A primitive data type is something like an ASCII character, a byte value, a 32-bit integer, a floating point value, etc, that is stored as pure binary byte data. Many languages, such as Java and C++, support primitive types as we all objects.

In Python all values are objects rather than primitive types (even integer values are object in Python). Python has very little built-in support for primitive types.

How arrays are stored

Here is an array with 5 integer elements:

[3, 1, 5, 0, 9]

Let's assume each integer is a 32 bit int (ie it occupies 4 bytes of memory). The integers would be stored contiguously (ie next to each other, with no gaps) in memory, like this:

The complete array occupies exactly 20 bytes of memory (4 bytes for each of the 5 integers). Provided we know where the array is located in memory (labelled Start in the diagram) and the Width of each element, we can locate any element easily. For example:

The first element, value 3, has index 0 and is located at Start.
The second element, value 1, has index 1 and is located at Start + Width.
The third element, value 5, has index 2 and is located at Start + 2*Width.

In general, the element with index n is located at Start + n*Width.

Advantages of arrays

Arrays are very simple and efficient data structures. They simply consist of a set of primitive data types packed into memory.

This means that data can be stored very efficiently because only the raw data is stored without the overhead associated with storing higher level objects. It also means that low level code (for example C library code) can access the data very efficiently. Typically to loop through an array, C code will use a memory pointer that increments each time through the loop, which is very fast. This makes it ideal for efficiently storing and accessing large blocks of data such as sound, images and video.

Also, many low level libraries use arrays to store data internally, so Python arrays can be passed in and out of the library.

Disadvantages of arrays

Arrays are homogeneous, ie you can't mix different types of data (for example you cannot have some elements that are 32 bit integers and some that are byte values). This isn't a problem for image data, for example, where all the data is stored as byte values. For non-homogeneous data, it is usually best to use a list.

Arrays have a fixed size, so once an array has been created it cannot be increased or decreased in size. (A dynamic array can change size by automatically creating a new array of a different size and copying the data to it. But that is basically how a list works.)

Arrays generally store primitive types. It is possible, in principle, to store Python objects in an array, but if you are using Python objects then you will lose many of the efficiency benefits of using an array, so it is easier just to use a list.

Python array implementations

Python provides several options for using arrays

bytearray and bytes types.
The built in array module.
The numpy library.

These are not described in great detail here, and their usage is quite specialised. Numpy is covered in detail elsewhere on this site, see the links below.

bytearray and bytes

A bytearray is an array of byte values (integers in the range 0 to 255). It can be initialised from string or byte data in various ways, and can also be used to exchange byte data with external libraries. bytedata has many similar methods to Python strings.

A bytearray is actually a dynamic array that can grow or shrink automatically. It is similar to a list in that respect, but the list elements are bytes, stored in an array of bytes. The underlying array can be exchanged with C routines that expect byte array data.

bytes is an immutable version of bytearray.

array module

The Python built-in array module can store and manipulate arrays of integer, float or character data. As with bytearray it is a dynamic array, but offers the flexibility of different data types and sizes. It can be thought of as a list, but where every element is the same primitive type, and the underlying data is stored in a C compatible array.

numpy

numpy is an external module that offers very comprehensive support for arrays. It supports multidimensional arrays, with many data types and formats. It allows vectorisation, which means that element wise operations on arrays are performed in C code for efficiency, slicing in multiple dimensions, and many other features.

numpy is supported by many other libraries, see our various numpy articles for more details.