Arrays (data structures)

By Martin McBride, 2020-03-14
Tags: array bytearray bytes array numpy mutability abstract data type
Categories: data structures


Arrays are an abstract data type. Python doesn't make extensive use of arrays, lists are usually used instead. But there are a few special cases when arrays are used, which we will discuss at the end of the article.

What is an array?

An array is an abstract data type that holds a collection of values that can be referenced by an index.

The index is typically an integer that starts at 0, so for example in an array of 10 values, the values would be numbered 0 to 9. This is very similar to a Python list.

The main differences between an array an a list are:

  • An array usually has a fixed length, set when the array is created, and it isn't possible to grow or shrink the array afterwards.
  • An array usually stores primitive data types, with all elements in the array being the same type.

Primitive data types

A primitive data type is something like an ASCII character, a byte value, a 32-bit integer, a floating point value, etc, that is stored as pure binary byte data. Many languages, such as Java and C++, support primitive types as we all objects.

In Python all values are objects rather than primitive types (even integer values are object in Python). Python has very little built-in support for primitive types.

How arrays are stored

Here is an array with 5 integer elements:

[3, 1, 5, 0, 9]

Let's assume each integer is a 32 bit int (ie it occupies 4 bytes of memory). The integers would be stored contiguously (ie next to each other, with no gaps) in memory, like this:

The complete array occupies exactly 20 bytes of memory (4 bytes for each of the 5 integers). Provided we know where the array is located in memory (labelled Start in the diagram) and the Width of each element, we can locate any element easily. For example:

  • The first element, value 3, has index 0 and is located at Start.
  • The second element, value 1, has index 1 and is located at Start + Width.
  • The third element, value 5, has index 2 and is located at Start + 2*Width.

In general, the element with index n is located at Start + n*Width.

Advantages of arrays

Arrays are very simple and efficient data structures. They simply consist of a set of primitive data types packed into memory.

This means that data can be stored very efficiently because only the raw data is stored without the overhead associated with storing higher level objects. It also means that low level code (for example C library code) can access the data very efficiently. Typically to loop through an array, C code will use a memory pointer that increments each time through the loop, which is very fast. This makes it ideal for efficiently storing and accessing large blocks of data such as sound, images and video.

Also, many low level libraries use arrays to store data internally, so Python arrays can be passed in and out of the library.

Disadvantages of arrays

Arrays are homogeneous, ie you can't mix different types of data (for example you cannot have some elements that are 32 bit integers and some that are byte values). This isn't a problem for image data, for example, where all the data is stored as byte values. For non-homogeneous data, it is usually best to use a list.

Arrays have a fixed size, so once an array has been created it cannot be increased or decreased in size. (A dynamic array can change size by automatically creating a new array of a different size and copying the data to it. But that is basically how a list works.)

Arrays generally store primitive types. It is possible, in principle, to store Python objects in an array, but if you are using Python objects then you will lose many of the efficiency benefits of using an array, so it is easier just to use a list.

Python array implementations

Python provides several options for using arrays

  • bytearray and bytes types.
  • The built in array module.
  • The numpy library.

These are not described in great detail here, and their usage is quite specialised. Numpy is covered in detail elsewhere on this site, see the links below.

bytearray and bytes

A bytearray is an array of byte values (integers in the range 0 to 255). It can be initialised from string or byte data in various ways, and can also be used to exchange byte data with external libraries. bytedata has many similar methods to Python strings.

A bytearray is actually a dynamic array that can grow or shrink automatically. It is similar to a list in that respect, but the list elements are bytes, stored in an array of bytes. The underlying array can be exchanged with C routines that expect byte array data.

bytes is an immutable version of bytearray.

array module

The Python built-in array module can store and manipulate arrays of integer, float or character data. As with bytearray it is a dynamic array, but offers the flexibility of different data types and sizes. It can be thought of as a list, but where every element is the same primitive type, and the underlying data is stored in a C compatible array.

numpy

numpy is an external module that offers very comprehensive support for arrays. It supports multidimensional arrays, with many data types and formats. It allows vectorisation, which means that element wise operations on arrays are performed in C code for efficiency, slicing in multiple dimensions, and many other features.

numpy is supported by many other libraries, see our various numpy articles for more details.

See also

If you found this article useful, you might be interested in the book NumPy Recipes or other books by the same author.

Join the PythonInformer Newsletter

Sign up using this form to receive an email when new content is added:

Popular tags

2d arrays abstract data type alignment and angle animation arc array arrays bar chart bar style behavioural pattern bezier curve built-in function callable object chain circle classes clipping close closure cmyk colour combinations comparison operator comprehension context context manager conversion count creational pattern data science data types decorator design pattern device space dictionary drawing duck typing efficiency ellipse else encryption enumerate fill filter font font style for loop formula function function composition function plot functools game development generativepy tutorial generator geometry gif global variable gradient greyscale higher order function hsl html image image processing imagesurface immutable object in operator index inner function input installing iter iterable iterator itertools join l system lambda function latex len lerp line line plot line style linear gradient linspace list list comprehension logical operator lru_cache magic method mandelbrot mandelbrot set map marker style matplotlib monad mutability named parameter numeric python numpy object open operator optimisation optional parameter or pandas partial application path pattern permutations pie chart pil pillow polygon pong positional parameter print product programming paradigms programming techniques pure function python standard library radial gradient range recipes rectangle recursion reduce regular polygon repeat rgb rotation roundrect scaling scatter plot scipy sector segment sequence setup shape singleton slice slicing sound spirograph sprite square str stream string stroke structural pattern subpath symmetric encryption template tex text text metrics tinkerbell fractal transform translation transparency triangle truthy value tuple turtle unpacking user space vectorisation webserver website while loop zip zip_longest