Martin McBride, 2019-09-14

Tags data types, efficiency

Categories numpy

In section Python libraries

numpy supports five main data types - ints, unsigned ints, floats, complex numbers, and booleans.

Integers in Python can represent positive or negative numbers of any size. That is because Python integers are objects, and the implementation automatically grabs more memory if necessary to store very large values.

Integers in numpy are very different. An integer occupies a fixed number of bytes. For example, the type `np.int32` occupies exactly 4 bytes of memory (a byte contains 8 bits, so 4 bytes is 32 bits, hence `int32`). These are called *primitive types* because they aren't objects; they are just data bytes stored directly in memory.

The reasons for using primitive types are explained in detail in the article on numpy efficiency. In summary:

- An array of primitive types takes a lot less memory than a list of Python integer objects.
- Accessing primitive values is faster.
- Primitive types don't require garbage collection.
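As a rough illustration of the memory point, the sketch below compares the storage used by a numpy array of `np.int32` values with the size of individual Python integer objects. It uses `sys.getsizeof`, which is not mentioned in this article, and the object sizes it reports depend on the Python implementation, so no exact figures are claimed for them.

```python
import sys
import numpy as np

# One thousand 32-bit integers stored as raw bytes.
a = np.arange(1000, dtype=np.int32)

bytes_per_item = a.itemsize   # size of one element in bytes
total_bytes = a.nbytes        # size of the whole data buffer

# A Python int is an object, so it carries extra overhead,
# and it grows as the value gets bigger.
small_int_size = sys.getsizeof(1)
huge_int_size = sys.getsizeof(10 ** 100)

print(bytes_per_item, total_bytes)
print(small_int_size, huge_int_size)
```

A list of Python integers also has to store a reference to each object, so the real difference is larger than the per-object sizes alone suggest.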

In fact, numpy provides several different integer sizes:

Type | Bytes | Range |
---|---|---|
np.int8 | 1 | -128 to 127 |
np.int16 | 2 | -32768 to 32767 |
np.int32 | 4 | -2147483648 to 2147483647 |
np.int64 | 8 | -9223372036854775808 to 9223372036854775807 |
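The ranges in the table can be checked from Python. This is a minimal sketch using `np.iinfo`, a numpy helper that is not covered in this article but reports the limits of any integer type.

```python
import numpy as np

# np.iinfo reports the size and limits of an integer type.
for t in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(t)
    print(t.__name__, info.bits // 8, info.min, info.max)

int8_range = (np.iinfo(np.int8).min, np.iinfo(np.int8).max)
```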

There are a couple of reasons for this. The first is fairly obvious: if you are using data that has a limited range, there is no point using more memory than you need. For example, sound data is often stored using 16 bits per sample (i.e. the sound is represented by an array of 16-bit values). Storing this data as 64-bit integers would make no sense, because you would be using 4 times as much memory for no reason.
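The sound example can be sketched as follows. The sample rate of 44,100 samples per second and the use of silent (all zero) data are assumptions chosen purely for illustration.

```python
import numpy as np

# One second of silent mono audio. The sample rate is an
# assumption made for this example only.
sample_rate = 44100
samples16 = np.zeros(sample_rate, dtype=np.int16)

# The same values stored as 64-bit integers.
samples64 = samples16.astype(np.int64)

print(samples16.nbytes)   # 2 bytes per sample
print(samples64.nbytes)   # 8 bytes per sample
ratio = samples64.nbytes // samples16.nbytes
```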

The second reason is slightly less obvious. Some applications use a mix of Python and C code for efficiency. With numpy, it is possible to pass a pointer to the array data into a C function, so that the C code can access the data in memory without the need to make a copy of it. This can improve efficiency when dealing with very large arrays. For this to work, the data needs to be stored in the format the C code is expecting. So if the C code is expecting an array of 16-bit integers, it is useful to be able to specify that in numpy. We won't be covering that in these tutorials, as it is quite specialised.

Unsigned integers are similar to normal integers, but they can only hold non-negative values (zero or greater). Here are the available types:

Type | Bytes | Range |
---|---|---|
np.uint8 | 1 | 0 to 255 |
np.uint16 | 2 | 0 to 65535 |
np.uint32 | 4 | 0 to 4294967295 |
np.uint64 | 8 | 0 to 18446744073709551615 |

Unsigned integers are useful for data that can never be negative, for example population data. The population of a town can never be less than zero.

The advantage of unsigned data is that it can represent larger positive numbers than signed data of the same size. An `int8` goes up to 127, but a `uint8` goes up to 255.
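The difference in range shows up as soon as a result exceeds 127. The sketch below adds 100 to 100 in both types. The wrap-around it demonstrates for `int8` is how numpy array arithmetic behaves when a fixed-size integer overflows; that behaviour is not discussed in this article, so treat it as a side note.

```python
import numpy as np

a = np.array([100], dtype=np.uint8)
b = np.array([100], dtype=np.int8)

# Same operand values, different types.
unsigned_sum = a + a   # 200 fits in a uint8
signed_sum = b + b     # 200 does not fit in an int8, so the value wraps

print(unsigned_sum, unsigned_sum.dtype)
print(signed_sum, signed_sum.dtype)
```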

numpy floating point numbers also have different sizes (usually called precisions). There are two types:

Type | Bytes | Range | Precision |
---|---|---|---|
np.float32 | 4 | ±1.18×10^{−38} to ±3.4×10^{38} | 7 to 8 decimal digits |
np.float64 | 8 | ±2.23×10^{−308} to ±1.80×10^{308} | 15 to 16 decimal digits |

`float64` numbers store floating point numbers in the same way as a Python `float` value. They are sometimes called *double precision*.

`float32` numbers take half as much storage as `float64`, but they have considerably smaller range and precision. They are sometimes called *single precision*.
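The difference between the two precisions can be seen in a short experiment. This is a sketch that uses `np.finfo`, which is not covered in this article, to report the limits of each type, and then adds a very small number to 1.0 in both precisions.

```python
import numpy as np

# np.finfo reports the limits of a floating point type.
f32 = np.finfo(np.float32)
f64 = np.finfo(np.float64)
print(f32.tiny, f32.max, f32.eps)
print(f64.tiny, f64.max, f64.eps)

# Adding 1e-8 to 1.0 is lost in single precision but kept in double.
lost_in_32 = bool(np.float32(1.0) + np.float32(1e-8) == np.float32(1.0))
kept_in_64 = bool(np.float64(1.0) + np.float64(1e-8) != np.float64(1.0))

# A float64 value is also a Python float.
is_python_float = isinstance(np.float64(1.5), float)
print(lost_in_32, kept_in_64, is_python_float)
```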

A complex number consists of two floating point numbers, one representing the real part and one representing the imaginary part. If you have not met complex numbers before, the Wikipedia article on complex numbers is a good place to start.

Type | Bytes | Precision |
---|---|---|
np.complex64 | 8 | Two 32-bit floats |
np.complex128 | 16 | Two 64-bit floats |

`complex128` is equivalent to the Python `complex` type.
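The "two floats" structure can be inspected directly. This is a small sketch, using the `real` and `imag` attributes of a numpy array, which are not otherwise covered here.

```python
import numpy as np

z = np.array([1 + 2j, 3 - 4j], dtype=np.complex64)

print(z.itemsize)             # bytes per complex value
print(z.real, z.real.dtype)   # real parts, stored as 32-bit floats
print(z.imag, z.imag.dtype)   # imaginary parts, stored as 32-bit floats

# A complex128 value is also a Python complex.
is_python_complex = isinstance(np.complex128(1 + 2j), complex)
```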

numpy supports boolean values with the type `np.bool_` (also available under the name `np.bool` in NumPy 2.0 and later). A `bool_` is one byte in size, with 0 representing false, and any non-zero value representing true.
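A short sketch of boolean arrays follows. It uses the name `np.bool_` for the type, on the assumption that this spelling is the one available across numpy versions.

```python
import numpy as np

flags = np.array([True, False, True], dtype=np.bool_)
print(flags.itemsize)   # bytes per boolean

# Converting integers to booleans: zero is False, anything else is True.
values = np.array([0, 1, 5, -3])
as_bool = values.astype(np.bool_)
print(as_bool)
```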

All of the functions available for creating numpy arrays have an optional parameter `dtype` that allows you to specify the data type (such as `np.uint8` or `np.float64`). For example:

    import numpy as np
    a = np.zeros((2, 3), dtype=np.int32)

This creates an array of zeros with 2 rows and 3 columns and data type `int32`:

    [[0 0 0]
     [0 0 0]]
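The same parameter works with other creation functions. The sketch below uses `np.ones`, `np.arange` and `np.array`, which are not named in this article, and also shows the default that `np.ones` picks when `dtype` is omitted.

```python
import numpy as np

a = np.ones((2, 2), dtype=np.uint8)
b = np.arange(5, dtype=np.float32)
c = np.array([1, 2, 3], dtype=np.complex128)

# Each array reports the type it was created with.
print(a.dtype, b.dtype, c.dtype)

# Without dtype, numpy picks a default (float64 for np.ones).
d = np.ones((2, 2))
print(d.dtype)
```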

numpy also provides a number of types that don't specify a particular size. These include `np.byte`, `np.short`, `np.intc`, `np.int_`, and `np.longlong`, amongst others. There are also unsigned versions such as `np.ubyte`, `np.ushort` etc.

These types have *system dependent* sizes. For example `np.int_` might be equivalent to `np.int32` or `np.int64` depending on the system it is running on. It depends on the type of processor, the type of operating system, and perhaps the version of the operating system.
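To see what these types map to on a particular machine, their sizes can be printed. This is a sketch only: it uses the names `np.intc`, `np.int_` and `np.longlong`, which are assumed here because they are available in current numpy releases, and the output will differ from system to system.

```python
import numpy as np

# Sizes in bytes of some C-compatible types on the current system.
for t in (np.byte, np.short, np.intc, np.int_, np.longlong):
    print(np.dtype(t).name, np.dtype(t).itemsize)

# A fixed-size type reports the same size everywhere.
int32_size = np.dtype(np.int32).itemsize
int_size = np.dtype(np.int_).itemsize
```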

In general, don't use these types. They are provided for situations where numpy is passing data in memory to a library written in C. For historical reasons, C has always had system dependent types like `int` and `short` whose exact size can vary between systems. If you were interfacing to such a library you would need to use compatible types. Unless you are using a library that specifically tells you to use these types, don't use them. Stick to the fixed-size types shown above instead.

Some functions (such as `zeros` used above) allow you to select an `order` for the data. The choices are C-style or Fortran-style ordering (sometimes a couple of other variants too). Again, these options are intended for use if you are passing data in memory to a library written in C (or even Fortran). Unless you have good reason to change it, just use the default option.
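As a rough sketch of what `order` changes, the example below creates the same array in both layouts and compares their `strides` (the number of bytes to step along each axis). `strides` is not covered in this article and is used here only to make the memory layout visible.

```python
import numpy as np

c_order = np.zeros((2, 3), dtype=np.int32, order='C')   # the default
f_order = np.zeros((2, 3), dtype=np.int32, order='F')

# Both arrays hold the same values and have the same shape...
same_values = np.array_equal(c_order, f_order)

# ...but the data is laid out differently in memory.
print(c_order.strides)   # each row is stored contiguously
print(f_order.strides)   # each column is stored contiguously
```

From Python the two arrays behave the same way; the difference only matters to code that reads the underlying memory directly.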


Copyright (c) Axlesoft Ltd 2020