Matplotlib data and code

By Martin McBride, 2022-06-17
Tags: data code numeric python matplotlib
Categories: matplotlib numpy


In this article, we will discuss the data we will be using in this series, and the method we will use to read the data into a Python list.

All the data and code are available on GitHub.

Data sets

Matplotlib can be used with many types of data, but for these articles, we will be using UK temperature and rainfall data for the years 2009 and 2010. The data sets are derived from public sector information licensed under the Open Government Licence v3.0. The data has been organised to make it easy to use in the examples.

The data is stored in comma-separated value (CSV) format. This is a text format containing lines of numerical values separated by commas.

The years 2009 and 2010 are not leap years. This is a deliberate choice to simplify the data handling, so we can concentrate on the graph plotting code.

Temperature data

The temperature data is based on the maximum temperature each day, in degrees Celsius. The following files are used:

File                          Type
2009-temp-daily.csv           Daily temperatures (1)
2009-temp-monthly.csv         Monthly average temperatures (2)
2009-temp-monthly-list.csv    Daily temperatures, one line per month (3)
2010-temp-daily.csv           Daily temperatures (1)
2010-temp-monthly.csv         Monthly average temperatures (2)

Type (1) files contain 365 entries, indicating the maximum temperature of each day of the year like this:

1.2
3.8
2.4
etc...

Type (2) files contain 12 entries, indicating the average of the maximum daily temperature of each month of the year, like this:

5.980645161290322
6.732142857142856
11.022580645161291
etc...
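
To make the relationship between type (1) and type (2) data concrete, here is a minimal sketch of how monthly averages could be computed from a list of 365 daily values. This is not necessarily how the files were produced; the function name monthly_means and the synthetic test data are assumptions made for illustration.

```python
import calendar

def monthly_means(daily_values, year):
    """Average a year's worth of daily values, month by month."""
    means = []
    start = 0
    for month in range(1, 13):
        # monthrange returns (weekday of first day, number of days in month)
        days = calendar.monthrange(year, month)[1]
        month_values = daily_values[start:start + days]
        means.append(sum(month_values) / len(month_values))
        start += days
    return means

# Synthetic data (not real temperatures): every day in month m has the value m
example_daily = []
for month in range(1, 13):
    example_daily += [float(month)] * calendar.monthrange(2009, month)[1]

example_means = monthly_means(example_daily, 2009)
print(len(example_daily))   # 365
print(example_means[:3])    # [1.0, 2.0, 3.0]
```

Using calendar.monthrange avoids hard-coding the number of days in each month.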

Type (3) files contain 365 entries, indicating the maximum temperature of each day of the year, similar to type (1). But all the data for a given month is contained on a single line. So there are 12 lines, each with a month's worth of daily data, like this:

1.2, 3.8, 2.4, 1.7, ...
2.7, 0.6, 2.6, ...
10.1, 10.3, 9.3, ...
etc...

The first line has 31 entries (for January), the second line has 28 entries (for February), and so on. This data is only provided for 2009 because we only use it for box plots.
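
As a rough sketch, a type (3) file can be read into a list of lists, one inner list per month. The example below uses io.StringIO as a stand-in for a real file, holding only the first three values of the first three lines shown above. The variable names are assumptions, and skipinitialspace=True is used on the assumption that the commas are followed by a space.

```python
import calendar
import csv
import io

# Stand-in for a type (3) file; a real file would be opened with open() instead.
# Only the first three values of the first three lines are included.
sample_file = io.StringIO("1.2, 3.8, 2.4\n2.7, 0.6, 2.6\n10.1, 10.3, 9.3\n")

# skipinitialspace=True ignores the space that follows each comma
csv_reader = csv.reader(sample_file, quoting=csv.QUOTE_NONNUMERIC,
                        skipinitialspace=True)
monthly_lists = [row for row in csv_reader]

print(monthly_lists[0])   # [1.2, 3.8, 2.4]

# Number of entries we would expect on each line of a complete 2009 file
expected_days = [calendar.monthrange(2009, month)[1] for month in range(1, 13)]
print(expected_days[:3])  # [31, 28, 31]
```

Here we keep each row whole rather than taking its first element, because every line holds a full month of values.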

Rainfall data

The rainfall data is based on the total rainfall each day, in millimetres. The following files are used:

File                     Type
2009-rain-daily.csv      Daily rainfall (1)
2009-rain-monthly.csv    Monthly average rainfall (2)
2010-rain-daily.csv      Daily rainfall (1)
2010-rain-monthly.csv    Monthly average rainfall (2)

Type (1) files contain 365 entries, indicating the total rainfall of each day of the year, like this:

1.2
3.8
2.4
etc...

Type (2) files contain 12 entries, indicating the total rainfall of each month of the year, like this:

5.980645161290322
6.732142857142856
11.022580645161291
etc...

Reading the data in Python

Here is the code to read the daily temperature data from a CSV file:

import csv

with open("2009-temp-daily.csv") as csv_file:
    csv_reader = csv.reader(csv_file, quoting=csv.QUOTE_NONNUMERIC)
    temperature = [x[0] for x in csv_reader]

First, we must import the csv module that we will use to parse the CSV file. The csv module is part of the Python standard library, so we don't need to install anything extra to use it.

Next, we open the 2009 daily temperature CSV file. We open it using a with statement. This means that the file will be closed automatically when we have finished with it. We name the opened file object csv_file.

We then create a CSV reader object based on the CSV file. The reader function takes an optional parameter called quoting. We set this parameter to csv.QUOTE_NONNUMERIC. This may be slightly counter-intuitive, but it tells the CSV reader to convert every value to a floating-point number unless it is in quote marks.

Since our data file contains unquoted data, the reader will convert all the values to numbers. This means we will get a list of numbers rather than a list of strings, which is exactly what we want.
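
To see the effect of the quoting parameter, here is a small comparison that uses io.StringIO in place of a real file (the variable names are just for illustration):

```python
import csv
import io

data = "1.2\n3.8\n2.4\n1.7\n"

# Default behaviour: every field comes back as a string
as_strings = list(csv.reader(io.StringIO(data)))
print(as_strings)  # [['1.2'], ['3.8'], ['2.4'], ['1.7']]

# With QUOTE_NONNUMERIC: unquoted fields are converted to float
as_floats = list(csv.reader(io.StringIO(data), quoting=csv.QUOTE_NONNUMERIC))
print(as_floats)   # [[1.2], [3.8], [2.4], [1.7]]
```

Note that with QUOTE_NONNUMERIC, an unquoted value that cannot be converted to a number will raise a ValueError.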

The reader yields a list for each line in the CSV file. In other words, reading the whole file gives us a list of lists. So if our file contained the following data:

1.2
3.8
2.4
1.7

The reader would return a list of lists like this:

[ [1.2], [3.8], [2.4], [1.7] ]

What we need is a normal list like this:

[ 1.2, 3.8, 2.4, 1.7 ]

We create the required list using a list comprehension:

temperature = [x[0] for x in csv_reader]

For each sublist in the original data, the list comprehension reads the first element and adds it to the output list. If you are not familiar with list comprehensions, there is a quick description next.

List comprehensions

The code above will read data from a CSV file into a Python list. The way the code works isn't particularly important, because we are mainly here to learn about plotting graphs with Matplotlib.

But if you haven't used list comprehensions before, and would like to understand how they work, here is a short description.

A list comprehension creates a new list based on the contents of an existing sequence. The original sequence can be a list, tuple, string, range, or any other iterable object that provides a sequence of values.

So for example, suppose we had a list a:

a = [1, 2, 3, 4]

And we wished to create a new list b where each element is double the equivalent element in a:

b = [2, 4, 6, 8]

We could do this with a loop:

b = []
for x in a:
    b.append(x*2)

A list comprehension is just a shorter way to do the same thing:

b = [2*x for x in a]

It takes the form:

[expression for x in sequence]

And creates a new list by evaluating expression for every value of x in sequence.
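
As a quick check, the loop and the comprehension can be run side by side. The sketch also shows a comprehension over a range and over a string; the names b_loop, b_comp, squares and letters are made up for this example.

```python
a = [1, 2, 3, 4]

# Loop version
b_loop = []
for x in a:
    b_loop.append(2*x)

# Comprehension version
b_comp = [2*x for x in a]

print(b_loop == b_comp)  # True

# The source can be any iterable, such as a range or a string
squares = [n*n for n in range(5)]
letters = [c.upper() for c in "abc"]
print(squares)  # [0, 1, 4, 9, 16]
print(letters)  # ['A', 'B', 'C']
```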

In our specific case:

[x[0] for x in csv_reader]

csv_reader is a sequence of lists:

[ [1.2], [3.8], [2.4], [1.7] ]

The values of x will be:

[1.2]
[3.8]
[2.4]
[1.7]

The values of x[0] will be:

1.2
3.8
2.4
1.7

Which will create a final list of:

[ 1.2, 3.8, 2.4, 1.7 ]
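
One detail worth knowing: the CSV reader is an iterator rather than a stored list, so it can only be looped over once. The sketch below uses io.StringIO in place of a real file to show this; the names first_pass and second_pass are just for illustration.

```python
import csv
import io

sample_file = io.StringIO("1.2\n3.8\n2.4\n1.7\n")
csv_reader = csv.reader(sample_file, quoting=csv.QUOTE_NONNUMERIC)

# The first comprehension consumes every row from the reader
first_pass = [x[0] for x in csv_reader]

# The reader is now exhausted, so a second pass finds nothing
second_pass = [x[0] for x in csv_reader]

print(first_pass)   # [1.2, 3.8, 2.4, 1.7]
print(second_pass)  # []
```

This is why it is convenient to store the values in a list straight away, as we do with the temperature variable.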