Using categorical data with Matplotlib


Martin McBride, 2022-06-13
Tags bar chart pie chart scatter plot
Categories matplotlib

Categorical data is data where at least one of the variables is a category rather than a numerical value. For example:

  • The average UK temperature for each month of 2009. Temperature is a numerical value, but the values are grouped and averaged by month: the temperatures measured in January form one category, the temperatures measured in February form another, and so on.
  • The most popular names given to girls born in the UK in 2021 are also categorical data. "Olivia", "Amelia", "Isla", etc. are categories, and the number of girls given each name is a numerical value.

Plot types such as bar charts and pie charts are useful where one variable is categorical and the other is numerical.
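Here is a minimal sketch of both plot types. It assumes Matplotlib 2.1 or later, which accepts a list of strings as categorical x values. All the numbers are placeholders invented for illustration. They are not the real 2009 temperatures or 2021 name counts.

```python
import matplotlib.pyplot as plt

# Placeholder values for illustration only; NOT real UK temperature data.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
avg_temps = [3, 4, 6, 8, 11, 14, 16, 16, 13, 10, 6, 4]

# Placeholder counts for illustration only; NOT real name statistics.
names = ["Olivia", "Amelia", "Isla"]
counts = [30, 25, 20]

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))

# Passing strings as the x values makes Matplotlib treat them as categories,
# drawing one bar per category in the order given.
ax_bar.bar(months, avg_temps)
ax_bar.set_ylabel("Average temperature (°C)")

# A pie chart takes the numerical values, with the categories as labels.
ax_pie.pie(counts, labels=names)

plt.show()
```

In both cases the category names are only used as labels. The position of each bar or wedge comes from the order of the list, not from any numerical meaning of the label.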

Sometimes, both variables can be categorical. For example, if we looked at the single most popular girl's name in each of the last 50 years, the names would be categorical, and so would the years.

Years are numbers, but those numbers are effectively just names for the years. If we labelled the years differently, it would make no difference to the meaning of the data.

Bar charts and pie charts are not very useful for displaying data where both variables are categorical. A scatter plot is better.
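Here is a minimal sketch of a scatter plot with two categorical variables. It uses hypothetical placeholder names ("Name A", "Name B", "Name C") rather than real rankings. Both lists contain strings, so Matplotlib gives each distinct year and each distinct name its own position on its axis, in order of first appearance.

```python
import matplotlib.pyplot as plt

# Hypothetical data: the names are placeholders, not real rankings.
# Years are given as strings so Matplotlib treats them as category labels
# rather than as numbers on a continuous axis.
years = ["2017", "2018", "2019", "2020", "2021"]
top_names = ["Name A", "Name A", "Name B", "Name B", "Name C"]

fig, ax = plt.subplots()
points = ax.scatter(years, top_names)
ax.set_xlabel("Year")
ax.set_ylabel("Most popular name")

plt.show()
```

Passing the years as strings is a deliberate choice. Integers would be plotted on a continuous numerical axis, whereas strings are treated as labels. That matches the point above that the years act as names here.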
