Pure functions

By Martin McBride, 2019-09-11
Tags: pure function lru_cache functools
Categories: functional programming


Functional programming has the concept of pure functions. What are they, and why are they so useful?

Mathematical functions

Functional programming is based on the mathematical idea of functions. A mathematical function such as sin(x) simply returns a value. You give it an x value, it returns the value of the sine of x. If you give it the same x value you will always get the same answer.

Python functions are different to mathematical functions, because a Python function doesn't just calculate values, it can actually do things too. A Python function can set a global variable that might influence the result of a different function when that is called. It might write something to disk, or send some data across the network. These influences are called side effects.

Side effects make programming, and debugging, much more difficult. You don't just have to think about what a particular function is doing, you also have to consider what other functions might have done previously that can affect things. Pure functions attempt to eliminate side effects.

Pure functions

The basic definition of a pure function is a function that doesn't cause or rely on side effects. The output of a pure function should only depend on its inputs.

There are two basic ways a function can cause side effects that directly affect other parts of the code. The first is by reading or writing global variables. For example:

gvalue = 0

def set_value(x):
    global gvalue
    gvalue = x

def print_value():
    print(gvalue)

set_value(3)
print_value()
set_value(5)
print_value()

Here, set_value isn't a pure function, because it sets the value of gvalue, which in turn affects how print_value behaves. print_value isn't a pure function either, because its output depends on the global variable. You cannot possibly predict what print_value is going to print without knowing how set_value was called before.
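
One way to avoid the global variable is to pass the value in as an argument and return a result, leaving the printing to the caller. The following is only a sketch; the name format_value is hypothetical and not part of the example above:

```python
def format_value(x):
    # Depends only on its argument and touches no global state.
    return "value = " + str(x)

# The caller decides what to do with the result.
print(format_value(3))  # value = 3
print(format_value(5))  # value = 5
```

Each call can now be understood on its own, whatever was called before it.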

The other way that a function can create side effects is by altering data structures. For example:

def tail(s):
    del s[0]
    return s

def print_value():
    a = [1, 2, 3]
    b = tail(a)
    print(b)
    print(a)

print_value()

Here the function tail accepts a list as input. It returns a list that contains all the elements except the first (head) element.

The function print_value calls tail, passing in the list a, which holds [1, 2, 3]. The return value b is as we expect, [2, 3], the tail of the list.

However, when we print a after the call to tail, we see that it now contains [2, 3] as well. Calling tail has altered the list we passed into it, which is also a side effect. If print_value was expecting a to remain unchanged, it might not work properly.

A pure function must not alter the value of any data structure that is passed into it. This version of tail is not pure. We could create a pure version like this:

def tail(s):
    return s[1:]

def print_value():
    a = [1, 2, 3]
    b = tail(a)
    print(b)
    print(a)

print_value()

This time, tail returns the slice s[1:] which contains a copy of the tail of s. The original list is not changed.
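
It may be worth noting that a slice is a shallow copy: the outer list is new, but any mutable elements inside it are still shared with the original. The nested lists below are an assumed example, used only to illustrate the point:

```python
def tail(s):
    return s[1:]

a = [[1], [2], [3]]
b = tail(a)

# The outer lists are distinct objects, so removing an item from b leaves a alone.
b.pop()
print(a)  # [[1], [2], [3]]

# The inner lists are shared, so changing one through b is visible through a.
b[0].append(99)
print(a)  # [[1], [2, 99], [3]]
```

For flat lists of numbers or strings, as in the example above, this distinction doesn't matter.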

Other considerations

Functions that read or write data to disk can also cause unwanted side effects. For example, if function_a writes data to a configuration file, and function_b reads that data, then neither function can be considered to be a pure function, because they interact (at least indirectly). Similar things can happen with functions that interact with a database, or exchange data over a network, where one function can indirectly influence another.
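
The configuration file case might look something like the sketch below. The file name config.txt, the temporary directory, and the function bodies are all assumptions made for illustration:

```python
import os
import tempfile

# Hypothetical config file in a temporary directory, for illustration only.
config_path = os.path.join(tempfile.mkdtemp(), "config.txt")

def function_a(value):
    # Side effect: writes to disk.
    with open(config_path, "w") as f:
        f.write(str(value))

def function_b():
    # The result depends on the file contents, not on any argument.
    with open(config_path) as f:
        return int(f.read())

function_a(3)
first = function_b()
function_a(5)
second = function_b()
print(first, second)  # 3 5
```

function_b takes no arguments, yet it returns different values on the two calls, because function_a changed the file in between.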

Of course, writing to a file doesn't automatically mean a function cannot be pure. For example, if a function simply writes to a log file, that might not cause any side effects provided no other parts of the program take actions based on what is written to the file.

A second aspect of pure functions that we mentioned earlier is that the output of a pure function should depend only on its inputs. Put another way, if you call it twice with the same inputs, you should always get the same result.

This is generally true of maths functions like sin or sqrt. There are some cases where it might not be true:

  • Functions in the random module generate random values. Each time you call a random function you will generally get a different result. That is the whole point, of course, but it means the functions in this module are not strictly pure.
  • The input function, which queries the user for an input value on the command line, returns a completely unpredictable result (whatever the user decided to type in), so is not a pure function.
  • Any function that reads data from a file, database or network is also unpredictable and so not pure.
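
A common way to limit the impact of such functions is to keep the unpredictable step small and pass its result into a pure function as an ordinary argument. Here is a minimal sketch, where roll_impure and scale_roll are hypothetical names:

```python
import random

def roll_impure():
    # Not pure: uses the hidden state of the module-level random generator.
    return random.randint(1, 6)

def scale_roll(roll, factor):
    # Pure: the random value arrives as an ordinary argument.
    return roll * factor

roll = roll_impure()          # unpredictable step, done once
print(scale_roll(roll, 10))   # repeatable for any given roll
print(scale_roll(4, 10))      # 40
```

scale_roll can be tested with fixed inputs, even though the program as a whole still uses random values.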

Advantages of pure functions

The main advantage of pure functions is predictability. Pure functions eliminate unexpected interactions that are the cause of so many bugs.

An additional benefit is that pure functions can make it far easier to use multithreading in your program. Imagine that you needed to run the same function on a very large number of data items. With pure code, you know that each time you call the function it will operate completely independently, so it doesn't matter what order you process the data in; you will still get the same result. You can split the data between different threads, or even different computers, and process it in parallel without any danger of things getting out of step.
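
As a rough illustration, the standard library's concurrent.futures module can apply a pure function to a list of items using a pool of threads. The function square and the sample data are assumptions chosen for this sketch:

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # Pure: the result depends only on x.
    return x * x

data = [1, 2, 3, 4, 5]

# Process the items one after another.
sequential = [square(x) for x in data]

# Process the same items using a pool of worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(square, data))

print(sequential == threaded)  # True
```

Because square has no side effects, the threads cannot interfere with each other, and both approaches give the same list. This sketch only shows that the results agree; whether threads actually make a given program faster depends on the workload.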

Finally, if you have pure functions where the output depends only on the input values, you can avoid having to calculate the same value more than once. For example, suppose we needed to calculate the square root of each of these numbers:

[9, 16, 9, 25] 

We could keep a record (a cache) of each input value we have already processed, along with its result. When we hit the second occurrence of 9 in the list, we could avoid calculating the value again, and simply return the previous result, 3. If you are performing complex calculations on a set of data that has a lot of repeated values, this type of caching can be a major optimisation.

The functools module contains a decorator lru_cache that can apply this sort of caching more or less automatically.
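
Applied to the square root example, it might look like the following sketch. The wrapper name cached_sqrt is hypothetical, and maxsize=None (an unbounded cache) is just one possible setting:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def cached_sqrt(x):
    # Only runs when x has not been seen before.
    return math.sqrt(x)

results = [cached_sqrt(x) for x in [9, 16, 9, 25]]
print(results)  # [3.0, 4.0, 3.0, 5.0]

# cache_info reports how often the cache was used.
info = cached_sqrt.cache_info()
print(info.hits, info.misses)  # 1 3
```

The second 9 is served from the cache (one hit), while the three distinct values are each calculated once (three misses). This is only safe because the function is pure; caching a function with side effects, or one whose result can change between calls, would give misleading results.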
