As an example of functional programming, in this article we will use some Python built-in functions to analyse a text file. In the first example we will count the number of words in the file. In the second example we will extract the unique words.
The basic pattern we will be using is map/reduce. In this pattern the map
function is used to pre-process the input data. The reduce
function is
then used to combine the data to give the final result.
As an aside, Python has function equivalents of every operator. These functions live in the operator
module. For example, the operator.add
function is equivalent to the +
operator. Therefore the following two lines of code do the same thing:
import operator c = operator.add(a, b) c = a + b
This is very useful because you can use add
wherever you need a function object that perform the equivalent of +
. There are functions
for every operator, including non-symbolic operators like in
.
The simplest form of map
takes a function object and an iterable. It applies the function to each item in the iterable, and returns
an iterator that accesses the results. For example:
from operator import neg it = map(neg, [1, -3, 5, -7]) print(list(it))
Here, we apply the neg
function (again from the operator
module) to the list [1, -3, 5, -7]
. neg
is a function that does the same job as the unary minus operator. This results in a iterator it
.
Remember that iterators are lazy - they don't perform any calculations until they need to. If we try to print the iterator we would just see the iterator object (a map
object), because the actual work of the map operation hasn't been done yet. When we convert the iterator to to a list, the iterable is processed to obtain its elements:
[-1, 3, -1, 7]
map
can take more than one iterable, provided the function object takes the same number of parameters. For example, mul
(the function equivalent of *
) takes two parameters, so map
accepts two iterables:
from operator import mul it = map(mul, [1, 2, 3, 4], [10, 100, 1000]) print(list(it))
This takes a number from each list, and multiplies them together: 1*10, 2*100, 3*1000. If the iterables have different lengths (as in this example) it stops when the shortest sequence is exhausted. So it prints:
[10, 200, 3000]
The reduce
function takes a sequence of values and reduces it to a single value by applying a function repeatedly, and accumulating the result.
Here is a simple example to illustrate how it works:
from functools import reduce from operator import add x = reduce(add, [1, 2, 3, 4]) print(x)
What reduce does is take the first two elements and combine them using the function (add
in this case). It then takes the result and
combines it with the next element, and so on. It calculates
(((1+2)+3)+4)
In other words, it sums all the values. In fact, there is a built-in function sum
that does the same job - you should use that normally as
it is simpler and more efficient. This is just an illustration.
As a practical example, here is a list of strings (an excerpt from the Zen of Python):
zen = ['beautiful is better than ugly', 'explicit is better than implicit', 'simple is better than complex', 'complex is better than complicated', 'flat is better than nested', 'sparse is better than dense', 'readability counts']
Now imagine we want to count the total number of words in all these strings. We can use map/reduce:
map
to convert the list of strings into a sequence of number-of-words values for each stringreduce
to convert the list of lengths to a single total number-of-wordsHere is how this looks in code
from functools import reduce from operator import add def number_of_words(s): words = s.split(' ') return len(words) counts = map(number_of_words, zen) total = reduce(add, counts)
We could get rid of the number_of_words
function by using a lambda:
lambda x: len(x.split(' '))
You can also combine the map
and reduce
into a single line. This is our code for finding the total number of words in a
list of sentences:
total = reduce(add, map(lambda x: len(x.split(' ')), zen))
If you think this is a bit too terse, you could leave it as separate lines - that is a matter of style. But if you are used to functional Python this code is quite readable.
Now suppose we want to count the number of unique words in the list of sentences. We could do it like this:
map
to convert each sentence into a set of words (a set will only contain one copy of each word, even if it appears more than
once in the sentence).reduce
to combine the list of sets into a single set.This example is a little different because, rather than obtaining a single integer, we are obtaining a set of values. We are using reduce
to take a groups of sets of words, and reduce it down to a single set of unique words. But that is a perfectly valid use of reduce
.
Here is the map
code:
def set_of_words(s): words = s.split(' ') return set(words) sets = map(set_of_words, zen)
Once again, we use split
to convert a string to a list of words. But this time we use set
to convert the list into a set. This creates
a list of sets (one set for each sentence).
Next we want to find the union of all these sets:
(((s1 | s2) | s3) | s4) etc...
Just as we can use the add
function instead of the + operator, we can use the or_
function instead of the | operator (don't forget
to import or_
from operator):
all_words = reduce(or_, sets) total = len(all_words)
Once again we can use a lambda and code the whole thing as one line:
total = len(reduce(or_, map(lambda x: set(x.split(' ')), zen))
In this case, of course we could have simply concatenated all our input strings together, and removed the need to bother with map/reduce. However, imagine that instead of processing a few sentences, we had a massive database with millions of sentences.
The first advantage of map
is that it uses lazy iteration, so we don't have to load all that data into memory. We only ever need to store one sentence in memory at a time.
The second advantage is that every map
operation is independent. That is the beauty of functional programming, we can map the data in any order because there are no side effects. If we wanted, we could split our database into parts, and run map
on several different computers, each processing its own section of the database. A final computer could run the reduce task, speeding the whole process up by many times.
The map/reduce pattern is often used in big data applications because it allows for large scale parallelisation of data processing. But even on smaller data sets, it is an interesting way of breaking problems down into functional units.
If you found this article useful, you might be interested in the book Functional Programming in Python, or other books, by the same author.
<<PrevCopyright (c) Axlesoft Ltd 2020