itertools module - general iterators
Categories: python standard library
The itertools module provides several general iterators. In many cases, these iterators are variants of built-in functions such as map and filter. There are also some other generally useful iterators.
The iterators in this section are:
- Map-like iterators - starmap.
- Filter-like iterators - filterfalse, dropwhile, takewhile, compress, and islice.
- Zip-like iterators - chain, chain.from_iterable, and zip_longest.
- Splitting iterators - tee and groupby.
- Accumulating iterators - accumulate.
Since each of these functions returns an iterator, if you want to print the resulting data you will need to convert it to a list. For example:
i = starmap(...)
print(i) # <itertools.starmap object ...>
print(list(i)) # prints the data
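One consequence of this is worth spelling out: converting an iterator to a list consumes it. The following is a small sketch (the variable names pairs, first_pass and second_pass are just for illustration) showing that a second conversion of the same iterator gives an empty list:

```python
from itertools import starmap
from operator import add

pairs = [(2, 10), (4, 20), (6, 30)]
i = starmap(add, pairs)

first_pass = list(i)   # reads every value from the iterator
second_pass = list(i)  # the iterator is already exhausted

print(first_pass)   # [12, 24, 36]
print(second_pass)  # []
```

If you need the values more than once, keep the list rather than the iterator.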
Map-like iterators - starmap
The starmap function is related to map, but it accepts its arguments in a different format: a single iterable of argument tuples, rather than one iterable per argument.
Here is a simple use of map:
from operator import add
a = [2, 4, 6]
b = [10, 20, 30]
i = map(add, a, b)
This creates a result:
[12, 24, 36]
But suppose we had the initial parameters in a "pre-zipped" format:
[(2, 10), (4, 20), (6, 30)]
We can use starmap to process this data:
from itertools import starmap
from operator import add
z = [(2, 10), (4, 20), (6, 30)]
i = starmap(add, z)
starmap is roughly equivalent to unzipping the values and using map:
i = map(add, *zip(*z))
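As a quick check of that equivalence, here is a minimal sketch that evaluates both forms, plus a list comprehension that unpacks each tuple by hand. The names via_starmap, via_map and via_unpacking are illustrative only:

```python
from itertools import starmap
from operator import add

z = [(2, 10), (4, 20), (6, 30)]

via_starmap = list(starmap(add, z))           # add(2, 10), add(4, 20), ...
via_map = list(map(add, *zip(*z)))            # unzip, then use map
via_unpacking = [add(*args) for args in z]    # unpack each tuple explicitly

print(via_starmap)  # [12, 24, 36]
```

Note that the zip(*z) form has to read all of z before it can produce anything, whereas starmap reads one tuple at a time.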
Filter-like iterators
These functions provide useful variants of the built-in filter function.
To recap, filter takes a predicate (a function that returns true or false) and applies it to every element in the iterable. The resulting iterator only includes the values for which the predicate is true. For example:
def is_negative(x):
    return x < 0
a = [-3, -2, 0, 1, -5, 6]
i = filter(is_negative, a)
The predicate function returns true if the value is negative. So in this case, the output iterator would produce the values:
-3, -2, -5
filterfalse
filterfalse is similar to filter, except it only allows the values for which the predicate is false. Based on the code above:
from itertools import filterfalse
i = filterfalse(is_negative, a)
This produces only the values that are greater than or equal to zero:
0, 1, 6
takewhile
takewhile takes values from the input iterable for as long as the predicate is true, then it stops.
from itertools import takewhile
i = takewhile(is_negative, a)
In our case, it will take the first two values, because they are negative, then it will stop because the third value is 0 (ie not negative). It will ignore any items after the first non-negative even though some of them are also negative.
-3, -2
dropwhile
dropwhile is the opposite of takewhile. It ignores values from the input iterable for as long as the predicate is true, and then returns everything after that.
from itertools import dropwhile
i = dropwhile(is_negative, a)
In our case, it will drop the first two values, because they are negative. It will return everything after that:
0, 1, -5, 6
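To compare the four functions side by side, here is a short runnable sketch that applies each of them to the same data, using the same predicate. The result names (kept, rejected, head, tail) are illustrative:

```python
from itertools import dropwhile, filterfalse, takewhile

def is_negative(x):
    return x < 0

a = [-3, -2, 0, 1, -5, 6]

kept = list(filter(is_negative, a))           # [-3, -2, -5]
rejected = list(filterfalse(is_negative, a))  # [0, 1, 6]
head = list(takewhile(is_negative, a))        # [-3, -2]
tail = list(dropwhile(is_negative, a))        # [0, 1, -5, 6]

# takewhile and dropwhile split the input at the first failing element,
# so joining their results gives back the original sequence.
print(head + tail == a)  # True
```

filter and filterfalse test every element, while takewhile and dropwhile only care about where the predicate first becomes false.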
compress
compress filters an iterable based on a sequence of selectors. It accepts two iterables. The first, data, contains the input data. The second, selectors, contains a set of values that filter the data.
For each element in data, that element will be included if the corresponding selectors value is true, and excluded otherwise. For example:
from itertools import compress
data = [10, 20, 30, 40, 50, 60]
selectors = [1, 0, 0, 1, 1, 0]
i = compress(data, selectors)
Since selectors is only true at positions 0, 3 and 4, only those elements of data will be included in the output iterator:
10, 40, 50
If data and selectors are different lengths, compress stops as soon as the shorter of the two is exhausted, so any remaining elements of the longer one are ignored.
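Here is a small sketch of two cases not shown so far: a selectors list that is shorter than data, and a selectors list computed from the data itself. The threshold of 25 and the variable names are arbitrary choices for illustration:

```python
from itertools import compress

data = [10, 20, 30, 40, 50, 60]

# Only three selectors, so only the first three data items are considered.
short_selectors = [1, 0, 1]
result_short = list(compress(data, short_selectors))

# Selectors can be any truthy or falsy values, such as booleans
# calculated from the data.
selectors = [x > 25 for x in data]
result_computed = list(compress(data, selectors))

print(result_short)     # [10, 30]
print(result_computed)  # [30, 40, 50, 60]
```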
islice
islice provides slicing for iterators. It works in a similar way to slicing a list but, since it works on iterables, the operation is lazy: it isn't applied until you read the values from the resulting iterator.
There are several ways to call islice. The two-argument form takes an iterable and a stop value:
from itertools import islice
data = [10, 20, 30, 40, 50, 60]
i = islice(data, 4)
This creates an iterator that stops at 4, ie returns every item up to but not including index 4. This is equivalent to a slice [:4] applied to a list. The result is:
10, 20, 30, 40
The three-argument form takes an iterable, a start and a stop value:
i = islice(data, 2, 5)
This is equivalent to a slice [2:5]
applied to a list. The result is:
30, 40, 50
Finally, the four-argument form adds a step value:
i = islice(data, 1, 5, 2)
This is equivalent to a slice [1:5:2]
applied to a list. It takes values from position 1, up to but not including 5, in steps of 2. The result is:
20, 40
In either of the previous two examples, if the stop value is set to None the iteration continues to the end of the sequence:
i = islice(data, 1, None, 2) # equivalent to [1::2]
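Because islice is lazy, it can also be applied to an iterable that has no end, which a list slice cannot do. The sketch below uses itertools.count (an endless counter that is not otherwise covered in this article) as the source. The names evens, first_five and next_three are illustrative:

```python
from itertools import count, islice

# An endless generator of even numbers: 0, 2, 4, ...
evens = (n for n in count() if n % 2 == 0)

first_five = list(islice(evens, 5))
# islice only read five items, so the generator carries on from there.
next_three = list(islice(evens, 3))

print(first_five)  # [0, 2, 4, 6, 8]
print(next_three)  # [10, 12, 14]
```

Unlike list slicing, islice does not accept negative values for start, stop or step.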
Zip-like iterators
These functions join two or more iterables, in various ways.
chain
chain joins two or more iterables to act like a single iterable with all the values joined end to end. For example:
from itertools import chain
a = [1, 2, 3, 4]
b = [10, 20]
c = [100, 200, 300]
i = chain(a, b, c)
This gives an iterable with the following sequence of values:
1, 2, 3, 4, 10, 20, 100, 200, 300
chain.from_iterable
This is similar to chain, except that it takes a single iterable whose elements are the iterables to be joined:
from itertools import chain
m = [[1, 2, 3, 4],
     [10, 20],
     [100, 200, 300]]
i = chain.from_iterable(m)  # yields the same values as chain(a, b, c) above
In this case, we have used a list of lists, but it can accept any iterable of iterables. The main iterable m is evaluated lazily, that is to say, chain.from_iterable will not attempt to access the next iterable until the previous one is exhausted.
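The lazy behaviour can be observed with a small experiment. In this sketch, rows is a hypothetical generator that records the name of each inner list in the opened list at the moment it is requested:

```python
from itertools import chain

opened = []  # names of the inner iterables requested so far

def rows():
    for name, values in [("a", [1, 2]), ("b", [10, 20]), ("c", [100])]:
        opened.append(name)
        yield values

i = chain.from_iterable(rows())

first = next(i)
opened_after_first = list(opened)  # snapshot: only "a" has been requested

rest = list(i)

print(first, opened_after_first)  # 1 ['a']
print(rest)                       # [2, 10, 20, 100]
print(opened)                     # ['a', 'b', 'c']
```

After reading the first value, only the first inner list has been fetched. The others are requested one at a time as the earlier ones run out.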
zip_longest
In a normal zip operation, if you attempt to zip several iterables of different lengths, the sequence will stop when the shortest iterable is exhausted:
a = [1, 2, 3, 4]
b = [10, 20]
c = [10, 200, 300]
i = zip(a, b, c)
Since the shortest iterable, b, has length 2, the output sequence also has length 2:
(1, 10, 10), (2, 20, 200)
With zip_longest the sequence will stop when the longest iterable is exhausted:
from itertools import zip_longest
a = [1, 2, 3, 4]
b = [10, 20]
c = [10, 200, 300]
i = zip_longest(a, b, c, fillvalue=-1)
Since the longest iterable, a, has length 4, the output sequence also has length 4:
(1, 10, 10), (2, 20, 200), (3, -1, 300), (4, -1, -1)
fillvalue is used to fill in any blanks. If a value is not supplied it defaults to None.
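For completeness, here is a minimal sketch comparing the default fill value with an explicit one, using two of the lists from the example. The names default_fill and custom_fill are illustrative:

```python
from itertools import zip_longest

a = [1, 2, 3, 4]
b = [10, 20]

default_fill = list(zip_longest(a, b))               # blanks become None
custom_fill = list(zip_longest(a, b, fillvalue=-1))  # blanks become -1

print(default_fill)  # [(1, 10), (2, 20), (3, None), (4, None)]
print(custom_fill)   # [(1, 10), (2, 20), (3, -1), (4, -1)]
```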
Splitting iterators
These iterators split a single input into several outputs.
tee
tee effectively provides two or more iterators that can iterate over the input iterable independently. Here is an example:
from itertools import tee
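r = iter(range(4))  # a one-shot iterator over the range
a, b = tee(r, 2)
for i in a:
    print(i)
for i in b:
    print(i)
Here, r is an iterator over a range object. You can only iterate over an iterator once, and then it is spent. (A range object itself can be iterated over repeatedly, which is why the example wraps it in iter.)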
The tee function creates two new iterators that can each iterate over the values in r independently. We illustrate this by looping over a, then looping over b. The result is:
0
1
2
3
0
1
2
3
There are a couple of caveats to this function:
- After creating the tee, you should not access r from anywhere else. The tee function effectively takes ownership of the iterable, and things could get out of step if something else consumes items from the iterable at the same time.
- The tee iterators are not threadsafe.
Of course, a similar effect can be obtained by creating a list from r, which you can then iterate over multiple times. The main advantage of tee is lazy evaluation: values are only read from r as they are needed.
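The independence of the tee iterators is easiest to see with a source that really can only be read once. This sketch uses a hypothetical generator function, numbers, as the source and reads from the two iterators in an interleaved order:

```python
from itertools import tee

def numbers():
    # A generator is a one-shot iterator: it cannot be restarted.
    yield from [0, 1, 2, 3]

a, b = tee(numbers(), 2)

first_a = next(a)  # reads 0 from the generator
first_b = next(b)  # also 0, replayed from tee's internal buffer
rest_a = list(a)
rest_b = list(b)

print(first_a, first_b)  # 0 0
print(rest_a)            # [1, 2, 3]
print(rest_b)            # [1, 2, 3]
```

tee has to store any values that one iterator has read but another has not. If one iterator runs a long way ahead of the others, a plain list may use a similar amount of memory and be simpler.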
groupby
groupby will split an iterable into several iterators, grouping the original elements according to some chosen characteristic.
Here is an example:
from itertools import groupby
items = ['apple', 'apricot', 'cherry', 'carrot',
         'cranberry', 'banana', 'blueberry',
         'avocado', 'almond']
grouped = groupby(items, lambda x: x[0])
for key, values in grouped:
    print('{}: {}'.format(key, ', '.join(values)))
groupby accepts two arguments:
- The iterable to be grouped.
- A key function that will be applied to each item in the input iterable, to calculate the key that will be used for grouping.
In the example, the input iterable contains strings. The key function is a lambda that calculates x[0], which is the first character of the string. This means that the elements in the original list will be grouped by their first letter.
groupby returns an iterator. The elements of the iterator are all tuples of the form (key, values), where:
- key is the key (the first letter in the case of our example).
- values is another iterable, that returns every name in the group.
There is a separate (key, values) pair for each group.
We print each pair like this:
print('{}: {}'.format(key, ', '.join(values)))
values is an iterator, but the join method iterates over it and combines its items into a single string that is printed. Here is what the code displays:
a: apple, apricot
c: cherry, carrot, cranberry
b: banana, blueberry
a: avocado, almond
The first key is the letter 'a', and the first group contains the two elements that begin with an 'a'. The next key is 'c', and the group contains the three elements that begin with 'c', and so on.
Notice that there are two groups with the letter 'a' as a key. That is because the function only groups matching elements that are adjacent to each other. That is by design: it is a grouping function, not a sorting function. There are two separate runs of words that begin with 'a', and each gets its own group.
If you specifically want to group all the names that begin with the same letter into a single group, then you should first sort the list using the same key function as the one used in groupby:
items.sort(key=lambda x: x[0])
The result is then:
a: apple, apricot, avocado, almond
b: banana, blueberry
c: cherry, carrot, cranberry
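Putting the two steps together, here is a minimal sketch that sorts and groups in one expression and collects the groups into a dictionary. The helper first_letter and the names unsorted_keys and grouped are illustrative. It uses sorted rather than items.sort so that the original list is left unchanged:

```python
from itertools import groupby

items = ['apple', 'apricot', 'cherry', 'carrot',
         'cranberry', 'banana', 'blueberry',
         'avocado', 'almond']

def first_letter(word):
    return word[0]

# Without sorting, 'a' appears as a key twice.
unsorted_keys = [key for key, _ in groupby(items, first_letter)]

# Each group must be copied (here with list) before moving on to the
# next one, because groupby reuses the underlying iterator.
grouped = {key: list(values)
           for key, values in groupby(sorted(items, key=first_letter),
                                      first_letter)}

print(unsorted_keys)  # ['a', 'c', 'b', 'a']
print(grouped['a'])   # ['apple', 'apricot', 'avocado', 'almond']
```

Python's sort is stable, which is why the words within each group keep their original relative order.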
accumulate
At its simplest, accumulate is a bit like sum, except that it provides a running total. However, accumulate can be used in other ways, as we will see.
Here is the simplest case:
from itertools import accumulate
items = [5, 2, 6, 1, 9]
totals = accumulate(items)
This creates an iterator, totals, that gives a running total of the sum of the items in the original iterable:
5, 7, 13, 14, 23
This is formed from 5, (5 + 2), (5 + 2 + 6), etc.
By default, accumulate will add the values. However, you can supply a different function. For example, if you use the built-in max function, the result will be a running maximum (ie the maximum value so far):
maxima = accumulate(items, func=max)
Giving:
5, 5, 6, 6, 9
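Any function of two arguments can be used in the same way. As a further illustration, this sketch uses operator.mul to produce a running product alongside the running maximum. The name products is illustrative:

```python
from itertools import accumulate
from operator import mul

items = [5, 2, 6, 1, 9]

maxima = list(accumulate(items, max))    # largest value seen so far
products = list(accumulate(items, mul))  # 5, 5*2, 5*2*6, ...

print(maxima)    # [5, 5, 6, 6, 9]
print(products)  # [5, 10, 60, 60, 540]
```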
It is also possible to create a series based on a recurrence relation. To do this we can define a function of a and b that only uses the value of a, for example:
items = [2]*8
def fn(a, b):
    return a*2
totals = accumulate(items, fn)
In this case, each new value is equal to the previous value multiplied by 2, giving a sequence that is the powers of 2:
2, 4, 8, 16, 32, 64, 128, 256
Notice that the only value of items that we actually use is the first element, which is the starting point of the sequence. After that, the values are ignored (but of course the length of items determines the length of the output sequence). Since we only care about the first element of items, the easiest way to create it is just to fill it with the initial value, 2.
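As a variation on the same idea, the list of repeated values can be replaced by itertools.repeat (not otherwise covered in this article), which supplies the same value a fixed number of times without building a list. This is only a sketch of one alternative. The function name double is illustrative:

```python
from itertools import accumulate, repeat

def double(a, _b):
    # Only the running value a is used; the incoming item is ignored.
    return a * 2

# repeat(2, 8) yields the value 2 eight times.
powers = list(accumulate(repeat(2, 8), double))

print(powers)  # [2, 4, 8, 16, 32, 64, 128, 256]
```

In Python 3.8 and later, accumulate also accepts an initial keyword argument, which is another way to supply the starting value.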