Iterator/iterable protocol
Python has two related types, iterators and iterables, that support sequences of values.
The most common use of these types is the for loop. A for loop works with any iterable, and will execute once for each element in the iterable:
k = [10, 20, 30]
for x in k:
    print(x)
This code works because lists are iterables, and Python for loops know how to work with iterables.
Iterables and iterators
An iterable is a Python object that we can iterate over. It is often an object that contains data, for example, a list. When we iterate over a list, we just get the list items, one by one.
An iterator is an object that does the iterating. For example, with a list:
- The list object holds the data.
- A list_iterator object does the iterating.
One way to think of this is that an iterable is like a book (it has a list of pages), while an iterator is like a bookmark (it tells us where we are in the book).
If we have an iterable object, we can get an iterator by calling the built-in iter function:
k = [10, 20, 30]
it = iter(k)
print(type(it)) # <class 'list_iterator'>
For a list, the iter function returns a specific iterator, of type list_iterator, that has been initialised specifically to iterate over the list k.
The iter function calls the magic method __iter__ on the object, as we will see later.
We can get values from the iterator using the built-in next function:
print(next(it)) # 10
print(next(it)) # 20
print(next(it)) # 30
print(next(it)) # Raises StopIteration
Each call to next returns the next value from the list. This works for the first three calls, returning 10, 20 then 30. Since the list is only 3 elements long, the fourth call has no value to return, so it raises a StopIteration exception.
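In practice we rarely want an uncaught exception. The sketch below shows two ways of dealing with an exhausted iterator: catching StopIteration, or passing a default as the second argument to next. The variable names (values, exhausted, fallback) are only illustrative.

```python
k = [10, 20, 30]
it = iter(k)

# The first three calls succeed.
values = [next(it), next(it), next(it)]

# The fourth call raises StopIteration, which we can catch.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

# Alternatively, give next a default to return instead of raising.
fallback = next(it, None)

print(values, exhausted, fallback)
```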
The next function calls the magic method __next__ on the object, as we will see later.
An iterator can only be used once. When it runs out of values it is no longer useful, because there is no way to reset it to the beginning. But of course, we can call iter on the iterable again to get a new iterator if we want to loop over the values again.
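A small sketch of this behaviour, using list to drain the iterator. The variable names are illustrative only.

```python
k = [10, 20, 30]
it = iter(k)

first_pass = list(it)    # consumes every value from the iterator
second_pass = list(it)   # the same iterator is now exhausted

fresh = list(iter(k))    # a new iterator starts from the beginning

print(first_pass, second_pass, fresh)
```

The list k itself is unchanged throughout. Only the iterator is used up.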
The steps we just went through are exactly what a for loop does, under the hood:
- We supply the for loop with an iterable (such as a list).
- It gets an iterator from the iterable.
- It fetches values from the iterator, one by one.
- When the iterator raises StopIteration, the loop terminates.
However, Python bypasses the iter and next functions. It just calls the __iter__ and __next__ methods on the objects instead.
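The steps of a for loop can be written out by hand as a rough equivalent. This is only a sketch of the behaviour, using the built-in iter and next functions for readability. The list seen is a hypothetical stand-in for whatever the loop body does.

```python
k = [10, 20, 30]
seen = []

it = iter(k)                # get an iterator from the iterable
while True:
    try:
        x = next(it)        # fetch the next value
    except StopIteration:
        break               # the iterator is exhausted, so the loop ends
    seen.append(x)          # the loop body
    print(x)
```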
Why do we have separate iterables and iterators?
You might be wondering why we have both iterables and iterators. Wouldn't it be easier if iterables just had a next method that you could use directly?
There is a very good reason for this. The iterator keeps track of where it is in the list of items. We could merge the iterator function into the iterable, and it would work in most cases. But what if we ever wanted to iterate over the same object twice, at the same time?
This is not quite as far fetched as it might sound. Here is an example:
k = [10, 20, 30]
for x in k:
    for y in k:
        print(x, y)
This code loops over k, and each time through the loop it loops through k again. It prints every possible pair of values from k:
10 10
10 20
10 30
20 10
20 20
20 30
30 10
30 20
30 30
The reason this code works is that each for loop creates its own iterator. Although both iterators are working on the same list, they each keep track of where they are, so everything works out. If the iterator was maintained by the list itself, both loops would be incrementing the same iterator, so things would go wrong.
To go back to the bookmark analogy, suppose two people were both sharing the same book (iterable) - maybe one reads it in the mornings, the other reads it in the evenings. They would each need a separate bookmark (iterator), because at any given time they might each be on a different page of the book. There is nothing wrong with them both reading the same book, but if they tried to share the same bookmark things would go very wrong.
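The same point can be seen by creating two iterators by hand. In this sketch, a and b are two separate bookmarks in the same list.

```python
k = [10, 20, 30]
a = iter(k)
b = iter(k)

first_a = next(a)    # a moves on to the second item
second_a = next(a)   # a moves on to the third item
first_b = next(b)    # b is unaffected and still starts at the beginning

print(first_a, second_a, first_b)
print(a is b)        # two separate iterator objects
```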
Every iterator is an iterable
As we have seen, a Python for loop expects an iterable. It uses that iterable to obtain an iterator, and then loops over it.
But what if, for some reason, you had an iterator that you wanted to loop over? Well, you might think that would fail, because Python is asking an iterator to give it an iterator, and only iterables can do that.
That would be rather silly, of course, because you already have an iterator!
To avoid this nonsense, Python has a rule that every iterator must be an iterable too. So you can ask an iterator to give you an iterator. In most cases, it will simply return itself.
Here is an example:
k = [10, 20, 30]
it = iter(k)
it2 = iter(it)
print(it is it2) # True
Here, it is the iterator obtained from the list (an iterable) k. it2 is the iterator obtained from the iterator it. As the example shows, it2 is the same object as it. iter(it) just returns it.
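One practical consequence is that a for loop can pick up a partly used iterator where it left off. A short sketch, with illustrative variable names:

```python
k = [10, 20, 30]
it = iter(k)

first = next(it)     # consume the first value by hand

rest = []
for x in it:         # the loop asks it for an iterator and gets it back
    rest.append(x)

print(first, rest)
```

The loop only sees the values that have not been consumed yet.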
Built-in iterables and iterators
An iterable is something that Python can iterate over. In other words, anything that you can use in a for loop.
This includes lists, tuples, strings, sets, dictionaries and arrays.
It also includes range objects. The function range(10) returns a range object that is an iterable and produces the sequence 0 to 9. That is how a basic for loop like this works:
for i in range(10): # range returns a range object
    print(i)
range returns a range object. The for loop iterates over the range object.
The Python itertools module provides a useful set of additional iterators.
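Because a range object is an iterable rather than an iterator, it can be looped over more than once. The following sketch checks this, along with the rule from the previous section that iter on an iterator returns the iterator itself.

```python
r = range(3)

once = list(r)
twice = list(r)            # the range can be iterated again

it = iter(r)
print(once, twice)
print(it is r)             # False: the iterator is a separate object
print(iter(it) is it)      # True: an iterator returns itself
print(hasattr(r, "__next__"))  # False: the range is not an iterator
```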
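As a brief illustration, here are three of the itertools functions: count, islice and chain. The choice of these particular functions is just for the example.

```python
from itertools import chain, count, islice

# count(0, 2) is an endless iterator: 0, 2, 4, ...
# islice takes just the first 5 values from it.
evens = list(islice(count(0, 2), 5))

# chain iterates over several iterables, one after another.
joined = list(chain([1, 2], "ab"))

print(evens)
print(joined)
```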
Creating your own iterables and iterators
Python also provides generators. A generator looks a lot like a function, but instead of returning a value, it creates an iterator. The body of the generator defines the sequence of values that the iterator will provide. If you require a specific iterator that isn't provided by itertools, you will usually be able to implement it using a generator. That will normally be the simplest option.
Here is an example of a generator that creates a geometric progression of length n. A geometric progression is one where each term is equal to the previous term multiplied by some value a. So if a is 2 and n is 6 the sequence would be:
2 4 8 16 32 64
Here is how we use a generator to create such a sequence:
def geometric(a, n):
    current = 1
    for i in range(n):
        current *= a
        yield current  # hand back one value, then pause until the next is requested

for x in geometric(3, 5):
    print(x)
The generator geometric is similar to a function, but it has a yield statement to send back a value. Unlike a return statement in a function, the yield statement doesn't end the generator. It carries on creating more values until the loop ends.
Calling geometric returns a generator object, which is a type of iterator. When we loop over this iterator in a for loop, it will create a sequence of the first 5 powers of 3.
Alternatively, we can create our own iterable and iterator objects, as described below.
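Since the generator object is an iterator, everything described earlier applies to it. This sketch repeats the generator definition so that it runs on its own, then uses iter and next on the result.

```python
def geometric(a, n):
    current = 1
    for _ in range(n):
        current *= a
        yield current

g = geometric(3, 5)

print(type(g).__name__)   # generator
print(iter(g) is g)       # True: like any iterator, it returns itself

first = next(g)           # values are produced one at a time, on request
rest = list(g)            # the remaining values
leftover = list(g)        # the generator is now exhausted

print(first, rest, leftover)
```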
Creating your own iterators
To create an iterator, we should normally make a class that implements the __iter__ and __next__ methods:
- __iter__ should return the object itself (see the discussion above).
- __next__ should return the next item from the sequence, or raise a StopIteration exception when the sequence has ended.
Here is the geometric iterator implemented as a class:
class Geometric:
    def __init__(self, a, n):
        self.a = a
        self.n = n
        self.current = 1

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.current *= self.a
        self.n -= 1
        return self.current

for x in Geometric(3, 5):
    print(x)
Here we have implemented the two required methods to create an iterator. The basic logic is identical to the generator example.
Notice that Geometric is just a basic class. It doesn't inherit some special base class that makes it into an iterator. The class implements __iter__ and __next__, so Python treats it as an iterator. This is an example of duck typing.
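One way to see the duck typing at work is with the abstract base classes Iterable and Iterator from collections.abc. An isinstance check against them succeeds for any class that has the right methods, even without inheritance. The sketch repeats the Geometric class so that it runs on its own.

```python
from collections.abc import Iterable, Iterator

class Geometric:
    def __init__(self, a, n):
        self.a = a
        self.n = n
        self.current = 1

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.current *= self.a
        self.n -= 1
        return self.current

g = Geometric(3, 5)

print(isinstance(g, Iterator))       # True: has __iter__ and __next__
print(isinstance(g, Iterable))       # True: every iterator is an iterable
print(isinstance([1, 2], Iterable))  # True: a list has __iter__
print(isinstance([1, 2], Iterator))  # False: a list has no __next__
```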
The previous generator implementation is considerably shorter than this version and easier to read and understand. You should normally use a generator where possible. You would only need to use an iterator class if you need it to do extra things a generator can't handle.
Creating your own iterable
Iterators are fine for generating sequences. However, if you want to iterate over a data structure you will normally need to implement both an iterable and, as we saw earlier, a separate iterator.
As an example, we will make a class that holds an IP address as a sequence of 4 integers:
class IPAddress:
    def __init__(self, a, b, c, d):
        self.address = [a, b, c, d]

    def __iter__(self):
        return IPAddressIterator(self)
This class simply holds the 4 parts of the IP address in a list address. The __iter__ method returns an IPAddressIterator, passing itself as a parameter. We also need to write the iterator class:
class IPAddressIterator:
    def __init__(self, obj):
        self.obj = obj
        self.position = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.position >= 4:
            raise StopIteration
        value = self.obj.address[self.position]
        self.position += 1
        return value
This class stores the IPAddress as obj. Its __iter__ method returns itself.
The __next__ method uses position to store the current position in the IPAddress object's address. On the first call it will return address[0], then on the next call address[1], and so on. After 4 calls it will raise the StopIteration exception. This is fairly similar to the Geometric iterator, except that it is using an iterable as its source of values.
Using these two classes we can iterate over the values in an IP address:
for x in IPAddress(255, 0, 0, 1):
    print(x)
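As a variation that is not part of the example above, the separate iterator class can often be avoided by writing __iter__ as a generator. Each call to __iter__ then returns a new generator object, so every loop still gets its own iterator. This is only a sketch of that idea.

```python
class IPAddress:
    def __init__(self, a, b, c, d):
        self.address = [a, b, c, d]

    def __iter__(self):
        # A generator method: each call creates a fresh iterator.
        for part in self.address:
            yield part

ip = IPAddress(255, 0, 0, 1)

parts = list(ip)
again = list(ip)                           # the iterable can be reused
pairs = [(x, y) for x in ip for y in ip]   # nested loops stay independent

print(parts, again, len(pairs))
```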
An alternative to the iterator protocol
There is an older style iteration protocol that can also be used to make objects iterable. To use this protocol, the object must implement:
- __len__ that returns the length of the sequence.
- __getitem__(key) that returns the item identified by key, an integer in the range 0 to len - 1.
If an object does not implement __iter__, and we use it in a for loop, Python will attempt to iterate the object by calling __getitem__, supplying key values that count up from 0 until __getitem__ raises IndexError. Here is an example:
class IPAddress:
    def __init__(self, a, b, c, d):
        self.address = [a, b, c, d]

    def __len__(self):
        return 4

    def __getitem__(self, i):
        return self.address[i]

for x in IPAddress(255, 0, 0, 1):
    print(x)
This implementation is quite a lot simpler. It doesn't require a separate iterator class because Python itself is keeping track of the current position in the object.
The only thing to bear in mind is that __getitem__ is also what indexing uses, so it might be called with keys in any order. A for loop supplies the keys 0, 1, 2, 3 in sequence, but other code won't necessarily do so. So our code must be able to calculate the nth value directly, which makes it inefficient for values that can only be created sequentially.
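To see which keys are actually requested, we can record them. In this sketch the requested list is a hypothetical attribute added purely for demonstration. It compares a plain loop with the built-in reversed function, which also works through __len__ and __getitem__.

```python
class IPAddress:
    def __init__(self, a, b, c, d):
        self.address = [a, b, c, d]
        self.requested = []          # hypothetical: records every key asked for

    def __len__(self):
        return 4

    def __getitem__(self, i):
        self.requested.append(i)
        return self.address[i]       # raises IndexError once i reaches 4

ip = IPAddress(255, 0, 0, 1)

forwards = [x for x in ip]
forward_keys = list(ip.requested)    # includes the final key that failed

ip.requested.clear()
backwards = list(reversed(ip))
backward_keys = list(ip.requested)

print(forwards, forward_keys)
print(backwards, backward_keys)
```

The forward loop asks for one key past the end, and the resulting IndexError is what tells Python to stop.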