Generators are a fascinating concept in Python, with applications ranging from simple lazy evaluation to mind-blowing concurrent execution of tasks (see David Beazley). Before we dive into the fascinating world of Python generators, we take a little detour to explain Python iterators, a concept I feel is integral to grasping generators.
Python Iterators
Simply put, an iterable in Python is any type that can be used with a for loop; an iterator is the object that actually produces the values during the loop. Python lists, tuples, dicts and sets are all examples of inbuilt iterables. One may ask: what is it about these types that makes them iterable, and is this a property of only inbuilt types?
These types can be iterated over because they support the iterator protocol. Then again, what is the iterator protocol?
Answering this question requires another little detour. In Python, there are some special object methods commonly referred to as magic methods. Just stay with me on this and take what I say on faith, at least till we get to object orientation in Python. These methods are not normally called explicitly in code but are called implicitly by the Python interpreter during code execution. A very familiar example of these magic methods is the `__init__` method, roughly analogous to a constructor, which is called during the initialization of a Python object. Just as the `__init__` magic method has to be implemented for custom object initialization, the iterator protocol requires a number of magic methods to be implemented by any object that wants to be used as an iterator. These methods are:
- The `__iter__` method, which is called when iteration is initialized, for example at the start of a for loop. It should return an object that has a `next` method (in Python 3 this is renamed to `__next__`).
- The `next` method, which is called whenever the `next()` global function is invoked with the iterator as its argument. It should return the next value of the iterable. When an iterator is used with a for loop, the for loop implicitly calls `next()` on the iterator object. This method should raise a `StopIteration` exception when there are no more values to return, to signal the end of the iteration.
Any Python class can be defined to act as an iterator so long as it implements the iterator protocol. We illustrate this by implementing a simple iterator that returns Fibonacci numbers up to a given maximum value.
class Fib:
    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.a = 0
        self.b = 1
        return self

    def next(self):  # __next__ in Python 3
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib
>>> for i in Fib(10):
...     print i
...
0
1
1
2
3
5
8
We also go ahead and implement our own custom `range` function for looping through numbers. This simple implementation only counts upward from 0.
class CustomRange:
    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.curr = 0
        return self

    def next(self):  # __next__ in Python 3
        numb = self.curr
        if self.curr >= self.max:
            raise StopIteration
        self.curr += 1
        return numb
for i in CustomRange(10):
    print i
0
1
2
3
4
5
6
7
8
9
Back To Generators
Now we have a basic understanding of iterators, but how do they relate to generators? In short, Python generators are iterators. PEP 255, which describes simple generators, refers to them by their full name: generator-iterators. A generator is used either by calling the next method on the generator object or by using the generator object in a for loop.
In Python, generator functions, or just generators, return generator objects. These generators are functions that contain the `yield` keyword. Rather than having to write every generator with `__iter__` and `next` methods, which is pretty cumbersome, Python provides the `yield` keyword as an easy way to define generators. For example, the Fibonacci iterator can be recast as a generator using the `yield` keyword as shown below:
def fib(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a + b
The use of the `yield` keyword greatly simplifies the creation of the generator.
The yield keyword
The `yield` keyword is used in the following way:
yield expression_list
The `yield` keyword is central to Python generator functions, but what does it actually do? To understand it, we contrast it with `return`, the other keyword that gives control back to the caller of a function. When an executing function encounters the `yield` keyword, it suspends execution at that point, saves its context and returns to the caller along with any value in the expression_list; when the caller invokes next on the object, execution of the function continues till another yield or return is encountered or the end of the function is reached. To quote PEP 255:
If a yield statement is encountered, the state of the function is
frozen, and the value of expression_list is returned to .next()'s
caller. By “frozen” we mean that all local state is retained,
including the current bindings of local variables, the instruction
pointer, and the internal evaluation stack: enough information is
saved so that the next time .next() is invoked, the function can
proceed exactly as if the yield statement were just another external
call.
On the other hand, when a function encounters a `return` statement, it returns to the caller along with any value following the `return` statement, and the execution of that function is complete for all intents and purposes. One can think of `yield` as causing only a temporary interruption in the execution of a function.
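This contrast is easy to see in a small made-up example (not from the original post): a generator that yields twice and then runs off the end of its body, which is equivalent to a bare return.

```python
def two_steps():
    yield "first"   # suspends here; all local state is frozen
    yield "second"  # resumes here on the next call, then suspends again
    # falling off the end of the body acts like a return:
    # the generator raises StopIteration in the caller

gen = two_steps()
first = next(gen)    # "first"
second = next(gen)   # "second"
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
print(first, second, exhausted)  # first second True
```

Each yield hands a value back temporarily; only reaching the end of the body (or an explicit return) finishes the function for good.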
Python generators in action
Returning to the Fibonacci numbers, if we wanted to generate all Fibonacci numbers up to a maximum value, the following non-generator snippet could be used to create the sequence:
def fib(max):
    numbers = []
    a, b = 0, 1
    while a < max:
        numbers.append(a)
        a, b = b, a + b
    return numbers
The above snippet eagerly calculates all the numbers below max and returns the collection of such numbers in a single function call. Using the Fibonacci generator to solve the same problem, on the other hand, is a different ball game. We can either use it in a for loop, allowing the for construct to implicitly initialize the generator and call `next` on the generator object, or explicitly initialize the generator and call `next` on it ourselves. The values are returned one after the other with each call to `next` on the generator.
The Fibonacci number generator is implemented using the `yield` keyword as shown below:
def fib(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a + b
In the following sections, we explicitly initialize the generator and make use of the `next()` function to get values from it. First, we initialize the generator object as shown below:
>>> gen = fib(10)
>>> gen
<generator object fib at 0x1069a6d20>
>>>
What has happened above is that when the generator function is called, the arguments (max has a value of 10) are bound to names, but the body of the function is not executed. Rather, a generator-iterator object is returned, as shown by the value of gen. This object can then be used as an iterator. Note that it is the presence of the `yield` keyword that is responsible for this.
>>> next(gen)
0
>>> next(gen)
1
>>> next(gen)
1
>>> next(gen)
2
>>> next(gen)
3
>>> next(gen)
5
>>> next(gen)
8
>>> next(gen)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Now, when the `next` function is called with the generator object as its argument, the generator function body executes till it encounters a `yield` or `return` statement or the end of the function body is reached. In the case of a yield statement, the expression following the yield is returned to the caller and the state of the function is saved. When `next` is first called on the Fibonacci generator object, a is bound to 0 and b is bound to 1. The while condition is true, so the first statement of the loop, which happens to be a yield expression, is executed. This returns the value of a, which happens to be 0, to the caller, and the function suspends at that point with all local context saved.
Think of this as eating part of your lunch and then storing the rest so you can continue eating later; you can keep eating till the lunch is exhausted, which for a generator means reaching a return statement or the end of the function body. When `next` is called on the Fibonacci generator object again, execution resumes at the a, b = b, a + b line and continues as normal till `yield` is encountered again. This continues till the loop condition becomes false, at which point a StopIteration exception is raised to signal that there is no more data to generate.
Generator Expressions
In Python Comprehensions, we discussed list comprehensions and how they are formed. One drawback of list comprehensions is that all values are calculated at once, regardless of whether they are needed at that time or not. This can consume an inordinate amount of computer memory. PEP 289 proposed generator expressions to resolve this, and the proposal was accepted and added to the language. Generator expressions are just like list comprehensions; the only difference is that the square brackets of a list comprehension are replaced by round brackets, and the expression returns a generator. We contrast list comprehensions and generator expressions below.
To generate a list of the squares of the numbers from 0 to 9 using a list comprehension, the following is done:
>>> squares = [i**2 for i in range(10)]
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
We could use a generator expression such as below in place of a list comprehension:
>>> squares = (i**2 for i in range(10))
>>> squares
<generator object <genexpr> at 0x1069a6d70>
We can then access the values of the generator using a for loop or the `next` function as shown below. Each value is computed only on demand.
>>> squares = (i**2 for i in range(10))
>>> for square in squares:
...     print square
...
0
1
4
9
16
25
36
49
64
81
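The on-demand behaviour also means the generator expression can be fed straight into an aggregating function such as sum() without ever building the full list. A minimal sketch (Python 3 syntax; exact object sizes vary by platform and version, but on CPython the generator object stays small regardless of sequence length):

```python
import sys

squares_list = [i**2 for i in range(1000)]  # all 1000 values held in memory
squares_gen = (i**2 for i in range(1000))   # values produced one at a time

# The list grows with the data; the generator object does not.
smaller = sys.getsizeof(squares_gen) < sys.getsizeof(squares_list)
print(smaller)  # True on CPython

# Aggregating functions can consume the generator directly.
total = sum(squares_gen)
print(total)  # 332833500
```

The difference becomes dramatic as the range grows: the list's memory use scales with the number of elements, while the generator's stays constant.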
Of what use are these generators?
Python generators provide the basis for lazy evaluation, or calculation on demand, in Python. Lazy evaluation is an integral part of stream processing (processing huge amounts of data). For example, imagine we wanted to create an indeterminate amount of Fibonacci numbers; this would not be possible with a non-generator approach, because we would have to define in advance how many numbers we need or go into an infinite loop. On the other hand, the generator approach makes doing so trivial; we just have to call `next` to get the next Fibonacci number without bothering about where or when the stream of numbers ends. A more practical type of stream processing is handling large data files such as log files. Generators provide a space-efficient method for such data processing, as only parts of the file are handled at any given point in time (David Beazley).
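For instance, the max parameter of the post's fib function can be dropped entirely to get an endless Fibonacci stream, and itertools can lazily slice out just the part we need (a sketch in Python 3 syntax):

```python
from itertools import islice

def fib():
    """An infinite Fibonacci generator; callers take as many values as they need."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# islice pulls only the first ten values from the endless stream;
# the generator is never run beyond what the caller consumes.
first_ten = list(islice(fib(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

The infinite while loop is harmless here precisely because execution is suspended at each yield until the caller asks for more.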
Generators can also be used to replace callbacks. Rather than pass a callback to a function, the function can yield control to the caller when it needs to report to the caller. The caller can then invoke the function that would have been used as a callback. This frees the main function from having to know about the callback.
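As a hypothetical sketch of this idea (the download scenario and names are made up for illustration), a function that would normally accept a progress callback can instead yield its progress, and the caller runs whatever code it would have put in the callback:

```python
def download_chunks(chunks):
    """Pretend download that reports progress by yielding instead of calling back."""
    received = 0
    for chunk in chunks:
        received += len(chunk)  # stand-in for real I/O work
        yield received          # control returns to the caller here

progress = []
for received in download_chunks([b"ab", b"cde", b"f"]):
    # The caller decides what "reporting" means; no callback is registered.
    progress.append(received)
print(progress)  # [2, 5, 6]
```

The producing function stays oblivious to how progress is displayed or logged, which is exactly the decoupling a callback would have provided.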
At a more advanced level, generators can also be used to implement concurrency (David Beazley). When a generator yields control to the caller, the caller can go ahead and call another generator, simulating concurrent execution.
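A minimal sketch of this scheme (hypothetical task names, Python 3 syntax): two generator "tasks" take turns running, with a simple round-robin loop acting as the scheduler.

```python
def counter(name, n):
    """A 'task' that yields control back to the scheduler after each step."""
    for i in range(n):
        yield "%s-%d" % (name, i)

def round_robin(*tasks):
    """Run each task one step at a time until all are exhausted."""
    queue = list(tasks)
    order = []
    while queue:
        task = queue.pop(0)
        try:
            order.append(next(task))  # run the task up to its next yield
            queue.append(task)        # requeue it behind the others
        except StopIteration:
            pass                      # task finished; drop it from the queue
    return order

schedule = round_robin(counter("a", 3), counter("b", 2))
print(schedule)  # ['a-0', 'b-0', 'a-1', 'b-1', 'a-2']
```

The interleaved output shows the two tasks alternating, which is the essence of the cooperative multitasking Beazley builds on.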
The above are just a few of the possible applications of Python generators. In a follow-up post, we will discuss newer additions to Python generators that enable a caller to send values into a generator, as well as some advanced uses of generators.
Further Reading
PEP 255 – Simple Generators
PEP 289 – Generator Expressions
Generators: The Final Frontier by David Beazley