Introduction to Python Generators
Generators are a fascinating concept in Python; they have a wide range of applications, from simple lazy evaluation to advanced concurrent execution of tasks (see David Beazley). Before we dive into the world of Python generators, we take a little detour to explain Python iterators, a concept that I feel is integral to grasping generators.
Simply put, an iterator in Python is any type that can be used with a for loop. Python lists, tuples, dicts and sets are all examples of built-in iterators. One may ask: what is it about these types that makes them iterators, and is this a property of built-in types only? These types are iterators because they implement the iterator protocol. Then again, what is the iterator protocol?
Answering this question requires another little detour. In Python, there are some special object methods commonly referred to as magic methods. Just stay with me on this and take what I say on faith, at least until we get to object orientation in Python. These methods are not normally called explicitly in code but are called implicitly by the Python interpreter during code execution. A familiar example is the `__init__` method, which is roughly analogous to a constructor and is called during the initialization of a Python object. Just as the `__init__` magic method has to be implemented for custom object initialization, the iterator protocol consists of a number of magic methods that must be implemented by any object that wants to be used as an iterator. These methods are:
- The `__iter__` method, which is called on initialization of an iterator. It should return an object that has a `next` method (in Python 3, this is renamed to `__next__`).
- The `next` method, which is called whenever the `next()` global function is invoked with the iterator as its argument. It should return the next value of the iterable. When an iterator is used with a for loop, the for loop implicitly calls `next()` on the iterator object. This method should raise a `StopIteration` exception when there is no longer any new value to return, to signal the end of the iteration.
Any python class can be defined to act as an iterator so long as the iterator protocol is implemented. This is illustrated by implementing a simple iterator that returns Fibonacci numbers up to a given maximum value.
```python
class Fib:
    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.a = 0
        self.b = 1
        return self

    def next(self):
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib
```

```
>>> for i in Fib(10): print i
0
1
1
2
3
5
8
```
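The class above uses Python 2's `next` method and `print` statement. A minimal sketch of the same iterator in Python 3, where the protocol method is named `__next__`, looks like this:

```python
class Fib:
    """Iterator over Fibonacci numbers up to a given maximum (Python 3)."""

    def __init__(self, max):
        self.max = max

    def __iter__(self):
        # Reset state so each fresh iteration starts from the beginning.
        self.a = 0
        self.b = 1
        return self

    def __next__(self):  # renamed from `next` in Python 3
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib

print(list(Fib(10)))  # → [0, 1, 1, 2, 3, 5, 8]
```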
We also go ahead and implement our own custom range function for looping through numbers. This simple implementation only counts from 0 upwards.
```python
class CustomRange:
    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.curr = 0
        return self

    def next(self):
        numb = self.curr
        if self.curr >= self.max:
            raise StopIteration
        self.curr += 1
        return numb

for i in CustomRange(10):
    print i
```

```
0
1
2
3
4
5
6
7
8
9
```
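To see what the for loop does behind the scenes, we can drive the protocol by hand. This sketch uses a Python 3 flavour of CustomRange (with `__next__`) and replays the loop as explicit `iter()` and `next()` calls:

```python
class CustomRange:
    """Counts from 0 up to (but not including) max -- Python 3 flavour."""

    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.curr = 0
        return self

    def __next__(self):
        if self.curr >= self.max:
            raise StopIteration
        numb = self.curr
        self.curr += 1
        return numb

# What `for i in CustomRange(3): ...` does implicitly:
iterator = iter(CustomRange(3))           # calls __iter__
collected = []
while True:
    try:
        collected.append(next(iterator))  # calls __next__
    except StopIteration:
        break                             # the for loop swallows this to end

print(collected)  # → [0, 1, 2]
```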
Back To Generators
Now we have a basic understanding of iterators, but how do they relate to generators? In short, Python generators are iterators. PEP 255, which describes simple generators, refers to generators by their full name: generator-iterators. Generators are used either by calling `next()` on the generator object or by using the generator object in a for loop.
In Python, generator functions, or just generators, return generator objects. These generators are functions that contain the yield keyword. Rather than having to write every generator as a class with a next method, which is pretty cumbersome, Python provides the yield keyword as an easy way to define generators. For example, the Fibonacci iterator can be recast as a generator using the yield keyword as shown below:
```python
def fib(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a + b
```
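A quick way to confirm that this generator produces the same sequence as the iterator class is to materialise it with `list()` (a sketch in Python 3 syntax):

```python
def fib(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a + b

# list() drives the generator to exhaustion, collecting each yielded value.
print(list(fib(10)))  # → [0, 1, 1, 2, 3, 5, 8]
```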
The use of the yield keyword greatly simplifies the creation of a generator. The yield keyword is central to Python generator functions, but what does it actually do? To understand yield, we contrast it with return, the other keyword that gives control back to the caller of a function. When an executing function encounters the yield keyword, it suspends execution at that point, saves its context and returns to the caller along with any value in the expression_list; when the caller invokes next() on the generator object, execution of the function continues until another yield or return is encountered or the end of the function is reached. To quote PEP 255:
> If a yield statement is encountered, the state of the function is
> frozen, and the value of expression_list is returned to .next()'s
> caller. By "frozen" we mean that all local state is retained,
> including the current bindings of local variables, the instruction
> pointer, and the internal evaluation stack: enough information is
> saved so that the next time .next() is invoked, the function can
> proceed exactly as if the yield statement were just another external
> call.
On the other hand, when a function encounters a return statement, it returns to the caller along with any value following the return statement, and the execution of that function is complete for all intents and purposes. One can think of yield as causing only a temporary interruption in the execution of a function.
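The difference can be made visible with side effects. In this sketch (Python 3 syntax), the prints inside a generator only run when execution reaches them on resume, while a plain function runs straight to its return:

```python
def plain():
    print("plain: runs to completion immediately")
    return 1
    # Anything after return is unreachable; the call is over for good.

def suspending():
    print("gen: running up to the first yield")
    yield 1
    print("gen: resumed, running up to the second yield")
    yield 2

print(plain())    # the body runs fully, prints its message, then returns 1

g = suspending()  # nothing printed yet -- the body has not started
print(next(g))    # prints the first message, then 1
print(next(g))    # resumes after the first yield, prints, then 2
```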
Python generators in action
Returning to the Fibonacci numbers, if we wanted to generate all Fibonacci numbers up to a maximum value, the following non-generator snippet could be used to create the sequence:
```python
def fib(max):
    numbers = []
    a, b = 0, 1
    while a < max:
        numbers.append(a)
        a, b = b, a + b
    return numbers
```
The snippet above eagerly calculates all the numbers below max and returns the collection of such numbers in a single function call. Using the Fibonacci generator to solve the same problem is a different ball game. We can either use it in a for loop, allowing the for construct to implicitly initialize the generator and call next() on the generator object, or explicitly initialize the generator and call next() on it ourselves. The values are returned one after the other by calling next() on the generator.
The Fibonacci number generator is implemented using the
yield keyword as shown below:
def fib(max): a, b = 0, 1 while a < max: yield a a, b = b, a + b
In the following sections, we explicitly initialize the generator and make use of the next() function to get values from it. First, we initialize the generator object as shown below:
```
>>> gen = fib(10)
>>> gen
<generator object fib at 0x1069a6d20>
>>>
```
What has happened above is that when the generator function is called, the arguments (max has a value of 10) are bound to names, but the body of the function is not executed. Rather, a generator-iterator object is returned, as shown by the value of gen. This object can then be used as an iterator. Note that it is the presence of the yield keyword that is responsible for this.
```
>>> next(gen)
0
>>> next(gen)
1
>>> next(gen)
1
>>> next(gen)
2
>>> next(gen)
3
>>> next(gen)
5
>>> next(gen)
8
>>> next(gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
```
Now, when the next() function is called with the generator object as its argument, the generator function body executes until a yield expression, a return statement, or the end of the function body is reached. When a yield is encountered, the expression following the yield is returned to the caller and the state of the function is saved. When next() is first called on the Fibonacci generator object, a is bound to 0 and b is bound to 1. The while condition is true, so the first statement of the loop is executed, which happens to be a yield expression. This expression returns the value of a, which happens to be 0, to the caller, and the function suspends at that point with all local context saved.
Think of this as eating part of your lunch and storing the rest so as to continue eating later. You can keep eating until the lunch is exhausted; in the case of a generator, this is the function reaching a return statement or the end of the function body. When next() is called on the Fibonacci generator again, execution resumes at the a, b = b, a + b line and continues as normal until yield is encountered again. This goes on until the loop condition becomes false, at which point a StopIteration exception is raised to signal that there is no more data to generate.
In Python Comprehensions, we discussed list comprehensions and how they are formed. One drawback of list comprehensions is that all the values are calculated at once, regardless of whether they are needed at that time or not, and this can sometimes consume an inordinate amount of memory. PEP 289 proposed the generator expression to resolve this, and the proposal was accepted and added to the language. Generator expressions are just like list comprehensions; the only difference is that the square brackets of a list comprehension are replaced by parentheses, and a generator object is returned instead of a list. We contrast list comprehensions and generator expressions below.
To generate a list of the square of number from 0 to 10 using list comprehensions the following is done:
```
>>> squares = [i**2 for i in range(10)]
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```
We could use a generator expression such as below in place of a list comprehension:
```
>>> squares = (i**2 for i in range(10))
>>> squares
<generator object <genexpr> at 0x1069a6d70>
```
We can then access the values of the generator using a for loop or the next() function as shown below. Each value is computed only on demand.
```
>>> squares = (i**2 for i in range(10))
>>> for square in squares:
...     print square
...
0
1
4
9
16
25
36
49
64
81
```
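The payoff is memory: a list comprehension materialises every element up front, while a generator expression only stores the machinery needed to produce the next one. A small sketch using `sys.getsizeof` (Python 3 syntax; the exact byte counts vary by Python version):

```python
import sys

squares_list = [i**2 for i in range(1_000_000)]  # one million ints, all stored
squares_gen = (i**2 for i in range(1_000_000))   # nothing computed yet

print(sys.getsizeof(squares_list))  # several megabytes for the list object
print(sys.getsizeof(squares_gen))   # a tiny, constant-size generator object
```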
Of what use are these generators?
Python generators provide the basis for lazy evaluation, or calculation on demand, in Python. Lazy evaluation is an integral part of stream processing (the processing of huge amounts of data). For example, imagine we wanted to create an indeterminate number of Fibonacci numbers; this would not be possible with the non-generator approach, because we would have to decide in advance how many numbers we need or go into an infinite loop. The generator approach, on the other hand, makes this trivial: we just call next() to get the next Fibonacci number without bothering about where or when the stream of numbers ends. A more practical kind of stream processing is handling large data files such as log files. Generators provide a space-efficient method for such data processing, as only one part of the file is handled at any given point in time (David Beazley).
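As a sketch of such an open-ended stream (Python 3 syntax), the Fibonacci generator can simply drop its max argument and yield forever; the caller decides how much to pull, for instance with itertools.islice:

```python
from itertools import islice

def fib():
    """Yield Fibonacci numbers forever; the caller decides when to stop."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take only the first ten values from the infinite stream.
first_ten = list(islice(fib(), 10))
print(first_ten)  # → [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

No infinite loop occurs because the body only runs far enough to produce each requested value.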
Generators can also be used to replace callbacks. Rather than pass a callback to a function, the function can yield control to the caller when it needs to report to the caller. The caller can then invoke the function that would have been used as a callback. This frees the main function from having to know about the callback.
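As a hypothetical sketch of this (the function names here are invented for illustration), compare a copy routine that reports progress through a callback with one that simply yields its progress to the caller:

```python
def copy_with_callback(chunks, on_progress):
    """Callback style: the worker must know whom to call."""
    done = 0
    for chunk in chunks:
        done += len(chunk)
        on_progress(done)  # the reporting logic is wired into the worker

def copy_with_yield(chunks):
    """Generator style: the worker just yields; the caller owns the loop."""
    done = 0
    for chunk in chunks:
        done += len(chunk)
        yield done

chunks = ["abc", "de", "fghi"]

# Callback version: the caller must hand in a function up front.
copy_with_callback(chunks, lambda n: print("callback:", n))

# Generator version: the caller decides what to do with each report.
progress = []
for n in copy_with_yield(chunks):
    progress.append(n)
print(progress)  # → [3, 5, 9]
```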
At a more advanced level, generators can also be used to implement concurrency (David Beazley). When a generator yields control to the caller, the caller can go ahead and call another generator, simulating concurrent execution.
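A minimal sketch of this idea (inspired by Beazley's talks; the scheduler here is invented for illustration): each "task" is a generator that yields to hand control back, and a tiny loop alternates between tasks:

```python
from collections import deque

def task(name, steps):
    """A toy task: each yield hands control back to the scheduler."""
    for i in range(steps):
        yield (name, i)

def round_robin(*tasks):
    """Run generator tasks in turn until every one is exhausted."""
    queue = deque(tasks)
    trace = []
    while queue:
        current = queue.popleft()
        try:
            trace.append(next(current))  # advance the task by one step
            queue.append(current)        # not done: back of the line
        except StopIteration:
            pass                         # task finished: drop it
    return trace

print(round_robin(task("A", 2), task("B", 2)))
# → [('A', 0), ('B', 0), ('A', 1), ('B', 1)]
```

The interleaved output shows the two tasks taking turns even though there is only one thread.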
The above are just a few of the possible applications of Python generators. In a follow-up post, we will discuss newer additions to Python generators that enable a caller to send values to a generator, as well as some more advanced uses of generators.
References
- PEP 255 – Simple Generators
- PEP 289 – Generator Expressions
- Generators: The Final Frontier by David Beazley
- Python Tutorials – Generators
- Python Tutorials – Generator Expressions
- Python Tutorials – Iterators