Introduction to Python Generators

Generators are a fascinating concept in python; they have a wide range of applications, from simple lazy evaluation to mind-blowing concurrent execution of tasks (see David Beazley). Before we dive into the fascinating world of python generators, we take a little detour to explain python iterators, a concept I feel is integral to grasping generators.

 Python Iterators

Simply put, an iterator in python is any python object that hands out its values one at a time, the way a for loop consumes them. Python lists, tuples, dicts and sets are all examples of built-in types that support iteration. One may ask, what is it about these types that makes them iterable, and is this a property of only built-in types?
These types support iteration because they implement the iterator protocol. Then again, what is the iterator protocol?
Answering this question requires another little detour. In python, there are some special object methods commonly referred to as magic methods. Just stay with me on this and take what I say on faith, at least until we get to object orientation in python. These methods are not normally called explicitly in code but are called implicitly by the python interpreter during code execution. A very familiar example of these magic methods is the __init__ method, which is roughly analogous to a constructor and is called during the initialization of a python object. Just as the __init__ magic method has to be implemented for custom object initialization, the iterator protocol consists of a couple of magic methods that need to be implemented by any object that wants to be used as an iterator. These methods are __iter__, which is called when iteration starts and returns the iterator object, and next (renamed __next__ in python 3), which returns the next value in the sequence and raises the StopIteration exception when there are no more values to produce.
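
We can watch the protocol at work on a familiar built-in type before writing an iterator of our own. In the short session below we drive a list by hand with the built-in iter and next functions; iter asks the list for an iterator, and that iterator raises StopIteration once it runs out of values.

>>> numbers = [1, 2, 3]
>>> it = iter(numbers)       # calls numbers.__iter__() and returns an iterator
>>> next(it)                 # calls it.next() (it.__next__() in python 3)
1
>>> next(it)
2
>>> next(it)
3
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration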

Any python class can be defined to act as an iterator so long as the iterator protocol is implemented. This is illustrated by implementing a simple iterator that returns Fibonacci numbers up to a given maximum value.

class Fib:
    """An iterator over the Fibonacci numbers up to a given maximum value."""

    def __init__(self, max):
        self.max = max

    def __iter__(self):
        # Called when iteration starts; reset the state and
        # return the iterator object itself.
        self.a = 0
        self.b = 1
        return self

    def next(self):                              # __next__ in python 3
        # Return the next Fibonacci number, or signal that we are done.
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib

>>> for i in Fib(10):
...     print i

0
1
1
2
3
5
8               
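
It is worth pausing to see what the for loop does with such an object behind the scenes. The loop first obtains an iterator by calling __iter__ and then repeatedly calls next until StopIteration is raised. A rough, hand-written equivalent of the loop above would look like this:

it = iter(Fib(10))           # calls Fib.__iter__ and gets back the iterator
while True:
    try:
        i = it.next()        # next(it) also works; __next__ in python 3
    except StopIteration:    # the iterator says it is exhausted
        break
    print i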

We also go ahead and implement our own custom range class for looping through numbers. This simple implementation only counts from 0 upwards; a possible extension is sketched after the example.


class CustomRange:
    def __init__(self, max):
        self.max = max

    def __iter__(self):
        # Start counting from 0 each time iteration begins.
        self.curr = 0
        return self

    def next(self):                              # __next__ in python 3
        numb = self.curr
        if self.curr >= self.max:
            raise StopIteration
        self.curr += 1
        return numb

>>> for i in CustomRange(10):
...     print i
0
1
2
3
4
5
6
7
8
9
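
As noted above, our CustomRange only counts upwards from 0 in steps of one. A minimal sketch of a richer version, assuming we also want a start value and a step the way the built-in range allows, could look like the following; the names start, stop and step are our own choices for the illustration.

class CustomRange2:
    def __init__(self, start, stop, step=1):
        self.start = start
        self.stop = stop
        self.step = step

    def __iter__(self):
        self.curr = self.start
        return self

    def next(self):                              # __next__ in python 3
        if self.curr >= self.stop:
            raise StopIteration
        numb = self.curr
        self.curr += self.step
        return numb

>>> list(CustomRange2(2, 12, 3))
[2, 5, 8, 11]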

 Back To Generators

Now we have a basic understanding of iterators, but how do they relate to generators? In short, python generators are iterators. PEP 255, which describes simple generators, refers to them by their full name, generator-iterators. Generators are used either by calling next on the generator object or by using the generator object in a for loop.

In python, generator functions, or just generators, return generator objects. These generator functions are ordinary functions that contain the yield keyword. Rather than having to write every generator with the __iter__ and next methods, which is pretty cumbersome, python provides the yield keyword as an easy way of defining generators. For example, the Fibonacci iterator can be recast as a generator using the yield keyword as shown below:

def fib(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a + b
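
Used in a for loop, this generator produces exactly the same sequence as the Fib iterator class we wrote earlier:

>>> for i in fib(10):
...     print i
0
1
1
2
3
5
8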

The use of the yield keyword greatly simplifies the creation of the generator.

 The yield keyword

The yield keyword is used in the following way

yield expression_list

The yield keyword is central to python generator functions, but what exactly does it do? To understand the yield keyword, we contrast it with the return keyword, the other keyword that gives control back to the caller of a function. When an executing function encounters the yield keyword, it suspends execution at that point, saves its context and returns to the caller along with any value in the expression_list; when the caller invokes next on the generator object, execution of the function continues until another yield or return is encountered or the end of the function is reached. To quote PEP 255,

If a yield statement is encountered, the state of the function is
frozen, and the value of expression_list is returned to .next()'s
caller. By “frozen” we mean that all local state is retained,
including the current bindings of local variables, the instruction
pointer, and the internal evaluation stack: enough information is
saved so that the next time .next() is invoked, the function can
proceed exactly as if the yield statement were just another external
call.
On the other hand, when a function encounters a return statement, it returns to the caller along with any value following the return statement, and the execution of that function is complete for all intents and purposes. One can think of yield as causing only a temporary interruption in the execution of a function.
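
A small illustrative generator makes the contrast concrete. Inside a generator a bare return simply ends the iteration (in python 2 a return with a value is not even allowed inside a generator), so the caller sees StopIteration as soon as it is reached:

def count_to_two():
    yield 1
    yield 2
    return                # ends the generator; anything after this never runs
    yield 3

>>> c = count_to_two()
>>> next(c)
1
>>> next(c)
2
>>> next(c)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration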

 Python generators in action

Returning to the Fibonacci number function, if we wanted to generate all Fibonacci numbers up to a maximum value, the following non-generator snippet can be used to create the sequence:

def fib(max):
    numbers = []
    a, b = 0, 1
    while a < max:
        numbers.append(a)
        a, b = b, a + b
    return numbers

The snippet above eagerly calculates all the numbers below max and returns the collection of such numbers in a single function call. On the other hand, using the Fibonacci generator to solve the same problem is a different ball game. We can either use it in a for loop and allow the for construct to implicitly initialize the generator and call next on the generator object, or explicitly initialize it and call next on it ourselves. The values are returned one after the other by calling next on the generator.
The Fibonacci number generator is implemented using the yield keyword as shown below:

def fib(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a + b

In what follows, we explicitly initialize the generator and make use of the next function to get values from the generator. First, we initialize the generator object as shown below:

>>> gen = fib(10)
>>> gen
<generator object fib at 0x1069a6d20>
>>> 

What has happened above is that when the generator function is called, the argument (max has a value of 10) is bound to its name but the body of the function is not executed. Rather, a generator-iterator object is returned, as shown by the value of gen. This object can then be used as an iterator. Note that it is the presence of the yield keyword that is responsible for this.

>>> next(gen)
0
>>> next(gen)
1
>>> next(gen)
1
>>> next(gen)
2
>>> next(gen)
3
>>> next(gen)
5
>>> next(gen)
8
>>> next(gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Now, when the next function is called with the generator object as argument, the generator function body executes until it encounters a yield or return statement or the end of the function body is reached. In the case of a yield statement, the expression following the yield is returned to the caller and the state of the function is saved. When next is first called on the Fibonacci generator object, a is bound to 0 and b is bound to 1. The while condition is true, so the first statement of the loop, which happens to be a yield expression, is executed. This expression returns the value of a, which happens to be 0, to the caller, and the function suspends at that point with all local context saved.
Think of this as eating your lunch partly and then storing it so as to continue eating later. You can keep eating till the lunch is exhausted, and in the case of a generator this is the function getting to a return statement or the end of the function body. When next is called on the Fibonacci generator object again, execution resumes at the a, b = b, a + b line and continues as normal till yield is encountered again. This continues till the loop condition is false, at which point a StopIteration exception is raised to signal that there is no more data to generate.
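
The suspend-and-resume behaviour is easy to see by sprinkling a few print statements into a small generator of our own; nothing is printed when the generator object is created, and each call to next runs the body only as far as the next yield:

def noisy():
    print "started"
    yield 1
    print "resumed after first yield"
    yield 2
    print "resumed after second yield"

>>> g = noisy()       # nothing is printed yet; the body has not run
>>> next(g)
started
1
>>> next(g)
resumed after first yield
2
>>> next(g)
resumed after second yield
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration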

 Generator Expressions

In python comprehensions, we discussed list comprehensions and how they are formed. One drawback with list comprehensions is that all the values are calculated at once, regardless of whether they are needed at that time or not, and this can sometimes consume an inordinate amount of computer memory. PEP 289 proposed generator expressions to resolve this, and the proposal was accepted and added to the language. Generator expressions are written like list comprehensions; the only difference is that the square brackets are replaced by round brackets, and the expression returns a generator object rather than a list. We contrast list comprehensions and generator expressions below.

To generate a list of the squares of the numbers from 0 to 9 using a list comprehension, the following is done:

>>> squares = [i**2 for i in range(10)]
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

We could use a generator expression such as below in place of a list comprehension:

>>> squares = (i**2 for i in range(10))
>>> squares
<generator object <genexpr> at 0x1069a6d70>

We can then access the values of the generator using a for loop or the next function, as shown below. Each value is computed only on demand.

>>> squares = (i**2 for i in range(10))
>>> for square in squares:
...     print square
0
1
4
9
16
25
36
49
64
81
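
A nice consequence, noted in PEP 289, is that a generator expression can be fed directly to any function that consumes an iterable, without ever building the intermediate list; when the expression is the only argument, the extra pair of brackets can even be dropped:

>>> sum(i**2 for i in range(10))
285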

 Of what use are these generators?

Python generators provide the basis for lazy evaluation, or calculation on demand, in python. Lazy evaluation is an integral part of stream processing, the processing of huge amounts of data. For example, imagine we wanted to produce an indeterminate amount of Fibonacci numbers; this would not be possible with the non-generator approach because we would have to decide in advance how many numbers we need or go into an infinite loop. On the other hand, adopting the generator approach makes doing so trivial; we just have to call next to get the next Fibonacci number without bothering about where or when the stream of numbers ends. A more practical type of stream processing is handling large data files such as log files; generators provide a space-efficient method for such processing because only parts of the file are handled at any given point in time (David Beazley). Both ideas are sketched below.
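
The first sketch is an unbounded Fibonacci generator; the caller decides how much of the endless stream to consume, here with the help of itertools.islice:

from itertools import islice

def all_fibs():
    # An endless stream of Fibonacci numbers; no maximum is needed.
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

>>> list(islice(all_fibs(), 10))      # take just the first ten values
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

The second, written in the spirit of David Beazley's generator tricks, lazily pulls error lines out of a log file; the file name access.log and the marker string ERROR are made-up details for the illustration.

def error_lines(filename):
    # Lines are read and filtered one at a time, so even a huge log
    # file never has to fit in memory all at once.
    with open(filename) as logfile:
        for line in logfile:
            if 'ERROR' in line:
                yield line

for line in error_lines('access.log'):
    print line,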

Generators can also be used to replace callbacks. Rather than passing a callback to a function, the function can yield control to the caller whenever it has something to report. The caller can then do the work that the callback would have done. This frees the function from having to know anything about the callback; a possible sketch follows.
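
The sketch below uses made-up names; instead of accepting a report_progress callback, the worker yields its progress and the caller reacts however it likes:

def do_work(items):
    for position, item in enumerate(items):
        # ... some real work on item would happen here ...
        yield position, item          # report back instead of calling a callback

for position, item in do_work(['a', 'b', 'c']):
    print "finished item %d: %s" % (position, item)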

At a more advanced level, generators can also be used to implement concurrency (David Beazley). When a generator yields control to the caller, the caller can go ahead and call another generator, simulating concurrent execution; a toy scheduler along these lines is sketched below.
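
The toy round-robin scheduler below is a much simplified version of the coroutine schedulers David Beazley demonstrates; it keeps a queue of generators and advances each one in turn until all of them are exhausted. The task bodies and names are made up for the illustration.

from collections import deque

def task(name, steps):
    for step in range(steps):
        print "%s at step %d" % (name, step)
        yield                          # hand control back to the scheduler

def run_all(tasks):
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)              # run the task up to its next yield
        except StopIteration:          # the task is finished; drop it
            continue
        queue.append(current)          # otherwise put it back in line

run_all([task('foo', 3), task('bar', 2)])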

The above are just a few of the possible applications of python generators. In a follow-up post, we will discuss newer additions to python generators that enable a caller to send values into a generator, as well as some more advanced uses of generators.

 Further Reading

PEP 255 – Simple Generators

PEP 289 – Generator Expressions

Generators: The Final Frontier by David Beazley

Python Tutorials - Generators

Python Tutorials - Generator Expressions

Python Tutorials - Iterators

 