Where is yield best used in Python?

Simply put, yield turns a function into a generator. You use it where you would otherwise use return, when you want to hand back values one at a time instead of all at once. Here is a really contrived example, cut and pasted from a prompt:

>>> def get_odd_numbers(i):
...     return list(range(1, i, 2))
... 
>>> def yield_odd_numbers(i):
...     for x in range(1, i, 2):
...         yield x
... 
>>> foo = get_odd_numbers(10)
>>> bar = yield_odd_numbers(10)
>>> foo
[1, 3, 5, 7, 9]
>>> bar
<generator object yield_odd_numbers at 0x1029c6f50>
>>> next(bar)
1
>>> next(bar)
3
>>> next(bar)
5

As you can see, in the first case foo holds the entire list in memory at once. That's not a big deal for a list with 5 elements, but what if you want a list of 5 million? Not only does that eat a huge amount of memory, it also takes a long time to build, and you pay that cost up front when the function is called. In the second case, bar just gives you a generator. A generator is an iterable, which means you can use it in a for loop and so on, but it can only be iterated over once. The values are also never all stored in memory at the same time; the generator object "remembers" where it was in the loop the last time you asked it for a value. This way, if you're using an iterable to (say) count to 50 billion, you don't have to count to 50 billion all at once and store all 50 billion numbers. Again, this is a pretty contrived example; you would probably use itertools if you really wanted to count to 50 billion. :)
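To make the memory and "only once" points concrete, here is a minimal sketch. The names odd_list, odd_gen, list_is_bigger, first_three, first_pass and second_pass are made up for illustration, and sys.getsizeof only measures the container itself, so treat the comparison as a rough indication rather than a benchmark.

```python
import sys
from itertools import islice

def yield_odd_numbers(i):
    # Same generator as in the session above.
    for x in range(1, i, 2):
        yield x

# The list object grows with the number of elements it holds;
# the generator object stays small no matter how far it will count.
odd_list = list(range(1, 1_000_000, 2))
odd_gen = yield_odd_numbers(1_000_000)
list_is_bigger = sys.getsizeof(odd_list) > sys.getsizeof(odd_gen)

# Values are produced on demand: islice pulls just three of them
# and the rest are never computed.
first_three = list(islice(odd_gen, 3))

# A generator is single-pass: once it has been consumed, a second
# loop over the same object gets nothing.
small = yield_odd_numbers(10)
first_pass = list(small)
second_pass = list(small)
```

If you need to go over the values again, call the generator function again to get a fresh generator.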

This is the simplest use case for generators. As you said, they can be used to write efficient permutations, using yield to push results up through the call stack instead of accumulating them in some sort of stack variable. Generators can also be used for specialized tree traversal and all manner of other things.
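As a rough sketch of those two uses, the recursive generators below yield permutations and walk a binary tree. The function names, the Node class and the sample tree are all invented for this example; in real code you would probably reach for itertools.permutations in the standard library instead of writing your own.

```python
def permutations(items):
    # Yield every ordering of a list, one permutation at a time.
    if len(items) <= 1:
        yield list(items)
        return
    for i, head in enumerate(items):
        rest = items[:i] + items[i + 1:]
        # The recursive call is itself a generator; each result is
        # passed straight up to our caller instead of being collected.
        for tail in permutations(rest):
            yield [head] + tail


class Node:
    # Hypothetical minimal binary tree node, for illustration only.
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def in_order(node):
    # Yield the values of a binary tree in left, root, right order.
    if node is None:
        return
    yield from in_order(node.left)
    yield node.value
    yield from in_order(node.right)


tree = Node(2, Node(1), Node(3))
```

Neither function keeps an explicit stack or result list; the suspended generator frames do that bookkeeping for you.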

Further reading:

  • Python wiki http://wiki.python.org/moin/Generators
  • PEP on generators http://www.python.org/dev/peps/pep-0255/

yield is best used when you have a function that returns a sequence and you want to iterate over that sequence, but you do not need to have every value in memory at once.

For example, I have a Python script that parses a large list of CSV files, and I want to return each line to be processed in another function. I don't want to hold megabytes of data in memory all at once, so I yield each line as a Python data structure. The function to get lines from the files might look something like:

def get_lines(files):
    # files is an iterable of open file objects
    for f in files:
        for line in f:
            # preprocess line here
            yield line

I can then use the same syntax as with lists to access the output of this function:

for line in get_lines(files):
    # process line here
    pass

but I use far less memory, because only one line is held at a time.
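For a version you can actually run, here is a minimal sketch using the standard csv module. It assumes the input is a list of file paths rather than open file objects, and the name get_rows is made up for this example.

```python
import csv

def get_rows(paths):
    # paths: an iterable of CSV file paths (an assumption made for
    # this sketch).
    for path in paths:
        # Each file is opened only when the loop reaches it and is
        # closed again once its rows have been consumed.
        with open(path, newline="") as f:
            for row in csv.reader(f):
                # row is a list of strings for one CSV record.
                yield row
```

You would consume it with an ordinary loop, for example for row in get_rows(["a.csv", "b.csv"]), where the file names are placeholders.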
