In what situations should you actually use generators in python?

Doesn't this defeat the purpose of using a generator, since it then creates the values in an ordinary list anyway? In that case, in exactly which situations are generators useful?

This is a bit opinion-based, but there are some situations where a list might not do the trick (for example because of hardware limitations).

Saving CPU cycles (time)

Imagine that you have a sequence of even numbers, and then want to take the sum of the first five of them. In Python we could do that with itertools.islice, like:

from itertools import islice

sumfirst5even = sum(islice(even(100), 5))

If we first generated a list of 100 even numbers (not knowing what we would later do with that list), we would have spent a lot of CPU cycles on the construction of that list, and they would all be wasted.

By using a generator, we can restrict the work to only the elements we really need: we only ever yield the first five elements, and the algorithm never needs to produce anything beyond those. Admittedly, here it is doubtful that this will have any (significant) impact; it is even possible that the "generator protocol" requires more CPU cycles than generating a small list, so for small sequences there is no advantage. But now imagine that we used even(100000): then the amount of "useless CPU cycles" spent on generating an entire list can be significant.
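
As a concrete sketch (even(..) is not defined in the original question, so this is one plausible lazy definition that yields the even numbers below an upper bound):

from itertools import islice

def even(n):
    # lazily yield the even numbers below n, one at a time
    for i in range(0, n, 2):
        yield i

# only the first five elements are ever computed,
# no matter how large the upper bound is
print(sum(islice(even(100000), 5)))  # 0 + 2 + 4 + 6 + 8 = 20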

Saving memory

Another potential benefit is saving memory, given that we do not need all elements of the generator in memory at the same time.

Take, for example, the following loop:

for x in even(1000):
    print(x)

If even(..) constructs a list of the even numbers below 1000, then all 500 of these numbers need to exist as objects in memory at the same time. Depending on the Python interpreter, objects can take a significant amount of memory: in CPython, for example, an int takes 28 bytes. So a list containing 500 such ints can take roughly 14 kB of memory (plus some extra memory for the list itself). Yes, most Python interpreters maintain a "flyweight" pattern to reduce the burden of small ints (these are shared, so we do not create a separate object for each occurrence in the process), but it can still easily add up: for an even(1000000), we would need 14 MB of memory.
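
We can inspect these sizes directly (the exact numbers depend on the interpreter and platform; the values below are for a 64-bit CPython build):

import sys

print(sys.getsizeof(12345))        # 28 bytes for a single int object
evens = list(range(0, 1000, 2))    # the 500 even numbers below 1000
print(sys.getsizeof(evens))        # the list itself only stores pointers;
                                   # the 500 int objects come on top of that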

If we use a generator, then depending on how we use it, we might save memory. Why? Because once we no longer need the number 123456 (since the for loop advances to the next item), the space that object "occupied" can be recycled and given to an int object with value 123458. So, provided that the way we use the generator permits this, the memory usage remains constant, whereas for a list it scales linearly. Of course, the generator itself also needs to do proper management: if the generator code builds up a collection internally, then the memory will of course increase as well.

On 32-bit systems, this can even result in real problems, since Python lists have a maximum length there: a list can contain at most 536'870'912 elements. That is a huge number, but what if you, for example, want to generate all permutations of a given list? If we store the permutations in a list, then on a 32-bit system, for an input of 13 or more elements, we will never be able to construct such a list: 13! is already 6'227'020'800.
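
itertools.permutations sidesteps this precisely because it is lazy; a small sketch:

from itertools import permutations
from math import factorial

print(factorial(13))  # 6227020800: more than a 32-bit list can hold

# permutations(..) yields one tuple at a time, so we can walk over
# all of them without ever materializing the full collection
for i, perm in enumerate(permutations(range(13))):
    print(perm)
    if i == 2:  # just peek at the first few here
        break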

"online" programs

In theoretical computer science, an "online algorithm" is defined by some researchers as an algorithm that receives its input gradually, and thus does not know the entire input in advance.

A practical example is a webcam that takes an image every second and sends it to a Python webserver. At any given moment, we do not know what a picture captured by the webcam 24 hours from now will look like, but we might be interested in detecting a burglar who aims to steal something. A list of frames can therefore never contain all the images. A generator can, however, form an elegant "protocol" where we iteratively fetch an image, detect a burglar, and raise an alarm, like:

for frame in from_webcam():
    if contains_burglar(frame):
        send_alarm_email('Maurice Moss')

Infinite generators

We do not need webcams or other hardware to exploit the elegance of generators. Generators can yield an "infinite" sequence. Our even generator could, for example, look like:

def even():
    i = 0
    while True:
        yield i
        i += 2

This is a generator that will eventually generate all even numbers: if we keep iterating over it, it will eventually yield the number 123'456'789'012'345'678 (although that might take a very long time).

The above can be useful if we want to implement a program that, for example, keeps yielding even numbers that are palindromes. It could look like:

for i in even():
    if is_palindrome(i):
        print(i)
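
Here is_palindrome is assumed to exist rather than taken from the original; a minimal version simply compares the number's decimal digits with their reverse:

def is_palindrome(n):
    # a number is a palindrome if its digits read the same in both directions
    s = str(n)
    return s == s[::-1]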

We can thus assume that this program will keep working, and we never need to "update" a list of even numbers. In some pure functional languages that make lazy programming transparent, programs are written as if you were creating a list, but in fact it is typically a generator-like structure that is used in its place.

"enriched" generators: range(..) and friends

In Python, a lot of classes do not construct lists when you iterate over them. For example, a range(1000) object does not first construct a list (it does in python-2.x, but not in python-3.x): the range(..) object simply represents a range. A range(..) object is not a generator, but it is an instance of a class that can produce an iterator object, which works like a generator.
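
We can see that no list is built by looking at the object's size (the exact byte count varies per interpreter; the value below is for 64-bit CPython):

import sys

r = range(1000000000)
print(sys.getsizeof(r))  # 48: constant, no matter how long the range is
print(r[123456789])      # 123456789: indexing is computed arithmetically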

Besides iterating over it, we can do all kinds of things with a range(..) object that are possible with lists, but not in an efficient way.

If we, for example, want to know whether 1000000000 is an element of range(400, 10000000000, 2), then we can write 1000000000 in range(400, 10000000000, 2). There is an algorithm in place that checks this without generating the range or constructing a list: it checks whether the element is an int, whether it lies within the bounds of the range(..) object (so greater than or equal to 400, and less than 10000000000), and whether it is actually hit (taking the step into account). None of this requires iterating over the range, so the membership check completes instantly.

If we had generated a list instead, Python would have to enumerate every element until it finally found that element (or reached the end of the list). For numbers like 1000000000, this can easily take minutes, hours, maybe days.
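
A quick demonstration: the arithmetic boils down to a bounds check plus a divisibility test.

# instant: 1000000000 >= 400, 1000000000 < 10000000000,
# and (1000000000 - 400) % 2 == 0, so no iteration is needed
print(1000000000 in range(400, 10000000000, 2))  # True

# the list-based equivalent would first have to materialize billions
# of ints and then scan them one by one (do not try this):
# print(1000000000 in list(range(400, 10000000000, 2)))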

We can also "slice" the range(..) object, which yields another range(..) object, for example:

>>> range(123, 456, 7)[1::4]
range(130, 459, 28)

With a bit of arithmetic we can thus instantly slice a range(..) object into a new range(..) object, whereas slicing a list takes linear time. For huge lists, that can again cost significant time and memory.
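
We can verify that the arithmetic slice matches the list-based slice:

r = range(123, 456, 7)
print(r[1::4])                         # range(130, 459, 28), computed in O(1)
print(list(r[1::4]) == list(r)[1::4])  # True: exactly the same elements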


Generators are shorter and more readable:

In your example, you have to create an empty list, use append and return the resulting list:

def even(k):
    evens_list = []
    for i in range(k):
        if i % 2 == 0:
            evens_list.append(i)
    return evens_list

The generator just needs yield:

def even(k):
    for i in range(k):
        if i % 2 == 0:
            yield i

And the usage is nearly the same, if you really do need a list. Instead of

even_list = even(100)

the line becomes

even_list = list(even(100))

Generators, and lazy semantics in general, offer some advantages:

  • You can create infinite sequences
  • You can save a lot of memory, because not all the elements are kept in memory at once
  • They are often used for expensive I/O operations, so you effectively retrieve data only when you really use it (see the sketch after this list)
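
A minimal sketch of that I/O pattern (the filename and the record handling are placeholders, not part of the original):

def read_records(path):
    # lines are read from disk lazily, one at a time,
    # only when the consumer asks for the next element
    with open(path) as f:
        for line in f:
            yield line.rstrip('\n')

for record in read_records('data.txt'):  # hypothetical input file
    print(record)                        # handle one record at a time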

But also some drawbacks:

  • Overhead
    • The local variables of the generator function must be kept in memory between yields
    • There is also a risk of memory leaks, if those variables keep large objects alive
  • Every time you want to reuse the elements of a collection, they must be regenerated (see the example below)
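
That last point is easy to demonstrate with the even(k) generator from above: a generator can be consumed only once.

g = even(10)
print(list(g))  # [0, 2, 4, 6, 8]
print(list(g))  # []: the generator is exhausted and must be recreated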