Why does itertools.product run through all elements at initialization?

To answer your question we need to look at how itertools.product is implemented. The docs give (roughly) this pure-Python equivalent:

def product(*args, repeat=1):
    pools = [tuple(pool) for pool in args] * repeat
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)

The real implementation is written in C (in CPython's Modules/itertoolsmodule.c), but the pure-Python equivalent above is enough to answer this question; the relevant C code is shown in the EXTRA section at the bottom.

Focus on this line of code:

pools = [tuple(pool) for pool in args] * repeat

This line converts every iterable passed in as an argument into a tuple, so all of their elements are consumed and materialized in one go. In the pure-Python equivalent this happens the first time you call next() on the generator; in the real C implementation it happens as soon as product() is called.

Returning to your code: as soon as the product object is created (or, with the pure-Python equivalent, on the first call to next(l)), all elements of the input iterators are consumed. In your example the pools list is built with the following contents:

# pools: [(0, 1, 2), (0, 1, 2)]

which is why you got those outputs.
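
Here is a minimal sketch that reproduces this behaviour, assuming a class roughly like the A in your question (an iterator that prints on every __next__ call):

from itertools import product

class A:
    # stand-in for the class from the question: prints on every __next__ call
    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        print(f"I am at {self.i}")
        self.i += 1
        return self.i - 1

l = product(A(3), A(3))  # prints "I am at 0" ... "I am at 2" twice, right here
print("Here")
print(next(l))           # (0, 0) -- no further "I am at ..." output
print(next(l))           # (0, 1)

Both A instances are fully consumed while the product object is being built; the individual result tuples are only produced afterwards, one next() call at a time.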


As for the print("Here"): to understand where it shows up in the output you need to understand how generators work.

The pure-Python equivalent of itertools.product() above is a generator. A generator does not execute its function body until the first call to next(); after that, each next() call runs the body just far enough to reach the next yield and produce one more element. The real itertools.product(), on the other hand, converts its arguments to tuples as soon as the object is constructed, which is why all the "I am at ..." lines appear before "Here".

There are excellent resources online that explain in more depth how Python generators work.
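
As a quick, generic illustration of that laziness (not specific to itertools):

def gen():
    print("body started")
    yield 1
    print("between the yields")
    yield 2

g = gen()        # nothing is printed: the body has not started running yet
print("created")
print(next(g))   # prints "body started", then 1
print(next(g))   # prints "between the yields", then 2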


Why did 'itertools' choose to keep the list of tuples in memory?

Because the Cartesian product must visit the same elements several times, and an iterator can only be consumed once.
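
You can see that single-use nature of iterators directly; this is the limitation that product works around by storing tuples:

it = iter([0, 1, 2])
print(list(it))  # [0, 1, 2] -- the iterator is now exhausted
print(list(it))  # []        -- a second pass yields nothing

The Cartesian product of (0, 1, 2) x (0, 1, 2) has to walk the second pool once for every element of the first pool, which a plain one-shot iterator cannot support.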


EXTRA

In C the pools tuple is built in a way equivalent to the Python code above, but the arguments are evaluated eagerly: each iterable argument is converted to a tuple as soon as the product object is constructed, as you can see from this excerpt of CPython's Modules/itertoolsmodule.c:

/* one slot per pool: nargs pools, repeated 'repeat' times */
pools = PyTuple_New(npools);
if (pools == NULL)
    goto error;

for (i=0; i < nargs ; ++i) {
    PyObject *item = PyTuple_GET_ITEM(args, i);
    /* PySequence_Tuple() consumes the whole iterable right here */
    PyObject *pool = PySequence_Tuple(item);
    if (pool == NULL)
        goto error;
    PyTuple_SET_ITEM(pools, i, pool);
    indices[i] = 0;
}
/* for repeat > 1 the already-built pools are reused, not re-consumed */
for ( ; i < npools; ++i) {
    PyObject *pool = PyTuple_GET_ITEM(pools, i - nargs);
    Py_INCREF(pool);
    PyTuple_SET_ITEM(pools, i, pool);
    indices[i] = 0;
}
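
To make the timing difference visible, here is a small sketch that runs both the real itertools.product and the pure-Python recipe from the top against a hypothetical printing generator (chatty is an illustrative helper, not part of itertools):

from itertools import product

def py_product(*args, repeat=1):
    # the pure-Python recipe shown at the top, unchanged
    pools = [tuple(pool) for pool in args] * repeat
    result = [[]]
    for pool in pools:
        result = [x + [y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)

def chatty(n):
    # hypothetical helper: announces every element it hands out
    for i in range(n):
        print(f"I am at {i}")
        yield i

print("real product:")
c = product(chatty(2), chatty(2))     # prints "I am at ..." immediately
print("Here")

print("pure-Python recipe:")
p = py_product(chatty(2), chatty(2))  # prints nothing: the generator has not started
print("Here")
next(p)                               # only now are both arguments consumed

The only difference is when the tuple conversion happens: at construction time in C, at the first next() in the recipe. After that point both behave the same way.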

I'd like to point out that while the __next__ method of both instances of class A gets called exhaustively (until StopIteration is raised), the itertools.product iterator itself is still evaluated lazily on subsequent calls to next(). Notice that:

> 'I am at 0'

> 'I am at 1'

> 'I am at 2'

> 'I am at 0'

> 'I am at 1'

> 'I am at 2'

> 'Here'

is just the result of calling next() exhaustively, first on the first instance passed in and then on the second. This is even more apparent when calling product(A(2), A(3)), which results in:

> 'I am at 0'

> 'I am at 1'

> 'I am at 0'

> 'I am at 1'

> 'I am at 2'

The same behavior is observed for combinations and permutations. In fact, searching for "Does itertools.product evaluate its arguments lazily?" brought me to an existing SO question that also answers yours. The arguments are not evaluated lazily:

> since product sometimes needs to go over an iterable more than once, which is not possible if the arguments were left as iterators that can only be consumed once.
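
That eager conversion for combinations and permutations can be checked with the same kind of sketch (again using a hypothetical printing generator):

from itertools import combinations, permutations

def chatty(n):
    # hypothetical helper: announces every element it hands out
    for i in range(n):
        print(f"I am at {i}")
        yield i

c = combinations(chatty(3), 2)   # prints "I am at 0/1/2" immediately
print("combinations built")
p = permutations(chatty(3), 2)   # same eager consumption of the argument
print("permutations built")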