Length of a finite generator

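I used timeit on 64-bit Windows with Python 3.4.3 to compare the approaches I could think of. Each figure is the total time in seconds for timeit's default of 1,000,000 runs, and s is a range rather than a true generator so that every run can iterate it again: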

>>> from timeit import timeit
>>> from textwrap import dedent as d
>>> timeit(
...     d("""
...     count = -1
...     for _ in s:
...         count += 1
...     count += 1
...     """),
...     "s = range(1000)",
... )
50.70772041983173
>>> timeit(
...     d("""
...     count = -1
...     for count, _ in enumerate(s):
...         pass
...     count += 1
...     """),
...     "s = range(1000)",
... )
42.636973504498656
>>> timeit(
...     d("""
...     count, _ = reduce(f, enumerate(range(1000)), (-1, -1))
...     count += 1
...     """),
...     d("""
...     from functools import reduce
...     def f(_, count):
...         return count
...     s = range(1000)
...     """),
... )
121.15513102540672
>>> timeit("count = sum(1 for _ in s)", "s = range(1000)")
58.179126025925825
>>> timeit("count = len(tuple(s))", "s = range(1000)")
19.777029680237774
>>> timeit("count = len(list(s))", "s = range(1000)")
18.145157531932
>>> timeit("count = len(list(1 for _ in s))", "s = range(1000)")
57.41422175998332

Shockingly, the fastest approach was to use a list (not even a tuple) to exhaust the iterator and get the length from there:

>>> timeit("count = len(list(s))", "s = range(1000)")
18.145157531932

Of course, this builds the whole list in memory, which can be a problem for large inputs. The best low-memory alternative was enumerate with a no-op for loop:

>>> timeit(
...     d("""
...     count = -1
...     for count, _ in enumerate(s):
...         pass
...     count += 1
...     """),
...     "s = range(1000)",
... )
42.636973504498656
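
For reuse, the enumerate loop above could be wrapped in a small helper. This is only a sketch: the name count_items is my own, and starting enumerate at 1 is a variation that removes the need for the -1 initial value and the final increment. Note that a real generator, unlike the range used in the timings, is exhausted after being counted once.

```python
def count_items(iterable):
    """Count the items in an iterable by exhausting it, without storing them.

    count_items is a hypothetical helper name, not a standard library function.
    """
    count = 0  # stays 0 if the iterable yields nothing
    # Starting enumerate at 1 makes the loop variable the running total.
    for count, _ in enumerate(iterable, 1):
        pass
    return count


gen = (x * x for x in range(1000))
print(count_items(gen))  # 1000
print(count_items(gen))  # 0, because the generator is already exhausted
```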

Cheers!


If you have to do this, the first method is much better: because you consume all the values, itertools.tee() has to store all of them anyway, so a list is more efficient.

To quote from the docs:

This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().
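
As a rough illustration of that trade-off, here is a sketch that gets the length both ways. The numbers() generator is a made-up stand-in for whatever generator you actually have. In the tee() version, counting through the first iterator forces tee() to buffer every item for the second one, so nothing is saved compared with the list.

```python
from itertools import tee


def numbers():
    """Hypothetical finite generator used only for this illustration."""
    yield from range(5)


# Approach 1: materialise once, then use len() and reuse the list.
items = list(numbers())
n_list = len(items)

# Approach 2: tee() the generator and count through one copy.
# Exhausting `a` before touching `b` makes tee() hold all items internally.
a, b = tee(numbers())
n_tee = sum(1 for _ in a)
remaining = list(b)  # the values are still available, but only via tee's buffer

print(n_list, n_tee)       # 5 5
print(remaining == items)  # True
```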