How to print the progress of a list comprehension in Python?

tqdm

Use the tqdm package, a fast and versatile progress bar utility. Install it from the shell:

pip install tqdm

Then wrap the iterable you are looping over in tqdm():

from tqdm import tqdm

def process(token):
    return token['text']

l1 = [{'text': k} for k in range(5000)]
l2 = [process(token) for token in tqdm(l1)]

Output:

100%|███████████████████████████████████| 5000/5000 [00:00<00:00, 2326807.94it/s]

Without any dependency

1/ Use a side function

def report(index):
    # Print every 1000th index only
    if index % 1000 == 0:
        print(index)

def process(token, index, report=None):
    # The reporting callback is optional
    if report:
        report(index)
    return token['text']

l1 = [{'text': k} for k in range(5000)]

l2 = [process(token, i, report) for i, token in enumerate(l1)]

2/ Use the and and or operators

def process(token):
    return token['text']

l1 = [{'text': k} for k in range(5000)]
l2 = [(i % 1000 == 0 and print(i)) or process(token) for i, token in enumerate(l1)]

3/ Use both

def process(token):
    return token['text']

def report(i):
    i % 1000 == 0 and print(i)

l1 = [{'text': k} for k in range(5000)]
l2 = [report(i) or process(token) for i, token in enumerate(l1)]

All 3 methods print:

0
1000
2000
3000
4000

How 2 works

  • i % 1000 == 0 and print(i): and evaluates its second operand only if the first one is truthy, so print(i) runs only when i % 1000 == 0.
  • or process(token): or evaluates its second operand only if the first one is falsy, and then returns that second operand. Here the left side is always falsy, so process(token) is always what ends up in the list.
    • If i % 1000 != 0, the left side is False (print is never called), so or returns process(token).
    • Otherwise, print(i) runs and the left side is None (because print returns None), so or again returns process(token).
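The short-circuit rules above can be checked directly. This is a minimal sketch; loud is a hypothetical helper (not part of the answer) that records each operand as it is evaluated, so you can see which side actually ran.

```python
def loud(value, log):
    """Hypothetical helper: record that this operand was evaluated, then return it."""
    log.append(value)
    return value

# `and` stops at the first falsy operand: the right side never runs.
log_and = []
result_and = loud(False, log_and) and loud("right", log_and)
print(result_and, log_and)    # False [False]

# `or` stops at the first truthy operand: the right side never runs.
log_or = []
result_or = loud("left", log_or) or loud("right", log_or)
print(result_or, log_or)      # left ['left']

# With a falsy left side (such as the None returned by print),
# `or` evaluates and returns the right side.
log_none = []
result_none = loud(None, log_none) or loud("processed", log_none)
print(result_none, log_none)  # processed [None, 'processed']
```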

How 3 works

As in 2: report(i) has no return statement, so it evaluates to None, and or adds process(token) to the list.
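If the and/or trick feels too clever, the same reporting can be moved into a small generator that wraps the iterable, much like tqdm does, so the comprehension itself stays plain. This is only a sketch: with_progress and its every parameter are hypothetical names, not something from the answer above or from any library.

```python
def with_progress(iterable, every=1000):
    """Hypothetical wrapper: yield items unchanged, printing every `every`-th index."""
    for index, item in enumerate(iterable):
        if index % every == 0:
            print(index)
        yield item

def process(token):
    return token['text']

l1 = [{'text': k} for k in range(5000)]
# The comprehension no longer needs to know about indices or reporting.
l2 = [process(token) for token in with_progress(l1)]
```

With the default every=1000 this prints the same 0, 1000, 2000, 3000, 4000 as the three methods above.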


Just do:

from time import sleep
from tqdm import tqdm

def foo(i):
    sleep(0.01)
    return i

[foo(i) for i in tqdm(range(1000))]

For Jupyter notebook:

from tqdm.notebook import tqdm

doc_collection = [[1, 2],
                  [3, 4],
                  [5, 6]]

result = [print(progress) or
          [str(token) for token in document]
          for progress, document in enumerate(doc_collection)]

print(result)  # [['1', '2'], ['3', '4'], ['5', '6']]

I don't consider this good or readable code, but the idea is fun.

It works because print always returns None, so print(progress) or x always evaluates to x (by the definition of or).
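One detail worth checking is that this holds even when x itself is falsy, because or returns its last operand when every operand is falsy. A small sketch, reusing the example above with an empty document added as an assumption for illustration:

```python
# The middle document is empty, so its converted list is falsy.
doc_collection = [[1, 2],
                  [],
                  [5, 6]]

result = [print(progress) or
          [str(token) for token in document]
          for progress, document in enumerate(doc_collection)]

# None or [] evaluates to [], so the empty document is preserved.
print(result)  # [['1', '2'], [], ['5', '6']]
```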