Progress indicator during pandas operations

Due to popular demand, I've added pandas support in tqdm (pip install "tqdm>=4.9.0"). Unlike the other answers, this will not noticeably slow pandas down -- here's an example for DataFrameGroupBy.progress_apply:

import pandas as pd
import numpy as np
from tqdm import tqdm
# from tqdm.auto import tqdm  # for notebooks

# Create new `pandas` methods which use `tqdm` progress
# (can use tqdm_gui, optional kwargs, etc.)
tqdm.pandas()

df = pd.DataFrame(np.random.randint(0, int(1e8), (10000, 1000)))
# Now you can use `progress_apply` instead of `apply`
df.groupby(0).progress_apply(lambda x: x**2)

In case you're interested in how this works (and how to modify it for your own callbacks), see the examples on GitHub, the full documentation on PyPI, or import the module and run help(tqdm). Other supported functions include map, applymap, aggregate, and transform.

EDIT


To directly answer the original question, replace:

df_users.groupby(['userID', 'requestDate']).apply(feature_rollup)

with:

from tqdm import tqdm
tqdm.pandas()
df_users.groupby(['userID', 'requestDate']).progress_apply(feature_rollup)

Note: tqdm <= v4.8: For versions of tqdm below 4.8, instead of tqdm.pandas() you had to do:

from tqdm import tqdm, tqdm_pandas
tqdm_pandas(tqdm())

In case you need support for how to use this in a Jupyter/ipython notebook, as I did, here's a helpful guide and source to relevant article:

from tqdm._tqdm_notebook import tqdm_notebook
import pandas as pd
tqdm_notebook.pandas()
df = pd.DataFrame(np.random.randint(0, int(1e8), (10000, 1000)))
df.groupby(0).progress_apply(lambda x: x**2)

Note the underscore in the import statement for _tqdm_notebook. As referenced article mentions, development is in late beta stage.

UPDATE as of 11/12/2021

I'm currently now using pandas==1.3.4 and tqdm==4.62.3, and I'm not sure which version tqdm authors implemented this change, but the above import statement is deprecated. Instead use:

 from tqdm.notebook import tqdm_notebook

UPDATE as of 02/01/2022 It's now possible to simplify import statements for .py an .ipynb files alike:

from tqdm.auto import tqdm
tqdm.pandas()

That should work as expected for both types of development environments, and should work on pandas dataframes or other tqdm-worthy iterables.

UPDATE as of 05/27/2022 If you're using a jupyter notebook on SageMaker, this combo works:

from tqdm import tqdm
from tqdm.gui import tqdm as tqdm_gui
tqdm.pandas(ncols=50)