Pandas - replace all NaN values in DataFrame with empty python dict objects

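I was able to use DataFrame.applymap in this way (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map, which takes the same function):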

>>> from pandas import isnull
>>> frame = frame.applymap(lambda x: {} if isnull(x) else x)
>>> frame
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}         {}

This solution avoids the pitfalls of both EdChum's solution, where all NaN cells wind up pointing at the same underlying dict object in memory and so cannot be updated independently of one another, and Shashank's, where a potentially large data structure of nested dicts has to be constructed just to specify a single empty dict value.
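
The difference between a fresh dict per cell and one shared dict is easy to check. The following is only a rough sketch, not EdChum's actual code: the extra column S, the helper apply_elementwise, and the name shared_dict are made up for this illustration, and the helper simply picks DataFrame.map when the installed pandas has it and falls back to applymap otherwise.

```python
import pandas as pd

# Hypothetical frame with two missing cells (R/Y and S/Y); column S is
# added for this illustration only.
frame = pd.DataFrame({
    "Q": {"X": {2: 2010}, "Y": {2: 2011, 3: 2009}},
    "R": {"X": {1: 2013}},
    "S": {"X": {5: 2015}},
})


def apply_elementwise(df, func):
    # Hypothetical helper: DataFrame.map supersedes applymap from pandas 2.1 on.
    method = df.map if hasattr(pd.DataFrame, "map") else df.applymap
    return method(func)


# Fresh dict per NaN cell: the expression {} is evaluated once per element.
independent = apply_elementwise(frame, lambda x: {} if pd.isnull(x) else x)
independent.loc["Y", "R"][1] = 2014
print(independent.loc["Y", "S"])  # the other filled cell is left untouched

# Contrast: a single pre-built dict handed to every NaN cell.
shared_dict = {}
shared = apply_elementwise(frame, lambda x: shared_dict if pd.isnull(x) else x)
shared.loc["Y", "R"][1] = 2014
print(shared.loc["Y", "S"])  # same object, so the update appears here as well
```

In the first frame the two filled cells hold distinct dict objects, so updating one leaves the other empty; in the second they are the same object, which is the pitfall described above.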

DataFrame.where achieves this quite directly:

>>> from pandas import DataFrame
>>> data = {'Q': {'X': {2: 2010}, 'Y': {2: 2011, 3: 2009}}, 'R': {'X': {1: 2013}}}
>>> frame = DataFrame(data)
>>> frame
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}        NaN

>>> frame.where(frame.notna(), lambda x: [{}])
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}         {}

Also, it appears to be a bit faster:

>>> %timeit frame.where(frame.notna(), lambda x: [{}])
791 µs ± 16.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit frame.applymap(lambda x: {} if isnull(x) else x)
1.07 ms ± 7.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

(On larger datasets I've observed speedups of ~10x.)
