How to split a pandas time series by NaN values

You can use numpy.split and then filter the resulting list. Here is one example assuming that the column with the values is labeled "value":

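import numpy as np
import pandas as pd

# Split at every row where "value" is NaN (each NaN row starts a new piece)
events = np.split(df, np.where(np.isnan(df.value))[0])
# Remove the NaN rows, skipping any piece that came back as a bare ndarray
events = [ev[~np.isnan(ev.value)] for ev in events if not isinstance(ev, np.ndarray)]
# Remove empty DataFrames (left behind by leading or consecutive NaNs)
events = [ev for ev in events if not ev.empty]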

You will end up with a list of DataFrames, one for each run of rows between the NaN values.
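Depending on your NumPy and pandas versions, calling np.split on a DataFrame may emit deprecation warnings. A minimal pandas-only sketch of the same idea, assuming a DataFrame with a "value" column (the helper name split_on_nan is hypothetical): label each row with the number of NaNs seen so far, drop the NaN rows, and group by that label.

```python
import numpy as np
import pandas as pd


def split_on_nan(df: pd.DataFrame, column: str = "value") -> list:
    """Return the runs of non-NaN rows in `df`, split wherever `column` is NaN."""
    is_nan = df[column].isna()
    # Every NaN row increments the counter, so rows in the same run share a label
    run_id = is_nan.cumsum()
    # Drop the NaN rows themselves, then group the remaining rows by run label
    return [group for _, group in df[~is_nan].groupby(run_id[~is_nan])]


df = pd.DataFrame({"value": [1.0, 2.0, np.nan, np.nan, 3.0, np.nan, 4.0, 5.0]})
events = split_on_nan(df)
print([ev["value"].tolist() for ev in events])
# → [[1.0, 2.0], [3.0], [4.0, 5.0]]
```

Because groupby sorts by the run label, the pieces come back in their original order, and each one keeps its original index.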


Note: this answer is for pandas < 0.25.0. If you're using 0.25.0 or greater, see this answer by thesofakillers.


I found an efficient solution for very large, sparse datasets. In my case, that meant hundreds of thousands of rows with only a dozen or so brief segments of data between NaN values. I (ab)used the internals of pandas.SparseIndex, which pandas uses to store sparse data compactly in memory.

Given some data:

import pandas as pd
import numpy as np

# 10 days at per-second resolution, starting at midnight Jan 1st, 2011
rng = pd.date_range('1/1/2011', periods=10 * 24 * 60 * 60, freq='S')
dense_ts = pd.Series(np.nan, index=rng, dtype=np.float64)

# Three blocks of non-null data throughout timeseries
dense_ts[500:510] = np.random.randn(10)
dense_ts[12000:12015] = np.random.randn(15)
dense_ts[20000:20050] = np.random.randn(50)

Which looks like:

2011-01-01 00:00:00   NaN
2011-01-01 00:00:01   NaN
2011-01-01 00:00:02   NaN
2011-01-01 00:00:03   NaN
                       ..
2011-01-10 23:59:56   NaN
2011-01-10 23:59:57   NaN
2011-01-10 23:59:58   NaN
2011-01-10 23:59:59   NaN
Freq: S, Length: 864000, dtype: float64

We can find the blocks efficiently and easily:

# Convert to sparse then query index to find block locations
sparse_ts = dense_ts.to_sparse()
block_locs = zip(sparse_ts.sp_index.blocs, sparse_ts.sp_index.blengths)

# Map the sparse blocks back to the dense timeseries
# (iloc's end bound is exclusive, so start + length covers the whole block)
blocks = [dense_ts.iloc[start:start + length] for (start, length) in block_locs]

Voila:

[2011-01-01 00:08:20    0.531793
 2011-01-01 00:08:21    0.484391
 2011-01-01 00:08:22    0.022686
 2011-01-01 00:08:23   -0.206495
 2011-01-01 00:08:24    1.472209
 2011-01-01 00:08:25   -1.261940
 2011-01-01 00:08:26   -0.696388
 2011-01-01 00:08:27   -0.219316
 2011-01-01 00:08:28   -0.474840
 Freq: S, dtype: float64, 2011-01-01 03:20:00   -0.147190
 2011-01-01 03:20:01    0.299565
 2011-01-01 03:20:02   -0.846878
 2011-01-01 03:20:03   -0.100975
 2011-01-01 03:20:04    1.288872
 2011-01-01 03:20:05   -0.092474
 2011-01-01 03:20:06   -0.214774
 2011-01-01 03:20:07   -0.540479
 2011-01-01 03:20:08   -0.661083
 2011-01-01 03:20:09    1.129878
 2011-01-01 03:20:10    0.791373
 2011-01-01 03:20:11    0.119564
 2011-01-01 03:20:12    0.345459
 2011-01-01 03:20:13   -0.272132
 Freq: S, dtype: float64, 2011-01-01 05:33:20    1.028268
 2011-01-01 05:33:21    1.476468
 2011-01-01 05:33:22    1.308881
 2011-01-01 05:33:23    1.458202
 2011-01-01 05:33:24   -0.874308
                              ..
 2011-01-01 05:34:02    0.941446
 2011-01-01 05:34:03   -0.996767
 2011-01-01 05:34:04    1.266660
 2011-01-01 05:34:05   -0.391560
 2011-01-01 05:34:06    1.498499
 2011-01-01 05:34:07   -0.636908
 2011-01-01 05:34:08    0.621681
 Freq: S, dtype: float64]
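Series.to_sparse was deprecated in pandas 0.25 and later removed, so the block lookup above will not run on current versions. The blocs/blengths pair is simply the start position and length of each run of non-NaN values, which plain NumPy can compute by detecting the edges of those runs. A minimal sketch (the helper nan_block_locs is hypothetical, and this edge-detection approach is a substitute for the SparseIndex trick, not the same mechanism):

```python
import numpy as np
import pandas as pd


def nan_block_locs(values: np.ndarray) -> list:
    """Return (start, length) for each run of non-NaN values in a 1-D array."""
    valid = ~np.isnan(values)
    # Pad with False on both sides so every run has a rising and a falling edge
    edges = np.diff(np.concatenate(([False], valid, [False])).astype(np.int8))
    starts = np.flatnonzero(edges == 1)   # first index of each run
    stops = np.flatnonzero(edges == -1)   # one past the last index of each run
    return list(zip(starts.tolist(), (stops - starts).tolist()))


rng = pd.date_range("2011-01-01", periods=30, freq="min")
ts = pd.Series(np.nan, index=rng)
ts.iloc[3:6] = 1.0
ts.iloc[10:12] = 2.0

# Same mapping step as above: slice the dense series at each (start, length)
blocks = [ts.iloc[start:start + length] for start, length in nan_block_locs(ts.to_numpy())]
print(nan_block_locs(ts.to_numpy()))
# → [(3, 3), (10, 2)]
```

Like the sparse version, this touches each value only once, so it should stay fast on long, mostly-NaN series.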