Max in a sliding window in NumPy array

Pandas has a rolling method for both Series and DataFrames, and that could be of use here:

import pandas as pd

lst = [6,4,8,7,1,4,3,5,7,8,4,6,2,1,3,5,6,3,4,7,1,9,4,3,2]
lst1 = pd.Series(lst).rolling(5).max().dropna().tolist()

# [8.0, 8.0, 8.0, 7.0, 7.0, 8.0, 8.0, 8.0, 8.0, 8.0, 6.0, 6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 9.0, 9.0, 9.0, 9.0]

For consistency, you can coerce each element of lst1 to int:

[int(x) for x in lst1]

# [8, 8, 8, 7, 7, 8, 8, 8, 8, 8, 6, 6, 6, 6, 6, 7, 7, 9, 9, 9, 9]
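A small variation (not from the original answer) folds the conversion into the pandas chain with astype:

import pandas as pd

lst = [6,4,8,7,1,4,3,5,7,8,4,6,2,1,3,5,6,3,4,7,1,9,4,3,2]
# Cast to int before converting to a plain list
pd.Series(lst).rolling(5).max().dropna().astype(int).tolist()

# [8, 8, 8, 7, 7, 8, 8, 8, 8, 8, 6, 6, 6, 6, 6, 7, 7, 9, 9, 9, 9]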

Approach #1 : You could use the 1D maximum filter from SciPy -

from scipy.ndimage import maximum_filter1d

def max_filter1d_valid(a, W):
    hW = (W-1)//2 # Half window size
    # Trim hW values from each edge so only full windows remain (assumes odd W)
    return maximum_filter1d(a,size=W)[hW:-hW]
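As a quick sanity check (not part of the original answer), running it on the sample list from above should reproduce the pandas result; note that the slicing assumes an odd window size W:

import numpy as np

lst = [6,4,8,7,1,4,3,5,7,8,4,6,2,1,3,5,6,3,4,7,1,9,4,3,2]
out = max_filter1d_valid(np.array(lst), W=5)
# Expected to match the pandas result above
assert out.tolist() == [8, 8, 8, 7, 7, 8, 8, 8, 8, 8, 6, 6, 6, 6, 6, 7, 7, 9, 9, 9, 9]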

Approach #2 : Here's another approach with strides: use strided_app to create a 2D shifted version as a view into the array pretty efficiently, which then lets us apply any custom reduction operation along the second axis (one possible definition of strided_app is sketched after the function below) -

def max_filter1d_valid_strided(a, W):
    return strided_app(a, W, S=1).max(axis=1)
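strided_app is not defined in this snippet (it comes from a related answer); a common implementation based on np.lib.stride_tricks.as_strided looks roughly like this:

import numpy as np

def strided_app(a, L, S):
    # 2D view of a: rows of length L, each shifted by S elements from the previous one
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))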

Runtime test -

In [55]: a = np.random.randint(0,10,(10000))

# @Abdou's solution using pandas rolling
In [56]: %timeit pd.Series(a).rolling(5).max().dropna().tolist()
1000 loops, best of 3: 999 µs per loop

In [57]: %timeit max_filter1d_valid(a, W=5)
    ...: %timeit max_filter1d_valid_strided(a, W=5)
    ...: 
10000 loops, best of 3: 90.5 µs per loop
10000 loops, best of 3: 87.9 µs per loop
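The %timeit lines above are IPython magics; if you want to reproduce the comparison as a plain script, a rough equivalent (numbers will of course differ by machine and library version) could be:

import timeit

import numpy as np
import pandas as pd

a = np.random.randint(0, 10, 10000)

# Total time for 1000 calls of each variant, in seconds
print(timeit.timeit(lambda: pd.Series(a).rolling(5).max().dropna().tolist(), number=1000))
print(timeit.timeit(lambda: max_filter1d_valid(a, W=5), number=1000))
print(timeit.timeit(lambda: max_filter1d_valid_strided(a, W=5), number=1000))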

Starting in NumPy 1.20, sliding_window_view provides a way to slide/roll through windows of elements, and you can then take the max of each window:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

values = np.array([6,4,8,7,1,4,3,5,7,2,4,6,2,1,3,5,6,3,4,7,1,9,4,3,2])
np.max(sliding_window_view(values, window_shape = 5), axis = 1)
# array([8, 8, 8, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 7, 7, 9, 9, 9, 9])

where:

  • window_shape is the size of the sliding window
  • np.max(array, axis = 1) finds the max for each sub-array

and the intermediate result of the sliding is:

sliding_window_view(values, window_shape = 5)
# array([[6, 4, 8, 7, 1],
#        [4, 8, 7, 1, 4],
#        [8, 7, 1, 4, 3],
#        ...
#        [7, 1, 9, 4, 3],
#        [1, 9, 4, 3, 2]])
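Wrapped up as a small helper (the function name is just for illustration), the same call might look like:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_max_view(a, window):
    # sliding_window_view returns a read-only view of shape (len(a)-window+1, window),
    # so no data is copied before taking the max along the last axis
    return sliding_window_view(np.asarray(a), window_shape=window).max(axis=-1)

rolling_max_view(values, 5)
# array([8, 8, 8, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 7, 7, 9, 9, 9, 9])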

I have tried several variants now and would declare the Pandas version the winner of this performance race. Among the variants I tried was a binary tree (implemented in pure Python) for quickly computing maxes of arbitrary subranges (source available on demand). The best algorithm I came up with myself was a plain rolling window using a ringbuffer; its max only needed to be recomputed completely if the current max value was dropped from the window in that iteration; otherwise it would remain or increase to the next new value. Compared with the old libraries, this pure-Python implementation was faster than the rest.

In the end I found that the versions of the libraries in question were highly relevant. The rather old versions I was mainly still using were way slower than the modern versions. Here are the numbers for 1M numbers, rollingMax'ed with a window of size 100k:

         old (slow HW)           new (better HW)
scipy:   0.9.0:  21.2987391949   0.13.3:  11.5804400444
pandas:  0.7.0:  13.5896410942   0.18.1:   0.0551438331604
numpy:   1.6.1:   1.17417216301  1.8.2:    0.537392139435
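Since the library versions matter so much here, it may be worth checking which ones are actually installed before comparing numbers:

import numpy, scipy, pandas
print(numpy.__version__, scipy.__version__, pandas.__version__)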

Here is the implementation of the pure numpy version using a ringbuffer:

def rollingMax(a, window):
  def eachValue():
    # Ring buffer holding the current window; m tracks its running maximum
    w = a[:window].copy()
    m = w.max()
    yield m
    i = 0
    j = window
    while j < len(a):
      # Overwrite the oldest value in the buffer with the next input value
      oldValue = w[i]
      newValue = w[i] = a[j]
      if newValue > m:
        m = newValue
      elif oldValue == m:
        # The previous maximum just left the window, so recompute it
        m = w.max()
      yield m
      i = (i + 1) % window
      j += 1
  return np.array(list(eachValue()))
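A quick check (not from the original answer) that it agrees with the other approaches on the sample list used earlier:

import numpy as np

lst = [6,4,8,7,1,4,3,5,7,8,4,6,2,1,3,5,6,3,4,7,1,9,4,3,2]
out = rollingMax(np.array(lst), window=5)
# Expected to match the pandas/SciPy results above
assert out.tolist() == [8, 8, 8, 7, 7, 8, 8, 8, 8, 8, 6, 6, 6, 6, 6, 7, 7, 9, 9, 9, 9]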

For my input this works great because I'm handling audio data with lots of peaks in all directions. If you put a constantly decreasing signal into it (e.g. -np.arange(10000000)), then you will experience the worst case (and maybe you should reverse the input and the output in such cases, as sketched below).
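Because the maximum of a window does not depend on the order of its elements, reversing both input and output yields the same window maxima while turning a decreasing signal into an increasing one, which is the cheap case for this algorithm. A minimal sketch of that workaround (variable names as above):

import numpy as np

a = -np.arange(10000000)  # worst case: strictly decreasing signal
window = 100000

# Reversed input is increasing, so the running max rarely needs a full recompute;
# reversing the output restores the original window order
result = rollingMax(a[::-1], window)[::-1]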

I just include this in case someone wants to do this task on a machine with old libraries.