Shift elements in a numpy array

NumPy itself does not, but SciPy provides exactly the shift functionality you want:

import numpy as np
from scipy.ndimage import shift

xs = np.array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

shift(xs, 3, cval=np.nan)

By default, shift fills the vacated positions with a constant value given by cval, set here to nan. This gives the desired output:

array([ nan, nan, nan, 0., 1., 2., 3., 4., 5., 6.])

A negative shift works similarly:

shift(xs, -3, cval=np.nan)

which gives

array([  3.,   4.,   5.,   6.,   7.,   8.,   9.,  nan,  nan,  nan])

Benchmarks & introducing Numba

1. Summary

  • The accepted answer (scipy.ndimage.interpolation.shift) is the slowest solution listed on this page.
  • Numba (@numba.njit) gives some performance boost when the array size is smaller than ~25,000.
  • Any method is about equally good when the array size is large (> 250,000).
  • The fastest option really depends on
        (1)  the length of your arrays
        (2)  the amount of shift you need.
  • Below is a picture of the timings of all the different methods listed on this page (2020-07-11), using a constant shift of 10. As one can see, with small array sizes some methods take more than 2000% longer than the best method.

Relative timings, constant shift (10), all methods
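
For readers who want to reproduce this kind of comparison, here is a minimal sketch of how relative timings for a constant shift could be measured with timeit. The helper names (shift_concat, shift_slices, relative_timings) and the chosen sizes are assumptions made for illustration; they are not the code that produced the plots, and the two functions only handle positive shifts to keep the sketch short.

```python
import timeit
import numpy as np

def shift_concat(arr, num, fill_value=np.nan):
    # np.concatenate + np.full variant (positive shifts only, for brevity)
    return np.concatenate((np.full(num, fill_value), arr[:-num]))

def shift_slices(arr, num, fill_value=np.nan):
    # preallocate the result, then fill it with two slice assignments
    result = np.empty_like(arr)
    result[:num] = fill_value
    result[num:] = arr[:-num]
    return result

def relative_timings(funcs, sizes, num=10, repeat=5, number=200):
    """Return {size: {function name: time / fastest time}} for a constant shift."""
    table = {}
    for n in sizes:
        arr = np.arange(n, dtype=float)
        best = {}
        for f in funcs:
            # take the minimum over several repeats to reduce timing noise
            best[f.__name__] = min(
                timeit.repeat(lambda: f(arr, num), repeat=repeat, number=number)
            )
        fastest = min(best.values())
        table[n] = {name: t / fastest for name, t in best.items()}
    return table

timings = relative_timings([shift_concat, shift_slices], sizes=[100, 10_000])
for n, row in timings.items():
    print(n, {name: round(ratio, 2) for name, ratio in row.items()})
```

A ratio of 1.0 marks the fastest method for that array size; the actual numbers will vary with hardware and library versions.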

2. Detailed benchmarks with the best options

  • Choose shift4_numba (defined below) if you want a good all-rounder.

Relative timings, best methods (Benchmarks)

3. Code

3.1 shift4_numba

  • Good all-rounder; at most 20% slower than the best method at any array size.
  • Best method with medium array sizes: ~500 < N < 20,000.
  • Caveat: Numba's JIT (just-in-time) compiler gives a performance boost only if you call the decorated function more than once. The first call usually takes 3-4 times longer than subsequent calls. You can get an even bigger boost with ahead-of-time compiled numba.
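import numba
import numpy as np

@numba.njit
def shift4_numba(arr, num, fill_value=np.nan):
    # Note: num == 0 returns an empty array, because arr[:-0] is arr[:0]
    if num >= 0:
        return np.concatenate((np.full(num, fill_value), arr[:-num]))
    else:
        return np.concatenate((arr[-num:], np.full(-num, fill_value)))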

3.2. shift5_numba

  • Best option with small array sizes (N <= 300 to 1500). The threshold depends on the amount of shift needed.
  • Good performance at any array size; at most 50% slower than the fastest solution.
  • Caveat: Numba's JIT (just-in-time) compiler gives a performance boost only if you call the decorated function more than once. The first call usually takes 3-4 times longer than subsequent calls. You can get an even bigger boost with ahead-of-time compiled numba.
import numba
import numpy as np

@numba.njit
def shift5_numba(arr, num, fill_value=np.nan):
    result = np.empty_like(arr)
    if num > 0:
        result[:num] = fill_value
        result[num:] = arr[:-num]
    elif num < 0:
        result[num:] = fill_value
        result[:num] = arr[-num:]
    else:
        # num == 0: nothing to shift, just copy
        result[:] = arr
    return result
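
The JIT caveat above can be observed directly by timing the first and the second call of a decorated function. The following is only a sketch: the name shift_jit and the timed_call helper are made up for this illustration, and the try/except fallback is an assumption added so the snippet still runs where Numba is not installed (in which case both calls take roughly the same time).

```python
import time
import numpy as np

try:
    import numba
    jit = numba.njit
except ImportError:
    # Fallback (assumption for this sketch): run as plain Python without Numba
    def jit(func):
        return func

@jit
def shift_jit(arr, num, fill_value=np.nan):
    # same slice-assignment logic as shift5_numba
    result = np.empty_like(arr)
    if num > 0:
        result[:num] = fill_value
        result[num:] = arr[:-num]
    elif num < 0:
        result[num:] = fill_value
        result[:num] = arr[-num:]
    else:
        result[:] = arr
    return result

arr = np.arange(1000, dtype=np.float64)

def timed_call():
    start = time.perf_counter()
    out = shift_jit(arr, 10)
    return out, time.perf_counter() - start

first_out, first_time = timed_call()    # includes compilation when Numba is present
second_out, second_time = timed_call()  # reuses the already compiled function
print(f"first call:  {first_time:.6f} s")
print(f"second call: {second_time:.6f} s")
```

If the function is only ever called once, the compilation cost in the first call can outweigh any gain, which is why the benchmarks assume repeated calls.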

3.3. shift5

  • Best method with array sizes ~ 20.000 < N < 250.000
  • Same as shift5_numba, just remove the @numba.njit decorator.

4. Appendix

4.1 Details about used methods

  • shift_scipy: scipy.ndimage.interpolation.shift (scipy 1.4.1) - The option from accepted answer, which is clearly the slowest alternative.
  • shift1: np.roll and out[:num] = np.nan by IronManMark20 & gzc
  • shift2: np.roll and np.put by IronManMark20
  • shift3: np.pad and slice by gzc
  • shift4: np.concatenate and np.full by chrisaycock
  • shift5: two result[slice] = x assignments by chrisaycock
  • shift#_numba: @numba.njit decorated versions of the previous.

shift2 and shift3 use functions that the numba version at the time (0.50.1) did not support.

4.2 Other test results

4.2.1 Relative timings, all methods

  • Relative timings, 10% shift, all methods
  • Relative timings, constant shift (10), all methods

4.2.2 Raw timings, all methods

  • Raw timings, constant shift (10), all methods
  • Raw timings, 10% shift, all methods

4.2.3 Raw timings, few best methods

  • Raw timings with small arrays, constant shift (10), few best methods
  • Raw timings with small arrays, 10% shift, few best methods
  • Raw timings with large arrays, constant shift (10), few best methods
  • Raw timings with large arrays, 10% shift, few best methods

For those who just want to copy and paste the fastest implementation of shift, here is a benchmark and a conclusion (see the end). In addition, I introduce a fill_value parameter and fix some bugs.


import numpy as np
import timeit

# enhanced from IronManMark20 version
def shift1(arr, num, fill_value=np.nan):
    arr = np.roll(arr,num)
    if num < 0:
        arr[num:] = fill_value
    elif num > 0:
        arr[:num] = fill_value
    return arr

# use np.roll and np.put by IronManMark20
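def shift2(arr, num):
    arr = np.roll(arr, num)
    if num < 0:
        # overwrite the wrapped-around tail with nan
        np.put(arr, range(len(arr) + num, len(arr)), np.nan)
    elif num > 0:
        # overwrite the wrapped-around head with nan
        np.put(arr, range(num), np.nan)
    return arr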

# use np.pad and slice by me.
def shift3(arr, num, fill_value=np.nan):
    if num < 0:
        # pad on the right, then drop the first -num elements
        arr = np.pad(arr, (0, abs(num)), mode='constant', constant_values=(fill_value,))[-num:]
    elif num > 0:
        # pad on the left, then drop the last num elements
        arr = np.pad(arr, (num, 0), mode='constant', constant_values=(fill_value,))[:-num]

    return arr

# use np.concatenate and np.full by chrisaycock
def shift4(arr, num, fill_value=np.nan):
    if num >= 0:
        return np.concatenate((np.full(num, fill_value), arr[:-num]))
    else:
        return np.concatenate((arr[-num:], np.full(-num, fill_value)))

# preallocate empty array and assign slice by chrisaycock
def shift5(arr, num, fill_value=np.nan):
    result = np.empty_like(arr)
    if num > 0:
        result[:num] = fill_value
        result[num:] = arr[:-num]
    elif num < 0:
        result[num:] = fill_value
        result[:num] = arr[-num:]
    else:
        # num == 0: nothing to shift, just copy
        result[:] = arr
    return result

arr = np.arange(2000).astype(float)

def benchmark_shift1():
    shift1(arr, 3)

def benchmark_shift2():
    shift2(arr, 3)

def benchmark_shift3():
    shift3(arr, 3)

def benchmark_shift4():
    shift4(arr, 3)

def benchmark_shift5():
    shift5(arr, 3)

benchmark_set = ['benchmark_shift1', 'benchmark_shift2', 'benchmark_shift3', 'benchmark_shift4', 'benchmark_shift5']

for x in benchmark_set:
    number = 10000
    t = timeit.timeit('%s()' % x, 'from __main__ import %s' % x, number=number)
    print('%s time: %f' % (x, t))

benchmark result:

benchmark_shift1 time: 0.265238
benchmark_shift2 time: 0.285175
benchmark_shift3 time: 0.473890
benchmark_shift4 time: 0.099049
benchmark_shift5 time: 0.052836


shift5 is the winner! It is the OP's third solution.
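
Before copying the fastest variant, it may be worth checking that it agrees with a simpler one on the edge cases. Below is a small sketch that compares an np.roll based shift with the slice-assignment approach for negative, zero and positive shifts. The names shift_roll and shift_slices are placeholders chosen for this example, and the integer case at the end is an added illustration of why fill_value is useful.

```python
import numpy as np

def shift_roll(arr, num, fill_value=np.nan):
    # np.roll returns a copy; the wrapped-around part is then overwritten
    out = np.roll(arr, num)
    if num < 0:
        out[num:] = fill_value
    elif num > 0:
        out[:num] = fill_value
    return out

def shift_slices(arr, num, fill_value=np.nan):
    # preallocate the result and fill it with two slice assignments
    out = np.empty_like(arr)
    if num > 0:
        out[:num] = fill_value
        out[num:] = arr[:-num]
    elif num < 0:
        out[num:] = fill_value
        out[:num] = arr[-num:]
    else:
        out[:] = arr
    return out

xs = np.arange(10, dtype=float)
for num in (-3, 0, 3):
    # assert_array_equal treats NaNs in matching positions as equal
    np.testing.assert_array_equal(shift_roll(xs, num), shift_slices(xs, num))

print(shift_slices(xs, 3))
print(shift_slices(xs, -3))

# Integer arrays cannot hold NaN, so pass an integer fill_value instead
ints = np.arange(5)
print(shift_slices(ints, 2, fill_value=0))  # [0 0 0 1 2]
```

With the default fill_value=np.nan the input has to be a float array; for integer input, assigning nan raises an error, so an explicit fill value is needed.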


