Why does numpy not short-circuit on non-contiguous arrays?

I got interested in solving this problem. So I`ve come with the next solution that manages to avoid the "a[::-1]" problem case due to internal ndarray copies by np.argmax:

I created a small library that implements a function argmax which is a wrapper of np.argmax, but it has increased performance when the input argument is a 1D boolean array with stride value set to -1:

https://github.com/Vykstorm/numpy-bool-argmax-ext

For those cases, it uses a low-level C routine to find the index k of an item with maximum value (True), starting from the end to the beginning of the array a.
Then you can compute argmax(a[::-1]) with len(a)-k-1

The low-level method doesn't perform any internal ndarray copies because it operates with the array a which is already C-contiguous and aligned in memory. It also applies short-circuit


EDIT: I extended the library to improve the performance argmax also when dealing with stride values different than -1 (with 1D boolean arrays) with good results: a[::2], a[::-3], e.t.c.

Give it a try.


The problem is related with the memory alignment of the array when using strides. Either a[1:-1], a[::-1] are considered aligned in memory but a[::2] dont:

a = np.random.randint(0,2,1000000,bool)

print(a[1:-1].flags.c_contiguous) # True
print(a[::-1].flags.c_contiguous) # False
print(a[::2].flags.c_contiguous) # False

This explains why np.argmax is slow on a[::2] (from documentation on ndarrays):

Several algorithms in NumPy work on arbitrarily strided arrays. However, some algorithms require single-segment arrays. When an irregularly strided array is passed in to such algorithms, a copy is automatically made.

np.argmax(a[::2]) is making a copy of the array. So if you do timeit(lambda: np.argmax(a[::2]), number=5000) you are timing 5000 copies of the array a

Execute this and compare the results of this two timing calls:

print(timeit(lambda: np.argmax(a[::2]), number=5000))

b = a[::2].copy()
print(timeit(lambda: np.argmax(b), number=5000))

EDIT: Digging into the source code in C of numpy, i found the underline implementation of argmax function, PyArray_ArgMax which calls at some point to PyArray_ContiguousFromAny to ensure that the given input array is aligned in memory (C-style)

Then, if the dtype of the array is bool, it delegates to BOOL_argmax function. Looking at its code, seems that short-ciruit is always applied.

Summary

  • In order to avoid copies by np.argmax, make sure that the input array is contiguous in memory
  • short-circuit is always applied when data type is boolean.