Fast way to find length and start index of repeated elements in array

Here is a pedestrian attempt that solves the problem by coding it up directly.

We prepend and append a zero to A, giving a vector ZA, then locate the boundaries between the alternating islands of 1s and 0s in ZA by comparing the shifted versions ZA[1:] and ZA[:-1]. (From the resulting arrays we keep the even positions, which correspond to the runs of ones in A.)

import numpy as np

def structure(A):
    ZA = np.concatenate(([0], A, [0]))
    indices = np.flatnonzero( ZA[1:] != ZA[:-1] )
    counts = indices[1:] - indices[:-1]
    return indices[::2], counts[::2]

Some sample runs:

In [71]: structure(np.array( [0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0] ))
Out[71]: (array([ 2,  6, 10]), array([3, 2, 1]))

In [72]: structure(np.array( [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1] ))
Out[72]: (array([ 0,  5,  9, 13, 15]), array([3, 3, 2, 1, 1]))

In [73]: structure(np.array( [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0] ))
Out[73]: (array([0, 5, 9]), array([3, 3, 2]))

In [74]: structure(np.array( [1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1] ))
Out[74]: (array([ 0,  2,  5,  7, 11, 14]), array([1, 2, 1, 3, 2, 3]))
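To see why taking every second entry works, it may help to print the intermediate arrays for the first sample input. This is only a trace of the same steps as structure above, written out at top level; nothing new is assumed beyond the sample array.

```python
import numpy as np

A = np.array([0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0])

# Pad with zeros so that runs touching either end of A are still bounded.
ZA = np.concatenate(([0], A, [0]))

# Positions where neighbouring entries differ: run starts and run ends, alternating.
indices = np.flatnonzero(ZA[1:] != ZA[:-1])
print(indices)   # [ 2  5  6  8 10 11]

# Gaps between consecutive boundaries: lengths of 1-runs and 0-gaps, alternating.
counts = indices[1:] - indices[:-1]
print(counts)    # [3 1 2 2 1]

# Even positions are the starts and lengths of the runs of ones.
print(indices[::2], counts[::2])   # [ 2  6 10] [3 2 1]
```

Because of the padding, the first boundary is always the start of a run of ones, so the even positions stay aligned with the ones even when A begins or ends with 1.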

Let's try unique:

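A = np.array([0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0])
_, idx, counts = np.unique(np.cumsum(1-A)*A, return_index=True, return_counts=True)

# your expected output (entry 0 is the group of zeros, so drop it; assumes A starts with 0):
idx[1:], counts[1:]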

Output:

(array([ 2,  6, 10]), array([3, 2, 1]))
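One caveat with the np.unique approach: np.unique also reports the group of zeros (label 0), and a run of ones that starts at index 0 gets label 0 as well, so it merges with the zeros. A minimal sketch that sidesteps both cases, assuming A is a 1-D array of 0s and 1s (the helper name runs_via_unique is made up for illustration), shifts the labels by one and filters on the label value:

```python
import numpy as np

def runs_via_unique(A):
    # Hypothetical helper; A is assumed to be a 1-D array of 0s and 1s.
    A = np.asarray(A)
    # Label each position by (number of zeros seen so far) + 1, then mask
    # with A: every run of ones gets a distinct positive label, zeros get 0.
    labels = (np.cumsum(1 - A) + 1) * A
    vals, idx, counts = np.unique(labels, return_index=True, return_counts=True)
    keep = vals > 0   # discard the group made of zeros, if there is one
    return idx[keep], counts[keep]

print(runs_via_unique([0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0]))
print(runs_via_unique([1, 1, 0, 1]))
```

The first call should give starts 2, 6, 10 with lengths 3, 2, 1; the second gives starts 0, 3 with lengths 2, 1, which the unshifted labels would miss.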

The indices of the 1s carry all the information you need: it is enough to find where each run of 1s starts and where it ends.

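A = np.concatenate(([0], A, [0]))  # pad with zeros so runs touching either end are handled
# neighbouring entries sum to 1 exactly at a 0/1 boundary
diff = np.argwhere((A[:-1] + A[1:]) == 1).ravel()
starts = diff[::2]   # boundaries alternate: start, end, start, end, ...
ends = diff[1::2]    # each end is the index just past the run

print(starts, ends - starts)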
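A closely related variant, shown here only as a sketch, uses np.diff on the padded array instead of the sum of neighbours: a step of +1 marks a start and a step of -1 marks an end, so the two kinds of boundary do not have to be separated by their position in an alternating list. The function name runs_of_ones is hypothetical, and the input is assumed to contain only 0s and 1s; it is cast to int so the subtraction is signed.

```python
import numpy as np

def runs_of_ones(A):
    # Hypothetical helper; A is assumed to be a 1-D sequence of 0s and 1s.
    padded = np.concatenate(([0], np.asarray(A, dtype=int), [0]))
    step = np.diff(padded)               # +1 where a run starts, -1 just past its end
    starts = np.flatnonzero(step == 1)
    ends = np.flatnonzero(step == -1)
    return starts, ends - starts

starts, lengths = runs_of_ones([0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0])
print(starts, lengths)   # [ 2  6 10] [3 2 1]
```

Unlike the snippet above, this version leaves the caller's A untouched rather than overwriting it with the padded copy.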

Tags:

Python

Numpy