NumPy: finding intervals which have at least k points

After a bit of struggle I came up with this solution.

First, a bit of explanation and my order of thoughts:

  • Ideally, we would set a window size and slide the window from the leftmost acceptable point to the rightmost acceptable point, start counting when at least min_points points are in the window, and stop counting when fewer than min_points points remain inside it (think of it as something like a convolution operator).
  • The basic pitfall is that the sliding has to be discretized. The trick is to check only the positions where the number of points in the window can drop below or rise to min_points, which means at every point and at every point minus window_size (this is what optional_starts reflects).
  • Then iterate over optional_starts and record, for each interval, the first start at which the condition is met and the last start at which it is still met.

The following code implements the steps described above:

def consist_at_least(start, points, min_points, window_size):
    # True if the closed window [start, start + window_size] holds at least min_points points
    a = [point for point in points if start <= point <= start + window_size]
    return len(a) >= min_points


points = [1.4,1.8,   11.3,11.8,12.3,13.2,  18.2,18.3,18.4,18.5]
min_points = 4
window_size = 3
total_interval = [0,20]
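# Candidate starts: every point, every point minus window_size,
# plus a few positions derived from the bounds of total_interval
optional_starts = (
    points
    + [item - window_size for item in points if item - window_size >= total_interval[0]]
    + [total_interval[0] + window_size]
    + [total_interval[1] - window_size]
    + [total_interval[0]]
)
# Keep only starts whose window still fits inside total_interval
optional_starts = [item for item in optional_starts if item <= total_interval[1] - window_size]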
intervals = []
potential_ends = []
for start in sorted(optional_starts):
    is_start_interval = len(intervals)%2 == 0
    if consist_at_least(start, points, min_points, window_size):
        if is_start_interval:
            intervals.append(start)
        else:
            potential_ends.append(start)
    elif len(potential_ends) > 0:
        intervals.append(potential_ends[-1])
        potential_ends = []
if len(potential_ends)>0:
    intervals.append(potential_ends[-1])

print(intervals)

Output:

[10.2, 11.3, 15.5, 17]

Each pair of consecutive elements (10.2 and 11.3, then 15.5 and 17) marks the start and end of one interval.
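
If tuples are easier to work with, the flat list can be regrouped into (start, end) pairs. This is only a minimal sketch, assuming the list always has an even length as in the output above; the name start_ranges is just illustrative.

```python
# Flat list as printed by the code above
intervals = [10.2, 11.3, 15.5, 17]

# Pair even-indexed entries (starts) with odd-indexed entries (ends)
start_ranges = list(zip(intervals[::2], intervals[1::2]))
print(start_ranges)  # [(10.2, 11.3), (15.5, 17)]
```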


So, after additional information was given regarding the nature of the "intervals", I propose the following solution, which assumes that the distance between neighbouring intervals is at least window_size:

import numpy as np


def get_start_windows(inter, ws, p, mp):

    # Initialize list of suitable start ranges
    start_ranges = []

    # Determine possible intervals w.r.t. the window size
    # (split wherever the gap between neighbouring points exceeds ws)
    int_start = np.insert(np.array([0, p.shape[0]]), 1,
                          (np.argwhere(np.diff(p) > ws) + 1).squeeze()).tolist()

    # Iterate found intervals
    for i in range(len(int_start) - 1):

        # The actual interval
        int_ = p[int_start[i]:int_start[i+1]]

        # If interval has less than minimum points, reject
        if int_.shape[0] < mp:
            continue

        # Determine first and last possible starting point
        first = max(inter[0], int_[mp-1] - ws)
        last = min(int_[-mp], inter[1] - ws)

        # Add to list of suitable start ranges
        start_ranges.append((first, last))

    return start_ranges


# Example 1
interval = [0, 20]
window_size = 3.0
min_points = 4
points = [1.4, 1.8, 11.3, 11.8, 12.3, 13.2, 18.2, 18.3, 18.4, 18.5]
print(get_start_windows(interval, window_size, np.array(points), min_points))

# Example 2
points = [1.4, 1.8, 1.9, 2.1, 11.3, 11.8, 12.3, 13.2, 18.2, 18.3, 18.4, 18.5]
print(get_start_windows(interval, window_size, np.array(points), min_points))

# Example 3
points = [1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 3.49]
print(get_start_windows(interval, window_size, np.array(points), min_points))

(The code could probably be optimized; I didn't pay attention to that...)

Output:

[(10.2, 11.3), (15.5, 17.0)]
[(0, 1.4), (10.2, 11.3), (15.5, 17.0)]
[(0, 1.9)]

Hopefully, the desired cases are covered by that solution.
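
A quick sanity check of the reported ranges could look like the sketch below. It is not part of the solution above: count_in_window is a hypothetical helper that counts the points in a closed window with np.searchsorted (so it assumes the points are sorted), and the start ranges are simply copied from the output of Example 1.

```python
import numpy as np


def count_in_window(points, start, window_size):
    """Count points p with start <= p <= start + window_size (points must be sorted)."""
    left = np.searchsorted(points, start, side="left")
    right = np.searchsorted(points, start + window_size, side="right")
    return int(right - left)


points = np.array([1.4, 1.8, 11.3, 11.8, 12.3, 13.2, 18.2, 18.3, 18.4, 18.5])
window_size = 3.0
min_points = 4

# Start ranges copied from the output of Example 1
start_ranges = [(10.2, 11.3), (15.5, 17.0)]

for first, last in start_ranges:
    # Sample a few window starts inside each range, endpoints included
    for start in np.linspace(first, last, 5):
        assert count_in_window(points, start, window_size) >= min_points
```

A sampled check like this does not prove that every start in a range is valid; it only catches obvious mistakes at the endpoints and a few positions in between.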

-------------------------------------
System information
-------------------------------------
Platform:   Windows-10-10.0.16299-SP0
Python:     3.8.5
NumPy:      1.19.2
-------------------------------------