Finding multiple entries with binary search

Well, as the list is sorted, all the entries you are interested in are contiguous. This means you need to find the first item equal to the found item, looking backwards from the index which was produced by the binary search. And the same about last item.

You can simply go backwards from the found index, but this way the solution may be as slow as O(n) if there are a lot of items equal to the found one. So you should better use exponential search: double your jumps as you find more equal items. This way your whole search is still O(log n).


I would do two binary searches, one looking for the first element comparing >= the value (in C++ terms, lower_bound) and then one searching for the first element comparing > the value (in C++ terms, upper_bound). The elements from lower_bound to just before upper bound are what you are looking for (in terms of java.util.SortedSet, subset(key, key)).

So you need two different slight modifications to the standard binary search: you still probe and use the comparison at the probe to narrow down the area in which the value you are looking for must lie, but now e.g. for lower_bound if you hit equality, all you know is that the element you are looking for (the first equal value) is somewhere between the first element of the range so far and the value you have just probed - you can't return immediately.


If I'm following your question, you have a list of objects which, for the purpose of comparison, look like {1,2,2,3,4,5,5,5,6,7,8,8,9}. A normal search for 5 will hit one of objects that compare as 5, but you want to get them all, is that right?

In that case, I'd suggest a standard binary search which, upon landing on a matching element, starts looking left until it stops matching, and then right (from the first match) again until it stops matching.

Be careful that whatever data structure you are using is not overwriting elements that compare to the same!

Alternatively, consider using a structure that stores elements that compare to the same as a bucket in that position.


First let's recall the naive binary search code snippet:

int bin_search(int arr[], int key, int low, int high)
{
    if (low > high)
        return -1;

    int mid = low + ((high - low) >> 1);

    if (arr[mid] == key) return mid;
    if (arr[mid] > key)
        return bin_search(arr, key, low, mid - 1);
    else
        return bin_search(arr, key, mid + 1, high);
}

Quoted from Prof.Skiena: Suppose we delete the equality test if (s[middle] == key) return(middle); from the implementation above and return the index low instead of −1 on each unsuccessful search. All searches will now be unsuccessful, since there is no equality test. The search will proceed to the right half whenever the key is compared to an identical array element, eventually terminating at the right boundary. Repeating the search after reversing the direction of the binary comparison will lead us to the left boundary. Each search takes O(lgn) time, so we can count the occurrences in logarithmic time regardless of the size of the block.

So, we need two rounds of binary_search to find the lower_bound (find the first number no less than the KEY) and the upper_bound (find the first number bigger than the KEY).

int lower_bound(int arr[], int key, int low, int high)
{
    if (low > high)
        //return -1;
        return low;

    int mid = low + ((high - low) >> 1);
    //if (arr[mid] == key) return mid;

    //Attention here, we go left for lower_bound when meeting equal values
    if (arr[mid] >= key) 
        return lower_bound(arr, key, low, mid - 1);
    else
        return lower_bound(arr, key, mid + 1, high);
}

int upper_bound(int arr[], int key, int low, int high)
{
    if (low > high)
        //return -1;
        return low;

    int mid = low + ((high - low) >> 1);
    //if (arr[mid] == key) return mid;

    //Attention here, we go right for upper_bound when meeting equal values
    if (arr[mid] > key) 
        return upper_bound(arr, key, low, mid - 1);
    else
        return upper_bound(arr, key, mid + 1, high);
}

Hope it's helpful :)