Find the number of elements greater than x in a given range

If we know all queries before hand, we can solve this problem by making use of Fenwick tree.

First, we need to sort all elements in array and queries together, based on their values.

So, assuming that we have array [5, 4, 2, 1, 3] and queries (0, 1, 6) and (2, 5, 2), we will have following result after sorting : [1, 2, 2, 3 , 4 , 5, 6]

Now, we will need to process each element in descending order:

  • If we encounter an element which is from the array, we will update its index in the Fenwick tree, which take O(log n)

  • If we encounter a queries, we need to check, in this range of the query, how many elements have been added in the tree, which take O(log n).

For above example, the process will be:

  1st element is a query for value 6, as Fenwick tree is empty -> result is 0
  2nd is element 5 -> add index 0 into Fenwick tree
  3rd element is 4 -> add index 1 into tree.
  4th element is 3 -> add index 4 into tree.
  5th element is 2 -> add index 2 into tree.
  6th element is query for range (2, 5), we query the tree and get answer 2.
  7th element is 1 -> add index 3 into tree. 
  Finish.

So, in total, the time complexity for our solution is O((m + n) log(m + n)) with m and n is the number of queries and number of element from input array respectively.


Previous answer describes an offline solution using Fenwick tree, but this problem could be solved online (and even when doing updates to the array) with slightly worse complexity. I'll describe such a solution using segment tree and AVL tree (any self-balancing BST could do the trick).

First lets see how to solve this problem using segment tree. We'll do this by keeping the actual elements of the array in every node by range that it covers. So for array A = [9, 4, 5, 6, 1, 3, 2, 8] we'll have:

[9   4   5   6   1   3   2   8] Node 1
[9   4   5   6] [1   3   2   8] Node 2-3
[9   4] [5   6] [1   3] [2   8] Node 4-7
[9] [4] [5] [6] [1] [3] [2] [8] Node 8-15

Since height of our segment tree is log(n) and at every level we keep n elements, total amount of memory used is n log(n).

Next step is to sort these arrays which looks like this:

[1   2   3   4   5   6   8   9] Node 1
[4   5   6   9] [1   2   3   8] Node 2-3
[4   9] [5   6] [1   3] [2   8] Node 4-7
[9] [4] [5] [6] [1] [3] [2] [8] Node 8-15

NOTE: You first need to build the tree and then sort it to keep the order of elements in original array.

Now we can start our range queries and that works basically the same way as in regular segment tree, except when we find a completely overlapping interval, we then additionally check for number of elements greater than X. This can be done with binary search in log(n) time by finding the index of first element greater than X and subtracting it from number of elements in that interval.

Let's say our query was (0, 5, 4), so we do a segment search on interval [0, 5] and end up with arrays: [4, 5, 6, 9], [1, 3]. We then do a binary search on these arrays to see number of elements greater than 4 and get 3 (from first array) and 0 (from second) which brings to total of 3 - our query answer.

Interval search in segment trees can have up to log(n) paths, which means log(n) arrays and since we're doing binary search on each of them, brings complexity to log^2(n) per query.

Now if we wanted to update the array, since we are using segment trees its impossible to add/remove elements efficiently, but we can replace them. Using AVL trees (or other binary trees that allow replacement and lookup in log(n) time) as nodes and storing the arrays, we can manage this operation in same time complexity (replacement with log(n) time).


That is possible only if you got the array sorted. In that case binary search the smallest value passing your condition and compute the count simply by sub-dividing your index range by its found position to two intervals. Then just compute the length of the interval passing your condition.

If array is not sorted and you need to preserve its order you can use index sort . When put together:

  1. definitions

    Let <i0,i1> be your used index range and x be your value.

  2. index sort array part <i0,i1>

    so create array of size m=i1-i0+1 and index sort it. This task is O(m.log(m)) where m<=n.

  3. binary search x position in index array

    This task is O(log(m)) and you want the index j = <0,m) for which array[index[j]]<=x is the smallest value <=x

  4. compute count

    Simply count how many indexes are after j up to m

     count = m-j;
    

As you can see if array is sorted you got O(log(m)) complexity but if it is not then you need to sort O(m.log(m)) which is worse than naive approach O(m) which should be used only if the array is changing often and cant be sorted directly.

[Edit1] What I mean by Index sort

By index sort I mean this: Let have array a

a[] = { 4,6,2,9,6,3,5,1 }

The index sort means that you create new array ix of indexes in sorted order so for example ascending index sort means:

a[ix[i]]<=a[ix[i+1]]

In our example index bubble sort is is like this:

// init indexes
a[ix[i]]= { 4,6,2,9,6,3,5,1 }
ix[]    = { 0,1,2,3,4,5,6,7 }
// bubble sort 1st iteration
a[ix[i]]= { 4,2,6,6,3,5,1,9 }
ix[]    = { 0,2,1,4,5,6,7,3 }
// bubble sort 2nd iteration
a[ix[i]]= { 2,4,6,3,5,1,6,9 }
ix[]    = { 2,0,1,5,6,7,4,3 }
// bubble sort 3th iteration
a[ix[i]]= { 2,4,3,5,1,6,6,9 }
ix[]    = { 2,0,5,6,7,1,4,3 }
// bubble sort 4th iteration
a[ix[i]]= { 2,3,4,1,5,6,6,9 }
ix[]    = { 2,5,0,7,6,1,4,3 }
// bubble sort 5th iteration
a[ix[i]]= { 2,3,1,4,5,6,6,9 }
ix[]    = { 2,5,7,0,6,1,4,3 }
// bubble sort 6th iteration
a[ix[i]]= { 2,1,3,4,5,6,6,9 }
ix[]    = { 2,7,5,0,6,1,4,3 }
// bubble sort 7th iteration
a[ix[i]]= { 1,2,3,4,5,6,6,9 }
ix[]    = { 7,2,5,0,6,1,4,3 }

So the result of ascending index sort is this:

//      ix: 0 1 2 3 4 5 6 7
a[]     = { 4,6,2,9,6,3,5,1 }
ix[]    = { 7,2,5,0,6,1,4,3 }

Original array stays unchanged only the index array is changed. Items a[ix[i]] where i=0,1,2,3... are sorted ascending.

So now if x=4 on this interval you need to find (bin search) which i has the smallest but still a[ix[i]]>=x so:

//      ix: 0 1 2 3 4 5 6 7
a[]     = { 4,6,2,9,6,3,5,1 }
ix[]    = { 7,2,5,0,6,1,4,3 }
a[ix[i]]= { 1,2,3,4,5,6,6,9 }
//                *
i = 3; m=8; count = m-i = 8-3 = 5;

So the answer is 5 items are >=4

[Edit2] Just to be sure you know what binary search means for this

i=0;  // init value marked by `*`
j=4;  // max power of 2 < m , i+j is marked by `^`
//      ix: 0 1 2 3 4 5 6 7    i j i+j a[ix[i+j]]
a[ix[i]]= { 1,2,3,4,5,6,6,9 }  0 4  4  5>=4          j>>=1;
            *       ^            
a[ix[i]]= { 1,2,3,4,5,6,6,9 }  0 2  2  3< 4 -> i+=j; j>>=1;
            *   ^                  
a[ix[i]]= { 1,2,3,4,5,6,6,9 }  2 1  3  4>=4          j>>=1;
                * ^                  
a[ix[i]]= { 1,2,3,4,5,6,6,9 }  2 0 -> stop 
                *
a[ix[i]] < x -> a[ix[i+1]] >= x -> i = 2+1 = 3 in O(log(m))

so you need index i and binary bit mask j (powers of 2). At first set i with zero and j with the biggest power of 2 still smaller then n (or in this case m). Fro example something like this:

i=0; for (j=1;j<=m;j<<=1;); j>>=1;

Now in each iteration test if a[ix[i+j]] suffice search condition or not. If yes then update i+=j else leave it as is. After that go to next bit so j>>=1 and if j==0 stop else do iteration again. at the end you found value is a[ix[i]] and index is i in log2(m) iterations which is also the number of bits needed to represent m-1.

In the example above I use condition a[ix[i]]<4 so the found value was biggest number still <4 in the array. as we needed to also include 4 then I just increment the index once at the end (I could use <=4instead but was too lazy to rewrite the whole thing again).

The count of such items is then just number of element in array (or interval) minus the i.

Tags:

Algorithm