Count inversions in two arrays

You can count inversions between two arrays using the same idea as merge sort!

Suppose you have two arrays of the same size, let's call them A and B. If we denote the first half of A by A1 and the second half by A2, and define B1 and B2 the same way for B, then the answer is the sum of:

  1. inversions between A1 and B1
  2. inversions between A2 and B2
  3. inversions between A1 and B2

The first two parts can be computed by calling the function recursively, but how do we calculate the third one?

The idea is to go through A1 and B2 from left to right (both halves are already sorted by the recursive calls). Whenever the current element of B2 is smaller than the current element of A1, every element of A1 that has not been visited yet is greater than it as well, so we add the number of unvisited elements of A1 to the inversion count. At the end we merge A1 and A2 back into A, and B1 and B2 back into B, so that both arrays are sorted for the parent call.
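
To be explicit about the definition used here: both solutions in this post count pairs (i, j) with i < j and A[i] > B[j]. A brute-force reference (my own helper, only meant for checking small inputs) could look like this:

def count_inversions_naive(A, B):
    # O(n^2) reference: pairs (i, j) with i < j and A[i] > B[j]
    n = len(A)
    return sum(1 for i in range(n) for j in range(i + 1, n) if A[i] > B[j])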

Here is the code in Python:

def find_inv(A, B):
    # Counts pairs (i, j) with i < j and A[i] > B[j].
    # Side effect: A and B end up sorted, just like in merge sort.
    if len(A) <= 1:
        return 0
    mid = len(A) // 2
    A1 = A[:mid]
    A2 = A[mid:]
    B1 = B[:mid]
    B2 = B[mid:]

    # Parts 1 and 2: recurse on the first halves and on the second halves.
    # The recursive calls also leave A1, B1, A2 and B2 sorted.
    ans = find_inv(A1, B1) + find_inv(A2, B2)

    # Part 3: inversions between A1 and B2. Every index in A1 is smaller
    # than every index in B2, so we only have to compare values.
    len_A = len(A1)
    index_A = 0
    len_B = len(B2)
    index_B = 0

    for k in range(len_A + len_B):
        if A1[index_A] <= B2[index_B]:
            index_A += 1
            if index_A == len_A:
                break
        else:
            # B2[index_B] is smaller than every element of A1 not visited yet.
            index_B += 1
            ans += (len_A - index_A)
            if index_B == len_B:
                break

    # Merge the sorted halves back so A and B are sorted for the caller.
    merge(A1, A2, A)
    merge(B1, B2, B)
    return ans

def merge(A1, A2, dest):
    # Standard merge of two sorted lists A1 and A2 into dest, in place.
    i = 0
    j = 0
    while i < len(A1) and j < len(A2):
        if A1[i] < A2[j]:
            dest[i + j] = A1[i]
            i += 1
        else:
            dest[i + j] = A2[j]
            j += 1
    while i < len(A1):
        dest[i + j] = A1[i]
        i += 1
    while j < len(A2):
        dest[i + j] = A2[j]
        j += 1
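
A quick sanity check (an example I made up; note that find_inv sorts A and B in place, so pass copies if you still need the original order). The result matches the brute-force reference above:

A = [4, 1, 3, 2]
B = [2, 3, 1, 4]
print(find_inv(A[:], B[:]))  # 2, from the pairs (0, 1) and (0, 2): 4 > 3 and 4 > 1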

I've written in the past about how to count inversions using a Fenwick tree (binary indexed tree), a compact array-based structure that lets you compute prefix aggregations on a sequence very efficiently.

Here is an ad-hoc modification for your scenario:

#include <algorithm>
#include <vector>
using namespace std;

// Counts pairs (i, j) with i < j and a[i] > b[j].
long long inversions(const vector<int>& a, const vector<int>& b) {
  int n = a.size();
  // coordinate compression: map every value of a and b to a rank in 1..2n
  vector<int> values(a);
  for (int x : b) values.push_back(x);
  sort(begin(values), end(values));
  vector<int> counts(2*n + 1);  // Fenwick tree over ranks 1..2n
  long long res = 0;
  for (int i = n - 1; i >= 0; --i) {
    // query: number of b[j] with j > i and b[j] < a[i],
    // i.e. the sum of the prefix 1..rank(a[i]) - 1
    for (int v = lower_bound(begin(values), end(values), a[i]) - begin(values);
         v;
         v -= v & -v)
      res += counts[v];
    // update: add 1 at position rank(b[i])
    for (int v = lower_bound(begin(values), end(values), b[i]) - begin(values) + 1;
         v <= 2*n;
         v += v & -v)
      counts[v]++;
  }
  return res;
}

Basically we walk through the arrays from right to left, maintaining a data structure that represents the values of b we have already seen in the suffix. For every element a[i], we add to the final result the number of elements x in the data structure with x <= a[i] - 1, i.e. the number of positions j > i with b[j] < a[i]. Then we add b[i] to the data structure.

The array values is used to compress the range of values to 1..2n, because Fenwick trees take space linear in the range size. We could avoid that step by choosing a more full-featured data structure, like a balanced binary search tree with subtree size augmentation.

The complexity is O(n log n), and the constant factor is very low.
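
For what it's worth, here is a rough Python sketch of the same right-to-left sweep that skips the compression step. It uses SortedList from the third-party sortedcontainers package, which is not a balanced BST internally but offers the order-statistic operations we need (insertion and count-of-smaller in roughly logarithmic time):

from sortedcontainers import SortedList

def inversions(a, b):
    # counts pairs (i, j) with i < j and a[i] > b[j]
    seen_b = SortedList()                # values b[j] for the suffix j > i
    res = 0
    for i in range(len(a) - 1, -1, -1):
        res += seen_b.bisect_left(a[i])  # number of b[j] < a[i] with j > i
        seen_b.add(b[i])
    return res

For example, inversions([4, 1, 3, 2], [2, 3, 1, 4]) again gives 2.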