Find duplicate element in array in time O(n)

This can be done in O(n) time and O(1) space.

(The algorithm only works because the numbers are consecutive integers in a known range, 0 to N-1, with exactly one value replaced by a duplicate of another):

In a single pass through the vector, compute the sum of all the numbers, and the sum of the squares of all the numbers.

Subtract the sum of all the numbers from N(N-1)/2. Call this A.

Subtract the sum of the squares from N(N-1)(2N-1)/6. Divide this by A. Call the result B.

The number which was removed is (B + A)/2 and the number it was replaced with is (B - A)/2.
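The steps above can be sketched in C++ roughly as follows. This is only an illustration under the stated assumptions: the struct and function names are made up for the example, and the early return for A = 0 is an added guard against dividing by zero when nothing was replaced.

```cpp
#include <cassert>
#include <vector>

// Illustrative names; none of them come from the answer itself.
struct RemovedAndDuplicate {
    long long removed;    // value that is missing from 0..N-1
    long long duplicate;  // value that appears twice
};

// Assumes v originally held 0..N-1 and exactly one value was overwritten
// with a different value from that range. Returns false if the sums show
// no change (A == 0), in which case `out` is left untouched.
bool find_removed_and_duplicate(const std::vector<int>& v, RemovedAndDuplicate& out) {
    const long long n = static_cast<long long>(v.size());
    long long sum = 0, sum_sq = 0;
    for (int x : v) {                     // single pass over the vector
        assert(x >= 0 && x < n);          // stated assumption: values in 0..N-1
        sum += x;
        sum_sq += static_cast<long long>(x) * x;
    }
    const long long a = n * (n - 1) / 2 - sum;  // A = removed - duplicate
    if (a == 0) return false;                   // added guard: nothing to divide by
    // B = removed + duplicate
    const long long b = (n * (n - 1) * (2 * n - 1) / 6 - sum_sq) / a;
    out.removed = (b + a) / 2;
    out.duplicate = (b - a) / 2;
    return true;
}
```

For the vector [0, 1, 1, 2, 3, 5] this sketch reports removed = 4 and duplicate = 1. The sums are kept in long long so that the squares do not overflow int for moderately large N.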

Example:

The vector is [0, 1, 1, 2, 3, 5]:

  • N = 6

  • Sum of the vector is 0 + 1 + 1 + 2 + 3 + 5 = 12. N(N-1)/2 is 15. A = 3.

  • Sum of the squares is 0 + 1 + 1 + 4 + 9 + 25 = 40. N(N-1)(2N-1)/6 is 55. B = (55 - 40)/A = 5.

  • The number which was removed is (5 + 3) / 2 = 4.

  • The number it was replaced by is (5 - 3) / 2 = 1.

Why it works:

  • The sum of the original vector [0, ..., N-1] is N(N-1)/2. Suppose the value a was removed and replaced by b. Now the sum of the modified vector will be N(N-1)/2 + b - a. If we subtract the sum of the modified vector from N(N-1)/2 we get a - b. So A = a - b.

  • Similarly, the sum of the squares of the original vector is N(N-1)(2N-1)/6. The sum of the squares of the modified vector is N(N-1)(2N-1)/6 + b^2 - a^2. Subtracting the sum of the squares of the modified vector from the original sum gives a^2 - b^2, which is the same as (a+b)(a-b). So if we divide it by a - b (i.e., A), we get B = a + b.

  • Now B + A = a + b + a - b = 2a and B - A = a + b - (a - b) = 2b.


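We have the original array int A[N]. Create a second array bool B[N] with every entry initialized to false. Iterate over the first array: if B[A[i]] is false, set it to true; otherwise A[i] is the duplicate. This takes O(N) time and O(N) extra space, and assumes every value is a valid index into B.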
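A minimal C++ sketch of this flag-array idea, assuming every value lies in 0..N-1. The function name and the -1 "no duplicate" return value are choices made for the example, not part of the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative helper; the name is not from the answer.
// Assumes every value lies in [0, N), where N is the array length.
// Returns the first value seen twice, or -1 if all values are distinct.
int first_duplicate_with_flags(const std::vector<int>& a) {
    std::vector<bool> seen(a.size(), false);  // plays the role of B[N], all false
    for (int x : a) {
        assert(x >= 0 && static_cast<std::size_t>(x) < a.size());
        if (seen[x]) return x;                // already flagged: x is the duplicate
        seen[x] = true;                       // first sighting: flag it
    }
    return -1;
}
```

Unlike the sum-based approach, this does not need the values to be consecutive, only small enough to index the flag array.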


You can do it in O(N) time without any extra space, provided you are allowed to modify the array. Here is how the algorithm works:

Iterate through the array in the following manner:

  1. For each element encountered, negate the value stored at the index equal to that element. E.g., if you find a[0] = 2, go to a[2] and negate the value there.

    By doing this you flag the value as encountered. Since you know the array cannot contain negative numbers to begin with, you also know that any negative entry is one you negated. An element you read may itself have been negated already, so use its absolute value as the index. Because -0 = 0, the trick also requires that no entry is 0.

  2. Before negating, check whether the entry at that index is already negative; if it is, the element is the duplicate. E.g., if a[0] = 2, go to a[2] and check whether it is negative.

Let's say you have the following array:

int a[]  = {2,1,2,3,4};

After the first element, your array will be:

int a[] = {2,1,-2,3,4};

After the second element, your array will be:

int a[] = {2,-1,-2,3,4};

When you reach the third element (now -2, so take its absolute value, 2), you go to a[2] and see it's already negative. You have found the duplicate: 2.
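The walk-through above might look like this in C++. It is a sketch under the assumption that every value is strictly positive and a valid index (1..N-1); the function name and the -1 fallback are illustrative. Note that it leaves the flagged entries negated in the caller's array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Illustrative helper; the name is not from the answer.
// Assumes every value lies in [1, N-1]. Mutates `a`: flagged entries stay negative.
// Returns the first duplicate found, or -1 if the loop ends without finding one.
int first_duplicate_by_negation(std::vector<int>& a) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int value = std::abs(a[i]);  // a[i] may already carry a flag (minus sign)
        assert(value >= 1 && static_cast<std::size_t>(value) < a.size());
        if (a[value] < 0) return value;    // flag already set: value was seen before
        a[value] = -a[value];              // set the flag for this value
    }
    return -1;
}
```

Called on {2,1,2,3,4}, this returns 2 and leaves the array as {2,-1,-2,3,4}, matching the states shown above. If the original contents matter afterwards, a second pass taking absolute values would restore them.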