Using XOR operator for finding duplicate elements in a array fails in many cases

From original question:

Suppose you have an array of 1001 integers. The integers are in random order, but you know each of the integers is between 1 and 1000 (inclusive). In addition, each number appears only once in the array, except for one number, which occurs twice.

It basically says, that algorithm only works when you have consecutive integers, starting with 1, ending with some N.

If you want to modify it to more general case, you have to do following things:

Find minimum and maximum in array. Then calculate expected output (xor all integers between minimum and maximum). Then calculate xor of all elements in array. Then xor this two things and you get an output.


Remember these two properties of XOR operator :

(1) If you take xor of a number with 0 ( zero ) , it would return the same number again.

Means , n ^ 0 = n

(2) If you take xor of a number with itself , it would return 0 ( zero ).

Means , n ^ n = 0

Now , Coming to the problem :

   Let    Input_arr = { 23 , 21 , 24 , 27 , 22 , 27 , 26 , 25 }    

   Output should be 27 ( because 27 is the duplicate element in the Input_arr ).

Solution :

Step 1 : Find “min” and “max” value in the given array. It will take O(n).

Step 2 : Find XOR of all integers from range “min” to “max” ( inclusive ).

Step 3 : Find XOR of all elements of the given array.

Step 4 : XOR of Step 2 and Step 3 will give the required duplicate number.

Description :

Step1 : min = 21 , max = 27

Step 2 : Step2_result = 21 ^ 22 ^ 23 ^ 24 ^ 25 ^ 26 ^ 27 = 20

Step 3 : Step3_result = 23 ^ 21 ^ 24 ^ 27 ^ 22 ^ 27 ^ 26 ^ 25 = 15

Step 4 : Final_Result = Step2_result ^ Step3_result = 20 ^ 15 = 27

But , How Final_Result calculated the duplicate number ?

Final_Result = ( 21 ^ 22 ^ 23 ^ 24 ^ 25 ^ 26 ^ 27 ) ^ ( 23 ^ 21 ^ 24 ^ 27 ^ 22 ^ 27 ^ 26 ^ 25 )

Now , Remember above two properties : n ^ n = 0 AND n ^ 0 = n

So , here ,

Final_Result = ( 21 ^ 21 ) ^ ( 22 ^ 22 ) ^ ( 23 ^ 23 ) ^ ( 24 ^ 24 ) ^ ( 25 ^ 25 ) ^ ( 26 ^ 26 ) ^ ( 27 ^ 27 ^ 27 )

             = 0 ^ 0 ^ 0 ^ 0 ^ 0 ^ 0 ^ ( 27 ^ 0 ) ( property applied )

             = 0 ^ 27 ( because we know 0 ^ 0 = 0 )

             = 27 ( Required Result )

A XOR statement has the property that 'a' XOR 'a' will always be 0, that is they cancel out, thus, if you know that your list has only one duplicate and that the range is say x to y, 601 to 607 in your case, it is feasible to keep the xor of all elements from x to y in a variable, and then xor this variable with all the elements you have in your array. Since there will be only one element which will be duplicated it will not be cancelled out due to xor operation and that will be your answer.

void main()
{
    int a[8]={601,602,603,604,605,605,606,607};
    int k,i,j=601;

    for(i=602;i<=607;i++)
    {
        j=j^i;
    }

    for(k=0;k<8;k++)
    {
        j=j^a[k];
    }

    printf("%d",j);
}

This code will give the output 605, as desired!