Distinguishing extra element from two arrays?

This can be solved quickly from just the sums and the sums of the squares of the two sequences. Calculating these sums is likely faster than the hash-based approaches suggested here, and it doesn't involve any comparisons between the sequence items.

Here's how to do it: if the two sequences are {aᵢ} and {bᵢ}, call A and B their sums, and A2 and B2 the sums of their squares, i.e. A2 = sum(aᵢ²). For convenience, let D = A − B and D2 = A2 − B2. Then D = a − b and D2 = a² − b², where a and b are the two elements that differ, and from this we see

a = (D2 + D*D) / (2*D)
b = a - D

This works out because, from algebra, a² − b² = (a+b)(a−b), i.e. D2 = (a+b)·D, so a + b = D2/D, and since we also know a − b = D, we can solve for a and b.

An example in Python may be more convincing

a, b = 5, 22                # the initial unmatched terms
x = list(range(15))
y = list(range(15))
y[a] = b
print("x =", x)  # x = [0, 1, 2, 3, 4,  5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print("y =", y)  # y = [0, 1, 2, 3, 4, 22, 6, 7, 8, 9, 10, 11, 12, 13, 14]

D = sum(x) - sum(y)
D2 = sum(i**2 for i in x) - sum(i**2 for i in y)   # sums of squares
a = (D2 + D*D) // (2*D)
b = a - D

print("a=%i, b=%i" % (a, b))
# prints a=5, b=22  which is correct

(Of course, this is somewhat similar to jk's answer, except it doesn't require the multiplication of all the terms and the huge numbers that would result, but thanks to jk for the idea of a mathematical approach.)


If you need this to scale, then I would use one of the many Set implementations in the world. For example, Java's HashSet.

Throw all of the first array in the Set. Then, for each member of the second array, if it is contained in the Set, remove it; otherwise mark it as Unique #2. After this procedure, the last remaining member of the Set is Unique #1.
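For instance, here is a minimal sketch of that procedure using Python's built-in set (a hash set, standing in for Java's HashSet); the function name find_uniques is my own:

def find_uniques(first, second):
    seen = set(first)            # every element of the first array
    unique2 = None
    for item in second:
        if item in seen:
            seen.remove(item)    # shared element, cancel it out
        else:
            unique2 = item       # only in the second array
    unique1 = seen.pop()         # the one element left is only in the first array
    return unique1, unique2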

I'd probably do it this way, even on an interview, and even for simple ten-element arrays. Life is too short to spend trying to find the clever way to scale a wall when there's a perfectly good door in it.


Technically, you can do it in constant time, since the arrays (and the values within them) are bounded. For the generalized problem, we have to come up with something trickier.

Here's a linear-time solution.

First, we build a hash table from one of the arrays. A lookup in a hash table takes O(1 + n/k) comparisons on average [1], where k is the number of buckets. So iterating over the first array (which has n elements) and inserting each value takes O(n + k).

Then we iterate over the other array, looking up each element in the hash. When an element is not found, that's the unique element of the second array (O(n + k) again). Then we iterate over the hash to find the second unique element (O(k)).

Total time is O(n + k). Since it doesn't make sense to make k larger than n, this is a linear-time solution.

Perl code for that:

sub unique
{
  my ($arr, $brr) = @_;
  my %hash = map { $_ => 1 } @$arr;        # +1 for every element of the first array
  $hash{$_}-- for @$brr;                   # -1 for every element of the second array
  return grep { $hash{$_} } keys %hash;    # keys with a non-zero count are the unique ones
}
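For comparison, a rough Python equivalent of the same counting idea (my own sketch, not part of the original answer):

def unique(arr, brr):
    counts = {}
    for x in arr:
        counts[x] = counts.get(x, 0) + 1   # +1 for the first array
    for x in brr:
        counts[x] = counts.get(x, 0) - 1   # -1 for the second array
    return [x for x, c in counts.items() if c != 0]   # the two non-cancelled values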

Here is a mathematical approach, inspired by Kevin's answer and its comments.

Let's call the arrays A and B and let their unique elements be a and b, respectively. First, take the sums of both arrays and subtract one from the other; since everything else cancels, sum(A) - sum(B) = a - b = s. Then, multiply the elements of both arrays and divide one by the other. Again, things cancel, so mult(A) / mult(B) = a / b = r. Now, from these, we get a = rb, so rb - b = s or b = s / (r - 1) and then a = rs / (r - 1).
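As a quick illustration (my own sketch, not part of the original answer), here is the derivation carried out numerically in Python, using exact rational arithmetic to sidestep rounding; it assumes neither array contains a zero, since the products appear in a quotient:

from fractions import Fraction
from math import prod            # Python 3.8+

A = [1, 2, 3, 4, 9]              # unique element a = 9
B = [1, 2, 3, 4, 6]              # unique element b = 6

s = sum(A) - sum(B)              # s = a - b = 3
r = Fraction(prod(A), prod(B))   # r = a / b = 3/2 (exact, no float rounding)
b = s / (r - 1)                  # b = 3 / (1/2) = 6
a = r * b                        # a = (3/2) * 6 = 9
print(a, b)                      # 9 6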

I'm calling this mathematical because multiplying things out might not be a reasonable thing to do in a real program. The key is to have two different operations that both individually allow the canceling behavior and so that one distributes over the other. This latter property is used when going from rb - b = s to b = s / (r - 1), and that won't work, say, with addition and XOR, which was my first attempt.