Java, find intersection of two arrays

The simplest solution is to use sets, as long as you don't mind that the elements in the result may come out in a different order and that duplicates are removed. The inputs array1 and array2 are Integer[] arrays holding the elements of the given int[] arrays that you intend to process:

Set<Integer> s1 = new HashSet<Integer>(Arrays.asList(array1));
Set<Integer> s2 = new HashSet<Integer>(Arrays.asList(array2));
s1.retainAll(s2);

Integer[] result = s1.toArray(new Integer[s1.size()]);

The above produces an Integer[]; if needed, it's simple to copy and convert its contents into an int[].
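For completeness, here is a minimal sketch of that conversion in both directions, assuming Java 8 or later. The class and method names (BoxedArrays, toIntArray, toIntegerArray) are made up for illustration and are not part of the answer above.

```java
import java.util.Arrays;

final class BoxedArrays {
    // Unboxes each element; assumes the array contains no null entries,
    // otherwise intValue() throws a NullPointerException.
    static int[] toIntArray(Integer[] boxed) {
        return Arrays.stream(boxed).mapToInt(Integer::intValue).toArray();
    }

    // Boxes each element so the result can be passed to Arrays.asList.
    static Integer[] toIntegerArray(int[] primitives) {
        return Arrays.stream(primitives).boxed().toArray(Integer[]::new);
    }
}
```

With these helpers, array1 could be built as BoxedArrays.toIntegerArray(someIntArray) and the final result turned back with BoxedArrays.toIntArray(result).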


Finding the intersection when the arrays contain duplicate elements (a common value is kept as many times as it occurs in both arrays):

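    // Needs java.util.Arrays and java.util.ArrayList.
    int[] arr1 = {1, 2, 2, 2, 2, 2, 2, 3, 6, 6, 6, 6, 6, 6};
    int[] arr2 = {7, 5, 3, 6, 6, 2, 2, 3, 6, 6, 6, 6, 6, 6, 6, 6};

    // Sort both arrays so they can be walked in step (this modifies them in place).
    Arrays.sort(arr1);
    Arrays.sort(arr2);
    ArrayList<Integer> result = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < arr1.length && j < arr2.length) {
        if (arr1[i] > arr2[j]) {
            j++;                  // arr2 holds the smaller value: advance it
        } else if (arr1[i] < arr2[j]) {
            i++;                  // arr1 holds the smaller value: advance it
        } else {
            result.add(arr1[i]);  // equal values: record one match, advance both
            i++;
            j++;
        }
    }
    System.out.println(result);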
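The snippet above sorts arr1 and arr2 in place. If the caller's arrays must stay untouched, the same two-pointer walk can be run on copies. This is only a sketch; SortedIntersection and intersectWithDuplicates are illustrative names, not from the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SortedIntersection {
    // Keeps duplicates: a value appears min(count in a, count in b) times.
    static int[] intersectWithDuplicates(int[] a, int[] b) {
        int[] x = a.clone();   // work on copies so the inputs are not reordered
        int[] y = b.clone();
        Arrays.sort(x);
        Arrays.sort(y);
        List<Integer> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < x.length && j < y.length) {
            if (x[i] < y[j]) {
                i++;
            } else if (x[i] > y[j]) {
                j++;
            } else {
                result.add(x[i]);
                i++;
                j++;
            }
        }
        return result.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

The cost is dominated by the two sorts, so it is roughly O(n log n + m log m) plus the memory for the copies.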

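If you are fine with Java 8, then the simplest solution I can think of uses streams and filter. Note that anyMatch rescans b for every distinct element of a, so the running time grows with a.length * b.length. An implementation is as follows: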

public static int[] intersection(int[] a, int[] b) {
    return Arrays.stream(a)
                 .distinct()
                 .filter(x -> Arrays.stream(b).anyMatch(y -> y == x))
                 .toArray();
}
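If the arrays are large, a variation is to put b into a HashSet once and filter against that, so each lookup takes expected constant time instead of a scan. A rough sketch, assuming Java 8 or later (the class name StreamIntersection is made up here):

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

final class StreamIntersection {
    static int[] intersection(int[] a, int[] b) {
        // Box b once into a set; contains() is then a hash lookup.
        Set<Integer> lookup = Arrays.stream(b).boxed().collect(Collectors.toSet());
        return Arrays.stream(a)
                     .distinct()               // drop duplicates, keep a's order
                     .filter(lookup::contains) // keep only values also present in b
                     .toArray();
    }
}
```

The result still follows the order of a and contains no duplicates, the same as the anyMatch version.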

General test

The answers provide several solutions, so I decided to find out which one is the most efficient.

Solutions

  • HashSet based by Óscar López
  • Stream based by Bilesh Ganguly
  • Foreach based by Ruchira Gayan Ranaweera
  • HashMap based by ikarayel

What we have

  • Two String arrays that share 50% of their elements.
  • Every element in each array is unique, so there are no duplicates.
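The post does not show how the arrays were built. One way to get inputs with these two properties, purely as an assumption, is to generate numbered strings and shift the second array by half its length (TestData and uniqueStrings are hypothetical names):

```java
final class TestData {
    // Returns n unique strings "s<offset>", ..., "s<offset + n - 1>".
    static String[] uniqueStrings(int n, int offset) {
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            out[i] = "s" + (offset + i);
        }
        return out;
    }
}
```

For an even n, uniqueStrings(n, 0) and uniqueStrings(n, n / 2) then have exactly n / 2 elements in common.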

Testing code

public static void startTest(String name, Runnable test){
    long start = System.nanoTime();
    test.run();
    long end = System.nanoTime();
    System.out.println(name + ": " + (end - start) / 1_000_000.0 + " ms");
}
Usage:
startTest("HashMap", () -> intersectHashMap(arr1, arr2));
startTest("HashSet", () -> intersectHashSet(arr1, arr2));
startTest("Foreach", () -> intersectForeach(arr1, arr2));
startTest("Stream ", () -> intersectStream(arr1, arr2));
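Keep in mind that a single timed run also includes one-off costs such as class loading and JIT compilation, which can weigh heavily on small inputs. A slightly more careful harness, sketched here with made-up names (Bench, averageMillis), runs a few untimed warm-up rounds and then averages several timed ones. For serious measurements a dedicated tool such as JMH is the usual choice.

```java
import java.util.function.Supplier;

final class Bench {
    // Runs the task `warmups` times untimed, then returns the average
    // duration in milliseconds over `runs` timed executions.
    static double averageMillis(Supplier<?> task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.get();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.get();
        }
        long elapsed = System.nanoTime() - start;
        return elapsed / 1_000_000.0 / runs;
    }
}
```

It could be called as Bench.averageMillis(() -> intersectHashMap(arr1, arr2), 20, 100). The warm-up and run counts here are arbitrary.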

Solutions code:

HashSet
public static String[] intersectHashSet(String[] arr1, String[] arr2){
    HashSet<String> set = new HashSet<>(Arrays.asList(arr1));
    set.retainAll(Arrays.asList(arr2));
    return set.toArray(new String[0]);
}
Stream
public static String[] intersectStream(String[] arr1, String[] arr2){
    return Arrays.stream(arr1)
            .distinct()
            .filter(x -> Arrays.asList(arr2).contains(x))
            .toArray(String[]::new);
}
Foreach
public static String[] intersectForeach(String[] arr1, String[] arr2){
    ArrayList<String> result = new ArrayList<>();
    for(int i = 0; i < arr1.length; i++){
        for(int r = 0; r < arr2.length; r++){
            if(arr1[i].equals(arr2[r]))
                result.add(arr1[i]);
        }
    }
    return result.toArray(new String[0]);
}
HashMap
public static String[] intersectHashMap(String[] arr1, String[] arr2){
    HashMap<String, Integer> map = new HashMap<>();
    for (int i = 0; i < arr1.length; i++)
        map.put(arr1[i], 1);

    ArrayList<String> result = new ArrayList<>();
    for(int i = 0; i < arr2.length; i++)
        if(map.containsKey(arr2[i]))
            result.add(arr2[i]);
    return result.toArray(new String[0]);
}

Testing process


Let's see what happens if we give the methods arrays of 20 elements:

HashMap: 0.105 ms
HashSet: 0.2185 ms
Foreach: 0.041 ms
Stream : 7.3629 ms

As we can see, the Foreach method does the best job, while the Stream method is almost 180 times slower.


Let's continue the test with 500 elements:

HashMap: 0.7147 ms
HashSet: 4.882 ms
Foreach: 7.8314 ms
Stream : 10.6681 ms

In this case, the results have changed dramatically. The HashMap method is now the most efficient.


Next, a test with 10 000 elements:

HashMap: 4.875 ms
HashSet: 316.2864 ms
Foreach: 505.6547 ms
Stream : 292.6572 ms

The HashMap method is still the fastest, and the Foreach method has become quite slow.
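A likely reason the HashSet variant scales so poorly here is the argument passed to retainAll. HashSet inherits retainAll from AbstractCollection, which calls contains on the argument for every element of the set, and on an Arrays.asList list each contains is a linear scan. Wrapping the second array in a HashSet as well avoids that. This is a sketch that was not part of the measurements above, and the name intersectTwoSets is made up:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class HashSetIntersection {
    static String[] intersectTwoSets(String[] arr1, String[] arr2) {
        Set<String> set = new HashSet<>(Arrays.asList(arr1));
        // The argument is a HashSet too, so each contains() is a hash lookup.
        set.retainAll(new HashSet<>(Arrays.asList(arr2)));
        return set.toArray(new String[0]);
    }
}
```

As with the original HashSet method, the order of the returned elements is unspecified.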


Results

If there are fewer than 50 elements, it is best to use the Foreach method. It is clearly the fastest in this category.

In this case, the ranking looks like this:

  1. Foreach
  2. HashMap
  3. HashSet
  4. Stream - Better not to use in this case

But if you need to process a large amount of data, the best option is to use the HashMap-based method.

So the ranking looks like this:

  1. HashMap
  2. HashSet
  3. Stream
  4. Foreach