codility absolute distinct count from an array

You should have pay attention to the fact that the array is sorted in ascending order.

Lets assume that there are only positive numbers, or the question was not about absolute distinct.

The you could count the Number by iterating trough the list, and increment the counter by one, if the actual element is different from the last. (and +1 for the first element)

If you understand that, you can add the absolute distinct constraint. For example by improving the algorithm by two pointers, one starting from the begin, one from the end. Then you have also to take care that both pointers works like in parallel, so that both pointers will end at 0 or the absoulte lowest number (positive/negative) - This will complicate the whole stuff a bit, but it is possible.


Here is a simple solution for this.

public class test{

public static int dis(Integer[] arr) {
    out.println(Arrays.asList(arr));
    if (arr.length == 0) {
        return 0;
    }
    int i = 0;
    int j = arr.length - 1;
    int c = 0;
    while (i <= j) {
        if ((j != arr.length - 1) && (Math.abs(arr[j]) == Math.abs(arr[j + 1]))) {
            out.println("skipping J==" + j);
            j--; continue;
        }
        if ((i != 0) && (Math.abs(arr[i]) == Math.abs(arr[i - 1]))) {
            out.println("skipping I==" + i);
            i++; continue;
        }
        if (Math.abs(arr[i]) < Math.abs(arr[j])) {
            j--;
            c++;
        }
        else if (Math.abs(arr[i]) > Math.abs(arr[j])) {
            i++; c++;
        }
        else {
            i++; j--; c++;
        }

        out.println("I=" + i + " J=" + j + " C=" + c);
    }
    return c;
}




public static void main(String[] arg){

//Integer[] a = {34,2,3,4,3,-2,3};
//out.println("distinct elements are" + dis(a));
Integer[] aa={-5,-3,0,1,3};
out.println("distinct elements count " + dis(aa));
Integer[] ab={-5,-3,0,1,3, 4, 6, 7};
out.println("distinct elements count " + dis(ab));
Integer[] ac={-5};
out.println("distinct elements count " + dis(ac));
Integer[] acc={9};
out.println("distinct elements count " + dis(acc));
Integer[] ad={9,9,9};
out.println("distinct elements count " + dis(ad));
Integer[] ae={-5,-5};
out.println("distinct elements count " + dis(ae));
Integer[] aee={1,5,5,5,5};
out.println("distinct elements count " + dis(aee));
Integer[] af={-9, -6, -5, -5, -5, -5, -3, -3, 0, 0, 1, 5, 6, 7, 7, 8};
out.println("distinct elements count " + dis(af));

}

}

out put is

[-5, -3, 0, 1, 3]
distinct elements count 4
[-5, -3, 0, 1, 3, 4, 6, 7]
distinct elements count 7
[-5]
distinct elements count 1
[9]
distinct elements count 1
[9, 9, 9]
distinct elements count 1
[-5, -5]
distinct elements count 1
[1, 5, 5, 5, 5]
distinct elements count 2
[-9, -6, -5, -5, -5, -5, -3, -3, 0, 0, 1, 5, 6, 7, 7, 8]
distinct elements count 8

If the array is sorted you can find duplicates by looking a neightbours. To compare absolute values to need to start at both the start and the end. This avoid creating a new structure.

EDIT: IMHO HashMap/HashSet is O(log(log(n)) due to collisions, it is only O(1) if there is a perfect hash function. I would have thought not creating object which be much much faster but appears to be only 4x fast on my machine.

In summary, you can see that using a Set is simpler, clearer and easier to maintain. It is still very fast and would be the best solution in 98% of cases.

public static void main(String[] args) throws Exception {
    for (int len : new int[]{100 * 1000 * 1000, 10 * 1000 * 1000, 1000 * 1000, 100 * 1000, 10 * 1000, 1000}) {
        int[] nums = new int[len];
        for (int i = 0; i < len; i++)
            nums[i] = (int) (Math.random() * (Math.random() * 2001 - 1000));
        Arrays.sort(nums);

        long timeArray = 0;
        long timeSet = 0;
        int runs = len > 1000 * 1000 ? 10 : len >= 100 * 1000 ? 100 : 1000;
        for (int i = 0; i < runs; i++) {
            long time1 = System.nanoTime();
            int count = countDistinct(nums);
            long time2 = System.nanoTime();
            int count2 = countDistinctUsingSet(nums);
            long time3 = System.nanoTime();
            timeArray += time2 - time1;
            timeSet += time3 - time2;
            assert count == count2;
        }
        System.out.printf("For %,d numbers, using an array took %,d us on average, using a Set took %,d us on average, ratio=%.1f%n",
                len, timeArray / 1000 / runs, timeSet / 1000 / runs, 1.0 * timeSet / timeArray);
    }
}

private static int countDistinct(int[] nums) {
    int lastLeft = Math.abs(nums[0]);
    int lastRight = Math.abs(nums[nums.length - 1]);
    int count = 0;
    for (int a = 1, b = nums.length - 2; a <= b;) {
        int left = Math.abs(nums[a]);
        int right = Math.abs(nums[b]);
        if (left == lastLeft) {
            a++;
            lastLeft = left;
        } else if (right == lastRight) {
            b--;
            lastRight = right;
        } else if (lastLeft == lastRight) {
            a++;
            b--;
            lastLeft = left;
            lastRight = right;
            count++;
        } else if (lastLeft > lastRight) {
            count++;
            a++;
            lastLeft = left;
        } else {
            count++;
            b--;
            lastRight = right;
        }
    }
    count += (lastLeft == lastRight ? 1 : 2);
    return count;
}

private static int countDistinctUsingSet(int[] nums) {
    Set<Integer> s = new HashSet<Integer>();
    for (int n : nums)
        s.add(Math.abs(n));
    int count = s.size();
    return count;
}

prints

For 100,000,000 numbers, using an array took 279,623 us on average, using a Set took 1,270,029 us on average, ratio=4.5

For 10,000,000 numbers, using an array took 28,525 us on average, using a Set took 126,591 us on average, ratio=4.4

For 1,000,000 numbers, using an array took 2,846 us on average, using a Set took 12,131 us on average, ratio=4.3

For 100,000 numbers, using an array took 297 us on average, using a Set took 1,239 us on average, ratio=4.2

For 10,000 numbers, using an array took 42 us on average, using a Set took 156 us on average, ratio=3.7

For 1,000 numbers, using an array took 8 us on average, using a Set took 30 us on average, ratio=3.6


On @Kevin K's point, even Integer can have collision even through it's hash values are unique, it can map to the same bucket as the capacity is limited.

public static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

public static void main(String[] args) throws Exception {
    Map<Integer, Integer> map = new HashMap<Integer, Integer>(32, 2.0f);
    for (int i = 0; i < 10000 && map.size() < 32 * 2; i++) {
        if (hash(i) % 32 == 0)
            map.put(i, i);
    }
    System.out.println(map.keySet());
}

prints

[2032, 2002, 1972, 1942, 1913, 1883, 1853, 1823, 1763, 1729, 1703, 1669, 1642, 1608, 1582, 1548, 1524, 1494, 1456, 1426, 1405, 1375, 1337, 1307, 1255, 1221, 1187, 1153, 1134, 1100, 1066, 1032, 1016, 986, 956, 926, 881, 851, 821, 791, 747, 713, 687, 653, 610, 576, 550, 516, 508, 478, 440, 410, 373, 343, 305, 275, 239, 205, 171, 137, 102, 68, 34, 0]

The values are in reverse order because the HashMap has generated into a LinkedList.