Java HashMap performance optimization / alternative

As many people pointed out the hashCode() method was to blame. It was only generating around 20,000 codes for 26 million distinct objects. That is an average of 1,300 objects per hash bucket = very very bad. However if I turn the two arrays into a number in base 52 I am guaranteed to get a unique hash code for every object:

public int hashCode() {       
    // assume that both a and b are sorted       
    return a[0] + powerOf52(a[1], 1) + powerOf52(b[0], 2) + powerOf52(b[1], 3) + powerOf52(b[2], 4);
}

public static int powerOf52(byte b, int power) {
    int result = b;
    for (int i = 0; i < power; i++) {
        result *= 52;
    }
    return result;
}

The arrays are sorted to ensure this methods fulfills the hashCode() contract that equal objects have the same hash code. Using the old method the average number of puts per second over blocks of 100,000 puts, 100,000 to 2,000,000 was:

168350.17
109409.195
81344.91
64319.023
53780.79
45931.258
39680.29
34972.676
31354.514
28343.062
25562.371
23850.695
22299.22
20998.006
19797.799
18702.951
17702.434
16832.182
16084.52
15353.083

Using the new method gives:

337837.84
337268.12
337078.66
336983.97
313873.2
317460.3
317748.5
320000.0
309704.06
310752.03
312944.5
265780.75
275540.5
264350.44
273522.97
270910.94
279008.7
276285.5
283455.16
289603.25

Much much better. The old method tailed off very quickly while the new one keeps up a good throughput.


One thing I notice in your hashCode() method is that the order of the elements in the arrays a[] and b[] don't matter. Thus (a[]={1,2,3}, b[]={99,100}) will hash to the same value as (a[]={3,1,2}, b[]={100,99}). Actually all keys k1 and k2 where sum(k1.a)==sum(k2.a) and sum(k1.b)=sum(k2.b) will result in collisions. I suggest assigning a weight to each position of the array:

hash = hash * 5381 + (c0*a[0] + c1*a[1]);
hash = hash * 5381 + (c0*b[0] + c1*b[1] + c3*b[2]);

where, c0, c1 and c3 are distinct constants (you can use different constants for b if necessary). That should even out things a bit more.