why IdentityHashMap uses linear probing for collision resolution

When you build an identity hash map, there is no chance of finding two instances that are equal to each other yet are not the same object. It also uses System.identityHashCode, which has a chance of collisions that is known upfront to the designers of IdentityHashMap, and is known to be very small. Under these "laboratory" conditions, linear probing appears to be a better choice in terms of performance.

I suspect that the reason the designers of the class library used chaining rather than linear probing in "regular" hash maps is their desire to maintain decent performance even when hash functions are suboptimal.

This may shed some light (taken from the Oracle website):

Implementation note: This is a simple linear-probe hash table, as described for example in texts by Sedgewick and Knuth. The array alternates holding keys and values. (This has better locality for large tables than does using separate arrays.) For many JRE implementations and operation mixes, this class will yield better performance than HashMap (which uses chaining rather than linear-probing).

Although chaining may be better for most implementations, it is not so for every implementation.

EDIT Also found this, perhaps it's less trivial (taken from here):

The motivation for using probing is that it is somewhat faster than following a linked list, but that is only true when a reference to the value can be placed directly in the array. That isn't practical for all other hash-based collections, because they store the hash code as well as the value. This is for reasons of efficiency: a get operation must check whether it has found the right key, and since equality is an expensive operation, it makes sense to check first whether it even has the right hash code. Of course, this reasoning doesn't apply to IdentityHashMap, which checks object identity rather than object equality.

As background/clarification, an IdentityHashMap differs from an ordinary HashMap in that two keys are considered equal only if they are physically the same object: identity, rather than equals, is used for key comparison.

EDIT: discussion that helps in finding the answer (from comments below):

Trying:

but that is only true when a reference to the value can be placed directly in the array. That isn't practical for all other hash-based collections, because they store the hash code as well as the value. I have a doubt that why can't hashMap put the key, value and hash code in the array and use linear probing if linked list traversal is more costly then direct array?

wlyles:

likely because of space usage. That would take up more data in each slot. And I should point out that, while traversal is less costly for linear probing, the total find operation could be more costly (and less predictable) because linear probing is often plagued by clustering, where many keys have the same hash value. As said by @delnan in another comment, For example, if keys 1..20 hash to consecutive slots, and the 21st hashes to the same slot as the 1st, lookup for it (or for a not-present key that hashes to the 1st slot) needs 20 probes. Using a list would take fewer probes. For further clarification: because of the way that IdentityHashMap compares key values, the chance of collisions is very small. Thus, the main weakness of linear probing - collisions that lead to clumping - is largely avoided, making it more desirable in this implementation.

For further clarification: because of the way that IdentityHashMap compares key values, the chance of collisions is very small. Thus, the main weakness of linear probing - collisions that lead to clumping - is largely avoided, making it more desirable in this implementation

why IdentityHashMap uses linear probing for collision resolution

Tags:

Algorithm

Java

Collections

Data Structures

Map

Related

Recent Posts