How is hashCode() calculated in Java

A hashcode is an integer value that represents the state of the object upon which it was called. That is why an Integer that is set to 1 will return a hashcode of "1" because an Integer's hashcode and its value are the same thing. A character's hashcode is equal to it's ASCII character code. If you write a custom type you are responsible for creating a good hashCode implementation that will best represent the state of the current instance.

The value returned by hashCode() is by no means guaranteed to be the memory address of the object. I'm not sure of the implementation in the Object class, but keep in mind most classes will override hashCode() such that two instances that are semantically equivalent (but are not the same instance) will hash to the same value. This is especially important if the classes may be used within another data structure, such as Set, that relies on hashCode being consistent with equals.

There is no hashCode() that uniquely identifies an instance of an object no matter what. If you want a hashcode based on the underlying pointer (e.g. in Sun's implementation), use System.identityHashCode() - this will delegate to the default hashCode method regardless of whether it has been overridden.

Nevertheless, even System.identityHashCode() can return the same hash for multiple objects. See the comments for an explanation, but here is an example program that continuously generates objects until it finds two with the same System.identityHashCode(). When I run it, it quickly finds two System.identityHashCode()s that match, on average after adding about 86,000 Long wrapper objects (and Integer wrappers for the key) to a map.

public static void main(String[] args) {
    Map<Integer,Long> map = new HashMap<>();
    Random generator = new Random();
    Collection<Integer> counts = new LinkedList<>();

    Long object = generator.nextLong();
    // We use the identityHashCode as the key into the map
    // This makes it easier to check if any other objects
    // have the same key.
    int hash = System.identityHashCode(object);
    while (!map.containsKey(hash)) {
        map.put(hash, object);
        object = generator.nextLong();
        hash = System.identityHashCode(object);
    }
    System.out.println("Identical maps for size:  " + map.size());
    System.out.println("First object value: " + object);
    System.out.println("Second object value: " + map.get(hash));
    System.out.println("First object identityHash:  " + System.identityHashCode(object));
    System.out.println("Second object identityHash: " + System.identityHashCode(map.get(hash)));
}

Example output:

Identical maps for size:  105822
First object value: 7446391633043190962
Second object value: -8143651927768852586
First object identityHash:  2134400190
Second object identityHash: 2134400190

If you want to know how they are implmented, I suggest you read the source. If you are using an IDE you can just + on a method you are interested in and see how a method is implemented. If you cannot do that, you can google for the source.

For example, Integer.hashCode() is implemented as

   public int hashCode() {
       return value;
   }

and String.hashCode()

   public int hashCode() {
       int h = hash;
       if (h == 0) {
           int off = offset;
           char val[] = value;
           int len = count;

           for (int i = 0; i < len; i++) {
               h = 31*h + val[off++];
           }
           hash = h;
       }
       return h;
   }

The hashCode() method is often used for identifying an object. I think the Object implementation returns the pointer (not a real pointer but a unique id or something like that) of the object. But most classes override the method. Like the String class. Two String objects have not the same pointer but they are equal:

new String("a").hashCode() == new String("a").hashCode()

I think the most common use for hashCode() is in Hashtable, HashSet, etc..

Java API Object hashCode()

Edit: (due to a recent downvote and based on an article I read about JVM parameters)

With the JVM parameter -XX:hashCode you can change the way how the hashCode is calculated (see the Issue 222 of the Java Specialists' Newsletter).

HashCode==0: Simply returns random numbers with no relation to where in memory the object is found. As far as I can make out, the global read-write of the seed is not optimal for systems with lots of processors.

HashCode==1: Counts up the hash code values, not sure at what value they start, but it seems quite high.

HashCode==2: Always returns the exact same identity hash code of 1. This can be used to test code that relies on object identity. The reason why JavaChampionTest returned Kirk's URL in the example above is that all objects were returning the same hash code.

HashCode==3: Counts up the hash code values, starting from zero. It does not look to be thread safe, so multiple threads could generate objects with the same hash code.

HashCode==4: This seems to have some relation to the memory location at which the object was created.

HashCode>=5: This is the default algorithm for Java 8 and has a per-thread seed. It uses Marsaglia's xor-shift scheme to produce pseudo-random numbers.

How is hashCode() calculated in Java

Tags:

Java

Hashcode

Related

Recent Posts