Java: Getting a unique hash value of an object

// Very important edit...

Gjorgji, I know you accepted the answer below as correct, but I have found it to be incorrect.

If you have a class like this:

class tiny {
    int a;
    public int hashCode() { return a; }
}

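You have already maxed out all possible hash codes: a can hold any of the 2^32 int values, and hashCode() can only return one of 2^32 int values, so every hash code is already taken. (If this isn't clear why, please say so.)

So, if you add ANY more information to the object and want that information represented in the hashCode, you're going to have a collision somewhere.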
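To make that concrete, here is a minimal sketch. TinyPlus is a made-up class (not from the question) that adds a second int to tiny, and the 31 * a + b formula is just one arbitrary choice. Any formula would have the same problem, since 2^64 possible states have to fit into 2^32 hash codes.

```java
// Hypothetical class for illustration: "tiny" plus one more int field.
class TinyPlus {
    final int a;
    final int b;

    TinyPlus(int a, int b) {
        this.a = a;
        this.b = b;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TinyPlus)) return false;
        TinyPlus t = (TinyPlus) o;
        return a == t.a && b == t.b;
    }

    @Override
    public int hashCode() {
        // 2^64 possible (a, b) states squeezed into 2^32 ints: collisions are unavoidable.
        return 31 * a + b;
    }
}

class TinyPlusDemo {
    public static void main(String[] args) {
        TinyPlus p = new TinyPlus(0, 31);
        TinyPlus q = new TinyPlus(1, 0);
        System.out.println(p.equals(q));                  // false
        System.out.println(p.hashCode() == q.hashCode()); // true, both are 31
    }
}
```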

But, for that matter, you don't really want to set out to get a hashCode that's 100% unique to an object. That's really not the point of hashCode!

The point of hashCode is to give you a "unique enough" identifier for the object so you can place it in a hash bucket. It's not for identification so much as it is for classification. The idea is that if you have a whole bunch of objects, you're probably not going to have many collisions, so you will probably get pretty fast access to what you're looking for if you group items by their hashCode.
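Here is a rough sketch of the "classification, not identification" idea. BucketKey is a hypothetical key whose hashCode collides on purpose (only 4 distinct values); a HashMap still finds the right entry, because it uses hashCode to pick a group and equals to pick the object within it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key for illustration: hashCode deliberately has only 4 distinct values.
class BucketKey {
    final int id;

    BucketKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof BucketKey && ((BucketKey) o).id == id;
    }

    @Override
    public int hashCode() {
        return id & 3; // classification only: 0, 1, 2 or 3
    }
}

class BucketDemo {
    // Fills a map with 100 keys that share just 4 hash codes between them.
    static Map<BucketKey, String> build() {
        Map<BucketKey, String> map = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            map.put(new BucketKey(i), "value-" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BucketKey, String> map = build();
        System.out.println(map.size());                 // 100
        System.out.println(map.get(new BucketKey(42))); // value-42
    }
}
```

With this few hash codes the lookups are slower than they need to be, but they are still correct. A better-spread hashCode only buys speed.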

If this means you deselect my answer as correct, that's okay. It really isn't correct for what you're looking for. My hope is that this explanation of hashCode leads you to the correct usage, so that the answer is still worth keeping as accepted. But as Mark clearly pointed out, this doesn't actually solve the issue you stated.

Below is the old answer:

===========================================================

A good article on it is found here (the link below goes to a chapter of Thinking in Java). Effective Java, hands down the best "I want to learn how to be a good Java developer" book out there, covers the same rules.

http://www.linuxtopia.org/online_books/programming_books/thinking_in_java/TIJ313_029.htm

class Gjorgji {
    boolean a;
    boolean b;
    boolean c;
    int x;
    int y;

    // EDIT: I almost forgot a VERY important rule...
    // WHEN YOU OVERRIDE hashCode, OVERRIDE EQUALS (and vice versa)
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Gjorgji)) return false;
        Gjorgji g = (Gjorgji) o;
        return a == g.a && b == g.b && c == g.c && x == g.x && y == g.y;
    }

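    @Override
    public int hashCode() {
        // Mix the two ints, then fold in each boolean by multiplying with a prime.
        // Note: when x == y the XOR is 0, so the result is 0 whatever the booleans are.
        int hash = x ^ y;
        hash *= a ? 31 : 17; // pick some small primes
        hash *= b ? 13 : 19;
        hash *= c ? 11 : 29;
        return hash;
    }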

}
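If hand-rolling the mixing feels error-prone, one alternative is to let java.util.Objects.hash combine the fields. This is only a sketch of that option, not a claim that it is better for every case; GjorgjiAlt is a hypothetical copy of the class above with the same five fields.

```java
import java.util.Objects;

// Hypothetical variant of the class above, using Objects.hash for the field mixing.
class GjorgjiAlt {
    boolean a;
    boolean b;
    boolean c;
    int x;
    int y;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GjorgjiAlt)) return false;
        GjorgjiAlt g = (GjorgjiAlt) o;
        return a == g.a && b == g.b && c == g.c && x == g.x && y == g.y;
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields that equals compares, so equal objects get equal hash codes.
        return Objects.hash(a, b, c, x, y);
    }
}
```

Objects.hash boxes its arguments into an array on every call, so a hand-written version can be cheaper on hot paths. Either way the result is still only 32 bits, so collisions remain possible.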

This is not possible in general. You must guarantee that if a.equals(b), then a.hashCode() == b.hashCode(). You cannot guarantee the reverse: you can always have collisions, because the hashCode method only has a 32-bit space while your JVM can have a 64-bit address space, so there can be more distinct objects than there are hash codes.
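A small illustration of both directions, using java.lang.String because its hashCode formula is documented: equal strings always share a hash code, but the two unequal strings "Aa" and "BB" share one too.

```java
class StringCollisionDemo {
    public static void main(String[] args) {
        String s = "Aa";
        String t = "BB";

        // The reverse direction fails: unequal objects, same hash code.
        System.out.println(s.equals(t));                  // false
        System.out.println(s.hashCode() == t.hashCode()); // true, both are 2112

        // The required direction holds: equal objects, equal hash codes.
        String u = new String("Aa");
        System.out.println(s.equals(u) && s.hashCode() == u.hashCode()); // true
    }
}
```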


You can do this if you can limit the number of instances of your class to under 2^32. Here's one way:

import java.util.concurrent.atomic.AtomicInteger;

class UniqueHash {
    private static final AtomicInteger NEXT_HASH_CODE = new AtomicInteger();
    private final int hashCode;

    UniqueHash() {
        while (true) {
            int nextHashCode = NEXT_HASH_CODE.get();
            if (nextHashCode == -1) {
                throw new RuntimeException("Too many instances!");
            }
            if (NEXT_HASH_CODE.compareAndSet(nextHashCode, nextHashCode + 1)) {
                hashCode = nextHashCode;
                break;
            }
        }
    }

    public int hashCode() {
        return hashCode;
    }
}
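A quick way to convince yourself the counter approach hands out distinct values under concurrency is sketched below. CountedHash is a condensed, renamed copy of the class above so the example stands alone, and CountedHashDemo, the thread count and the instance count are all made up for the check.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Condensed copy of the counter idea above, renamed so the sketch is self-contained.
class CountedHash {
    private static final AtomicInteger NEXT = new AtomicInteger();
    private final int hashCode;

    CountedHash() {
        int next;
        do {
            next = NEXT.get();
            if (next == -1) {
                throw new IllegalStateException("Too many instances!");
            }
            // Retry if another thread claimed this value first.
        } while (!NEXT.compareAndSet(next, next + 1));
        hashCode = next;
    }

    @Override
    public int hashCode() {
        return hashCode;
    }
}

class CountedHashDemo {
    // Creates perThread instances on each thread and returns the distinct hash codes seen.
    static Set<Integer> collect(int threads, int perThread) {
        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    seen.add(new CountedHash().hashCode());
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // The set size equals the number of instances only if no hash code repeated.
        System.out.println(collect(4, 10_000).size() == 40_000);
    }
}
```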

Edit 1: this was assuming that by "a == b" you meant a == b in the sense of object identity. You mention in the comments that you actually mean if the fields are equal. See the responses by @Mark Peters and @sjr.

Edit 2: fixed bug pointed out by @Tom Hawtin - tackline, left other bad practice in place. :)

Edit 3: there was a race in my "fix". Fixed the race.
