hashcode implementation on boolean fields

You have a couple of options:

Option 1: Bit flagging

The best way to guarantee that there can never be collisions between boolean hashes is to use a technique similar to the one used in bit flagging, whereby you have each boolean occupy its own bit. For example:

// `byte` can be replaced with `short`, `int`, or `long` to fit all of your variables.
byte booleans = 0;
if(bool1) booleans += 1;  // 0001
if(bool2) booleans += 2;  // 0010
if(bool3) booleans += 4;  // 0100
if(bool4) booleans += 8;  // 1000
...
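As a self-contained sketch of the same idea, the following packs any number of booleans (up to 32) into an int. The class name `BooleanFlags` and the helper `pack` are made up for illustration, and `|=` with a shift is used instead of `+=` so that each flag visibly owns one bit.

```java
final class BooleanFlags {
    // Hypothetical helper: packs up to 32 booleans into an int, one bit each.
    static int pack(boolean... flags) {
        if (flags.length > Integer.SIZE) {
            throw new IllegalArgumentException("at most 32 booleans fit in an int");
        }
        int bits = 0;
        for (int i = 0; i < flags.length; i++) {
            if (flags[i]) {
                bits |= 1 << i;  // flag i occupies bit i
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(pack(true, false, true, false));  // bits 0 and 2 set
        System.out.println(pack(false, true, false, true));  // bits 1 and 3 set
    }
}
```

Because every flag has its own bit, two different combinations of at most 32 booleans can never produce the same value.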

However, this approach quickly becomes inefficient with a large number of booleans and is highly dependent on the size of the target array. For example, if you have a target array of size 16, only the first 4 booleans have an effect on the bucket index (since the maximum index is 1111 in binary).

The two solutions to this are to either increase the size of your target array (which might not be under your control), or order your booleans from most to least variable, so that the ones that change most often occupy the lowest bits. Neither of these is optimal, and so this method is quick and easy, but not very effective in practice.
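The bucket limitation described above can be seen in a small sketch. It assumes a table whose size is a power of two and which picks a bucket by keeping only the low bits of the hash. `BucketDemo` and `bucketIndex` are hypothetical names, and real hash table implementations may mix the bits further before indexing.

```java
final class BucketDemo {
    // Hypothetical bucket selection for a power-of-two table: keep only the low bits.
    static int bucketIndex(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }

    public static void main(String[] args) {
        int a = 0b0_0101;  // fifth boolean is false
        int b = 0b1_0101;  // fifth boolean is true, the other four are unchanged
        // With 16 buckets only the lowest 4 bits matter, so both land in the same bucket.
        System.out.println(bucketIndex(a, 16));
        System.out.println(bucketIndex(b, 16));
    }
}
```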

Option 2: Base-changing hash

The design that Pham Trung shows in his answer expands on Option 1 as an easier way to accommodate multiple fields. As Adrian Shum commented, this answer provides an overview of a "general hashing algorithm" which is designed to be effective independent of what you are trying to hash.

The basic idea is to multiply the running result by a prime number before adding a simplified hash value for each field, so that the position of every field affects the final hash and collisions become less likely. Uniqueness is not guaranteed in general, since the result still has to fit in 32 bits. For example:

int result = 0;
result = 31*result + (bool1 ? 1 : 0);  // parentheses needed: + binds tighter than ?:
result = 31*result + (bool2 ? 1 : 0);
...

For an even more sparse hash distribution, you can combine this with Boolean.hashCode, as the other answers show:

int result = 0;
result = 31*result + Boolean.hashCode(bool1);
result = 31*result + Boolean.hashCode(bool2);
...

What's great about this solution is that it can be applied to other types, like you already have in your sample code:

...
result = 31*result + i;
result = 31*result + (a != null ? a.hashCode() : 0);
result = 31*result + my_complex_object.hashCode();
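Putting the pieces together, a minimal sketch of a complete class might look like the following. The class `Settings` and its fields are invented for illustration and are not taken from the question. `equals` is included because `hashCode` should always be overridden together with it.

```java
import java.util.Objects;

final class Settings {
    // Hypothetical fields, chosen only to show mixed types.
    private final boolean enabled;
    private final boolean visible;
    private final int count;
    private final String label;  // may be null

    Settings(boolean enabled, boolean visible, int count, String label) {
        this.enabled = enabled;
        this.visible = visible;
        this.count = count;
        this.label = label;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Settings)) return false;
        Settings other = (Settings) o;
        return enabled == other.enabled
                && visible == other.visible
                && count == other.count
                && Objects.equals(label, other.label);
    }

    @Override
    public int hashCode() {
        int result = 0;
        result = 31 * result + Boolean.hashCode(enabled);
        result = 31 * result + Boolean.hashCode(visible);
        result = 31 * result + count;
        result = 31 * result + (label != null ? label.hashCode() : 0);
        return result;
    }
}
```

Two instances built from the same values are equal and share a hash, while swapping the two booleans changes the hash because each field is scaled by a different power of 31.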

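Note: In these examples, 31 is just some arbitrary prime. You could just as easily have used 37, 113, or 23456789. However, there are trade-offs to using larger multipliers, namely that the running result will exceed Integer.MAX_VALUE sooner. This does not invalidate the hash, because Java int arithmetic silently wraps around, but once it wraps, distinct inputs are no longer guaranteed to produce distinct results.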
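To see what happens once the running result passes Integer.MAX_VALUE, here is a small sketch. `OverflowDemo` and `combine` are hypothetical names. It relies on the fact that Java int arithmetic wraps around on overflow rather than throwing, so the result is still a usable int.

```java
final class OverflowDemo {
    // Hypothetical helper: folds the parts into one int using the given multiplier.
    static int combine(int multiplier, int[] parts) {
        int result = 0;
        for (int part : parts) {
            result = multiplier * result + part;  // may overflow and wrap around
        }
        return result;
    }

    public static void main(String[] args) {
        int[] parts = {1, 1, 1, 1, 1, 1, 1, 1};
        // Both calls return ordinary int values; the larger multiplier simply wraps earlier.
        System.out.println(combine(31, parts));
        System.out.println(combine(23456789, parts));
    }
}
```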

When you have two or more booleans, the usual hashCode algorithm already takes care of that.

Look a little closer:

// Very simple example
public int hashCode() {
    int result = 31;

    for(boolean val : booleanList) {
        // This is the important part:
        result = 31 * result + Boolean.hashCode(val);
    }

    return result;
}

Notice the main part of the for loop: each boolean is treated differently, because we always multiply the result by 31 (or any good prime number) before adding the next value to it.

If we view the whole hashcode as a number in base 31, we can see that both the position and the value of each boolean are taken into account in the final result. Each boolean can be treated as a digit in the hashcode, so the cases (true, false) and (false, true) get two different hashcodes.
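The order sensitivity can be checked with a short sketch that reuses the loop from the example above. `OrderDemo` and its `hash` method are illustrative names, and the list parameter stands in for the `booleanList` field.

```java
import java.util.Arrays;
import java.util.List;

final class OrderDemo {
    // Same loop as the example above, taking the list as a parameter.
    static int hash(List<Boolean> values) {
        int result = 31;
        for (boolean val : values) {
            result = 31 * result + Boolean.hashCode(val);
        }
        return result;
    }

    public static void main(String[] args) {
        int first = hash(Arrays.asList(true, false));
        int second = hash(Arrays.asList(false, true));
        System.out.println(first != second);  // the two orderings hash differently
    }
}
```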
