String hashCode() documentation vs implementation

The implementation is correct. Integer overflow may occur, but that is harmless here: int arithmetic in Java simply wraps around, and a hash code only has to be deterministic, not mathematically exact. The code uses Horner's method for polynomial evaluation.
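For reference, the loop under discussion can be sketched roughly as follows. This is a minimal reconstruction that assumes the classic `h = 31*h + c` form of the loop; the names `HornerHash` and `hash` are made up for illustration, and the actual JDK source differs in details such as caching the result.

```java
// Hypothetical helper, not the JDK source: re-implements the hashing loop
// to show Horner's method and the harmless int overflow.
class HornerHash {
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // Multiply the running value by 31, then add the next char.
            // int arithmetic wraps around on overflow; no exception is thrown.
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // Long enough that the intermediate values overflow an int.
        String longText = "a string long enough to overflow an int";
        System.out.println(hash("CAT") == "CAT".hashCode());
        System.out.println(hash(longText) == longText.hashCode());
    }
}
```

Both comparisons should print true, because String.hashCode() is specified in terms of this polynomial evaluated in int arithmetic.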

Here are the steps for the sample string "CAT".

h = 0

First loop:

i = 0
h = 31 * 0 + 'C' (67) = 67

Second loop:

i = 1
h = 31 * 67 + 'A' (65) = 2142

Third loop:

i = 2
h = 31 * 2142 + 'T' (84) = 66486
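The hand calculation above can be reproduced with a small trace. This is only a sketch: `CatTrace` and `trace` are hypothetical names, and the loop body is assumed to be the same `h = 31*h + c` step used in the walkthrough.

```java
// Hypothetical tracing helper: records h after every loop iteration.
class CatTrace {
    static int[] trace(String s) {
        int[] steps = new int[s.length()];
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // same step as in the walkthrough
            steps[i] = h;             // value of h at the end of iteration i
        }
        return steps;
    }

    public static void main(String[] args) {
        int[] steps = trace("CAT");
        for (int i = 0; i < steps.length; i++) {
            System.out.println("i = " + i + ", h = " + steps[i]);
        }
        // prints:
        // i = 0, h = 67
        // i = 1, h = 2142
        // i = 2, h = 66486
    }
}
```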

Let's derive the formula from the code. Here, n is the value of the loop index i, that is, the position of the current character in the string s, and h_n is the value of h after that iteration (h is 0 before the loop starts). Each iteration of the for loop applies this recurrence:

h_n = 31*h_(n-1) + s[n]

h_0 /* after loop i = 0 */ = s[0]
h_1 /* after loop i = 1 */ = 31*h_0 + s[1] = 31*s[0] + s[1]
h_2 /* after loop i = 2 */ = 31*h_1 + s[2] = 31*(31*s[0] + s[1]) + s[2]
h = 31*31*s[0] + 31*s[1] + s[2]

The exponents you see for the powers of 31 arise because each loop multiplies in another factor of 31 before adding the value of the next character.


It is easiest to see what happens with an example. Take a String s of length n, with all notation as above. We will analyze the loop iteration by iteration, calling h_old the value of h at the beginning of the current iteration and h_new its value at the end. The h_new of iteration i is then the h_old of iteration i + 1.

╔═════╦════════════════════════════╦═════════════════════════════════════════════════╗
║ It. ║ h_old                      ║ h_new                                           ║
╠═════╬════════════════════════════╬═════════════════════════════════════════════════╣
║ 1   ║ 0                          ║ 31*h_old + s[0] =                               ║
║     ║                            ║          s[0]                                   ║
║     ║                            ║                                                 ║
║ 2   ║ s[0]                       ║ 31*h_old + s[1] =                               ║
║     ║                            ║ 31      *s[0] +          s[1]                   ║
║     ║                            ║                                                 ║
║ 3   ║ 31  *s[0] +    s[1]        ║ 31^2    *s[0] + 31      *s[1] +    s[2]         ║
║     ║                            ║                                                 ║
║ 4   ║ 31^2*s[0] + 31*s[1] + s[2] ║ 31^3    *s[0] + 31^2    *s[1] + 31*s[2] + s[3]  ║
║ :   ║ :                          ║ :                                               ║
║ n   ║ ...                        ║ 31^(n-1)*s[0] + 31^(n-2)*s[1] + ... + s[n-1]    ║
╚═════╩════════════════════════════╩═════════════════════════════════════════════════╝

(Table generated with Senseful)

The powers of 31 arise because every iteration multiplies h by 31, and by distributivity that factor applies to every term accumulated so far. As the last row of the table shows, the result is exactly what the documentation says it should be.
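To check the closed form against the loop, the sum with explicit powers of 31 can be evaluated term by term and compared with the built-in hashCode(). This is a sketch under the assumption that the documented formula is s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic; `DocumentedFormula` and `explicit` are hypothetical names.

```java
// Hypothetical check: evaluates the documented polynomial with explicit
// powers of 31 instead of Horner's method.
class DocumentedFormula {
    static int explicit(String s) {
        int n = s.length();
        int sum = 0;
        for (int i = 0; i < n; i++) {
            // Build 31^(n-1-i) by repeated multiplication.
            int power = 1;
            for (int k = 0; k < n - 1 - i; k++) {
                power *= 31; // wraps on overflow, just like the loop version
            }
            sum += s.charAt(i) * power;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(explicit("CAT"));  // prints 66486
        System.out.println("CAT".hashCode()); // prints 66486
    }
}
```

The two forms agree even when the values overflow, because wrapping int addition and multiplication still obey distributivity.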