HMAC and key size

An HMAC is a very simple construction. You could have come up with it yourself, except that HMAC is standardised, which means that a lot of people have looked at it and found it resistant to attacks. Because it is so simple, we can dive straight into the details:

hmac = hash(
    (key xor opad) +
    hash((key xor ipad) + message)
)

So it's just two hash calls, some xoring and some concatenation (the +), nothing else. Block ciphers typically need a fixed-length key, but cryptographic hash functions "[map] data of arbitrary size to a bit string of a fixed size (a hash)". So the key that you input, your password, can be of any size: before xoring it with ipad and opad, HMAC hashes a key that is longer than the hash function's block size and pads a shorter one with zero bytes up to that size. The hash() function in the snippet above can be any good cryptographic hash function, like SHA-3 or BLAKE2b[1].
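To make the construction concrete, here is a minimal Python sketch of it, checked against the standard library's hmac module. The function name hmac_sketch is hypothetical and only for illustration, and the example assumes the key is already strong key material (see the next paragraphs about passwords); in real code you would simply call hmac.new.

```python
import hashlib
import hmac


def hmac_sketch(key: bytes, message: bytes, hash_name: str = "sha3_256") -> bytes:
    """Illustrative HMAC (RFC 2104) built directly on a hashlib hash."""
    def digest(data: bytes) -> bytes:
        return hashlib.new(hash_name, data).digest()

    block_size = hashlib.new(hash_name).block_size  # the hash's block length

    # Keys longer than the block are hashed first; shorter keys are
    # right-padded with zero bytes up to the block length.
    if len(key) > block_size:
        key = digest(key)
    key = key.ljust(block_size, b"\x00")

    inner_key = bytes(b ^ 0x36 for b in key)  # key xor ipad (0x36 repeated)
    outer_key = bytes(b ^ 0x5C for b in key)  # key xor opad (0x5c repeated)

    # '+' is byte concatenation, exactly as in the pseudocode.
    return digest(outer_key + digest(inner_key + message))


key = b"already-stretched key bytes"  # hypothetical, stands in for a derived key
msg = b"hello"
assert hmac_sketch(key, msg) == hmac.new(key, msg, hashlib.sha3_256).digest()
```

The sketch only exists to show that there is nothing more to HMAC than the two nested hashes; the library version is faster and better tested.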

Of course, longer typically gives more strength. A seven-character password, even if it is random and has symbols and digits and everything, can always be broken with modern computers. But the strength lies less in the length than in the method used to generate it: a 22-character password that is just the string "mary had a little lamb" is not as secure as a 14-character password that was randomly generated. So be careful about how randomly you choose your passwords: a phrase like that is only one of a few billion options. A randomly generated seven-character password already has about 47 trillion possible values (assuming an alphabet of about 90 symbols), and even that is insecure. It is also quite hard for humans to remember, so user passwords (memorized secrets) are likely to be weaker than that.
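The numbers above are just alphabet size raised to the password length. A small Python sketch of that arithmetic, assuming each character is drawn uniformly from about 90 symbols (the alphabet size behind the "47 trillion" figure); the helper names are hypothetical. The generator uses Python's secrets module, which is meant for security-sensitive randomness, unlike random.

```python
import math
import secrets
import string

# Assumption: each character is drawn uniformly from about 90 symbols,
# which is what the "47 trillion" figure in the text corresponds to.
ALPHABET_SIZE = 90


def keyspace(length: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Number of equally likely passwords of the given length."""
    return alphabet_size ** length


def entropy_bits(length: int, alphabet_size: int = ALPHABET_SIZE) -> float:
    """Strength in bits: log2 of the keyspace."""
    return length * math.log2(alphabet_size)


def random_password(length: int) -> str:
    """Draw characters with the CSPRNG in `secrets`, not the `random` module."""
    alphabet = string.ascii_letters + string.digits + string.punctuation  # 94 symbols
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(f"{keyspace(7):,}")          # 47,829,690,000,000
print(round(entropy_bits(7), 1))   # 45.4
print(round(entropy_bits(14), 1))  # 90.9
print(random_password(14))         # different on every run
```

Doubling the length of a random password squares the keyspace, which is why the 14-character random password beats the longer but predictable phrase.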

When you ask the user for a password, always run it through a "slow hash" function first. See a question like How to securely hash passwords for advice on the right algorithm and parameters to use. Only after you have done that should you use the password for anything. So before you use it as the key for an HMAC, it has already been strengthened, and that random seven-character password would be infeasible to break for the foreseeable future. This way, an attacker cannot test guesses at the speed of a fast hash() function like SHA-3, which is billions of guesses per second, but only at the speed of your slow hash, which will be more like hundreds. Going through 47 trillion options at a rate of a hundred per second takes roughly 15,000 years!
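A minimal Python sketch of "slow hash first, then HMAC", using the standard library's hashlib.scrypt as the slow hash (it needs Python built against OpenSSL 1.1+). The cost parameters and the helper name derive_hmac_key are assumptions for illustration, not recommendations; take real parameters from current password-hashing guidance. The last lines redo the back-of-the-envelope estimate from the paragraph above.

```python
import hashlib
import hmac
import os

# Assumed cost parameters, for illustration only; choose real ones from
# current password-hashing guidance. hashlib.scrypt needs OpenSSL 1.1+.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1


def derive_hmac_key(password: str, salt: bytes) -> bytes:
    """Stretch a user password into a 32-byte key with a deliberately slow KDF."""
    return hashlib.scrypt(password.encode("utf-8"), salt=salt,
                          n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=32)


salt = os.urandom(16)  # random per user; store it, it is not secret
key = derive_hmac_key("example password", salt)
tag = hmac.new(key, b"message to authenticate", hashlib.sha3_256).hexdigest()

# Back-of-the-envelope brute force: 90**7 candidates at 100 guesses per second.
seconds = 90**7 / 100
years = seconds / (365.25 * 24 * 3600)
print(f"{years:,.0f} years")  # 15,156 years
```

Only the derived key ever reaches hmac.new; the raw password is never used as HMAC key material directly.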


[1] I don't mention SHA-2 because it's slowly getting old: its untruncated variants such as SHA-256 and SHA-512 are vulnerable to length extension attacks, whereas SHA-3 and BLAKE2b are not. HMAC prevents length extension attacks, so HMAC with SHA-2 is fine, but I'd still rather recommend newer algorithms (as long as they are battle-tested; these are not that new) than something I know to be vulnerable to an attack in a slightly different configuration.
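With Python's hmac module, choosing between SHA-2, SHA-3 and BLAKE2b is just a matter of the digestmod argument, as this short sketch shows. The random key here is an assumption standing in for a properly derived or generated key.

```python
import hashlib
import hmac
import secrets

key = secrets.token_bytes(32)  # stands in for a properly derived or random key
message = b"example message"

# Under HMAC, swapping SHA-2 for SHA-3 or BLAKE2b only changes digestmod.
tags = {}
for digestmod in (hashlib.sha256, hashlib.sha3_256, hashlib.blake2b):
    name = digestmod().name
    tags[name] = hmac.new(key, message, digestmod).digest()
    print(name, len(tags[name]))
# sha256 32
# sha3_256 32
# blake2b 64
```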


You might want to take a look at RFC 2104:

2. Definition of HMAC

The definition of HMAC requires a cryptographic hash function, which we denote by H, and a secret key K. We assume H to be a cryptographic hash function where data is hashed by iterating a basic compression function on blocks of data.

We denote by B the byte-length of such blocks (B=64 for all the above mentioned examples of hash functions), and by L the byte-length of hash outputs (L=16 for MD5, L=20 for SHA-1). The authentication key K can be of any length up to B, the block length of the hash function. Applications that use keys longer than B bytes will first hash the key using H and then use the resultant L byte string as the actual key to HMAC. In any case the minimal recommended length for K is L bytes (as the hash output length). See section 3 for more information on keys.

3. Keys

The key for HMAC can be of any length (keys longer than B bytes are first hashed using H). However, less than L bytes is strongly discouraged as it would decrease the security strength of the function. Keys longer than L bytes are acceptable but the extra length would not significantly increase the function strength. (A longer key may be advisable if the randomness of the key is considered weak.)

Keys need to be chosen at random (or using a cryptographically strong pseudo-random generator seeded with a random seed), and periodically refreshed. (Current attacks do not indicate a specific recommended frequency for key changes as these attacks are practically infeasible. However, periodic key refreshment is a fundamental security practice that helps against potential weaknesses of the function and keys, and limits the damage of an exposed key.)
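The rules quoted above can be checked with Python's standard library. This sketch, assuming a Python build where MD5 and SHA-1 are available through hashlib, prints B and L for a few hashes, confirms that a key longer than B gives the same tag as its hash, and generates a random key of L bytes as section 3 recommends.

```python
import hashlib
import hmac
import secrets

# B (block length) and L (output length), in bytes, as RFC 2104 uses them.
for name in ("md5", "sha1", "sha256"):
    h = hashlib.new(name)
    print(name, "B =", h.block_size, "L =", h.digest_size)
# md5 B = 64 L = 16
# sha1 B = 64 L = 20
# sha256 B = 64 L = 32

# A key longer than B is replaced by H(key), so both tags below are identical.
message = b"example message"
long_key = secrets.token_bytes(100)  # 100 > B = 64 for SHA-256
tag_long = hmac.new(long_key, message, hashlib.sha256).digest()
tag_prehashed = hmac.new(hashlib.sha256(long_key).digest(), message,
                         hashlib.sha256).digest()
assert tag_long == tag_prehashed

# Section 3: a random key of at least L bytes from a cryptographically strong RNG.
key = secrets.token_bytes(hashlib.sha256().digest_size)  # 32 bytes for SHA-256
```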

At the top of the RFC 2104 page there is even a link to RFC 6151, which updates it and might be interesting as well:

2. Security Considerations

MD5 was published in 1992 as an Informational RFC. Since that time, MD5 has been extensively studied and new cryptographic attacks have been discovered. Message digest algorithms are designed to provide collision, pre-image, and second pre-image resistance. In addition, message digest algorithms are used with a shared secret value for message authentication in HMAC, and in this context, some people may find the guidance for key lengths and algorithm strengths in [SP800-57] and [SP800-131] useful.

MD5 is no longer acceptable where collision resistance is required such as digital signatures. It is not urgent to stop using MD5 in other ways, such as HMAC-MD5; however, since MD5 must not be used for digital signatures, new protocol designs should not employ HMAC-MD5. Alternatives to HMAC-MD5 include HMAC-SHA256 [HMAC] [HMAC-SHA256] and [AES-CMAC] when AES is more readily available than a hash function.