Salted hashes vs HMAC?

HMAC is a Message Authentication Code, which is meant for verifying integrity. This is a totally different kind of beast.

However, it so happens that HMAC is built over hash functions, and can be considered as a "keyed hash" -- a hash function with a key. A key is not a salt (keys are secret, salts are not). But the unique characteristics of HMAC make it a reasonable building block for other functions, which is what happens in PBKDF2: that's a key derivation function, commonly subverted into password hashing (a role for which it appears to be adequate). PBKDF2 includes a salt, and internally invokes HMAC (several times).

Good password hashing must be slow (in a configurable way) and include a salt. These characteristics are not easily obtained, and you will not get them out of a lone HMAC (which is fast and does not use a salt).


Short answer:

Kind of, but not really. A salt is simply random data added to the message before it is hashed, with the object of making the hash produced by a salted message different from anything an attacker may have already computed on his own with the same but unsalted message (or with any other salt, for that matter). Usually, salts must be public, in order to legitimately compute the correct hash of a secret message on demand. A "secret salt" for a public message, to produce a tamper-proof authenticated hash would work in theory, but in practice our hash functions are imperfect, and when used in this way they are vulnerable to attacks that add data to the original message and change the hash in a predictable way (a "length extension attack").

An HMAC incorporates additional secret data in the form of a key. This key is combined with the message in a much deeper way than a salt, and may be used in addition to a simple salt of the message. The end result is not vulnerable to the same attack that a simple hash with a secret salt would be, and so is a much stronger method for producing an authenticated hash.

Longer answer:

The basic purpose of a hash, any hash, is to emulate a "random oracle"; a conceptual black box that can be given any message and will produce a truly random, truly unique digest value to identify it, and will produce the same random value if ever given the same message. There is a need for this type of function in many cryptographic applications, all using the random oracle in a subtly different way. Unfortunately, true random oracles don't exist, so hash functions, with known and inherent weaknesses, have to be good enough.

A "hashed salted password" is one use. The idea is that the message (the password) is a secret. The hashed value allows someone to prove to a system that they know the correct password, without the system having to remember the actual password (making it vulnerable). Theoretically, just giving the message to a random oracle and getting back a truly unique value for it would be all that's necessary, but because we don't have random oracles and must use hash functions instead, the number of possible digest values a hash function can produce is finite, so theoretically one could simply find a message, any message, that produces each desired hash, and remember all of these so they can look up an observed hash and find a message that will produce it. Salts make this theory much more difficult, by requiring an attacker to compute a hash of every possible message with every possible salt in order to guarantee that he has the correct combination of message and salt to produce the observed hash. The salt value must be public information, because the correct salt must be used when legitimately hashing a secret message; this makes it different from a key, which must be kept secret to ensure the security of the scheme.

Thomas's observation of password hashing, however, is very correct; passwords chosen by users are inherently low-entropy, and so are relatively easy to guess by brute force, even when salted. To discourage this, additional security is required, which typically comes in the form of "key-stretching"; a computationally-expensive transformation is added, to make producing a single hash slower (not so slow that it would affect legitimate users). The added work for each hash makes repeating the hash calculation hundreds of thousands of times to brute-force a password an untenable situation for an attacker. Salting is still required, because the easiest way to avoid doing this work on-demand is to do it beforehand and remember the result, as with a "faster" hash algorithm.

An HMAC has a subtly different purpose. The idea is that the message may not be secret; therefore, there may well be no secrets involved if a basic hash were used. This is not an unheard-of state of affairs; file checksums are freely available, as are the files used to generate them, but the checksum hash is typically protected against modification. However, if the digest cannot be guaranteed to be "read-only", for instance if it sent along with the message, then the hash has no value because it is trivially changed to match any message it's sent with. A secret must be added back in to the process to ensure that only someone who knows the secret could use the message (or any message) to produce a matching hash. An HMAC combines the message (which can still be salted) with a key, giving us such a secret.

Now, it's fair to ask why a salt value could not simply be kept secret as a key, and concatenated to the message before hashing it with a primitive secure hash function. The answer ties back to the "we don't have random oracles" point. If we did have a random oracle, then we could keep some of the message secret, and the oracle would never produce the same value (or even a predictable one) given the rest of the message, unless we recombined it with the exact secret we have withheld. However, with most of our real-world "secure" hash functions, there is a known vulnerability allowing an attack called "length extension". Given a hash digest of arbitrary length, and knowledge of the hashing primitive that produced it, it is possible in many cases to produce a modified hash that would correspond to appending known data onto the end of the message, without knowing the entire original message (which in our case would include a concatenated or XORed "secret salt"). HMAC was specifically designed to defeat this, by combining the key with the message multiple times in a nested hashing operation, resulting in a digest that a length extension cannot be trivially performed on.