Why are KDFs slow? Is using a KDF more secure than using the original secret?

The confusion here is that there are two distinct kinds of key derivation function, and people often say "key derivation function" without being explicit about which one they mean (or without realizing that there are two):

Key-based key derivation functions
Password-based key derivation functions

A key-based key derivation function like HKDF presupposes that its input may be biased or partially predictable, but that it nevertheless has enough min-entropy to be reliably unguessable. The shared secret produced by a Diffie-Hellman exchange is one of the textbook examples.

Password-based functions, on the other hand, assume that the inputs have low entropy, and thus they're designed to impose as high a cost as is reasonable on guessing attacks (without becoming unbearably costly to the honest parties). Slowing down the computation with a high number of iterations is the classic technique, but newer functions like scrypt and Argon2 go beyond this and aim to be memory-hard:

The canonical algorithm for computing them uses a large, tunable amount of memory;
Any algorithm for computing the function that uses less memory than the canonical one should pay a very high time penalty (adverse time-memory tradeoff).
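To make the memory-hardness idea concrete, here is a minimal sketch using `hashlib.scrypt` from Python's standard library (available when Python is built against OpenSSL 1.1 or later). The parameter values, the example password, and the helper names `scrypt_memory_bytes` and `derive_key_scrypt` are my own illustrative assumptions, not a recommendation for any particular deployment.

```python
import hashlib
import os


def scrypt_memory_bytes(n: int, r: int) -> int:
    # Approximate size of scrypt's large working buffer: 128 * N * r bytes.
    return 128 * n * r


def derive_key_scrypt(password: bytes, salt: bytes,
                      n: int = 2**14, r: int = 8, p: int = 1) -> bytes:
    # n is the tunable cost parameter: raising it raises both the time
    # and the memory needed by the canonical algorithm.
    return hashlib.scrypt(password, salt=salt, n=n, r=r, p=p, dklen=32)


salt = os.urandom(16)  # per-password random salt
key = derive_key_scrypt(b"correct horse battery staple", salt)
print(len(key), "byte key,",
      scrypt_memory_bytes(2**14, 8) // 2**20, "MiB working memory")
```

With these assumed parameters each guess needs roughly 16 MiB of memory, which is what makes massively parallel guessing on cheap hardware expensive.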

Not all KDFs are slow! Something like HKDF is extremely fast, and only involves a handful of invocations of the underlying PRF.
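To show how few PRF calls are involved, here is a sketch of HKDF's extract-then-expand structure (as specified in RFC 5869) built on HMAC-SHA-256 from the standard library. It is meant for illustration only; in real code you would normally use a vetted library implementation. The `shared_secret` value and the `info` labels are placeholders I made up.

```python
import hashlib
import hmac

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # One HMAC call: concentrate the input's entropy into a fixed-size key.
    if not salt:
        salt = bytes(HASH_LEN)  # default salt is HASH_LEN zero bytes
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    # One HMAC call per HASH_LEN bytes of requested output.
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf(ikm: bytes, salt: bytes = b"", info: bytes = b"", length: int = 32) -> bytes:
    return hkdf_expand(hkdf_extract(salt, ikm), info, length)


shared_secret = bytes(range(32))  # placeholder, NOT a real key exchange output
enc_key = hkdf(shared_secret, info=b"example encryption key")
mac_key = hkdf(shared_secret, info=b"example mac key")
```

Deriving a 32-byte key this way costs two HMAC invocations in total: one for the extract step and one for the expand step.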

KDFs are only slow when they're intended to convert a potentially low-entropy input (like a password) into a high-entropy-looking output such as an encryption key or a password verifier. In this scenario, such functions are deliberately slow so that every guess costs the attacker extra computation time, much as if they were trying to brute-force a secret with higher entropy than the one actually used.
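A rough way to quantify this: if one evaluation of the slow KDF costs as much as `iterations` evaluations of the fast primitive, the attacker's work grows by about log2(iterations) bits. The sketch below uses `hashlib.pbkdf2_hmac`; the iteration count, the 40-bit password estimate, and the helper name `stretching_gain_bits` are assumptions chosen for illustration.

```python
import hashlib
import math
import os


def stretching_gain_bits(iterations: int) -> float:
    # Each guess costs `iterations` PRF calls instead of one, which adds
    # roughly log2(iterations) bits of work for a brute-force attacker.
    return math.log2(iterations)


def derive_key_pbkdf2(password: bytes, salt: bytes, iterations: int) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)


password_entropy_bits = 40   # assumed strength of a human-chosen password
iterations = 2**20           # assumed cost setting
effective_bits = password_entropy_bits + stretching_gain_bits(iterations)
print(f"about {effective_bits:.0f} bits of effective work")

key = derive_key_pbkdf2(b"hunter2", os.urandom(16), 100_000)
```

Note that this is only a work estimate. The password itself still has the same entropy, so a weak enough password stays guessable no matter how slow the function is.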

For something like a shared secret after a Curve25519 key exchange, you would generally prefer a fast KDF. For instance, the Noise protocol framework uses HKDF to generate encryption keys from a shared secret derived from curve multiplication. While you can use a raw shared secret as a key directly, most protocols in practice use some form of KDF to allow for features like forward secrecy.

The reason why you use a KDF, or a secure hash for that matter, on a Curve25519 shared key is that its bits are not uniformly distributed. You have 32 bytes of "point data" which contain roughly 126 bits of "security".

So... which bits do you choose? Take the first 126 bits and leave the remaining 2 bits of your 128-bit key zero? Or take the last 126 bits? Or just take 128 bits out of the middle? Some other strategy? How do you know you picked the right bits? How do you know there are no exploitable patterns?
Using a secure hash or KDF solves all these issues. Whatever the input looks like, you get 128 bits of pretty-much-perfectly-random output (or rather, random-looking), or any other number of bits that you want. You do not waste entropy, you need not worry whether you picked the "good" bits, and you do not risk having possibly exploitable, obvious patterns that come from the ECC calculations (which are not perfectly "random-looking"). Of course, entropy will not "magically" be added if you ask for more output bits than the input contains, but the point is that an external observer cannot tell where the entropy sits. The KDF or hash does not need to be slow (and most of the time should not be).
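As a minimal sketch of the "any number of bits you want" point, the snippet below hashes a 32-byte secret with SHAKE-256, an extendable-output hash from Python's `hashlib`. SHAKE-256 is my own stand-in for "a secure hash" here, and the `context` labels and the placeholder secret are made up; a protocol would normally use whatever KDF its specification names.

```python
import hashlib


def derive_bytes(shared_secret: bytes, context: bytes, length: int) -> bytes:
    # Length-prefix the context label so that (context, secret) pairs
    # cannot collide by shifting bytes from one field into the other.
    xof = hashlib.shake_256()
    xof.update(len(context).to_bytes(2, "big") + context + shared_secret)
    return xof.digest(length)


shared_secret = bytes(range(32))  # placeholder for a 32-byte shared point
key_128 = derive_bytes(shared_secret, b"example aes-128 key", 16)
key_256 = derive_bytes(shared_secret, b"example aes-256 key", 32)
print(key_128.hex())
```

No choice of "which bits to keep" is needed: every output bit depends on all of the input, and the call is about as fast as a single hash.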

The reason why you use a slow hash or KDF on passwords or any other user input is that anything that comes from a human has embarrassingly poor entropy, and is subject to being brute-forced with the aid of a dictionary (plus obvious permutations). Modern computers can literally do hundreds of millions of simple hashes per second, so that's a problem if your password database gets stolen. The attacker may not break the complete database, but getting a few users' passwords is a matter of a fraction of a second if no deliberately slow function is used. The longer it takes a prospective attacker, the better. More work needed to break a password means you have more of a time window to react and inform users in case of a breach.

The same goes for e.g. access to your encrypted disk or your KeePass file. If an attacker can try 100-200 million passwords per second, you may as well not encrypt at all; it doesn't matter how much care you put into choosing a good password.
If an attacker can try 3-4 passwords per second because that's just how long it takes to run the KDF, your password is basically "unbreakable" because it takes forever to find a match at that rate.
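The difference between those two guess rates is easy to put into numbers. The sketch below assumes a password with about 40 bits of entropy, which is a figure I picked purely for illustration, and computes the worst-case time to try every candidate.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def seconds_to_exhaust(entropy_bits: float, guesses_per_second: float) -> float:
    # Worst case: the attacker has to try all 2**entropy_bits candidates.
    return 2 ** entropy_bits / guesses_per_second


entropy_bits = 40  # assumed password strength
for rate in (100e6, 3):
    seconds = seconds_to_exhaust(entropy_bits, rate)
    print(f"{rate:>12,.0f} guesses/s: {seconds / 3600:,.1f} hours "
          f"= {seconds / SECONDS_PER_YEAR:,.4f} years")
```

Under that assumption, the fast case finishes in about three hours, while the slow case takes on the order of eleven thousand years.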

Certainly, this also makes it more expensive for you to unlock your volume. However, you only do it once, whereas an attacker must do it many times.
