Confused about (password) entropy

The Wikipedia article explains mathematical entropy, which isn't identical to what people mean when they talk about password entropy. Password entropy is about how hard a password is to guess under certain assumptions, which is a related but distinct concept.

A and B are not different concepts of password entropy; they're just using different assumptions about how a password is built.

A treats correcthorsebatterystaple as a string of English words and assumes that words are randomly selected from a collection of 2048 words. Based on these assumptions, each word gives exactly log2(2048) = 11 bits of entropy, so correcthorsebatterystaple has 44 bits of entropy.
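A quick sanity check of that arithmetic (a minimal sketch; the 2048-word list is the assumption stated above):

```python
import math

# Assumption: words are drawn uniformly at random from a 2048-word list,
# as in the XKCD scheme described above.
wordlist_size = 2048
bits_per_word = math.log2(wordlist_size)  # exactly 11.0
words = 4                                 # correct horse battery staple

print(bits_per_word)          # 11.0
print(bits_per_word * words)  # 44.0
```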

B treats correcthorsebatterystaple as a string of characters and assumes that each character appears with the same probability as it does in English text. Based on these assumptions, correcthorsebatterystaple has 84 bits of entropy.
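A rough sketch of how a per-character Shannon calculation can produce a figure in that range. As a simplifying assumption, this scores the string against its own observed character frequencies rather than an external English frequency table; that happens to land almost exactly on 84 bits here:

```python
from collections import Counter
import math

def shannon_bits(s: str) -> float:
    """Total Shannon entropy of s, scored against the character
    frequencies observed in s itself (a simplification; a calculator
    could instead use an English-language frequency table)."""
    counts = Counter(s)
    n = len(s)
    per_char = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return per_char * n

print(shannon_bits("correcthorsebatterystaple"))  # ~84.1
```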

So which definition you use really depends on what assumptions you make about the password. If you assume the password is an XKCD-style passphrase (and that each word indeed has a one-in-2048 chance of appearing in it), then A is the correct way to calculate entropy. If you don't assume the password is built from words, but do assume that each character appears with the probability of its appearance in English text, then B is the correct way to calculate entropy.

In the real world, neither set of assumptions is correct. So if you have a "requirement that specifies that a string needs to have 20 bits of entropy" and this applies to user-generated passwords, it's very difficult to give a precise definition of entropy. For more on this, see Calculating password entropy?.

If, on the other hand, you can use computer-generated strings (and are using a good PRNG) then each alphanumeric character (a-z, A-Z, 0-9) gives almost 6 bits of entropy, since log2(62) ≈ 5.95.
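A minimal sketch of that calculation, using Python's secrets module (a CSPRNG) as one example of a suitable generator:

```python
import math
import secrets
import string

alphabet = string.ascii_letters + string.digits  # a-z, A-Z, 0-9: 62 symbols
print(math.log2(len(alphabet)))                  # ~5.954 bits per character

# secrets draws from the OS CSPRNG, a "good PRNG" in the sense above.
password = "".join(secrets.choice(alphabet) for _ in range(16))
print(password)  # 16 chars * ~5.95 bits/char ~= 95 bits of entropy
```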


What it means

Coin toss entropy assumes that each toss is independent of all the tosses that came before it, so each toss adds one full bit of entropy.

Shannon entropy assumes that the value of the next letter is in fact partially determined by the value of the previous letter (and perhaps others). Facts like "h" often follows "t" and "e" often follows "h" are taken into consideration, so common patterns are assigned a lower entropy value. So with an English dictionary, the string "the" would have a much lower Shannon entropy value than the string "exu".
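A toy sketch of that idea, assuming a first-order (bigram) model trained on a tiny made-up corpus; a real model would be trained on a large English corpus, and strings must only use letters the corpus contains:

```python
from collections import Counter
import math

# Toy corpus standing in for a large English training text.
corpus = "the theory of the thing is that the truth exudes slowly"

def string_bits(s: str, corpus: str) -> float:
    """Bits needed for s under a bigram model of the corpus:
    each letter is scored by P(letter | previous letter)."""
    unigrams = Counter(corpus)
    bigrams = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
    bits = 0.0
    for prev, cur in zip(s, s[1:]):
        p = bigrams[prev + cur] / unigrams[prev]
        bits += -math.log2(p)
    return bits

print(string_bits("the", corpus))  # low: "th" and "he" are common
print(string_bits("exu", corpus))  # higher: "ex" and "xu" are rare
```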

What it means to you

The direct implication of this for passwords is pretty insignificant. The real (and only) important question with respect to passwords is this:

What dictionary is your password in?

That is to say, if you were to construct a list of potential passwords to conduct a brute-force attack, how big would the dictionary have to be to contain your password?

For example:

  • Your password is in the top 500 most commonly-used passwords
  • Your password is in the dictionary of lowercase English words
  • Your password is in the list of lowercase or title-case English words with a one-digit or two-digit suffix
  • Your password is in the list of random-case English words with haxor numeric substitutions (e.g. A=>4, L=>1, S=>5)
  • Your password is in the list of all strings of 8 characters or fewer using numbers and upper- and lowercase letters

All of the above are examples of frequently used real-world password cracking dictionaries.
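To put rough numbers on a few of those dictionary classes (the 100,000-word count for English is an illustrative assumption, not a measured figure):

```python
import math

top_common = 500                              # top-500 list
english_lower = 100_000                       # assumed order of magnitude
with_suffix = english_lower * 2 * (10 + 100)  # lower/title case, 1-2 digit suffix
all_alnum_8 = sum(62**k for k in range(1, 9)) # every alnum string up to 8 chars

for name, size in [("top 500", top_common),
                   ("lowercase words", english_lower),
                   ("words + digit suffix", with_suffix),
                   ("all alnum <= 8 chars", all_alnum_8)]:
    print(f"{name}: {size:.3g} entries, ~{math.log2(size):.1f} bits")
```

The spread is the point: moving from the first dictionary to the last multiplies the attacker's work by roughly a factor of 10^11.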

In other words

The purpose of password complexity is to stand up against a brute-force attack. The size of the smallest available dictionary that contains your password determines the amount of time required to crack your password. We can guess at what dictionaries will be available to the attacker, but we can't know for certain. Therefore, as a proxy for dictionary size, we instead use entropy. It's a poor substitute because it doesn't reflect the actual attack mechanics, but it's potentially better than nothing.

Comparisons of passwords based on entropy calculations may potentially be fruitful, but you should be careful to avoid ascribing too much value to a number which is, in the end, only indirectly related to how well the password will hold up.


I suppose the simplest way to illustrate it is with an example.

Let's say we have a random number generator that has a provable output entropy of 3 bits per digit of output. That generator's "toss" entropy is 3 bits. Now, let's say you run that for 20 digits, and despite the ridiculously small probability, every number in the stream comes out as 6. The "toss" entropy is still 3 bits per digit, so 60 bits. The actual "result" entropy of the password is tiny - one could argue that it's as low as 3 or 4 bits.

The difference is that the "toss" entropy represents the expected entropy of the output, based on probabilistic modelling of the generator, whereas the "result" entropy represents the actual information entropy of the data it produced in one particular run.
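A short sketch of that difference, assuming a generator that draws digits uniformly from 0-7 (so exactly 3 bits of "toss" entropy per digit) and a naive frequency-based measure of "result" entropy:

```python
from collections import Counter
import math
import secrets

# "Toss" entropy: each digit is uniform over 0-7, i.e. exactly 3 bits
# per digit by construction, so 60 bits for 20 digits.
digits = [secrets.randbelow(8) for _ in range(20)]

def empirical_bits_per_symbol(seq) -> float:
    """Naive "result" entropy: per-symbol Shannon entropy of the observed
    frequencies. The pathological all-6s output scores 0 here; the "3 or
    4 bits" figure above allows for describing the repeating pattern itself."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(empirical_bits_per_symbol(digits))    # close to 3 for a typical run
print(empirical_bits_per_symbol([6] * 20))  # 0.0: the degenerate all-6s case
```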