Calculating password entropy?

There are equations for when the password is chosen randomly and uniformly from a given set: if the set has size N, then the entropy is N (to express it in bits, take the base-2 logarithm of N, i.e. log2(N)).

For instance, if the password is a sequence of exactly 8 lowercase letters, such that all sequences of 8 lowercase characters could have been chosen and no sequence was to be chosen with higher probability than any other, then entropy is N = 26^8 = 208827064576, i.e. about 37.6 bits (because this value is close to 2^37.6).
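The arithmetic above can be checked with a few lines of Python. This is only a sketch of the uniform case; the function name `uniform_entropy_bits` is made up for illustration and assumes every candidate password is equally likely.

```python
import math


def uniform_entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits when each of alphabet_size**length passwords is equally likely."""
    if alphabet_size < 1 or length < 0:
        raise ValueError("alphabet_size must be >= 1 and length must be >= 0")
    # log2(alphabet_size ** length) == length * log2(alphabet_size)
    return length * math.log2(alphabet_size)


n = 26 ** 8                          # number of 8-letter lowercase sequences
bits = uniform_entropy_bits(26, 8)   # the same quantity expressed in bits
print(n, round(bits, 1))
```

The result only means something if the generation process really is uniform; it says nothing about a password a human picked.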

Such a nice formula works only as long as uniform randomness occurs, and, let's face it, uniform randomness cannot occur in the average human brain. For human-chosen passwords, we can only do estimates based on surveys (have a look at that for some pointers).

What must be remembered is that entropy qualifies the password generation process, not the password itself. By definition, "password meter" applications and Web sites do not see the process, only the result, and uniformly return poor results (e.g. they will tell you that "BillClinton" is a good password). When the process is an in-brain one, anything goes.

(I generate my passwords with a computer, not with my head, and I encourage people to do the same.)
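As an illustration of computer-generated passwords, here is a minimal sketch using Python's standard `secrets` module. The helper names, the default length of 12 and the lowercase alphabet are assumptions chosen for the example, not a recommendation from the answer above.

```python
import math
import secrets
import string


def generate_password(length: int = 12, alphabet: str = string.ascii_lowercase) -> str:
    """Pick each character independently and uniformly from alphabet."""
    if len(set(alphabet)) != len(alphabet):
        # Repeated characters would make some symbols more likely than others.
        raise ValueError("alphabet must not contain repeated characters")
    return "".join(secrets.choice(alphabet) for _ in range(length))


def process_entropy_bits(length: int, alphabet: str) -> float:
    """Entropy of the generation process above, in bits."""
    return length * math.log2(len(alphabet))


password = generate_password()
print(password, round(process_entropy_bits(12, string.ascii_lowercase), 1))
```

Because the process is known and uniform, its entropy can be stated exactly, which is not possible for a password picked in someone's head.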


Joseph Bonneau from the University of Cambridge has done extensive research in the area of user-chosen passwords. In a recent paper (PDF), Bonneau proposed using "statistical metrics for individual password strength". In this paper he describes:

several possible metrics for measuring the strength of an individual password or any other secret drawn from a known, skewed distribution. In contrast to previous ad hoc approaches which rely on textual properties of passwords, we consider the problem without any knowledge of password structure. This enables rating the strength of a password given a large sample distribution without assuming anything about password semantics

When we talk about the entropy of a password, we're really interested in how hard it is to guess it. Bonneau's paper describes how this can be measured based on statistical information of actual passwords.
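To make the idea of rating a password against a sample distribution concrete, here is a rough sketch. It is not one of the metrics defined in Bonneau's paper; it only computes, for a toy sample, how many guesses an attacker trying passwords in decreasing order of observed frequency would need, plus the empirical probability and its surprisal. The function name, the alphabetical tie-breaking rule and the sample data are all made up for illustration.

```python
import math
from collections import Counter


def guess_statistics(password, sample):
    """Return (guess_rank, probability, surprisal_bits), or None if never observed."""
    counts = Counter(sample)
    if password not in counts:
        return None  # the sample says nothing about unseen passwords
    total = sum(counts.values())
    # Attacker tries the most frequent passwords first; ties broken alphabetically (assumption).
    ranked = sorted(counts, key=lambda pw: (-counts[pw], pw))
    rank = ranked.index(password) + 1
    probability = counts[password] / total
    return rank, probability, -math.log2(probability)


# Hypothetical toy sample, not real leaked data.
sample = ["123456"] * 5 + ["password"] * 3 + ["qwerty"] * 2
print(guess_statistics("password", sample))
print(guess_statistics("x7#Lq9vT", sample))
```

Note that nothing here looks at what the password contains; only its position in the observed distribution matters.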


From a purely combinatorial point of view, 0123456789 is no weaker than any other 10-character string. The equations you are referring to are based on combinatorics.

However, from a statistical point of view it is weaker: people commonly use it because it is easy to remember, so attackers building dictionaries of common passwords put it near the top of the list, and it is likely to be among the first passwords cracked. You could create some slightly more complex equations, or simply say: since you used only numerical digits, even though I allow more, I am going to calculate the strength using just numerical digits as the character set. This helps account for the statistical issues, but it will not perfectly match the real situation.
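The "use only the character set that was actually used" idea could look roughly like this. The character classes, the function names and the handling of characters outside those classes are assumptions for the sketch, and the result is still a combinatorial estimate, not a measure of how guessable the password really is.

```python
import math
import string

# Character classes considered by this sketch (an arbitrary choice).
CLASSES = (
    string.ascii_lowercase,
    string.ascii_uppercase,
    string.digits,
    string.punctuation,
)


def effective_alphabet_size(password: str) -> int:
    """Sum the sizes of the classes the password actually draws from."""
    size = sum(len(cls) for cls in CLASSES if any(ch in cls for ch in password))
    # Characters outside every class are counted individually (assumption).
    extras = {ch for ch in password if not any(ch in cls for cls in CLASSES)}
    return size + len(extras)


def estimated_bits(password: str) -> float:
    """Length times log2 of the effective alphabet size."""
    if not password:
        return 0.0
    return len(password) * math.log2(effective_alphabet_size(password))


print(effective_alphabet_size("0123456789"), round(estimated_bits("0123456789"), 1))
```

For 0123456789 this uses a 10-symbol alphabet rather than the full allowed set, which lowers the estimate but still treats the string as if it were random digits.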

You could also check the password against a password-cracking dictionary, to see whether it is in there and how close the most similar entry is. But this only gives a strength relative to that particular dictionary, and another attacker would use a different one.
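A dictionary check of that kind might be sketched as follows, using `difflib.SequenceMatcher` from the Python standard library as the similarity measure. The choice of similarity measure, the function name and the three-word dictionary are assumptions for illustration; a real cracking dictionary is far larger, and a linear scan like this would be slow on it.

```python
import difflib


def dictionary_check(password, dictionary):
    """Return (is_in_dictionary, closest_word, similarity between 0.0 and 1.0)."""
    words = list(dictionary)
    if password in words:
        return True, password, 1.0
    closest, best = None, 0.0
    for word in words:
        # ratio() is 1.0 for identical strings and 0.0 when nothing matches.
        ratio = difflib.SequenceMatcher(None, password, word).ratio()
        if ratio > best:
            closest, best = word, ratio
    return False, closest, best


# Hypothetical tiny dictionary for the example.
wordlist = ["password", "123456", "letmein"]
print(dictionary_check("password", wordlist))
print(dictionary_check("passw0rd", wordlist))
```

A high similarity to a dictionary entry suggests that a cracker's mangling rules would reach the password quickly, but only for attackers whose dictionary resembles this one.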