Specific character based policy for passwords

The fundamental issue is that you can only estimate entropy from the password itself, and that estimate can be badly wrong. Entropy is determined by the method used to generate the password, and you can't measure the entropy of a method from a single password it produced.

Let's look at a practical example. I find it easiest to memorize very long passwords generated from a small password space, so I'm going to use digits only and make the password very long. Your algorithm looks at my password, sees that it contains only digits (i.e., a character set of size 10) and that it is 20 characters long. This gives it an estimated entropy of:

log2(10^20) ≈ 66.4 bits

It passes your test! However let's stop and look at the password:

01234567890123456789

Hmmm... turns out that the actual entropy is pretty much zero.
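
To make the failure concrete, here is a minimal Python sketch of the kind of checker described above. The name `naive_entropy_bits` and the character classes are my own assumptions, not your actual algorithm; it just guesses the character set from the classes it sees and multiplies by the length.

```python
import math
import string

# Hypothetical character classes; a real checker may define them differently.
CHAR_CLASSES = [string.digits, string.ascii_lowercase,
                string.ascii_uppercase, string.punctuation]

def naive_entropy_bits(password: str) -> float:
    """Estimate entropy as length * log2(total size of the character classes seen)."""
    charset_size = sum(len(cls) for cls in CHAR_CLASSES
                       if any(ch in cls for ch in password))
    if charset_size == 0:
        return 0.0
    return len(password) * math.log2(charset_size)

# A trivially guessable pattern and a random-looking string of digits
# receive exactly the same score.
print(round(naive_entropy_bits("01234567890123456789"), 1))  # 66.4
print(round(naive_entropy_bits("83920174650912837465"), 1))  # 66.4
```

The estimator cannot tell the two apart, because everything it looks at (length and character classes) is identical for both.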

I could get a lot more technical but in this case I think it's better to keep the answer simple. I believe this example should provide a sufficient answer to your question.


The tests for any policy are:

  • people know about it
  • people understand it
  • people know if they are complying with it
  • people know how to comply with it

Your approach is about 2 out of 4 on that scale for the average user.

The better option is to demand randomly generated passwords. That's easy to understand, easy to implement, and easy to provide processes and tools for ("just use this password manager").

With your approach, you are basically trying to get people to be their own random generator. This is going to result in a lot of trial and error as people try to figure out what password will pass the test. This will result in frustration and confusion.

But that's assuming that you are writing a policy for the average user and assuming your calculation of entropy is valid (which seems beside the point of your question right now, and I have some serious reservations about it).


A key thing to understand when selecting a password policy (or a password) is that entropy isn't a property of the password. It's a property of the method used to generate it. More generally, it's a property of probability distributions that tells us roughly how much additional information you would need to uniquely identify an element drawn from that distribution if you know what the distribution is. I go into a bit more detail in a previous answer if you're interested, but for passwords, this roughly means that if there are 2^n equally likely passwords that you might have generated, you have an entropy of n bits.
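
As a rough illustration of "entropy belongs to the method", the sketch below computes the entropy of two made-up generation methods that can both output a 20-digit password such as 01234567890123456789. The function name and both methods are assumptions for the sake of the example, and the formula only holds when every output of the method is equally likely.

```python
import math

def method_entropy_bits(equally_likely_outputs: int) -> float:
    """Entropy in bits of a method that picks uniformly among N possible passwords."""
    return math.log2(equally_likely_outputs)

# Method A: choose each of the 20 digits uniformly at random,
# giving 10**20 possible passwords.
print(round(method_entropy_bits(10 ** 20), 1))  # 66.4

# Method B (hypothetical): choose a random starting digit, then count upward
# for 20 digits, giving only 10 possible passwords.
print(round(method_entropy_bits(10), 1))  # 3.3
```

The same string can come out of either method, so looking at the string alone cannot tell you which of the two numbers applies.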

If the users generate their own passwords, you can't know what method they used. You can only set policies that make it more likely that the users will select a method that has high entropy. When doing so, you should keep in mind that users will generally find the laziest way of complying with a policy, which is why requiring that a password contain capital letters and numbers is basically the same as requiring that the first letter be capitalized and that there be a single digit at the end.

The best password policy I've seen is Stanford's, which makes the special character requirements less onerous the longer the password is, to encourage the use of long passphrases instead of Password1$. If the password contains fewer than 12 characters, it requires every sort of character type. This restriction is relaxed as length increases, and once the password contains at least 20 characters, there are no additional restrictions. (There is also no upper bound for password length. Nothing is more annoying than a password policy that forces me to use short passwords in the name of security.) It then suggests randomly selecting 4 words as an easy way to get passwords that long, which is a password generation method with high entropy.

Under this policy, the good approach is also the laziest one, which means the users might actually do it.
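
The "randomly select 4 words" suggestion can be sketched in a few lines of Python. This is only an illustration: the word list below is a tiny placeholder I made up, the function names are hypothetical, and the 7776-word list size used in the entropy calculation is an assumption (it is the size of a typical Diceware-style list), not something taken from Stanford's policy.

```python
import math
import secrets

# Tiny placeholder list for illustration only; a real list should contain
# thousands of words, otherwise the entropy is far too low.
WORDS = ["correct", "horse", "battery", "staple",
         "orbit", "maple", "tunnel", "velvet"]

def random_passphrase(words: list[str], count: int = 4) -> str:
    """Join `count` words chosen independently and uniformly with a CSPRNG."""
    return " ".join(secrets.choice(words) for _ in range(count))

def passphrase_entropy_bits(list_size: int, count: int = 4) -> float:
    """Entropy of the method: count * log2(list size)."""
    return count * math.log2(list_size)

print(random_passphrase(WORDS))
print(round(passphrase_entropy_bits(7776), 1))  # 51.7 for an assumed 7776-word list
```

The entropy figure depends only on the list size and the number of words, because the selection is uniform and independent. That is what makes this a method whose entropy you can actually state.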

Stanford Password Policy