Is using haveibeenpwned to validate password strength rational?

"Strong" has always been intended to mean "not guessable". Length and complexity help make a password harder to guess, but a long, complex password that is commonly used is just as weak as Pa$$w0rd.

If a password is in the HIBP list, then attackers know that people are more likely to choose it, and hence that it might be used again. So those lists will be tried first.

So, if your password is on the list, then it is "guessable".

If your password is not on the list, then it is not what others have chosen, so from a dictionary-attack standpoint it is, by implication (for as much as that's worth), "less guessable". Many other factors, of course, can make your password "more guessable" even if it is not on the HIBP list.

As always, a randomly generated password is the most "unguessable", and a randomly generated password of the maximum length a site allows is extremely difficult to brute-force. And if you are randomly generating it, then why not go max length?
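To make that concrete, here is a minimal sketch of generating such a password with Python's standard `secrets` module. The character set and the default length of 32 are my own assumptions, not anything HIBP or a particular site prescribes; in practice you would use whatever maximum the site allows.

```python
import math
import secrets
import string

# Assumed character set (not specified above): letters, digits, punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 32) -> str:
    """Draw `length` characters uniformly at random using the OS CSPRNG."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def search_space_bits(length: int, alphabet_size: int = len(ALPHABET)) -> float:
    """Size of the brute-force search space, expressed in bits."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    print(generate_password(32))
    print(f"{search_space_bits(32):.0f} bits of search space")
```

Every extra character multiplies the search space by the alphabet size, which is why going to the maximum length costs nothing when a password manager is doing the remembering.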


To answer this question properly, you need to think like the hacker who wants to work out your password.

But to avoid having to dive straight into a mathsy way of thinking, let's start instead by thinking about a competitor on the Lego Movie game show "Where are my pants?"

Obviously, when the competitor wants to find their clothes, the first thing they'll do is go to their wardrobe. If that doesn't prove fruitful, they might check their drawers, followed by the chair in the corner of the room, followed by the laundry basket, and perhaps the dog's basket if the dog is of the naughty pants-stealing sort. That'll all happen before they start looking in the fridge.

What's going on here is, of course, that the competitor looks in the most likely places first. They could have systematically worked through every square foot of the house in a grid, in which case they would on average have to check half the house. With this strategy, on the other hand, they have a good chance of finding the pants on the first go, and certainly wouldn't expect to cover half the house.

A hacker ideally wants to do the same thing. Suppose they know that the password they are after is 8 lowercase letters long. They could try working through them one at a time, but there are 208,827,064,576 possible options, so any single completely random guess has about a 1 in 208 billion chance of being right. On the other hand, it's well known that "password" is the most common password (except where it's banned). In fact, looking at the data from haveibeenpwned, the chance of the right answer being "password" is about 1 in 151. Not 151 billion, just 151. So that's over a billion times more likely than some random guess, and they'd be stupid not to start with it. (And obviously, since you want your password not to be found, you want to avoid picking what they'd start with.)
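If you want to check that arithmetic yourself, a few lines of Python will do it. The 1-in-151 figure is simply the one quoted above; everything else follows from 26 letters and 8 positions.

```python
ALPHABET_SIZE = 26   # lowercase letters only
LENGTH = 8

# Number of distinct 8-letter lowercase passwords.
total = ALPHABET_SIZE ** LENGTH

# Chance that one uniformly random guess is right.
p_random_guess = 1 / total

# Chance that the answer is "password" (figure quoted in the text above).
p_password = 1 / 151

# How many times more likely "password" is than a random guess.
advantage = p_password / p_random_guess

print(f"possible passwords: {total:,}")
print(f"'password' is about {advantage:,.0f} times more likely than a random guess")
```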

Now, the question is whether that generalises beyond "password". Is it worth their while working through a list of leaked passwords? For a bit of information, consider this quote from the original release write-up:

"I moved on to the Anti Public list which contained 562,077,488 rows with 457,962,538 unique email addresses. This gave me a further 96,684,629 unique passwords not already in the Exploit.in data. Looking at it the other way, 83% of the passwords in that set had already been seen before."

What that tells us is that, roughly speaking, a randomly selected password from a fresh breach has a better than 80% chance of already featuring in the list. The list has a few hundred million entries, compared with a few hundred billion options for random 8-letter passwords. So, roughly speaking, our hacker trying 8-letter passwords would have a 0.1% chance without the list in the time they could get an 80% chance with the list. Obviously they'd want to use it. And again, you might as well avoid it. After all, you still have hundreds of billions of options to choose from, and you can get thousands of billions by just going to nine letters!
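Here is the same comparison as a rough back-of-the-envelope calculation. The list size of 300 million is my assumption standing in for "a few hundred million", and the 83% hit rate is taken straight from the quote, so treat the output as an order-of-magnitude estimate only.

```python
LIST_SIZE = 300_000_000   # assumption: "a few hundred million" entries
LIST_HIT_RATE = 0.83      # share of passwords already seen, from the quote

SPACE_8 = 26 ** 8         # random 8-letter lowercase passwords
SPACE_9 = 26 ** 9         # random 9-letter lowercase passwords

# Fraction of the 8-letter space covered by the same number of blind guesses.
brute_force_hit_rate = LIST_SIZE / SPACE_8

# How much better the list does for the same amount of work.
list_advantage = LIST_HIT_RATE / brute_force_hit_rate

print(f"blind guessing, {LIST_SIZE:,} tries: {brute_force_hit_rate:.2%}")
print(f"leaked list,    {LIST_SIZE:,} tries: {LIST_HIT_RATE:.0%}")
print(f"list is roughly {list_advantage:,.0f}x more effective")
print(f"9-letter options: {SPACE_9:,}")
```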

That's the justification for checking the list.

Now, your first worry is that "there will always be very easy to crack passwords that aren't on the list." That may be true. For example, "kvym" is not on the list. It's only 4 letters. There are only about half a million passwords that are 4 lowercase letters or shorter, so if people are likely to prefer short passwords then a hacker would blaze through them in a fraction of the time it would take to finish the leaks list. It's likely that they'd try both.
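To see how small that space is, here is a sketch that counts it and then walks through it in the most naive order (shortest first, then alphabetical) until it reaches a target. The enumeration order is my assumption; a real attacker might order guesses differently, but the total cannot exceed the count.

```python
import itertools
import string

LETTERS = string.ascii_lowercase
MAX_LENGTH = 4

# Passwords of 1 to 4 lowercase letters: 26 + 26^2 + 26^3 + 26^4.
short_password_count = sum(len(LETTERS) ** n for n in range(1, MAX_LENGTH + 1))


def guesses_until(target: str, max_length: int = MAX_LENGTH) -> int:
    """Number of guesses a shortest-first, alphabetical sweep needs to hit target."""
    guesses = 0
    for length in range(1, max_length + 1):
        for candidate in itertools.product(LETTERS, repeat=length):
            guesses += 1
            if "".join(candidate) == target:
                return guesses
    raise ValueError("target is outside the enumerated space")


if __name__ == "__main__":
    print(f"short passwords in total: {short_password_count:,}")
    print(f"guesses needed for 'kvym': {guesses_until('kvym'):,}")
```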

The answer to that is obvious. Use both rules. Don't use a password that has appeared in a breach, and don't use a password that is very short. If you have a random password of any significant length, there are so many possibilities that a hacker has no shortcut to finding yours.
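Both rules together could be sketched like this. The breach check follows the shape of the Pwned Passwords range lookup, where only the first five hex characters of the SHA-1 hash are sent and the reply is a list of `SUFFIX:COUNT` lines. To keep the example offline, `fetch_range` is a hypothetical caller-supplied function, `make_fake_fetch` is a stand-in I made up for demonstration, and the minimum length of 12 is an assumption of mine rather than a recommendation from the text.

```python
from __future__ import annotations

import hashlib
from typing import Callable

MIN_LENGTH = 12  # assumed threshold, pick your own


def parse_range_response(body: str) -> dict[str, int]:
    """Parse 'SUFFIX:COUNT' lines into {suffix: count}."""
    counts = {}
    for line in body.splitlines():
        suffix, _, count = line.strip().partition(":")
        if suffix and count.isdigit():
            counts[suffix.upper()] = int(count)
    return counts


def breach_count(password: str, fetch_range: Callable[[str], str]) -> int:
    """Times the password appears in the breach data, looked up by hash prefix."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    return parse_range_response(fetch_range(prefix)).get(suffix, 0)


def validate(password: str, fetch_range: Callable[[str], str]) -> list[str]:
    """Return a list of problems; an empty list means both rules passed."""
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append(f"shorter than {MIN_LENGTH} characters")
    if breach_count(password, fetch_range) > 0:
        problems.append("appears in a known breach")
    return problems


def make_fake_fetch(breached: list[str]) -> Callable[[str], str]:
    """Offline stand-in for the range lookup (demonstration only)."""
    digests = [hashlib.sha1(p.encode("utf-8")).hexdigest().upper() for p in breached]

    def fetch(prefix: str) -> str:
        return "\r\n".join(f"{d[5:]}:1" for d in digests if d.startswith(prefix))

    return fetch


if __name__ == "__main__":
    fetch = make_fake_fetch(["password", "correcthorsebatterystaple"])
    for candidate in ["password", "kvym", "correcthorsebatterystaple"]:
        print(candidate, "->", validate(candidate, fetch) or "ok")
```

In a real deployment `fetch_range` would wrap an HTTP GET against the range endpoint; passing it in as a parameter keeps the validation logic testable without network access.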


It should definitely be one of your validation steps, but it can't be fully relied on.

Given that most users reuse passwords and build them from a relatively small base of words, a dictionary attack is a particularly effective means of guessing passwords. Since HIBP is regularly updated, it will contain many passwords in frequent use, and thus probable candidates that a dictionary attacker would try. Thus, it is a good starting point to check. However, just because your password is not in the list doesn't mean it won't be guessed easily. It's just that known passwords would be high on an attacker's list of passwords to try, along with text mined from the internet, combinations of words with digits/symbols, transpositions, etc. As more password leaks happen, HIBP and other such tools become more useful, and hackers' lists of passwords to try become more effective as well.
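As an illustration of why "not on the list" isn't enough, here is a rough sketch of an extra validation step that flags passwords which are just a dictionary word with common substitutions or padding. The substitution table, the padding rule and the tiny word set are all my own assumptions for the example; real cracking rule sets are far more extensive.

```python
from __future__ import annotations

import string

# Assumed, deliberately small substitution table (symbol or digit -> letter).
LEET = str.maketrans({
    "@": "a", "4": "a", "3": "e", "1": "l", "0": "o",
    "$": "s", "5": "s", "7": "t", "!": "i",
})

# Characters people commonly tack onto the start or end of a word.
PADDING = string.digits + string.punctuation


def base_forms(password: str) -> set[str]:
    """Candidate 'underlying words' after undoing simple mangling."""
    lowered = password.lower()
    stripped = lowered.strip(PADDING)
    return {lowered.translate(LEET), stripped.translate(LEET)}


def is_dictionary_variant(password: str, words: set[str]) -> bool:
    """True if the password reduces to a word in `words`."""
    return any(form in words for form in base_forms(password))


if __name__ == "__main__":
    words = {"password", "dragon"}  # hypothetical stand-in for a real word list
    for candidate in ["Pa$$w0rd", "dragon1!", "kvym"]:
        print(candidate, "->", is_dictionary_variant(candidate, words))
```

A check like this complements the breach lookup rather than replacing it: one catches what has actually leaked, the other catches what an attacker could trivially derive.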

I was quite surprised to find that some passwords I know to be easily guessed, and that are definitely in use on multiple sites, are not on the HIBP list, so I can vouch for it not being the sole determinant of password strength (just like the example in the question). However, if I have come up with what I think is a strong password and it's on the list, I would definitely not use it.