Passwords - any statistics on user behavior?

The big web password leaks (particularly RockYou, where the leak was of plaintext passwords, so even the strongest passwords are visible) have been analysed several times. See e.g. Imperva and Troy Hunt -- or just get hold of some of the password lists and do your own analysis to calculate entropy etc.
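If you do go the "do your own analysis" route, something like the following Python sketch is a reasonable starting point. It assumes you have a plaintext list with one password per line; the function names and the per-class pool sizes are my own choices, not taken from any of the analyses mentioned here. The "entropy" it computes is only the naive length times log2(pool size) figure, which overstates the strength of word-based passwords.

```python
import math
import string
from collections import Counter

# Assumed pool size per character class; "other" is taken to be the
# 32 ASCII punctuation characters (an illustrative simplification).
POOL = {"lower": 26, "upper": 26, "digit": 10, "other": 32}


def charset_classes(pw):
    """Return the set of character classes that appear in the password."""
    classes = set()
    for ch in pw:
        if ch in string.ascii_lowercase:
            classes.add("lower")
        elif ch in string.ascii_uppercase:
            classes.add("upper")
        elif ch in string.digits:
            classes.add("digit")
        else:
            classes.add("other")
    return classes


def naive_entropy_bits(pw):
    """Naive estimate: length * log2(size of the combined character pool)."""
    if not pw:
        return 0.0
    pool = sum(POOL[c] for c in charset_classes(pw))
    return len(pw) * math.log2(pool)


def class_mix_histogram(passwords):
    """Count how many passwords use 1, 2, 3 or 4 character classes."""
    return Counter(len(charset_classes(pw)) for pw in passwords if pw)


if __name__ == "__main__":
    # Made-up sample values, not taken from any leak.
    sample = ["password", "password1", "Tr0ub4dor&3"]
    for pw in sample:
        print(pw, round(naive_entropy_bits(pw), 1), sorted(charset_classes(pw)))
    print(class_mix_histogram(sample))
```

To run it over a real list, read the file line by line (stripping the trailing newline) and pass the resulting list to `class_mix_histogram`.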

Troy Hunt and a group from Cambridge both found significant password re-use across sites, of around 70%.
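A re-use figure of this kind can be estimated by matching accounts that appear in two different leaks. The sketch below is only an illustration of that idea, not the method used in either study: it assumes each leak has already been loaded into a dict mapping an account identifier (e.g. an email address) to a plaintext password, and `reuse_rate` is a hypothetical helper name.

```python
def reuse_rate(site_a, site_b):
    """Fraction of accounts present in both leaks that use the same password.

    site_a, site_b: dicts mapping account identifier -> plaintext password.
    Returns None when the two leaks share no accounts.
    """
    common = site_a.keys() & site_b.keys()
    if not common:
        return None
    reused = sum(1 for user in common if site_a[user] == site_b[user])
    return reused / len(common)


if __name__ == "__main__":
    # Made-up accounts and passwords, purely for illustration.
    leak_a = {"alice@example.com": "monkey12", "bob@example.com": "letmein"}
    leak_b = {"alice@example.com": "monkey12", "bob@example.com": "s3cret!"}
    print(reuse_rate(leak_a, leak_b))
```

Exact matching undercounts near-re-use (the same password with a different suffix, say), so treat the result as a lower bound.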

An interesting Cisco, RedJack and Florida State paper gives some stats on the charset mix/entropy of leaked passwords -- longer passwords tend to also have a greater charset mix -- and the effect of password policies such as "must contain a digit" on password strength. This analysis shows that most (>70%) users faced with a "must contain a digit" policy will use a simple numeric pre/suffix, with many of the remainder using l33t-speak substitutions (neither of which provides much protection from JtR-type tools); similarly, 30% of passwords containing a "special character" have just one, at the end. The paper also shows that the NIST "entropy model" is a poor indicator of password crackability in the wild, because it fails to account for the use of common words as the basis for the vast majority of passwords.
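You can check the "numeric pre/suffix" pattern on any plaintext list yourself. The following is a rough sketch, not the paper's methodology; the category names and the `digit_placement` and `placement_share` helpers are my own, and only ASCII digits are considered.

```python
import re
from collections import Counter

# Patterns over ASCII digits only; each must match the whole password.
SUFFIX = re.compile(r"[^0-9]+[0-9]+")
PREFIX = re.compile(r"[0-9]+[^0-9]+")
BOTH_ENDS = re.compile(r"[0-9]+[^0-9]+[0-9]+")


def digit_placement(pw):
    """Classify where the digits sit in a password."""
    if not re.search(r"[0-9]", pw):
        return "none"
    if re.fullmatch(r"[0-9]+", pw):
        return "all_digits"
    if SUFFIX.fullmatch(pw):
        return "suffix"
    if PREFIX.fullmatch(pw):
        return "prefix"
    if BOTH_ENDS.fullmatch(pw):
        return "both_ends"
    return "interior"  # digits mixed into the body, e.g. l33t-speak


def placement_share(passwords):
    """Return the fraction of passwords falling into each category."""
    counts = Counter(digit_placement(pw) for pw in passwords)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Made-up examples, not taken from any leak.
    for pw in ["password1", "123abc", "p4ssw0rd", "letmein"]:
        print(pw, digit_placement(pw))
```

Restricting the input to passwords that contain at least one digit gives a figure that can be compared with the >70% quoted above.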

That paper references another, which showed what we all know -- that password expiration policies result in users making small incremental changes to generate a new password each expiry -- and that this knowledge could be used by attackers to break "new" passwords given a previous one much faster than brute-force or dictionary attacks would allow. That paper tentatively recommends non-expiring passwords with much stricter length/complexity requirements (e.g. a dicewords-style passphrase).
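To get a feel for how small those incremental changes are, you can measure the edit distance between a user's old and new passwords. This is a minimal sketch using plain Levenshtein distance; it is not the transformation model from the paper, and the `is_incremental` helper and its threshold of 2 edits are assumptions chosen for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def is_incremental(old, new, max_edits=2):
    """True if the new password differs from the old one by only a few edits."""
    return 0 < edit_distance(old, new) <= max_edits


if __name__ == "__main__":
    # Made-up password pairs, purely for illustration.
    print(is_incremental("Summer2011", "Summer2012"))
    print(is_incremental("Summer2011", "correct horse battery"))
```

The same check could be applied at password-change time to reject a new password that is too close to the previous one, provided the old password is available in plaintext at that point (as it is when the user types it to authorise the change).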

In their OWASP 2011 presentation KoreLogic showed a slide with the "proportion cracked" for various (hashed/encrypted) password leaks, which suggests that less than 10%, and probably <2%, of users have passwords that are complex enough and long enough to resist a combination of dictionary, rainbow-table and brute-force attacks. We can also infer that brute-force attacks do noticeably worse than rainbow-table attacks: salting defeats precomputed rainbow tables, and the two examples on that slide that include salt have significantly lower proportions cracked than the plain MD5/no-salt cases.

Re: Do people use more secure passwords for their banking etc. accounts:

The KoreLogic analysis indicates that "corporate" passwords are rather more complex than typical "web passwords". This difference appears to be due to typical corporate password policies (e.g. mandating minimum length and charset usage), which both make the typical password more "complex" and lead to some commonly repeated password derivation patterns. I don't think we can assume that passwords on banking/financial sites will be any more complex in the absence of corporate-style policy enforcement.

The "blanked out" entry on the Hash EXchange screenshot in the KoreLogic presentation presumably relates to the "unnamed financial site". That might not be a bank, but the 70% cracked proportion indicates that, while users might be using somewhat stronger passwords there (compared with Gawker etc.), a large majority still use weak passwords.


This blog post from Troy Hunt gives an interesting analysis based on data from the Sony, Gawker, and other breaches.


One of the sites that seems useful is PasswordResearch.com - they have analysis sorted into:

  • User Password Practices
  • Authentication Policies, Practices, or Procedures
  • Password Lifetime Policies or Practices
  • Password Length Policies or Practices
  • Password Character Usage Policies or Practices
  • Authentication Related Criminal Incidents
  • Opinions on Authentication
  • Market Use of Authentication Technologies
  • Costs Associated with Authentication
  • Authentication Business Impacts