How can I encrypt data with a password, but allow said password to be reset

When you say that you are willing to reset the password using some piece of automated authentication, then what you're really saying is that there are two passwords: the "normal" password and the "authentication" password. The solution to your problem is to encrypt the file using a random key and then encrypt the key using each of the passwords.

As a concrete example:

  • User provides a password "Aw3som1"
  • User also provides his high school mascot: "the Chipmunks"

Just to make it really complete, let's assume your authentication scheme is very forgiving like many are. You would accept just "chipmunks" (or "CHIPMUNKS" or maybe even "chipmunk") rather than "the Chipmunks." But whatever it is, your scheme must be deterministic. Every possible answer you will accept must resolve to the same hash. In this case, I'm going to assume that you lowercase the security answer, remove articles, and reduce to singular if it's plural. Then you prepend the class of question. So your secondary password is something like "mascot:chipmunk".
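As a rough sketch of that normalization step (the function name, the article list, and the very naive singular rule are my own assumptions for illustration, not a recommendation for real plural handling):

```python
ARTICLES = {"a", "an", "the"}  # assumed list of articles to strip


def normalize_answer(question_class: str, answer: str) -> str:
    """Reduce a free-form security answer to one canonical string."""
    # Lowercase, split on whitespace, and drop articles.
    words = [w for w in answer.lower().split() if w not in ARTICLES]
    # Naive singularization: drop a single trailing "s" from the last word.
    # Irregular plurals ("wolves") or words ending in "s" need more care.
    if words and len(words[-1]) > 1 and words[-1].endswith("s"):
        words[-1] = words[-1][:-1]
    # Prepend the class of question.
    return f"{question_class}:{' '.join(words)}"


print(normalize_answer("mascot", "the Chipmunks"))
print(normalize_answer("mascot", "CHIPMUNKS"))
print(normalize_answer("mascot", "chipmunk"))
```

All three calls yield the same string, which is the property that matters: every answer you are willing to accept has to map to exactly one secondary password.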

You now generate a random 16-byte key and use it to encrypt the data. You then use normal password-based encryption techniques (e.g. a key derived with PBKDF2) to encrypt that key with each of your passwords above, and store both encrypted copies alongside the data. Then you throw away the key and the passwords.

When you need to do a reset, decrypt the real key with the provided password ("mascot:chipmunk") and re-encrypt the key with the new password (and "mascot:chipmunk" again).
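The setup and reset steps can be sketched roughly as follows. This is only an illustration using the Python standard library: since the standard library has no AES, the key is wrapped by XORing it with a PBKDF2-derived pad under a fresh salt and authenticating the result with HMAC. That wrapping is my stand-in; in a real system you would more likely use an established key-wrap or authenticated-encryption primitive. The names `wrap_key`, `unwrap_key`, `reset_password`, the slot layout, and the iteration count are all assumptions.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor; tune for your hardware


def _derive(password: str, salt: bytes):
    """PBKDF2 -> (16-byte wrapping pad, 16-byte MAC key)."""
    out = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt,
                              ITERATIONS, dklen=32)
    return out[:16], out[16:]


def wrap_key(data_key: bytes, password: str) -> dict:
    """Encrypt the 16-byte data key under one password."""
    assert len(data_key) == 16
    salt = secrets.token_bytes(16)  # fresh salt per wrap, so a pad is never reused
    pad, mac_key = _derive(password, salt)
    wrapped = bytes(a ^ b for a, b in zip(data_key, pad))
    tag = hmac.new(mac_key, salt + wrapped, hashlib.sha256).digest()
    return {"salt": salt, "wrapped": wrapped, "tag": tag}


def unwrap_key(slot: dict, password: str) -> bytes:
    """Recover the data key, or raise if the password is wrong."""
    pad, mac_key = _derive(password, slot["salt"])
    expected = hmac.new(mac_key, slot["salt"] + slot["wrapped"],
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, slot["tag"]):
        raise ValueError("wrong password or corrupted slot")
    return bytes(a ^ b for a, b in zip(slot["wrapped"], pad))


def reset_password(slots: dict, answer: str, new_password: str) -> dict:
    """Unlock with the security answer, then re-wrap under both secrets."""
    data_key = unwrap_key(slots["answer"], answer)
    return {"password": wrap_key(data_key, new_password),
            "answer": wrap_key(data_key, answer)}


# Setup: random data key, wrapped once per password.
data_key = secrets.token_bytes(16)
slots = {"password": wrap_key(data_key, "Aw3som1"),
         "answer": wrap_key(data_key, "mascot:chipmunk")}
# ... encrypt the data with data_key, store `slots`, then discard
# data_key and both passwords.

# Reset: the security answer unlocks the key; the data is never re-encrypted.
new_slots = reset_password(slots, "mascot:chipmunk", "N3wPassw0rd")
```

Note that the data itself never has to be re-encrypted on a reset; only the small wrapped-key records change.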

The one usability issue is that a password reset invalidates all the other security answers, and the user must reconfigure them all. If that's a problem, you could put all the security answers into a bundle and encrypt that using the same technique as the data. (i.e. the security answers are encrypted against all of the security answers.)

This approach of course creates two (or more) passwords that can unlock the data, so an attacker only has to guess the weakest of them, which dramatically cuts brute-force search time. You should consider that when scaling things. That said, your safety margins should generally be several orders of magnitude, so even a few passwords should be workable for many situations. Remember also that the security questions live in a tiny key space, particularly things like "mascot" or "make of car", which probably have only a few dozen likely values (unless you went to my high school, which had a truly bizarre mascot…). That just means that aggressively tuning PBKDF2 is even more important.
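To get a feel for how small that key space is, here is a back-of-the-envelope sketch. The candidate count of 50 and the iteration counts are made-up illustrative numbers, and the timing is measured on whatever machine runs the script; an attacker with dedicated hardware will be considerably faster.

```python
import hashlib
import time


def seconds_per_guess(iterations: int) -> float:
    """Time one PBKDF2 evaluation on this machine."""
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"mascot:chipmunk", b"example-salt", iterations)
    return time.perf_counter() - start


def exhaust_time(candidates: int, per_guess: float) -> float:
    """Worst-case seconds to try every candidate answer once."""
    return candidates * per_guess


for iterations in (10_000, 100_000, 1_000_000):
    per_guess = seconds_per_guess(iterations)
    total = exhaust_time(50, per_guess)  # 50 likely mascots: an assumption
    print(f"{iterations:>9} iterations: {per_guess:.4f} s/guess, "
          f"{total:.2f} s to try 50 answers")
```

Even at a high iteration count, a few dozen candidates fall in seconds to minutes, which is why the work factor has to be pushed as far as your login latency allows.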

Of course the existence of any kind of password reset system is going to make targeted attacks much easier. But that's true no matter how you implement the encryption.


But what if there isn't even a security question? What if you'll reset the password based on an email address alone? Well, then the email address is the password. That's problematic in the face of a stolen database, and it's hard to fix because there is no real secret (and encryption requires a secret). But all is not completely lost. One thing you can do is to never include the actual email address in the database. Think of the email address as a password (since it is in this case). Store a hash. You can't use a random per-user salt on it, since you have to find the record from the email address alone, but you could still use some single salt for the whole database.

So my email is [email protected]. You treat that as a password, salt it with "mygreatservice" and run it through PBKDF2. To log in, just treat it like a password again: hash what the user submits the same way and compare.
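A minimal sketch of that lookup hash, assuming PBKDF2-HMAC-SHA256, an iteration count I picked arbitrarily, and that addresses are trimmed and lowercased before hashing (that normalization is my assumption). The function name and the example address are hypothetical.

```python
import hashlib

SERVICE_SALT = b"mygreatservice"  # one fixed salt for the whole database
ITERATIONS = 200_000              # assumed work factor; tune for your hardware


def email_lookup_key(email: str) -> str:
    """Hash an email address so it can be stored and looked up without
    keeping the address itself in the database."""
    normalized = email.strip().lower()  # assumption: case-insensitive matching
    digest = hashlib.pbkdf2_hmac("sha256", normalized.encode("utf-8"),
                                 SERVICE_SALT, ITERATIONS)
    return digest.hex()


# At signup, store the hash; at login or reset, hash the submitted
# address the same way and look the record up by that value.
stored = email_lookup_key("user@example.com")
print(stored == email_lookup_key("  User@Example.com "))
```

Because the salt is fixed, the same address always hashes to the same value, which is what makes the lookup possible and also why an attacker can test one guessed address against the whole database at once.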

An attacker, though, has to guess email addresses one at a time. That's not incredibly hard, but he can't just decrypt the entire database in one go, and he'll never get the data from emails he doesn't guess. (And you did choose a really easy reset mechanism.)
