How to search a MySQL database with encrypted fields

Encrypted fields are obviously not meant to be viewed, so searching on them is going to be problematic.

One trick I have used in the past is to hash the plaintext value before encrypting it, and store the hash in an indexed column. Of course, this only works if you are searching on the whole value; partial values will not have the same hash.
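Here is a minimal sketch of that idea. It uses Python's built-in sqlite3 as a stand-in for MySQL so it runs anywhere; the table name, column names and function names are made up for illustration, SHA-256 is just one possible choice of hash, and the ciphertext is treated as opaque bytes produced by whatever encryption you already use.

```python
import hashlib
import sqlite3


def search_hash(plaintext: str) -> str:
    """Unsalted SHA-256 of the whole plaintext value, used only for lookups."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def make_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: the encrypted value plus an indexed hash column.
    conn.execute(
        "CREATE TABLE secrets ("
        " id INTEGER PRIMARY KEY,"
        " ciphertext BLOB NOT NULL,"
        " value_hash TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_secrets_value_hash ON secrets (value_hash)")


def store(conn: sqlite3.Connection, plaintext: str, ciphertext: bytes) -> None:
    # The hash is computed from the plaintext; the ciphertext comes from
    # your own encryption routine (not shown here).
    conn.execute(
        "INSERT INTO secrets (ciphertext, value_hash) VALUES (?, ?)",
        (ciphertext, search_hash(plaintext)),
    )


def find(conn: sqlite3.Connection, plaintext: str) -> list:
    # Exact-match search: hash the search term and hit the index.
    cur = conn.execute(
        "SELECT id, ciphertext FROM secrets WHERE value_hash = ?",
        (search_hash(plaintext),),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_table(conn)
    store(conn, "alice@example.com", b"<opaque ciphertext>")
    print(find(conn, "alice@example.com"))  # one matching row
    print(find(conn, "alice"))              # no rows: partial values do not match
```

The second lookup shows the limitation: a partial value hashes to something completely different, so only whole-value searches work.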

You could probably extend this by making a "full text" index of hashes, if you needed to, but it could get complicated really fast.
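One way this might look, purely as a sketch: split the plaintext into words before encrypting, hash each word, and keep the word hashes in a separate indexed table. Again sqlite3 stands in for MySQL, and the table names, the tokenizing rule and the function names are all my own assumptions.

```python
import hashlib
import re
import sqlite3


def word_hash(word: str) -> str:
    """Unsalted SHA-256 of a single lower-cased word."""
    return hashlib.sha256(word.lower().encode("utf-8")).hexdigest()


def token_hashes(text: str) -> set:
    """Hashes of every distinct word in the text."""
    return {word_hash(w) for w in re.findall(r"\w+", text)}


def make_tables(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: encrypted rows plus a side table of word hashes.
    conn.execute(
        "CREATE TABLE notes (id INTEGER PRIMARY KEY, ciphertext BLOB NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE note_tokens ("
        " note_id INTEGER NOT NULL,"
        " token_hash TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_note_tokens_hash ON note_tokens (token_hash)")


def store(conn: sqlite3.Connection, plaintext: str, ciphertext: bytes) -> int:
    cur = conn.execute("INSERT INTO notes (ciphertext) VALUES (?)", (ciphertext,))
    note_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO note_tokens (note_id, token_hash) VALUES (?, ?)",
        [(note_id, h) for h in token_hashes(plaintext)],
    )
    return note_id


def find_word(conn: sqlite3.Connection, word: str) -> list:
    """Ids of the rows whose plaintext contained the given whole word."""
    cur = conn.execute(
        "SELECT DISTINCT note_id FROM note_tokens"
        " WHERE token_hash = ? ORDER BY note_id",
        (word_hash(word),),
    )
    return [row[0] for row in cur]
```

Even this small version needs a second table, a tokenizing rule and case handling, and it still only matches whole words. It also stores a hash of every individual word, which are far easier to guess than whole values.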

ADDENDUM

It's been suggested that I add a footnote to my answer per a fairly lengthy debate in chat about vulnerability to dictionary attacks, so I will discuss this potential security risk to the above approach.

Dictionary Attack: A dictionary attack is when someone pre-hashes a list of known values and compares those hashes to the hashed column in your database. If they find a match, it's likely that the known value is what was actually hashed (it's not definite, though, because hashes are not guaranteed to be unique). This is usually mitigated by hashing the value with a random "salt" appended or prepended so the hash will not match the dictionary, but the above answer cannot use a salt because you would lose the searchability.
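To make that concrete, here is a small sketch of the attack and of why a salt stops it. The function names and sample values are invented for illustration, and SHA-256 is assumed as the hash.

```python
import hashlib
import os


def unsalted_hash(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def salted_hash(value: str, salt: bytes) -> str:
    # Salt prepended to the value before hashing.
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


def dictionary_attack(stored_hashes, candidates) -> dict:
    """Map each stored hash that matches a pre-hashed candidate to that candidate."""
    lookup = {unsalted_hash(c): c for c in candidates}
    return {h: lookup[h] for h in stored_hashes if h in lookup}


if __name__ == "__main__":
    dictionary = ["123456", "password", "hunter2"]

    # Unsalted column: the attacker recovers any value that is in the dictionary.
    column = [unsalted_hash("hunter2"), unsalted_hash("something unusual")]
    print(dictionary_attack(column, dictionary))

    # Salted column: the pre-built dictionary no longer matches anything.
    salt = os.urandom(16)
    print(dictionary_attack([salted_hash("hunter2", salt)], dictionary))
```

The same property that defeats the attacker also defeats the search: with a random salt per row, hashing the search term no longer produces the value stored in the indexed column.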

This attack is dangerous when dealing with things like passwords: if you create a dictionary of popular password hashes, you can then quickly search the table for that hash value and identify a user that has such a password and effectively extract credentials to steal that user's identity.

It is less dangerous for items with high cardinality, like SSNs, credit card numbers, GUIDs, etc. (but there are different risks [read: legal] associated with storing these, so I am not inclined to advise on storing them).

The reason is that for a dictionary attack to work, you need to have pre-built a dictionary of possible values and their hashes. You could, in theory, build a dictionary of all possible SSNs (a billion rows, assuming all formatting permutations are removed; multiple dozens of trillions of entries for credit cards), but that's not usually the point of a dictionary attack, and it basically becomes a brute-force attack where you systematically try every value.

You could also look for a specific SSN or credit card number, if you're trying to match an SSN to a person. Again, that's usually not the point of a dictionary attack, but it is possible, so if this is a risk you need to avoid, my answer is not a good solution for you.

So there you have it. As with all encrypted data, it's usually encrypted for a reason, so be aware of your data and what you are trying to protect it from.


You may want to take a look at CryptDB. It's a front end for MySQL and PostgreSQL that allows transparent storage and querying of encrypted data. It works by encrypting and decrypting data as it passes between the application and the database, by rewriting queries to operate on the encrypted data, and by dynamically adjusting the encryption mode of each column to expose only as much information as needed for the queries the application uses.

The various encryption methods used by CryptDB include:

  • RND, a fully IND-CPA secure encryption scheme which leaks no information about the data (except its presence and, for variable-length types, length) but only allows storage and retrieval, no queries.

  • DET, a variant of RND which is deterministic, so that two identical values (in the same column) encrypt to the same ciphertext. Supports equality queries of the form WHERE column = 'constant'.

  • OPE, an order-preserving encryption scheme that supports inequality queries such as WHERE column > 'constant'.

  • HOM, a partially homomorphic encryption scheme (Paillier) which allows adding encrypted values together by multiplying the ciphertexts. Supports SUM() queries, addition and incrementing.

  • SEARCH, a scheme that supports keyword searches of the form WHERE column LIKE '% word %'.

  • JOIN and OPE-JOIN, variants of DET and OPE that allow values in different columns to be compared with each other. Support equality and range joins respectively.
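To give a feel for the HOM item above, here is a toy Paillier sketch showing that multiplying two ciphertexts yields an encryption of the sum of the plaintexts. This is not CryptDB's code: the primes are tiny and hard-coded purely for illustration (real keys use primes of 1024 bits or more), and all names are my own.

```python
import math
import secrets

# Toy key material. These small primes offer no security at all.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
G = N + 1                                          # common choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                               # valid because G = N + 1 (Python 3.8+)


def encrypt(m: int) -> int:
    """Randomized encryption of an integer 0 <= m < N."""
    while True:
        r = secrets.randbelow(N - 1) + 1           # random r in 1..N-1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    return ((pow(c, LAM, N2) - 1) // N) * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N2


if __name__ == "__main__":
    total = add_encrypted(encrypt(1200), encrypt(34))
    print(decrypt(total))  # sum of the two plaintexts, computed without decrypting either
```

A SUM() over such a column can therefore be computed by the server as a product of ciphertexts, with only the final result decrypted on the client side.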

The real power of CryptDB is that it adapts the encryption method of each column dynamically to the queries it sees, so that the slower and/or less secure schemes are only used for columns which require them. There are also various other useful features, such as chaining encryption keys to user passwords.

If you're interested, you're well advised to take a look at the papers linked from the CryptDB website, particularly "CryptDB: Protecting Confidentiality with Encrypted Query Processing" by Popa, Redfield, Zeldovich and Balakrishnan (SOSP 2011). Those papers also describe the various security and performance tradeoffs involved in supporting different query types in more detail.


I don't understand why the current answers haven't questioned the requirements fully, so I'll ask and leave it as an answer.

What are the business reasons? What data do you need to encrypt and why? If you're looking for PCI compliance, I could write an essay.

Questions about your requirement:

  • Will you need to return an exists/not exists result, or the actual data?
  • Do you require a LIKE '%OMG_SEKRIT%' capability?
  • Who cannot see the data and why?

RDBMS security is normally done on a permissions basis that is enforced by user/role. The data is normally encrypted by the RDBMS on disk, but not in the columnar data itself, as that doesn't really make any sense for an application designed to efficiently store and retrieve data.

Restrict by user/role/api. Encrypt on disk. If you're storing more important data I'd love to know why you're using MySQL.
