Encrypting IP addresses in a MySQL database

Public / private key encryption is slow. It's ideal for write it & forget it, but it's not a fast process.

For the sake of speed, consider this: use a randomly generated symmetric key that is chosen when the daemon starts and possibly is rotated out every hourly, daily, weekly, or monthly (depending on your traffic and security needs). Leave that key in memory only and encrypt that symmetric key to a database table with the public key of your keypair. Then you can recover the key, but an attacker can't get it from except by dumping the processes memory. Even then, the attacker can only use the most recent key. You will likely want to use something to link the keys to the data or you'll have to brute-force your list of keys against each record. That recording could be as simple as start / stop time if your logs are timestamped, or sequence numbers, or starting encrypted entries with a key id field, etc.

Regardless of what method you use, pad your IP before encrypting it. I see that since you updated your answer this is mentioned, but it's definitely important for anybody referencing this. That will prevent indexing, but it will also prevent any "pigeonhole" issues and the same IP always having the same encrypted output.


Your scheme is a good solution. Nicely done!

A possible performance optimization. If you need a way to make it more efficient, there are ways to improve the efficiency. For instance, here is one. The application can pick a random symmetric key K, encrypt K under the public key, then encrypt the IP address under K, and store the encrypted IP address and the encrypted value of K in the database.

Later when you export the logs, if you want to ban an IP address, since you have the private key, you can recover the corresponding value of K, then decrypt using K to recover the IP address, and add that IP address in cleartext to the ban list.

How does this improve performance? The key trick is that you can reuse the same value of K for many different IP addresses. When your launches, it can pick K, encrypt it under the public key, and cache the value of K and the encryption of K. Now it can reuse that for many log entries. You can periodically clear the cache and create a new K if you want. (Just make sure to store the cached value of K only in memory, not in the database.) This way, you only have to do a single public-key encryption every so often. Each time you want to log a new IP address, just need to do a symmetric-key encryption of it under K, which should be very efficient.

Why the optimization may be unnecessary. That said, if you choose your public-key encryption algorithm properly, even your basic scheme might already be very efficient. For instance, if you use RSA encryption using public exponent e=3, then encryption is very fast, and the scheme you propose might be plenty fast enough.


Your description is not complete, because you do not tell who/what is going to "validate IP addresses against the encrypted form". From your update, I guess that you want:

  • to be able to log incoming IP addresses, in a way such that an attacker gaining full read access on the server files would not be able to rebuild the logged IP addresses, but such that an authorized administrator (you) could do it;
  • to maintain a set of "banned" IP addresses, possibly including ranges, and the server must be able to match an IP address against that set.

Since you (understandably) do not want to type a passphrase or insert a key every time the server restarts, you must assume that an attacker gaining full read access on the complete server files knows everything that the server knows right after a fresh start. In particular, the attacker can then simulate the server on his own machine. Since such a server can decide whether a given IP address is banned or not, the attacker can easily obtain the same information with his simulation (there are only four billions of IPv4 addresses, after all). Corollary: you may as well keep the banned (ranges of) addresses in cleartext, since you cannot make it more secure anyway.

For the logging use, asymmetric encryption looks like the right tool. Proper asymmetric encryption already includes random padding (to avoid exhaustive search on the data itself -- there really are only four billions or so of IPv4 addresses). RSA-4096 is complete overkill, though; 1024 bits would be enough for low-value information such as IP addresses (see this answer for details). Yet again, RSA-1024 means that each encrypted IP address will use 128 bytes, and you might want to optimize things a bit, to save space.

You could, for instance, do hybrid encryption with Elliptic Curve Diffie-Hellman: to encrypt an IPv4 address, do an ECDH with regards to a public key (same principle than your RSA key: the administrator alone keeps the private key), and use the resulting shared secret to encrypt the IPv4 address. Use Format Preserving Encryption to encrypt the IPv4 address into four bytes (e.g. with the Thorp shuffle, or, simpler, with a stream cipher initialized with the DH key). For "128-bit security", you need a 256-bit elliptic curve, and an ECDH public key can then be encoded over 32 bytes. Together with the encrypted IP address, that's 36 bytes per address.

To reduce it further, apply @D.W.'s advice: do the ECDH thing once upon server start, and mutualize the shared key usage. You would have a table of ECDH public keys, which grows slowly (one new entry per reboot), and each encrypted IP address would just reference the key to use (a 32-bit field would be sufficient). Note, though, that in this case, you must take care to randomize the symmetric encryption of the IP addresses. For instance, pad the IP address (4 bytes) with 12 bytes of random, and encrypt the 16 bytes as one block with AES. Together with the public key reference, this lower cost downs to about 20 bytes per IP. Without randomization, an attacker could deduce which IP addresses are repeated through the logs. Since this scheme reuses the symmetric key, the encryption must be done with more care, too (e.g. a stream cipher like RC4 would become a very bad idea).

About CPU performance: as @D.W. points out, this is a non-issue. With ECDH, you are below 0.1ms per IP address. RSA encryption will be even below such costs. Which is why I concentrate on space performance above, not CPU.