Why do you need message authentication in addition to encryption?

Encryption DOES NOT automatically protect the data against modification.

For example, let's say we have a stream cipher that is simply a PRNG (pseudorandom number generator), where the key is the seed. Encryption works by generating random numbers in sequence (the keystream) and exclusive-or'ing them with the plaintext. If an attacker knows some plaintext and ciphertext bytes at a particular point, he can xor them together to recover the keystream for those bytes. From there, he can simply pick some new plaintext bytes, xor them with the keystream, and substitute the result for the original ciphertext bytes.
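To make that concrete, here is a minimal sketch of the attack against exactly this kind of toy cipher. Everything in it is an assumption for illustration: Python's random module stands in for the PRNG, and the seed and messages are made up. It is not how any real cipher works.

```python
import random

def keystream(seed: int, length: int) -> bytes:
    # Toy keystream from a seeded PRNG, as in the example above. NOT secure.
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(length))

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(seed: int, data: bytes) -> bytes:
    # XOR with the keystream; decryption is the same operation.
    return xor_bytes(data, keystream(seed, len(data)))

secret_seed = 123456  # hypothetical key, known only to the legitimate parties
known_plaintext = b"PAY 0100 TO BOB"
ciphertext = encrypt(secret_seed, known_plaintext)

# Attacker: knows known_plaintext and ciphertext, but not secret_seed.
recovered_keystream = xor_bytes(known_plaintext, ciphertext)
forged_plaintext = b"PAY 9999 TO EVE"
forged_ciphertext = xor_bytes(forged_plaintext, recovered_keystream)

# The receiver decrypts the forgery and has no way to notice the swap.
received = encrypt(secret_seed, forged_ciphertext)
print(received)
```

The attacker never touches the seed. Knowing the plaintext at those positions is enough.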

Often the attacker need not know the plaintext to achieve something. Let's take an example where an attacker simply needs to corrupt one particular field in a packet's internal data. He does not know what its value is, but he doesn't need to. Simply by replacing those bytes of ciphertext with random numbers, he has changed the plaintext.

This is particularly interesting in block ciphers where padding is used, as it opens us up to padding oracle attacks. These attacks involve tweaking ciphertext in a way that alters the padding string, and observing the result. Other attacks such as BEAST and the Lucky Thirteen Attack involve modification of ciphertext in a similar way. These tend to rely on the fact that some implementations blindly decrypt data before performing any kind of integrity checks.

Additionally, it may be possible to re-send an encrypted packet, which might cause some behaviour on the client or server. An example of this might be a command to toggle the enabled state of the firewall. This is called a replay attack, and encryption on its own will not protect against it. In fact, integrity checks often don't fix this problem either.
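As a rough sketch of why an integrity check alone doesn't help here, and of one common countermeasure (a sequence number covered by the MAC), consider the following. The key, the packet contents and the Receiver class are all hypothetical, and the packet stands in for an encrypted command.

```python
import hashlib
import hmac

MAC_KEY = b"hypothetical shared MAC key"

def tag(message: bytes) -> bytes:
    return hmac.new(MAC_KEY, message, hashlib.sha256).digest()

def verify(message: bytes, mac: bytes) -> bool:
    return hmac.compare_digest(tag(message), mac)

# A packet captured by the attacker, together with its valid MAC.
packet = b"toggle-firewall"
packet_mac = tag(packet)

# Without replay protection, every re-send verifies just like the original.
first_delivery = verify(packet, packet_mac)
replayed_delivery = verify(packet, packet_mac)

class Receiver:
    """Accepts each sequence number at most once, in increasing order."""

    def __init__(self) -> None:
        self.last_seq = -1

    def accept(self, seq: int, message: bytes, mac: bytes) -> bool:
        # The sequence number is covered by the MAC, so it cannot be altered.
        signed = seq.to_bytes(8, "big") + message
        if not verify(signed, mac):
            return False
        if seq <= self.last_seq:
            return False  # replayed or stale packet
        self.last_seq = seq
        return True

receiver = Receiver()
seq = 0
mac_with_seq = tag(seq.to_bytes(8, "big") + packet)
accepted_once = receiver.accept(seq, packet, mac_with_seq)
accepted_again = receiver.accept(seq, packet, mac_with_seq)
print(first_delivery, replayed_delivery, accepted_once, accepted_again)
```

The plain MAC check passes both times. Only the receiver that remembers the last sequence number rejects the second copy.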

There are, in fact, three primary properties that are desirable in a secure communications scheme:

  • Confidentiality - The ability to prevent eavesdroppers from discovering the plaintext message, or information about the plaintext message (e.g. its Hamming weight).
  • Integrity - The ability to prevent an active attacker from modifying the message without the legitimate users noticing. This is usually provided via a Message Integrity Code (MIC).
  • Authenticity - The ability to prove that a message was generated by a particular party, and prevent forgery of new messages. This is usually provided via a Message Authentication Code (MAC). Note that authenticity automatically implies integrity.

The fact that the MAC and MIC can be provided by a single appropriately chosen HMAC hash scheme (sometimes called a MAIC) in certain circumstances is completely incidental. The semantic difference between integrity and authenticity is a real one, in that you can have integrity without authenticity, and such a system may still present problems.

The real distinction between integrity and authenticity is a tricky one to define, as Thomas Pornin pointed out to me in chat:

There's a tricky definition point there. Integrity is that you get the "right data", but according to what notion of "right" ? How comes the data from the attacker is not "right" ? If you answer "because that's from the attacker, not from the right client" then you are doing authentication...

It's a bit of a grey-area, but either way we can all agree that authentication is important.

An alternative to using a separate MAC / MIC is to use an authenticated block cipher mode, such as Galois/Counter Mode (GCM) or EAX mode.


Encryption and decryption just transform bytes. You say that when the password is wrong "the message simply would not decrypt", but that's not true: the result will just not be the same as the original. Try some online encryption tool like this one. If you encrypt "example" with the password testonetwothree1, you get Rg2iS8PvYsIUgmEynHP62g== as the result. If you now decrypt the same ciphertext with the password testonetwothree4, you get "JÙ] i.¦WÆÏ*q" as the result.

So why does that matter? If the decryption is garbage, how is that useful to the attacker?

Imagine you have a message that says "attack at 10:00". Encrypted with Caesar encryption, it is something like aWJiaWtzIGliIDEzOjAw. Your enemy might know that you are sending the attack message, but does not know the password. What they can do, though, is change a byte. If the message is changed to aWJiaWtzIGliIDazOjAw (the E is replaced by a), it suddenly says: "attack at 03:00". Similar scenarios could be encrypted commands, which an attacker can use to change the behaviour of the receiving computer in certain ways.

As you see, modifying the ciphertext, even blindly, can be an advantage to the attacker. You want to have authenticated encryption, where a modification is detected.

This does not just apply to Caesar: some encryption methods allow you to change individual bytes (stream ciphers or a block cipher in CTR mode). Others allow you to change only a whole block (most other block cipher modes). This is not a flaw in the encryption, but a result of not authenticating your encryption. If you can't check what it should have been, you can't blame the encryption (which just provides confidentiality, not integrity) if the result turns out to be different. If you want both, you need to use an algorithm that does both.

There are two ways of doing this:

  • Authenticated encryption algorithms do everything at once. You give them a key and some data, and out comes a piece of ciphertext that, when changed, will be detected. Examples include GCM mode and OCB mode.
  • Adding an authentication code, such as an HMAC. Note that there is a certain order in which you should do things, and that you need to be careful about timing attacks, so this is a little trickier than the dedicated algorithms. I mention it because it's the traditional way of doing things, and newer algorithms might not be available in your favourite crypto libraries.
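For the second option, here is a minimal sketch using only Python's standard library. It assumes the commonly recommended encrypt-then-MAC order (the tag is computed over the ciphertext) and a MAC key that is separate from the encryption key. The function names and the placeholder ciphertext are made up for illustration, and the encryption step itself is left out.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # HMAC-SHA256 output size in bytes

def protect(mac_key: bytes, ciphertext: bytes) -> bytes:
    # Encrypt-then-MAC: the tag is computed over the ciphertext.
    mac = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + mac

def check(mac_key: bytes, blob: bytes) -> bytes:
    # Returns the ciphertext if the tag is valid; raises ValueError otherwise.
    if len(blob) < TAG_LEN:
        raise ValueError("message too short")
    ciphertext, mac = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # compare_digest is designed to take the same time wherever the mismatch is.
    if not hmac.compare_digest(expected, mac):
        raise ValueError("authentication failed")
    return ciphertext

mac_key = secrets.token_bytes(32)  # separate from the encryption key
ciphertext = bytes.fromhex("460da24bc3ef62c2")  # placeholder ciphertext bytes
blob = protect(mac_key, ciphertext)

# Flip a single bit, as an attacker might do in transit.
tampered = bytearray(blob)
tampered[0] ^= 0x01
try:
    check(mac_key, bytes(tampered))
    tamper_detected = False
except ValueError:
    tamper_detected = True
print(tamper_detected)
```

The receiver should call check before decrypting anything, and only hand the returned ciphertext to the decryption routine if no exception was raised.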

My understanding is that simply encrypting the data, even using a symmetric shared key, with something like AES or 3DES should be sufficient to verify the data has not been tampered with in transit. If it had been, the message simply would not decrypt.

Your understanding is wrong.

Why would a message not decrypt if someone flipped a bit?

Even this Wikipedia page clearly says:

The block cipher modes ECB, CBC, OFB, CFB, CTR, and XTS provide confidentiality, but they do not protect against accidental modification or malicious tampering.

I'm linking to that page because the answer does depend on the mode of operation, and some (like GCM) do have built-in MACs.

More specifically, unless there is a checksum or signature, an encryption algorithm is basically converting bytes into other bytes. Just like the Caesar cipher would turn A => K and B => L, modern encryption does the same thing, only in a more complex way. And if someone changes the K in the ciphertext to an L, then the decryption will happily decrypt it to a B instead of the original A.
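A tiny sketch of that A => K example, assuming a Caesar shift of 10 over the uppercase alphabet (the helper name is made up):

```python
import string

ALPHABET = string.ascii_uppercase
SHIFT = 10  # maps A -> K and B -> L

def caesar(text: str, shift: int) -> str:
    # Shift each uppercase letter; leave everything else untouched.
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)

ciphertext = caesar("A", SHIFT)       # the sender encrypts A
tampered = "L"                        # someone changes the K to an L
decrypted = caesar(tampered, -SHIFT)  # decryption raises no error
print(ciphertext, decrypted)
```

Decryption has no notion of a "wrong" ciphertext here. It just maps whatever letter it is given back by ten places.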

Malicious attacks of this kind are often infeasible simply because, without knowledge of the plaintext and the key, it is very hard for Eve to know which bits to flip in order to get a predictable change in the decrypted message. But without a MAC, nothing stops her from flipping bits at random and hoping for the best.