Why is AES not used for secure hashing, instead of SHA-x?

A block cipher has a key; the secrecy of the key is what the cipher security builds on. On the other hand, a hash function has no key at all, and there is no "secret data" on which security of the hash function is to be built.

A block cipher is reversible: if you know the key, you can decrypt what was encrypted. Technically, for a given key, a block cipher is a permutation of the space of possible block values. Hash functions are meant to be non-reversible, and they are not permutations in any way.

A block cipher operates on fixed-sized blocks (128-bit blocks for AES), both for input and output. A hash function has a fixed-sized output, but should accept arbitrarily large inputs.

So block ciphers and hash functions are really different animals; rather than trying to differentiate them, it is easier to see what they have in common: namely, that the people who know how to design a block cipher are also reasonably good at designing hash functions, because the mathematical tools used for analysis are similar (quite a lot of linear algebra and boolean functions, really).

Let's go for more formal definitions:

A block cipher is a family of permutations selected by a key. We consider the space B of n-bit blocks for some fixed value of n; the size of B is then 2^n. Keys are values from a space K, usually another space of sequences of m bits (m is not necessarily equal to n). A key k selects a permutation among the (2^n)! possible permutations of B.
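To make the "family of permutations" idea concrete, here is a minimal sketch on a toy block space with n = 3. The `toy_permutation`, `encrypt` and `decrypt` helpers are hypothetical names invented for this illustration, and seeding a shuffle with the key is in no way a secure cipher; it only shows that each key selects one permutation of B and that the permutation can be inverted.

```python
import random

N_BITS = 3                           # toy block size n; AES uses n = 128
BLOCKS = list(range(2 ** N_BITS))    # the space B, of size 2^n

def toy_permutation(key: int) -> list:
    """Return the permutation of B selected by `key` (hypothetical toy cipher)."""
    table = BLOCKS.copy()
    random.Random(key).shuffle(table)    # the key deterministically picks one permutation
    return table

def encrypt(key: int, block: int) -> int:
    return toy_permutation(key)[block]

def decrypt(key: int, block: int) -> int:
    return toy_permutation(key).index(block)

key = 42
ciphertexts = [encrypt(key, b) for b in BLOCKS]
# Every block value is produced exactly once: the mapping is a permutation of B.
print(sorted(ciphertexts) == BLOCKS)
# Knowing the key, decryption inverts encryption on every block.
print(all(decrypt(key, encrypt(key, b)) == b for b in BLOCKS))
```

Both checks print `True` for any key, which is exactly the reversibility that a hash function is not supposed to have.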

A block cipher is deemed secure as long as it is computationally indistinguishable from a permutation which has been chosen uniformly and randomly among the (2^n)! possible permutations. To model that, imagine a situation where an attacker is given access to two black boxes, one implementing the block cipher with a key that the attacker does not know, and the other being a truly random permutation. The goal of the attacker is to tell which is which. He can have each box encrypt or decrypt whatever data he wishes. One possible attack is to try all possible keys (there are 2^m such keys) until one is found which yields the same values as one of the boxes; this has an average cost of 2^(m-1) invocations of the cipher. A secure block cipher is one such that this generic attack is the best possible attack.
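The generic key-search attack can be simulated on a deliberately tiny cipher. The sketch below assumes a hypothetical 4-round Feistel cipher on 32-bit blocks with m = 12 key bits (the names `round_fn`, `toy_encrypt` and `exhaustive_search` are made up for this example, and SHA-256 is used only as a convenient round function). It counts how many keys are tried before one reproduces the known plaintext/ciphertext pairs.

```python
import hashlib
import random

KEY_BITS = 12   # toy key size m; AES has m = 128, 192 or 256

def round_fn(key: int, rnd: int, half: int) -> int:
    """Round function of the toy cipher: 16 bits derived from key, round index and half-block."""
    data = key.to_bytes(2, "big") + bytes([rnd]) + half.to_bytes(2, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")

def toy_encrypt(key: int, block: int) -> int:
    """Hypothetical 4-round Feistel cipher on 32-bit blocks (illustration only)."""
    left, right = block >> 16, block & 0xFFFF
    for rnd in range(4):
        left, right = right, left ^ round_fn(key, rnd, right)
    return (left << 16) | right

def exhaustive_search(known_pairs):
    """Try keys 0, 1, 2, ... until one reproduces every known pair."""
    for trials, guess in enumerate(range(2 ** KEY_BITS), start=1):
        if all(toy_encrypt(guess, p) == c for p, c in known_pairs):
            return guess, trials
    raise ValueError("no key matches the known pairs")

rng = random.Random(2011)
costs = []
for _ in range(20):
    secret = rng.randrange(2 ** KEY_BITS)                    # key unknown to the attacker
    pairs = [(p, toy_encrypt(secret, p)) for p in (0, 1)]    # answers obtained from the black box
    found, trials = exhaustive_search(pairs)
    costs.append(trials)

print("average trials:", sum(costs) / len(costs), "vs 2^(m-1) =", 2 ** (KEY_BITS - 1))
```

Averaged over many random keys, the count should land near 2^(m-1) = 2048, though the exact figure depends on the keys drawn. With m = 128 the same loop is hopeless, which is the whole point.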

The AES is defined over 128-bit blocks (n = 128) and 128-, 192- and 256-bit keys.

A hash function is a single, fully defined, computable function which takes as input bit sequences of arbitrary length, and outputs values of a fixed length r (e.g. r = 256 bits for SHA-256). There is no key, no family of functions, just a unique function which anybody can compute.
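As a quick illustration of "arbitrary input length, fixed output length, no key", the following sketch uses SHA-256 from Python's standard `hashlib` module. The sample messages are arbitrary choices for the example.

```python
import hashlib

# Inputs of very different sizes; no key is supplied anywhere.
messages = (b"", b"abc", b"x" * 1_000_000)

digests = [hashlib.sha256(m).digest() for m in messages]
for message, digest in zip(messages, digests):
    print(len(message), "bytes in ->", len(digest) * 8, "bits out")
```

Every line reports 256 bits out, and anybody running the same code obtains the same digests, since there is nothing secret involved.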

A hash function h is deemed secure if:

  • It is computationally infeasible to find preimages: given an r-bit value x, it is not feasible to find m such that h(m) = x.
  • It is computationally infeasible to find second preimages: given m, it is not feasible to find m' distinct from m, such that h(m) = h(m').
  • It is computationally infeasible to find collisions: it is not feasible to find m and m', distinct from each other, such that h(m) = h(m').

There are generic attacks which can find preimages, second preimages or collisions, with costs, respectively, 2^r, 2^r, and 2^(r/2). So actual security can be reached only if r is large enough so that 2^(r/2) is an overwhelmingly huge cost. In practice, this means that r = 128 (a 128-bit hash function such as MD5) is not enough.
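The 2^(r/2) collision cost is easy to observe on a hash function whose output is deliberately too short. The sketch below assumes a hypothetical `toy_hash` defined as SHA-256 truncated to r = 24 bits, and simply hashes the strings "0", "1", "2", ... until two of them share a digest; the message format and function names are choices made for this example only.

```python
import hashlib
from itertools import count

R_BITS = 24   # toy output size r; far too small on purpose

def toy_hash(message: bytes) -> bytes:
    """SHA-256 truncated to R_BITS bits (illustration only)."""
    return hashlib.sha256(message).digest()[: R_BITS // 8]

def find_collision():
    """Generic birthday search: remember every digest seen until one repeats."""
    seen = {}
    for i in count():
        message = str(i).encode()
        digest = toy_hash(message)
        if digest in seen:
            return seen[digest], message, i + 1
        seen[digest] = message

m1, m2, evaluations = find_collision()
print(m1, m2, toy_hash(m1).hex(), "found after", evaluations, "evaluations")
```

The search typically ends after a few thousand evaluations, on the order of 2^12, rather than anything close to 2^24. Scaling the same reasoning to r = 128 gives about 2^64 work, which is why a 128-bit output is considered too short.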

In an informal way, it is good if the hash function "looks like" it has been chosen randomly and uniformly among the possible functions which accept the same inputs. But this is an ill-defined property since we are talking about a unique function (probabilities are always implicitly about averages and repeated experiments; you cannot really have probabilities with one single function). Also, being a random function is not exactly the same as being resistant to collisions and preimages; this is the debate over the Random Oracle Model.


Nevertheless, it is possible to build a hash function out of a block cipher. This is what the Merkle-Damgård construction does when its compression function is built from a block cipher. This entails using the input message as the key of the block cipher; so the block cipher is not used at all as it was meant to be. With AES, this proves disappointing:

  • It results in a hash function with a 128-bit output, which is too small for security against technology available in 2011.
  • The security of the hash function then relies on the absence of related-key attacks on the block cipher. Related-key attacks do not really have any practical significance on a block cipher when used for encryption; hence, AES was not designed to resist such attacks, and, indeed, AES has a few weaknesses in that respect -- not a worry for encryption, but a big worry if AES is to be used in a Merkle-Damgård construction.
  • The performance will not be good.
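For readers who want to see what "using the input message as the key" looks like, here is a minimal sketch of a Merkle-Damgård hash. It assumes a Davies-Meyer compression function, which is one common choice and not something the text above specifies. Since the Python standard library has no AES, an XTEA-style toy cipher (64-bit block, 128-bit key) stands in for the block cipher; the function names, the all-zero initial value and the padding layout are all assumptions of this example.

```python
import struct

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """XTEA-style stand-in cipher: 64-bit block, 128-bit key (NOT AES)."""
    v0, v1 = struct.unpack(">2I", block)
    k = struct.unpack(">4I", key)
    delta, mask, total = 0x9E3779B9, 0xFFFFFFFF, 0
    for _ in range(32):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + k[total & 3]))) & mask
        total = (total + delta) & mask
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + k[(total >> 11) & 3]))) & mask
    return struct.pack(">2I", v0, v1)

def pad(message: bytes, block_len: int = 16) -> bytes:
    """Length padding: a 0x80 byte, zero bytes, then the 64-bit message bit length."""
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % block_len)
    return padded + struct.pack(">Q", len(message) * 8)

def toy_hash(message: bytes) -> bytes:
    state = bytes(8)                 # fixed initial value (an arbitrary assumption)
    data = pad(message)
    for i in range(0, len(data), 16):
        chunk = data[i:i + 16]       # the message block is used as the cipher KEY
        encrypted = toy_block_encrypt(chunk, state)
        # Davies-Meyer feed-forward: XOR the cipher output with the previous state.
        state = bytes(a ^ b for a, b in zip(encrypted, state))
    return state

print(toy_hash(b"abc").hex())
print(toy_hash(b"abd").hex())
```

The output is only as wide as the cipher's block: 64 bits with this stand-in, and 128 bits if AES were plugged in, which is the size limit mentioned in the first bullet. The attacker also controls every key the cipher sees, which is why related-key weaknesses matter here.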

The Whirlpool hash function is a design which builds on a block cipher inspired by the AES -- not the real one. That block cipher has a much improved (and heavier) key schedule, which resists related-key attacks and makes it usable as the core of a hash function. Also, that block cipher works on 512-bit blocks, not 128-bit blocks. Whirlpool is believed to be secure, but it is known to be very slow, so it is rarely used.

Some more recent hash function designs have attempted to reuse parts of the AES -- to be precise, to use an internal operation which maps well onto the AES-NI instructions which recent Intel and AMD processors feature. See for instance ECHO and SHAvite-3; these two functions both received quite a bit of exposure as part of the SHA-3 competition and are believed "reasonably secure". They are very fast on recent Intel and AMD processors. On other, weaker architectures, where hash function performance has some chance to actually matter, these functions are quite slow.

There are other constructions which can make a hash function out of a block cipher, e.g. the one used in Skein; but they also tend to require larger blocks than what the AES is defined over.

Summary: not only are block ciphers and hash functions quite different; but the idea of building a hash function out of the AES turns out to be of questionable validity. It is not easy, and the limited AES block size is the main hindrance.


The basic answer is that they are different types of algorithms. AES is a symmetric key algorithm. You can't use it in the same role as RSA (a public key algorithm), or SHA-256 (a hashing algorithm). They are different systems designed with very different properties and weaknesses.

Yet, I paused and thought seriously about this idea, so as to explain it rather than just saying, "It's this way." After all, a hash in the universal sense is a repeatable representation of data in a fixed or reduced size. AES can provide that via CBC mode. Yet, there are more properties to a secure hash than simple reduction.

A secure hashing algorithm is a one-way system. AES encrypts and decrypts with the same key (it is a symmetric cipher), and for a given key each input block maps one-to-one to an output block. Unless the data is chained and thus lossy, anyone who has the key can simply decrypt the AES "hash" back to the source data.

One can't reasonably reverse a SHA process other than by just trying different input data. For the same reasons that you can't use SHA-x to encrypt something, you can't use AES to hash something.


Is there a simple explanation of the real difference between hash functions and symmetric cyphers?

  • A cipher is reversible, a hash function is not
  • The length of the output of a cipher depends on the length of the input; a hash function produces the same length output regardless of the input
  • A single-bit change anywhere in the input to a cryptographic hash function produces a cascading (dramatic) change in the hash output. As a rule, this is not true for ciphers.
  • A cipher requires a key, a hash does not
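The cascading change described in the list above can be observed directly. This is a small sketch using SHA-256 from the standard `hashlib` module; the sample sentence and the `differing_bits` helper are arbitrary choices for the example.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

message = bytearray(b"The quick brown fox jumps over the lazy dog")
original = hashlib.sha256(message).digest()

message[0] ^= 0x01                      # flip a single bit of the input
flipped = hashlib.sha256(message).digest()

changed = differing_bits(original, flipped)
print(changed, "of", len(original) * 8, "output bits changed")
```

Roughly half of the 256 output bits are expected to change, even though only one input bit was flipped.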

The design and purpose of the two are fundamentally different and they are not interchangeable.