Implement Rijndael's S-box

x86-64 Machine code - 23 22 20 19 bytes

Uses the AES-NI instruction set

66 0F 6E C1          movd        xmm0,ecx
66 0F 38 DD C1       aesenclast  xmm0,xmm1
0F 57 C1             xorps       xmm0,xmm1  
66 0F 3A 14 C0 00    pextrb      eax,xmm0,0
C3                   ret

Using Windows calling conventions, takes in a byte and outputs a byte. It is not necessary to reverse the ShiftRows as it does not affect the first byte.


GolfScript, 60 characters

{[[0 1{.283{1$2*.255>@*^}:r~^}255*].@?~)={257r}4*99]{^}*}:S;

This code defines a function named S that takes in a byte and applies the Rijndael S-box to it. (It also uses an internal helper function named r to save a few chars.)

This implementation uses a logarithm table to compute the GF(28) inverses, as suggested by Thomas Pornin. To save a few chars, the whole logarithm table is recalculated for each input byte; even so, and despite GolfScript being a very slow language in general, this code takes only about 10 ms to process a byte on my old laptop. Precalculating the logarithm table (as L) speeds it up to about 0.5 ms per byte, at the modest cost of three more chars:

[0 1{.283{1$2*.255>@*^}:r~^}255*]:L;{[L?~)L={257r}4*99]{^}*}:S;

For convenience, here's a simple test harness that calls the function S, as defined above, to compute and print out the whole S-box in hex like on Wikipedia:

"0123456789abcdef"1/:h; 256, {S .16/h= \16%h= " "++ }% 16/ n*

Try this code online.

(The online demo precalculates the logarithm table to avoid taking too much time. Even so, the online GolfScript site may sometimes randomly time out; this is a known issue with the site, and a reload usually fixes it.)

Explanation:

Let's start with the logarithm table calculation, and specifically with the helper function r:

{1$2*.255>@*^}:r

This function takes two inputs on the stack: a byte and a reduction bitmask (a constant between 256 and 511). It duplicates the input byte, multiplies the copy by 2 and, if the result exceeds 255, XORs it with the bitmask to bring it back under 256.

Within the log-table generating code, the function r is called with the reduction bitmask 283 = 0x11b (which corresponds to the Rijndael GF(28) reduction polynomial x8 + x4 + x3 + x + 1), and result is XORed with the original byte, effectively multiplying it by 3 (= x + 1, as a polynomial) in the Rijndael finite field. This multiplication is repeated 255 times, starting from the byte 1, and the results (plus an initial zero byte) are collected into a 257-element array L that looks like this (middle part omitted):

[0 1 3 5 15 17 51 85 255 26 46 ... 180 199 82 246 1]

The reason why there are 257 elements is that, with the prepended 0 and with 1 occurring twice, we can find the modular inverse of any given byte simply by looking up its (zero-based) index in this array, negating it, and looking up the byte at the negated index in the same array. (In GolfScript, as in many other programming languages, negative array indexes count backwards from the end of the array.) Indeed, this is exactly what the code L?~)L= at the beginning of the function S does.

The rest of the code calls the helper function r four times with the reduction bitmask 257 = 28 + 1 to create four bit-rotated copies of the inverted input byte. These are all collected into an array, together with the constant 99 = 0x63, and XORed together to produce the final output.


Ruby, 161 characters

R=0..255
S=R.map{|t|t=b=R.select{|y|x=t;z=0;8.times{z^=y*(x&1);x/=2;y*=2};r=283<<8;8.times{r/=2;z^r<z/2&&z^=r};z==1}[0]||0;4.times{|r|t^=b<<1+r^b>>4+r};t&255^99}

In order to check the output you can use the following code to print it in tabular form:

S.map{|x|"%02x"%x}.each_slice(16){|l|puts l*' '}