Reverse Bit Order of 32-bit Integers

MMIX assembly (28 Bytes)

64 bit numbers

rbit:
    SETH $1,#0102 # load matrix in 16-byte steps
    ORMH $1,#0408
    ORML $1,#1020
    ORL  $1,#4080
    MOR  $0,$1,$0 # multiplication 1
    MOR  $0,$0,$1 # multiplication 2
    POP  1,0      # return

This assembles to:

rbit:
    E0010102 # SETH $1,#0102
    E9010408 # ORMH $1,#0408
    EA011020 # ORML $1,#1020
    EB014080 # ORL  $1,#4080
    DC000100 # MOR  $0,$1,$0
    DC000001 # MOR  $0,$0,$1
    F8010000 # POP  1,0

How does it work?

The MOR instruction performs a matrix multiplication on two 64-bit quantities used as two 8x8 matrices of booleans. A boolean number with digits abcdefghklmnopqr2 is used as a matrix like this:

/ abcd \
| efgh |
| klmn |
\ opqr /

The MOR instruction multiplies the matrices represented by their arguments where multiplication is and and addition is or. It is:

/ 0001 \      / abcd \      / opqr \
| 0010 |  \/  | efgh |  --  | klmn |
| 0100 |  /\  | klmn |  --  | efgh |
\ 1000 /      \ opqr /      \ abcd /

and furthermore:

/ opqr \      / 0001 \      / rqpo \
| klmn |  \/  | 0010 |  --  | nmlk |
| efgh |  /\  | 0100 |  --  | hgfe |
\ abcd /      \ 1000 /      \ dcba /

which is the reverse order of bits of the original number.

32 bit numbers

If you just want the reverse of a 32 bit number instead of a 64 bit number, you can use this modified method:

rbit:
    SETL   $1,#0408 # load first matrix in two steps
    ORML   $1,#0102
    MOR    $1,$1,$0 # apply first matrix
    SLU    $2,$1,32 # compile second matrix
    16ADDU $1,$2,$1
    MOR    $1,$0,$1 # apply second matrix
    POP    1,0      # return

assembled:

rbit:
    E3010408 # SETL   $1,#0408
    EA010102 # ORML   $1,#0102
    DC010001 # MOR    $1,$1,$0
    3B020120 # SLU    $2,$1,32
    2E010201 # 16ADDU $1,$2,$1
    DC010001 # MOR    $1,$0,$1
    F8010000 # POP    1,0

The first matrix multiplication basically works like this:

/ 0000 \      / 0000 \      / 0000 \
| 0000 |  \/  | 0000 |  --  | 0000 |
| 0001 |  /\  | abcd |  --  | efgh |
\ 0010 /      \ efgh /      \ abcd /

the corresponding octabyte is #0000000001020408 which we load in the first two instructions. The second multiplication looks like this:

/ 0000 \      / 0001 \      / 0000 \
| 0000 |  \/  | 0010 |  --  | 0000 |
| efgh |  /\  | 0100 |  --  | hgfe |
\ abcd /      \ 1000 /      \ dcba /

The corresponding octabyte is #0102040810204080 which we create from the first matrix like this:

SLU $2,$1,#32   # $2 = #0102040800000000
16ADDU $1,$2,$1 # $2 = $2 + $1 << 4
                     = $2 + #0000000010204080
                #    = #0102040810204080

The second multiplication is business as usual, the resulting code has the same length (28 bytes).


80386 assembly (13 12 bytes)

As a function in AT&T syntax using the cdecl calling convention.

    # reverse bits of a 32 bit word
    .text
    .globl rbit
    .type rbit,@function
rbit:
    push $32       # prepare loop counter
    pop %ecx
0:  shrl 4(%esp)   # shift lsb of argument into carry flag
    adc  %eax,%eax # shift carry flag into lsb
    loop 0b        # decrement %ecx and jump until ecx = 0
    ret            # return

This function assembles to the following byte sequence:

6a 20 59 d1 6c 24 04 11 c0 e2 f8 c3

Broken down into instructions:

6a 20       push $32
59          pop %ecx
d1 6c 24 04 shrl 0x4(%esp)
11 c0       adc  %eax,%eax
e2 f8       loop .-6
c3          ret    

It works like this: In each of the 32 iterations of the loop, the argument, which is located at 4(%esp), is right shifted by one position. The last bit is implicitly shifted into the carry flag. The adc instruction adds two values and adds the value of the carry flag to the result. If you add a value to itself, i.e. %eax, you effectively left-shift it by one position. This makes adc %eax,%eax a convenient way to left shift %eax by one position while shifting the content of the carry flag into the low-order bit.

I repeat this process 32 times so that the entire content of 4(%esp) is dumped into %eax. I never explicitly initialize %eax as its previous contents are shifted out during the loop.


C,   63    52   48

Original version:

int r(int n){int r=0,i=32;for(;i--;r=r<<1|n&1,n>>=1);return r;}

Updated version (with changes suggeted by Allbeert, es1024, and Dennis):

r(n){int r,i=32;for(;i--;r=r*2|n&1,n>>=1);return r;}

Note: Since the second version omits setting r=0, the code is assuming that an int is 32 bits. If this assumption is false, the function will most likely produce an incorrect result, depending on the initial state of r on entry to the function.


Final version (with further changes suggested by Dennis and Alchymist):

r(n,r,i){for(;32-i++;r=r*2|n&1,n>>=1);return r;}

Note: This puts the declaration of the work variables r and i into the parameter list. Parameters are as follows: n is the number to be bit-reversed. r and i are work variables that must be passed as 0.