# Reverse Bit Order of 32-bit Integers

## MMIX assembly (28 Bytes)

### 64 bit numbers

rbit:
SETH $1,#0102 # load matrix in 16-byte steps ORMH$1,#0408
ORML $1,#1020 ORL$1,#4080
MOR  $0,$1,$0 # multiplication 1 MOR$0,$0,$1 # multiplication 2
POP  1,0      # return


This assembles to:

rbit:
E0010102 # SETH $1,#0102 E9010408 # ORMH$1,#0408
EA011020 # ORML $1,#1020 EB014080 # ORL$1,#4080
DC000100 # MOR  $0,$1,$0 DC000001 # MOR$0,$0,$1
F8010000 # POP  1,0


### How does it work?

The MOR instruction performs a matrix multiplication on two 64-bit quantities used as two 8x8 matrices of booleans. A boolean number with digits abcdefghklmnopqr2 is used as a matrix like this:

/ abcd \
| efgh |
| klmn |
\ opqr /


The MOR instruction multiplies the matrices represented by their arguments where multiplication is and and addition is or. It is:

/ 0001 \      / abcd \      / opqr \
| 0010 |  \/  | efgh |  --  | klmn |
| 0100 |  /\  | klmn |  --  | efgh |
\ 1000 /      \ opqr /      \ abcd /


and furthermore:

/ opqr \      / 0001 \      / rqpo \
| klmn |  \/  | 0010 |  --  | nmlk |
| efgh |  /\  | 0100 |  --  | hgfe |
\ abcd /      \ 1000 /      \ dcba /


which is the reverse order of bits of the original number.

### 32 bit numbers

If you just want the reverse of a 32 bit number instead of a 64 bit number, you can use this modified method:

rbit:
SETL   $1,#0408 # load first matrix in two steps ORML$1,#0102
MOR    $1,$1,$0 # apply first matrix SLU$2,$1,32 # compile second matrix 16ADDU$1,$2,$1
MOR    $1,$0,$1 # apply second matrix POP 1,0 # return  assembled: rbit: E3010408 # SETL$1,#0408
EA010102 # ORML   $1,#0102 DC010001 # MOR$1,$1,$0
3B020120 # SLU    $2,$1,32
2E010201 # 16ADDU $1,$2,$1 DC010001 # MOR$1,$0,$1
F8010000 # POP    1,0


The first matrix multiplication basically works like this:

/ 0000 \      / 0000 \      / 0000 \
| 0000 |  \/  | 0000 |  --  | 0000 |
| 0001 |  /\  | abcd |  --  | efgh |
\ 0010 /      \ efgh /      \ abcd /


the corresponding octabyte is #0000000001020408 which we load in the first two instructions. The second multiplication looks like this:

/ 0000 \      / 0001 \      / 0000 \
| 0000 |  \/  | 0010 |  --  | 0000 |
| efgh |  /\  | 0100 |  --  | hgfe |
\ abcd /      \ 1000 /      \ dcba /


The corresponding octabyte is #0102040810204080 which we create from the first matrix like this:

SLU $2,$1,#32   # $2 = #0102040800000000 16ADDU$1,$2,$1 # $2 =$2 + $1 << 4 =$2 + #0000000010204080
#    = #0102040810204080


The second multiplication is business as usual, the resulting code has the same length (28 bytes).

## 80386 assembly (13 12 bytes)

As a function in AT&T syntax using the cdecl calling convention.

    # reverse bits of a 32 bit word
.text
.globl rbit
.type rbit,@function
rbit:
push $32 # prepare loop counter pop %ecx 0: shrl 4(%esp) # shift lsb of argument into carry flag adc %eax,%eax # shift carry flag into lsb loop 0b # decrement %ecx and jump until ecx = 0 ret # return  This function assembles to the following byte sequence: 6a 20 59 d1 6c 24 04 11 c0 e2 f8 c3  Broken down into instructions: 6a 20 push$32
59          pop %ecx
d1 6c 24 04 shrl 0x4(%esp)
e2 f8       loop .-6
c3          ret


It works like this: In each of the 32 iterations of the loop, the argument, which is located at 4(%esp), is right shifted by one position. The last bit is implicitly shifted into the carry flag. The adc instruction adds two values and adds the value of the carry flag to the result. If you add a value to itself, i.e. %eax, you effectively left-shift it by one position. This makes adc %eax,%eax a convenient way to left shift %eax by one position while shifting the content of the carry flag into the low-order bit.

I repeat this process 32 times so that the entire content of 4(%esp) is dumped into %eax. I never explicitly initialize %eax as its previous contents are shifted out during the loop.

## C,   63  52   48

Original version:

int r(int n){int r=0,i=32;for(;i--;r=r<<1|n&1,n>>=1);return r;}


Updated version (with changes suggeted by Allbeert, es1024, and Dennis):

r(n){int r,i=32;for(;i--;r=r*2|n&1,n>>=1);return r;}


Note: Since the second version omits setting r=0, the code is assuming that an int is 32 bits. If this assumption is false, the function will most likely produce an incorrect result, depending on the initial state of r on entry to the function.

Final version (with further changes suggested by Dennis and Alchymist):

r(n,r,i){for(;32-i++;r=r*2|n&1,n>>=1);return r;}


Note: This puts the declaration of the work variables r and i into the parameter list. Parameters are as follows: n is the number to be bit-reversed. r and i are work variables that must be passed as 0.