# Reverse Bit Order of 32-bit Integers

## MMIX assembly (28 Bytes)

### 64 bit numbers

```
rbit:
SETH $1,#0102 # load matrix in 16-bit steps
ORMH $1,#0408
ORML $1,#1020
ORL $1,#4080
MOR $0,$1,$0 # multiplication 1
MOR $0,$0,$1 # multiplication 2
POP 1,0 # return
```

This assembles to:

```
rbit:
E0010102 # SETH $1,#0102
E9010408 # ORMH $1,#0408
EA011020 # ORML $1,#1020
EB014080 # ORL $1,#4080
DC000100 # MOR $0,$1,$0
DC000001 # MOR $0,$0,$1
F8010000 # POP 1,0
```

### How does it work?

The `MOR` instruction performs a matrix multiplication on two 64-bit quantities, each treated as an 8x8 matrix of booleans. The diagrams below use 4x4 matrices for brevity: a binary number with digits *abcdefghklmnopqr*₂ is used as a matrix like this:

```
/ abcd \
| efgh |
| klmn |
\ opqr /
```

The `MOR` instruction multiplies the matrices represented by its arguments, where multiplication is `and` and addition is `or`. This gives:

```
/ 0001 \ / abcd \ / opqr \
| 0010 | \/ | efgh | -- | klmn |
| 0100 | /\ | klmn | -- | efgh |
\ 1000 / \ opqr / \ abcd /
```

and furthermore:

```
/ opqr \ / 0001 \ / rqpo \
| klmn | \/ | 0010 | -- | nmlk |
| efgh | /\ | 0100 | -- | hgfe |
\ abcd / \ 1000 / \ dcba /
```

which is the reverse order of bits of the original number.
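The two products above can be checked with a small C model. This is only a sketch: `mor` and `rbit64` are names chosen here, and the model assumes the convention from the MMIX specification that `MOR $X,$Y,$Z` takes `$Z` as the left factor and `$Y` as the right factor, with each byte forming one matrix row. Because the same anti-diagonal matrix is applied on both sides, the order of the two products does not change the result.

```c
#include <stdint.h>

/* Model of MOR: each octabyte is an 8x8 bit matrix (one byte per row,
   most significant byte first); returns the boolean product z * y. */
uint64_t mor(uint64_t y, uint64_t z)
{
    uint64_t x = 0;
    for (int i = 0; i < 8; i++) {                 /* row i of z and of the result */
        uint8_t zrow = (uint8_t)(z >> (56 - 8 * i));
        uint8_t xrow = 0;
        for (int k = 0; k < 8; k++)               /* a set bit in column k of z ... */
            if (zrow & (0x80u >> k))
                xrow |= (uint8_t)(y >> (56 - 8 * k)); /* ... ORs in row k of y */
        x |= (uint64_t)xrow << (56 - 8 * i);
    }
    return x;
}

/* Mirrors the two MOR instructions of the 64-bit routine. */
uint64_t rbit64(uint64_t a)
{
    const uint64_t J = 0x0102040810204080ULL;     /* anti-diagonal matrix */
    a = mor(J, a);                                /* MOR $0,$1,$0 */
    a = mor(a, J);                                /* MOR $0,$0,$1 */
    return a;
}
```

For instance, `rbit64(1)` yields `0x8000000000000000`, and multiplying the anti-diagonal matrix by itself gives the identity matrix `0x8040201008040201`.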

### 32 bit numbers

If you just want the reverse of a 32 bit number instead of a 64 bit number, you can use this modified method:

```
rbit:
SETL $1,#0408 # load first matrix in two steps
ORML $1,#0102
MOR $0,$0,$1 # apply first matrix
SLU $2,$1,32 # compile second matrix
16ADDU $1,$1,$2
MOR $0,$1,$0 # apply second matrix
POP 1,0 # return
```

assembled:

```
rbit:
E3010408 # SETL $1,#0408
EA010102 # ORML $1,#0102
DC000001 # MOR $0,$0,$1
3B020120 # SLU $2,$1,32
2E010102 # 16ADDU $1,$1,$2
DC000100 # MOR $0,$1,$0
F8010000 # POP 1,0
```

The first matrix multiplication basically works like this:

```
/ 0000 \ / 0000 \ / 0000 \
| 0000 | \/ | 0000 | -- | 0000 |
| 0001 | /\ | abcd | -- | efgh |
\ 0010 / \ efgh / \ abcd /
```

The corresponding octabyte is `#0000000001020408`, which we load in the first two instructions. The second multiplication looks like this:

```
/ 0000 \ / 0001 \ / 0000 \
| 0000 | \/ | 0010 | -- | 0000 |
| efgh | /\ | 0100 | -- | hgfe |
\ abcd / \ 1000 / \ dcba /
```

The corresponding octabyte is `#0102040810204080`, which we create from the first matrix like this:

```
SLU $2,$1,32 # $2 = #0102040800000000
16ADDU $1,$1,$2 # $1 = ($1 << 4) + $2
# = #0000000010204080 + #0102040800000000
# = #0102040810204080
```

The second multiplication is business as usual; the resulting code has the same length (28 bytes).
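The 32-bit variant can be modelled the same way. The sketch below is an illustration rather than a translation of the routine: `mor` and `rbit32` are names chosen here, and it assumes that each byte is one matrix row and that the second argument of `mor` is the left factor. It builds the second matrix from the first with a shift by 32 and a shift by 4, as described above.

```c
#include <stdint.h>

/* Boolean 8x8 matrix product z * y, one byte per row, MSB row first. */
uint64_t mor(uint64_t y, uint64_t z)
{
    uint64_t x = 0;
    for (int i = 0; i < 8; i++)
        for (int k = 0; k < 8; k++)
            if ((z >> (56 - 8 * i)) & (0x80u >> k))   /* bit k of row i of z */
                x |= ((y >> (56 - 8 * k)) & 0xFF) << (56 - 8 * i);
    return x;
}

uint32_t rbit32(uint32_t n)
{
    uint64_t m1 = 0x0000000001020408ULL;   /* first matrix */
    uint64_t a  = mor(n, m1);              /* m1 * n: reverses the four low bytes */
    uint64_t m2 = (m1 << 32) + (m1 << 4);  /* second matrix, 0x0102040810204080 */
    return (uint32_t)mor(m2, a);           /* a * m2: reverses the bits in each byte */
}
```

With this model, `rbit32(0x12345678)` evaluates to `0x1E6A2C48`.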

## 80386 assembly (~~13~~ 12 bytes)

As a function in AT&T syntax using the cdecl calling convention.

```
# reverse bits of a 32 bit word
.text
.globl rbit
.type rbit,@function
rbit:
push $32 # prepare loop counter
pop %ecx
0: shrl 4(%esp) # shift lsb of argument into carry flag
adc %eax,%eax # shift carry flag into lsb
loop 0b # decrement %ecx and jump until ecx = 0
ret # return
```

This function assembles to the following byte sequence:

```
6a 20 59 d1 6c 24 04 11 c0 e2 f8 c3
```

Broken down into instructions:

```
6a 20 push $32
59 pop %ecx
d1 6c 24 04 shrl 0x4(%esp)
11 c0 adc %eax,%eax
e2 f8 loop .-6
c3 ret
```

It works like this: in each of the 32 iterations of the loop, the argument, which is located at `4(%esp)`, is right-shifted by one position. The low-order bit is implicitly shifted into the carry flag. The `adc` instruction adds two values and adds the value of the carry flag to the result. If you add a value to itself, i.e. `%eax`, you effectively left-shift it by one position. This makes `adc %eax,%eax` a convenient way to left-shift `%eax` by one position while shifting the content of the carry flag into the low-order bit.

I repeat this process 32 times so that the entire content of `4(%esp)` is dumped into `%eax`. I never explicitly initialize `%eax` as its previous contents are shifted out during the loop.
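The loop can be mirrored in C to see why the initial value of `%eax` does not matter. This is a sketch with an explicit `carry` variable standing in for the carry flag; the function name `rbit_adc` and the extra `eax` parameter (the register's contents on entry) are assumptions made for illustration.

```c
#include <stdint.h>

/* Step-by-step model of the shrl/adc loop; eax is the register's
   (arbitrary) content on entry. */
uint32_t rbit_adc(uint32_t arg, uint32_t eax)
{
    for (int ecx = 32; ecx != 0; ecx--) {   /* push $32; pop %ecx; ... loop */
        uint32_t carry = arg & 1;           /* shrl: low-order bit goes to carry */
        arg >>= 1;
        eax = eax + eax + carry;            /* adc %eax,%eax (wraps mod 2^32) */
    }
    return eax;
}
```

Calling `rbit_adc(1, 0)` and `rbit_adc(1, 0xDEADBEEF)` both give `0x80000000`, since 32 left shifts push every original bit of `eax` out of the register.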

## C, ~~63~~ ~~52~~ 48

**Original version:**

```
int r(int n){int r=0,i=32;for(;i--;r=r<<1|n&1,n>>=1);return r;}
```

**Updated version** (with changes suggested by *Allbeert*, *es1024*, and *Dennis*):

```
r(n){int r,i=32;for(;i--;r=r*2|n&1,n>>=1);return r;}
```

Note: since the second version omits setting `r=0`, the code assumes that an `int` is 32 bits. If this assumption is false, the function will most likely produce an incorrect result, depending on the initial state of `r` on entry to the function.

**Final version** (with further changes suggested by *Dennis* and *Alchymist*):

```
r(n,r,i){for(;32-i++;r=r*2|n&1,n>>=1);return r;}
```

Note: this puts the declaration of the work variables `r` and `i` into the parameter list. The parameters are as follows: `n` is the number to be bit-reversed; `r` and `i` are work variables that must be passed as 0.
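For compilers that reject implicit `int`, the final version can be written with explicit types. The following is a sketch and not a golfed answer: the name `rev32` and the use of `uint32_t` (to avoid signed overflow and signed right shifts) are choices made here, while the loop header is kept as in the final version.

```c
#include <stdint.h>

/* Typed rendering of the final version; r and i are work variables. */
uint32_t rev32(uint32_t n, uint32_t r, uint32_t i)
{
    for (; 32 - i++; r = r * 2 | (n & 1), n >>= 1)
        ;                                   /* all work happens in the loop header */
    return r;
}
```

A call looks like `rev32(x, 0, 0)`. Passing a non-zero `i` shortens the loop: `rev32(1, 0, 31)` performs a single step and returns `1` instead of `0x80000000`.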