64/32-bit division on a processor with 32/16-bit division

My copy of Knuth (The Art of Computer Programming) is at work, so I can't check it until Monday, but that would be my first source. It has a whole section on arithmetic.


edit: your post about "16/16 division and 32/16 division which both take 18 cycles." -- dsPICs have a conditional subtract operation in assembly. Consider using this as your computational primitive.

Also note that if X = XH * 232 + XL and D = DH * 216 + DL, then if you are looking for

(Q,R) = X/D where X = Q * D + R

where Q = QH * 216 + QL, R = RH * 216 + RL, then

XH * 232 + XL = DH * QH * 232 + (DL * QH + DH * QL) * 216 + (DL * QL) + RH * 216 + RL

This suggests (by looking at terms that are the high 32 bits) to use the following procedure, akin to long division:

  1. (QH, R0) = XH / (DH+1) -> XH = QH * (DH+1) + R0 [32/16 divide]
  2. R1 = X - (QH * 216) * D [requires a 16*32 multiply, a shift-left by 16, and a 64-bit subtract]
  3. calculate R1' = R1 - D * 216
  4. while R1' >= 0, adjust QH upwards by 1, set R1 = R1', and goto step 3
  5. (QL, R2) = (R1 >> 16) / (DH+1) -> R1 = QL * (DH+1) + R2 [32/16 divide]
  6. R3 = R1 - (QL * D) [requires a 16*32 multiply and a 48-bit subtract]
  7. calculate R3' = R3 - D
  8. while R3' >= 0, adjust QL upwards by 1, set R3 = R3', and goto step 7

Your 32-bit quotient is the pair (QH,QL), and 32-bit remainder is R3.

(This assumes that the quotient is not larger than 32-bit, which you need to know ahead of time, and can easily check ahead of time.)


See "Hacker's Delight", multiword division (pages 140-145).

The basic concept (going back to Knuth) is to think of your problem in base-65536 terms. Then you have a 4 digit by 2 digit division problem, with 2/1 digit division as a primitive.

The C code is here: https://github.com/hcs0/Hackers-Delight/blob/master/divmnu.c.txt