Fastest way to get integer mod 10 and integer divide 10?

Heres a binary to BCD algorithm I used several years ago based on one found here. I was using an external BCD to 7 seg display driver so the result could be written to the proper ports directly as packed BCD for output.

This is fairly fast if you have a hardware multiplier in the PIC, I was using a PIC18F97J60. If you don't have a hardware multiplier on your PIC, consider using shift + add for the multiplication.

This takes in an unsigned 16-bit int and returns packed BCD with 5 digits, it could be modified and made faster for 4 digits. It uses shift + additions to approximate division by 10 but given the limited input range it is exact for this use. You may want to pack the result differently as well to line up with how your using the result.

void intToPackedBCD( uint16_t n, uint8_t *digits ) {
    
    uint8_t d4, d3, d2, d1, d0, q;  //d4 MSD, d0 LSD

    d1 = (n>>4)  & 0xF;
    d2 = (n>>8)  & 0xF;
    d3 = (n>>12) & 0xF;

    d0 = 6*(d3 + d2 + d1) + (n & 0xF);
    q = (d0 * 0xCD) >> 11;
    d0 = d0 - 10*q;

    d1 = q + 9*d3 + 5*d2 + d1;
    q = (d1 * 0xCD) >> 11;
    d1 = d1 - 10*q;

    d2 = q + 2*d2;
    q = (d2 * 0x1A) >> 8;
    d2 = d2 - 10*q;

    d3 = q + 4*d3;
    d4 = (d3 * 0x1A) >> 8;
    d3 = d3 - 10*d4;

    digits[0] = (d4<<4) | (d3);
    digits[1] = (d2<<4) | (d1);
    digits[2] = (d0<<4);
}

Assuming unsigned integers, division and multiplication can be formed from bit shifts. And from (integer) division and multiplication, modulo can be derived.

To multiply by 10:

y = (x << 3) + (x << 1);

To divide by 10 is more difficult. I know of several division algorithms. If I recall correctly, there is a way to divide by 10 quickly using bit shifts and subtraction, but I can't remember the exact method. If that's not true, then this is a divide algorithm which manages <130 cycles. I'm not sure what micro you're using, but you can use that in some way, even if you have to port it.

EDIT: Somebody says over at Stack Overflow, if you can tolerate a bit of error and have a large temporary register, this will work:

temp = (ms * 205) >> 11;  // 205/2048 is nearly the same as /10

Assuming you have division and multiplication, modulo is simple:

mod = x - ((x / z) * z)

You can convert from binary to packed BCD without any division using double dabble algorithm. It uses only shift and add 3.

For example convert 24310 = 111100112 to binary

0000 0000 0000   11110011   Initialization
0000 0000 0001   11100110   Shift
0000 0000 0011   11001100   Shift
0000 0000 0111   10011000   Shift
0000 0000 1010   10011000   Add 3 to ONES, since it was 7
0000 0001 0101   00110000   Shift
0000 0001 1000   00110000   Add 3 to ONES, since it was 5
0000 0011 0000   01100000   Shift
0000 0110 0000   11000000   Shift
0000 1001 0000   11000000   Add 3 to TENS, since it was 6
0001 0010 0001   10000000   Shift
0010 0100 0011   00000000   Shift
   2    4    3
       BCD

This algorithm is very efficient when there's no hardware divisor available. More over only left shift by 1 is used, so it's fast even when a barrel shifter is not available