Is there any alternative to using % (modulus) in C/C++?

Ah, the joys of bitwise arithmetic. A side effect of many division routines is the modulus - so in few cases should division actually be faster than modulus. I'm interested to see the source you got this information from. Processors with multipliers have interesting division routines using the multiplier, but you can get from division result to modulus with just another two steps (multiply and subtract) so it's still comparable. If the processor has a built in division routine you'll likely see it also provides the remainder.

Still, there is a small branch of number theory devoted to Modular Arithmetic which requires study if you really want to understand how to optimize a modulus operation. Modular arithmatic, for instance, is very handy for generating magic squares.

So, in that vein, here's a very low level look at the math of modulus for an example of x, which should show you how simple it can be compared to division:


Maybe a better way to think about the problem is in terms of number bases and modulo arithmetic. For example, your goal is to compute DOW mod 7 where DOW is the 16-bit representation of the day of the week. You can write this as:

 DOW = DOW_HI*256 + DOW_LO

 DOW%7 = (DOW_HI*256 + DOW_LO) % 7
       = ((DOW_HI*256)%7  + (DOW_LO % 7)) %7
       = ((DOW_HI%7 * 256%7)  + (DOW_LO%7)) %7
       = ((DOW_HI%7 * 4)  + (DOW_LO%7)) %7

Expressed in this manner, you can separately compute the modulo 7 result for the high and low bytes. Multiply the result for the high by 4 and add it to the low and then finally compute result modulo 7.

Computing the mod 7 result of an 8-bit number can be performed in a similar fashion. You can write an 8-bit number in octal like so:

  X = a*64 + b*8 + c

Where a, b, and c are 3-bit numbers.

  X%7 = ((a%7)*(64%7) + (b%7)*(8%7) + c%7) % 7
      = (a%7 + b%7 + c%7) % 7
      = (a + b + c) % 7

since 64%7 = 8%7 = 1

Of course, a, b, and c are

  c = X & 7
  b = (X>>3) & 7
  a = (X>>6) & 7  // (actually, a is only 2-bits).

The largest possible value for a+b+c is 7+7+3 = 17. So, you'll need one more octal step. The complete (untested) C version could be written like:

unsigned char Mod7Byte(unsigned char X)
{
    X = (X&7) + ((X>>3)&7) + (X>>6);
    X = (X&7) + (X>>3);

    return X==7 ? 0 : X;
}

I spent a few moments writing a PIC version. The actual implementation is slightly different than described above

Mod7Byte:
       movwf        temp1        ;
       andlw        7        ;W=c
       movwf        temp2        ;temp2=c
       rlncf   temp1,F        ;
       swapf        temp1,W ;W= a*8+b
       andlw   0x1F
       addwf        temp2,W ;W= a*8+b+c
       movwf        temp2   ;temp2 is now a 6-bit number
       andlw   0x38    ;get the high 3 bits == a'
       xorwf        temp2,F ;temp2 now has the 3 low bits == b'
       rlncf   WREG,F  ;shift the high bits right 4
       swapf   WREG,F  ;
       addwf        temp2,W ;W = a' + b'

 ; at this point, W is between 0 and 10


       addlw        -7
       bc      Mod7Byte_L2
Mod7Byte_L1:
       addlw        7
Mod7Byte_L2:
       return

Here's a liitle routine to test the algorithm

       clrf    x
       clrf    count

TestLoop:
       movf        x,W
       RCALL   Mod7Byte
       cpfseq count
        bra    fail

       incf        count,W
       xorlw   7
       skpz
        xorlw        7
       movwf   count

       incfsz        x,F
       bra        TestLoop
passed:

Finally, for the 16-bit result (which I have not tested), you could write:

uint16 Mod7Word(uint16 X)
{
 return Mod7Byte(Mod7Byte(X & 0xff) + Mod7Byte(X>>8)*4);
}

Scott



There is an overhead most of the time in using modulo that are not powers of 2. This is regardless of the processor as (AFAIK) even processors with modulus operators are a few cycles slower for divide as opposed to mask operations.

For most cases this is not an optimisation that is worth considering, and certainly not worth calculating your own shortcut operation (especially if it still involves divide or multiply).

However, one rule of thumb is to select array sizes etc. to be powers of 2.

so if calculating day of week, may as well use %7 regardless if setting up a circular buffer of around 100 entries... why not make it 128. You can then write % 128 and most (all) compilers will make this & 0x7F


Unless you really need high performance on multiple embedded platforms, don't change how you code for performance reasons until you profile!

Code that's written awkwardly to optimize for performance is hard to debug and hard to maintain. Write a test case, and profile it on your target. Once you know the actual cost of modulus, then decide if the alternate solution is worth coding.


If you are calculating a number mod some power of two, you can use the bit-wise and operator. Just subtract one from the second number. For example:

x % 8 == x & 7
x % 256 == x & 255

A few caveats:

  1. This only works if the second number is a power of two.
  2. It's only equivalent if the modulus is always positive. The C and C++ standards don't specify the sign of the modulus when the first number is negative (until C++11, which does guarantee it will be negative, which is what most compilers were already doing). A bit-wise and gets rid of the sign bit, so it will always be positive (i.e. it's a true modulus, not a remainder). It sounds like that's what you want anyway though.
  3. Your compiler probably already does this when it can, so in most cases it's not worth doing it manually.