x86 MUL Instruction from VS 2008/2010

There's three different types of multiply instructions on x86. The first is MUL reg, which does an unsigned multiply of EAX by reg and puts the (64-bit) result into EDX:EAX. The second is IMUL reg, which does the same with a signed multiply. The third type is either IMUL reg1, reg2 (multiplies reg1 with reg2 and stores the 32-bit result into reg1) or IMUL reg1, reg2, imm (multiplies reg2 by imm and stores the 32-bit result into reg1).

Since in C, multiplies of two 32-bit values produce 32-bit results, compilers normally use the third type (signedness doesn't matter, the low 32 bits agree between signed and unsigned 32x32 multiplies). VC++ will generate the "long multiply" versions of MUL/IMUL if you actually use the full 64-bit results, e.g. here:

unsigned long long prod(unsigned int a, unsigned int b)
{
  return (unsigned long long) a * b;
}

The 2-operand (and 3-operand) versions of IMUL are faster than the one-operand versions simply because they don't produce a full 64-bit result. Wide multipliers are large and slow; it's much easier to build a smaller multiplier and synthesize long multiplies using Microcode if necessary. Also, MUL/IMUL writes two registers, which again is usually resolved by breaking it into multiple instructions internally - it's much easier for the instruction reordering hardware to keep track of two dependent instructions that each write one register (most x86 instructions look like that internally) than it is to keep track of one instruction that writes two.


imul (signed) and mul (unsigned) both have a one-operand form that does edx:eax = eax * src. i.e. a 32x32b => 64b full multiply (or 64x64b => 128b).

186 added an imul dest(reg), src(reg/mem), immediate form, and 386 added an imul r32, r/m32 form, both of which which only compute the lower half of the result. (According to NASM's appendix B, see also the x86 tag wiki)

When multiplying two 32-bit values, the least significant 32 bits of the result are the same, whether you consider the values to be signed or unsigned. In other words, the difference between a signed and an unsigned multiply becomes apparent only if you look at the "upper" half of the result, which one-operand imul/mul puts in edx and two or three operand imul puts nowhere. Thus, the multi-operand forms of imul can be used on signed and unsigned values, and there was no need for Intel to add new forms of mul as well. (They could have made multi-operand mul a synonym for imul, but that would make disassembly output not match the source.)

In C, results of arithmetic operations have the same type as the operands (after integer promotion for narrow integer types). If you multiply two int together, you get an int, not a long long: the "upper half" is not retained. Hence, the C compiler only needs what imul provides, and since imul is easier to use than mul, the C compiler uses imul to avoid needing mov instructions to get data into / out of eax.

As a second step, since C compilers use the multiple-operand form of imul a lot, Intel and AMD invest effort into making it as fast as possible. It only writes one output register, not e/rdx:e/rax, so it was possible for CPUs to optimize it more easily than the one-operand form. This makes imul even more attractive.

The one-operand form of mul/imul is useful when implementing big number arithmetic. In C, in 32-bit mode, you should get some mul invocations by multiplying unsigned long long values together. But, depending on the compiler and OS, those mul opcodes may be hidden in some dedicated function, so you will not necessarily see them. In 64-bit mode, long long has only 64 bits, not 128, and the compiler will simply use imul.