Latency of CPU instructions on x86 and x64 processors

In general, each of these operations takes a single clock cycle as well to execute if the arguments are in registers at the various stages of the pipeline.

What do you mean by latency? How many cycles an operation spends in the ALU?

You might find this table useful: http://www.agner.org/optimize/instruction_tables.pdf

Since modern processors are super scalar and can execute out of order, you can often get total instructions per cycle that exceed 1. The arguments for the macro command are the most important, but the operation also matters since divides take longer than XOR (<1 cycle latency).

Many x86 instructions can take multiple cycles to complete some stages if they are complex (REP commands or worse MWAIT for example).


Calculating the efficiency of assembly code is not the best way to go in these days of Out of Order Execution Super Scalar pipelines. It'll vary by processor type. It'll vary on instructions both before and after (you can add extra code and have it run faster sometimes!). Some operations (division notably) can have a range of execution times even on older more predictable chips. Actually timing of lots of iterations is the only way to go.


You can find information on intel cpu at intel software developer manuals. For instance the latency is 1 cycle for an integer addition and 3 cycles for an integer multiplication.

I don't know about multiplication, but I expect addition to always take one cycle.