Acceptability of regular usage of r10 and r11

The x86-64 System V ABI doesn't call its calling convention "cdecl". It's just the x86-64 SysV calling convention. The string "cdecl" doesn't appear in the ABI doc.

r11 is a temporary, aka call-clobbered register.

r10 is also a call-clobbered register. The ABI says it is "used for passing a function’s static chain pointer", but C has no static chain, and code generated by gcc and clang freely uses r10 without saving/restoring it. The ABI's table of register usage lists r10 as not preserved across function calls, so a leaf function can always clobber it. (See also: Which registers to use as temporaries when writing AMD64 SysV assembly?)

gcc does use r10 as part of its trampoline for function pointers to GNU C nested functions, to hold a pointer to the stack frame of the outer scope. The trampoline of machine code on the stack is a hack, but this is indeed a static chain pointer; languages with proper support for nested functions would probably make the caller aware of it (like a lambda / closure) and have it pass a value in r10 when calling through a pointer to a nested function.

Non-leaf functions do not need to pass on their incoming r10 to their children unless they're "nested functions" in a language that supports that sort of thing (not C or C++). Therefore r10 is also a pure temporary in normal circumstances.

r10 and r11 are not arg-passing registers, unlike the other call-clobbered registers, so "wrapper" functions can use them (especially r11) without saving/restoring anything.
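As a rough illustration of the wrapper case, here is a minimal NASM-syntax sketch. The names `impl_ptr`, `default_impl`, and `dispatch_wrapper` are made up for this example and are not from the ABI or any library. The wrapper needs one scratch register to test a function pointer, and picking r11 means none of the incoming argument registers have to be saved:

```nasm
; Hypothetical dispatch wrapper; assemble with: nasm -felf64
default rel

section .data
impl_ptr:   dq 0                ; assumed: set elsewhere to an optional override

section .text
global dispatch_wrapper
dispatch_wrapper:
    mov   r11, [impl_ptr]       ; r11 is not an arg register: rdi, rsi, rdx, rcx, r8, r9 stay intact
    test  r11, r11
    jz    .fallback
    jmp   r11                   ; tail call the override with the original args
.fallback:
    jmp   default_impl          ; tail call the default, args still untouched

default_impl:                   ; placeholder body so the sketch assembles on its own
    mov   eax, edi
    ret
```

r10 is left alone here as well, so a static chain pointer would pass through unchanged if the target happened to expect one. That is the reason to prefer r11 in wrappers.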

In a normal function, RBX, RBP, and RSP are call-preserved, along with R12..R15. All others can be clobbered without saving/restoring. (That includes xmm/ymm0..15 and zmm0..31, and the x87 stack, and the condition codes in RFLAGS).
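The difference between the two register classes can be sketched like this, in NASM syntax with hypothetical function names (`leaf_scratch`, `keep_across_call`, `helper`). The leaf function clobbers r10 and r11 with no save/restore, while the non-leaf function needs a call-preserved register such as rbx to keep a value alive across a call:

```nasm
; Hypothetical examples; assemble with: nasm -felf64
section .text

; uint64_t leaf_scratch(uint64_t a, uint64_t b)  returns a*b + a + b
global leaf_scratch
leaf_scratch:
    mov   r10, rdi
    imul  r10, rsi              ; r10 = a*b, clobbered freely
    lea   r11, [rdi + rsi]      ; r11 = a+b, clobbered freely
    lea   rax, [r10 + r11]
    ret                         ; nothing to restore

; uint64_t keep_across_call(uint64_t x)  returns helper(x) + x
global keep_across_call
keep_across_call:
    push  rbx                   ; rbx is call-preserved, so save the caller's value
    mov   rbx, rdi              ; x must survive the call; r10/r11 could be destroyed by helper
    call  helper                ; rsp is 16-byte aligned here thanks to the push
    add   rax, rbx
    pop   rbx
    ret

helper:                         ; placeholder callee, allowed to clobber r10/r11
    lea   rax, [rdi + 1]
    ret
```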

Note that r8..15 need a REX prefix, even with 32-bit operand-size (like xor r10d, r10d). If you have some 64-bit non-pointer integers, then sure keep them in r8..r11 because you always need a REX prefix for 64-bit operand-size any time you use those values anyway.

Smaller code-size is usually not worse, and sometimes helps with decode and uop-cache density, and L1i cache density. RAX, RCX, RDX, RSI, RDI should be your first choices for scratch regs. (And use 32-bit operand-size unless you need 64-bit. e.g. xor eax,eax is the correct way to zero RAX. Silvermont doesn't recognize xor r10,r10 as a zeroing idiom, so use xor r10d,r10d even though it doesn't save code size.)
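To make the size difference concrete, here are the zeroing variants with the machine-code bytes an assembler should emit for them (NASM syntax; worth confirming against a listing from your own assembler, e.g. nasm -l):

```nasm
xor   eax, eax          ; 31 C0      2 bytes: no prefix, writing EAX zero-extends into RAX
xor   rax, rax          ; 48 31 C0   3 bytes: REX.W buys nothing here
xor   r10d, r10d        ; 45 31 D2   3 bytes: REX needed just to encode r10
xor   r10, r10          ; 4D 31 D2   3 bytes: same size, but not a zeroing idiom on Silvermont
```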

If you do run out of low registers, ideally use r10 / r11 for things that will normally be used with 64-bit operand-size (or VEX prefixes) anyway. e.g. pointers to 64-bit data or pointers to pointers. mov eax, [r10] needs a REX prefix while mov eax, [rdi] doesn't. But mov rax, [rdi] and mov r8, [r10] are the same size.
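The same comparison for the loads mentioned above, again with the expected encodings in the comments (NASM syntax; check with a listing file if exact sizes matter to you):

```nasm
mov   eax, [rdi]        ; 8B 07      2 bytes: low registers, 32-bit operand-size, no REX
mov   eax, [r10]        ; 41 8B 02   3 bytes: REX.B only to encode r10 as the base
mov   rax, [rdi]        ; 48 8B 07   3 bytes: REX.W for 64-bit operand-size
mov   r8,  [r10]        ; 4D 8B 02   3 bytes: one REX byte carries W, R, and B together
```

Once an instruction already needs REX.W, using a high register in it costs no extra bytes, which is why pointers to 64-bit data are good candidates for r10 / r11.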

It's hard to gain much because you often need to use different values together in different combinations, like eventually using cmp eax, r10d or whatever, but if you want to go all-out on optimizing, then think about code-size. Maybe also think about where the instruction boundaries are and how it will fit into the uop cache.

See the x86 tag wiki, and especially http://agner.org/optimize/ for tips on writing efficient code.

You can use r10 and r11 as freely as rcx and rdx.
