ASM x86_64 AVX: xmm and ymm registers differences

According to Wikipedia, in AVX:

YMM registers are 256 bits long.

XMM registers are 128 bits long and represent the lower 128 bits of the YMM registers.

The YMM and XMM registers overlap: each XMM register is contained in the corresponding YMM register.

Diagram from Wikimedia Commons:

https://commons.wikimedia.org/wiki/File:AVX_registers.svg


xmm0 is the low half of ymm0, exactly like eax is the low half of rax.

Writing to xmm0 (with a VEX-coded instruction, not legacy SSE) zeros the upper lane of ymm0, just like writing to eax zeros the upper half of rax to avoid false dependencies. Legacy SSE instructions leave the upper bytes unmodified instead of zeroing them, which is why there's a penalty for mixing AVX and legacy SSE instructions.
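A minimal NASM-syntax sketch of that difference, written as fragments rather than a complete program. It assumes a CPU with AVX and that rdi points to 32 readable bytes; both are assumptions made for illustration:

```nasm
; Integer analogy: a 32-bit write zero-extends into the full 64-bit register.
mov     rax, -1              ; rax = 0xFFFFFFFFFFFFFFFF
mov     eax, 1               ; rax = 0x0000000000000001

; VEX-coded 128-bit write: the upper lane of ymm0 is zeroed.
vmovups ymm0, [rdi]          ; load all 256 bits (rdi is a hypothetical pointer)
vaddps  xmm0, xmm0, xmm0     ; bits 127:0 = sum, bits 255:128 of ymm0 = 0

; Legacy SSE write: the upper lane of ymm0 is left unmodified.
vmovups ymm0, [rdi]
addps   xmm0, xmm0           ; bits 127:0 = sum, bits 255:128 unchanged
```

The last pair is shown only to illustrate the semantics. It is exactly the kind of AVX / legacy SSE mixing that can be penalized, so real code would normally not be written this way.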

Most AVX instructions are available in either 128-bit or 256-bit size, e.g. vaddps xmm0, xmm1, xmm2 or vaddps ymm0, ymm1, ymm2. (The 256-bit versions of most integer instructions are only available in AVX2, with AVX only providing the 128-bit version. There are a couple of exceptions, like vptest ymm, ymm in AVX1, and vmovdqu if you count that as an "integer" instruction.)

Scalar instructions like vmovd, vcvtss2si, and vcvtsi2ss are only available with XMM registers. Reading a YMM register is not logically different from reading an XMM register, but writing the low element (and leaving the other elements unmodified, like the poorly-designed vcvtsi2ss does) would be different for XMM vs. YMM, because the YMM version would leave the upper lane not zeroed.


But scalar with ymm doesn't exist in the machine-code encoding, even for instructions where it would be really useful like vpinsrd / vpextrd (insert / extract a scalar).
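Since there is no ymm form, one common workaround (not from the original answer, shown here as a hedged sketch) is to go through the 128-bit half that holds the element. The NASM-syntax fragments below use only AVX1 instructions and arbitrarily pick dword element 5 of ymm0, i.e. element 1 of the upper lane; xmm1 is a scratch register chosen for illustration:

```nasm
; Extract dword element 5 of ymm0 into eax.
vextractf128 xmm1, ymm0, 1       ; xmm1 = bits 255:128 of ymm0
vpextrd      eax, xmm1, 1        ; eax  = element 1 of that lane = element 5 of ymm0

; Insert eax as dword element 5 of ymm0.
vextractf128 xmm1, ymm0, 1       ; xmm1 = upper lane of ymm0
vpinsrd      xmm1, xmm1, eax, 1  ; replace element 1 of the lane with eax
vinsertf128  ymm0, ymm0, xmm1, 1 ; write the lane back into bits 255:128
```

For elements 0 to 3 no shuffle is needed for extraction, because vpextrd can read xmm0 directly. Insertion into the low lane with vpinsrd xmm0, xmm0, eax, n would zero the upper lane, so it still needs a vextractf128 beforehand and a vinsertf128 afterwards if the upper lane must be preserved.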

Note that even though reading an XMM register and taking only the low scalar element is logically the same as YMM, for the actual implementation it would not be the same. Reading a YMM register implies an AVX-256 instruction, which would have to transition the CPU out of the "saved upper" state (for an Intel CPU with SSE/AVX transitions / states).

In any case, vcvtss2si rax, ymm0 is not encodable, and the assembler doesn't magically assemble it as vcvtss2si rax, xmm0. If you're writing in asm, you're supposed to know exactly what you're doing. (Some assemblers will optimize mov rax, 1 to mov eax, 1 for you, so accepting ymm as a source register and silently treating it as xmm would be workable. But accepting ymm as a destination register for vcvtsi2ss would change the meaning, so for consistency it's better that the assembler does neither.)
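A short NASM-syntax sketch of what is and isn't accepted, as fragments with arbitrary register choices. The commented-out line is the one an assembler is expected to reject:

```nasm
vcvtss2si rax, xmm0          ; OK: rax = (int64) of the low float in xmm0
; vcvtss2si rax, ymm0        ; not encodable: no ymm form of this instruction exists

vcvtsi2ss xmm0, xmm1, rax    ; OK: xmm0[31:0]     = (float) rax
                             ;     xmm0[127:32]   = xmm1[127:32] (merged, not zeroed)
                             ;     ymm0[255:128]  = 0 (VEX-coded write to xmm0)
```

A hypothetical vcvtsi2ss with a ymm destination would have to keep bits 255:128 as well, which is a different operation from the xmm form above. That is why silently accepting ymm there would change the meaning.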