Is casting to simd-type undefined behaviour in C++?
Edit: Please look at the answer in the duplicate (and/or Peter's answer here). What I write below is technically correct but not really relevant in practice.
Yes, that would be undefined behavior based on the C++ standard. Your compiler might still handle it correctly as an extension (seeing as SIMD types and intrinsics are not part of the C++ standard in the first place).
To do this safely and correctly without compromising speed, use the intrinsic that loads 4 floats directly from memory into a 128-bit register:
__m128 reg = _mm_load_ps(a);
See the Intel Intrinsics Guide for the important alignment constraint:
__m128 _mm_load_ps (float const* mem_addr)
Load 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from memory into
mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
Intel's intrinsics API does define the behaviour of casting to __m128* and dereferencing: it's identical to _mm_load_ps on the same pointer.
For float* and double*, the load/store intrinsics basically exist to wrap this reinterpret cast and communicate alignment info to the compiler.
If _mm_load_ps() is supported, the implementation must also define the behaviour of the code in the question.
I don't know if this is actually documented anywhere; maybe in an Intel tutorial or whitepaper, but it's the agreed-upon behaviour of all compilers and I think most people would agree that a compiler that didn't define this behaviour didn't fully support Intel's intrinsics API.
__m128 types are defined as may_alias (footnote 1), so, like char*, you can point a __m128* at anything, including an int array or an arbitrary struct, and load or store through it without violating strict aliasing. (As long as it's aligned by 16. Otherwise you do need _mm_loadu_ps, or a custom under-aligned vector type declared with GNU C attributes.)
Footnote 1: in GNU C, __m128 is declared with __attribute__((vector_size(16), may_alias)). MSVC doesn't do type-based alias analysis at all, so it needs no equivalent attribute.