Is it legal to access struct members via offset pointers from other struct members?

Introduction: The standard is inadequate in this area, and there is decades of history of argument on this topic and strict aliasing with no convincing resolution or proposal to fix.

This answer reflects my view rather than any imposition of the Standard.


Firstly: it's generally agreed that the code in your first code sample is undefined behaviour due to accessing outside the bounds of an array via direct pointer arithmetic.

The rule is C11 6.5.6/8 . It says that indexing from a pointer must remain within "the array object" (or one past the end). It doesn't say which array object but it is generally agreed that in the case int *p = &foo.a; then "the array object" is foo.a, and not any larger object of which foo.a is a subobject.

Relevant links: one, two.


Secondly: it's generally agreed that both of your union examples are correct. The standard explicitly says that any member of a union may be read; and whatever the contents of the relevant memory location are are interpreted as the type of the union member being read.


You suggest that the union being correct implies that the first code should be correct too, but it does not. The issue is not with specifying the memory location read; the issue is with how we arrived at the expression specifying that memory location.

Even though we know that &foo.a + 1 and &foo.b are the same memory address, it's valid to access an int through the second and not valid to access an int through the first.

It's generally agreed that you can access the int by computing its address in other ways that don't break the 6.5.6/8 rule, e.g.:

((int *)((char *)&foo + offsetof(foo, b))[0]

or

((int *)((uintptr_t)&foo.a + sizeof(int)))[0]

Relevant links: one, two


It's not generally agreed on whether ((int *)&foo)[1] is valid. Some say it's basically the same as your first code, since the standard says "a pointer to an object, suitably converted, points to the element's first object". Others say it's basically the same as my (char *) example above because it follows from the specification of pointer casting. A few even claim it's a strict aliasing violation because it aliases a struct as an array.

Maybe relevant is N2090 - Pointer provenance proposal. This does not directly address the issue, and doesn't propose a repeal of 6.5.6/8.


According to C11 draft N1570 6.5p7, an attempt to access the stored value of a struct or union object using anything other than an lvalue of character type, the struct or union type, or a containing struct or union type, invokes UB even if behavior would otherwise be fully described by other parts of the Standard. This section contains no provision that would allow an lvalue of a non-character member type (or any non-character numeric type, for that matter) to be used to access the stored value of a struct or union.

According to the published Rationale document, however, the authors of the Standard recognized that different implementations offered different behavioral guarantees in cases where the Standard imposed no requirements, and regarded such "popular extensions" as a good and useful thing. They judged that questions of when and how such extensions should be supported would be better answered by the marketplace than by the Committee. While it may seem weird that the Standard would allow an obtuse compiler to ignore the possibility that someStruct.array[i] might affect the stored value of someStruct, the authors of the Standard recognized that any compiler whose authors aren't deliberately obtuse will support such a construct whether the Standard mandates or not, and that any attempt to mandate any kind of useful behavior from obtusely-designed compilers would be futile.

Thus, a compiler's level of support for essentially anything having to do with structures or unions is a quality-of-implementation issue. Compiler writers who are focused on being compatible with a wide range of programs will support a wide range of constructs. Those which are focused on maximizing the performance of code that needs only those constructs without which the language would be totally useless, will support a much narrower set. The Standard, however, is devoid of guidance on such issues.

PS--Compilers that are configured to be compatible with MSVC-style volatile semantics will interpret that qualifier as a indicating that an access to the pointer may have side-effects that interact with objects whose address has been taken and that aren't guarded by restrict, whether or not there is any other reason to expect such a possibility. Use of such a qualifier when accessing storage in "unusual" ways may make it more obvious to human readers that the code is doing something "weird" at the same time as it will thus ensure compatibility with any compiler that uses such semantics, even if such compiler would not otherwise recognize that access pattern. Unfortunately, some compiler writers refuse to support such semantics at anything other than optimization level 0 except with programs that demand it using non-standard syntax.