Is type-punning through a union unspecified in C99, and has it become specified in C11?

The behavior of type punning through a union changed between C89 and C99. The behavior in C99 is the same as in C11.

As Wug noted in his answer, type punning is allowed in C99 / C11. When the union members are of different sizes, the read can yield an unspecified value, which could be a trap representation.

The footnote was added in C99 after Clive D.W. Feather's Defect Report #257:

Finally, one of the changes from C90 to C99 was to remove any restriction on accessing one member of a union when the last store was to a different one. The rationale was that the behaviour would then depend on the representations of the values. Since this point is often misunderstood, it might well be worth making it clear in the Standard.

[...]

To address the issue about "type punning", attach a new footnote 78a to the words "named member" in 6.5.2.3#3:

78a If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

Clive D.W. Feather's wording was accepted for a Technical Corrigendum in the C Committee's response to Defect Report #283.

The original C99 specification left this unspecified.

One of the technical corrigenda to C99 (TC3, I think) added footnote 82 to correct this oversight:

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

That footnote is retained in the C11 standard (it's footnote 95 in C11).
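One way to picture the footnote's "reinterpreted as an object representation" wording is that the read behaves like copying the bytes into an object of the new type. The following is only a sketch of that reading: `bits_via_union` and `bits_via_memcpy` are illustrative names, and it assumes `float` and `uint32_t` have the same size on the platform.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a float, then read the same bytes through the other member. */
uint32_t bits_via_union(float f)
{
    union { float f; uint32_t u; } x;
    assert(sizeof x.f == sizeof x.u);  /* assumption: both are the same size */
    x.f = f;      /* last store is to member f */
    return x.u;   /* read through member u: the bytes are reinterpreted */
}

/* The same reinterpretation, spelled out as a byte copy. */
uint32_t bits_via_memcpy(float f)
{
    uint32_t u;
    assert(sizeof f == sizeof u);      /* assumption: both are the same size */
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Under the footnote's reading, both functions return the same bit pattern for any given input; on an IEEE 754 platform, `1.0f` gives `0x3F800000` either way.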

This has always been "iffy". As others have noted, a footnote was added to C99 via a Technical Corrigendum. It reads as follows:

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

However, footnotes are specified in the Foreword as non-normative:

Annexes D and F form a normative part of this standard; annexes A, B, C, E, G, H, I, J, the bibliography, and the index are for information only. In accordance with Part 3 of the ISO/IEC Directives, this foreword, the introduction, notes, footnotes, and examples are also for information only.

That is, the footnotes cannot prescribe behaviour; they should only clarify the existing text. It's an unpopular opinion, but the footnote quoted above actually fails in this regard: there is no such behaviour prescribed in the normative text. Indeed, there are contradictory sections, such as 6.7.2.1:

... The value of at most one of the members can be stored in a union object at any time

In conjunction with 6.5.2.3 (regarding accessing union members with the "." operator):

The value is that of the named member

I.e. if the value of only one member can be stored, the value of another member is non-existent. This strongly implies that type punning via a union should not be possible; the member access yields a non-existent value. The same text still exists in the C11 document.

However, it's clear that the purpose of adding the footnote was to allow for type-punning; it's just that the committee seemingly broke the rules on footnotes not containing normative text. To accept the footnote, you really have to disregard the section that says footnotes aren't normative, or otherwise try to figure out how to interpret the normative text in such a way that supports the conclusion of the footnote (which I have tried, and failed, to do).

About the best we can do to ratify the footnote is to make some assumptions about the definition of a union as a set of "overlapping objects", from 6.2.5:

A union type describes an overlapping nonempty set of member objects, each of which has an optionally specified name and possibly distinct type

Unfortunately there is no elaboration on what is meant by "overlapping". An object is defined (3.14) as a "region of data storage in the execution environment, the contents of which can represent values". The "overlapping objects" definition above implies that the same region of storage can be identified by two or more distinct objects, that is, objects have an identity which is separate from their storage region. The reasonable assumption seems to be that union members (of a particular union instance) use the same storage region.
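The shared-storage assumption can at least be observed directly. A minimal sketch, where the union `int_or_float` and the function `members_share_storage` are names made up for illustration:

```c
union int_or_float {
    int a;
    float b;
};

/* Returns 1 if both members begin at the same address as the union itself. */
int members_share_storage(void)
{
    union int_or_float u;
    return (void *)&u == (void *)&u.a
        && (void *)&u == (void *)&u.b;
}
```

This only shows that the members start at the same address; it says nothing about which of the overlapping objects is considered to hold a value at any given time, which is the point in dispute.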

Even if we ignore 6.7.2.1/6.5.2.3 and allow, as the footnote suggests, that reading any union member returns the value that would be represented by the contents of the corresponding storage region (which would therefore allow for type punning), the ever-problematic strict-aliasing rule in 6.5 disallows (with certain minor exceptions) accessing an object other than by its type. An "access" is defined (3.1) as an "〈execution-time action〉 to read or modify the value of an object", and modifying one of a set of overlapping objects necessarily modifies the others. The strict-aliasing rule could therefore potentially be violated just by writing to a union member, regardless of whether it is then read through another.

For example, by the wording of the standard, the following is illegal:

union {
    int a;
    float b;
} u;

int main(void)
{
    u.a = 0;         // modifies a float object by an lvalue of type int
    int *pa = &u.a;
    *pa = 1;         // also modifies a float object, without union lvalue involved
    return 0;
}

(Specifically, the two commented lines break the strict-aliasing rule).

Strictly speaking, the footnote speaks to a separate issue, that of reading an inactive union member; however the strict-aliasing rule in conjunction with other sections as noted above seriously limits its applicability and in particular means that it does not allow type-punning in general (but only for specific combinations of types).

Frustratingly, the committee responsible for developing the standard seem to intend for type-punning to generally be possible via a union, and yet do not appear to be troubled that the text of the standard still disallows it.

Worth noting also is that the consensus understanding (by compiler vendors) seems to be that type punning via a union is allowed, but "access must be via the union type" (e.g., the first commented line in the example above, but not the second). It's a little unclear whether this should apply to both read and write accesses, and it is in no way supported by the text of the standard (disregarding the footnote).
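To make the "access must be via the union type" reading concrete, here is a sketch of the two access styles. The names `int_or_float`, `read_via_union` and `read_via_pointers` are illustrative only, and the second function is shown purely as the pattern that compilers are generally understood not to support; it is deliberately never called here.

```c
union int_or_float {
    int a;
    float b;
};

/* Both the store and the load go through the union type, so the compiler
   can see that u->a and u->b overlap. */
float read_via_union(union int_or_float *u, int value)
{
    u->a = value;
    return u->b;
}

/* The union type is not visible at the point of access. A compiler using
   type-based alias analysis may assume *pi and *pf do not overlap, so the
   result is not reliable if both point into the same union object. */
float read_via_pointers(int *pi, float *pf, int value)
{
    *pi = value;
    return *pf;
}
```

Assuming `int` and `float` have the same size and an all-zero-bits `float` is 0.0 (as on IEEE 754 platforms), `read_via_union(&u, 0)` returns 0.0f.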

In conclusion: while it is largely accepted that type punning via a union is legal (most consider it allowed only if the access is done "via the union type", so to speak), the wording of the standard prohibits it in all but certain trivial cases.

The section you quote:

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

... has to be read carefully, though. "The bytes of the object representation that do not correspond to that member" is referring to bytes beyond the size of the member, which isn't itself an issue for type punning (except that you cannot assume writing to a union member will leave the "extra" part of any larger member untouched).
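A small sketch of what that caveat means in practice. The union `small_or_large` and the function `first_byte_after_small_store` are illustrative names, and the example assumes the platform provides `uint8_t` and `uint32_t`.

```c
#include <stdint.h>

union small_or_large {
    uint8_t  small;   /* occupies 1 byte */
    uint32_t large;   /* occupies 4 bytes */
};

/* After the store to `small`, only the byte belonging to `small` has a
   known value. The other three bytes of `large` take unspecified values,
   even though they held 0xFF a moment earlier. */
uint8_t first_byte_after_small_store(void)
{
    union small_or_large u;
    u.large = 0xFFFFFFFFu;
    u.small = 0x12;
    /* Inspecting the representation through unsigned char is always allowed. */
    const unsigned char *bytes = (const unsigned char *)&u;
    return bytes[0];  /* the byte shared with `small` */
}
```

The function returns 0x12. Reading `u.large` at that point would be a different matter: compilers commonly leave the remaining bytes untouched, but portable code should not depend on that.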
