Is there a way to enforce specific endianness for a C or C++ struct?

A bit late to the party but with current GCC (tested on 6.2.1 where it works and 4.9.2 where it's not implemented) there is finally a way to declare that a struct should be kept in X-endian byte order.

The following test program:

#include <stdio.h>
#include <stdint.h>

struct __attribute__((packed, scalar_storage_order("big-endian"))) mystruct {
    uint16_t a;
    uint32_t b;
    uint64_t c;
};


int main(int argc, char** argv) {
    struct mystruct bar = {.a = 0xaabb, .b = 0xff0000aa, .c = 0xabcdefaabbccddee};

    FILE *f = fopen("out.bin", "wb");
    size_t written = fwrite(&bar, sizeof(struct mystruct), 1, f);
    fclose(f);
}

creates a file "out.bin" which you can inspect with a hex editor (e.g. hexdump -C out.bin). If the scalar_storage_order attribute is suppported it will contain the expected 0xaabbff0000aaabcdefaabbccddee in this order and without holes. Sadly this is of course very compiler specific.


The way I usually handle this is like so:

#include <arpa/inet.h> // for ntohs() etc.
#include <stdint.h>

class be_uint16_t {
public:
        be_uint16_t() : be_val_(0) {
        }
        // Transparently cast from uint16_t
        be_uint16_t(const uint16_t &val) : be_val_(htons(val)) {
        }
        // Transparently cast to uint16_t
        operator uint16_t() const {
                return ntohs(be_val_);
        }
private:
        uint16_t be_val_;
} __attribute__((packed));

Similarly for be_uint32_t.

Then you can define your struct like this:

struct be_fixed64_t {
    be_uint32_t int_part;
    be_uint32_t frac_part;
} __attribute__((packed));

The point is that the compiler will almost certainly lay out the fields in the order you write them, so all you are really worried about is big-endian integers. The be_uint16_t object is a class that knows how to convert itself transparently between big-endian and machine-endian as required. Like this:

be_uint16_t x = 12;
x = x + 1; // Yes, this actually works
write(fd, &x, sizeof(x)); // writes 13 to file in big-endian form

In fact, if you compile that snippet with any reasonably good C++ compiler, you should find it emits a big-endian "13" as a constant.

With these objects, the in-memory representation is big-endian. So you can create arrays of them, put them in structures, etc. But when you go to operate on them, they magically cast to machine-endian. This is typically a single instruction on x86, so it is very efficient. There are a few contexts where you have to cast by hand:

be_uint16_t x = 37;
printf("x == %u\n", (unsigned)x); // Fails to compile without the cast

...but for most code, you can just use them as if they were built-in types.