Opinions on type-punning in C++?

Ignoring efficiency, for simplicity of code I'd do:

#include <numeric>
#include <vector>
#include <cstring>

uint32_t compute_checksum(const char *data, size_t size) {
    std::vector<uint32_t> intdata(size/sizeof(uint32_t));
    std::memcpy(&intdata[0], data, size);
    return std::accumulate(intdata.begin(), intdata.end(), 0);
}

I also like litb's last answer, the one that shifts each char in turn, except that since char might be signed, I think it needs an extra mask:

checksum += ((data[i] && 0xFF) << shift[i % 4]);

When type punning is a potential issue, I prefer not to type pun rather than to try to do so safely. If you don't create any aliased pointers of distinct types in the first place, then you don't have to worry what the compiler might do with aliases, and neither does the maintenance programmer who sees your multiple static_casts through a union.

If you don't want to allocate so much extra memory, then:

uint32_t compute_checksum(const char *data, size_t size) {
    uint32_t total = 0;
    for (size_t i = 0; i < size; i += sizeof(uint32_t)) {
        uint32_t thisone;
        std::memcpy(&thisone, &data[i], sizeof(uint32_t));
        total += thisone;
    }
    return total;
}

Enough optimisation will get rid of the memcpy and the extra uint32_t variable entirely on gcc, and just read an integer value unaligned, in whatever the most efficient way to do that is on your platform, straight out of the source array. I'd hope the same is true of other "serious" compilers. But this code is now bigger than litb's, so there's not much to be said for it other than mine is easier to turn into a function template that will work just as well with uint64_t, and mine works as native endian-ness rather than picking little-endian.

This is of course not completely portable. It assumes that the storage representation of sizeof(uint32_t) chars corresponds to the storage representation of a uin32_t in the way we want. This is implied by the question, since it states that one can be "treated as" the other. Endian-ness, whether a char is 8 bits, and whether uint32_t uses all bits in its storage representation can obviously intrude, but the question implies that they won't.


As far as the C++ standard is concerned, litb's answer is completely correct and the most portable. Casting const char *data to a const uint3_t *, whether it be via a C-style cast, static_cast, or reinterpret_cast, breaks the strict aliasing rules (see Understanding Strict Aliasing). If you compile with full optimization, there's a good chance that the code will not do the right thing.

Casting through a union (such as litb's my_reint) is probably the best solution, although it does technically violate the rule that if you write to a union through one member and read it through another, it results in undefined behavior. However, practically all compilers support this, and it results in the the expected result. If you absolutely desire to conform to the standard 100%, go with the bit-shifting method. Otherwise, I'd recommend going with casting through a union, which is likely to give you better performance.