Are there performance issues when using pragma pack(1)?

Memory access is fastest when it can take place at word-aligned memory addresses. The simplest example is the following struct (which @Didier also used):

struct sample {
   char a;
   int b;
};

By default, GCC inserts padding, so a is at offset 0, and b is at offset 4 (word-aligned). Without padding, b isn't word-aligned, and access is slower.
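To see the padding for yourself, here is a minimal sketch that prints the layout of the struct above with and without packing. The names `sample_packed` and `print_layout` are made up for this illustration, and the default offsets depend on your compiler and ABI.

```c
#include <stddef.h>
#include <stdio.h>

/* Default layout: the compiler may insert padding after 'a'. */
struct sample {
    char a;
    int b;
};

/* Same members, but packed to 1-byte alignment (no padding). */
#pragma pack(push, 1)
struct sample_packed {
    char a;
    int b;
};
#pragma pack(pop)

/* Hypothetical helper: print where 'b' lands and the total size of each struct. */
void print_layout(void) {
    printf("default: offsetof(b) = %zu, sizeof = %zu\n",
           offsetof(struct sample, b), sizeof(struct sample));
    printf("packed:  offsetof(b) = %zu, sizeof = %zu\n",
           offsetof(struct sample_packed, b), sizeof(struct sample_packed));
}
```

With a 4-byte, 4-byte-aligned int (the usual case for GCC on x86), b should land at offset 4 in the default struct and at offset 1 in the packed one, but it's worth checking on your own target.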

How much slower?

  • For 32-bit x86, according to the Intel 64 and IA-32 Architectures Software Developer's Manual:
    The processor requires two memory accesses to make an unaligned memory access; aligned accesses require only one memory access. A word or doubleword operand that crosses a 4-byte boundary or a quadword operand that crosses an 8-byte boundary is considered unaligned and requires two separate memory bus cycles for access.
    As with most performance questions, you'd have to benchmark your application to see how much of an issue this is in practice.
  • According to Wikipedia, x86 extensions like SSE2 have stricter requirements: some of their instructions need memory operands to be 16-byte aligned.
  • Many other architectures require word alignment (and will generate SIGBUS errors if data structures aren't word-aligned).
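One practical consequence: the compiler knows how to access a packed member directly, but if you take the address of a packed member and dereference it as a plain int pointer, you may hit exactly the unaligned access described above. A common workaround is to copy the bytes out with memcpy. This is only a sketch, and `load_u32_unaligned` is a hypothetical helper name, not something from a library.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from an address that may not be
 * 4-byte aligned. memcpy makes no alignment assumptions, so this stays
 * well-defined even on architectures that fault on unaligned loads. */
uint32_t load_u32_unaligned(const void *src) {
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

Compilers typically turn this memcpy into a single load on targets where unaligned loads are allowed, so it usually doesn't cost extra there.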

Regarding portability: I assume that you're using #pragma pack(1) so that you can send structs across the wire and to and from disk without worrying about different compilers or platforms packing structs differently. This is a valid use, but there are a couple of issues to keep in mind:

  • This does nothing to handle big-endian versus little-endian issues. You can handle these by calling the htons family of functions (htons/ntohs for 16-bit values, htonl/ntohl for 32-bit values) on any ints, unsigned, etc. in your structs.
  • In my experience, working with packed, serializable structs in application code isn't a lot of fun. They're very difficult to modify and extend without breaking backwards compatibility, and as already noted, there are performance penalties. Consider transferring your packed, serializable structs' contents into equivalent non-packed, extensible structs for processing, or consider using a full-fledged serialization library like Protocol Buffers (which has C bindings).
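Both points can be sketched together: decode a packed, big-endian wire struct into an ordinary struct for processing. Everything here (`wire_header`, `header`, `decode_header`, and the field layout) is invented for the example; ntohl and ntohs come from the POSIX header <arpa/inet.h>, so this assumes a POSIX system.

```c
#include <arpa/inet.h>  /* ntohl, ntohs (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical on-the-wire layout: packed, fields in network byte order. */
#pragma pack(push, 1)
struct wire_header {
    uint8_t  type;
    uint32_t length;
    uint16_t checksum;
};
#pragma pack(pop)

/* Native struct used by the rest of the application: default alignment,
 * host byte order, free to be reordered or extended. */
struct header {
    uint32_t length;
    uint16_t checksum;
    uint8_t  type;
};

/* Hypothetical helper: copy the raw bytes into the packed struct, then
 * convert each multi-byte field from network to host byte order. */
struct header decode_header(const unsigned char *bytes) {
    struct wire_header wire;
    struct header out;

    memcpy(&wire, bytes, sizeof wire);
    out.type = wire.type;
    out.length = ntohl(wire.length);
    out.checksum = ntohs(wire.checksum);
    return out;
}
```

The packed struct is then only touched at the I/O boundary, and the rest of the code works with the aligned one.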

Yes. There absolutely are.

For instance, if you define a struct:

struct dumb {
    char c;
    int  i;
};

and compile it under #pragma pack(1), then whenever you access the member i, the CPU is slowed, because the 32-bit value i cannot be accessed in a native, aligned way. Put simply, imagine that the CPU has to fetch 3 bytes from one memory location and then 1 more byte from the next location in order to move the value from memory into a CPU register.


When you declare a struct, most compilers insert padding bytes between members to ensure that each member is aligned to an appropriate address in memory (usually an address that is a multiple of the member type's size). This lets the compiler generate optimized code for accessing these members.

#pragma pack(1) instructs the compiler to pack structure members with a particular alignment. The 1 here tells the compiler not to insert any padding between members.

So yes, there is a definite performance penalty, since you force the compiler to do something beyond what it would naturally do for performance optimization. Also, some platforms demand that objects be aligned at specific boundaries, and using unaligned structures there might crash your program (for example with a bus error or segmentation fault).

Ideally, it is best to avoid changing the default natural alignment rules. But if the #pragma pack directive cannot be avoided at all (as in your case), then the original packing scheme must be restored after the definition of the structures that require tight packing.

For example:

//push current alignment rules to internal stack and force 1-byte alignment boundary
#pragma pack(push,1)  

/*   definition of structures that require tight packing go in here   */

//restore original alignment rules from stack    
#pragma pack(pop)
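As a concrete instance of the pattern above, this sketch uses compile-time checks to confirm that packing applies only between the push and the pop. The struct names are hypothetical, and `_Static_assert` and `_Alignof` assume a C11 compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Push the current alignment rules and force 1-byte packing. */
#pragma pack(push, 1)
struct packed_record {      /* hypothetical struct that needs tight packing */
    uint8_t  tag;
    uint32_t value;
};
/* Restore the original alignment rules. */
#pragma pack(pop)

/* Declared after the pop, so the default alignment rules apply again. */
struct normal_record {
    uint8_t  tag;
    uint32_t value;
};

/* Packed: 1 + 4 bytes, no padding. */
_Static_assert(sizeof(struct packed_record) == 5,
               "packed_record should contain no padding");

/* Unpacked: 'value' starts at its natural alignment boundary again. */
_Static_assert(offsetof(struct normal_record, value) == _Alignof(uint32_t),
               "normal_record should use default alignment");
```

If the pop were forgotten, the second assertion would fail on any target where uint32_t has an alignment greater than 1, which is a cheap way to catch packing leaking into later headers.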
