Getting the offset of a variable inside a struct is based on the NULL pointer, but why?

I found a trick in a YouTube video explaining how you can get the offset of a struct member by using a NULL pointer.

Well, at least you came here to ask about the random Internet advice you turned up. We're an Internet resource ourselves, of course, but I like to think that our structure and reputation give you a basis for estimating the reliability of what we have to say.

I understand the code snippet below (the casts, the ampersand, and so on), but I do not understand why this works with the NULL pointer. I thought that the NULL pointer could not point to anything.

Yes, from the perspective of C semantics, a null pointer definitely does not point to anything, and NULL is a null pointer constant.

So I cannot mentally visualize how it works.

The (flawed) idea is that

  • NULL is equivalent to a pointer to address 0 in a flat address space (unsafe assumption);
  • ((MyStructType * )NULL)->c designates the member c of an altogether hypothetical object of type MyStructType residing at that address (not supported by the standard);
  • applying the & operator yields the address that such a member would have if it in fact existed (not supported by the standard); and
  • converting the resulting address to an integer yields an address in the assumed flat address space, expressed in units the size of a C char (in no way guaranteed);
  • so that the resulting integer simultaneously represents both an absolute address and an offset (follows from the previous assumptions, because the supposed base address of the hypothetical structure is 0).

Second, the NULL pointer is not always represented by the compiler as being 0; sometimes it is a non-zero value.

Quite right, that is one of the flaws in the scheme presented.

But then how could this piece of code work correctly? Or wouldn't it work correctly anymore?

Although the Standard provides no basis to justify relying on the code to behave as advertised, that does not mean that it must necessarily fail. C implementations do need to be internally consistent about how they represent null pointers, and, to a certain degree, about how they convert between pointers and integers. It turns out to be fairly common that the code's assumptions about those things are in fact satisfied by implementations.

So in practice, the code does work with many C implementations. But it systematically produces the wrong answer with some others, and there may be some in which it produces the right answer some appreciable fraction of the time, but the wrong answer the rest of the time.
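Two of the assumptions discussed above can be probed directly on a given implementation. The following is a minimal sketch; the function names are hypothetical, and the results are implementation-specific, so a "yes" here says nothing about other compilers or targets. It also assumes the optional type uintptr_t is available.

```c
#include <stddef.h>   /* NULL */
#include <stdint.h>   /* uintptr_t (optional type, assumed available) */
#include <string.h>   /* memset, memcmp */

/* Returns 1 if a null pointer converts to the integer 0 here.
   The result of the conversion is implementation-defined. */
int null_converts_to_zero(void)
{
    void *p = NULL;
    return (uintptr_t)p == 0;
}

/* Returns 1 if the stored representation of a null pointer
   consists entirely of zero bytes on this implementation. */
int null_is_all_zero_bits(void)
{
    void *p = NULL;
    unsigned char zeros[sizeof p];
    memset(zeros, 0, sizeof zeros);
    return memcmp(&p, zeros, sizeof p) == 0;
}
```

On implementations where either function returns 0, the NULL-based offset trick has no chance of producing the right number.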


Note that this code actually has undefined behaviour. Dereferencing a NULL pointer is never allowed, even if no value is accessed and only the address is taken (this was a root cause of a Linux kernel exploit).

Use offsetof instead as a safe alternative.
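As a minimal sketch of what that looks like, assuming a made-up struct (the layout of MyStructType and the helper names are hypothetical, only the member name c comes from the question):

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical struct for illustration only. */
typedef struct MyStructType {
    char   a;
    int    b;
    double c;
} MyStructType;

/* offsetof yields an integer constant expression, so it can be
   checked at compile time. The first member is always at offset 0. */
static_assert(offsetof(MyStructType, a) == 0,
              "first member must be at offset 0");

/* Offsets of later members depend on padding chosen by the implementation. */
size_t offset_of_b(void) { return offsetof(MyStructType, b); }
size_t offset_of_c(void) { return offsetof(MyStructType, c); }
```

The exact values returned for b and c vary with the platform's alignment rules, which is one more reason to let the implementation compute them.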


As to why it seems to work with a NULL pointer: it assumes that NULL is address 0. Basically, you could use any pointer and calculate:

MyStructType t;
uintptr_t off = (uintptr_t)&(&t)->c - (uintptr_t)&t;  /* uintptr_t is from <stdint.h> */

If &t == 0, this becomes:

uintptr_t off = (uintptr_t)&((MyStructType *)0)->c - 0;

Subtracting 0 is a no-op.
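The same subtraction can be done on a real object without any pointer-to-integer conversion, by comparing byte pointers. This is only a sketch: the struct layout and the function name are hypothetical, chosen so that a member named c exists.

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* size_t, offsetof */

/* Hypothetical struct for illustration only. */
typedef struct MyStructType {
    int    a;
    char   b;
    double c;
} MyStructType;

/* Computes the offset of member c from an actual object:
   both pointers refer into the same object, so subtracting
   them as char pointers is well defined. */
size_t offset_of_c_via_object(void)
{
    MyStructType t;
    const char *base   = (const char *)&t;
    const char *member = (const char *)&t.c;
    assert(member >= base);  /* a member never precedes its struct */
    return (size_t)(member - base);
}
```

The result should agree with offsetof(MyStructType, c); the difference from the NULL version is that a real object exists, so nothing is dereferenced through a null pointer.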


This code is platform-specific: its behaviour is undefined, so it might work on one platform and fail on another. That's why the C standard requires every implementation to provide the offsetof macro (in <stddef.h>). The library's own definition could expand to code like dereferencing the NULL pointer, but because it is written for that particular implementation, you can at least be sure it will not crash on any platform.

#include <stddef.h>  /* offsetof, size_t */

typedef struct Struct
{
  double d;
} Struct;

size_t off = offsetof(Struct, d);  /* 0, because d is the first member */

Tags:

C