where/how does Apples GCC store DWARF inside an executable

On Mac OS X there was a decision to have the linker id not process all of the debug information when you link your program. The debug information is often 10xthe size of the program executable so having the linker process all of the debug info and include it in the executable binary was a serious detriment to link times. For iterative development - compile, link, compile, link, debug, compile link - this was a real hit.

Instead, the compiler generates the DWARF debug information in the .s files, the assembler outputs it in the .o files, and the linker includes a "debug map" in the executable binary which tells debug info users where all of the symbols were relocated during the link.

A consumer (doing .o-file debugging) loads the debug map from the executable and processes all of the DWARF in the .o files as-needed, remapping the symbols as per the debug map's instructions.

dsymutil can be thought of as a debug info linker. It does this same process -- read the debug map, load the DWARF from the .o files, relocate all the addresses -- and then outputs a single binary of all the DWARF at their final, linked addresses. This is the dSYM bundle.

Once you have a dSYM bundle, you've got plain old standard DWARF that any dwarf reading tool (which can deal with Mach-O binary files) can process.

There is an additional refinement that makes all of this work, the UUIDs included in Mach-O binaries. Every time the linker creates a binary, it emits a 128-bit UUID in the LC_UUID load command (v. otool -hlv or dwarfdump --uuid). This uniquely identifies that binary file. When dsymutil creates the dSYM, it includes that UUID. The debuggers will only associate a dSYM and an executable if they have matching UUIDs -- no dodgy file mod timestamps or anything like that.

We can also use the UUIDs to locate the dSYMs for binaries. They show up in crash reports, we include a Spotlight importer that you can use to search for them, e.g. mdfind "com_apple_xcode_dsym_uuids == E21A4165-29D5-35DC-D08D-368476F85EE1" if the dSYM is located in a Spotlight indexed location. You can even have a repository of dSYMs for your company and a program that can retrieve the correct dSYM given a UUID - maybe a little mysql database or something like that - so you run the debugger on a random executable and you instantly have all the debug info for that executable. There are some pretty neat things you can do with the UUIDs.

But anyway, to answer your original question: The unstripped binary has the debug map, the .o files have the DWARF, and when dsymutil is run these are combined to create the dSYM bundle.

If you want to see the debug map entries, do nm -pa executable and they're all there. They are in the form of the old stabs nlist records - the linker already knew how to process stabs so it was easiest to use them - but you'll see how it works without much trouble, maybe refer to some stabs documentation if you're uncertain.


It seems it actually doesn't.

I traced dsymutil and it reads all the *.o files. objdump -h also lists all the debug info in them.

So it seems that those info isn't copied over to the binary.


Some related comments about this can also be found here.