Was there a specific reason why garbage collection was not designed for C?

C was invented back in the early 1970s for writing operating systems and other low-level stuff. Garbage collectors existed at the time (e.g., in early versions of Smalltalk), but I doubt they were up to the task of running in such a lightweight environment, and there would have been all the complications of working with very low-level buffers and pointers.


Garbage collection has been implemented for C (e.g., the Boehm-Demers-Weiser collector). C wasn't specified to include GC when it was new for a number of reasons -- largely because, for the hardware they were targeting and the systems they were building, it just didn't make much sense.
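
For what it's worth, here's a minimal sketch of what using the Boehm collector looks like in practice, assuming libgc is installed and you link with -lgc (the loop and sizes are just for illustration):

```c
#include <stdio.h>
#include <string.h>
#include <gc.h>          /* Boehm-Demers-Weiser collector; link with -lgc */

int main(void)
{
    GC_INIT();                              /* initialize the collector */

    for (int i = 0; i < 1000000; i++) {
        /* GC_MALLOC returns zero-filled, collectable memory; there is no
           matching free() -- unreachable blocks are reclaimed when the
           collector decides to run during later allocations.           */
        char *p = GC_MALLOC(64);
        strcpy(p, "dropped on the floor, collected later");
    }

    printf("heap size: %lu bytes\n", (unsigned long)GC_get_heap_size());
    return 0;
}
```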

Edit (to answer a few allegations raised elsethread):

  1. To make conservative GC well-defined, you basically only have to make one change to the language: say that anything that makes a pointer temporarily "invisible" leads to undefined behavior. For example, in current C you can write a pointer out to a file, overwrite the pointer in memory, read it back in later, and (assuming the object it pointed at was never freed) still access the data it points at (see the sketch just after this list). A GC wouldn't necessarily "realize" that pointer existed, so it could see the memory as no longer reachable, and therefore open to collection, and the later dereference wouldn't "work".

  2. As far as garbage collection being non-deterministic: there are real-time collectors that are absolutely deterministic and can be used in hard real-time systems. There are also deterministic heap managers for manual management, but most manual managers are not deterministic.

  3. As far as garbage collection being slow and/or thrashing the cache: there's a kernel of truth to this, but it's mostly a technicality. Designs (e.g., generational scavenging) that largely avoid these problems are well known; it's open to argument whether they still count as garbage collection in the strict sense, even though they do essentially the same job for the programmer.

  4. As for the GC running at unknown or unexpected times: this isn't necessarily any more or less true than with manually managed memory. You can have a GC run in a separate thread that kicks in (at least somewhat) unpredictably; the same is true of coalescing free blocks with manual memory management. A particular attempt at allocating memory can trigger a collection cycle, making some allocations much slower than others; the same is true of a manual manager that uses lazy coalescing of free blocks.

  5. Oddly, GC is much less compatible with C++ than with C. Most C++ code depends on destructors being invoked deterministically, but with garbage collection that's no longer the case. This breaks lots of code -- and the better written the code is, the bigger the problem it generally causes.

  6. Likewise, C++ requires that std::less<T> provide meaningful (and, more importantly, consistent) results for pointers, even when they point to entirely independent objects. It would take some extra work to meet this requirement with a copying collector/scavenger (but I'm pretty sure it's possible). It's more difficult still to deal with (for example) somebody hashing an address and expecting consistent results. That's generally a poor idea, but it's still possible, and it should keep producing consistent results.
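
To make point 1 concrete, here is a minimal sketch (plain C, with a made-up file name and no error checking) of the kind of pointer "hiding" that is legal today but would have to become undefined behavior under a conservative collector:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *data = malloc(100);
    strcpy(data, "still reachable... or is it?");

    /* Hide the pointer: write its bits to a file, then wipe the only
       in-memory copy.  A conservative GC scanning memory now finds no
       reference to the block and would be free to collect it.         */
    FILE *f = fopen("ptr.bin", "wb");
    fwrite(&data, sizeof data, 1, f);
    fclose(f);
    data = NULL;

    /* ...arbitrary work happens; a collection cycle could run here... */

    /* Un-hide the pointer and dereference it.  Legal in standard C
       (the block was never freed), but unsound if a collector already
       reclaimed the "unreachable" memory.                              */
    f = fopen("ptr.bin", "rb");
    fread(&data, sizeof data, 1, f);
    fclose(f);
    puts(data);

    free(data);
    return 0;
}
```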


Don't listen to the "C is old and that's why it doesn't have GC" folks. There are fundamental problems with GC that can't be overcome and that make it incompatible with C.

The biggest problem is that accurate garbage collection requires the ability to scan memory and identify any pointers it encounters. Some higher-level languages restrict their integers so that they don't use all of the available bits, so the high bits can be used to distinguish object references from integers. Such languages may then store strings (which could contain arbitrary octet sequences) in a special string zone where they can't be confused with pointers, and all is well. A C implementation, however, cannot do this, because bytes, larger integers, pointers, and everything else can be stored together in structures, in unions, or in the chunks returned by malloc.
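
Here's a tiny sketch of why (the struct layout and the address constant are made up for illustration): nothing in a C object's bytes tells a scanner which words are pointers, only the program's own conventions do.

```c
#include <stdlib.h>
#include <stdint.h>

/* From a collector's point of view, this block is just bytes.  The 'tag'
   field decides at run time whether 'u' currently holds a pointer or a
   plain integer, and only the program knows that convention.            */
struct node {
    int tag;                 /* 0 -> u.value is an integer          */
    union {                  /* 1 -> u.next points at another node  */
        uintptr_t value;
        struct node *next;
    } u;
};

int main(void)
{
    struct node *n = malloc(sizeof *n);
    n->tag = 0;
    n->u.value = 0x7f8a12c03940;   /* an integer that merely *looks* like a
                                      heap address -- an exact GC would need
                                      to know it must not trace it          */
    free(n);
    return 0;
}
```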

What if you throw away the accuracy requirement and decide you're okay with a few objects never getting freed because some non-pointer data in the program happens to have the same bit pattern as those objects' addresses? Now suppose your program receives data from the outside world (network, files, etc.). I claim I can make your program leak an arbitrary amount of memory, and eventually run out of memory, as long as I can guess enough pointer values and embed them in the data I feed your program. This gets a lot easier if you apply de Bruijn sequences.
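
A rough sketch of that scenario, assuming a conservative collector that scans a long-lived input buffer as potential roots (buffer and block sizes are arbitrary; the scan below just simulates what such a collector would conclude):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative only: count how many heap-block addresses also occur as bit
   patterns inside an attacker-supplied buffer.  A conservative collector
   scanning that buffer would treat each such block as live even after the
   program drops its own last pointer to it, so the block can never be freed
   while the buffer exists.                                                  */
int main(void)
{
    unsigned char input[4096];                      /* e.g. read from a socket */
    size_t got = fread(input, 1, sizeof input, stdin);

    int pinned = 0;
    for (int i = 0; i < 1000; i++) {
        void *block = malloc(64);                   /* imagine GC_MALLOC here  */
        for (size_t off = 0; got >= sizeof block && off <= got - sizeof block; off++)
            if (memcmp(input + off, &block, sizeof block) == 0)
                pinned++;                           /* false root: block leaks */
        /* the last real pointer to 'block' goes out of scope here */
    }
    printf("%d blocks pinned by input bytes\n", pinned);
    return 0;
}
```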

Aside from that, garbage collection is just plain slow. You can find hundreds of academics who like to claim otherwise, but that won't change reality. The performance problems of GC break down into three main categories:

  • Unpredictability
  • Cache pollution
  • Time spent walking all memory

The people who claim GC is fast these days are simply comparing it to the wrong thing: poorly written C and C++ programs that allocate and free thousands or millions of objects per second. Yes, those will also be slow, but at least predictably slow, in a way you can measure and fix if necessary. A well-written C program spends so little time in malloc/free that the overhead isn't even measurable.
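
For instance, a common pattern in well-written C is to replace per-object malloc/free with a bump-pointer arena that is reset in constant time. The sketch below is illustrative only (fixed capacity, no growth or error handling):

```c
#include <stdlib.h>
#include <string.h>

/* Minimal bump-pointer arena: one malloc up front, one free at the end,
   and per-object "allocation" is just a pointer increment.  This is the
   kind of pattern that keeps allocator time too small to measure.       */
struct arena {
    char  *base;
    size_t used;
    size_t cap;
};

static void arena_init(struct arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->used = 0;
    a->cap  = cap;
}

static void *arena_alloc(struct arena *a, size_t n)
{
    n = (n + 15) & ~(size_t)15;           /* keep 16-byte alignment      */
    if (a->used + n > a->cap)
        return NULL;                      /* sketch: no growth handling  */
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

static void arena_reset(struct arena *a) { a->used = 0; }
static void arena_free(struct arena *a)  { free(a->base); a->base = NULL; }

int main(void)
{
    struct arena a;
    arena_init(&a, 1 << 20);

    for (int frame = 0; frame < 1000; frame++) {
        for (int i = 0; i < 1000; i++) {
            char *obj = arena_alloc(&a, 64);   /* millions of "objects"... */
            if (obj) memset(obj, 0, 64);
        }
        arena_reset(&a);                        /* ...all freed in O(1)     */
    }
    arena_free(&a);
    return 0;
}
```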