How fast is the Go 1.5 GC with terabytes of RAM?

The new Java ZGC collector can now use 16 terabytes of memory and keep GC pauses under 10ms.


tl;dr:

  • You can't use TBs of RAM with a single Go process right now. The max is 512 GB on Linux, and the most I've seen tested is 240 GB.
  • With the current background GC, GC workload tends to be more important than GC pauses.
  • You can think of GC workload as roughly pointers * allocation rate / spare RAM. Among apps using tons of RAM, only those with few pointers or a low allocation rate have a low GC workload.

I agree with inf's comment that huge heaps are worth asking other folks about (or testing). JimB notes that Go heaps have a hard limit of 512 GB right now, and 240 GB is the most I've seen tested.

Some things we know about huge heaps, from the design document and the GopherCon 2015 slides:

  • The 1.5 collector doesn't aim to cut total GC work, just to cut pauses by doing the work in the background.
  • Your code is paused while the GC scans pointers on the stack and in globals.
  • The 1.5 GC has a short pause on a GC benchmark with a roughly 18GB heap, as shown by the rightmost yellow dot along the bottom of this graph from the GopherCon talk:

    Graph of GC pauses vs. heap size, showing collections of an 18GB heap pausing for multiple seconds under old versions and under 1 second for 1.5

Folks running a couple of production apps that initially had about 300ms pauses reported drops to ~4ms and ~20ms. Another app reported that its 95th-percentile GC time went from 279ms to ~10ms.

Go 1.6 added polish and pushed some of the remaining work to the background. As a result, tests with heaps up to a bit over 200GB still saw a max pause time of 20ms, as shown in a slide in an early 2016 State of Go talk:

Graph of 1.6 GC times, hitting 20ms around 180GB

The same application that had 20ms pause times under 1.5 had 3-4ms pauses under 1.6, with about an 8GB heap and 150M allocations/minute.

Twitch, who use Go for their chat service, reported that by Go 1.7 pause times had been reduced to 1ms with lots of running goroutines.

Go 1.8 took stack scanning out of the stop-the-world phase, bringing most pauses well under 1ms, even on large heaps. Early numbers look good. Occasionally an application still has a code pattern that makes a goroutine hard to pause, effectively lengthening the pause for all other threads, but on the whole the GC's background work now matters much more than its pauses.
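If you want to watch pause behavior in your own program, you can run it with GODEBUG=gctrace=1 to get a line of stats per collection, or read the numbers directly from the runtime. Here's a minimal sketch using runtime.ReadMemStats; the allocation loop is just an invented stand-in workload:

    package main

    import (
        "fmt"
        "runtime"
        "time"
    )

    func main() {
        // Invented stand-in workload: allocate steadily so the GC has work to do.
        retained := make([][]byte, 0, 64)
        for i := 0; i < 1000; i++ {
            retained = append(retained, make([]byte, 1<<20)) // 1MB
            if len(retained) == cap(retained) {
                retained = retained[:0] // drop references, creating garbage
            }
        }

        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        // PauseNs is a circular buffer of recent stop-the-world pause times;
        // (NumGC+255)%256 indexes the most recent entry.
        last := time.Duration(m.PauseNs[(m.NumGC+255)%256])
        fmt.Printf("collections: %d  last pause: %v  total paused: %v\n",
            m.NumGC, last, time.Duration(m.PauseTotalNs))
    }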


Some general observations on garbage collection, not specific to Go:

  • The frequency of collections depends on how quickly you use up the RAM you're willing to give to the process.
  • The amount of work each collection does depends in part on how many pointers are in use. (That includes the pointers within slices, interface values, strings, etc.; see the sketch just below.)
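To make the pointer point concrete, here's a toy sketch (types invented for the example) contrasting a pointer-heavy record layout with a pointer-free one. The GC must trace every pointer in the first; the second's backing array contains no pointers, so the collector can skip it entirely when marking:

    package main

    // PtrUser is pointer-heavy: each string field is a header pointing at
    // separately allocated bytes, so the GC traces every *PtrUser and the
    // strings behind it on every collection.
    type PtrUser struct {
        Name  string
        Email string
    }

    // FlatUser stores its data inline in fixed-size arrays. A []FlatUser
    // contains no pointers at all, so the GC can skip its backing array.
    type FlatUser struct {
        Name  [32]byte
        Email [64]byte
    }

    func main() {
        // A million pointer-y records: lots of mark work per collection.
        ptrUsers := make([]*PtrUser, 0, 1000000)
        for i := 0; i < cap(ptrUsers); i++ {
            ptrUsers = append(ptrUsers, &PtrUser{Name: "gopher", Email: "gopher@example.com"})
        }

        // The same count as one flat, pointer-free allocation: near-free to scan.
        flatUsers := make([]FlatUser, 1000000)

        _, _ = ptrUsers, flatUsers
    }

As for spare RAM, the GOGC environment variable (or runtime/debug.SetGCPercent) controls how much the heap may grow past the live data before the next collection runs; raising it trades memory for fewer collections.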

Rephrased: an application using lots of memory can still avoid GC problems if it holds few pointers (e.g., it handles a relatively small number of large []byte buffers), and collections happen less often if the allocation rate is low (e.g., because you applied sync.Pool to reuse memory wherever you were chewing through RAM most quickly).
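For the allocation-rate side, here's the usual sync.Pool pattern for reusing buffers on a hot path (a toy handler; the names are invented for the example):

    package main

    import (
        "bytes"
        "fmt"
        "sync"
    )

    // bufPool hands out reusable *bytes.Buffers so the hot path stops
    // allocating a fresh buffer (and creating GC work) on every call.
    var bufPool = sync.Pool{
        New: func() interface{} { return new(bytes.Buffer) },
    }

    func handle(payload string) string {
        buf := bufPool.Get().(*bytes.Buffer)
        defer func() {
            buf.Reset() // clear it before handing it back
            bufPool.Put(buf)
        }()
        buf.WriteString("processed: ")
        buf.WriteString(payload)
        return buf.String()
    }

    func main() {
        fmt.Println(handle("hello"))
    }

Note the pool only cuts allocations; anything you Put back must be safe to reuse, hence the Reset.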

So if you're looking at something involving heaps of hundreds of GB that's not naturally GC-friendly, I'd suggest you consider any of:

  1. writing in C or such
  2. moving the bulky data out of the object graph. For example, you could manage data in an embedded DB like bolt (see the sketch after this list), put it in an outside DB service, or use something like groupcache or memcache if you want more of a cache than a DB
  3. running a set of smaller-heap'd processes instead of one big one
  4. just carefully prototyping, testing, and optimizing to avoid memory issues.
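To flesh out option 2, here's a minimal sketch of the embedded-DB route using bolt (github.com/boltdb/bolt); the file, bucket, and key names are invented for the example. The values live in bolt's memory-mapped file rather than in the GC-managed heap, so the collector never scans them:

    package main

    import (
        "fmt"
        "log"

        "github.com/boltdb/bolt"
    )

    func main() {
        // Open (or create) a file-backed DB.
        db, err := bolt.Open("blobs.db", 0600, nil)
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        key, val := []byte("user:42"), []byte("a large blob the GC won't trace")

        // Store the blob inside a read-write transaction.
        if err := db.Update(func(tx *bolt.Tx) error {
            b, err := tx.CreateBucketIfNotExists([]byte("blobs"))
            if err != nil {
                return err
            }
            return b.Put(key, val)
        }); err != nil {
            log.Fatal(err)
        }

        // Read it back in a read-only transaction.
        if err := db.View(func(tx *bolt.Tx) error {
            v := tx.Bucket([]byte("blobs")).Get(key)
            // v points into the mmap; copy it if you need it after the tx ends.
            fmt.Printf("%s\n", v)
            return nil
        }); err != nil {
            log.Fatal(err)
        }
    }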