Module 4 · Lesson 3 · ~20 min read
Go has a garbage collector and a goroutine scheduler. They mostly do the right thing and you mostly don't tune them, but knowing how they work helps you read incident reports and avoid chasing dead-end "optimizations."
Go's GC is a concurrent, tri-color, mark-and-sweep collector, and it is non-generational. Words to internalize:

- Concurrent: marking runs alongside your goroutines; stop-the-world pauses are brief.
- Tri-color mark-and-sweep: objects move from white (unvisited) through gray (reachable, not yet scanned) to black (scanned); whatever is still white at the end is swept.
- Non-generational: no young/old split; every cycle considers the whole heap.
Result: very low pause times even for large heaps. Trade-off: throughput is somewhat lower than a stop-the-world GC could achieve, and the GC is not free — it does run, and it does cost CPU.
| Knob | What it does | When to touch |
|---|---|---|
| GOGC=100 (env) | Trigger GC when the heap grows by 100% over the live set (the default). Lower = more frequent GC, less memory; higher = less GC, more memory. | Memory-constrained envs (lower it); CPU-bound batch jobs (raise it or set GOGC=off). |
| GOMEMLIMIT (env) | Soft heap-size limit: the GC works harder as you approach it. Different from a hard OOM cap. | Containerized services where you know your memory budget. |
| GOMAXPROCS | Maximum number of OS threads executing Go code simultaneously. Default = number of CPUs. | When you've explicitly cgroup-limited CPUs and the runtime can't see it (older containers). |
| runtime.GC() | Force a collection now. Almost never use; the runtime knows better. | Benchmarks; tests measuring "is this a leak?" Almost nothing else. |
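For example, a memory-constrained containerized service might set both GC knobs at launch (the binary name and values here are illustrative, not a recommendation):

```
$ GOGC=50 GOMEMLIMIT=512MiB ./myserver
```

GOMEMLIMIT accepts a size suffix (KiB, MiB, GiB); lowering GOGC trades CPU for a smaller steady-state heap.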
Recap from Module 3:
The scheduler runs GOMAXPROCS P's (logical processors) in parallel, each driving its own M (OS thread), each picking G's (goroutines) off a local run queue. When a goroutine blocks on I/O or a syscall, the runtime parks it, and the M can grab another G to run. When a G has been running too long without yielding, the scheduler preempts it (asynchronously, since Go 1.14).
Concretely: you don't generally manage threads yourself. The runtime transparently maps your thousands of goroutines onto a much smaller pool of OS threads.
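A minimal sketch of that transparency (the goroutine count is arbitrary; nothing here configures threads):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	// GOMAXPROCS(0) queries the current value without changing it.
	fmt.Println("CPUs:", runtime.NumCPU(), "GOMAXPROCS:", runtime.GOMAXPROCS(0))

	// 10,000 goroutines, but at most GOMAXPROCS of them execute Go
	// code at once; the runtime multiplexes them onto its thread pool.
	var wg sync.WaitGroup
	for i := 0; i < 10000; i++ {
		wg.Add(1)
		go func() { defer wg.Done() }()
	}
	wg.Wait()
}
```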
Inside a container, the Go runtime sees the host machine's CPU count, not the container's cgroup CPU limit. So a container limited to 2 CPUs, running on a 64-core host, sees runtime.NumCPU() = 64, sets GOMAXPROCS = 64, spawns 64 P's, and creates contention for the 2 CPUs it actually has.
Fix: set GOMAXPROCS explicitly via the env var, or use automaxprocs, an Uber library that reads the cgroup limit at startup (sketched below). Real-world Canton-adjacent ops services should always handle this.
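A minimal sketch of the automaxprocs route; the blank import does all the work at init time (go.uber.org/automaxprocs is the real import path, the rest of this file is illustrative):

```go
package main

import (
	"fmt"
	"runtime"

	// Blank import: its init() reads the cgroup CPU quota and
	// calls runtime.GOMAXPROCS accordingly.
	_ "go.uber.org/automaxprocs"
)

func main() {
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0)) // now matches the container's CPU limit
}
```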
Each goroutine starts with a tiny stack (~2KB) and grows it by allocating a new, larger stack and copying the old one over. This is invisible to your code; function calls just work. The cost is small but real for very deep call stacks; the savings are enormous for high goroutine counts.
You can't control stack size directly. runtime/debug.SetMaxStack sets a maximum (default 1GB on 64-bit systems). If a goroutine exceeds it, the runtime crashes the program with a fatal "stack overflow" error (not a recoverable panic).
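A sketch of the one control you do have; the 32 MiB cap here is arbitrary:

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// SetMaxStack caps how large any single goroutine stack may grow
	// and returns the previous limit. Exceeding the cap is fatal.
	prev := debug.SetMaxStack(32 << 20) // 32 MiB, arbitrary
	fmt.Println("previous max stack (bytes):", prev)
}
```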
Quick recap from Module 1: when the compiler can prove a value's lifetime is bounded by its enclosing function, it allocates on the stack (free). Otherwise it "escapes" to the heap, where the GC manages it.
You can see what escapes:
```
$ go build -gcflags='-m' ./...
./foo.go:12:6: can inline NewClient
./foo.go:13:9: &Client{...} escapes to heap
```
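A hypothetical foo.go consistent with that output; the returned pointer outlives its function, so the compiler heap-allocates it:

```go
package foo

type Client struct{ addr string }

// The &Client{...} literal is returned, so its lifetime exceeds
// NewClient's frame: it escapes to the heap.
func NewClient(addr string) *Client {
	return &Client{addr: addr}
}

// buf's lifetime is provably bounded by localSum, so it lives on
// the stack and costs the GC nothing.
func localSum() int {
	var buf [8]int
	total := 0
	for i := range buf {
		buf[i] = i
		total += buf[i]
	}
	return total
}
```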
You don't usually optimize for this. Allocation reduction matters when profiling shows GC pressure. Then yes, look at escape analysis. Otherwise, write clear code.
To observe the GC:

- GODEBUG=gctrace=1 at startup. Each GC cycle logs a line to stderr: pause times, heap growth, CPU%.
- runtime.ReadMemStats exposes NumGC, PauseTotalNs, HeapAlloc, etc. (see the sketch below).
- pprof (next lesson). The heap profile shows allocation hotspots.
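A minimal sketch of the ReadMemStats route (the field choice is illustrative; there are many more):

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m) // briefly stops the world; keep it out of hot paths
	fmt.Printf("heap=%d MiB  GCs=%d  total pause=%v\n",
		m.HeapAlloc>>20, m.NumGC, time.Duration(m.PauseTotalNs))
}
```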
And what should you tune? For 95% of services: nothing. The GC defaults are good. For the other 5%, in priority order:
- sync.Pool for high-churn temporaries (example below). Stop creating tons of short-lived objects in hot loops.
- GOMEMLIMIT if running in a containerized environment with a known memory budget.
- GOGC if you're memory-rich and CPU-poor (less frequent GC, larger heap, more CPU for your code).

Don't reach for these without profiling first. Premature GC tuning has caused more incidents than it's fixed.
```go
import "sync"

var bufPool = sync.Pool{
	// New runs only when the pool has nothing to hand out.
	New: func() any { return make([]byte, 4096) },
}

func handle(req *Request) {
	buf := bufPool.Get().([]byte) // type-assert back from any
	defer bufPool.Put(buf)        // return the buffer when done; contents are not zeroed
	// ... use buf ...
}
```
Reuses buffers across requests. The pool may evict pooled items at any time (especially on GC), so it's only for temporaries — not for long-lived caches. Useful when handlers allocate the same shape over and over and you've shown via profile that allocation is hot.
- You mostly don't tune the runtime; GOGC, GOMEMLIMIT, and GOMAXPROCS are the knobs when you do.
- In containers, make GOMAXPROCS match the cgroup CPU limit, via env var or automaxprocs.
- Check escape analysis with -gcflags='-m' if perf-tuning.
- sync.Pool for high-churn hot-path temporaries; never for long-lived state.