Module 4 · Lesson 3 · ~20 min read
Go has a garbage collector and a goroutine scheduler. They mostly do the right thing and you mostly don't tune them, but knowing how they work helps you read incident reports and avoid chasing dead-end "optimizations."
Go's GC is a concurrent, tri-color, mark-and-sweep collector, and it is non-generational. Words to internalize:

- Concurrent: marking runs alongside your goroutines; stop-the-world pauses are brief.
- Tri-color mark-and-sweep: objects move from white (unvisited) through gray (reachable, not yet scanned) to black (scanned); whatever is still white at the end is swept.
- Non-generational: no young/old split; every cycle considers the whole heap.
Result: very low pause times even for large heaps. Trade-off: throughput is somewhat lower than a stop-the-world GC could achieve, and the GC is not free — it does run, and it does cost CPU.
| Knob | What it does | When to touch |
|---|---|---|
| GOGC=100 (env) | Trigger GC when the heap grows by 100% over the live set (the default). Lower = more frequent GC, less memory; higher = less GC, more memory. | Memory-constrained envs (lower it); CPU-bound batch jobs (raise it or set GOGC=off). |
| GOMEMLIMIT (env) | Soft heap-size limit: the GC works harder as you approach it. Different from a hard OOM cap. | Containerized services where you know your memory budget. |
| GOMAXPROCS | Maximum number of OS threads executing Go code simultaneously. Default = number of CPUs. | When you've explicitly cgroup-limited CPUs and the runtime can't see it (older containers). |
| runtime.GC() | Force a collection now. Almost never use; the runtime knows better. | Benchmarks; tests measuring "is this a leak?" Almost nothing else. |
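For example, a memory-constrained containerized service might set both GC knobs at launch (the binary name and values here are illustrative, not a recommendation):

```
$ GOGC=50 GOMEMLIMIT=512MiB ./myserver
```

GOMEMLIMIT accepts a size suffix (KiB, MiB, GiB); lowering GOGC trades CPU for a smaller steady-state heap.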
Recap from Module 3:
The scheduler runs GOMAXPROCS P's (logical processors) in parallel, each driving its own M (OS thread), each picking G's (goroutines) off a local run queue. When a goroutine blocks on I/O or a syscall, the runtime parks it, and the M can grab another G to run. When a G has been running too long without yielding, the scheduler preempts it (asynchronously, since Go 1.14).
Concretely: you don't generally manage threads yourself. The runtime transparently maps your thousands of goroutines onto a much smaller pool of OS threads.
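A minimal sketch of that transparency (the goroutine count is arbitrary; nothing here configures threads):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	// GOMAXPROCS(0) queries the current value without changing it.
	fmt.Println("CPUs:", runtime.NumCPU(), "GOMAXPROCS:", runtime.GOMAXPROCS(0))

	// 10,000 goroutines, but at most GOMAXPROCS of them execute Go
	// code at once; the runtime multiplexes them onto its thread pool.
	var wg sync.WaitGroup
	for i := 0; i < 10000; i++ {
		wg.Add(1)
		go func() { defer wg.Done() }()
	}
	wg.Wait()
}
```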
Inside a container, the Go runtime sees the host machine's CPU count, not the container's cgroup CPU limit. So a container limited to 2 CPUs, running on a 64-core host, sees runtime.NumCPU() = 64, sets GOMAXPROCS = 64, spawns 64 P's, and creates contention for the 2 CPUs it actually has.
Fix: set GOMAXPROCS explicitly via the env var, or use automaxprocs, an Uber library that reads the cgroup limit at startup (sketched below). Real-world Canton-adjacent ops services should always handle this.
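A minimal sketch of the automaxprocs route; the blank import does all the work at init time (go.uber.org/automaxprocs is the real import path, the rest of this file is illustrative):

```go
package main

import (
	"fmt"
	"runtime"

	// Blank import: its init() reads the cgroup CPU quota and
	// calls runtime.GOMAXPROCS accordingly.
	_ "go.uber.org/automaxprocs"
)

func main() {
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0)) // now matches the container's CPU limit
}
```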
Each goroutine starts with a tiny stack (~2KB) and grows it by allocating a new, larger stack and copying the old one over. This is invisible to your code; function calls just work. The cost is small but real for very deep call stacks; the savings are enormous for high goroutine counts.
You can't control stack size directly. runtime/debug.SetMaxStack sets a maximum (default 1GB on 64-bit systems). If a goroutine exceeds it, the runtime crashes the program with a fatal "stack overflow" error (not a recoverable panic).
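A sketch of the one control you do have; the 32 MiB cap here is arbitrary:

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// SetMaxStack caps how large any single goroutine stack may grow
	// and returns the previous limit. Exceeding the cap is fatal.
	prev := debug.SetMaxStack(32 << 20) // 32 MiB, arbitrary
	fmt.Println("previous max stack (bytes):", prev)
}
```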
Quick recap from Module 1: when the compiler can prove a value's lifetime is bounded by its enclosing function, it allocates on the stack (free). Otherwise it "escapes" to the heap, where the GC manages it.
You can see what escapes:
```
$ go build -gcflags='-m' ./...
./foo.go:12:6: can inline NewClient
./foo.go:13:9: &Client{...} escapes to heap
```
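A hypothetical foo.go consistent with that output; the returned pointer outlives its function, so the compiler heap-allocates it:

```go
package foo

type Client struct{ addr string }

// The &Client{...} literal is returned, so its lifetime exceeds
// NewClient's frame: it escapes to the heap.
func NewClient(addr string) *Client {
	return &Client{addr: addr}
}

// buf's lifetime is provably bounded by localSum, so it lives on
// the stack and costs the GC nothing.
func localSum() int {
	var buf [8]int
	total := 0
	for i := range buf {
		buf[i] = i
		total += buf[i]
	}
	return total
}
```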
You don't usually optimize for this. Allocation reduction matters when profiling shows GC pressure. Then yes, look at escape analysis. Otherwise, write clear code.
To observe the GC:

- GODEBUG=gctrace=1 at startup. Each GC cycle logs a line to stderr: pause times, heap growth, CPU%.
- runtime.ReadMemStats exposes NumGC, PauseTotalNs, HeapAlloc, etc. (see the sketch below).
- pprof (next lesson). The heap profile shows allocation hotspots.
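A minimal sketch of the ReadMemStats route (the field choice is illustrative; there are many more):

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m) // briefly stops the world; keep it out of hot paths
	fmt.Printf("heap=%d MiB  GCs=%d  total pause=%v\n",
		m.HeapAlloc>>20, m.NumGC, time.Duration(m.PauseTotalNs))
}
```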
And what should you tune? For 95% of services: nothing. The GC defaults are good. For the other 5%, in priority order:
- sync.Pool for high-churn temporaries (example below). Stop creating tons of short-lived objects in hot loops.
- GOMEMLIMIT if running in a containerized environment with a known memory budget.
- GOGC if you're memory-rich and CPU-poor (less frequent GC, larger heap, more CPU for your code).

Don't reach for these without profiling first. Premature GC tuning has caused more incidents than it's fixed.
```go
import "sync"

var bufPool = sync.Pool{
	// New runs only when the pool has nothing to hand out.
	New: func() any { return make([]byte, 4096) },
}

func handle(req *Request) {
	buf := bufPool.Get().([]byte) // type-assert back from any
	defer bufPool.Put(buf)        // return the buffer when done; contents are not zeroed
	// ... use buf ...
}
```
Reuses buffers across requests. The pool may evict pooled items at any time (especially on GC), so it's only for temporaries — not for long-lived caches. Useful when handlers allocate the same shape over and over and you've shown via profile that allocation is hot.
- You mostly don't tune the runtime; GOGC, GOMEMLIMIT, and GOMAXPROCS are the knobs when you do.
- In containers, make GOMAXPROCS match the cgroup CPU limit, via env var or automaxprocs.
- Check escape analysis with -gcflags='-m' if perf-tuning.
- sync.Pool for high-churn hot-path temporaries; never for long-lived state.