Module 4 · Lesson 4 · ~20 min read

Profiling with pprof and trace

Two tools cover most production performance work in Go: pprof for "where are CPU and memory going?" and trace for "what is the goroutine scheduler actually doing?" Both ship with the standard library, and both take about thirty seconds to set up.

The five profile types

Profile        | Question it answers
cpu            | Where is CPU time being spent?
heap           | Which lines allocated the bytes still live in memory?
allocs         | Which lines allocated bytes ever (live or freed)?
goroutine      | How many goroutines are alive, and where are they parked?
block / mutex  | What's blocking on contention?

Two ways to expose pprof

1. Live HTTP endpoint — for services

import _ "net/http/pprof"  // blank import — registers handlers on http.DefaultServeMux

func main() {
    go http.ListenAndServe("localhost:6060", nil)
    // ... rest of your service
}

Now http://localhost:6060/debug/pprof/ serves profiles. Grab them via go tool pprof:

# 30-second CPU profile (quote the URL so the shell doesn't mangle the ?)
go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'

# live heap snapshot
go tool pprof http://localhost:6060/debug/pprof/heap

# goroutine dump (text)
curl 'http://localhost:6060/debug/pprof/goroutine?debug=2'

Production note

Don't expose :6060 publicly. The pprof endpoints leak runtime state and CPU information. Bind to localhost only, or to a separate admin network. Remember that the blank import auto-registers its handlers on http.DefaultServeMux; if your public server also uses the default mux, your public port serves pprof too.

2. Profile a test or benchmark

go test -bench=. -cpuprofile=cpu.out -memprofile=mem.out
go tool pprof cpu.out

Inside go tool pprof interactive mode, the commands you'll use most:

(pprof) top           # hottest functions, sorted by flat time
(pprof) top -cum      # sort by cumulative time (function plus callees)
(pprof) list Submit   # annotated source for functions matching "Submit"
(pprof) web           # call graph in the browser (requires graphviz)

Interpreting a flame graph in 30 seconds

Open the web UI with go tool pprof -http=:8080 cpu.out and pick the Flame Graph view. Width is the only thing that matters: a box's width is its share of samples, for that function plus everything it calls. Depth is just stack nesting; tall is harmless, wide is expensive. Find the widest boxes, ask whether that work needs to happen at all, then whether it can happen less often.

Don't optimize without a profile. Engineers' intuition for "where is time spent" is wrong more than half the time.

The trace tool — different from pprof

pprof samples (statistical). runtime/trace records every scheduling event (exact). It's heavier but tells you things pprof can't: why a goroutine blocked and for how long, where GC pauses landed, how long runnable goroutines waited for a CPU, and which syscalls stalled your workers.

# Capture a 5-second trace from your service:
curl -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=5'

# View it:
go tool trace trace.out

This opens a browser UI with several views — most useful is "Goroutine analysis" which shows what each goroutine actually did during the trace.

Common findings — what profiles tell you

Symptom in CPU profile                   | Likely cause
Lots of time in runtime.mallocgc         | Excessive allocation. Check the allocs profile.
Lots of time in runtime.gcBgMarkWorker   | GC pressure. Reduce allocations or raise GOGC.
Lots of time in syscall                  | Heavy I/O. Batch, cache, or stream more efficiently.
A wide tower under encoding/json         | JSON is expensive. Consider a streaming decoder, a faster codec, or protobuf.
Lots of time under regexp.Compile        | You're recompiling regexes per call. Use regexp.MustCompile at package init.

Goroutine leak detection

Live goroutine count climbing over time = leak. Check:

curl 'http://localhost:6060/debug/pprof/goroutine?debug=2'

You get a stack trace for every goroutine. Look for many goroutines parked at the same line; that's where they're stuck. Common causes: a channel no one is writing to (or reading from), a context that never cancels, a blocking call without a timeout.

Benchmarking — the lightweight tool

func BenchmarkSubmit(b *testing.B) {
    s := NewSubmitter()
    cmd := Command{ID: "x"}
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        s.Submit(cmd)
    }
}
Run it with allocation reporting:

go test -bench=BenchmarkSubmit -benchmem -count=10
# BenchmarkSubmit-8     1234567   872 ns/op   128 B/op   3 allocs/op

-benchmem adds the B/op and allocs/op columns. -count=10 runs the whole benchmark 10 times, which gives benchstat enough samples to tell you whether a before/after difference is statistically significant.

The performance loop in practice

  1. Measure first. Without a baseline you can't tell if a change helped.
  2. Profile under realistic load. A microbenchmark of one function can mislead vs production traffic.
  3. Optimize the widest box. 100x speedup of a 0.1% function is noise; 2x speedup of a 30% function is real.
  4. Re-measure. Confirm the change actually helped on the metric you care about.
  5. Stop when good enough. Performance work has diminishing returns.

Takeaways

  - pprof answers "where are CPU and memory going"; trace answers "what is the scheduler doing". Both ship with Go.
  - Expose the pprof endpoints on localhost or an admin network only, never publicly.
  - A goroutine count that climbs over time is a leak; the goroutine?debug=2 dump shows where the goroutines are parked.
  - Measure first, profile under realistic load, optimize the widest box, re-measure, and stop when it's good enough.