Module 6 · Lesson 4 · ~20 min read

Structured Logging, Metrics, Tracing

A Canton-adjacent service in production needs three things to debug at 2 AM: logs (what happened), metrics (how often, how long), traces (where in the call chain). Each has a Go-canonical answer.

Structured logging — slog

Go 1.21+ ships log/slog, a structured logger in the standard library. Use it. Skip the legacy log package for anything new.

import "log/slog"

slog.Info("command submitted",
    "command_id", cmd.ID,
    "party", cmd.Party,
    "duration_ms", dur.Milliseconds(),
)

Output (JSON handler):

{"time":"2026-04-23T10:30:00Z","level":"INFO","msg":"command submitted","command_id":"abc","party":"Alice","duration_ms":42}

The arguments after the message are alternating key-value pairs. There's also a typed form:

slog.Info("command submitted",
    slog.String("command_id", cmd.ID),
    slog.Int64("duration_ms", dur.Milliseconds()),
)

The typed form avoids key-value pairing mistakes and some conversion work; for truly hot paths, Logger.LogAttrs takes []slog.Attr directly and is the fastest option. Either form is fine for most code.

Configure a JSON handler at startup

handler := slog.NewJSONHandler(os.Stderr, &slog.HandlerOptions{
    Level: slog.LevelInfo,
})
slog.SetDefault(slog.New(handler))

JSON to stderr is the universal pattern. Containerized envs (k8s, Docker) capture stderr, your log shipper picks it up, your log aggregator (Loki, Splunk, ELK, Datadog) parses the JSON.

Per-request loggers via context

type loggerKey struct{}

func WithLogger(ctx context.Context, l *slog.Logger) context.Context {
    return context.WithValue(ctx, loggerKey{}, l)
}

func Logger(ctx context.Context) *slog.Logger {
    if l, ok := ctx.Value(loggerKey{}).(*slog.Logger); ok {
        return l
    }
    return slog.Default()
}

// At the request boundary:
reqLog := slog.Default().With("request_id", reqID, "trace_id", traceID)
ctx = WithLogger(ctx, reqLog)

// Anywhere downstream:
Logger(ctx).Info("validating command", "command_id", cmd.ID)
// → automatically includes request_id and trace_id from the With() call

Every log line for that request now carries the request and trace IDs. When debugging, you grep one ID and get the full story.

Metrics — Prometheus is the standard

The github.com/prometheus/client_golang library is the de-facto standard for Go metrics in the cloud-native ecosystem. Three primitive types:

Type                   Use for
Counter                Monotonically increasing — request counts, error counts, bytes processed.
Gauge                  Up-and-down values — connections open, queue depth, memory in use.
Histogram / Summary    Distributions — request durations, response sizes.

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    submitCount = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "submitter_submits_total",
        Help: "Total commands submitted, by outcome",
    }, []string{"outcome"})

    submitDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
        Name:    "submitter_submit_duration_seconds",
        Help:    "Submit RPC duration",
        Buckets: prometheus.DefBuckets,
    }, []string{"outcome"})
)

func Submit(ctx context.Context, cmd Command) error {
    start := time.Now()
    err := doSubmit(ctx, cmd)
    outcome := "success"
    if err != nil {
        outcome = "error"
    }
    submitCount.WithLabelValues(outcome).Inc()
    submitDuration.WithLabelValues(outcome).Observe(time.Since(start).Seconds())
    return err
}

Then expose /metrics:

import "github.com/prometheus/client_golang/prometheus/promhttp"

http.Handle("/metrics", promhttp.Handler())

Prometheus scrapes that endpoint; Grafana renders dashboards; alerts fire when counters spike or histograms shift.

Label hygiene

Labels with high cardinality kill Prometheus: every distinct label value creates a separate time series. Don't use:

- User, command, or contract IDs as labels: each distinct value spawns a new series.
- Raw error messages or other free-form strings.

Do use:

- Bounded enumerations: outcome ("success"/"error"), RPC method name, error class.

Tracing — OpenTelemetry

For distributed systems, tracing connects logs across services. A "trace" is a tree of spans; each span is a unit of work with a start, end, parent, and attributes.

OpenTelemetry (OTel) is the cross-language standard. Go has go.opentelemetry.io/otel + provider packages.

import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
)

var tracer = otel.Tracer("submitter")

func Submit(ctx context.Context, cmd Command) error {
    ctx, span := tracer.Start(ctx, "Submit")
    defer span.End()

    span.SetAttributes(
        attribute.String("command_id", cmd.ID),
        attribute.String("party", cmd.Party),
    )

    err := doSubmit(ctx, cmd)  // downstream calls inherit the trace via ctx
    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
    }
    return err
}

The tracing context propagates through your context.Context. When you call gRPC or HTTP downstream with that context, OTel injects the trace headers automatically. The downstream service's spans become children of yours, and a tracing backend (Jaeger, Tempo, Datadog APM) reconstructs the tree.

OTel + gRPC instrumentation

import "go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"

// Recent otelgrpc versions instrument via gRPC stats handlers; the older
// UnaryServerInterceptor/UnaryClientInterceptor helpers are deprecated.
s := grpc.NewServer(grpc.StatsHandler(otelgrpc.NewServerHandler()))

conn, _ := grpc.NewClient(addr,
    grpc.WithStatsHandler(otelgrpc.NewClientHandler()),
)

One line each side. Now every gRPC call is automatically a span, with the right attributes (method name, status code), and the context propagates trace headers across the call.

The three together

Best practice in mature production systems:

- Logs carry the detail of what happened inside one service.
- Metrics drive dashboards and alerting on aggregate health.
- Traces show where time went across service boundaries.

All three should share IDs. Add the trace ID to every log line; emit metrics with status labels matching trace span statuses. Then "I see a slow span in the trace" becomes "let me grep logs for that trace ID and see what the system was doing."

What Canton typically expects

If you're shipping a Go service alongside Canton:

- Structured JSON logs, one event per line, to stderr.
- A Prometheus /metrics endpoint for the operator to scrape.
- OpenTelemetry trace propagation, so your spans connect with the rest of the deployment's traces.

Match these and your service slots into the operator's observability stack without surprises.

Takeaways

- log/slog is the standard-library answer for structured logging: JSON handler, stderr, per-request loggers via context.
- Prometheus counters, gauges, and histograms cover metrics; keep label cardinality bounded.
- OpenTelemetry spans propagate through context.Context and across gRPC/HTTP calls automatically.
- Tie the three signals together with shared request and trace IDs.