Module 6 · Lesson 4 · ~20 min read

Structured Logging, Metrics, Tracing

A Canton-adjacent service in production needs three things to debug at 2 AM: logs (what happened), metrics (how often, how long), traces (where in the call chain). Each has a Go-canonical answer.

Structured logging — slog

Go 1.21+ ships log/slog, a structured logger in the standard library. Use it. Skip the legacy log package for anything new.

import "log/slog"

slog.Info("command submitted",
    "command_id", cmd.ID,
    "party", cmd.Party,
    "duration_ms", dur.Milliseconds(),
)

Output (JSON handler):

{"time":"2026-04-23T10:30:00Z","level":"INFO","msg":"command submitted","command_id":"abc","party":"Alice","duration_ms":42}

The arguments after the message are alternating key-value pairs. There's also a typed form:

slog.Info("command submitted",
    slog.String("command_id", cmd.ID),
    slog.Int64("duration_ms", dur.Milliseconds()),
)

The typed form avoids key-value pairing mistakes and some conversion work; for truly hot paths, Logger.LogAttrs takes []slog.Attr directly and is the fastest option. Either form is fine for most code.

Configure a JSON handler at startup

handler := slog.NewJSONHandler(os.Stderr, &slog.HandlerOptions{
    Level: slog.LevelInfo,
})
slog.SetDefault(slog.New(handler))

JSON to stderr is the universal pattern. Containerized envs (k8s, Docker) capture stderr, your log shipper picks it up, your log aggregator (Loki, Splunk, ELK, Datadog) parses the JSON.

Per-request loggers via context

type loggerKey struct{}

func WithLogger(ctx context.Context, l *slog.Logger) context.Context {
    return context.WithValue(ctx, loggerKey{}, l)
}

func Logger(ctx context.Context) *slog.Logger {
    if l, ok := ctx.Value(loggerKey{}).(*slog.Logger); ok {
        return l
    }
    return slog.Default()
}

// At the request boundary:
reqLog := slog.Default().With("request_id", reqID, "trace_id", traceID)
ctx = WithLogger(ctx, reqLog)

// Anywhere downstream:
Logger(ctx).Info("validating command", "command_id", cmd.ID)
// → automatically includes request_id and trace_id from the With() call

Every log line for that request now carries the request and trace IDs. When debugging, you grep one ID and get the full story.

Metrics — Prometheus is the standard

The github.com/prometheus/client_golang library is the de-facto standard for Go metrics in the cloud-native ecosystem. Three primitive types:

Type                   Use for
Counter                Monotonically increasing — request counts, error counts, bytes processed.
Gauge                  Up-and-down values — connections open, queue depth, memory in use.
Histogram / Summary    Distributions — request durations, response sizes.

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    submitCount = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "submitter_submits_total",
        Help: "Total commands submitted, by outcome",
    }, []string{"outcome"})

    submitDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
        Name:    "submitter_submit_duration_seconds",
        Help:    "Submit RPC duration",
        Buckets: prometheus.DefBuckets,
    }, []string{"outcome"})
)

func Submit(ctx context.Context, cmd Command) error {
    start := time.Now()
    err := doSubmit(ctx, cmd)
    outcome := "success"
    if err != nil {
        outcome = "error"
    }
    submitCount.WithLabelValues(outcome).Inc()
    submitDuration.WithLabelValues(outcome).Observe(time.Since(start).Seconds())
    return err
}

Then expose /metrics:

import "github.com/prometheus/client_golang/prometheus/promhttp"

http.Handle("/metrics", promhttp.Handler())

Prometheus scrapes that endpoint; Grafana renders dashboards; alerts fire when counters spike or histograms shift.

Label hygiene

Labels with high cardinality kill Prometheus: every distinct label value creates a separate time series. Don't use:

- User, command, or contract IDs as labels: each distinct value spawns a new series.
- Raw error messages or other free-form strings.

Do use:

- Bounded enumerations: outcome ("success"/"error"), RPC method name, error class.

Tracing — OpenTelemetry

For distributed systems, tracing connects logs across services. A "trace" is a tree of spans; each span is a unit of work with a start, end, parent, and attributes.

OpenTelemetry (OTel) is the cross-language standard. Go has go.opentelemetry.io/otel + provider packages.

import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
)

var tracer = otel.Tracer("submitter")

func Submit(ctx context.Context, cmd Command) error {
    ctx, span := tracer.Start(ctx, "Submit")
    defer span.End()

    span.SetAttributes(
        attribute.String("command_id", cmd.ID),
        attribute.String("party", cmd.Party),
    )

    err := doSubmit(ctx, cmd)  // downstream calls inherit the trace via ctx
    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
    }
    return err
}

The tracing context propagates through your context.Context. When you call gRPC or HTTP downstream with that context, OTel injects the trace headers automatically. The downstream service's spans become children of yours, and a tracing backend (Jaeger, Tempo, Datadog APM) reconstructs the tree.

OTel + gRPC instrumentation

import "go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"

// Recent otelgrpc versions instrument via gRPC stats handlers; the older
// UnaryServerInterceptor/UnaryClientInterceptor helpers are deprecated.
s := grpc.NewServer(grpc.StatsHandler(otelgrpc.NewServerHandler()))

conn, _ := grpc.NewClient(addr,
    grpc.WithStatsHandler(otelgrpc.NewClientHandler()),
)

One line each side. Now every gRPC call is automatically a span, with the right attributes (method name, status code), and the context propagates trace headers across the call.

The three together

Best practice in mature production systems:

- Logs carry the detail of what happened inside one service.
- Metrics drive dashboards and alerting on aggregate health.
- Traces show where time went across service boundaries.

All three should share IDs. Add the trace ID to every log line; emit metrics with status labels matching trace span statuses. Then "I see a slow span in the trace" becomes "let me grep logs for that trace ID and see what the system was doing."

What Canton typically expects

If you're shipping a Go service alongside Canton:

- Structured JSON logs, one event per line, to stderr.
- A Prometheus /metrics endpoint for the operator to scrape.
- OpenTelemetry trace propagation, so your spans connect with the rest of the deployment's traces.

Match these and your service slots into the operator's observability stack without surprises.

Takeaways

- log/slog is the standard-library answer for structured logging: JSON handler, stderr, per-request loggers via context.
- Prometheus counters, gauges, and histograms cover metrics; keep label cardinality bounded.
- OpenTelemetry spans propagate through context.Context and across gRPC/HTTP calls automatically.
- Tie the three signals together with shared request and trace IDs.