Production Debugging & Incident Response Playbook · Lesson

Distributed Tracing for Latency Hotspots

Learn to use distributed tracing to follow a single request across services, identify latency hotspots, and correlate traces with logs during production debugging.

Why Distributed Tracing

In a microservice system, a single user request can fan out across dozens of services. When the request is slow, which service is to blame?

Distributed tracing answers this by attaching a shared trace_id to a request and recording a span for every operation it touches.

A trace = the whole request journey
A span = one timed unit of work
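
To make the trace/span relationship concrete, here is a minimal sketch of recording spans in process. It is not the API of any real tracing library: the `SPANS` list, the `span` helper, and `handle_request` are hypothetical names invented for illustration, and a real system would export spans to a tracing backend instead of a list.

```python
import time
import uuid
from contextlib import contextmanager

# Hypothetical in-memory collector (assumption for this sketch only).
SPANS = []

@contextmanager
def span(name, trace_id, parent_id=None):
    """Record one timed unit of work as a span dict."""
    span_id = uuid.uuid4().hex[:8]
    start = time.monotonic()
    try:
        # Hand the span_id to the caller so child spans can point back to it.
        yield span_id
    finally:
        end = time.monotonic()
        SPANS.append({
            "trace_id": trace_id,
            "span_id": span_id,
            "parent_id": parent_id,
            "name": name,
            "start_ms": round(start * 1000),
            "end_ms": round(end * 1000),
        })

def handle_request():
    """Simulate one request: a root span with one child span."""
    trace_id = uuid.uuid4().hex  # shared by every span of this request
    with span("http.request", trace_id) as root_id:
        with span("db.query.users", trace_id, parent_id=root_id):
            time.sleep(0.01)  # stand-in for real work
    return trace_id
```

Calling `handle_request()` leaves two entries in `SPANS` that share one trace_id. The child is appended first because it finishes first. Across service boundaries the same idea applies, except the trace_id and the caller's span_id travel with the request, typically in request headers.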

Anatomy of a Span

Each span carries timing and context so you can reconstruct the call tree.

trace_id links all spans of one request
span_id uniquely identifies this span within the trace
parent_id holds the span_id of the calling span (the root span has none)
start/end timestamps give the duration (end minus start)

{
  "trace_id": "abc123",
  "span_id": "s2",
  "parent_id": "s1",
  "name": "db.query.users",
  "start_ms": 1042,
  "end_ms": 1310
}
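
The span above lasted 1310 - 1042 = 268 ms. As a rough sketch of how these fields help locate a hotspot, the code below groups spans by parent_id and computes each span's self time: its own duration minus the time spent in its direct children. The root span `s1` and its timestamps are made up for illustration, and the calculation assumes child spans run one after another without overlapping, which does not hold for parallel calls.

```python
from collections import defaultdict

def duration_ms(span):
    """Wall-clock duration of a span in milliseconds."""
    return span["end_ms"] - span["start_ms"]

def self_times(spans):
    """Return {span_id: duration minus time spent in direct children}.

    Assumes direct children do not overlap in time.
    """
    children = defaultdict(list)
    for s in spans:
        if s["parent_id"] is not None:
            children[s["parent_id"]].append(s)
    result = {}
    for s in spans:
        child_total = sum(duration_ms(c) for c in children[s["span_id"]])
        result[s["span_id"]] = duration_ms(s) - child_total
    return result

spans = [
    # Hypothetical root span, invented so the example has a parent.
    {"trace_id": "abc123", "span_id": "s1", "parent_id": None,
     "name": "GET /users", "start_ms": 1000, "end_ms": 1350},
    # The span from the lesson.
    {"trace_id": "abc123", "span_id": "s2", "parent_id": "s1",
     "name": "db.query.users", "start_ms": 1042, "end_ms": 1310},
]

times = self_times(spans)
hotspot = max(spans, key=lambda s: times[s["span_id"]])
print(hotspot["name"], times[hotspot["span_id"]])  # db.query.users 268
```

Ranking by self time rather than total duration keeps a parent from looking slow only because it waited on a slow child: here the root span took 350 ms overall but only 82 ms of that was its own work.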
