Distributed Tracing with OpenTelemetry Spans
Instrument services automatically and manually to produce spans that reveal latency across service boundaries.
Why Distributed Tracing?
In a microservices system one user request can hop across an API gateway, an orders service, a payments service, and a database. When that request is slow, a single service's logs cannot tell you where the time went.
Distributed tracing stitches the whole journey together. Each unit of work becomes a span, spans are linked into a trace, and the trace reveals latency across every service boundary.
- Trace: the entire end-to-end request, identified by a traceId.
- Span: one operation (an HTTP call, a DB query) with a start time, duration, and parent.
- Context propagation: passing traceId and spanId across service boundaries, usually via HTTP headers.
OpenTelemetry (OTel) is the vendor-neutral standard for producing these spans, and it provides an SDK for Node.js.
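To make context propagation concrete, here is a minimal sketch of how a traceId and spanId can travel in a single HTTP header. It assumes the W3C Trace Context `traceparent` format (version, trace id, span id, flags, joined by dashes) and handles only version `00`. The names `SpanContext`, `formatTraceparent`, and `parseTraceparent` are illustrative helpers, not part of the OpenTelemetry API, which does this work for you through its propagators.

```typescript
// Illustrative helpers (not OpenTelemetry APIs) for the W3C "traceparent" header:
//   00-<traceId: 32 hex chars>-<spanId: 16 hex chars>-<flags: 2 hex chars>
interface SpanContext {
  traceId: string; // 16 bytes, hex-encoded
  spanId: string; // 8 bytes, hex-encoded
  sampled: boolean; // lowest bit of the flags byte
}

// Serialize the context so the caller can attach it to an outgoing request.
function formatTraceparent(ctx: SpanContext): string {
  const flags = ctx.sampled ? "01" : "00";
  return `00-${ctx.traceId}-${ctx.spanId}-${flags}`;
}

// Parse an incoming header; returns null for anything malformed.
// Assumption: only version "00" is accepted in this sketch.
function parseTraceparent(header: string): SpanContext | null {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header.trim());
  if (!match) return null;
  const [, traceId, spanId, flags] = match;
  // All-zero ids are treated as invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

// The calling service sends this header; the receiving service parses it
// and uses spanId as the parent of the spans it creates.
const incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
const parentContext = parseTraceparent(incoming);
```

The receiving service keeps the same traceId and records the incoming spanId as the parent of its own spans, which is what links work in different processes into one trace.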
Anatomy of a Span
A span is the atomic building block of a trace. Every span in a trace shares the same traceId but has its own spanId and timing.
- traceId: 16 bytes, shared by every span in the trace.
- spanId: 8 bytes, unique to this span.
- parentSpanId: links this span to the operation that caused it.
- name, startTime, endTime (duration = end - start).
- Attributes: key/value tags like http.method or db.system.
- Status: OK, ERROR, or UNSET.
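The fields above can be sketched as a plain data structure. This is a simplified model for illustration, not the OpenTelemetry SDK's span type: `Span`, `startSpan`, `endSpan`, and `randomHex` are hypothetical names, ids are generated with `Math.random` purely for brevity, and timestamps are passed in as milliseconds so the example is deterministic.

```typescript
type SpanStatus = "UNSET" | "OK" | "ERROR";

// Simplified model of a span (hypothetical, not the SDK's own type).
interface Span {
  traceId: string; // 16 bytes as 32 hex chars, shared across the trace
  spanId: string; // 8 bytes as 16 hex chars, unique to this span
  parentSpanId: string | undefined; // undefined for the root span
  name: string;
  startTime: number; // milliseconds
  endTime: number | undefined; // set when the span ends
  attributes: Record<string, string | number | boolean>;
  status: SpanStatus;
}

// Assumption: Math.random is good enough for a sketch; a real SDK has its own id generator.
function randomHex(bytes: number): string {
  let out = "";
  for (let i = 0; i < bytes; i++) {
    out += Math.floor(Math.random() * 256).toString(16).padStart(2, "0");
  }
  return out;
}

// A child inherits the parent's traceId and records the parent's spanId.
function startSpan(
  name: string,
  startTime: number,
  parent?: Span,
  attributes: Span["attributes"] = {},
): Span {
  return {
    traceId: parent ? parent.traceId : randomHex(16),
    spanId: randomHex(8),
    parentSpanId: parent ? parent.spanId : undefined,
    name,
    startTime,
    endTime: undefined,
    attributes,
    status: "UNSET",
  };
}

// Ends the span and returns its duration (end - start).
function endSpan(span: Span, endTime: number, status: SpanStatus = "UNSET"): number {
  span.endTime = endTime;
  span.status = status;
  return endTime - span.startTime;
}

const root = startSpan("GET /orders", 1000, undefined, { "http.method": "GET" });
const query = startSpan("SELECT orders", 1010, root, { "db.system": "postgresql" });
const queryMs = endSpan(query, 1045, "OK");
const rootMs = endSpan(root, 1060, "OK");
```

Here `query` shares `root.traceId` and points back to `root.spanId`, while each span keeps its own id, timing, attributes, and status.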
Parent/child links form a tree. The root span is the whole request; child spans are the calls it makes. Visualized, the tree becomes the familiar waterfall you see in Jaeger or Tempo.
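As a rough illustration of how parent/child links turn into a waterfall, the sketch below groups a flat list of finished spans by parentSpanId and walks the tree from the root. `FinishedSpan` and `renderWaterfall` are hypothetical names, the span ids are shortened placeholders, and the span names and timings are invented sample data; real backends such as Jaeger or Tempo do this rendering for you.

```typescript
// Minimal span shape needed for rendering (hypothetical, for illustration only).
interface FinishedSpan {
  spanId: string;
  parentSpanId?: string; // absent on the root span
  name: string;
  startTime: number; // milliseconds
  endTime: number; // milliseconds
}

// Returns one text line per span, indented by depth in the tree.
function renderWaterfall(spans: FinishedSpan[]): string[] {
  // Group spans by the id of their parent.
  const children = new Map<string | undefined, FinishedSpan[]>();
  for (const span of spans) {
    const siblings = children.get(span.parentSpanId) ?? [];
    siblings.push(span);
    children.set(span.parentSpanId, siblings);
  }

  // Assumption: exactly one span without a parent, the root.
  const root = (children.get(undefined) ?? [])[0];
  if (!root) return [];

  const lines: string[] = [];
  const visit = (span: FinishedSpan, depth: number): void => {
    const offset = span.startTime - root.startTime;
    const duration = span.endTime - span.startTime;
    lines.push(`${"  ".repeat(depth)}${span.name} +${offset}ms (${duration}ms)`);
    const kids = [...(children.get(span.spanId) ?? [])].sort(
      (a, b) => a.startTime - b.startTime,
    );
    for (const kid of kids) visit(kid, depth + 1);
  };
  visit(root, 0);
  return lines;
}

// Sample data: placeholder ids and made-up timings.
const waterfall = renderWaterfall([
  { spanId: "d", parentSpanId: "b", name: "db.insert", startTime: 75, endTime: 105 },
  { spanId: "a", name: "GET /checkout", startTime: 0, endTime: 120 },
  { spanId: "c", parentSpanId: "b", name: "payments.charge", startTime: 20, endTime: 70 },
  { spanId: "b", parentSpanId: "a", name: "orders.create", startTime: 10, endTime: 110 },
]);
// waterfall:
//   GET /checkout +0ms (120ms)
//     orders.create +10ms (100ms)
//       payments.charge +20ms (50ms)
//       db.insert +75ms (30ms)
```

Each line shows the span's offset from the start of the root span and its own duration, which is the same information a tracing UI draws as horizontal bars.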