Linux Command Line & Bash Scripting Mastery · Lesson

Computing Metrics and Histograms from Log Streams

Aggregate request rates, percentiles, and top-N reports directly from streaming log data.

Why Compute Metrics from Raw Logs?

Production systems can emit thousands of log lines per second. Rather than shipping every raw line to an expensive analytics platform, you can compute request rates, percentiles, and top-N reports directly in the shell, using tools that are already installed on the host.

Request rate: How many requests per second/minute does your service handle?
Latency percentiles: What is the p50/p95/p99 response time?
Top-N reports: Which endpoints, IPs, or error codes appear most frequently?

Shell tools like awk, sort, uniq, and bc form a powerful, composable pipeline that can answer these questions from a live log stream or a historical file without leaving the terminal.
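As a minimal sketch of such a pipeline, the three functions below compute one example of each metric from log lines on standard input. They assume the whitespace-split layout of the sample line in the next section ($4 timestamp, $7 path, $11 response time in seconds); the function names are illustrative, not part of any standard tool.

```shell
# Sketch only: function names and field positions are assumptions
# based on the sample log line used in this lesson.

# Requests per minute: $4 looks like "[10/Oct/2024:13:55:36", so
# characters 2-18 give the timestamp truncated to the minute.
requests_per_minute() {
  awk '{ print substr($4, 2, 17) }' | LC_ALL=C sort | uniq -c |
    awk '{ print $2, $1 }'
}

# Top-N paths: count each distinct $7, then sort by count, descending.
# The first argument is N (defaults to 10).
top_paths() {
  awk '{ print $7 }' | LC_ALL=C sort | uniq -c | LC_ALL=C sort -rn |
    head -n "${1:-10}"
}

# Latency percentile by the nearest-rank method: sort the response
# times, then pick the value at rank ceil(p * count / 100).
# The first argument is an integer percentile such as 50, 95, or 99.
latency_percentile() {
  awk '{ print $11 }' | LC_ALL=C sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # integer ceiling of p*NR/100
      if (rank < 1) rank = 1
      print v[rank]
    }'
}
```

Because sort has to read all of its input before it prints anything, these functions suit a historical file or a bounded slice of a live log, for example `tail -n 10000 access.log | top_paths 5` (the file name is a placeholder).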

Anatomy of a Common Access Log

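Many web servers write access logs in Common Log Format or its Combined variant, which appends the referer and user agent. The sample below is Common Log Format with a response time appended as a custom last field. Understanding the fields is the foundation of every metric pipeline: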

127.0.0.1 - frank [10/Oct/2024:13:55:36 -0700] "GET /api/users HTTP/1.1" 200 2326 0.042

Field 1: client IP
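Field 4 (starts with "["): timestamp; the timezone offset spills into field 5
Field 7: request path (field 6 holds the HTTP method with a leading quote, field 8 the protocol)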
Field 9: status code
Field 10: response bytes
Field 11: response time in seconds (custom field, not always present)
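The field positions above can be checked with a short awk call. This is a sketch that assumes awk's default whitespace splitting and the sample line's layout; show_fields is a hypothetical helper name.

```shell
# Sketch only: show_fields is an illustrative name, and the field
# positions assume the sample log layout shown above.
show_fields() {
  awk '{
    # substr(x, 2) drops the leading "[" from $4 and the leading
    # double quote from $6.
    printf "ip=%s time=%s method=%s path=%s status=%s bytes=%s rt=%s\n",
      $1, substr($4, 2), substr($6, 2), $7, $9, $10, $11
  }'
}

# Usage with the sample line:
#   printf '%s\n' '127.0.0.1 - frank [10/Oct/2024:13:55:36 -0700] "GET /api/users HTTP/1.1" 200 2326 0.042' | show_fields
```

Since awk handles one line at a time, the same function can also read from a pipe fed by a live log.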

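Use awk to reference fields by position ($1, $9, etc.). By default awk splits on whitespace, so the quoted request string is spread across $6, $7, and $8. To treat a quoted section as one token, split on the double quote with awk -F'"' (the whole request then becomes $2), or use multiple passes.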
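Splitting on the double quote might look like the sketch below. It assumes the sample line's layout, where the request is the first quoted section; request_and_status is a hypothetical helper name.

```shell
# Sketch only: request_and_status is an illustrative name.
# With -F'"' the line splits into:
#   $1 = everything before the request (IP, user, timestamp)
#   $2 = the request itself, e.g. GET /api/users HTTP/1.1
#   $3 = everything after it, e.g. " 200 2326 0.042"
request_and_status() {
  awk -F'"' '{
    split($2, req, " ")    # req[1]=method, req[2]=path, req[3]=protocol
    split($3, rest, " ")   # rest[1]=status, rest[2]=bytes, rest[3]=response time
    print req[1], req[2], rest[1]
  }'
}
```

This approach matters most for quoted fields that contain spaces, such as the user agent in Combined Log Format, where whitespace splitting would shift every later field position.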
