Linux Command Line & Bash Scripting Mastery · Lesson

Parsing Web and Application Logs at Scale

Extract status codes, latencies, and client fields from access logs using grep, cut, and awk.

What Is a Web Access Log?

Most HTTP servers, including Apache, Nginx, and Caddy, can write one line to an access log for each request they serve. Understanding the structure of these lines is the foundation of all log analysis work.

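A typical line in the Combined Log Format, which extends the Common Log Format (CLF) with the referer and user-agent fields, contains the fields below, in order. Two further fields sit between the client IP and the timestamp: the identd identity (almost always -) and the authenticated user (alice in the example line).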

Client IP — who made the request
Timestamp — when it happened
Request line — method, path, protocol
Status code — HTTP response (200, 404, 500…)
Bytes sent — response body size
Referer — origin page
User-Agent — browser or bot string

Example line from /var/log/nginx/access.log:

192.168.1.10 - alice [11/Jun/2026:14:32:01 +0000] "GET /api/orders HTTP/1.1" 200 1482 "-" "curl/7.88.1"
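As a minimal sketch of how these fields map to positions, the example line can be split on whitespace with awk and cut. With the default separator the client IP is $1, the path is $7, the status is $9, and the byte count is $10. This assumes the request line has exactly three space-separated parts, which malformed requests can violate; the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Sample line copied from the example above.
line='192.168.1.10 - alice [11/Jun/2026:14:32:01 +0000] "GET /api/orders HTTP/1.1" 200 1482 "-" "curl/7.88.1"'

# Whitespace-split positions:
#   $1 client IP, $4-$5 timestamp, $6 method, $7 path, $9 status, $10 bytes
ip=$(printf '%s\n' "$line" | awk '{ print $1 }')
status=$(printf '%s\n' "$line" | awk '{ print $9 }')
bytes=$(printf '%s\n' "$line" | awk '{ print $10 }')

# cut counts single-space-delimited fields, so field 7 is also the path here.
path=$(printf '%s\n' "$line" | cut -d' ' -f7)

printf 'ip=%s path=%s status=%s bytes=%s\n' "$ip" "$path" "$status" "$bytes"
```

cut treats every single space as a delimiter, while awk collapses runs of whitespace, so the two agree only when fields are separated by exactly one space.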

At scale, these files grow to millions of lines per day. The goal of this lesson is to extract, filter, and aggregate fields from them efficiently using Bash and standard command-line tools such as grep, cut, and awk.
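One way to sketch the aggregate step is a single awk pass that counts requests per status code. The four log lines below are made-up sample data written to a temporary file; a real run would point awk at the actual access log instead. It assumes the status is whitespace field 9, as in the example line.

```shell
#!/usr/bin/env bash
# Hypothetical sample data; a real run would read the access log directly.
log=$(mktemp)
cat > "$log" <<'EOF'
192.168.1.10 - alice [11/Jun/2026:14:32:01 +0000] "GET /api/orders HTTP/1.1" 200 1482 "-" "curl/7.88.1"
192.168.1.11 - - [11/Jun/2026:14:32:02 +0000] "GET /api/orders HTTP/1.1" 200 1490 "-" "curl/7.88.1"
192.168.1.12 - - [11/Jun/2026:14:32:03 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/7.88.1"
192.168.1.10 - alice [11/Jun/2026:14:32:04 +0000] "POST /api/orders HTTP/1.1" 500 0 "-" "curl/7.88.1"
EOF

# Count requests per status code in one pass, then sort by count, highest first.
status_counts=$(awk '{ count[$9]++ } END { for (s in count) print count[s], s }' "$log" | sort -rn)
printf '%s\n' "$status_counts"

rm -f "$log"
```

Keeping the counting inside awk means the file is read once and only the small summary is sorted, which matters more as the log grows.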

Sampling a Live Log with tail and grep

Before writing any pipeline, inspect the log to understand its shape. tail lets you watch a live stream; grep narrows it to relevant lines immediately.
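A quick way to check a log's shape, sketched here under the assumption that fields are whitespace-separated, is to tally how many fields each recent line has. The three lines below are invented sample data in a temporary file standing in for access.log.

```shell
#!/usr/bin/env bash
# Hypothetical sample file standing in for access.log.
log=$(mktemp)
printf '%s\n' \
  '192.168.1.10 - alice [11/Jun/2026:14:32:01 +0000] "GET /api/orders HTTP/1.1" 200 1482 "-" "curl/7.88.1"' \
  '192.168.1.11 - - [11/Jun/2026:14:32:02 +0000] "GET /health HTTP/1.1" 200 2 "-" "curl/7.88.1"' \
  '192.168.1.12 - - [11/Jun/2026:14:32:03 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Linux x86_64)"' \
  > "$log"

# For the last 1000 lines, print the field count of each line and tally the counts.
field_shape=$(tail -n 1000 "$log" | awk '{ print NF }' | sort -n | uniq -c)
printf '%s\n' "$field_shape"

rm -f "$log"
```

Lines with a multi-word User-Agent have more fields than the others, so positions up to the byte count ($10) are stable while anything after it is not.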

Common patterns:

tail -n 1000 access.log — last 1000 lines
tail -f access.log — follow in real time
tail -f access.log | grep '" 5' — only 5xx errors as they arrive

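The key insight is that grep matches anywhere in the line, so the pattern needs enough surrounding context to pin it to one field. A bare 500 also matches a path such as /items/500 or a 500-byte response. Matching ' 500 ' (with spaces) rules out the path but still matches a byte count of 500. Including the closing quote of the request line, as in '" 500 ', restricts the match to the status field.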
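The difference between these patterns can be sketched by counting matches on three made-up lines, of which only the last is a real HTTP 500. The sample data and variable names are illustrative.

```shell
#!/usr/bin/env bash
# Three invented lines: a path containing 500, a 500-byte response, and a real 500 status.
sample='10.0.0.1 - - [11/Jun/2026:14:32:01 +0000] "GET /items/500 HTTP/1.1" 200 1482 "-" "curl/7.88.1"
10.0.0.2 - - [11/Jun/2026:14:32:02 +0000] "GET /api/orders HTTP/1.1" 200 500 "-" "curl/7.88.1"
10.0.0.3 - - [11/Jun/2026:14:32:03 +0000] "GET /api/orders HTTP/1.1" 500 162 "-" "curl/7.88.1"'

# grep -c prints the number of matching lines for each pattern.
bare=$(printf '%s\n' "$sample" | grep -c '500')
spaced=$(printf '%s\n' "$sample" | grep -c ' 500 ')
quoted=$(printf '%s\n' "$sample" | grep -c '" 500 ')

printf 'bare=%s spaced=%s quoted=%s\n' "$bare" "$spaced" "$quoted"
```

On this sample the bare pattern matches all three lines, the spaced pattern matches two (the byte count and the status), and only the quoted pattern isolates the single real 500.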

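#!/usr/bin/env bash
# Watch only HTTP 5xx errors arriving in real time.
# The leading quote and trailing space pin the match to the status field.
# --line-buffered makes grep flush each match at once if its output is piped onward.
tail -f /var/log/nginx/access.log \
  | grep --line-buffered '" 5[0-9][0-9] '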
