Aggregation with awk Arrays and Grouping
Compute sums, counts, and group-by summaries using associative arrays keyed on field values.
What Are awk Associative Arrays?
In awk, an associative array is a key-value store whose keys can be any string or number. Unlike the integer-indexed arrays of most languages, every awk array is associative (implementations typically use a hash table), which makes it well suited to grouping and aggregating data by field value.
- Declare implicitly: just assign arr[key] = value
- Keys are strings by default (numbers are coerced)
- No fixed size limit: awk grows the array as needed, bounded only by available memory
- Ideal for computing sums, counts, and group-by rollups in a single pass
You do not need to initialize a key before incrementing it. awk treats an unset element as zero in a numeric context and as an empty string in a string context.
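The behaviour described above can be checked with a small sketch. The function name unset_key_demo and the array names total, hits, label, and n are made up for illustration; the program runs entirely in a BEGIN block, so it reads no input.

```shell
#!/usr/bin/env bash
# Hypothetical helper name; it only wraps the demo so it can be rerun.
unset_key_demo() {
  awk 'BEGIN {
    # Unset elements act as 0 in numeric context.
    total["apples"] += 5
    hits["apples"]++
    # Unset elements act as "" in string context.
    label["apples"] = label["apples"] "fruit"
    print total["apples"], hits["apples"], label["apples"]

    # The "in" operator tests membership without creating the key.
    if (!("pears" in total))
      print "pears not seen"

    # Numeric subscripts are converted to strings, so 1 and "1" are the same key.
    n[1] = "one"
    print n["1"]
  }'
}

unset_key_demo
```

This should print "5 1 fruit", then "pears not seen", then "one". Note that merely reading an element such as total["pears"] creates it, which is why the membership test uses in rather than comparing the element to an empty string.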
Counting Lines Per Group
The most common aggregation pattern is counting how many times each unique value in a field appears. Use field $1 (or any field) as the array key and increment a counter on every matching row.
The END block runs after all input is consumed — that is where you print the accumulated results.
count[$1]++
After processing, count holds the total number of lines for each unique value of $1.
#!/usr/bin/env bash
# Count how many log entries exist per HTTP status code
# Input format: <ip> <date> <method> <path> <status> <bytes>
printf '10.0.0.1 2024-01-01 GET /index 200 512
10.0.0.2 2024-01-01 POST /api 404 128
10.0.0.3 2024-01-01 GET /img 200 2048
10.0.0.4 2024-01-01 GET /api 500 64
10.0.0.5 2024-01-01 DELETE /api 404 32
' | awk '
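{
    count[$5]++    # $5 is the status code; one counter per distinct value
}
END {
    # for-in visits keys in an unspecified order, so output order may vary
    for (status in count)
        print status, count[status]
}'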
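The same pattern extends from counts to sums and averages. The sketch below reuses the sample log lines from the example above and assumes the sixth field is a byte count; the function name bytes_per_status and the second array, bytes, are made up for illustration. The output is piped through sort -n because awk does not guarantee any iteration order for for-in.

```shell
#!/usr/bin/env bash
# Hypothetical helper name; groups the sample log by status code ($5).
bytes_per_status() {
  printf '%s\n' \
    '10.0.0.1 2024-01-01 GET /index 200 512' \
    '10.0.0.2 2024-01-01 POST /api 404 128' \
    '10.0.0.3 2024-01-01 GET /img 200 2048' \
    '10.0.0.4 2024-01-01 GET /api 500 64' \
    '10.0.0.5 2024-01-01 DELETE /api 404 32' |
  awk '
    {
      count[$5]++        # number of requests per status
      bytes[$5] += $6    # running total of bytes per status
    }
    END {
      # Columns: status, request count, total bytes, average bytes
      for (status in count)
        printf "%s %d %d %.1f\n", status, count[status], bytes[status], bytes[status] / count[status]
    }' |
  sort -n    # for-in order is unspecified, so sort for stable output
}

bytes_per_status
```

For this sample the result should be three rows: "200 2 2560 1280.0", "404 2 160 80.0", and "500 1 64 64.0". Keeping two arrays on the same key is usually enough for a single-pass group-by, since every key that appears in count also appears in bytes.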