Linux Command Line & Bash Scripting Mastery · Lesson

Aggregation with awk Arrays and Grouping

Compute sums, counts, and group-by summaries using associative arrays keyed on field values.

What Are awk Associative Arrays?

In awk, an associative array is a key-value store where keys can be any string or number. Unlike the integer-indexed arrays of most languages, awk arrays behave like hash maps, which makes them a natural fit for grouping and aggregating data by field values.

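No declaration needed: assigning arr[key] = value creates the element
Keys are always strings: numeric subscripts are converted to strings, so arr[1] and arr["1"] are the same element
No fixed size: awk grows the array as needed, limited only by available memory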
Ideal for computing sums, counts, and group-by rollups in a single pass
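A minimal sketch of the points above, wrapped in a bash function so it can be reused; `key_demo` is a hypothetical name chosen only for this illustration. It shows implicit creation, numeric-to-string key conversion, and the `in` membership test:

```shell
#!/usr/bin/env bash
# key_demo is a hypothetical helper used only to illustrate awk array keys.
key_demo() {
    awk 'BEGIN {
        arr[1] = "one"      # numeric subscript 1 is converted to the string "1"
        arr["1"] = "uno"    # same element as arr[1], so this overwrites "one"
        arr["x"] = 42       # string key; no declaration needed
        n = 0
        for (k in arr) n++  # count the distinct keys ("1" and "x")
        print n
        print arr[1]        # value stored under key "1"
        if ("x" in arr) print "x present"
    }'
}

key_demo
```

Running it prints `2`, `uno`, and `x present` on separate lines: two distinct keys exist, because `1` and `"1"` collapse into one.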

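You do not need to initialize a key before incrementing it: awk treats an unset element as 0 in numeric context and as an empty string in string context. Note that merely referencing arr[key] creates the element; use (key in arr) to test for a key without creating it.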
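The sketch below illustrates how unset elements behave; `unset_demo` is a hypothetical function name used only for illustration. It also shows that reading an element creates it, while the `in` test does not:

```shell
#!/usr/bin/env bash
# unset_demo is a hypothetical helper illustrating unset array elements.
unset_demo() {
    awk 'BEGIN {
        hits["a"]++                    # unset -> 0, incremented to 1
        hits["a"]++                    # now 2
        total = hits["b"] + 5          # unset "b" acts as 0 (and is created here)
        label = "[" hits["c"] "]"      # unset "c" acts as "" (and is created here)
        print hits["a"], total, label
        if ("zzz" in hits) print "zzz exists"   # "in" checks without creating "zzz"
        n = 0
        for (k in hits) n++            # a, b, and c exist; zzz does not
        print n
    }'
}

unset_demo
```

The expected output is `2 5 []` followed by `3`, which is why `(key in arr)` is the safer way to check for a key in an END block.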

Counting Lines Per Group

The most common aggregation pattern is counting how many times each unique value in a field appears. Use field $1 (or any field) as the array key and increment a counter on every matching row.

The END block runs once, after all input has been read, so that is where you print the accumulated results.

count[$1]++

After processing, count holds the total number of lines for each unique value of $1.

#!/usr/bin/env bash
# Count how many log entries exist per HTTP status code
# Input format: <ip> <date> <method> <path> <status> <bytes>
printf '10.0.0.1 2024-01-01 GET /index 200 512
10.0.0.2 2024-01-01 POST /api 404 128
10.0.0.3 2024-01-01 GET /img 200 2048
10.0.0.4 2024-01-01 GET /api 500 64
10.0.0.5 2024-01-01 DELETE /api 404 32
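' | awk '
{
    count[$5]++            # $5 is the status field; unset keys start at 0
}
END {
    # for (key in array) visits keys in an unspecified order
    for (status in count)
        print status, count[status]
}' | sort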
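Counts are only one kind of rollup. As a sketch, assuming the same `<ip> <date> <method> <path> <status> <bytes>` layout as the example above, a second array keyed on the same field can accumulate a sum, and the average falls out in END; `summarize_by_status` is a hypothetical function name:

```shell
#!/usr/bin/env bash
# summarize_by_status is a hypothetical helper: per status code ($5) it prints
# status, line count, total bytes ($6), and average bytes per line.
summarize_by_status() {
    awk '
    {
        count[$5]++        # lines per status
        bytes[$5] += $6    # running byte total per status (unset starts at 0)
    }
    END {
        for (status in count)
            printf "%s %d %d %.1f\n", status, count[status], bytes[status], bytes[status] / count[status]
    }' | sort              # for-in order is unspecified; sort for stable output
}

printf '%s\n' \
    '10.0.0.1 2024-01-01 GET /index 200 512' \
    '10.0.0.2 2024-01-01 POST /api 404 128' \
    '10.0.0.3 2024-01-01 GET /img 200 2048' \
    '10.0.0.4 2024-01-01 GET /api 500 64' \
    '10.0.0.5 2024-01-01 DELETE /api 404 32' | summarize_by_status
```

For this input it prints `200 2 2560 1280.0`, `404 2 160 80.0`, and `500 1 64 64.0`. Keeping one array per metric, all keyed on the same field, keeps the whole summary to a single pass over the data.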
