Linux Command Line & Bash Scripting Mastery · Lesson

Building System Health Check and Alert Scripts

Collect load, memory, and disk metrics and trigger threshold-based alerts from scheduled scripts.

Why System Health Checks Matter

Production servers can degrade silently. CPU spikes, memory leaks, and full disks cause outages — but only if nobody notices in time. System health check scripts automate the monitoring loop: collect metrics, compare against thresholds, and fire alerts before users feel the pain.

Scheduled via cron, they run every few minutes without human attention
They produce consistent, timestamped output suitable for log aggregation
Threshold-based logic keeps alerts meaningful — not every hiccup pages the on-call team

In this lesson you will build a production-grade health check script from scratch, layer by layer, covering load average, memory pressure, and disk utilisation.

Capturing Load Average

Linux exposes the 1-minute, 5-minute, and 15-minute load averages through /proc/loadavg and the uptime command. For scripting, /proc/loadavg is the cleanest source — no locale issues, no parsing variation across distros.
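To make the file layout concrete, here is a minimal sketch of splitting a /proc/loadavg line into its three averages with the shell's read builtin. The function name parse_loadavg and its optional file argument are illustrative assumptions, added so the parsing can be tried against a sample file.

```shell
#!/usr/bin/env bash
# Sketch: split a /proc/loadavg-style line into its three load averages.
# parse_loadavg and its optional file argument are hypothetical helpers,
# not part of any standard tool.
parse_loadavg() {
  local file="${1:-/proc/loadavg}"
  local load1 load5 load15 _rest
  # Fields: 1-min, 5-min, 15-min, running/total tasks, last PID
  read -r load1 load5 load15 _rest < "$file"
  printf '%s %s %s\n' "$load1" "$load5" "$load15"
}

# Only query the live system when the file is actually available
if [ -r /proc/loadavg ]; then
  parse_loadavg
fi
```

Because read splits on whitespace, no external process is needed; the trailing fields (task counts and last PID) are collected in _rest and ignored.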

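The snippet below reads the 1-minute load average and normalises it by core count so it can be compared against a threshold. cut extracts the first field of /proc/loadavg, nproc reports the number of available processing units, and bc performs the floating-point arithmetic, because Bash's built-in arithmetic is integer-only.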

#!/usr/bin/env bash
# Read 1-minute load average from /proc/loadavg
LOAD_RAW=$(cut -d' ' -f1 /proc/loadavg)
echo "Raw load average: $LOAD_RAW"

# Number of CPU cores — used to normalise load
CPU_CORES=$(nproc)
echo "CPU cores: $CPU_CORES"

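# Compute load as a percentage of available cores using bc.
# Multiply before dividing: with scale=2, bc truncates the quotient to two
# decimals, so dividing first would discard precision (0.52 / 8 becomes .06).
LOAD_PCT=$(echo "scale=2; $LOAD_RAW * 100 / $CPU_CORES" | bc)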
echo "Load %: $LOAD_PCT"
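Once load is expressed relative to core count, it can be compared against thresholds. The following is a rough sketch of that step, not a definitive implementation. The 70 and 90 percent limits and the function name classify_load are assumptions chosen for illustration. It uses awk rather than bc so that a single tool handles both the floating-point arithmetic and the comparison.

```shell
#!/usr/bin/env bash
# Illustrative thresholds (assumptions, not values prescribed by the lesson)
LOAD_WARN_PCT=70
LOAD_CRIT_PCT=90

# classify_load LOAD CORES
# Hypothetical helper: prints OK, WARNING, or CRITICAL for a load average
# expressed as a percentage of the available cores.
classify_load() {
  awk -v load="$1" -v cores="$2" \
      -v warn="$LOAD_WARN_PCT" -v crit="$LOAD_CRIT_PCT" 'BEGIN {
    pct = load * 100 / cores
    if (pct >= crit)      print "CRITICAL"
    else if (pct >= warn) print "WARNING"
    else                  print "OK"
  }'
}

# Emit a timestamped status line when running on a system with /proc/loadavg
if [ -r /proc/loadavg ]; then
  STATUS=$(classify_load "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)")
  echo "$(date '+%Y-%m-%dT%H:%M:%S%z') load status: $STATUS"
fi
```

For example, classify_load 3.00 4 evaluates to 75 percent and prints WARNING under these limits. Keeping the thresholds in variables at the top makes them easy to tune per host.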
