Linux Command Line & Bash Scripting Mastery · Lesson

Orchestrating Workloads with GNU parallel

Distribute large input sets across cores with GNU parallel, job slots, and result ordering.

What Is GNU parallel and Why Use It?

GNU parallel is a shell tool that lets you run jobs in parallel on one or multiple machines. Instead of processing a large list of items one by one in a for loop, parallel spreads that work across all available CPU cores simultaneously.

Speed: For CPU-bound jobs that do not depend on one another, a task that takes 8 minutes sequentially can approach ~1 minute on an 8-core machine. I/O-bound work usually gains less.
Simplicity: It accepts input from stdin, files, or argument lists — no manual process management.
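Safety: By default, each job's output is buffered and printed as a unit when the job finishes, so output from different jobs is not interleaved. Results appear in completion order unless you pass --keep-order (-k).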

Install it with sudo apt install parallel (Debian/Ubuntu) or brew install parallel (macOS). Verify with parallel --version.

Your First parallel Command

The simplest form of parallel reads items from stdin and runs a command for each one. The placeholder {} represents the current input item.

The example below compresses five log files concurrently using gzip. Without parallel, each file would be compressed one after the other. With it, up to N files are compressed at the same time, where N defaults to the number of CPU cores and can be changed with -j.

#!/usr/bin/env bash
# Create sample files first
for i in 1 2 3 4 5; do
  dd if=/dev/urandom bs=1M count=2 of="log_${i}.txt" 2>/dev/null
done

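# Compress all of them in parallel: one gzip job per file name read from stdin.
# printf with a glob avoids parsing the output of ls.
printf '%s\n' log_*.txt | parallel gzip {}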

echo "Done. Compressed files:"
ls log_*.txt.gz
