Orchestrating Workloads with GNU parallel
Distribute large input sets across cores with GNU parallel, job slots, and result ordering.
What Is GNU parallel and Why Use It?
GNU parallel is a shell tool that lets you run jobs in parallel on one or multiple machines. Instead of processing a large list of items one at a time in a for loop, parallel runs many of them at once. By default it starts one job per CPU, so all available cores stay busy.
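- Speed: For independent, CPU-bound jobs, a task that takes 8 minutes sequentially can finish in roughly 1 minute on an 8-core machine. That is the ideal case; startup overhead, uneven job sizes, and disk or network limits reduce the gain.
- Simplicity: It accepts input from stdin, files, or argument lists, with no manual process management.
- Safety: By default, each job's output is buffered and printed as one block when the job finishes, so lines from different jobs are not interleaved. Blocks appear in completion order; add -k (--keep-order) to print them in input order.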
Install it with sudo apt install parallel (Debian/Ubuntu) or brew install parallel (macOS). Verify with parallel --version.
Your First parallel Command
The simplest form of parallel reads items from stdin, one per line, and runs a command once for each item. The placeholder {} is replaced by the current input item.
The example below compresses five sample log files concurrently with gzip. In a plain for loop, each file would be compressed one after the other; with parallel, up to N files are compressed at the same time, where N is the number of job slots (one per CPU core by default).
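```shell
#!/usr/bin/env bash
set -euo pipefail

# Remove output from earlier runs so gzip does not refuse to overwrite it
rm -f log_*.txt.gz

# Create five 2 MB sample files first
for i in 1 2 3 4 5; do
  dd if=/dev/urandom bs=1M count=2 of="log_${i}.txt" 2>/dev/null
done

# Compress all of them in parallel; printf writes one file name per line
# to stdin (safer than parsing the output of ls)
printf '%s\n' log_*.txt | parallel gzip {}

echo "Done. Compressed files:"
ls log_*.txt.gz
```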