Linux Command Line & Bash Scripting Mastery · Lesson

Streaming Pipelines and Named Pipes for Throughput

Use FIFOs and process substitution to stream data between stages without intermediate files.

Why Intermediate Files Hurt Throughput

When you chain commands through a temporary file, as in sort file.txt > tmp.txt && uniq tmp.txt > result.txt, you pay a hidden tax: the data is written to disk and read back again, and the second command cannot start until the first has finished completely.

Streaming pipelines eliminate that tax. Data flows directly from producer to consumer in memory, stage by stage, concurrently. This is the core idea behind Unix pipes — and named pipes (FIFOs) extend it further.
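To make the contrast concrete, here is a minimal sketch that runs both forms on the same input and checks that they agree. The sample words and the workdir variable are assumptions made up for illustration; only the sort and uniq commands come from the text above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample input, created only so the sketch is runnable.
workdir=$(mktemp -d)
printf '%s\n' pear apple pear fig apple > "$workdir/file.txt"

# Intermediate-file version: uniq cannot start until sort has finished
# writing tmp.txt, and the data makes a round trip through the filesystem.
sort "$workdir/file.txt" > "$workdir/tmp.txt" && uniq "$workdir/tmp.txt" > "$workdir/result.txt"
file_result=$(cat "$workdir/result.txt")

# Streaming version: sort and uniq are started together and connected
# by a pipe, so no temporary file is created by the script.
stream_result=$(sort "$workdir/file.txt" | uniq)

# Both approaches should produce the same lines.
[ "$file_result" = "$stream_result" ] && echo "outputs match"

rm -rf "$workdir"
```

On an input this small the speed difference is invisible. The point is only that the streaming form gives the same result with one less file to create, read, and clean up.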

  • Anonymous pipe (|): connects two adjacent commands in the same shell line.
  • Named pipe (FIFO): a special file in the filesystem that lets unrelated processes stream to each other.
  • Process substitution: lets a command treat another command's output as if it were a file.
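The three mechanisms in the list above can be sketched side by side. This is an illustrative example rather than a recommended layout: the FIFO path, the variable names, and the three-letter input are all assumptions chosen to keep the output easy to check.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# 1. Anonymous pipe: adjacent commands on one shell line.
anon_result=$(printf '%s\n' b a c | sort | paste -sd' ' -)

# 2. Named pipe (FIFO): a filesystem entry that two otherwise unrelated
#    processes can open by path. Opening it for writing blocks until a
#    reader opens it too, so the writer runs in the background here.
fifo="$workdir/stream.fifo"
mkfifo "$fifo"
printf '%s\n' b a c > "$fifo" &
fifo_result=$(sort < "$fifo" | paste -sd' ' -)
wait    # reap the background writer

# 3. Process substitution: each <(...) expands to a path that diff reads
#    like a regular file, while the data actually comes from a command.
psub_result=$(diff <(printf '%s\n' a b c) <(printf '%s\n' b a c | sort) && echo same)

echo "anonymous pipe:       $anon_result"
echo "named pipe:           $fifo_result"
echo "process substitution: $psub_result"

rm -rf "$workdir"
```

Process substitution is a Bash feature (also available in zsh and ksh), so this sketch needs to run under bash rather than a plain POSIX sh.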

This lesson shows you how to apply all three to maximize throughput in real-world Bash workflows.

Anatomy of a Streaming Pipeline

An anonymous pipe connects stdout of one process to stdin of the next. Both processes run concurrently, and the kernel passes data between them through a fixed-size in-memory buffer (64 KiB by default on Linux).

The key insight: the pipeline is as fast as its slowest stage. If the producer is faster, it blocks on a full buffer. If the consumer is faster, it blocks on an empty buffer. This back-pressure is free, automatic flow control.
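Back-pressure can be observed with a deliberately slow consumer. The sketch below is an assumption-laden demonstration, not a benchmark: it supposes the default pipe buffer is well under 1 MiB (true for the 64 KiB Linux default), and the 2-second delay and file names are arbitrary choices.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
start=$SECONDS

# Producer: writes 1 MiB, far more than the pipe buffer holds, then records
# how many seconds passed before its last byte was accepted.
# Consumer: sleeps 2 seconds before reading anything.
{ head -c 1048576 /dev/zero; echo $((SECONDS - start)) > "$workdir/producer_done"; } \
  | { sleep 2; wc -c > "$workdir/bytes_read"; }

producer_elapsed=$(cat "$workdir/producer_done")
bytes_read=$(tr -d ' ' < "$workdir/bytes_read")

echo "producer finished after ${producer_elapsed}s; consumer read ${bytes_read} bytes"

rm -rf "$workdir"
```

The producer could emit 1 MiB almost instantly on its own, yet it cannot finish until the consumer starts draining the pipe. No data is lost and no extra code is needed to coordinate the two sides.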

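The example below ranks the 20 client IP addresses that issued the most GET requests in a large access log, without creating an explicit temporary file. All stages start together and run concurrently, although each sort must read its entire input before it can emit its first line: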

#!/usr/bin/env bash
# Stream a 2 GB access log; all stages start together and run concurrently
grep '"GET' /var/log/nginx/access.log \
  | awk '{print $1}' \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -20
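To try the pipeline without a real 2 GB log, it can be wrapped in a function and pointed at a few sample lines. The top_ips helper and the four log entries below are hypothetical, invented for this sketch; the entries only imitate the common nginx access log layout, in which the client address is the first field.

```shell
#!/usr/bin/env bash
set -eu

# top_ips LOGFILE [N]: hypothetical wrapper around the pipeline above.
# Prints the N client addresses with the most GET requests (default 20).
top_ips() {
  local logfile=$1 n=${2:-20}
  grep '"GET' "$logfile" \
    | awk '{print $1}' \
    | sort \
    | uniq -c \
    | sort -rn \
    | head -n "$n"
}

# Made-up sample data: three GET requests and one POST.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512
10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /about HTTP/1.1" 200 734
10.0.0.1 - - [01/Jan/2024:00:00:03 +0000] "POST /login HTTP/1.1" 302 0
10.0.0.1 - - [01/Jan/2024:00:00:04 +0000] "GET /home HTTP/1.1" 200 901
EOF

# uniq -c pads its counts with spaces; awk normalizes that to "count address".
top_result=$(top_ips "$sample_log" 2 | awk '{print $1, $2}')
echo "$top_result"

rm -f "$sample_log"
```

The sketch leaves pipefail off on purpose: when head exits early on a large input, the upstream sort can be terminated by SIGPIPE, and pipefail would then report the whole pipeline as failed even though the output is correct.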
