Deferring Indexes and Constraints During Load
Drop and rebuild indexes and constraints around bulk loads to slash write amplification.
Why Bulk Loads Get Slow
When you load millions of rows into a table that already has indexes and constraints, PostgreSQL pays a hidden tax on every single row.
- Each index must be updated (B-tree page splits, WAL writes).
- Each foreign key triggers a lookup against the referenced table.
- Each unique/check constraint is validated per-row.
This per-row work is called write amplification: one logical INSERT becomes many physical writes. The core optimization in this lesson is to defer that work — load the raw data first, then build indexes and validate constraints once, in bulk.
The Per-Index Cost
Maintaining a B-tree index during a load is not free. For each inserted row, PostgreSQL must walk the tree, find the leaf page, possibly split it, and log the change to WAL.
Building the same index after the data is present is far cheaper: PostgreSQL sorts all keys at once and writes dense, sequential pages. A table with 5 indexes loaded row-by-row does roughly 6x the write work versus loading the heap alone.
The takeaway: fewer indexes present during load = less amplification.
All lessons in this course
- COPY vs Multi-Row INSERT Throughput
- Deferring Indexes and Constraints During Load
- Tuning WAL and Checkpoints for Ingestion
- Upserts at Scale with ON CONFLICT