How the Planner Estimates Row Counts
Trace selectivity estimation from pg_statistic to the cardinalities that drive plan choice.
Why Row Estimates Drive Everything
Before PostgreSQL executes a query, the planner must decide how to run it: sequential scan vs. index scan, nested loop vs. hash join, which table to drive a join from. Every one of these decisions hinges on a single guess: how many rows will each step produce?
- If the planner thinks a filter returns 5 rows, an index scan + nested loop looks cheap.
- If it thinks the same filter returns 5 million rows, a sequential scan + hash join wins.
These row-count guesses are called cardinality estimates. When they are wrong, the planner picks a bad plan even though its cost model is perfectly sound. This lesson traces exactly where those numbers come from.
Reading Estimates from EXPLAIN
Every node in an EXPLAIN plan reports the planner's estimate. The rows= value is the estimated cardinality for that node. Run EXPLAIN ANALYZE to compare it against the real count.
(cost=… rows=120 …)is the estimate.(actual … rows=118 …)is the truth.
A large gap between estimated and actual rows is the single most common root cause of slow plans. Train your eye to scan for it first.
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE status = 'shipped'
AND country = 'DE';All lessons in this course
- How the Planner Estimates Row Counts
- Multivariate Statistics for Correlated Columns
- MCV and N-Distinct Corrections
- Validating Estimates Against Actual Rows