MCV and N-Distinct Corrections
Use ndistinct and most-common-value statistics to repair join and grouping estimates.
Why Estimates Drift
The PostgreSQL planner chooses join orders, join methods, and grouping strategies from row-count estimates. When those estimates are wrong, you get nested loops over millions of rows or a hash table sized for the wrong cardinality.
Two column-level statistics drive most of these estimates:
- n_distinct — how many distinct values the planner believes a column holds. It feeds grouping and join cardinality.
- most_common_vals (MCV) — the list of frequent values and their frequencies, used for selectivity of equality predicates.
This lesson shows how to read, diagnose, and correct both when the default sampling gets them wrong.
Reading pg_stats
Everything the planner knows about a column lives in the pg_stats view, a human-readable wrapper over pg_statistic. Start every diagnosis here.
Key columns: n_distinct, most_common_vals, most_common_freqs, and null_frac.
SELECT attname,
n_distinct,
null_frac,
most_common_vals,
most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
AND tablename = 'orders'
AND attname IN ('customer_id', 'status');