Designing tsvector Columns and GIN Indexes
Precompute and index search documents so full-text queries stay sub-millisecond at scale.
Why a Precomputed tsvector
PostgreSQL full-text search compares a tsvector (the searchable document) against a tsquery (the search terms). The naive approach calls to_tsvector() on a raw text column at query time.
That works, but it has two costs at scale:
- CPU per row: parsing and stemming text on every scan is expensive.
- No usable index unless the index expression exactly matches the query expression.
The fix is to precompute the document once and store it, then index it. This lesson shows how to design that column and the GIN index so full-text queries stay sub-millisecond even on millions of rows.
The Naive Query (and Its Trap)
Here is the pattern most people start with: store plain text, and build the tsvector on the fly.
The query below works correctly, but on a large table it triggers a sequential scan and re-parses body for every row. Each call to to_tsvector stems and normalizes the full document text.
The lesson's goal is to eliminate this per-row work entirely.
SELECT id, title
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'index & scan');All lessons in this course
- Designing tsvector Columns and GIN Indexes
- Ranking and Relevance Tuning with ts_rank
- Fuzzy Matching with pg_trgm Similarity
- Combining Filters with Search Predicates