Re-ranking with Cross-Encoders
Retrieve the top 50 candidates with cheap embeddings, then re-rank them with a slower cross-encoder and keep the top 5 for higher precision.
Why Re-Rank?
Embedding similarity is a fast, coarse first pass. It often pulls in chunks that share KEYWORDS with the query but are not actually relevant to the QUESTION.
A re-ranker takes each (query, chunk) pair from that first pass and scores the pair directly, which gives a more precise relevance score than embedding similarity alone.
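The retrieve-then-re-rank flow can be sketched in a few lines. This is a minimal illustration and not a production pipeline: `bow_cosine` is a toy word-count scorer standing in for embedding similarity, `hand_labelled_reranker` returns hand-assigned scores standing in for a real cross-encoder, and all function names, example chunks, and parameters here are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def bow_cosine(query: str, chunk: str) -> float:
    """Toy first-stage scorer: cosine over raw word counts (stand-in for embeddings)."""
    q = Counter(query.lower().split())
    c = Counter(chunk.lower().split())
    dot = sum(q[w] * c[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in c.values()))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query: str,
    chunks: List[str],
    first_stage: Scorer,
    reranker: Scorer,
    retrieve_k: int = 50,
    final_k: int = 5,
) -> List[Tuple[float, str]]:
    # Stage 1: cheap score for every chunk, keep only the best retrieve_k candidates.
    candidates = sorted(chunks, key=lambda ch: first_stage(query, ch), reverse=True)[:retrieve_k]
    # Stage 2: expensive score, computed only for the surviving candidates.
    scored = [(reranker(query, ch), ch) for ch in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:final_k]


query = "how do I reset my password"
chunks = [
    "password password password policy: passwords must be long",  # keyword-heavy, not an answer
    "To reset your password, open Settings and choose Reset",      # actually answers the question
    "Billing invoices are emailed monthly",
]

# Hand-assigned relevance scores (an assumption for illustration only);
# a real system would call a cross-encoder model here.
HAND_SCORES = {chunks[0]: 0.1, chunks[1]: 0.9, chunks[2]: 0.0}


def hand_labelled_reranker(q: str, chunk: str) -> float:
    return HAND_SCORES[chunk]


results = retrieve_then_rerank(
    query, chunks, bow_cosine, hand_labelled_reranker, retrieve_k=2, final_k=1
)
print(results)
```

In this toy setup the keyword-heavy chunk wins the first stage, and the second stage promotes the chunk that actually answers the question. Because the scorers are plain callables, either stage can be swapped for a real model without changing the pipeline function.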
Bi-Encoder vs Cross-Encoder
Two architectures for text similarity:
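- Bi-encoder: embeds the query and each chunk separately, then compares the vectors with cosine similarity. Fast, because chunk vectors are computed once and reused across queries, but less accurate.
- Cross-encoder: runs each (query, chunk) pair jointly through the model. Slow, because every pair needs its own forward pass at query time, but much more accurate.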
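The cost difference between the two architectures comes down to how many model forward passes happen at query time. The sketch below counts them with a toy model. `ToyModel`, its hashed word-count "embedding", and its word-overlap pair score are all hypothetical stand-ins that say nothing about accuracy; only the call pattern is meant to be representative.

```python
import math
from typing import List

DIM = 16  # arbitrary toy embedding size


class ToyModel:
    """Hypothetical stand-in for a neural model that counts its forward passes."""

    def __init__(self) -> None:
        self.forward_passes = 0

    def embed(self, text: str) -> List[float]:
        # Bi-encoder style: one text in, one unit-length vector out.
        self.forward_passes += 1
        vec = [0.0] * DIM
        for word in text.lower().split():
            vec[sum(map(ord, word)) % DIM] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    def score_pair(self, query: str, chunk: str) -> float:
        # Cross-encoder style: query and chunk go through the model together.
        self.forward_passes += 1
        q_words = set(query.lower().split())
        c_words = set(chunk.lower().split())
        return len(q_words & c_words) / max(len(q_words), 1)


def dot(a: List[float], b: List[float]) -> float:
    # Equals cosine similarity here because embed() returns unit-length vectors.
    return sum(x * y for x, y in zip(a, b))


chunks = ["alpha beta", "beta gamma", "gamma delta"]
queries = ["beta", "delta"]

# Bi-encoder: embed every chunk once, offline, and keep the vectors.
bi = ToyModel()
index = [bi.embed(c) for c in chunks]
offline_passes = bi.forward_passes
for q in queries:
    q_vec = bi.embed(q)                      # one forward pass per query
    bi_scores = [dot(q_vec, v) for v in index]  # plain arithmetic, no model call
bi_query_passes = bi.forward_passes - offline_passes

# Cross-encoder: nothing can be precomputed, every pair is a forward pass.
cross = ToyModel()
for q in queries:
    cross_scores = [cross.score_pair(q, c) for c in chunks]
cross_query_passes = cross.forward_passes

print("bi-encoder query-time passes:", bi_query_passes)
print("cross-encoder query-time passes:", cross_query_passes)
```

The bi-encoder's query-time cost grows with the number of queries, while the cross-encoder's grows with queries times chunks. That is why the cross-encoder is applied only to a short candidate list rather than to the whole corpus.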