AI Prompt Engineering · Lesson

Re-ranking Retrieved Chunks

Cross-encoder re-ranking.

Why Re-Rank at All

First-stage retrieval (dense or sparse) optimizes for recall at scale: get the gold chunk somewhere in the top 50. It is fast but coarse. A second-stage re-ranker then reorders that shortlist for precision, surfacing the truly relevant chunks to the top.

This retrieve-broadly-then-rerank-precisely pattern is the backbone of advanced RAG.
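To make "recall at scale" and "precision" concrete, here is a minimal sketch of the two standard rank metrics each stage is usually judged by: recall@k for the first stage and precision@k for the final list. The function names and chunk ids are illustrative, and the sketch assumes binary relevance (a chunk is either relevant or not).

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant chunks that appear anywhere in the top k."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top k positions that hold a relevant chunk."""
    if k <= 0:
        return 0.0
    relevant = set(relevant_ids)
    return sum(cid in relevant for cid in ranked_ids[:k]) / k


# Hypothetical shortlist of 50 chunk ids; the gold chunk "c37" sits at rank 38.
first_stage = [f"c{i}" for i in range(50)]
gold = ["c37"]
print(recall_at_k(first_stage, gold, 50))    # 1.0: the first stage did its job
print(precision_at_k(first_stage, gold, 5))  # 0.0: but gold is not in the top 5

# A re-ranker that moves the gold chunk to the front fixes precision.
reranked = ["c37"] + [c for c in first_stage if c != "c37"]
print(precision_at_k(reranked, gold, 5))     # 0.2
```

The first stage is "successful" as long as recall@50 is high; it is the re-ranker's job to turn that into high precision in the handful of chunks that reach the prompt.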

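```python
def two_stage(query, k_retrieve=50, k_final=5):
    # first_stage_retrieve and rerank are placeholders for your retriever and re-ranker
    candidates = first_stage_retrieve(query, k_retrieve)  # high recall: wide shortlist
    reranked = rerank(query, candidates)                  # high precision: reorder by relevance
    return reranked[:k_final]                             # keep only the best few chunks
```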
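`two_stage` leaves `first_stage_retrieve` and `rerank` abstract. The sketch below fills them with toy, dependency-free stand-ins so the whole pipeline runs: a term-count first stage and a re-ranker that rewards query bigrams appearing in the same order in the chunk. Both scorers and the `CORPUS` list are hypothetical; a real system would use BM25 or dense retrieval for stage one and a cross-encoder for stage two. What the toy does show is the division of labor: the keyword-heavy chunk wins stage one, and the chunk that actually contains the phrase is promoted by the re-ranker.

```python
import re

# Hypothetical chunk store; a real system would query a vector or BM25 index.
CORPUS = [
    "re ranking lists and ranking cross lingual encoder outputs",
    "re ranking with a cross encoder",
    "bi encoder vectors are compared with cosine similarity",
    "the cat sat on the mat",
]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def first_stage_retrieve(query, k):
    """Cheap, recall-oriented stand-in: count query-term hits in each chunk."""
    q_terms = set(tokenize(query))
    scored = [(sum(tok in q_terms for tok in tokenize(doc)), doc) for doc in CORPUS]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep corpus order
    return [doc for score, doc in scored[:k] if score > 0]


def rerank(query, candidates):
    """Precision-oriented stand-in: reads query and chunk together and rewards
    query bigrams that occur in the same order in the chunk (ties: term overlap)."""
    q = tokenize(query)
    q_bigrams = set(zip(q, q[1:]))

    def score(doc):
        d = tokenize(doc)
        return (len(q_bigrams & set(zip(d, d[1:]))), len(set(q) & set(d)))

    return sorted(candidates, key=score, reverse=True)


def two_stage(query, k_retrieve=50, k_final=5):
    candidates = first_stage_retrieve(query, k_retrieve)  # high recall
    reranked = rerank(query, candidates)                  # high precision
    return reranked[:k_final]


query = "cross encoder re ranking"
print(first_stage_retrieve(query, 50)[0])  # keyword-heavy chunk wins stage one
print(two_stage(query, k_final=2)[0])      # chunk with the actual phrase wins after re-ranking
```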

Bi-Encoder vs Cross-Encoder

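A bi-encoder encodes the query and each document separately into vectors and compares them by cosine similarity. This is fast and indexable, because document vectors can be computed once ahead of time, but the model never sees the query and document together, so it loses query-document interaction. A cross-encoder instead feeds the query and a candidate through the model together as a single input and outputs one relevance score, so attention can capture fine-grained interaction between the two texts.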

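Cross-encoders are typically much more accurate, but their scores cannot be precomputed: every (query, candidate) pair needs its own forward pass at query time. That cost is why they run only on the shortlist, never on the whole corpus.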
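The cost argument can be checked by counting model calls. In this sketch `encode` and `joint_score` are hypothetical stand-ins (a hashed bag-of-words vector and a word-overlap score), not real models; only the call counts matter. Over 1,000 documents the bi-encoder does 1,000 encodes once, offline, plus a single encode per query, while the cross-encoder stand-in needs one call per (query, document) pair, so it is applied only to a 50-item shortlist.

```python
import math
import zlib

calls = {"encode": 0, "joint": 0}  # count "model" invocations


def encode(text, dim=64):
    """Toy bi-encoder stand-in: hashed bag-of-words vector, one call per text."""
    calls["encode"] += 1
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[zlib.crc32(tok.encode()) % dim] += 1.0
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def joint_score(query, doc):
    """Toy cross-encoder stand-in: sees both texts at once, one call per pair."""
    calls["joint"] += 1
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


docs = [f"chunk {i} about topic {i % 7}" for i in range(1000)]

# Offline, once: every document vector is computed before any query arrives.
index = [encode(doc) for doc in docs]
offline_encodes = calls["encode"]

# Query time, bi-encoder: one new encode, then cheap arithmetic over the index.
query = "topic 3"
q_vec = encode(query)
bi_scores = [cosine(q_vec, d_vec) for d_vec in index]
query_time_encodes = calls["encode"] - offline_encodes

# Query time, cross-encoder: nothing can be reused, so it is limited to a shortlist.
shortlist = sorted(range(len(docs)), key=lambda i: bi_scores[i], reverse=True)[:50]
cross_scores = [joint_score(query, docs[i]) for i in shortlist]
print(offline_encodes, query_time_encodes, calls["joint"])  # 1000 1 50
```

Scoring the whole corpus with the joint scorer would cost 1,000 calls per query instead of 50; with a real cross-encoder each of those calls is a full transformer forward pass.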

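```python
from sentence_transformers import CrossEncoder
# Bi-encoder:    score = cos(enc(q), enc(d))  -> doc vectors precomputable, indexable
# Cross-encoder: score = model(q, d)          -> one forward pass per pair, no index
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # one public checkpoint

def cross_encode(query, doc):
    # Joint attention over (query, doc); score scale (raw logit vs 0..1) depends on the model
    return float(cross_encoder.predict([(query, doc)])[0])
```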
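`cross_encode` scores one pair per call. For re-ranking a shortlist it is usually better to hand all pairs to the model at once, since `CrossEncoder.predict` accepts a list of pairs and returns one score per pair. The sketch below is one way to write the `rerank` step generically: it takes any batch scorer, so `toy_score_pairs` (a hypothetical word-overlap scorer used only so the example runs offline) can be swapped for `cross_encoder.predict`. It returns `(chunk, score)` pairs so the scores stay available for logging or thresholding.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]


def rerank(
    query: str,
    candidates: Sequence[str],
    score_pairs: Callable[[List[Pair]], Sequence[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair in one batched call, then sort by score."""
    pairs = [(query, doc) for doc in candidates]
    scores = score_pairs(pairs)  # e.g. cross_encoder.predict, one score per pair
    ranked = sorted(
        ((doc, float(s)) for doc, s in zip(candidates, scores)),
        key=lambda item: item[1],
        reverse=True,  # higher score = more relevant
    )
    return ranked if top_k is None else ranked[:top_k]


def toy_score_pairs(pairs):
    """Hypothetical stand-in scorer: share of query words found in the candidate."""
    scores = []
    for q, d in pairs:
        q_words, d_words = set(q.lower().split()), set(d.lower().split())
        scores.append(len(q_words & d_words) / len(q_words) if q_words else 0.0)
    return scores


candidates = ["bi encoders use cosine", "cross encoders score pairs jointly", "unrelated text"]
print(rerank("cross encoders score pairs", candidates, toy_score_pairs, top_k=2))
# [('cross encoders score pairs jointly', 1.0), ('bi encoders use cosine', 0.25)]
```

With the model loaded as in `cross_encode`, passing `cross_encoder.predict` as `score_pairs` scores the whole shortlist in one call instead of one call per document.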
