Cross-Encoder Re-ranking with Cohere and BGE
Integrate Cohere's rerank endpoint and the BGE-reranker model to score query-document pairs and reorder your top-k chunks by true relevance.
What Is a Cross-Encoder?
A cross-encoder is a neural model that takes a query and a document as a single concatenated input and outputs a relevance score. Unlike a bi-encoder, which embeds the query and the document separately, a cross-encoder's attention layers see both texts at once, so it can model fine-grained interactions between query and document terms that a bi-encoder misses. This joint processing is why cross-encoders are typically more accurate at ranking. The trade-off is one full forward pass per query-document pair, which is why they are applied only to the top-k candidates rather than the whole corpus.
# Conceptual pseudocode: `encoder` and `cross_encoder` stand for already-loaded models.
# Bi-encoder: separate encoding
query_vec = encoder.encode(query)               # [768] vector
doc_vec = encoder.encode(document)              # [768] vector
score = cosine_similarity(query_vec, doc_vec)   # vectors are compared after independent encoding
# Cross-encoder: joint encoding
# BERT-style layout; in practice the tokenizer inserts the special tokens.
combined_input = '[CLS] ' + query + ' [SEP] ' + document + ' [SEP]'
logits = cross_encoder.forward(combined_input)  # score from joint attention
# Every query token attends to every document token, and vice versa.
The Cohere Rerank API
Cohere Rerank is a cloud-hosted cross-encoder re-ranking API that accepts a query and a list of up to 1000 document texts and returns a relevance score for each. It uses large cross-encoder models trained on ranking datasets and generally outperforms small self-hosted cross-encoders on retrieval benchmarks. Pricing is usage-based and scales with the number of re-ranking requests and documents sent; check Cohere's pricing page for the current billing unit.
import cohere
co = cohere.Client('YOUR_COHERE_API_KEY')
candidates = [
    'How to configure pgvector in PostgreSQL for vector search',
    'BM25 scoring formula and hyperparameters explained',
    'Installing and using the pgvector extension for embeddings',
    'Hybrid search combining BM25 and dense retrieval',
]
result = co.rerank(
    model='rerank-english-v3.0',
    query='pgvector setup guide',
    documents=candidates,
    top_n=3,
    return_documents=True,
)
for item in result.results:
    print(f'Score {item.relevance_score:.4f}: {item.document.text[:60]}')
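The same re-ranking step can also run locally with a BGE reranker instead of the hosted API. The sketch below is one possible shape for that, not a reference implementation. It assumes the sentence-transformers package and the BAAI/bge-reranker-base checkpoint, and the helper names (load_bge_scorer, toy_overlap_scorer, rerank) are made up for illustration. The word-overlap scorer is only a stand-in so the example runs without downloading a model.

```python
from typing import Callable, Sequence

# A pair scorer maps [(query, text), ...] to one relevance score per pair.
PairScorer = Callable[[list[tuple[str, str]]], Sequence[float]]


def load_bge_scorer(model_name: str = "BAAI/bge-reranker-base") -> PairScorer:
    """Return a scorer backed by a BGE reranker.

    Assumption: sentence-transformers is installed and the checkpoint
    can be downloaded. Not called in the offline demo below.
    """
    from sentence_transformers import CrossEncoder  # lazy import: heavy dependency

    model = CrossEncoder(model_name)
    return model.predict  # predict() takes a list of (query, text) pairs


def toy_overlap_scorer(pairs: list[tuple[str, str]]) -> list[float]:
    """Stand-in scorer for offline runs: fraction of query words found in the text."""
    scores = []
    for query, text in pairs:
        query_words = set(query.lower().split())
        text_words = set(text.lower().split())
        scores.append(len(query_words & text_words) / max(len(query_words), 1))
    return scores


def rerank(
    query: str,
    chunks: Sequence[str],
    score_pairs: PairScorer,
    top_n: int = 3,
) -> list[tuple[str, float]]:
    """Score every (query, chunk) pair and return the top_n chunks, best first."""
    pairs = [(query, chunk) for chunk in chunks]
    scores = score_pairs(pairs)
    # sorted() is stable, so chunks with equal scores keep their retrieval order.
    ranked = sorted(zip(chunks, scores), key=lambda item: item[1], reverse=True)
    return [(chunk, float(score)) for chunk, score in ranked[:top_n]]


candidates = [
    "How to configure pgvector in PostgreSQL for vector search",
    "BM25 scoring formula and hyperparameters explained",
    "Installing and using the pgvector extension for embeddings",
    "Hybrid search combining BM25 and dense retrieval",
]

# Offline demo with the stand-in scorer.
top = rerank("pgvector extension embeddings", candidates, toy_overlap_scorer, top_n=2)
for chunk, score in top:
    print(f"{score:.2f}  {chunk}")
```

To score with the actual model, pass load_bge_scorer() in place of toy_overlap_scorer. Keeping the scorer as a plain callable means the reordering logic stays the same whether the scores come from a local cross-encoder or from another source.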