AI Engineering Academy · Lesson

Cross-Encoder Re-ranking with Cohere and BGE

Integrate Cohere's rerank endpoint and the BGE-reranker model to score query-document pairs and reorder your top-k chunks by true relevance.

What Is a Cross-Encoder?

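A cross-encoder is a neural model that takes a query and a document as a single concatenated input and outputs a relevance score. Unlike a bi-encoder, which embeds the query and the document separately, the cross-encoder's attention layers see both texts at once, so it can model fine-grained relevance signals (such as whether a specific query term is actually answered by the passage) that a bi-encoder misses. This joint processing is why cross-encoders typically rank more accurately. The cost is one full forward pass per query-document pair, with nothing to precompute or index, which is why they are applied only to a short list of top-k candidates rather than to the whole corpus.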

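from sentence_transformers import CrossEncoder, SentenceTransformer, util

query = 'pgvector setup guide'
document = 'Installing and using the pgvector extension for embeddings'

# Bi-encoder: separate encoding (the model names here are illustrative choices)
encoder = SentenceTransformer('BAAI/bge-base-en-v1.5')
query_vec = encoder.encode(query)          # [768] vector
doc_vec = encoder.encode(document)         # [768] vector
score = util.cos_sim(query_vec, doc_vec)   # texts are compared only after encoding

# Cross-encoder: joint encoding
cross_encoder = CrossEncoder('BAAI/bge-reranker-base')
# The tokenizer joins the pair with the model's own special tokens
# (BERT-style models: [CLS] query [SEP] document [SEP])
score = cross_encoder.predict([(query, document)])[0]  # score from joint attention
# The model attends from every query token to every document token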
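
In a retrieval pipeline, the joint score is used to reorder a short candidate list. The following is a minimal sketch, assuming the scorer is any callable that maps a list of (query, document) pairs to one score per pair, which is the shape of the `predict` method on a sentence-transformers `CrossEncoder`. The names `rerank` and `overlap_scorer` are illustrative, and the toy word-overlap scorer only stands in for a real model so the example runs offline.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    documents: Sequence[str],
    score_pairs: Callable[[list[tuple[str, str]]], Sequence[float]],
    top_n: int = 3,
) -> list[tuple[int, float]]:
    """Return (original_index, score) for the top_n documents, best first."""
    pairs = [(query, doc) for doc in documents]
    scores = score_pairs(pairs)  # one joint score per (query, document) pair
    ranked = sorted(enumerate(scores), key=lambda item: item[1], reverse=True)
    return [(index, float(score)) for index, score in ranked[:top_n]]


def overlap_scorer(pairs: list[tuple[str, str]]) -> list[float]:
    """Toy stand-in for a cross-encoder: share of query words found in the document."""
    scores = []
    for query, doc in pairs:
        query_words = set(query.lower().split())
        doc_words = set(doc.lower().split())
        scores.append(len(query_words & doc_words) / max(len(query_words), 1))
    return scores


docs = [
    'BM25 scoring formula and hyperparameters explained',
    'Installing and using the pgvector extension for embeddings',
    'Hybrid search combining BM25 and dense retrieval',
]
top = rerank('pgvector setup guide', docs, overlap_scorer, top_n=2)
print(top)  # list of (index into docs, score), highest score first
```

With sentence-transformers installed, passing a loaded cross-encoder's `predict` method as `score_pairs` should slot in where the toy scorer is used. Returning the original index rather than the text keeps the link back to whatever metadata the first-stage retriever attached to each chunk.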

The Cohere Rerank API

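Cohere Rerank is a cloud-hosted re-ranking API: you send a query and a list of document texts, and it returns a relevance score for each document. Cohere's documentation recommends sending no more than 1,000 documents per request. The models behind the endpoint are large cross-encoders trained on high-quality ranking datasets, and they often outperform small self-hosted cross-encoders on public benchmarks, though results vary by domain, so it is worth evaluating on your own data. Usage is billed by search unit (a query plus a batch of documents) rather than per individual document; check Cohere's pricing page for the current batch size and rate.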

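import os

import cohere

# Read the key from the environment instead of hard-coding it
# (the variable name COHERE_API_KEY is our choice for this lesson)
co = cohere.Client(os.environ['COHERE_API_KEY'])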

candidates = [
    'How to configure pgvector in PostgreSQL for vector search',
    'BM25 scoring formula and hyperparameters explained',
    'Installing and using the pgvector extension for embeddings',
    'Hybrid search combining BM25 and dense retrieval',
]

result = co.rerank(
    model='rerank-english-v3.0',
    query='pgvector setup guide',
    documents=candidates,
    top_n=3,
    return_documents=True,
)

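# Results come back sorted by relevance_score, highest first;
# item.index is the document's position in the `candidates` list
for item in result.results:
    print(f'Score {item.relevance_score:.4f} (candidate {item.index}): {item.document.text[:60]}')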
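
Each rerank result carries the position of the document in the list that was sent (`item.index`) alongside `item.relevance_score`, which makes it possible to map scores back to richer chunk records (IDs, metadata) kept on your side. The following is a minimal sketch, assuming chunks are plain dicts and using stand-in result objects so it runs without an API call. The helper name `select_chunks`, the `rerank_score` key, and the `min_score` cutoff of 0.3 are illustrative choices, not Cohere recommendations.

```python
from types import SimpleNamespace


def select_chunks(chunks, results, min_score=0.0):
    """Attach rerank scores to the original chunk dicts and drop low scorers.

    `results` is any iterable of objects exposing `.index` and
    `.relevance_score`, the two fields read from each rerank result.
    """
    selected = []
    for item in results:
        if item.relevance_score < min_score:
            continue
        chunk = dict(chunks[item.index])  # copy so the input list is not mutated
        chunk['rerank_score'] = item.relevance_score
        selected.append(chunk)
    # Sort explicitly rather than relying on the order of `results`
    selected.sort(key=lambda c: c['rerank_score'], reverse=True)
    return selected


chunks = [
    {'id': 'doc-a', 'text': 'How to configure pgvector in PostgreSQL for vector search'},
    {'id': 'doc-b', 'text': 'BM25 scoring formula and hyperparameters explained'},
    {'id': 'doc-c', 'text': 'Installing and using the pgvector extension for embeddings'},
]
# Stand-ins for `result.results`; these scores are made up for illustration
fake_results = [
    SimpleNamespace(index=2, relevance_score=0.91),
    SimpleNamespace(index=0, relevance_score=0.84),
    SimpleNamespace(index=1, relevance_score=0.02),
]
kept = select_chunks(chunks, fake_results, min_score=0.3)
print([c['id'] for c in kept])
```

In a real call you would send `[c['text'] for c in chunks]` as the `documents` argument and pass `result.results` in place of `fake_results`. A sensible cutoff depends on the model and the domain, so it is best tuned against labeled queries rather than fixed up front.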
