Ranking and Filtering Search Results
Scoring relevance, deduplication, and selecting the best results for context.
Why Ranking and Filtering Matters
A search API typically returns 5-10 results per query, but they are not equally relevant, reliable, or useful for the agent's task. Feeding raw results directly to the LLM wastes context tokens and can introduce noise or misinformation.
Ranking and filtering improves the signal-to-noise ratio before results reach the LLM.
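As a rough illustration of the filtering side, the sketch below drops near-empty results and duplicate URLs before anything is ranked. The `url` key, the `min_chars` threshold, and the helper name `filter_results` are assumptions made for this example, not part of any particular search API; comparing only host and path is a simplification that would merge pages differing only in their query string.

```python
from urllib.parse import urlsplit


def filter_results(results, min_chars=50):
    """Drop results with very little content and results whose URL was already seen."""
    seen = set()
    kept = []
    for r in results:
        content = (r.get("content") or "").strip()
        if len(content) < min_chars:
            continue  # too short to be useful as context
        parts = urlsplit(r.get("url", ""))
        # Assumed normalization: ignore scheme, host case, and trailing slash
        key = (parts.netloc.lower(), parts.path.rstrip("/"))
        if key in seen:
            continue  # same page already kept
        seen.add(key)
        kept.append(r)
    return kept
```

The first occurrence of each page is kept, so the order returned by the search API is preserved.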
Relevance Scoring with BM25
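BM25 (Best Match 25) is a classical text ranking algorithm that scores documents by how often the query terms appear in them, weighting rarer terms more heavily and normalizing for document length. It works well for lexical matching, when the query and document share the same words, but it does not capture synonyms or paraphrases.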
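To make the scoring concrete, here is a minimal pure-Python sketch of a textbook BM25 variant. It is meant for reading, not as a replacement for a library: the function name `bm25_scores`, the parameter values `k1=1.5` and `b=0.75`, and the smoothed IDF formula are assumptions chosen for this example, and a library implementation may use a different IDF variant and so produce different numbers.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document string against the query with a textbook BM25 variant."""
    docs = [doc.lower().split() for doc in documents]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: how many documents contain each term
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(set(d))

    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue  # term absent from this document
            # Smoothed IDF (assumed variant); the "1 +" keeps it positive
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Longer-than-average documents are penalized through b
            norm = tf + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores
```

A document that contains none of the query words scores exactly zero, which is the lexical-matching limitation in practice.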
Install with pip install rank-bm25.
from rank_bm25 import BM25Okapi

def rank_with_bm25(query, results):
    """Return (score, result) pairs sorted by BM25 score, highest first."""
    # BM25Okapi cannot be built from an empty corpus, so return early
    if not results:
        return []
    # Tokenize: lowercase and split on whitespace
    tokenized_results = [
        r['content'].lower().split()
        for r in results
    ]
    bm25 = BM25Okapi(tokenized_results)
    query_tokens = query.lower().split()
    scores = bm25.get_scores(query_tokens)
    # Sort results by score descending
    ranked = sorted(
        zip(scores, results),
        key=lambda x: x[0],
        reverse=True
    )
    return ranked
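Once results are ranked, the remaining step is choosing which ones go into the prompt. The sketch below takes a list of (score, result) pairs sorted best-first, the shape returned by `rank_with_bm25`, and keeps up to `k` results within a rough character budget. The helper name `select_for_context` and the defaults for `k` and `max_chars` are hypothetical, and character count is used here only as a crude stand-in for token count.

```python
def select_for_context(ranked, k=3, max_chars=2000):
    """Keep up to k of the best-scoring results that fit a character budget.

    `ranked` is a list of (score, result) pairs sorted by score, highest first.
    """
    selected = []
    used = 0
    for _score, result in ranked:
        if len(selected) == k:
            break
        text = result["content"]
        if used + len(text) > max_chars:
            continue  # does not fit; try the next, lower-ranked result
        selected.append(result)
        used += len(text)
    return selected
```

Skipping an oversized result instead of stopping lets shorter, lower-ranked results still fill the budget; truncating the long result would be another reasonable choice.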