AI Agents · Lesson

Ranking and Filtering Search Results

Scoring relevance, deduplication, and selecting the best results for context.

Why Ranking and Filtering Matters

A search API typically returns 5 to 10 results per query, but they are not equally relevant, reliable, or useful for the agent's task. Feeding raw results directly to the LLM wastes context tokens and can introduce noise or misinformation.

Ranking and filtering improve the signal-to-noise ratio before results reach the LLM.

Relevance Scoring with BM25

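BM25 (Best Matching 25) is a classical text ranking algorithm. It scores each document by how often it contains the query terms, gives more weight to terms that are rare across the result set, and normalizes for document length. It works well for lexical matching, where the query and document share the same words, but it does not capture synonyms or paraphrases.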
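The scoring idea can be sketched in a few lines of pure Python: term frequency, an inverse-document-frequency weight, and length normalization. This is an illustration only, not the internals of any particular library. The function name bm25_scores, the defaults k1=1.5 and b=0.75, and the non-negative IDF variant ln(1 + ...) are assumptions chosen for the example, and library implementations may differ in these details.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query (illustrative BM25 sketch)."""
    n_docs = len(tokenized_docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(doc) for doc in tokenized_docs) / n_docs

    # Document frequency: in how many documents each term appears
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))

    scores = []
    for doc in tokenized_docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq[term]
            if tf == 0:
                continue  # term absent from this document: contributes nothing
            # Rare terms get a higher weight (assumed non-negative IDF variant)
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Longer-than-average documents are penalized, controlled by b
            length_norm = 1 - b + b * len(doc) / avg_len
            # Term frequency saturates as tf grows, controlled by k1
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores
```

In this sketch, a document that contains none of the query terms scores exactly 0, and a short document that matches every query term can outrank a longer one that repeats a term more often.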

Install with pip install rank-bm25.

from rank_bm25 import BM25Okapi

def rank_with_bm25(query, results):
    """Return (score, result) pairs sorted by BM25 score, best first."""
    if not results:
        return []  # Nothing to rank (and BM25 needs at least one document)

    # Tokenize: lowercase and split on whitespace
    tokenized_results = [
        r['content'].lower().split()
        for r in results
    ]
    bm25 = BM25Okapi(tokenized_results)

    query_tokens = query.lower().split()
    scores = bm25.get_scores(query_tokens)

    # Sort by score only, descending (the result dicts themselves are not comparable)
    return sorted(
        zip(scores, results),
        key=lambda pair: pair[0],
        reverse=True
    )
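Ranked results usually still need deduplication and a cutoff before they go into the prompt. The following is a minimal sketch under stated assumptions: each result dict is assumed to carry a 'url' key (only 'content' appears in the function above), the helper names normalize_url and select_top_results are hypothetical, and the top_k and min_score values are placeholders to tune.

```python
from urllib.parse import urlsplit

def normalize_url(url):
    """Reduce a URL to host + path so trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Assumption: scheme, query string, and fragment do not distinguish pages
    return host + parts.path.rstrip("/")

def select_top_results(ranked, top_k=3, min_score=None):
    """Keep up to top_k unique results from (score, result) pairs, best first."""
    seen = set()
    selected = []
    for score, result in ranked:
        if len(selected) >= top_k:
            break
        if min_score is not None and score < min_score:
            continue  # too little keyword overlap with the query
        # Deduplicate by URL; fall back to normalized text when no URL is present
        key = normalize_url(result.get("url") or "")
        if not key:
            key = " ".join(result.get("content", "").lower().split())
        if key in seen:
            continue
        seen.add(key)
        selected.append((score, result))
    return selected
```

Passing the output of rank_with_bm25(query, results) as ranked keeps at most top_k unique results. BM25 scores are relative to the result set rather than normalized, so a fixed min_score may need adjusting per query.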
