Generation Metrics: Faithfulness and Answer Relevance
Use RAGAS to measure whether generated answers stay faithful to the retrieved context, with no hallucinated claims, and whether they actually address the user's question.
Measuring the Generation Stage
Even when your retriever finds the perfect chunks, the generation stage can still fail. The LLM might ignore the retrieved context and answer from its parametric memory, misinterpret what the context says, or answer a slightly different question than the one asked. Generation metrics quantify these failures independently of retrieval so you can pinpoint and fix each problem. The two primary generation metrics are faithfulness and answer relevance.
Faithfulness: Definition
Faithfulness measures whether every claim in the generated answer can be directly traced back to the retrieved context. A faithful answer introduces no information that is not present in the context. Faithfulness is measured at the claim level: the answer is decomposed into individual atomic statements, and each is checked against the context for support. The faithfulness score is the fraction of claims that are supported.
# Faithfulness = supported_claims / total_claims
example_answer = (
    'Employees receive 15 vacation days per year. '
    'Remote work is allowed on Wednesdays and Fridays. '
    'The CEO is John Smith.'
)
example_context = (
    '15 vacation days are granted annually. '
    'Remote work is permitted on Wednesdays and Fridays.'
)
# Claim 1: 15 vacation days — SUPPORTED
# Claim 2: Remote work Wed+Fri — SUPPORTED
# Claim 3: CEO is John Smith — NOT IN CONTEXT (hallucinated)
# Faithfulness = 2/3 = 0.67
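The claim-level scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the RAGAS implementation: in practice an LLM decomposes the answer into claims and judges each one, whereas here the claims are split by hand and the judge is a hypothetical lookup table of hand-labeled verdicts. The names `faithfulness_score`, `manual_verdicts`, and `lookup_judge` are assumptions made for this example.

```python
from typing import Callable, Sequence


def faithfulness_score(
    claims: Sequence[str],
    context: str,
    is_supported: Callable[[str, str], bool],
) -> float:
    """Return the fraction of claims that the judge marks as supported."""
    if not claims:
        # Assumption: an answer with no claims is treated as an error here.
        raise ValueError("need at least one claim to score")
    supported = sum(1 for claim in claims if is_supported(claim, context))
    return supported / len(claims)


# Hand-labeled verdicts standing in for an LLM judge (hypothetical).
manual_verdicts = {
    "Employees receive 15 vacation days per year.": True,
    "Remote work is allowed on Wednesdays and Fridays.": True,
    "The CEO is John Smith.": False,
}


def lookup_judge(claim: str, context: str) -> bool:
    # Ignores the context and returns the hand-labeled verdict.
    return manual_verdicts[claim]


claims = list(manual_verdicts)
context = (
    "15 vacation days are granted annually. "
    "Remote work is permitted on Wednesdays and Fridays."
)

score = faithfulness_score(claims, context, lookup_judge)
print(f"{score:.2f}")  # 0.67
```

Passing the judge in as a function keeps the arithmetic separate from the verification step, so the lookup table could later be swapped for a real LLM call without touching the scoring logic.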