Overview
All evaluators implement the BaseEvaluator interface:
EvaluationResult:
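As a rough orientation for both types, the sketch below models their general shape with standard-library stand-ins. `SketchResult`, `SketchEvaluator`, and `NonEmptyEvaluator` are hypothetical names, and the field names are assumptions about what an evaluation result typically carries, not the library's own definitions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SketchResult:
    """Hypothetical stand-in for an evaluation result (field names are assumptions)."""
    query: Optional[str] = None
    response: Optional[str] = None
    contexts: Optional[Sequence[str]] = None
    passing: Optional[bool] = None   # binary pass/fail verdict
    score: Optional[float] = None    # numeric score; the scale depends on the evaluator
    feedback: Optional[str] = None   # free-text reasoning from the judge


class SketchEvaluator(ABC):
    """Hypothetical stand-in for a shared evaluator interface."""

    @abstractmethod
    def evaluate(self, query: str, response: str,
                 contexts: Sequence[str]) -> SketchResult:
        ...


class NonEmptyEvaluator(SketchEvaluator):
    """Toy evaluator: passes any response that is not blank."""

    def evaluate(self, query, response, contexts):
        ok = bool(response.strip())
        return SketchResult(query=query, response=response, contexts=contexts,
                            passing=ok, score=1.0 if ok else 0.0)
```

Because every evaluator returns the same result shape, downstream code can aggregate scores without knowing which evaluator produced them.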
Faithfulness
What it measures: Whether the response is grounded in the provided context. Faithfulness checks whether the answer contains hallucinations or makes claims not supported by the source documents.
Evaluate Response Objects
Directly evaluate query engine responses:
Custom Prompts
Relevancy
What it measures: Whether the response actually answers the question. Relevancy checks whether the response addresses the user’s query.
How It Works
Relevancy uses an LLM to determine if the response answers the question:
- Formats query and response together
- Queries a SummaryIndex of the contexts
- LLM answers “yes” or “no”
- Returns score (1.0 for yes, 0.0 for no)
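The four steps above can be sketched without any LLM dependency. In this minimal illustration, `relevancy_score` and `fake_judge` are hypothetical names, the prompt wording is invented, and the contexts are simply joined into the prompt as a stand-in for querying an index.

```python
from typing import Callable, Sequence


def relevancy_score(query: str, response: str, contexts: Sequence[str],
                    judge: Callable[[str], str]) -> float:
    """Return 1.0 if the judge says the response answers the query, else 0.0."""
    # Step 1: format query and response together.
    query_and_response = f"Question: {query}\nResponse: {response}"
    # Step 2: put the contexts in front of the judge (stand-in for an index query).
    context_block = "\n".join(contexts)
    prompt = (
        f"Context:\n{context_block}\n\n{query_and_response}\n\n"
        "Does the response answer the question given the context? Answer YES or NO."
    )
    # Step 3: the judge answers yes or no.
    verdict = judge(prompt).strip().lower()
    # Step 4: map the verdict to a score.
    return 1.0 if verdict.startswith("yes") else 0.0


def fake_judge(prompt: str) -> str:
    """Deterministic stand-in for an LLM so the sketch runs without an API key."""
    return "YES" if "Response: Paris" in prompt else "NO"
```

Passing the judge in as a callable keeps the scoring logic testable: a real LLM call can replace `fake_judge` without touching the mapping from verdict to score.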
Correctness
What it measures: How correct the response is compared to a reference answer. Correctness requires a reference (ground truth) answer:
Score Scale
Correctness uses a 1-5 scale:
- 5 - Perfect match
- 4 - Correct with minor differences
- 3 - Partially correct
- 2 - Mostly incorrect
- 1 - Completely incorrect
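To make the scale concrete, the following sketch maps a numeric score to its label and a pass/fail verdict. The helper name `describe_score`, the rounding of fractional scores to the nearest scale point, and the passing threshold of 4.0 are all assumptions for illustration.

```python
from typing import Tuple

SCORE_LABELS = {
    5: "Perfect match",
    4: "Correct with minor differences",
    3: "Partially correct",
    2: "Mostly incorrect",
    1: "Completely incorrect",
}


def describe_score(score: float, threshold: float = 4.0) -> Tuple[str, bool]:
    """Return the label of the nearest scale point and whether the score passes.

    The default threshold of 4.0 is an assumption, not a documented value.
    """
    if not 1.0 <= score <= 5.0:
        raise ValueError(f"score must be in [1, 5], got {score}")
    nearest = int(score + 0.5)  # round half up to the nearest scale point
    return SCORE_LABELS[nearest], score >= threshold
```

A threshold turns the graded score into a binary verdict, which is convenient for regression tests that only need to know whether an answer is acceptable.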
Custom Parser
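As a minimal sketch of what a custom parser might look like, the function below assumes the judge replies with a line such as `Score: 4` followed by free-text reasoning. That output format, the name `parse_score_and_reason`, and the `(score, reasoning)` return shape are assumptions for illustration.

```python
import re
from typing import Optional, Tuple

# Matches "Score: 4", "score = 3.5", etc. (case-insensitive).
_SCORE_RE = re.compile(r"score\s*[:=]\s*([0-9]+(?:\.[0-9]+)?)", re.IGNORECASE)


def parse_score_and_reason(text: str) -> Tuple[Optional[float], Optional[str]]:
    """Extract a numeric score and treat the remaining text as reasoning.

    Returns (None, None) when no score is found, so callers can flag the
    result as invalid instead of crashing.
    """
    match = _SCORE_RE.search(text)
    if match is None:
        return None, None
    score = float(match.group(1))
    # Everything after the matched score is treated as the reasoning.
    reasoning = text[match.end():].strip() or None
    return score, reasoning
```

Returning `None` for unparseable output, instead of raising, lets a batch run continue and report the failure alongside valid results.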
Parse LLM responses differently:
Batch Evaluation
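The idea behind batch evaluation can be sketched with plain `asyncio`: run several evaluators over the same list of query/response pairs and collect scores per evaluator. Everything here (`run_batch`, the `Evaluator` alias, and the two toy evaluators) is hypothetical and stands in for real LLM-backed evaluators.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Sequence

# An evaluator here is any async callable taking (query, response) and returning a score.
Evaluator = Callable[[str, str], Awaitable[float]]


async def run_batch(evaluators: Dict[str, Evaluator],
                    queries: Sequence[str],
                    responses: Sequence[str]) -> Dict[str, List[float]]:
    """Run each evaluator over all pairs; pairs are evaluated concurrently."""
    if len(queries) != len(responses):
        raise ValueError("queries and responses must have the same length")
    results: Dict[str, List[float]] = {}
    for name, evaluator in evaluators.items():
        # gather preserves input order, so scores line up with queries.
        scores = await asyncio.gather(
            *(evaluator(q, r) for q, r in zip(queries, responses))
        )
        results[name] = list(scores)
    return results


async def non_empty(query: str, response: str) -> float:
    """Toy evaluator: 1.0 for any non-blank response."""
    return 1.0 if response.strip() else 0.0


async def mentions_query_term(query: str, response: str) -> float:
    """Toy evaluator: 1.0 if the response reuses any word from the query."""
    words = {w.lower() for w in query.split()}
    return 1.0 if any(w.lower() in words for w in response.split()) else 0.0
```

Keying the results by evaluator name makes it straightforward to average each metric separately afterwards.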
Evaluate multiple queries:
Rate Limiting
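One common way to stay under an API rate limit is to cap the number of in-flight requests with a semaphore. The sketch below uses only `asyncio`; `evaluate_with_limit`, `max_concurrent`, and `pause_seconds` are hypothetical names, and `evaluate_one` stands in for whatever async call actually reaches the LLM.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def evaluate_with_limit(items: Sequence[str],
                              evaluate_one: Callable[[str], Awaitable[float]],
                              max_concurrent: int = 2,
                              pause_seconds: float = 0.0) -> List[float]:
    """Evaluate all items while keeping at most max_concurrent calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item: str) -> float:
        async with semaphore:
            result = await evaluate_one(item)
            # Optional pause while still holding the slot, to space out requests.
            if pause_seconds > 0:
                await asyncio.sleep(pause_seconds)
            return result

    # gather preserves input order, so results line up with items.
    return list(await asyncio.gather(*(guarded(i) for i in items)))
```

Lowering `max_concurrent` trades throughput for fewer rate-limit errors; a small `pause_seconds` can help further when the provider enforces a requests-per-second limit.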
Avoid API rate limits:
Evaluation Pipeline
Create a comprehensive evaluation workflow:
Best Practices
Test Set Creation:
- Create diverse test cases covering different query types
- Include edge cases and common failure modes
- Use real user queries when possible
- Maintain reference answers for correctness evaluation
Evaluator Selection:
- Faithfulness - Critical for preventing hallucinations
- Relevancy - Ensures responses answer the question
- Correctness - Requires reference answers, best for regression testing
Iterative Improvement:
- Establish baseline scores
- Make changes (prompts, retrievers, etc.)
- Re-run evaluation
- Compare scores to baseline
- Keep improvements, discard regressions
Performance:
- Run evaluations in parallel when possible
- Cache LLM responses to avoid redundant calls
- Use rate limiting to avoid API errors
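The baseline-and-compare loop described in the list above (establish baseline scores, make a change, re-run, compare) can be sketched as follows. `summarize`, `compare_to_baseline`, and the `tolerance` parameter are hypothetical, and the scores in the usage note are made-up illustration values, not measured results.

```python
from statistics import mean
from typing import Dict, List


def summarize(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Average each metric's per-query scores."""
    return {metric: mean(values) for metric, values in scores.items()}


def compare_to_baseline(baseline: Dict[str, float],
                        candidate: Dict[str, float],
                        tolerance: float = 0.01) -> Dict[str, str]:
    """Label each metric as improved, regressed, or unchanged.

    Differences within +/- tolerance count as unchanged.
    """
    verdicts = {}
    for metric, base in baseline.items():
        delta = candidate[metric] - base
        if delta > tolerance:
            verdicts[metric] = "improved"
        elif delta < -tolerance:
            verdicts[metric] = "regressed"
        else:
            verdicts[metric] = "unchanged"
    return verdicts
```

For example, calling `summarize` on a baseline run and on a candidate run, then passing both summaries to `compare_to_baseline`, yields a per-metric verdict that can decide whether a prompt or retriever change is kept.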
Next Steps
Postprocessors
Improve retrieval quality with filtering and reranking
Memory
Manage conversation context and history