Overview
Learn how to implement and evaluate faithfulness checks in RAG (Retrieval-Augmented Generation) pipelines in this detailed technical lab session. Explore several approaches to detecting and preventing hallucinations in LLM responses, including using an LLM-as-judge with gpt-4o-mini, applying Lynx for hallucination detection, and leveraging frameworks such as LlamaIndex and RAGAS for faithfulness evaluation. Dive into the Groundedness API in Azure AI Content Safety and compare the different solutions on the MiniHaluBench dataset. Follow along with practical demonstrations and code implementations to learn how to keep LLM responses grounded in the provided context, complete with performance comparisons and evaluation metrics for each approach.
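To give a flavour of the LLM-as-judge approach covered in the session, the sketch below asks gpt-4o-mini whether an answer is supported by the retrieved context. It is a minimal illustration rather than the lab's actual code: the prompt wording, the `is_faithful` helper, and the use of the OpenAI Python SDK (with `OPENAI_API_KEY` set in the environment) are assumptions made here for illustration.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical judge prompt: the lab's actual prompt may differ.
JUDGE_PROMPT = """You are a strict fact-checker. Reply with a single word:
FAITHFUL if every claim in the ANSWER is supported by the CONTEXT,
UNFAITHFUL otherwise.

CONTEXT:
{context}

ANSWER:
{answer}
"""

def is_faithful(context: str, answer: str) -> bool:
    """Ask gpt-4o-mini to judge whether the answer is grounded in the context."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(context=context, answer=answer),
        }],
    )
    verdict = response.choices[0].message.content.strip().upper()
    # The judge is instructed to answer with a single word; only an explicit
    # UNFAITHFUL verdict counts as a detected hallucination.
    return "UNFAITHFUL" not in verdict
```

In a full evaluation, a judge like this would be run over every (context, answer) pair in a labelled dataset such as MiniHaluBench and scored against the ground-truth labels.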
Syllabus
- Introduction
- RAG pipeline recap
- Using LLM-as-judge to verify answer faithfulness
- Using Lynx to detect hallucinations
- LlamaIndex Faithfulness Evaluation
- RAGAS Faithfulness Evaluation (see the sketch after this list)
- Azure Groundedness Checks
- HaluBench - a dataset for evaluating hallucinations
- Scripts to run evaluations
- Evaluation Results and Comparison
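To show what the framework-based evaluations in the syllabus look like in practice, here is a minimal sketch of a RAGAS faithfulness check. It assumes the ragas 0.1.x API, the Hugging Face `datasets` package, and an OpenAI key configured for the underlying judge model; the example question, answer, and contexts are made up for illustration.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness

# One evaluation row: the user question, the generated answer,
# and the retrieved context passages it should be grounded in.
eval_data = Dataset.from_dict({
    "question": ["Who wrote the report?"],
    "answer": ["The report was written by the security team in 2023."],
    "contexts": [["The 2023 report was authored by the internal security team."]],
})

# faithfulness scores how well each answer is supported by its contexts (0 to 1).
result = evaluate(eval_data, metrics=[faithfulness])
print(result)  # e.g. {'faithfulness': 1.0}
```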
Taught by
Donato Capitella