Overview
Discover how to effectively evaluate Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) pipelines in production environments. Explore the unique challenges posed by unstructured outputs and the multitude of parameters involved in these systems. Learn about Valor, an open-source evaluation service, and its role in facilitating rigorous, real-world testing. Gain insights into integrating evaluation processes into existing LLMOps tech stacks, enabling teams to determine the optimal LLM and parameters for a given task and dataset. Delve into strategies for managing the complexities of LLM evaluation, including choices of prompt template, document chunking strategy, and embedding model.
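To make the parameter-selection idea concrete, below is a minimal sketch of sweeping a RAG pipeline's configuration space and keeping the best-scoring setup. Every name in it (the candidate values, the evaluate_pipeline helper) is an illustrative assumption, not the course's material or Valor's actual API; a real harness would plug in an evaluation service such as Valor in place of the stub.

```python
# A minimal sketch of a RAG parameter sweep, assuming hypothetical
# parameter values and a stand-in scoring function.
from itertools import product

# Candidate values for the pipeline parameters the overview mentions.
chunk_sizes = [256, 512, 1024]             # tokens per document chunk
embedding_models = ["model-a", "model-b"]  # hypothetical embedding model IDs
prompt_templates = ["concise", "verbose"]  # hypothetical template names

def evaluate_pipeline(chunk_size: int, embedding_model: str, template: str) -> float:
    """Stand-in for a real evaluation call (e.g., scoring answers on a
    held-out query set). Replace with your evaluation service of choice."""
    # Placeholder score; a real run would build the pipeline with these
    # parameters and score its outputs against reference answers.
    return 0.0

# Exhaustively score every combination and keep the best-scoring one.
best = max(
    product(chunk_sizes, embedding_models, prompt_templates),
    key=lambda cfg: evaluate_pipeline(*cfg),
)
print("Best configuration (chunk_size, embedding_model, template):", best)
```

Even this toy grid has 3 × 2 × 2 = 12 configurations, which illustrates why the talk emphasizes systematic, scalable evaluation over ad hoc comparisons.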
Syllabus
Evaluating LLMs and RAG Pipelines at Scale
Taught by
MLOps World: Machine Learning in Production