Challenges in Using LLM-as-a-Judge for Production Evaluations

Overview

Explore a 43-minute Vector Space Talk in which Sourabh Agrawal, CEO & Co-Founder of UpTrain AI, examines the challenges of using Large Language Models (LLMs) as evaluators. Learn about production-grade evaluation techniques used in industry and academia, with a focus on implementing LLM-based assessments for retrieval-augmented generation (RAG) applications. Hear insights drawn from Sourabh's AI/ML experience, from his work at Goldman Sachs and Bosch/Mercedes to founding an AI-powered fitness startup. Gain practical knowledge about working around the subjectivity of human evaluations, running LLM-based evaluations at scale, and using the results to improve LLM applications. Understand root-cause analysis methods for identifying pipeline failures, recognizing patterns in failing cases, and automating fixes with UpTrain's open-source LLMOps tool.