Explore the evolution of AI evaluation at Coda in this 29-minute conference talk from Data Council. The talk traces the development of Coda's AI evaluation system, from initial tests in OpenAI's playground, to evaluations integrated directly into Coda documents, to multi-vendor assessments. It covers the crucial role of a robust benchmark dataset, the trade-offs between manual and automated evaluations, and the complexities of integrating evaluation with application code. The lessons are aimed at startup engineers, product developers, and anyone building real-world AI features. Presented by Kenny Wong, Software Engineer at Coda.
From Playgrounds to Production: The Evolution of AI Evaluation at Coda