Overview
Explore the challenges and solutions in evaluating language models in this 23-minute lightning talk from the AI in Production Conference. Survey the metrics and datasets available for assessment, and examine the difficulties of continuous evaluation in production environments. Learn about common pitfalls to avoid, drawing on insights from Matthew Sharp, author of "LLMs in Production" and a practitioner with over a decade of experience in ML/AI and deploying models to production. Discover why contributing to public evaluation datasets matters, and join the call for a community-wide effort to reduce harmful bias in language models. Come away with practical takeaways for improving language model evaluation in your own projects or organization.
Syllabus
Evaluating Language Models // Matthew Sharp // AI in Production Conference Lightning Talk
Taught by
MLOps.community