How data slices enable better LLM evaluation
How to Evaluate LLM Performance for Domain-Specific Use Cases
Classroom Contents
- 1 Agenda
- 2 Why do we need LLM evaluation?
- 3 Common evaluation axes
- 4 Why eval is more critical in Gen AI use cases
- 5 Why enterprises are often blocked on effective LLM evaluation
- 6 Common approaches to LLM evaluation
- 7 OSS benchmarks + metrics
- 8 LLM-as-a-judge
- 9 Annotation strategies
- 10 How can we do better than manual annotation strategies?
- 11 How data slices enable better LLM evaluation
- 12 How does LLM eval work with Snorkel?
- 13 Building a quality model
- 14 Using fine-grained benchmarks for next steps
- 15 Workflow overview review
- 16 Workflow—starting with the model
- 17 Workflow—using an LLM as a judge
- 18 Workflow—the quality model
- 19 Chatbot demo
- 20 Annotating data in Snorkel Flow demo
- 21 Building labeling functions in Snorkel Flow demo
- 22 LLM evaluation in Snorkel Flow demo
- 23 Snorkel Flow Jupyter notebook demo
- 24 Data slices in Snorkel Flow demo
- 25 Recap
- 26 Snorkel eval offer!
- 27 Q&A