How to Evaluate LLM Performance for Domain-Specific Use Cases

Snorkel AI via YouTube

Classroom Contents

  1. Agenda
  2. Why do we need LLM evaluation?
  3. Common evaluation axes
  4. Why eval is more critical in Gen AI use cases
  5. Why enterprises are often blocked on effective LLM evaluation
  6. Common approaches to LLM evaluation
  7. OSS benchmarks + metrics
  8. LLM-as-a-judge (see the sketch after this list)
  9. Annotation strategies
  10. How can we do better than manual annotation strategies?
  11. How data slices enable better LLM evaluation
  12. How does LLM eval work with Snorkel?
  13. Building a quality model
  14. Using fine-grained benchmarks for next steps
  15. Workflow overview review
  16. Workflow—starting with the model
  17. Workflow—using an LLM as a judge
  18. Workflow—the quality model
  19. Chatbot demo
  20. Annotating data in Snorkel Flow demo
  21. Building labeling functions in Snorkel Flow demo
  22. LLM evaluation in Snorkel Flow demo
  23. Snorkel Flow Jupyter notebook demo
  24. Data slices in Snorkel Flow demo
  25. Recap
  26. Snorkel eval offer!
  27. Q&A
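
The chapter list above names LLM-as-a-judge (item 8) as one common evaluation approach. As a rough illustration only, and not code from the webinar or from Snorkel Flow, the sketch below shows the basic shape of that pattern: a judge model is prompted with a grading rubric and asked to score each answer. The names call_llm, JUDGE_PROMPT, judge_answer, and average_score are all assumptions made for this sketch; connect call_llm to whatever judge-model client you actually use.

# Hypothetical sketch of the LLM-as-a-judge pattern (not from the webinar).
# `call_llm` is a placeholder for whatever judge-model client you actually use.

JUDGE_PROMPT = """You are grading a chatbot answer for a domain-specific task.
Question: {question}
Answer: {answer}
Rate the answer from 1 (unusable) to 5 (excellent) for correctness and tone.
Reply with only the number."""

def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to your judge model and return its reply."""
    raise NotImplementedError("Connect this to your own LLM client.")

def judge_answer(question: str, answer: str) -> int:
    """Score one (question, answer) pair with the judge model."""
    reply = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
    score = int(reply.strip())
    if not 1 <= score <= 5:
        raise ValueError(f"Judge returned an out-of-range score: {score}")
    return score

def average_score(pairs: list[tuple[str, str]]) -> float:
    """Aggregate judge scores over a small evaluation set."""
    scores = [judge_answer(q, a) for q, a in pairs]
    return sum(scores) / len(scores)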
