Syllabus
- Introduction to LLM evals and their importance
- Overview of creating high-quality prompts from scratch
- Emphasis on the importance of high-quality examples
- Introduction to a systematic approach for creating examples
- Overview of the demonstration using a touch rugby example
- Introduction to LLM eval repo and UI demonstration
- Start of UI demonstration with pipeline creation
- Creating initial pipeline with Claude Sonnet
- Creating first dataset for touch rugby Q&A
- Setting up evaluation criteria and format requirements
- Demonstration of generating ground truth answers
- Creating second evaluation task
- Introduction to creating few-shot examples
- Setting up pipeline with few-shot examples
- Creating training examples for few-shot learning
- Demonstration of improved performance with few-shot examples
- Discussion of pipeline customization options
- Final tips on judges and evaluation
- Recommendations for managing examples
- Discussion of considerations when using OpenAI's o1 model
- Conclusion and future topics
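The pipeline the syllabus walks through (ground-truth dataset, few-shot prompt assembly, grading) can be sketched roughly as below. This is a minimal illustrative assumption, not code from the Trelis Research eval repo: the dataset contents, prompt format, and exact-match grader are all hypothetical stand-ins, and a real run would send the prompt to a model such as Claude Sonnet rather than faking a reply.

```python
# Hypothetical touch rugby Q&A pairs used as few-shot demonstrations.
FEW_SHOT_EXAMPLES = [
    {"question": "How many points is a touchdown worth?", "answer": "One point."},
    {"question": "What happens after the sixth touch?", "answer": "Possession changes over."},
]

# Hypothetical eval set with ground-truth answers.
DATASET = [
    {"question": "How many players per side are on the field?", "answer": "Six players."},
]

def build_prompt(question: str, examples: list[dict]) -> str:
    """Assemble a few-shot prompt: worked examples first, then the new question."""
    shots = "\n\n".join(f"Q: {ex['question']}\nA: {ex['answer']}" for ex in examples)
    return f"{shots}\n\nQ: {question}\nA:"

def normalize(text: str) -> str:
    """Lowercase and drop punctuation so grading ignores surface formatting."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

def grade(prediction: str, ground_truth: str) -> bool:
    """Exact-match grader; open-ended answers would need an LLM judge instead."""
    return normalize(prediction) == normalize(ground_truth)

if __name__ == "__main__":
    for row in DATASET:
        prompt = build_prompt(row["question"], FEW_SHOT_EXAMPLES)
        # A real pipeline would send `prompt` to the model here; we fake a reply.
        fake_model_output = "Six players"
        print(grade(fake_model_output, row["answer"]))
```

The exact-match grader only works for answers with a single canonical form; that is why the later syllabus items on format requirements and LLM judges matter for free-form responses.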
Taught by
Trelis Research