Syllabus
- Introduction to LLM evals and their importance
- Overview of creating high-quality prompts from scratch
- Emphasis on the importance of high-quality examples
- Introduction to a systematic approach for creating examples
- Overview of the demonstration using a touch rugby example
- Introduction to LLM eval repo and UI demonstration
- Start of UI demonstration with pipeline creation
- Creating initial pipeline with Claude Sonnet
- Creating first dataset for touch rugby Q&A
- Setting up evaluation criteria and format requirements
- Demonstration of generating ground truth answers
- Creating second evaluation task
- Introduction to creating few-shot examples
- Setting up pipeline with few-shot examples
- Creating training examples for few-shot learning
- Demonstration of improved performance with few-shot examples
- Discussion of pipeline customization options
- Final tips on judges and evaluation
- Recommendations for managing examples
- Discussion of considerations when using OpenAI's o1 model
- Conclusion and future topics
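The pipeline the syllabus walks through (ground-truth dataset, few-shot prompt assembly, grading) can be sketched roughly as below. This is a minimal illustrative assumption, not code from the Trelis Research eval repo: the dataset contents, prompt format, and exact-match grader are all hypothetical stand-ins, and a real run would send the prompt to a model such as Claude Sonnet rather than faking a reply.

```python
# Hypothetical touch rugby Q&A pairs used as few-shot demonstrations.
FEW_SHOT_EXAMPLES = [
    {"question": "How many points is a touchdown worth?", "answer": "One point."},
    {"question": "What happens after the sixth touch?", "answer": "Possession changes over."},
]

# Hypothetical eval set with ground-truth answers.
DATASET = [
    {"question": "How many players per side are on the field?", "answer": "Six players."},
]

def build_prompt(question: str, examples: list[dict]) -> str:
    """Assemble a few-shot prompt: worked examples first, then the new question."""
    shots = "\n\n".join(f"Q: {ex['question']}\nA: {ex['answer']}" for ex in examples)
    return f"{shots}\n\nQ: {question}\nA:"

def normalize(text: str) -> str:
    """Lowercase and drop punctuation so grading ignores surface formatting."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

def grade(prediction: str, ground_truth: str) -> bool:
    """Exact-match grader; open-ended answers would need an LLM judge instead."""
    return normalize(prediction) == normalize(ground_truth)

if __name__ == "__main__":
    for row in DATASET:
        prompt = build_prompt(row["question"], FEW_SHOT_EXAMPLES)
        # A real pipeline would send `prompt` to the model here; we fake a reply.
        fake_model_output = "Six players"
        print(grade(fake_model_output, row["answer"]))
```

The exact-match grader only works for answers with a single canonical form; that is why the later syllabus items on format requirements and LLM judges matter for free-form responses.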
Taught by
Trelis Research