

Reinforcement Learning with Human Feedback - Understanding LLM Fine-tuning with PPO and DPO

Open Data Science via YouTube

Overview

Explore the transformative world of Reinforcement Learning with Human Feedback (RLHF) in this 29-minute technical talk by Luis Serrano, PhD, author of Grokking Machine Learning and former engineer at Google, Apple, and Cohere. Master the core concepts of reinforcement learning while diving into advanced techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for training large language models. Start with the fundamentals of transformer-based language models, progress through the intricacies of fine-tuning with human feedback, and understand how these techniques enhance text generation capabilities. Learn practical applications of RLHF through clear explanations from an industry expert who breaks down complex concepts into digestible segments, making advanced AI topics accessible to machine learning and NLP enthusiasts.
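
The talk itself is conceptual, but the PPO objective it discusses is compact enough to sketch. Below is a minimal, illustrative PyTorch version of PPO's clipped surrogate loss as it is commonly used in RLHF fine-tuning; the function and argument names are our own, not taken from the talk.

```python
import torch

def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO (illustrative sketch).

    new_logprobs / old_logprobs: log-probabilities of the generated tokens
    under the policy being updated and the policy that produced the rollout.
    advantages: per-token advantage estimates, typically derived from a
    reward-model score minus a learned value baseline.
    """
    ratio = torch.exp(new_logprobs - old_logprobs)  # pi_theta / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the smaller of the two terms, so minimize its negative mean
    return -torch.min(unclipped, clipped).mean()
```

The clipping keeps each update close to the rollout policy, which is why PPO is a popular choice for the RL step of RLHF.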

Syllabus

- Introduction
- Large Language Models / Transformers
- How to fine-tune them with RLHF
- Quick intro to reinforcement learning
- PPO: reinforcement learning technique to fine-tune LLMs
- DPO: non-reinforcement learning technique to fine-tune LLMs (sketched below)
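
For the final syllabus item, here is a similarly minimal sketch of the DPO loss, which replaces the reward model and RL loop with a direct preference objective; again, the function and variable names are illustrative rather than drawn from the talk.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss (illustrative sketch).

    Each input is the summed log-probability of the preferred ("chosen") or
    dispreferred ("rejected") completion under the policy being trained or
    the frozen reference model; beta controls how far the policy may drift.
    """
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # Encourage the policy to widen its chosen-vs-rejected margin
    # relative to the reference model
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```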

Taught by

Open Data Science
