

Reinforcement Learning with Human Feedback - Understanding LLM Fine-tuning with PPO and DPO

Open Data Science via YouTube

Overview

Explore the transformative world of Reinforcement Learning with Human Feedback (RLHF) in this 29-minute technical talk by Luis Serrano, PhD, author of Grokking Machine Learning and former engineer at Google, Apple, and Cohere. Master the core concepts of reinforcement learning while diving into advanced techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for training large language models. Start with the fundamentals of transformer-based language models, progress through the intricacies of fine-tuning with human feedback, and understand how these techniques enhance text generation capabilities. Learn practical applications of RLHF through clear explanations from an industry expert who breaks down complex concepts into digestible segments, making advanced AI topics accessible to machine learning and NLP enthusiasts.
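
The talk itself is conceptual, but the PPO objective it discusses is compact enough to sketch. Below is a minimal, illustrative PyTorch version of PPO's clipped surrogate loss as it is commonly used in RLHF fine-tuning; the function and argument names are our own, not taken from the talk.

```python
import torch

def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO (illustrative sketch).

    new_logprobs / old_logprobs: log-probabilities of the generated tokens
    under the policy being updated and the policy that produced the rollout.
    advantages: per-token advantage estimates, typically derived from a
    reward-model score minus a learned value baseline.
    """
    ratio = torch.exp(new_logprobs - old_logprobs)  # pi_theta / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the smaller of the two terms, so minimize its negative mean
    return -torch.min(unclipped, clipped).mean()
```

The clipping keeps each update close to the rollout policy, which is why PPO is a popular choice for the RL step of RLHF.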

Syllabus

- Introduction
- Large Language Models / Transformers
- How to fine-tune them with RLHF
- Quick intro to reinforcement learning
- PPO: reinforcement learning technique to fine-tune LLMs
- DPO: non-reinforcement learning technique to fine-tune LLMs (sketched below)
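
For the final syllabus item, here is a similarly minimal sketch of the DPO loss, which replaces the reward model and RL loop with a direct preference objective; again, the function and variable names are illustrative rather than drawn from the talk.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss (illustrative sketch).

    Each input is the summed log-probability of the preferred ("chosen") or
    dispreferred ("rejected") completion under the policy being trained or
    the frozen reference model; beta controls how far the policy may drift.
    """
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # Encourage the policy to widen its chosen-vs-rejected margin
    # relative to the reference model
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```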

Taught by

Open Data Science
