Reinforcement Learning with Human Feedback - Understanding LLM Fine-tuning with PPO and DPO

Open Data Science via YouTube

Classroom Contents

  1. 1 - Introduction
  2. 2 - Large Language Models / Transformers
  3. 3 - How to fine-tune them with RLHF
  4. 4 - Quick intro to reinforcement learning
  5. 5 - PPO reinforcement learning technique to fine-tune LLMs
  6. 6 - DPO non-reinforcement learning technique to fine-tune LLMs
