Direct Preference Optimisation
Classroom Contents
Direct Preference Optimization (DPO) - Advanced Fine-Tuning Technique
- 1 Direct Preference Optimisation
- 2 Video Overview
- 3 How does “normal” fine-tuning work?
- 4 How does DPO work?
- 5 DPO Datasets: UltraChat
- 6 DPO Datasets: Helpful and Harmless
- 7 DPO vs RLHF
- 8 Required datasets and SFT models
- 9 DPO Notebook Run-through
- 10 DPO Evaluation Results
- 11 Weights and Biases Results Interpretation
- 12 Runpod Setup for 1-epoch Training Run
- 13 Resources