Direct Preference Optimization (DPO) - Advanced Fine-Tuning Technique

Trelis Research via YouTube

Class Central Classrooms

YouTube videos curated by Class Central.

Classroom Contents

  1. Direct Preference Optimisation
  2. Video Overview
  3. How does “normal” fine-tuning work?
  4. How does DPO work?
  5. DPO Datasets: UltraChat
  6. DPO Datasets: Helpful and Harmless
  7. DPO vs RLHF
  8. Required datasets and SFT models
  9. DPO Notebook Run through
  10. DPO Evaluation Results
  11. Weights and Biases Results Interpretation
  12. Runpod Setup for 1 epoch Training Run
  13. Resources