Direct Preference Optimization - Fine-Tuning LLMs Without Reinforcement Learning

Serrano.Academy via YouTube

Classroom Contents

  1. Introduction
  2. RLHF vs DPO
  3. The Bradley-Terry Model
  4. KL Divergence
  5. The Loss Function
  6. Conclusion
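For reference, items 3–5 above correspond to the standard derivation of the DPO objective (Rafailov et al., 2023): the Bradley-Terry model turns pairwise preferences into probabilities, and the KL-regularized fine-tuning objective lets the reward be rewritten in terms of the policy itself. A common way to write the resulting formulas (the notation here is a sketch and may differ from the video) is:

P(y_w \succ y_l \mid x) = \sigma\bigl(r(x, y_w) - r(x, y_l)\bigr)

\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \Bigl[ \log \sigma\Bigl( \beta \log \tfrac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \tfrac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \Bigr) \Bigr]

where y_w and y_l are the preferred and rejected responses to prompt x, \pi_{\mathrm{ref}} is the frozen reference model, and \beta controls the strength of the implicit KL-divergence penalty that keeps \pi_\theta close to \pi_{\mathrm{ref}}.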
