Combined Preference and Supervised Fine-Tuning with ORPO

Trelis Research via YouTube

Now playing: Understanding cross-entropy and odds ratio loss functions (5 of 11)

Classroom Contents

  1. Preference and Supervised Fine-tuning at the Same Time!
  2. A short history of fine-tuning methods
  3. Video Overview/Agenda
  4. Difference between Unsupervised, Supervised and Preferences
  5. Understanding cross-entropy and odds ratio loss functions (see the loss sketch after this list)
  6. Why preference fine-tuning improves performance
  7. Notebook demo of SFT and ORPO
  8. Evaluation with lm-evaluation-harness
  9. Results: Comparing SFT and ORPO with gsm8k, arithmetic and mmlu
  10. Evaluation with Carlini's practical benchmark
  11. Is it worth doing ORPO? Yes!
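
Chapter 5 centers on the two loss terms that ORPO combines. For reference, the ORPO paper (Hong et al., 2024) defines the objective as the standard cross-entropy (SFT) loss plus a weighted odds-ratio term: L_ORPO = L_SFT + λ · L_OR, where L_OR = −log σ(log(odds(y_w|x) / odds(y_l|x))) and odds(y|x) = P(y|x) / (1 − P(y|x)), with P(y|x) taken as the length-normalized sequence likelihood. Below is a minimal PyTorch sketch of that combination, assuming the sequence log-probabilities for the chosen (preferred) and rejected completions have already been gathered; the function name, the `lam` weight, and the numerical clamp are illustrative choices, not code from the video.

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, nll_loss, lam=0.1):
    """Sketch of ORPO's combined objective.

    chosen_logps / rejected_logps: length-normalized sequence
    log-probabilities log P(y|x), shape (batch,).
    nll_loss: the usual cross-entropy (SFT) loss on the chosen answers.
    lam: weight on the odds-ratio term (the paper's lambda).
    """
    def log_odds(logp):
        # log(p / (1 - p)) = log p - log(1 - p), computed stably from
        # log p; the clamp keeps p strictly below 1 to avoid log(0).
        p = torch.exp(logp).clamp(max=1 - 1e-7)
        return logp - torch.log1p(-p)

    # Odds-ratio term: -log sigmoid(log odds_chosen - log odds_rejected)
    ratio = log_odds(chosen_logps) - log_odds(rejected_logps)
    or_loss = -F.logsigmoid(ratio).mean()

    # Combined objective: SFT cross-entropy plus weighted odds-ratio penalty
    return nll_loss + lam * or_loss
```

In practice, the notebook demo in chapter 7 is the kind of workflow a library implementation covers, e.g. TRL's ORPOTrainer with ORPOConfig, where the `beta` argument plays the role of λ above.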
