Combined Preference and Supervised Fine-Tuning with ORPO

Trelis Research via YouTube

Overview

Explore advanced fine-tuning techniques in this video tutorial on combined preference and supervised fine-tuning with ORPO (Odds Ratio Preference Optimization). Learn how fine-tuning methods have evolved, understand the differences between unsupervised, supervised, and preference-based approaches, and delve into the cross-entropy and odds-ratio loss functions. See why preference fine-tuning improves performance through a hands-on notebook demonstration of SFT and ORPO, then evaluate the results with lm-evaluation-harness and compare SFT and ORPO across benchmarks such as GSM8K, arithmetic, and MMLU. Gain insight into the practical benefits of ORPO and access resources for further exploration and implementation.
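
For orientation, the odds-ratio loss discussed in the video comes from the ORPO paper (Hong et al., 2024). Roughly, it augments the standard SFT cross-entropy loss on the chosen response with a preference term (sketched here from the paper's formulation, not from the video itself):

\mathcal{L}_{\mathrm{ORPO}} = \mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}}, \quad
\mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left( \log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)} \right), \quad
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}

Here y_w is the chosen (preferred) response, y_l the rejected one, and λ weights the preference term against the supervised term; intuitively, the model is pushed to assign higher odds to chosen responses than to rejected ones while still fitting the chosen responses as in ordinary SFT.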

Syllabus

Preference and Supervised Fine-tuning at the Same Time!
A short history of fine-tuning methods
Video Overview/Agenda
Difference between Unsupervised, Supervised and Preferences
Understanding cross-entropy and odds ratio loss functions
Why preference fine-tuning improves performance
Notebook demo of SFT and ORPO
Evaluation with lm-evaluation-harness
Results: Comparing SFT and ORPO with gsm8k, arithmetic and mmlu
Evaluation with Carlini's practical benchmark
Is it worth doing ORPO? Yes!
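
The notebook demo and evaluation described above can be approximated with off-the-shelf tooling. Below is a minimal sketch assuming the Hugging Face TRL library's ORPOTrainer; the model checkpoint and toy preference data are placeholders, and the video's own notebook may be set up differently.

```python
# Minimal ORPO fine-tuning sketch, assuming TRL's ORPOConfig/ORPOTrainer.
# The checkpoint and the toy preference data are placeholders, not from the video.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "EleutherAI/pythia-70m"  # placeholder; any causal LM checkpoint works
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# ORPO trains on preference triples: a prompt, a preferred ("chosen") response,
# and a dispreferred ("rejected") response, so the supervised and preference
# signals come from the same pass over the data.
train_dataset = Dataset.from_dict({
    "prompt":   ["What is 2 + 2?"],
    "chosen":   ["2 + 2 = 4."],
    "rejected": ["2 + 2 = 5."],
})

config = ORPOConfig(
    output_dir="orpo-demo",
    beta=0.1,                      # weight of the odds-ratio term vs. the SFT loss
    per_device_train_batch_size=1,
    num_train_epochs=1,
    learning_rate=5e-6,
    report_to="none",
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,    # older TRL releases use tokenizer= instead
)
trainer.train()
```

The fine-tuned checkpoint can then be scored with lm-evaluation-harness, for example via its lm_eval command with tasks such as gsm8k and mmlu, to reproduce the kind of SFT-versus-ORPO comparison shown in the results chapter.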

Taught by

Trelis Research
