
Direct Preference Optimization (DPO) - Advanced Fine-Tuning Technique

Trelis Research via YouTube

Overview

Explore Direct Preference Optimization (DPO), a technique for aligning language models with human preferences, in this 43-minute video tutorial from Trelis Research. Learn how DPO differs from standard supervised fine-tuning and how it compares to RLHF. Work with preference datasets such as UltraChat and Anthropic's Helpful and Harmless (HH-RLHF), follow a detailed run-through of a DPO notebook, interpret evaluation results in Weights & Biases, and set up RunPod for a one-epoch training run. The video also links to supporting resources, including Google Slides, datasets, and scripts, to help you apply DPO in your own fine-tuning projects.
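For context, the core idea can be stated compactly. The sketch below is illustrative only; the function name, argument names, and sample record are hypothetical and not taken from the video's notebook. Each training example is a prompt paired with a chosen and a rejected response, and the loss pushes the policy's implicit reward for the chosen response above that of the rejected one, measured relative to a frozen reference model.

import torch
import torch.nn.functional as F

# Illustrative preference record; DPO datasets reduce to triples like this.
pair = {
    "prompt": "...",
    "chosen": "...",    # the response annotators preferred
    "rejected": "...",  # the response annotators rejected
}

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Implicit rewards: how far the policy has moved from the frozen
    # reference model on each completion (summed token log-probs).
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between chosen and rejected rewards; beta
    # controls how strongly the policy may deviate from the reference.
    return -F.logsigmoid(beta * (chosen_rewards - rejected_rewards)).mean()

Libraries such as Hugging Face TRL provide a DPOTrainer that computes this loss internally, so in practice you supply the preference triples and a reference model rather than writing the loss yourself.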

Syllabus

Direct Preference Optimization
Video Overview
How does “normal” fine-tuning work?
How does DPO work?
DPO Datasets: UltraChat
DPO Datasets: Helpful and Harmless
DPO vs RLHF
Required datasets and SFT models
DPO Notebook Run-through
DPO Evaluation Results
Weights and Biases Results Interpretation
RunPod Setup for 1-Epoch Training Run
Resources

Taught by

Trelis Research
