
YouTube

Coding RLHF on LLama 2 with LoRA, 4-bit Quantization, TRL and DPO

Discover AI via YouTube

Overview

Learn to implement Reinforcement Learning from Human Feedback (RLHF) in this comprehensive tutorial video, which demonstrates Python coding techniques for fine-tuning LLama 2 models using both the traditional PPO-based pipeline and a modern alternative. Master Stanford University's Direct Preference Optimization (DPO) method as a replacement for Proximal Policy Optimization (PPO), while incorporating 4-bit quantization and Low-Rank Adaptation (LoRA). Explore detailed code examples for supervised fine-tuning of LLama 2 models with 4-bit quantization, implement a DPO trainer using HuggingFace's TRL library with PEFT and LoRA (illustrative sketches of both steps follow below), and understand the complete workflow from supervised fine-tuning through reward modeling to reinforcement learning training. Compare implementations for LLama 1 and LLama 2 models while learning to optimize model performance through quantization and adaptation techniques.
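As a concrete reference point, here is a minimal sketch of the supervised fine-tuning step, combining TRL's SFTTrainer with bitsandbytes 4-bit (NF4) quantization and a LoRA adapter from PEFT. The checkpoint name, dataset, and hyperparameters below are illustrative assumptions, not the exact values used in the video, and the keyword arguments follow the TRL API as it stood around the video's release:

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer

# 4-bit NF4 quantization so a 7B model fits on a single consumer GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint; gated on the Hub
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # Llama defines no pad token by default

# LoRA: train small low-rank adapter matrices instead of the frozen 4-bit weights
peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)

# Any instruction dataset with a plain "text" column works here (assumed choice)
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
)
trainer.train()

Because only the LoRA adapter receives gradients while the quantized base weights stay frozen, this step runs in a fraction of the memory of full fine-tuning.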
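The DPO step can then reuse the fine-tuned policy. In a similarly hedged sketch continuing from the snippet above, TRL's DPOTrainer consumes a preference dataset with "prompt", "chosen", and "rejected" columns and optimizes the DPO objective directly, so no separate reward model or PPO loop is trained. The tiny in-memory dataset and the hyperparameters are placeholders:

from datasets import Dataset
from transformers import TrainingArguments
from trl import DPOTrainer

# Toy preference data for illustration only; real training needs a full dataset
dpo_dataset = Dataset.from_dict({
    "prompt": ["What is RLHF?"],
    "chosen": ["RLHF fine-tunes a model on human preference data."],
    "rejected": ["I don't know."],
})

training_args = TrainingArguments(
    output_dir="llama2-dpo",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    max_steps=100,
)

dpo_trainer = DPOTrainer(
    model=model,            # the SFT policy from the previous sketch
    ref_model=None,         # with a PEFT config, the frozen base weights act as
                            # the implicit reference policy (adapters disabled)
    args=training_args,
    beta=0.1,               # strength of the implicit KL penalty, per the DPO paper
    train_dataset=dpo_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=512,
    max_prompt_length=256,
)
dpo_trainer.train()

Passing ref_model=None alongside peft_config is the memory-saving trick: the trainer computes reference log-probabilities by temporarily disabling the adapter instead of keeping a second full copy of the model.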

Syllabus

How to Code RLHF on LLama2 w/ LoRA, 4-bit, TRL, DPO

Taught by

Discover AI
