Overview
Learn how to fine-tune Mistral 7B using Meta's Self-Rewarding Language Models approach in this technical tutorial video. Explore the self-rewarding language model architecture, understand the fine-tuning process using LoRA, and follow along with practical demonstrations of prompt generation and scoring. Master the implementation details, including the supervised fine-tuning script, data preparation, evaluation methods, and configuration settings. Watch live runs of prompt generation and DPO (Direct Preference Optimization) while gaining insight into compute requirements and cost. Access the provided code repositories, datasets, and additional resources to implement self-rewarding language models in your own projects. Connect with the Oxen community through Discord and join their Arxiv Dives series for more in-depth AI discussions.
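For orientation before watching, here is a minimal sketch of the self-rewarding loop the video walks through: the model generates candidate responses to a prompt, scores them with an LLM-as-a-judge prompt, and keeps the best and worst as a chosen/rejected pair for DPO. The model name, judge prompt wording, and sampling settings below are illustrative assumptions, not the tutorial's exact code.

```python
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint; the video fine-tunes Mistral 7B, and any instruct variant works for the sketch.
model_name = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Sample a completion and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.7
    )
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

def judge(prompt: str, response: str) -> int:
    """Have the same model score its own response (LLM-as-a-judge), 0-5."""
    judge_prompt = (
        "Score the response to the instruction below from 0 to 5, "
        "answering with a single digit.\n"
        f"Instruction: {prompt}\nResponse: {response}\nScore:"
    )
    verdict = generate(judge_prompt, max_new_tokens=8)
    match = re.search(r"[0-5]", verdict)
    return int(match.group()) if match else 0

# One iteration of the loop: sample candidates, score them, keep best/worst as a preference pair.
prompt = "Explain LoRA fine-tuning in two sentences."
candidates = [generate(prompt) for _ in range(4)]
ranked = sorted(candidates, key=lambda r: judge(prompt, r))
dpo_pair = {"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]}
print(dpo_pair)
```

In the full method, pairs like this are accumulated into a preference dataset and used for DPO training (for example with trl's DPOTrainer), which is what the live DPO run at the end of the video covers.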
Syllabus
Intro
Self-Rewarding Language Model Architecture
Fine-Tuning Scripts
Data for Fine-Tuning
Supervised Fine-Tuning Script
High LoRA Alpha and Quantization
Evaluating Fine-Tuning Data
Generating New Prompts
Live Demo of Prompt Generation
Generating Responses
Generating Scores
Config, Compute, and Cost
Analyzing Scores
Live Run of DPO
Taught by
Oxen