Advanced Fine-Tuning Techniques for Long Context Summarization

Trelis Research via YouTube

Overview

Explore advanced techniques for fine-tuning language models to handle long context summarization in this 32-minute video tutorial. Learn three key tricks: increasing rope_theta, training the norm and embedding layers, and training on prompt + response pairs. Dive into the process of creating a long context summarization dataset, walk through a fine-tuning script, and set up prompts for effective summarization. Compare the performance of raw Mistral 16k with fine-tuned versions, and examine the effects of increasing rope_theta and of training on summarization tasks. Analyze 64k context summarization performance against Yi 6B and evaluate the passkey retrieval capabilities of Mistral 64k. Use the provided resources, including scripts, models, and datasets, to deepen your understanding and implementation of long context summarization techniques.
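
The three tricks can be made concrete with a short sketch. The following is a minimal illustration using Hugging Face transformers and peft, assuming a Mistral base model; the rope_theta value, LoRA hyperparameters, and layer names are illustrative assumptions, not settings taken from the video.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "mistralai/Mistral-7B-v0.1"  # assumed base model

# Trick One: increase rope_theta so rotary position embeddings are
# stretched across a longer context (Mistral's default is 10000.0).
model = AutoModelForCausalLM.from_pretrained(
    BASE,
    rope_theta=1_000_000.0,  # illustrative value, not the video's
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(BASE)

# Trick Two: train the norm and embedding layers fully, alongside the
# usual LoRA adapters on attention, via modules_to_save.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=[
        "embed_tokens", "lm_head",
        "input_layernorm", "post_attention_layernorm", "norm",
    ],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)

# Trick Three: train on prompt + response, i.e. keep the prompt tokens
# in the labels instead of masking them out of the loss.
def tokenize_example(prompt: str, response: str, max_len: int = 16384):
    text = prompt + response + tokenizer.eos_token
    ids = tokenizer(text, truncation=True, max_length=max_len)["input_ids"]
    return {"input_ids": ids, "labels": list(ids)}  # loss over full sequence
```

The intuition behind the third trick is that keeping the prompt in the loss gives the model a training signal over the long input itself, not just over the short summary at the end.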

Syllabus

Fine-tuning for long context and summarization
Video overview
Trick One: Increase rope_theta
Trick Two: Train norm and embed layers
Trick Three: Train on prompt + response
Long context and summarization dataset
Fine-tuning script walkthrough
Prompt setup for summarization
Raw Mistral 16k summarization performance
Effect of increasing rope_theta
Effect of training on summarization
64k context summarization performance vs Yi 6B
Passkey retrieval performance of Mistral 64k (see the sketch below)
Resources
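
Passkey retrieval, used in the final evaluation above, hides a short code inside long filler text and checks whether the model can repeat it back. Below is a minimal sketch of such a test; the prompt wording, filler text, and `generate` helper are illustrative assumptions, not the video's exact script.

```python
import random

def build_passkey_prompt(n_filler: int = 400):
    """Bury a random 5-digit passkey at a random position in filler text."""
    passkey = str(random.randint(10_000, 99_999))
    filler = "The grass is green. The sky is blue. The sun is yellow. " * n_filler
    insert_at = random.randint(0, len(filler))
    document = (
        filler[:insert_at]
        + f" The pass key is {passkey}. Remember it. "
        + filler[insert_at:]
    )
    prompt = document + "\nWhat is the pass key? The pass key is"
    return prompt, passkey

# Usage with any text-generation setup, e.g.:
#   prompt, passkey = build_passkey_prompt()
#   completion = generate(prompt, max_new_tokens=10)  # hypothetical helper
#   retrieved = passkey in completion
```

Scoring is simply the fraction of trials in which the generated continuation contains the hidden passkey, typically reported across several context lengths and insertion depths.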

Taught by

Trelis Research
