Explore an in-depth technical video examining the scientific details of LongRoPE and Theta Extrapolation, two scaling methods for extending context lengths in Large Language Models. Learn how a context window can be grown from 8K to 4M tokens on a Llama 3 7B architecture. Understand why RoPE positional encoding breaks down when sequence lengths exceed the context length seen during training, and discover how theta scaling addresses this limitation by adjusting the rotary base parameter. Examine both increasing and decreasing the rotary base as approaches for enhancing extrapolation and maintaining performance on longer sequences. Gain insights into how positional encodings can be tuned to handle out-of-distribution positions and improve attention stability for extended context processing.
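To make the rotary-base idea concrete, here is a minimal sketch of RoPE with theta scaling. It assumes the standard formulation with base 10,000, a head dimension of 128, and an NTK-style rule for raising the base in proportion to the length ratio; the function names, the 8K and 4M lengths, and the exact scaling rule are illustrative assumptions, not necessarily the recipe used in the video.

```python
import torch

def rope_angles(positions: torch.Tensor, head_dim: int, base: float) -> torch.Tensor:
    """RoPE rotation angles: angles[p, i] = p * base^(-2i/head_dim)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    return torch.outer(positions.float(), inv_freq)

def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Rotate (seq_len, head_dim) query/key vectors pairwise by the given angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

head_dim = 128
train_len, target_len = 8_192, 4_000_000

# Theta (rotary base) scaling: raise the base so that, at the extended context
# length, the slow-rotating dimensions stay within the angle range the model
# saw during training (NTK-style heuristic; the video's exact rule may differ).
scale = target_len / train_len
scaled_base = 10_000.0 * scale ** (head_dim / (head_dim - 2))

# Compare the slowest-rotating dimension at the last trained position (original
# base) with the last extended position (scaled base): the angles roughly match,
# so the extended positions are no longer out of distribution for that dimension.
pos = torch.tensor([train_len - 1, target_len - 1])
orig = rope_angles(pos, head_dim, base=10_000.0)
scaled = rope_angles(pos, head_dim, base=scaled_base)
print(f"lowest-freq angle, original base @ {train_len - 1}: {orig[0, -1]:.4f}")
print(f"lowest-freq angle, scaled base   @ {target_len - 1}: {scaled[1, -1]:.4f}")

# Applying RoPE with the scaled base to some query vectors:
q = torch.randn(16, head_dim)
q_rot = apply_rope(q, rope_angles(torch.arange(16), head_dim, base=scaled_base))
```

Increasing the base compresses rotation frequencies so long contexts stay in-distribution; the video also discusses the opposite move, decreasing the base, as a way to sharpen position discrimination when extrapolating.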