
YouTube

LongRoPE and Theta Scaling for Extended Context Length in LLMs - Part 2

Discover AI via YouTube

Overview

Explore an in-depth technical video examining the scientific details of LongRoPE and theta-extrapolation scaling methods for extending context lengths in Large Language Models. Learn how context windows can be pushed from 8K to 4M tokens using a Llama 3-7B LLM architecture. Understand why RoPE positional encoding breaks down when sequence lengths exceed the context length seen during training, and how theta scaling addresses this limitation by adjusting the rotary base parameter. Examine both increasing and decreasing the rotary base as strategies for improving model extrapolation and maintaining performance on longer sequences. Gain insights into how positional encodings can be optimized to handle out-of-distribution positions and improve attention-mechanism stability for extended context processing.
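As a rough illustration of the idea described above (not the video's exact recipe), the sketch below shows one common NTK-style way to rescale the rotary base before computing RoPE frequencies; the head dimension, base value, and context lengths are illustrative assumptions only.

```python
import torch

def rope_inverse_frequencies(head_dim: int, base: float = 10_000.0) -> torch.Tensor:
    """Standard RoPE inverse frequencies: theta_i = base^(-2i / head_dim)."""
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

def theta_scaled_base(base: float, head_dim: int, train_len: int, target_len: int) -> float:
    """One common theta-scaling variant (NTK-aware): raise the rotary base so that
    rotation angles at target_len stay within the range seen during training."""
    scale = target_len / train_len
    return base * scale ** (head_dim / (head_dim - 2))

# Illustrative values: 128-dim heads, an 8K training window, a 1M-token target window.
new_base = theta_scaled_base(base=500_000.0, head_dim=128, train_len=8_192, target_len=1_048_576)
inv_freq_extended = rope_inverse_frequencies(head_dim=128, base=new_base)
```

In this variant, a larger rotary base slows the per-position rotation of each frequency, which is one way to keep long-sequence positions in-distribution; the video also discusses decreasing the base as an alternative.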

Syllabus

LongRoPE & Theta Scaling to 1 Mio Token (2/2)

Taught by

Discover AI
