
YouTube

How to Code Long-Context LLMs - LongLoRA Implementation with Llama 2 100K

Discover AI via YouTube

Overview

Learn to implement and understand long-context Large Language Models in this technical tutorial video, which explains how LongLoRA is applied to Llama 2 100K. Dive into essential concepts including Flash Attention 2, vision transformers, and rotary positional embeddings while exploring the theoretical foundations and practical implementation details of extending LLM context lengths. Master the technical aspects of the transformer architecture, embedding and normalization layers, and model tokenization needed to work with extended context lengths such as Claude 100K, ChatGPT 32K, and Llama 2 100K. Explore performance figures and scientific preprints, and understand why certain architectural choices affect long-sequence processing in LLMs. Follow along with code examples optimized for Flash Attention 2 to implement these concepts in your own projects, which is particularly useful when dealing with lengthy scientific articles exceeding 32K or 64K tokens.
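As a rough companion to the video, the sketch below shows one way, assuming the Hugging Face transformers and peft libraries, to load a Llama 2 checkpoint with Flash Attention 2 and linear RoPE (rotary positional embedding) scaling, then attach a LongLoRA-style adapter that also keeps the embedding and normalization layers trainable. The model ID, scaling factor, and LoRA hyperparameters are illustrative placeholders, not the exact settings used in the video or the LongLoRA repo.

```python
# Minimal sketch: long-context Llama 2 with Flash Attention 2 and a
# LongLoRA-style adapter. Requires a CUDA GPU plus the transformers,
# peft, accelerate, and flash-attn packages. Values are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; any Llama-2-style checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",     # needs flash-attn installed
    rope_scaling={"type": "linear", "factor": 8.0},  # stretch 4K positions toward ~32K
    device_map="auto",
)

# LongLoRA-style setup: low-rank adapters on the attention projections,
# while the token embedding and RMSNorm layers stay fully trainable.
# Module names follow the Hugging Face Llama implementation.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=["embed_tokens", "norm"],  # trained in full, not low-rank
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Sanity check: tokenize a long document and inspect its length.
long_text = "..."  # e.g. a lengthy scientific article
inputs = tokenizer(long_text, return_tensors="pt")
print(inputs["input_ids"].shape[1], "tokens")
```

Keeping the embedding and normalization layers trainable alongside the low-rank attention adapters is the detail LongLoRA adds on top of plain LoRA to make long-context fine-tuning work well.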

Syllabus

Introduction
Flash Attention
What is LongLoRA
Vision Transformers
Simplest solution
LongLoRA
Why is this happening
Scientific Preprint
Transformer Architecture
Performance figures
Summary
LongLoRA repo
Rotary positional embedding
Model tokenizer
Embedded normalization layers

Taught by

Discover AI

