RecurrentGemma: Moving Past Transformers with Griffin Architecture for Long Context Length
Discover AI via YouTube
Overview
Explore a technical video on RecurrentGemma, Google's recurrent language model built on the Griffin architecture, which departs from the standard transformer design. Learn about the RecurrentGemma-2B model, which has a reported throughput of 6000 tokens per second while performing comparably to the transformer-based Gemma 2B. Discover how the Griffin and Hawk architectures work and how they compare with State Space Models such as Mamba (S6). Cover the underlying concepts: local attention, linear recurrences, the GRU (Gated Recurrent Unit), the LRU (Linear Recurrent Unit), and the RG-LRU (Real-Gated Linear Recurrent Unit). See why the model's fixed-size state uses less memory on long sequences than a transformer's key-value cache, which grows with every token. Examine performance benchmarks and the GitHub code examples, and understand how this architecture sustains its throughput regardless of sequence length while being trained on 33% fewer tokens than its transformer counterpart.
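The memory argument above can be made concrete with a little arithmetic. The sketch below is a minimal illustration, not a measurement: the layer count, head count, head dimension, and state width are hypothetical values chosen for the example and are not taken from the video or from the RecurrentGemma configuration. It also ignores Griffin's local attention layers, whose cache is bounded by the attention window rather than by the sequence length.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes held by a transformer KV cache.

    Each layer stores one key and one value vector per cached token,
    so the total grows linearly with seq_len.
    """
    return 2 * seq_len * num_layers * num_kv_heads * head_dim * bytes_per_value


def recurrent_state_bytes(num_layers, state_width, bytes_per_value=2):
    """Bytes held by a fixed-size recurrent state.

    One state vector per layer, independent of how many tokens were processed.
    """
    return num_layers * state_width * bytes_per_value


# Hypothetical configuration, for illustration only.
LAYERS = 26
KV_HEADS = 1
HEAD_DIM = 256
STATE_WIDTH = 2560

if __name__ == "__main__":
    for seq_len in (2_048, 8_192, 32_768):
        kv = kv_cache_bytes(seq_len, LAYERS, KV_HEADS, HEAD_DIM)
        rec = recurrent_state_bytes(LAYERS, STATE_WIDTH)
        print(f"{seq_len:>6} tokens: KV cache {kv / 2**20:8.1f} MiB, "
              f"recurrent state {rec / 2**20:6.3f} MiB")
```

Whatever the exact numbers, the shape of the result is the point: the first column of memory grows in proportion to the sequence length, while the second stays constant.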
Syllabus
Llama 3 inference and finetuning
New Language Model Dev
Local Attention
Linear complexity of RNN
Gated Recurrent Unit (GRU)
Linear Recurrent Unit (LRU)
GRIFFIN architecture
Real-Gated Linear Recurrent Unit (RG-LRU)
Griffin Key Features
RecurrentGemma
GitHub code
Performance benchmark
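For readers who want a feel for the RG-LRU topic before watching, here is a minimal pure-Python sketch of a real-gated linear recurrence. It follows the general form published for Griffin (a sigmoid recurrence gate, a sigmoid input gate, and a per-channel decay raised to a gated power), but it is an illustration under assumptions: the function names, the dense weight matrices, the toy dimensions, and the constant c = 8 are choices made here, and this is not the code shown in the video or in the official repository.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, x):
    """Dense matrix-vector product on plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rg_lru_step(h_prev, x, W_a, b_a, W_x, b_x, lam, c=8.0):
    """One recurrence step; returns a new state the same size as h_prev."""
    r = [sigmoid(z + b) for z, b in zip(matvec(W_a, x), b_a)]  # recurrence gate
    i = [sigmoid(z + b) for z, b in zip(matvec(W_x, x), b_x)]  # input gate
    h = []
    for h_p, x_k, r_k, i_k, lam_k in zip(h_prev, x, r, i, lam):
        a_k = sigmoid(lam_k) ** (c * r_k)  # per-channel decay, always in (0, 1)
        # Element-wise linear update; sqrt(1 - a^2) rescales the gated input.
        h.append(a_k * h_p + math.sqrt(1.0 - a_k * a_k) * (i_k * x_k))
    return h


def rg_lru_scan(xs, W_a, b_a, W_x, b_x, lam):
    """Run the recurrence over a sequence, starting from a zero state."""
    h = [0.0] * len(lam)
    outputs = []
    for x in xs:
        h = rg_lru_step(h, x, W_a, b_a, W_x, b_x, lam)
        outputs.append(h)
    return outputs


if __name__ == "__main__":
    # Toy 2-channel example with hypothetical parameters.
    W = [[0.5, 0.0], [0.0, 0.5]]
    b = [0.0, 0.0]
    lam = [2.0, 2.0]
    sequence = [[1.0, -1.0]] * 5
    for t, h in enumerate(rg_lru_scan(sequence, W, b, W, b, lam)):
        print(t, [round(v, 4) for v in h])
```

The state passed from step to step is a single vector whose length never changes, which is what gives the fixed memory footprint, and each step costs the same amount of work, which is where the linear complexity in sequence length comes from.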
Taught by
Discover AI