INFINI Attention: Efficient Infinite Context Transformers with 1 Million Token Context Length
Discover AI via YouTube
Overview
Explore a technical video presentation on Google's Infini-attention transformer architecture, designed to handle context lengths of up to 1 million tokens. Learn how a compressive memory is integrated into the vanilla attention mechanism so that the model can store and retrieve historical key-value states efficiently. Understand the technical challenges and solutions around information compression, implementation complexity, and performance optimization. Follow detailed mathematical explanations of the memory update and retrieval processes, review the benchmark data, and examine the relationship between Infini-attention and internal RAG systems. The presentation concludes with a look at TransformerFAM and its feedback attention, and includes a simplified summary for beginners. The breakdown is based on the research paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" and runs from basic concepts to the mathematics of the memory mechanism.
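For readers who want a concrete picture of the memory update and retrieval steps before watching, here is a minimal sketch in plain Python. It assumes the linear-attention form described in the paper: keys and queries pass through a positive feature map (ELU + 1), the memory is a fixed-size matrix of shape d_k x d_v, and a normalization vector tracks the accumulated keys. The function names `elu_plus_one`, `update_memory`, and `retrieve_memory` are illustrative choices, not names from the paper or the video. The learned gate that mixes the retrieved values with local dot-product attention is left out.

```python
import math

def elu_plus_one(x):
    # Positive feature map sigma(x) = ELU(x) + 1 (assumed, following the paper).
    return x + 1.0 if x > 0 else math.exp(x)

def update_memory(M, z, K, V):
    """Fold one segment into the memory.

    M: d_k x d_v memory matrix, z: length-d_k normalization vector,
    K: n x d_k keys, V: n x d_v values. Returns new (M, z) of the same size.
    """
    new_M = [row[:] for row in M]
    new_z = z[:]
    for k_row, v_row in zip(K, V):
        sk = [elu_plus_one(x) for x in k_row]
        for i, s in enumerate(sk):
            new_z[i] += s                 # z <- z + sigma(k)
            for j, v in enumerate(v_row):
                new_M[i][j] += s * v      # M <- M + sigma(k)^T v
    return new_M, new_z

def retrieve_memory(M, z, Q):
    """Read from memory: sigma(Q) M / (sigma(Q) z), one row per query.

    Call only after at least one update, otherwise the denominator is zero.
    """
    d_v = len(M[0])
    out = []
    for q_row in Q:
        sq = [elu_plus_one(x) for x in q_row]
        denom = sum(s * zi for s, zi in zip(sq, z))
        out.append([sum(s * M[i][j] for i, s in enumerate(sq)) / denom
                    for j in range(d_v)])
    return out

# Usage: store one key-value pair, then query the memory.
d_k, d_v = 2, 3
M = [[0.0] * d_v for _ in range(d_k)]
z = [0.0] * d_k
M, z = update_memory(M, z, K=[[0.5, -1.0]], V=[[1.0, 2.0, 3.0]])
retrieved = retrieve_memory(M, z, Q=[[0.3, 0.7]])
```

The memory keeps the same d_k x d_v size no matter how many segments are folded in, which is what makes the context length unbounded in principle. With a single stored pair the normalization cancels, so any query returns that pair's value; once many pairs are stored, retrieval becomes a weighted blend, which is the lossy compression the presentation discusses.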
Syllabus
Infinite context length of LLM
INFINI paper by Google
Matrix Memory of limited size
Update memory simple
Retrieve memory simple
Update memory maths
Retrieve memory maths
Infini attention w/ internal RAG?
Benchmark data
Summary for green grasshoppers
TransformerFAM w/ Feedback attention
Taught by
Discover AI