Overview
Explore the ∞-former (Infinity-Former) model in this comprehensive video explanation of a research paper. Dive into how this approach extends vanilla Transformers with an unbounded long-term memory, enabling them to process arbitrarily long sequences. Learn about the continuous attention mechanism that makes attention complexity independent of context length, and discover "sticky memories," which devote more of the memory's resolution to important past events. Follow along as the video breaks down the problem statement, architecture, and experimental results, including applications in language modeling. Gain insight into the pros and cons of relying on such heuristics, and understand how this model addresses long-range dependencies in sequence tasks.
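To make the context-length-independent attention idea concrete, here is a minimal, illustrative NumPy sketch, not the paper's or the video's code: a long sequence is compressed into a fixed number of radial-basis-function coefficients via ridge regression, and a query reads from this continuous memory with a Gaussian density. The function names, the basis width, and the sampling-based approximation of the expectation are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical sketch of a continuous long-term memory:
# a sequence of d-dimensional vectors is summarized by N basis coefficients,
# so the memory size and the attention cost do not grow with sequence length.

def rbf(t, centers, width=0.02):
    # t: (L,) positions in [0, 1]; returns an (L, N) radial-basis matrix.
    return np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

def fit_continuous_memory(X, num_basis=64, ridge=1e-3):
    # X: (L, d) token representations. The fitted memory B has shape
    # (num_basis, d) regardless of how long the sequence L is.
    L, _ = X.shape
    t = np.linspace(0.0, 1.0, L)
    centers = np.linspace(0.0, 1.0, num_basis)
    Psi = rbf(t, centers)  # (L, N)
    # Ridge regression of the sequence onto the basis functions.
    B = np.linalg.solve(Psi.T @ Psi + ridge * np.eye(num_basis), Psi.T @ X)
    return B, centers

def attend(B, centers, mu, sigma, num_samples=256):
    # Read from the continuous memory with a Gaussian density N(mu, sigma^2)
    # over t in [0, 1]; the expectation is approximated by sampling, and the
    # cost depends on num_samples and num_basis, not on the sequence length.
    t = np.linspace(0.0, 1.0, num_samples)
    density = np.exp(-((t - mu) ** 2) / (2 * sigma ** 2))
    density /= density.sum()
    signal = rbf(t, centers) @ B  # (num_samples, d) reconstructed signal
    return density @ signal       # (d,) attended context vector

# Usage: a 10,000-token "past" compressed into 64 coefficients per dimension.
X = np.random.randn(10_000, 32)
B, centers = fit_continuous_memory(X, num_basis=64)
context = attend(B, centers, mu=0.9, sigma=0.05)  # focus on the recent past
print(context.shape)  # (32,)
```

In the same spirit, appending new tokens and re-fitting onto the fixed basis corresponds to the concatenation-and-contraction step discussed in the video, and sampling the re-fit points according to past attention mass corresponds to sticky memories.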
Syllabus
- Intro & Overview
- Sponsor Spot: Weights & Biases
- Problem Statement
- Continuous Attention Mechanism
- Unbounded Memory via Concatenation & Contraction
- Does this make sense?
- How the Long-Term Memory is used in an attention layer
- Entire Architecture Recap
- Sticky Memories by Importance Sampling
- Commentary: Pros and cons of using heuristics
- Experiments & Results
Taught by
Yannic Kilcher