Evolution of Transformer Architectures - From Attention to Modern Variants
Neural Breakdown with AVB via YouTube
Overview
Explore the evolution of neural attention mechanisms in this 25-minute technical video, starting from Bahdanau Attention and progressing through Self-Attention and Causal Masked Attention as introduced in the "Attention Is All You Need" paper. Dive into advanced forms of Multi-Headed Attention, including Multi Query Attention and Grouped Query Attention, and learn about crucial innovations in Transformer and Large Language Model architectures such as KV Caching. Detailed visualizations and graphics build a comprehensive understanding of language modeling, next word prediction, and the attention mechanisms that have shaped modern AI architectures. The structured progression covers Self Attention, Causal Masked Attention, Multi-Headed Attention, KV Cache, Multi Query Attention, and Grouped Query Attention, with special emphasis on performance implications and architectural trade-offs.
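As a companion to the concepts listed above, here is a minimal sketch (not taken from the video) of causal multi-headed self-attention in PyTorch. The dimensions `d_model` and `n_heads` are illustrative assumptions; the point is simply that each head computes scaled dot-product attention and a causal mask keeps every position from attending to future tokens.

```python
# Minimal causal multi-headed self-attention sketch (illustrative, not from the video).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Separate projections for queries, keys, and values.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        # Split the model dimension into n_heads independent heads.
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product scores: shape (batch, heads, t, t).
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        # Causal mask: position i may only attend to positions <= i.
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out)

x = torch.randn(2, 10, 64)             # (batch, seq_len, d_model)
print(CausalSelfAttention()(x).shape)  # torch.Size([2, 10, 64])
```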
Syllabus
Correction in the slide at - : MHA has high latency (runs slower); MQA has low latency (runs faster). A sketch after the syllabus below illustrates why fewer key/value heads reduce latency.
- Intro
- Language Modeling and Next Word Prediction
- Self Attention
- Causal Masked Attention
- Multi Headed Attention
- KV Cache
- Multi Query Attention
- Grouped Query Attention
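To illustrate the KV Cache, Multi Query Attention, and Grouped Query Attention items above, and the latency correction noted under the syllabus, here is a hedged PyTorch sketch of how reducing the number of key/value heads shrinks the KV cache that must be read at every decoding step. All head counts and sizes are illustrative assumptions, not figures from the video; MHA corresponds to as many KV heads as query heads, GQA to a smaller number of shared KV heads, and MQA to a single KV head.

```python
# Illustrative sketch: KV cache size under MHA / GQA / MQA, and a grouped-query
# attention step where query heads share key/value heads. Numbers are assumptions.
import torch

def kv_cache_elements(n_kv_heads: int, d_head: int, seq_len: int, n_layers: int) -> int:
    # One key and one value vector are cached per KV head, per token, per layer.
    return 2 * n_kv_heads * d_head * seq_len * n_layers

# Hypothetical model: 32 query heads, d_head=128, 32 layers, 4096-token context.
for name, n_kv in [("MHA (32 KV heads)", 32), ("GQA (8 KV heads)", 8), ("MQA (1 KV head)", 1)]:
    print(f"{name:18s}: {kv_cache_elements(n_kv, 128, 4096, 32):,} cached elements")

# Grouped Query Attention: each group of query heads shares one KV head.
b, n_q_heads, n_kv_heads, t, d_head = 1, 8, 2, 16, 32
q = torch.randn(b, n_q_heads, t, d_head)
k = torch.randn(b, n_kv_heads, t, d_head)
v = torch.randn(b, n_kv_heads, t, d_head)
group = n_q_heads // n_kv_heads
# Repeat each KV head so it lines up with its group of query heads.
k_rep = k.repeat_interleave(group, dim=1)
v_rep = v.repeat_interleave(group, dim=1)
attn = torch.softmax(q @ k_rep.transpose(-2, -1) / d_head ** 0.5, dim=-1)
out = attn @ v_rep
print(out.shape)  # torch.Size([1, 8, 16, 32])
```

Setting `n_kv_heads` equal to `n_q_heads` recovers MHA, while `n_kv_heads = 1` recovers MQA, which is why GQA is often described as interpolating between the two: it trades a little attention capacity for a much smaller KV cache and lower decoding latency.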
Taught by
Neural Breakdown with AVB