

The Transformer Architecture: Understanding Self-Attention and Positional Encoding

Donato Capitella via YouTube

Overview

Explore the Transformer architecture in this 15-minute technical video covering the 2017 innovation by Google researchers, introduced in the paper "Attention Is All You Need." Learn how self-attention and positional encoding eliminate the need for recurrence while enabling parallel processing of sequences. Master key architectural components, including masked self-attention in the decoder, residual connections, and layer normalization, through detailed explanations and accompanying mind maps. Discover how these elements combine into the complete Transformer architecture that underpins modern language models. Supplementary materials, including comprehensive mind maps and summaries, are available through the provided download links, and the video follows a well-structured progression from basic attention mechanisms to advanced architectural features.
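
To make these ideas concrete, here is a minimal sketch, not taken from the video, of single-head scaled dot-product self-attention combined with sinusoidal positional encoding. All function names, shapes, and parameters are illustrative assumptions.

```python
# Illustrative sketch (not from the video): scaled dot-product self-attention
# with sinusoidal positional encoding, following the 2017 "Attention Is All
# You Need" formulation. Names and shapes here are assumptions for clarity.
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding: even dimensions use sine, odd use cosine."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                  # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                     # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

def self_attention(x: np.ndarray, w_q, w_k, w_v) -> np.ndarray:
    """Single-head scaled dot-product self-attention over a (seq_len, d_model) input."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                      # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the keys
    return weights @ v

# Toy usage: 4 tokens, model width 8. Every position attends to every other
# position at once, which is what removes the need for recurrence.
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (4, 8)
```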

Syllabus

- Attention is all you need
- Attention makes recurrence redundant
- Removing recurrence
- Self-Attention
- Advantage of Self-Attention: parallel processing
- Positional Encoding
- Masked Self-Attention in the decoder
- Residual connections
- Layer Normalization
- The full Transformer architecture
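
The later syllabus items on masked self-attention, residual connections, and layer normalization can likewise be illustrated with a short sketch, again not taken from the video. The causal mask, the unparameterized layer normalization, and the "Add & Norm" residual pattern shown here are illustrative assumptions.

```python
# Illustrative sketch (not from the video): a causal mask for decoder-side
# masked self-attention, plus the residual-connection-then-layer-norm pattern
# applied around each Transformer sub-layer. Names and shapes are assumptions.
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Mask future positions: position i may only attend to positions <= i."""
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    return np.where(future, -np.inf, 0.0)

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each token's features to zero mean and unit variance (no learned scale/bias)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def masked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention with the causal mask added to the scores."""
    scores = q @ k.T / np.sqrt(q.shape[-1]) + causal_mask(q.shape[0])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Residual connection plus layer normalization around the attention sub-layer.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = layer_norm(x + masked_attention(x, x, x))  # the "Add & Norm" step
print(out.shape)  # (4, 8)
```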

Taught by

Donato Capitella
