The Transformer Architecture: Understanding Self-Attention and Positional Encoding
YouTube videos curated by Class Central.

Classroom Contents
- 1 - Attention is all you need
- 2 - Attention makes recurrence redundant
- 3 - Removing recurrence
- 4 - Self-Attention
- 5 - Advantage of Self-Attention: parallel processing
- 6 - Positional Encoding
- 7 - Masked Self-Attention in the decoder
- 8 - Residual connections
- 9 - Layer Normalization
- 10 - The full Transformer architecture