The Transformer Architecture: Understanding Self-Attention and Positional Encoding

Donato Capitella via YouTube

Class Central Classrooms

YouTube videos curated by Class Central.

Classroom Contents

  1. Attention is all you need
  2. Attention makes recurrence redundant
  3. Removing recurrence
  4. Self-Attention
  5. Advantage of Self-Attention: parallel processing
  6. Positional Encoding
  7. Masked Self-Attention in the decoder
  8. Residual connections
  9. Layer Normalization
  10. The full Transformer architecture
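
For readers skimming the outline above, here is a minimal, self-contained sketch (not taken from the videos) of two ideas the segments cover: scaled dot-product self-attention with an optional causal mask, and sinusoidal positional encoding. The NumPy implementation, variable names, and shapes are illustrative assumptions, not the instructor's code.

```python
# Illustrative sketch of scaled dot-product self-attention and sinusoidal
# positional encoding. All names and shapes are assumptions for this example.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv, mask=None):
    # X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projection matrices.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # (seq_len, seq_len)
    if mask is not None:
        # Masked self-attention (decoder): block attention to future tokens.
        scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ V                    # weighted sum of values

def positional_encoding(seq_len, d_model):
    # Sinusoidal positional encoding from "Attention Is All You Need".
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

# Toy usage: 4 tokens, model width 8, with a causal (lower-triangular) mask.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8)) + positional_encoding(4, 8)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
causal = np.tril(np.ones((4, 4), dtype=bool))
print(self_attention(X, Wq, Wk, Wv, mask=causal).shape)  # (4, 8)
```

Because attention looks at all positions at once rather than stepping through a recurrence, the positional encoding is what reintroduces word-order information before the attention layers process the sequence in parallel.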
