What is Attention: Scaling
YouTube videos curated by Class Central.
Classroom Contents
Transformer Encoder in 100 Lines of Code
- 1 What we will cover
- 2 Introducing Colab
- 3 Word Embeddings and d_model
- 4 What are Attention heads?
- 5 What is Dropout?
- 6 Why batch data?
- 7 How to feed sentences into the transformer?
- 8 Why feed forward layers in transformer?
- 9 Why Repeating Encoder layers?
- 10 The “Encoder” Class, nn.Module, nn.Sequential
- 11 The “EncoderLayer” Class
- 12 What is Attention: Query, Key, Value vectors
- 13 What is Attention: Matrix Transpose in PyTorch
- 14 What is Attention: Scaling
- 15 What is Attention: Masking
- 16 What is Attention: Softmax
- 17 What is Attention: Value Tensors
- 18 CRUX OF VIDEO: “MultiHeadAttention” Class
- 19 Returning the flow back to “EncoderLayer” Class
- 20 Layer Normalization
- 21 Returning the flow back to “EncoderLayer” Class
- 22 Feed Forward Layers
- 23 Why Activation Functions?
- 24 Finish the Flow of Encoder
- 25 Conclusion & Decoder for next video