Transformer Encoder in 100 Lines of Code

CodeEmporium via YouTube

Why Activation Functions?

23 of 25

Classroom Contents

  1. What we will cover
  2. Introducing Colab
  3. Word Embeddings and d_model
  4. What are Attention heads?
  5. What is Dropout?
  6. Why batch data?
  7. How to feed sentences into the transformer?
  8. Why feed forward layers in transformer?
  9. Why Repeating Encoder layers?
  10. The “Encoder” Class, nn.Module, nn.Sequential
  11. The “EncoderLayer” Class
  12. What is Attention: Query, Key, Value vectors
  13. What is Attention: Matrix Transpose in PyTorch
  14. What is Attention: Scaling
  15. What is Attention: Masking
  16. What is Attention: Softmax
  17. What is Attention: Value Tensors
  18. CRUX OF VIDEO: “MultiHeadAttention” Class
  19. Returning the flow back to “EncoderLayer” Class
  20. Layer Normalization
  21. Returning the flow back to “EncoderLayer” Class
  22. Feed Forward Layers
  23. Why Activation Functions?
  24. Finish the Flow of Encoder
  25. Conclusion & Decoder for next video
