CMU Multilingual NLP - Machine Translation / Sequence-to-Sequence Models

Graham Neubig via YouTube

Classroom Contents

  1. Intro
  2. Language Models • Language models are generative models of text
  3. Conditioned Language Models
  4. Calculating the Probability of a Sentence (see the note after this list)
  5. Conditional Language Models
  6. One Type of Language Model (Mikolov et al. 2011)
  7. How to Pass Hidden State?
  8. The Generation Problem
  9. Ancestral Sampling
  10. Greedy Search
  11. Beam Search (see the sketch after this list)
  12. Sentence Representations
  13. Calculating Attention (1)
  14. A Graphical Example
  15. Attention Score Functions (1)
  16. Attention is not Alignment! (Koehn and Knowles 2017)
  17. Coverage
  18. Multi-headed Attention
  19. Supervised Training (Liu et al. 2016)
  20. Self Attention (Cheng et al. 2016) • Each element in the sentence attends to other elements
  21. Why Self Attention?
  22. Transformer Attention Tricks
  23. Transformer Training Tricks
  24. Masking for Training • We want to perform training in as few operations as possible using big matrix multiplies (see the sketch after this list)
  25. A Unified View of Sequence-to-Sequence Models
  26. Code Walk
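
A brief note on items 2-5: a language model assigns a probability to a sentence by factoring it word by word with the chain rule, and a conditioned (conditional) language model does the same while also conditioning on an input such as a source sentence. In the usual notation,

    P(X) = \prod_{i=1}^{|X|} P(x_i \mid x_1, \ldots, x_{i-1}),    P(Y \mid X) = \prod_{j=1}^{|Y|} P(y_j \mid X, y_1, \ldots, y_{j-1}).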
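
Item 11, beam search, keeps the k highest-scoring partial hypotheses at each generation step instead of only the single best one as in greedy search (item 10). Below is a minimal Python sketch under assumed interfaces; the next_log_probs function, the beam size, and the "</s>" end token are hypothetical stand-ins, not the lecture's code.

    # Illustrative beam-search sketch, not the lecture's implementation. The scoring
    # function next_log_probs(prefix) -> {token: log_prob}, the beam size, and the
    # "</s>" end-of-sentence token are assumptions made for this example.
    import math

    def beam_search(next_log_probs, beam_size=3, max_len=20, eos="</s>"):
        beams = [([], 0.0)]                        # (token list, cumulative log-prob)
        for _ in range(max_len):
            candidates = []
            for tokens, score in beams:
                if tokens and tokens[-1] == eos:   # finished hypotheses carry over unchanged
                    candidates.append((tokens, score))
                    continue
                for tok, lp in next_log_probs(tokens).items():
                    candidates.append((tokens + [tok], score + lp))
            # Keep only the beam_size highest-scoring hypotheses.
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
            if all(t and t[-1] == eos for t, _ in beams):
                break
        return beams[0]

    # Toy next-token distribution: two content words, then force end-of-sentence.
    def toy(prefix):
        if len(prefix) < 2:
            return {"hello": math.log(0.6), "world": math.log(0.3), "</s>": math.log(0.1)}
        return {"</s>": 0.0}

    print(beam_search(toy, beam_size=2, max_len=5))   # (['hello', 'hello', '</s>'], ...)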
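
Items 20 and 24 together describe masked self-attention: each element of the sentence attends to the other elements, the scores for all positions are computed with one big matrix multiply, and a mask keeps each training-time position from seeing future words so every time step can be trained at once. Below is a minimal NumPy sketch assuming a single head and no learned query/key/value projections; it illustrates the idea rather than the lecture's or a real Transformer's implementation.

    # Illustrative sketch of masked self-attention with one big matrix multiply;
    # a single head with no learned query/key/value projections, so it shows the
    # idea from items 20 and 24 rather than a real Transformer layer.
    import numpy as np

    def masked_self_attention(X):
        """X: (seq_len, d_model) array of token vectors for one sentence."""
        seq_len, d_model = X.shape
        # Every position is scored against every other position in one matmul.
        scores = X @ X.T / np.sqrt(d_model)                      # (seq_len, seq_len)
        # Causal mask: position i may only attend to positions <= i, so all
        # training time steps can be computed at once without seeing future words.
        future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
        scores = np.where(future, -1e9, scores)
        # Softmax over the attended positions, then mix the value vectors.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ X                                       # (seq_len, d_model)

    out = masked_self_attention(np.random.randn(5, 8))           # 5 "words", dimension 8
    print(out.shape)                                             # (5, 8)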
