Synthesizer - Rethinking Self-Attention in Transformer Models

Yannic Kilcher via YouTube

Class Central Classrooms beta

YouTube videos curated by Class Central.

Classroom Contents

  1. Intro & High-Level Overview
  2. Abstract
  3. Attention Mechanism as Information Routing
  4. Dot Product Attention
  5. Dense Synthetic Attention
  6. Random Synthetic Attention
  7. Comparison to Feed-Forward Layers
  8. Factorization & Mixtures
  9. Number of Parameters
  10. Machine Translation & Language Modeling Experiments
  11. Summarization & Dialogue Generation Experiments
  12. GLUE & SuperGLUE Experiments
  13. Weight Sizes & Number of Head Ablations
  14. Conclusion