YouTube videos curated by Class Central.
Classroom Contents
Synthesizer - Rethinking Self-Attention in Transformer Models
- 1 - Intro & High Level Overview
- 2 - Abstract
- 3 - Attention Mechanism as Information Routing
- 4 - Dot Product Attention
- 5 - Dense Synthetic Attention
- 6 - Random Synthetic Attention
- 7 - Comparison to Feed-Forward Layers
- 8 - Factorization & Mixtures
- 9 - Number of Parameters
- 10 - Machine Translation & Language Modeling Experiments
- 11 - Summarization & Dialogue Generation Experiments
- 12 - GLUE & SuperGLUE Experiments
- 13 - Weight Sizes & Number of Head Ablations
- 14 - Conclusion