Class Central Classrooms beta
YouTube videos curated by Class Central.
Classroom Contents
Pretrained Transformers as Universal Computation Engines
- 1 - Intro & Overview
- 2 - Frozen Pretrained Transformers
- 3 - Evaluated Tasks
- 4 - The Importance of Training LayerNorm
- 5 - Modality Transfer
- 6 - Network Architecture Ablation
- 7 - Evaluation of the Attention Mask
- 8 - Are FPTs Overfitting or Underfitting?
- 9 - Model Size Ablation
- 10 - Is Initialization All You Need?
- 11 - Full Model Training Overfits
- 12 - Again the Importance of Training LayerNorm
- 13 - Conclusions & Comments