Pretrained Transformers as Universal Computation Engines

Yannic Kilcher via YouTube


Classroom Contents


  1. Intro & Overview
  2. Frozen Pretrained Transformers
  3. Evaluated Tasks
  4. The Importance of Training LayerNorm
  5. Modality Transfer
  6. Network Architecture Ablation
  7. Evaluation of the Attention Mask
  8. Are FPTs Overfitting or Underfitting?
  9. Model Size Ablation
  10. Is Initialization All You Need?
  11. Full Model Training Overfits
  12. Again the Importance of Training LayerNorm
  13. Conclusions & Comments
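The Frozen Pretrained Transformer (FPT) recipe covered in the video keeps a pretrained transformer's self-attention and feed-forward weights fixed and fine-tunes only a small set of parameters, notably the LayerNorm affine parameters (the paper also fine-tunes the input and output projections and positional embeddings). A minimal PyTorch sketch of the freezing step, using a toy encoder as a stand-in for the pretrained GPT-2 used in the paper:

```python
import torch.nn as nn

# Toy stand-in for a pretrained transformer (the paper uses GPT-2).
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)

# FPT-style freezing: disable gradients everywhere, then re-enable
# them only for the LayerNorm parameters. In this module the
# LayerNorms are named 'norm1'/'norm2' inside each encoder layer.
for name, param in model.named_parameters():
    param.requires_grad = "norm" in name

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
frozen = [n for n, p in model.named_parameters() if not p.requires_grad]
```

Passing only the `requires_grad` parameters to the optimizer (e.g. `filter(lambda p: p.requires_grad, model.parameters())`) then trains just the LayerNorm weights and biases, which is a tiny fraction of the model's parameters.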
