Sparse Is Enough in Scaling Transformers - ML Research Paper Explained

Yannic Kilcher via YouTube


Classroom Contents


  1. Intro & Overview
  2. Recap: Transformer stack
  3. Sparse Feedforward layer (see the sketch after this list)
  4. Sparse QKV Layer
  5. Terraformer architecture
  6. Experimental Results & Conclusion
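The sparse feedforward layer covered in chapter 3 keeps only one nonzero hidden unit per block of the FFN dimension. Below is a minimal NumPy sketch of that top-1-per-block sparsity pattern; in the paper the selection comes from a small learned low-rank controller chosen *before* the dense matmul (which is where the compute saving comes from), whereas this illustration simply takes an argmax over the full ReLU activations. All names here (`sparse_ffn`, `num_blocks`, the toy sizes) are assumptions for illustration, not the authors' code.

```python
import numpy as np

def sparse_ffn(x, W1, W2, num_blocks):
    # Dense pre-activations, then ReLU, as in a standard transformer FFN.
    h = np.maximum(x @ W1, 0.0)                 # shape: (d_ff,)
    # Partition the d_ff hidden units into blocks and keep only the
    # largest activation in each block, zeroing the rest (top-1 per block).
    # NOTE: the paper predicts this choice with a learned controller so the
    # dense matmul above never has to be fully computed; this is a sketch.
    blocks = h.reshape(num_blocks, -1)
    mask = np.zeros_like(blocks)
    mask[np.arange(num_blocks), blocks.argmax(axis=1)] = 1.0
    h_sparse = (blocks * mask).reshape(-1)
    # Only num_blocks rows of W2 contribute, so at inference time the
    # second matmul touches a small fraction of the weights.
    return h_sparse @ W2

rng = np.random.default_rng(0)
d_model, d_ff, num_blocks = 8, 32, 4            # toy sizes (hypothetical)
x = rng.standard_normal(d_model)
W1 = rng.standard_normal((d_model, d_ff))
W2 = rng.standard_normal((d_ff, d_model))
print(sparse_ffn(x, W1, W2, num_blocks))        # dense-shaped output, sparse compute inside
```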
