Sparse Is Enough in Scaling Transformers - ML Research Paper Explained

Overview

This video lecture analyzes the research paper "Sparse is Enough in Scaling Transformers." The paper introduces Scaling Transformers, a family of models that use sparse layers to scale efficiently and to perform unbatched decoding much faster than a standard Transformer as model size grows. The lecture covers sparse variants of every Transformer layer, including the sparse feedforward and sparse QKV layers, and the paper's finding that these sparse layers are enough to match the perplexity of a standard Transformer with the same number of parameters. It then examines the Terraformer architecture, which combines the sparse layers with prior sparsity approaches to attention to enable fast inference on long sequences even with limited memory. Finally, it walks through the experimental results and conclusions, including performance competitive with the state of the art on long-text summarization.
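The idea behind the sparse feedforward layer can be illustrated with a small inference-time sketch. It assumes (this page does not spell it out) that the `d_ff` hidden units are split into blocks of `block_size`, that a low-rank controller (hypothetical matrices `C1`, `C2`) scores every unit, and that only the highest-scoring unit in each block is computed, with ReLU applied as in a standard feedforward layer. How the controller is trained is omitted. All function and parameter names are hypothetical; this is an illustration of the technique, not the paper's code.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (rows, cols)


def matvec(x: Vector, W: Matrix) -> Vector:
    """Compute x @ W for a vector x whose length equals the row count of W."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def sparse_ffn(x: Vector, W1: Matrix, b1: Vector, W2: Matrix, b2: Vector,
               C1: Matrix, C2: Matrix, block_size: int) -> Tuple[Vector, List[int]]:
    """Hypothetical inference-time sparse FFN: one active hidden unit per block."""
    d_ff = len(W1[0])
    assert d_ff % block_size == 0, "d_ff must be divisible by block_size"
    # Assumed low-rank controller: x @ C1 @ C2 gives one score per hidden unit.
    scores = matvec(matvec(x, C1), C2)
    out = list(b2)
    active = []
    for start in range(0, d_ff, block_size):
        # Keep only the top-scoring unit inside this block.
        k = max(range(start, start + block_size), key=lambda u: scores[u])
        active.append(k)
        # Compute just this unit's ReLU activation (one column of W1).
        h = max(0.0, sum(x[i] * W1[i][k] for i in range(len(x))) + b1[k])
        # Add its contribution through the matching row of W2.
        for j in range(len(out)):
            out[j] += h * W2[k][j]
    return out, active


def dense_ffn_masked(x: Vector, W1: Matrix, b1: Vector, W2: Matrix, b2: Vector,
                     active: List[int]) -> Vector:
    """Slow reference: full dense FFN with every unit outside `active` zeroed."""
    hidden = [max(0.0, v + b) for v, b in zip(matvec(x, W1), b1)]
    keep = set(active)
    masked = [h if u in keep else 0.0 for u, h in enumerate(hidden)]
    return [o + b for o, b in zip(matvec(masked, W2), b2)]


def rand_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    """Gaussian random matrix for toy examples."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d_model, d_ff, rank, block = 4, 8, 2, 4
    W1, W2 = rand_matrix(rng, d_model, d_ff), rand_matrix(rng, d_ff, d_model)
    b1, b2 = [0.0] * d_ff, [0.0] * d_model
    C1, C2 = rand_matrix(rng, d_model, rank), rand_matrix(rng, rank, d_ff)
    x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
    out, active = sparse_ffn(x, W1, b1, W2, b2, C1, C2, block)
    print("active units:", active)
    print("output:", out)
```

Because only one unit per block is active, each token touches roughly `d_ff / block_size` columns of `W1` and rows of `W2` instead of all `d_ff`, which is where the decoding speedup in this sketch comes from. `dense_ffn_masked` gives the same result the slow way and is useful as a correctness check.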

Syllabus

- Intro & Overview
- Recap: Transformer stack
- Sparse Feedforward layer
- Sparse QKV Layer
- Terraformer architecture
- Experimental Results & Conclusion

Taught by

Yannic Kilcher

Related courses

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

Feedback Transformers - Addressing Some Limitations of Transformers with Feedback Memory

XCiT: Cross-Covariance Image Transformers - Facebook AI Machine Learning Research Paper Explained

Big Bird: Transformers for Longer Sequences

Transformers Explained - Part 1: Generative Music AI

TransGAN - Two Transformers Can Make One Strong GAN - Machine Learning Research Paper Explained