Sparse Expert Models - Switch Transformers, GLAM, and More With the Authors

Overview

An interview with Google Brain researchers Barret Zoph and William Fedus on sparse expert models. The discussion covers the fundamentals, history, strengths, and weaknesses of these models, including Switch Transformers and GLAM, which can scale to trillions of parameters. Sparse expert models distribute parts of a Transformer across large arrays of machines and use a routing function to activate only specific parts of the model for a given input. Topics include the advantages of this approach, its applications in natural language processing, how sparse models compare with dense ones, the improvements made by GLAM, and the possibility of distributing experts beyond data centers. The interview closes with likely future developments and is aimed at both machine learning enthusiasts and experienced researchers.
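
As a rough illustration of the routing idea mentioned in the overview, here is a minimal sketch of top-1 routing in plain Python. It is not code from the interview or from any Switch Transformer implementation. The names `route_top1` and `sparse_layer`, the toy experts, and the router weights are all hypothetical and chosen only to show that a single expert runs per token.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top1(token, router_weights):
    """Pick one expert for a token (hypothetical helper).

    router_weights has one row per expert; each logit is the dot
    product of that row with the token vector.
    """
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]

def sparse_layer(token, router_weights, experts):
    """Run only the selected expert and scale its output by the gate value."""
    idx, gate = route_top1(token, router_weights)
    expert_out = experts[idx](token)  # the other experts are never evaluated
    return [gate * y for y in expert_out], idx, gate

# Toy experts (assumptions for illustration; real experts are feed-forward blocks).
experts = [
    lambda x: [2.0 * v for v in x],
    lambda x: [v + 1.0 for v in x],
    lambda x: [-v for v in x],
]
# Toy router weights: 3 experts, 2-dimensional tokens.
router_weights = [
    [1.0, 0.0],
    [0.0, 1.0],
    [-1.0, -1.0],
]

output, chosen, gate = sparse_layer([0.5, 2.0], router_weights, experts)
print(chosen, gate, output)
```

In this sketch the parameter count grows with the number of experts, while the work done per token stays that of one expert plus the small router.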

Syllabus

- Intro
- What are sparse expert models?
- Start of Interview
- What do you mean by sparse experts?
- How does routing work in these models?
- What is the history of sparse experts?
- What does an individual expert learn?
- When are these models appropriate?
- How comparable are sparse to dense models?
- How does the Pathways system connect to this?
- What improvements did GLAM make?
- The "designing sparse experts" paper
- Can experts be frozen during training?
- Can the routing function be improved?
- Can experts be distributed beyond data centers?
- Are there sparse experts for other domains than NLP?
- Are sparse and dense models in competition?
- Where do we go from here?
- How can people get started with this?

Taught by

Yannic Kilcher
