YouTube videos curated by Class Central.
Classroom Contents
Sparse Expert Models - Switch Transformers, GLaM, and More With the Authors
- 1 - Intro
- 2 - What are sparse expert models?
- 3 - Start of Interview
- 4 - What do you mean by sparse experts?
- 5 - How does routing work in these models?
- 6 - What is the history of sparse experts?
- 7 - What does an individual expert learn?
- 8 - When are these models appropriate?
- 9 - How comparable are sparse to dense models?
- 10 - How does the Pathways system connect to this?
- 11 - What improvements did GLaM make?
- 12 - The "Designing Effective Sparse Expert Models" paper
- 13 - Can experts be frozen during training?
- 14 - Can the routing function be improved?
- 15 - Can experts be distributed beyond data centers?
- 16 - Are there sparse experts for domains other than NLP?
- 17 - Are sparse and dense models in competition?
- 18 - Where do we go from here?
- 19 - How can people get started with this?