Sparse Expert Models - Switch Transformers, GLAM, and More With the Authors


Yannic Kilcher via YouTube

- When are these models appropriate? (8 of 19)


YouTube videos curated by Class Central.

Classroom Contents


  1. Intro
  2. What are sparse expert models?
  3. Start of Interview
  4. What do you mean by sparse experts?
  5. How does routing work in these models?
  6. What is the history of sparse experts?
  7. What does an individual expert learn?
  8. When are these models appropriate?
  9. How comparable are sparse to dense models?
  10. How does the pathways system connect to this?
  11. What improvements did GLAM make?
  12. The "designing sparse experts" paper
  13. Can experts be frozen during training?
  14. Can the routing function be improved?
  15. Can experts be distributed beyond data centers?
  16. Are there sparse experts for other domains than NLP?
  17. Are sparse and dense models in competition?
  18. Where do we go from here?
  19. How can people get started with this?
