Overview
Syllabus
- Intro
- What are sparse expert models?
- Start of Interview
- What do you mean by sparse experts?
- How does routing work in these models?
- What is the history of sparse experts?
- What does an individual expert learn?
- When are these models appropriate?
- How comparable are sparse models to dense models?
- How does the Pathways system connect to this?
- What improvements did GLaM make?
- The "designing sparse experts" paper
- Can experts be frozen during training?
- Can the routing function be improved?
- Can experts be distributed beyond data centers?
- Are there sparse experts for domains other than NLP?
- Are sparse and dense models in competition?
- Where do we go from here?
- How can people get started with this?
Taught by
Yannic Kilcher