Understanding Mixture of Experts in Large Language Models
Class Central Classrooms (beta): YouTube videos curated by Class Central.

Classroom Contents
- 1 GPT-3, GPT-4 and Mixture of Experts
- 2 Why Mixture of Experts?
- 3 The idea behind Mixture of Experts (see the routing sketch after this list)
- 4 How to train MoE
- 5 Problems training MoE
- 6 Adding noise during training
- 7 Adjusting the loss function for router evenness
- 8 Is MoE useful for LLMs on laptops?
- 9 How might MoE help big companies like OpenAI?
- 10 Disadvantages of MoE
- 11 Binary tree MoE fast feed forward (see the tree sketch after this list)
- 12 Data on GPT vs MoE vs FFF
- 13 Inference speed up with binary tree MoE
- 14 Recap - Does MoE make sense?
- 15 Why might big companies use MoE?
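
Items 3 to 7 above walk through the core MoE mechanics: a router sends each token to a small number of expert feed-forward networks, noise is added to the router logits during training, and an auxiliary term in the loss nudges the router toward using the experts evenly. The PyTorch sketch below is a minimal illustration of those pieces, assuming class and parameter names of my own (n_experts, top_k, noise_std, the exact form of the balancing term); it is not taken from the videos.

```python
# Minimal sketch of an MoE feed-forward layer with noisy top-k routing and a
# load-balancing auxiliary loss. All names and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2, noise_std=0.1):
        super().__init__()
        self.top_k = top_k
        self.noise_std = noise_std
        self.router = nn.Linear(d_model, n_experts)          # one logit per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                     # x: (tokens, d_model)
        logits = self.router(x)                               # (tokens, n_experts)
        if self.training and self.noise_std > 0:
            logits = logits + torch.randn_like(logits) * self.noise_std  # item 6: noise during training
        probs = F.softmax(logits, dim=-1)

        topk_probs, topk_idx = probs.topk(self.top_k, dim=-1)            # item 3: each token picks a few experts
        topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)   # renormalise the chosen weights

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):             # only routed tokens reach each expert
            token_idx, slot = (topk_idx == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += topk_probs[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])

        # Item 7: auxiliary loss that rewards even expert usage
        # (fraction of routing slots per expert times mean router probability per expert).
        with torch.no_grad():
            frac_tokens = F.one_hot(topk_idx, probs.size(-1)).float().mean(dim=(0, 1))
        aux_loss = (frac_tokens * probs.mean(dim=0)).sum() * probs.size(-1)
        return out, aux_loss


if __name__ == "__main__":
    layer = MoEFeedForward()
    y, aux = layer(torch.randn(32, 64))
    print(y.shape, float(aux))                                # torch.Size([32, 64]); aux is ~1.0 when balanced
```

With top_k = 2 of 8 experts, each token runs through only two of the eight expert MLPs per forward pass, which is the inference-cost saving the later items weigh against the extra memory and routing overhead.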
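
Items 11 to 13 cover the binary-tree variant (fast feed forward), where routing decisions are arranged in a tree so each token follows one root-to-leaf path and evaluates only one small leaf network instead of the whole layer. The sketch below shows only that hard, inference-style routing; tree-based fast feed-forward variants are typically trained with soft decisions that are hardened later, which this toy version leaves out, and the depth, widths, and heap-style indexing are illustrative assumptions rather than the method described in the videos.

```python
# Minimal sketch of a binary-tree conditional feed-forward layer: a learned
# single-logit gate at each internal node routes a token left or right, so each
# token evaluates only one of 2**depth leaf networks. Hard routing only.
import torch
import torch.nn as nn


class BinaryTreeFF(nn.Module):
    def __init__(self, d_model=64, d_leaf_hidden=32, depth=3):
        super().__init__()
        self.depth = depth
        # One scalar gate per internal node (2**depth - 1 of them).
        self.node_gates = nn.ModuleList(nn.Linear(d_model, 1) for _ in range(2 ** depth - 1))
        # One small feed-forward block per leaf (2**depth of them).
        self.leaves = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_leaf_hidden), nn.GELU(), nn.Linear(d_leaf_hidden, d_model))
            for _ in range(2 ** depth)
        )

    def forward(self, x):                                    # x: (tokens, d_model)
        node = torch.zeros(x.size(0), dtype=torch.long)      # every token starts at the root (node 0)
        for _ in range(self.depth):
            # Evaluate each token's current gate and step to the left or right child.
            logits = torch.stack([self.node_gates[int(n)](x[i]) for i, n in enumerate(node)]).squeeze(-1)
            go_right = (logits > 0).long()
            node = 2 * node + 1 + go_right                   # heap-style child indexing
        leaf = node - (2 ** self.depth - 1)                  # map tree index to leaf index 0..2**depth-1

        out = torch.zeros_like(x)
        for l, leaf_net in enumerate(self.leaves):
            idx = (leaf == l).nonzero(as_tuple=True)[0]
            if idx.numel():                                  # only tokens routed here touch this leaf network
                out[idx] = leaf_net(x[idx])
        return out


if __name__ == "__main__":
    layer = BinaryTreeFF()
    y = layer(torch.randn(16, 64))
    print(y.shape)                                           # torch.Size([16, 64]); each token used 1 of 8 leaves
```

At depth 3 a token evaluates three gate neurons and one of eight leaf blocks, which is the logarithmic-cost routing behind the inference speed-up discussed in item 13.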