Overview
Explore the concept of Mixture of Experts (MoE) in this 28-minute video lecture. Delve into the rationale behind MoE, how it is trained, and the problems that arise during training. Learn about techniques such as adding noise during training and adjusting the loss function to encourage router evenness. Examine whether MoE is useful for running large language models on laptops and how it might benefit major AI companies. Investigate the binary tree MoE (fast feedforward, or FFF) approach, compare performance data for GPT, MoE, and FFF models, and analyze the inference speed-ups that binary tree MoE offers. Finally, recap whether MoE makes sense overall and why large companies might adopt it for their AI systems.
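For a concrete picture of the two training techniques mentioned above, here is a minimal, hypothetical PyTorch sketch of a noisy top-k router with a load-balancing ("router evenness") auxiliary loss, in the spirit of Shazeer et al. (2017) and the Switch Transformer. It is not code from the lecture; the class, parameter, and variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyTopKRouter(nn.Module):
    """Hypothetical noisy top-k router with a load-balancing auxiliary loss."""

    def __init__(self, d_model: int, n_experts: int, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)   # clean routing logits
        self.noise = nn.Linear(d_model, n_experts)  # input-dependent noise scale
        self.n_experts = n_experts
        self.k = k

    def forward(self, x):
        # x: (n_tokens, d_model)
        logits = self.gate(x)
        if self.training:
            # Add Gaussian noise to the logits during training so the top-k
            # selection is perturbed and under-used experts get explored.
            noise_std = F.softplus(self.noise(x))
            logits = logits + torch.randn_like(logits) * noise_std

        topk_vals, topk_idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)  # mixing weights for the chosen experts

        # "Router evenness" auxiliary loss (Switch-Transformer style):
        # f[i] = fraction of tokens whose top-1 choice is expert i,
        # p[i] = mean router probability assigned to expert i.
        probs = F.softmax(logits, dim=-1)
        top1 = topk_idx[:, 0]
        f = torch.zeros(self.n_experts, device=x.device)
        f.scatter_add_(0, top1, torch.ones_like(top1, dtype=f.dtype))
        f = f / x.shape[0]
        p = probs.mean(dim=0)
        aux_loss = self.n_experts * torch.sum(f * p)

        return topk_idx, weights, aux_loss
```

In practice the auxiliary loss is scaled by a small coefficient and added to the main language-modeling loss, so the router is nudged toward spreading tokens across experts without dominating training.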
Syllabus
GPT-3, GPT-4 and Mixture of Experts
Why Mixture of Experts?
The idea behind Mixture of Experts
How to train MoE
Problems training MoE
Adding noise during training
Adjusting the loss function for router evenness
Is MoE useful for LLMs on laptops?
How might MoE help big companies like OpenAI?
Disadvantages of MoE
Binary tree MoE (fast feedforward); see the code sketch after this syllabus
Data on GPT vs MoE vs FFF
Inference speed-up with binary tree MoE
Recap - Does MoE make sense?
Why might big companies use MoE?
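As a companion to the binary tree MoE (fast feedforward) topics above, here is a minimal, hypothetical PyTorch sketch of hard binary-tree routing at inference time. It is not the lecture's implementation, and all names are assumptions; it only illustrates why a depth-d tree of 2^d leaf experts is cheap to evaluate, since each token touches d node decisions and a single small leaf MLP.

```python
import torch
import torch.nn as nn

class BinaryTreeFFF(nn.Module):
    """Hypothetical fast-feedforward layer: a perfect binary tree of tiny experts."""

    def __init__(self, d_model: int, depth: int, d_hidden: int):
        super().__init__()
        self.depth = depth
        self.n_nodes = 2 ** depth - 1          # internal decision nodes
        n_leaves = 2 ** depth                  # leaf experts
        self.node_decisions = nn.Linear(d_model, self.n_nodes)  # one scalar per node
        self.leaves = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_leaves)
        ])

    @torch.no_grad()
    def forward(self, x):
        # Inference-time hard routing; x: (n_tokens, d_model).
        decisions = self.node_decisions(x)     # (n_tokens, n_nodes)
        out = torch.empty_like(x)
        for t in range(x.shape[0]):
            node = 0                           # start at the root (heap indexing)
            for _ in range(self.depth):
                go_right = bool(decisions[t, node] > 0)
                node = 2 * node + (2 if go_right else 1)
            leaf = node - self.n_nodes         # map heap index to leaf index
            out[t] = self.leaves[leaf](x[t])   # only one leaf MLP runs per token
        return out
```

With depth 6, for example, there are 64 leaf experts, yet each token evaluates only 6 decision scalars and one small MLP, which is where the inference speed-up over a full dense feed-forward layer comes from; training such models typically replaces the hard left/right choice with a softened, differentiable mixture.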
Taught by
Trelis Research