Understanding Mixture of Experts in Large Language Models

Trelis Research via YouTube

Overview

Explore the concept of Mixture of Experts (MoE) in this 28-minute video lecture. Delve into the rationale behind MoE, its training process, and potential challenges. Learn about techniques like adding noise during training and adjusting the loss function to keep the router's expert assignments balanced. Examine whether MoE is practical for large language models running on laptops and how it could benefit major AI companies. Investigate the binary tree MoE (fast feed forward) approach and compare performance data for GPT, MoE, and FFF models. Analyze the inference speedups achievable with binary tree MoE and evaluate the overall viability of MoE in various contexts. Gain insight into why large companies might adopt MoE for their AI systems.
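For orientation, the sketch below illustrates the two training tricks mentioned above: adding noise to the router's scores and an auxiliary load-balancing loss that encourages router evenness. It is a minimal example assuming PyTorch; the class, parameter names, and exact loss formulation are illustrative (Switch Transformer-style balancing) and are not taken from the video itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Token-level mixture of experts with noisy top-k routing (illustrative)."""

    def __init__(self, d_model=64, d_hidden=256, num_experts=4, top_k=2):
        super().__init__()
        self.num_experts, self.top_k = num_experts, top_k
        self.router = nn.Linear(d_model, num_experts)   # one routing logit per expert
        self.noise = nn.Linear(d_model, num_experts)    # learned noise scale (training only)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                               # x: (num_tokens, d_model)
        logits = self.router(x)
        if self.training:
            # Noise keeps the router from collapsing onto a few favourite experts.
            logits = logits + torch.randn_like(logits) * F.softplus(self.noise(x))
        probs = F.softmax(logits, dim=-1)
        top_p, top_i = probs.topk(self.top_k, dim=-1)   # each token picks top_k experts

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_idx, slot = (top_i == e).nonzero(as_tuple=True)
            if token_idx.numel():
                weight = top_p[token_idx, slot].unsqueeze(-1)
                out[token_idx] += weight * expert(x[token_idx])

        # "Router evenness" auxiliary loss: the fraction of tokens whose top-1 choice
        # is expert e, times the mean router probability for expert e, summed over
        # experts and scaled by num_experts. Minimizing it pushes toward even usage.
        dispatch_frac = F.one_hot(top_i[:, 0], self.num_experts).float().mean(dim=0)
        mean_prob = probs.mean(dim=0)
        aux_loss = self.num_experts * (dispatch_frac * mean_prob).sum()
        return out, aux_loss

# Usage: add a small multiple of aux_loss to the main language-modelling loss.
layer = MoELayer()
tokens = torch.randn(8, 64)
output, aux_loss = layer(tokens)
```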

Syllabus

GPT-3, GPT-4 and Mixture of Experts
Why Mixture of Experts?
The idea behind Mixture of Experts
How to train MoE
Problems training MoE
Adding noise during training
Adjusting the loss function for router evenness
Is MoE useful for LLMs on laptops?
How might MoE help big companies like OpenAI?
Disadvantages of MoE
Binary tree MoE (fast feed forward)
Data on GPT vs MoE vs FFF
Inference speed up with binary tree MoE
Recap - Does MoE make sense?
Why might big companies use MoE?

Taught by

Trelis Research
