

Mixture of Experts (MoE) in Large Language Models - A Simple Guide

Discover AI via YouTube

Overview

Learn about Mixture of Experts (MoE) systems in Large Language Models through a 23-minute educational video that breaks down the key concepts using three straightforward examples. Explore the evolution of MoE systems from their 2017 Google Brain origins to current implementations, including detailed explanations of sparsely activated expert systems and their role in computational efficiency. Dive into the technical details of gating networks, including softmax and noisy top-k gating functions, and see how backpropagation trains these systems. Examine the 2022 MegaBlocks work, which improved GPU efficiency through block-sparse matrix operations. Study the Mixtral 8x7B configuration and architecture, including its 4096-dimensional hidden state, 32 layers, and 8 experts per layer. Access recommended academic papers on MegaBlocks and sparsely-gated MoE layers, along with practical implementation resources through the MegaBlocks GitHub repository. Gain insights into data parallelism, model parallelism, and the latest trends in MoE systems, including instruction-tuning advances from 2023.
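
To make the gating mechanism concrete, here is a minimal PyTorch sketch of noisy top-k gating in the spirit of the 2017 sparsely-gated MoE paper, with sizes loosely following the Mixtral 8x7B numbers above (4096-dimensional inputs, 8 experts, top-2 routing). The class name NoisyTopKGate and all hyperparameters are illustrative assumptions, not code from the video or the MegaBlocks repository.

```python
# Illustrative sketch of noisy top-k gating (not the course's or MegaBlocks' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyTopKGate(nn.Module):
    def __init__(self, d_model: int = 4096, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.w_gate = nn.Linear(d_model, n_experts, bias=False)   # clean routing logits
        self.w_noise = nn.Linear(d_model, n_experts, bias=False)  # per-expert noise scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model) token representations
        clean_logits = self.w_gate(x)
        noise_std = F.softplus(self.w_noise(x))
        noisy_logits = clean_logits + torch.randn_like(clean_logits) * noise_std

        # Keep only the top-k experts per token; the rest get -inf so that
        # softmax assigns them zero weight (sparse activation).
        topk_vals, topk_idx = noisy_logits.topk(self.k, dim=-1)
        masked = torch.full_like(noisy_logits, float("-inf"))
        masked.scatter_(-1, topk_idx, topk_vals)
        return F.softmax(masked, dim=-1)  # (batch, n_experts), mostly zeros

# Usage: route a batch of 4 token vectors; each row keeps at most k=2 nonzero weights.
gate = NoisyTopKGate()
weights = gate(torch.randn(4, 4096))
print(weights.count_nonzero(dim=-1))  # tensor([2, 2, 2, 2])
```

In a full MoE layer, these per-token weights would combine the outputs of only the selected experts, which is where the computational savings come from.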

Syllabus

Mixture of Experts LLM - MoE explained in simple terms

Taught by

Discover AI

