

Stanford Seminar - Mixture of Experts Paradigm and the Switch Transformer

Stanford University via YouTube

Overview

Explore the Mixture of Experts (MoE) paradigm and the Switch Transformer in this Stanford seminar. Delve into how MoE departs from traditional deep learning models, which reuse the same parameters for every input, by instead selecting different parameters for each input, yielding sparsely activated models with a vast number of parameters but a roughly constant computational cost per input. Learn about the simplification of MoE routing algorithms, improved model designs with reduced communication and computational costs, and training techniques that address instabilities. Discover how large sparse models can be trained in lower-precision formats, leading to significant increases in pre-training speed. Examine the application of these improvements to multilingual settings and the scaling of language models to a trillion parameters. Gain insights from research scientists Barret Zoph and Irwan Bello as they discuss their work across deep learning topics, including neural architecture search, data augmentation, semi-supervised learning, and model sparsity.
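The routing idea in the overview (each input activates only a small subset of the model's parameters) can be made concrete with a short sketch. The code below is a minimal NumPy illustration of Switch-style top-1 routing, not the implementation presented in the seminar: the names (`switch_layer`, `W_router`) and toy dimensions are assumptions for demonstration, and the auxiliary load-balancing loss and expert capacity limits used in practice are omitted. Adding experts grows the total parameter count, but per-token compute stays roughly constant because only one expert runs per token.

```python
import numpy as np

# Minimal sketch of Switch-style top-1 routing (illustrative only, not the
# authors' implementation): toy dimensions, no load-balancing loss, no
# expert capacity limits.
rng = np.random.default_rng(0)
d_model, d_ff, num_experts = 8, 32, 4

W_router = rng.normal(scale=0.02, size=(d_model, num_experts))
W_in = rng.normal(scale=0.02, size=(num_experts, d_model, d_ff))
W_out = rng.normal(scale=0.02, size=(num_experts, d_ff, d_model))


def switch_layer(tokens):
    """Route each token to exactly one expert feed-forward network."""
    logits = tokens @ W_router                           # [tokens, experts]
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)                # softmax over experts
    expert_idx = probs.argmax(-1)                        # chosen expert per token
    gate = probs[np.arange(len(tokens)), expert_idx]     # scale by router prob

    out = np.zeros_like(tokens)
    for e in range(num_experts):
        mask = expert_idx == e
        if mask.any():
            h = np.maximum(tokens[mask] @ W_in[e], 0.0)  # expert FFN with ReLU
            out[mask] = gate[mask][:, None] * (h @ W_out[e])
    return out


tokens = rng.normal(size=(16, d_model))
print(switch_layer(tokens).shape)  # -> (16, 8)
```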

Syllabus

CS25 | Stanford Seminar 2022 - Mixture of Experts (MoE) paradigm and the Switch Transformer

Taught by

Stanford Online

