Accelerating Distributed MoE Training and Inference with Lina

Overview

Explore a conference talk on accelerating distributed Mixture of Experts (MoE) training and inference with Lina. Learn about the challenges of scaling model parameters and how sparsely activated models make it possible to train larger models at lower cost. Follow a systematic analysis of all-to-all communication overhead in distributed MoE and understand the main causes of bottlenecks in training and inference. Examine how Lina addresses these bottlenecks through tensor partitioning and dynamic resource scheduling. See how Lina improves training step time and reduces inference time compared to state-of-the-art systems, as demonstrated in experiments on an A100 GPU testbed.

Syllabus

USENIX ATC '23 - Accelerating Distributed MoE Training and Inference with Lina

Taught by

USENIX

Related courses

Pre-training Mixtral MoE Model with SageMaker HyperPod - Fine-Tuning and Continued Pre-Training

Metis - Fast Automatic Distributed Training on Heterogeneous GPUs

Legion - Automatically Pushing the Envelope of Multi-GPU System for Billion-Scale GNN Training

BGL - GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing