
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

USENIX via YouTube

Overview

Explore an innovative approach to automating model-parallel training for large deep learning models in this 18-minute conference talk from OSDI '22. Discover how Alpa generates execution plans that unify data, operator, and pipeline parallelism, addressing the limitations of existing model-parallel training systems. Learn about Alpa's hierarchical view of parallelisms, its new space for massive model-parallel execution plans, and the compilation passes designed to derive efficient parallel execution plans. Understand how Alpa's runtime orchestrates two-level parallel execution on distributed compute devices, and examine its performance compared to hand-tuned systems. Gain insights into Alpa's versatility in handling models with heterogeneous architectures and those without manually-designed plans. Access the source code and explore the potential of this groundbreaking approach to scaling out complex deep learning models on distributed systems.
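
For context, the snippet below is a minimal sketch of the decorator-style workflow Alpa describes on top of JAX: the user writes an ordinary single-device training step, and Alpa's compilation passes derive the data, operator, and pipeline parallelization automatically. The `alpa.parallelize` decorator and `alpa.init(cluster="ray")` call follow Alpa's documented usage; the toy linear model, loss, and batch here are hypothetical stand-ins for illustration only.

```python
import jax
import jax.numpy as jnp
import alpa  # https://github.com/alpa-projects/alpa

def loss_fn(params, batch):
    # Hypothetical toy linear model: preds = x @ w + b
    preds = batch["x"] @ params["w"] + params["b"]
    return jnp.mean((preds - batch["y"]) ** 2)

@alpa.parallelize  # Alpa compiles this step into an inter-/intra-operator parallel plan
def train_step(params, batch):
    grads = jax.grad(loss_fn)(params, batch)
    # Plain SGD update; Alpa decides how parameters, activations, and gradients
    # are sharded within operators and pipelined across device meshes.
    return jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, grads)

alpa.init(cluster="ray")  # connect to the Ray cluster hosting the distributed GPUs
params = {"w": jnp.zeros((128, 1)), "b": jnp.zeros((1,))}
batch = {"x": jnp.ones((64, 128)), "y": jnp.ones((64, 1))}
params = train_step(params, batch)  # runs the two-level parallel execution plan
```

The key point is that no manual device placement or communication code appears in the training step; the parallel execution plan is produced by Alpa's compiler rather than hand-tuned by the user.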

Syllabus

OSDI '22 - Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

Taught by

USENIX
