
Alpa: Simple Large Model Training and Inference on Ray

Anyscale via YouTube

Overview

Explore the capabilities of Alpa, a Ray-native library designed for automated training and serving of large models like GPT-3. Discover how Alpa simplifies model-parallel training of complex deep learning models by generating execution plans that unify data, operator, and pipeline parallelism. Learn about Alpa's approach to distributing training across two hierarchical levels of parallelism: inter-operator and intra-operator. Understand how Alpa constructs a new hierarchical space of model-parallel execution plans and uses compilation passes to derive optimal parallel execution plans. Examine Alpa's efficient runtime, which orchestrates two-level parallel execution on distributed compute devices. Compare Alpa's performance with that of hand-tuned model-parallel training systems and explore its versatility in handling models with heterogeneous architectures.

Delve into both the algorithmic aspects and the engineering/system implementation, with a focus on Ray's role as a building block of the Alpa runtime. This 31-minute talk from Anyscale at Ray Summit provides valuable insights into advanced techniques for scaling out complex deep learning models on distributed computing environments.
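The two-level idea described above can be illustrated with a toy NumPy sketch. This is not Alpa's API, and the stage and shard assignments are hand-written here purely for illustration; Alpa's point is that its compilation passes derive such assignments automatically.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))    # a batch of inputs
w1 = rng.standard_normal((16, 32))  # layer 1 weights
w2 = rng.standard_normal((32, 4))   # layer 2 weights

# Reference: run the whole two-layer model on one "device".
ref = np.maximum(x @ w1, 0) @ w2

# Inter-operator (pipeline) parallelism: assign each layer, or stage,
# to a different device group.
def stage1(x):  # runs on device group 0
    return np.maximum(x @ w1, 0)

def stage2(h):  # runs on device group 1
    return h @ w2

# Intra-operator parallelism: within stage 1, shard w1 column-wise
# across two devices; each device computes a slice of the activations,
# which are then concatenated (in a real system, via communication).
w1_shards = np.split(w1, 2, axis=1)
h_parts = [np.maximum(x @ s, 0) for s in w1_shards]
h = np.concatenate(h_parts, axis=1)
out = stage2(h)

# The two-level parallel plan produces the same result as the
# single-device execution.
assert np.allclose(out, ref)
```

The sketch shows why the two levels compose cleanly: sharding an operator changes where its work runs but not its result, so stages can be placed on device groups independently of how each stage is sharded internally.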

Syllabus

Alpa - Simple large model training and inference on Ray

Taught by

Anyscale

