Pollux - Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning

USENIX via YouTube

Overview

This 14-minute conference talk from OSDI '21 presents Pollux, a co-adaptive cluster scheduler that optimizes goodput for deep learning training in shared clusters. Pollux considers per-job and cluster-wide factors together when allocating resources. The talk introduces goodput, a metric that combines system throughput with statistical efficiency, and explains how Pollux dynamically reassigns resources to improve overall cluster performance. It describes how this approach can reduce average job completion times, promote fairness, and potentially lower costs in cloud environments. The talk also covers background on distributed deep learning, the impact of batch size on system throughput and statistical efficiency, the key components of Pollux's cluster scheduler, and the evaluation results.
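
To make the goodput idea concrete, here is a minimal Python sketch. It assumes goodput is the product of system throughput and statistical efficiency, and it assumes a simple efficiency model based on a gradient noise scale. The overview above does not give a formula, so the function names, the efficiency model, and all numbers below are illustrative assumptions, not Pollux's actual implementation or measurements.

```python
def statistical_efficiency(batch_size, base_batch_size, noise_scale):
    """Assumed model: training progress per example at batch_size,
    relative to base_batch_size. Equals 1.0 at the base batch size and
    falls as the batch grows beyond it."""
    return (noise_scale + base_batch_size) / (noise_scale + batch_size)


def goodput(throughput, batch_size, base_batch_size, noise_scale):
    """Assumed combination: examples/second scaled by how useful each example is."""
    return throughput * statistical_efficiency(batch_size, base_batch_size, noise_scale)


# Hypothetical measurements: batch size -> examples processed per second.
throughput_by_batch = {128: 1000.0, 512: 2500.0, 2048: 4000.0}

BASE_BATCH_SIZE = 128   # hypothetical batch size the job was tuned for
NOISE_SCALE = 1000.0    # hypothetical gradient noise scale

goodput_by_batch = {
    batch: goodput(tput, batch, BASE_BATCH_SIZE, NOISE_SCALE)
    for batch, tput in throughput_by_batch.items()
}

# The batch size with the highest throughput is not the one with the highest goodput.
fastest_batch = max(throughput_by_batch, key=throughput_by_batch.get)
best_batch = max(goodput_by_batch, key=goodput_by_batch.get)
print("highest throughput at batch size", fastest_batch)
print("highest goodput at batch size", best_batch)
```

With these made-up numbers, the largest batch processes the most examples per second, but a mid-sized batch makes the most training progress per second. That is the trade-off the "Key Idea: Goodput, not Throughput" part of the talk refers to.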

Syllabus

Intro
Deep Learning Training in Shared Clusters
Example Shared-Cluster DL Training Workflow
Pollux: Co-adaptive Cluster Scheduler for DL
Outline
Background: Distributed DL (Data Parallelism)
System Throughput and Impact of Batch Size
Statistical Efficiency and Impact of Batch Size
Illustration of Overall Training Performance
Implications for Cluster Scheduling
Pollux Cluster Scheduler
Key Idea: Goodput, not Throughput
Modeling System Throughput
Modeling Statistical Efficiency
Optimizing Cluster-Wide Allocations
Evaluation of Pollux
Cluster-Wide Statistical Efficiency
More Experiments in our Paper!
Conclusion

Taught by

USENIX
