YouTube videos curated by Class Central.
Classroom Contents
Pollux - Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning
- 1 Intro
- 2 Deep Learning Training in Shared Clusters
- 3 Example Shared-Cluster DL Training Workflow
- 4 Pollux: Co-adaptive Cluster Scheduler for DL
- 5 Outline
- 6 Background: Distributed DL (Data Parallelism)
- 7 System Throughput and Impact of Batch Size
- 8 Statistical Efficiency and Impact of Batch Size
- 9 Illustration of Overall Training Performance
- 10 Implications for Cluster Scheduling
- 11 Pollux Cluster Scheduler
- 12 Key Idea: Goodput, not Throughput
- 13 Modeling System Throughput
- 14 Modeling Statistical Efficiency
- 15 Optimizing Cluster-Wide Allocations
- 16 Evaluation of Pollux
- 17 Cluster-Wide Statistical Efficiency
- 18 More Experiments in our Paper!
- 19 Conclusion
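The chapters on goodput, system throughput, and statistical efficiency (items 7, 8, 12 to 14) describe a trade-off that a small calculation makes concrete. Below is a minimal sketch, not the talk's implementation. It assumes goodput is system throughput multiplied by statistical efficiency. It also assumes efficiency at batch size `M` relative to an initial batch size `M0` has the form `(phi + M0) / (phi + M)`, where `phi` is the gradient noise scale; this is our reading of the Pollux paper. All function names and throughput numbers are hypothetical and for illustration only.

```python
def statistical_efficiency(batch_size: float, init_batch_size: float, noise_scale: float) -> float:
    """Assumed efficiency model: (phi + M0) / (phi + M).

    Equals 1.0 at the initial batch size and decreases as the batch grows;
    a larger gradient noise scale (phi) makes large batches less wasteful.
    """
    return (noise_scale + init_batch_size) / (noise_scale + batch_size)


def goodput(throughput: float, batch_size: float, init_batch_size: float, noise_scale: float) -> float:
    """Goodput = raw throughput (examples/s) scaled by statistical efficiency."""
    return throughput * statistical_efficiency(batch_size, init_batch_size, noise_scale)


def best_batch_size(candidates: dict, init_batch_size: float, noise_scale: float) -> int:
    """Pick the batch size that maximizes goodput (not raw throughput)."""
    return max(
        candidates,
        key=lambda m: goodput(candidates[m], m, init_batch_size, noise_scale),
    )


if __name__ == "__main__":
    # Hypothetical measurements: batch size -> system throughput (examples/s).
    candidates = {128: 1000.0, 256: 1500.0, 512: 1800.0}
    init_batch_size, noise_scale = 128, 256.0  # hypothetical M0 and phi

    for m, tput in candidates.items():
        print(m, tput, goodput(tput, m, init_batch_size, noise_scale))

    print("max throughput:", max(candidates, key=candidates.get))  # 512
    print("max goodput:", best_batch_size(candidates, init_batch_size, noise_scale))  # 256
```

With these made-up numbers, batch size 512 has the highest throughput but an efficiency of only 0.5, giving a goodput of 900. Batch size 256 yields 1500 × 0.75 = 1125, so a goodput-driven choice picks the smaller batch.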