Pollux - Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning


USENIX via YouTube


Classroom Contents



  1. Intro
  2. Deep Learning Training in Shared Clusters
  3. Example Shared-Cluster DL Training Workflow
  4. Pollux: Co-adaptive Cluster Scheduler for DL
  5. Outline
  6. Background: Distributed DL (Data Parallelism)
  7. System Throughput and Impact of Batch Size
  8. Statistical Efficiency and Impact of Batch Size
  9. Overall Training Performance
  10. Implications for Cluster Scheduling
  11. Pollux Cluster Scheduler
  12. Key Idea: Goodput, not Throughput
  13. Modeling System Throughput
  14. Modeling Statistical Efficiency
  15. Optimizing Cluster-Wide Allocations
  16. Evaluation of Pollux
  17. Cluster-Wide Statistical Efficiency
  18. More Experiments in our Paper!
  19. Conclusion
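The key idea in the outline above (item 12) is to schedule for goodput rather than raw throughput: goodput is system throughput scaled by the statistical efficiency of the total batch size. The following is a minimal sketch of that metric, assuming the gradient-noise-scale form of efficiency described in the Pollux paper; all numeric parameter values here are hypothetical, not taken from the talk.

```python
def efficiency(M, M0, phi):
    """Statistical efficiency of training with total batch size M,
    relative to a baseline batch size M0, using the gradient noise
    scale phi. Efficiency is 1.0 at M == M0 and decays as M grows,
    since larger batches yield diminishing progress per example."""
    return (phi + M0) / (phi + M)

def goodput(throughput, M, M0, phi):
    """Goodput = system throughput (examples/sec) x statistical
    efficiency, i.e. useful training progress per second."""
    return throughput * efficiency(M, M0, phi)

# Illustrative (hypothetical) values: scaling from 128 to 1024 examples
# per step may raise throughput, but efficiency drops, so a scheduler
# comparing goodput can prefer the smaller allocation.
base = goodput(throughput=1000.0, M=128, M0=128, phi=500.0)
scaled = goodput(throughput=4000.0, M=1024, M0=128, phi=500.0)
```

This is only a sketch of the metric itself; the talk additionally covers how Pollux fits the throughput and efficiency models online and optimizes allocations cluster-wide (items 13 to 15).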
