Themis - Fair and Efficient GPU Cluster Scheduling

USENIX via YouTube

Overview

This 22-minute conference talk from USENIX NSDI '20 introduces Themis, a scheduling framework for GPU clusters that run distributed machine learning workloads. It covers the challenges of allocating GPUs fairly and efficiently across multiple ML jobs, and shows how Themis addresses them with a two-level scheduling architecture. The talk explains the finish-time fairness metric and how it is enforced through an auction-based resource allocation mechanism. It then compares Themis with existing schedulers, reporting improvements in both fairness and cluster efficiency. Viewers come away with an understanding of GPU cluster scheduling, resource allocation strategies, and the specific needs of ML training workloads in shared environments.
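To make the finish-time fairness idea concrete, here is a minimal Python sketch. It assumes the metric is the ratio of a job's finish time in the shared cluster to its finish time on an exclusive, equal share of the cluster, which is how the accompanying paper is generally summarized. The function name, job names, and timings are hypothetical and are not taken from the talk or from the Themis code.

```python
def finish_time_fairness(t_shared: float, t_independent: float) -> float:
    """Ratio of shared-cluster finish time to exclusive-share finish time.

    Assumption: a value at or below 1.0 means the job does no worse by
    sharing; a value above 1.0 means sharing is slowing the job down.
    """
    if t_independent <= 0:
        raise ValueError("t_independent must be positive")
    return t_shared / t_independent


# Hypothetical jobs: (finish time when sharing, finish time on an exclusive share)
jobs = {
    "job_a": (120.0, 100.0),
    "job_b": (80.0, 100.0),
}

# Compute the ratio per job and find the job treated least fairly so far.
rhos = {name: finish_time_fairness(ts, ti) for name, (ts, ti) in jobs.items()}
worst_off = max(rhos, key=rhos.get)

for name, rho in rhos.items():
    print(f"{name}: rho = {rho:.2f}")
print("worst-off job:", worst_off)
```

A scheduler built around this metric would plausibly prioritize the jobs with the highest ratios when GPUs become available. The actual auction mechanism described in the talk is more involved than this sketch.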

Syllabus

Intro
Deep Learning at a Large Enterprise
GPU Cluster Scheduler: Goal
Existing GPU Cluster Schedulers
GPU Cluster Scheduler: Drawback 2
GPU Cluster Scheduler: Requirement 2
Towards a new GPU Cluster Scheduler
Themis: Metric
Themis: Finish-Time Fairness Metric
Themis: Interface
Strawman Mechanism: Issues
Themis: Mechanism: Partial Allocation Auction
Themis: Overall Design
Themis: Implementation
Themis: Evaluation
Macrobenchmark: Sharing Incentive
Macrobenchmark: Efficiency
Conclusion

Taught by

USENIX
