Precision Matters: Scheduling GPU Workloads on Kubernetes
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore a comprehensive conference talk on optimizing GPU workload scheduling in Kubernetes environments. Dive into Uber's approach to supporting AI/ML workloads, including LLM training, with GPU acceleration. Learn about implementing the NVIDIA device plugin for GPU resource management, using cAdvisor to collect GPU metrics, and employing scheduler plugins to distribute workloads efficiently across heterogeneous clusters. Discover strategies for handling different GPU SKUs, implementing precise scheduling algorithms, and balancing load-aware scheduling with bin-packing techniques. Gain insights into future developments, including fractional GPU support, topology-aware scheduling, and expanded support for other GPU vendors such as AMD and Intel.
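To make the device-plugin mechanism mentioned above concrete, here is a minimal Go sketch using client-go that creates a Pod requesting whole GPUs via the extended resource nvidia.com/gpu and targeting a specific GPU SKU with a node selector. This is an illustrative assumption of how such a request might look, not Uber's actual setup; the gpu.sku label, namespace, and container image are hypothetical placeholders.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig from the default location (assumes running outside the cluster).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "llm-training-worker"},
		Spec: corev1.PodSpec{
			// Target a specific GPU SKU via a node label; "gpu.sku" is a
			// hypothetical label name used here for illustration.
			NodeSelector: map[string]string{"gpu.sku": "a100"},
			Containers: []corev1.Container{{
				Name:  "trainer",
				Image: "example.com/llm-trainer:latest", // placeholder image
				Resources: corev1.ResourceRequirements{
					Limits: corev1.ResourceList{
						// The NVIDIA device plugin advertises GPUs as the
						// extended resource "nvidia.com/gpu"; without fractional
						// GPU support, only whole GPUs can be requested.
						"nvidia.com/gpu": resource.MustParse("2"),
					},
				},
			}},
			RestartPolicy: corev1.RestartPolicyNever,
		},
	}

	created, err := clientset.CoreV1().Pods("default").Create(context.TODO(), pod, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("created pod:", created.Name)
}
```

On the scheduling side, a bias toward bin packing can be expressed through kube-scheduler's scoring configuration (for example, the NodeResourcesFit plugin with a MostAllocated scoring strategy), though the specific scheduler plugins and trade-offs Uber uses are discussed in the talk itself.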
Syllabus
Precision Matters: Scheduling GPU Workloads on Kubernetes - Amit Kumar & Gaurav Kumar, Uber
Taught by
CNCF [Cloud Native Computing Foundation]