Multitenancy and Fairness at Scale with Kueue - A Kubernetes Batch System Case Study
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how Kueue implements job queueing and resource management in Kubernetes in this technical conference talk. Explore how Kueue extends Kubernetes and the Cluster Autoscaler to create an end-to-end batch system, with a focus on sharing resources fairly among multiple teams. Discover the fair sharing feature in the v0.7 release, designed for large ML platforms, which lets organizations model their team structures while maintaining high resource utilization and equitable access. Understand how the system guarantees quotas through preemption, review the performance optimizations in Kueue v0.7 and Kubernetes v1.31 that enable high throughput, and examine the challenges of a real production implementation. Gain insight into the design considerations behind fair sharing and preemption, along with future plans for supporting complex hierarchical structures.
Syllabus
Multitenancy and Fairness at Scale with Kueue: A Case Study - Aldo Culquicondor & Rajat Phull
Taught by
CNCF [Cloud Native Computing Foundation]