Incremental GPU Slicing in Kubernetes Clusters - Dynamic Resource Management
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Watch a technical conference talk exploring how to implement incremental GPU slicing for large language model inference services. Learn about replacing the Multi-Instance GPU (MIG) manager with an open-source incremental-slicing controller that enables dynamic GPU resource allocation without requiring new APIs or device plugin modifications. Discover how GPU vendors are developing dynamic slicing capabilities that let workloads request fractional compute and memory units on demand, and understand the ongoing work by the Kubernetes Device Management Working Group to expose these features. Gain practical insights into achieving incremental slicing in GPU clusters to reduce costs through dynamic model selection and better resource utilization.
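For context, the minimal sketch below shows how a workload typically requests a GPU slice today: as a named extended resource in a Pod spec. The resource name (nvidia.com/mig-1g.5gb, one of NVIDIA's fixed MIG profiles) and the container image are illustrative assumptions, not details from the talk; the incremental-slicing approach discussed in the session aims to let such slices be carved out on demand rather than pre-partitioned by a MIG manager.

```python
# Minimal sketch, assuming the NVIDIA device plugin's "mixed" MIG strategy:
# a Pod requests one 1g.5gb slice (roughly 1/7 of an A100's compute and
# 5 GB of its memory) as an extended resource. The image name is a
# placeholder.
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "llm-inference"},
    "spec": {
        "containers": [
            {
                "name": "server",
                "image": "llm-inference:latest",  # hypothetical image
                "resources": {
                    # The slice is requested like any other device resource;
                    # the exact resource name depends on how the cluster's
                    # slicing controller or device plugin exposes profiles.
                    "limits": {"nvidia.com/mig-1g.5gb": 1}
                },
            }
        ]
    },
}

# Print the manifest so it can be piped to `kubectl apply -f -`.
print(json.dumps(pod, indent=2))
```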
Syllabus
Incremental GPU Slicing in Action - Abhishek Malvankar & Olivier Tardieu, IBM Research
Taught by
CNCF [Cloud Native Computing Foundation]