Incremental GPU Slicing in Kubernetes Clusters - Dynamic Resource Management
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Watch a technical conference talk exploring how to implement incremental GPU slicing for large language model inference services. Learn about replacing the Multi-Instance GPU (MIG) manager with an open-source incremental-slicing controller that enables dynamic GPU resource allocation without requiring new APIs or device plugin modifications. Discover how GPU vendors are developing dynamic slicing capabilities that let workloads request fractional compute and memory units on demand, and understand the ongoing work by the Kubernetes Device Management Working Group to expose these features. Gain practical insights into achieving incremental slicing in GPU clusters to reduce costs through dynamic model selection and better resource utilization.
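For context, the minimal sketch below shows how a workload typically requests a GPU slice today: as a named extended resource in a Pod spec. The resource name (nvidia.com/mig-1g.5gb, one of NVIDIA's fixed MIG profiles) and the container image are illustrative assumptions, not details from the talk; the incremental-slicing approach discussed in the session aims to let such slices be carved out on demand rather than pre-partitioned by a MIG manager.

```python
# Minimal sketch, assuming the NVIDIA device plugin's "mixed" MIG strategy:
# a Pod requests one 1g.5gb slice (roughly 1/7 of an A100's compute and
# 5 GB of its memory) as an extended resource. The image name is a
# placeholder.
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "llm-inference"},
    "spec": {
        "containers": [
            {
                "name": "server",
                "image": "llm-inference:latest",  # hypothetical image
                "resources": {
                    # The slice is requested like any other device resource;
                    # the exact resource name depends on how the cluster's
                    # slicing controller or device plugin exposes profiles.
                    "limits": {"nvidia.com/mig-1g.5gb": 1}
                },
            }
        ]
    },
}

# Print the manifest so it can be piped to `kubectl apply -f -`.
print(json.dumps(pod, indent=2))
```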
Syllabus
Incremental GPU Slicing in Action - Abhishek Malvankar & Olivier Tardieu, IBM Research
Taught by
CNCF [Cloud Native Computing Foundation]