Incremental GPU Slicing in Kubernetes Clusters - Dynamic Resource Management

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Watch a technical conference talk on implementing incremental GPU slicing for large language model inference services. Learn how replacing Multi-Instance GPU (MIG) managers with an open-source incremental-slicing controller enables dynamic GPU resource allocation without new APIs or device plugin modifications. Discover how GPU vendors are developing dynamic slicing capabilities that let workloads request fractional compute and memory units on demand, and follow the current work of the Kubernetes Device Management Working Group to expose these features. Gain practical insights into achieving incremental slicing in GPU clusters to reduce costs through dynamic model selection and improved resource utilization.

Syllabus

Incremental GPU Slicing in Action - Abhishek Malvankar & Olivier Tardieu, IBM Research

Taught by

CNCF [Cloud Native Computing Foundation]
