Accelerating AI Workloads with GPUs in Kubernetes

Overview

In this keynote presentation, explore the challenges of using GPUs in Kubernetes to accelerate AI workloads, along with the solutions to them. Gain insights into essential GPU resource-sharing mechanisms, flexible accelerator configuration techniques, and advanced scheduling and resource management strategies. Learn about the key capabilities needed to address efficiency, configuration, extensibility, and scalability challenges in supporting next-generation AI applications on Kubernetes. Discover the potential for Kubernetes to become the leading platform for accelerated AI/ML in the cloud, much as Linux came to dominate the datacenter. Understand which capabilities are currently supported and where improvement is still needed for scaling multi-node AI/ML jobs in large production clusters.