Increasing GPU Utilization on Kubernetes Clusters for AI/ML Workloads

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

This conference talk covers strategies for raising GPU utilization in large-scale Kubernetes clusters dedicated to AI/ML workloads. It describes how open-source tools were used to improve the efficiency of 10,000 A100 GPUs spread across 20 on-premises Kubernetes clusters. The optimizations span four layers: hardware-level optimization with NVIDIA MIG (Multi-Instance GPU), scheduler improvements with Volcano, application-layer changes using PaddlePaddle for smarter distribution of training jobs, and multi-cluster management with Armada. The talk also shares pitfalls, best practices, and recommendations drawn from four large-scale projects completed in Q4 2023.

Syllabus

Increasing GPU Utilisation on K8s Clusters Dedicated for AI/ML Workloads

Taught by

CNCF [Cloud Native Computing Foundation]
