Unlocking the Future of GPU Scheduling in Kubernetes with Reinforcement Learning
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore a conference talk on GPU scheduling for Kubernetes using Reinforcement Learning (RL), presented by speakers from Adobe Systems and Disney Plus Hotstar. Learn how to address the challenges of scaling multi-GPU setups for large-scale machine learning projects, focusing on common issues such as resource fragmentation and low utilization that hurt performance and raise costs. Discover why the talk proposes RL as a good fit: it can adapt to dynamic environments and handle complex, multi-dimensional objectives within Kubernetes clusters. Gain insight into the current GPU scheduling landscape, examine RL algorithms for scheduling, and understand how they can be implemented in Kubernetes, including potential applications of Reinforcement Learning from Human Feedback (RLHF).
Syllabus
Unlocking the Future of GPU Scheduling in Kubernetes with Reinforcemen... - Nikunj Goyal & Aditi Gupta
Taught by
CNCF [Cloud Native Computing Foundation]