Explore a 39-minute conference talk on the infrastructure behind NVIDIA's GeForce NOW, a game streaming platform serving over 20 million gamers worldwide. Learn how Kubernetes forms the backbone of that infrastructure, managing game workloads and containerized services across regional clusters equipped with tens of thousands of GPUs. Discover the GPU maintenance API that NVIDIA developed for automated lifecycle management, which coordinates driver updates, GPU maintenance, and Kubernetes upgrades at scale. See how KubeVirt and Kubernetes power GeForce NOW, how self-healing operators detect and remediate failures, and how NVIDIA approaches automated GPU maintenance in Kubernetes environments.
All Your GPUs Are Belong to Us: An Inside Look at NVIDIA's Self-Healing GeForce NOW Infrastructure
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Syllabus
All-Your-GPUs-Are-Belong-to-Us: An Inside Look at NVIDIA's Self-H... Ryan Hallisey & Piotr Prokop PL
Taught by
CNCF [Cloud Native Computing Foundation]