AI and ML: The Critical Operational Side of Running Applications in Kubernetes
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore the operational challenges of running AI and ML applications, and ways to address them, in this 28-minute conference talk from CNCF. Dive into critical aspects such as managing compute resources and GPU workloads, maintaining reliability, separating datasets, and isolating training processes. Learn how a service mesh can address real-world ML application challenges, streamline operations, and improve observability. Follow along as Principal Rob Koch demonstrates practical implementations using Linkerd across multiple Kubernetes clusters, covering essential topics such as IPv6 integration, GPU utilization, multitenancy, and scaling strategies for ML deployments.
Syllabus
AI and ML: Let’s Talk About the Boring (yet Critical!) Operational Side - Rob Koch & Milad Vafaeifard
Taught by
CNCF [Cloud Native Computing Foundation]