Enabling HPC and ML Workloads with Latest Kubernetes Job Features

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Explore the latest Kubernetes Job API features for running distributed batch, AI, and HPC workloads at scale in this conference talk. Learn how Indexed Jobs simplify parallel workloads that require pod-to-pod communication, with examples from DeepMind's distributed machine learning applications. Discover how the Flux Operator orchestrates HPC workloads by creating a "MiniCluster" inside Kubernetes. Understand how a Pod Failure Policy lets a Job keep running through pod disruptions while optimizing costs. Gain insights from real-world experience at DeepMind and Lawrence Livermore National Laboratory on managing complex computational workloads in Kubernetes.
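
For orientation, the two Job features named above can be sketched as a single manifest. This is a minimal illustration, not material from the talk: the helper name `indexed_job_manifest`, the job name, the image, and the command are all hypothetical, and the `subdomain` setting assumes a headless Service with the same name exists so that pods can resolve each other by hostname.

```python
import json


def indexed_job_manifest(name: str, workers: int, image: str) -> dict:
    """Build a batch/v1 Job manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Indexed mode gives each pod a stable index, exposed to the
            # container as the JOB_COMPLETION_INDEX environment variable.
            "completionMode": "Indexed",
            "completions": workers,
            "parallelism": workers,
            "backoffLimit": 3,
            # Pod failures caused by disruptions (for example node drain or
            # preemption) are ignored instead of counting toward backoffLimit.
            "podFailurePolicy": {
                "rules": [
                    {
                        "action": "Ignore",
                        "onPodConditions": [{"type": "DisruptionTarget"}],
                    }
                ]
            },
            "template": {
                "spec": {
                    # podFailurePolicy requires restartPolicy: Never.
                    "restartPolicy": "Never",
                    # Assumption: a headless Service named `name` exists, so
                    # each pod is reachable by its <job-name>-<index> hostname.
                    "subdomain": name,
                    "containers": [
                        {
                            "name": "worker",
                            "image": image,
                            "command": [
                                "sh",
                                "-c",
                                "echo worker $JOB_COMPLETION_INDEX",
                            ],
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON, which kubectl accepts as well as YAML.
    print(json.dumps(indexed_job_manifest("demo-job", 4, "busybox:1.36"), indent=2))
```

Saving the printed JSON to a file and passing it to `kubectl apply -f` would be one way to try it on a test cluster. The worker count and retry limit here are placeholders rather than recommendations.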

Syllabus

Enabling HPC & ML Workloads with the Latest Kubernetes Job Features - Michał Woźniak & Vanessa Sochat

Taught by

CNCF [Cloud Native Computing Foundation]
