Distributed Cache Empowers AI/ML Workloads on Kubernetes Cluster

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Watch a 33-minute conference talk exploring how distributed cache systems enhance AI/ML workloads in Kubernetes environments. Learn about the critical role of storage technologies in AI/ML operations, focusing on the challenge of achieving high read performance when moving datasets from storage to AI accelerators. Discover a distributed cache system designed specifically for AI/ML workloads, deployed on a multi-tenant Kubernetes cluster with more than 1,024 GPUs. Explore real-world solutions developed over two years of operation, addressing challenges in I/O libraries, load balancing, and storage backends. Gain insights into how the system achieves 50+ GB/s throughput and sub-2 ms latency, through practical examples and implementation strategies shared by engineers from Preferred Networks, Inc.

Syllabus

Distributed Cache Empowers AI/ML Workloads on Kubernetes Cluster - Yuichiro Ueno & Toru Komatsu

Taught by

CNCF [Cloud Native Computing Foundation]
