Distributed Cache Empowers AI/ML Workloads on Kubernetes Cluster
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Watch a 33-minute conference talk exploring how distributed cache systems enhance AI/ML workloads in Kubernetes environments. Learn about the critical role of storage technologies in AI/ML operations, focusing on the challenge of achieving high read performance when moving datasets from storage to AI accelerators. Discover a distributed cache system designed specifically for AI/ML workloads, deployed on a multi-tenant Kubernetes cluster with more than 1,024 GPUs. Explore real-world solutions developed over two years of operation, addressing challenges in I/O libraries, load balancers, and storage backends. Gain insights into how the system achieves 50+ GB/s throughput and sub-2ms latency, through practical examples and implementation strategies shared by engineers from Preferred Networks, Inc.
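To make the idea concrete, the sketch below illustrates the read-through caching pattern such a system typically relies on: training workers try a cluster-local cache first and fall back to the slower storage backend on a miss, populating the cache so later epochs are served locally. This is an illustrative sketch only, not the system presented in the talk; the endpoint URLs, key scheme, and HTTP-based protocol are assumptions.

import requests  # any HTTP client works; used here for brevity

CACHE_URL = "http://cache.example.internal"      # hypothetical cache service
BACKEND_URL = "http://storage.example.internal"  # hypothetical storage backend

def read_sample(key: str) -> bytes:
    """Read one dataset sample, preferring the distributed cache.

    Cache hit: served from cluster-local storage at low latency.
    Cache miss: fetch from the backend, then write through so
    subsequent epochs (and other tenants' reads) hit the cache.
    """
    resp = requests.get(f"{CACHE_URL}/{key}", timeout=2)
    if resp.status_code == 200:
        return resp.content  # cache hit
    # Cache miss: fall back to the slower storage backend ...
    data = requests.get(f"{BACKEND_URL}/{key}", timeout=30).content
    # ... and populate the cache for future reads.
    requests.put(f"{CACHE_URL}/{key}", data=data, timeout=30)
    return data

In a real deployment the cache would be spread across many nodes behind a load balancer, which is where the operational challenges the talk describes (I/O libraries, load balancing, backend pressure) come into play.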
Syllabus
Distributed Cache Empowers AI/ML Workloads on Kubernetes Cluster - Yuichiro Ueno & Toru Komatsu
Taught by
CNCF [Cloud Native Computing Foundation]