Vivo's AI Computing Platform on Kubernetes

Overview

Explore Vivo's AI computing platform, built on Kubernetes to address the challenges of resource scheduling for large-scale distributed model training and serving while keeping GPU utilization high. Learn how the platform serves hundreds of engineers and researchers, manages thousands of GPU nodes, deploys hundreds of services, and runs numerous ML jobs every day. Discover how it is implemented with Kubernetes, kube-batch, Kubeflow, and other open-source software, along with the issues the team encountered, the best practices that emerged, and the team's contributions to the open-source community. Gain practical knowledge about managing AI workloads efficiently and optimizing resource utilization in production at one of the world's largest smartphone companies.