Democratizing Machine Learning on Kubernetes

Overview

Explore how machine learning can be democratized on Kubernetes in this 38-minute Docker conference talk. Learn about data and model parallelization, distributed TensorFlow training, and TensorFlow tools. See demos covering the training environment, model performance, and distributed training results, then examine the compute-to-communication ratio and other observations. Investigate potential improvements, including Uber's cluster performance, FreeFlow on CNI, GPU resource scheduling, and Fast AI. Gain insight into why making machine learning more accessible and efficient on Kubernetes matters.

Syllabus

Introduction
Who are we
Why is this important
Data Parallelization
Model Parallelization
Distributed TensorFlow Training
TensorFlow Tools
Demos
Training Environment
Model Performance
Distributed Training
Distributed Training Results
Compute to Communication Ratio
Other Observations
How Can We Improve
Ubers
Cluster Performance
FreeFlow on CNI
GPU Resource Scheduler
Fast AI