Overview
Explore a conference talk on building a multi-cloud machine learning platform on Kubernetes for autonomous vehicle development. Discover how Momenta manages training data across diverse environments, handles multi-user and gang scheduling, and supports heterogeneous hardware. Learn what it takes to train ML models across on-premises regions and public clouds with different GPUs and network interconnects such as InfiniBand and RoCE. See why hardware-accelerated machine learning is critical to autonomous vehicle problems such as tracking and classification, and which strategies the speakers use to manage multi-cloud complexity and make ML workflows more efficient.
Syllabus
Multi-Cloud Machine Learning Data and Workflow with Kubernetes - Lei Xue & Fei Xue
Taught by
Linux Foundation