A Scalable Platform for Training and Inference Using Kubeflow at CERN
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore a conference talk on the implementation of a Kubeflow-based machine learning platform at CERN. Learn how this scalable platform covers each stage of the machine learning workflow: data preparation, interactive analysis, distributed training, and inference. Discover the requirements for model repositories, versioning, and metadata when running modern ML and AI workloads on cloud-native technologies. Follow a real-world use case from the ATLAS experiment at CERN that shows the benefits and challenges of using cloud-native technologies in scientific research. See how machine learning is deployed in production at CERN for simulation, anomaly detection, and physics analysis, and how Kubeflow addresses the evolving needs of complex scientific computing environments.
Syllabus
A Scalable Platform for Training and Inference Using Kubeflow at CERN - Philipp Gadow, Diana Gaponcic
Taught by
CNCF [Cloud Native Computing Foundation]