

Building and Managing a Centralized ML Platform with Kubeflow at CERN

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

This 31-minute conference talk covers how CERN built and manages a centralized machine learning platform on Kubeflow. CERN applies ML to problems including particle classification, simulation data generation, and beam calibration. The talk introduces a recently launched centralized service that handles data preparation, model training, and serving while optimizing resource usage across different types of accelerators. It describes CERN's experience running Kubeflow on Kubernetes, the integration of on-premises resources, and possible extensions to public clouds, along with cluster layout, deployment strategies, integrations, and automation of distributed training. The talk also explains the motivations behind the platform and includes a demo of job submission and results.

Syllabus

Introduction
Introductions
What is CERN
Motivation for our service
Reconstruction
Simulations
Goals
Platform
Cluster Layout
Deployment
Integrations
Issues
Burst to Public Clouds
Automating Distributed Training
Service Dashboard
Demo
Submitting jobs
Results
Closing remarks

Taught by

CNCF [Cloud Native Computing Foundation]
