Multi-Instance GPU Deployment for Machine Learning and Particle Beam Simulations at CERN
OpenInfra Foundation via YouTube
Overview
Learn about CERN's implementation of Multi-Instance GPU (MIG) capabilities in a 28-minute conference talk that explores the deployment of NVIDIA A100 GPUs in their private cloud infrastructure. Discover various deployment models, including PCI passthrough and virtual GPUs, and understand their advantages and challenges in supporting applications ranging from machine learning to proton beam simulations. Gain insight into how CERN provisions centrally managed GPU resources for its user community, with detailed explanations of the different deployment approaches and the model that was ultimately chosen. Follow along as speaker Ulrich Schwickerath breaks down the technical aspects of GPU provisioning, stability considerations, and future development plans for CERN's cloud infrastructure.
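For context on what MIG partitioning looks like from a user's perspective, here is a minimal sketch (not taken from the talk) that enumerates GPUs and any MIG instances exposed on a host using the nvidia-ml-py (pynvml) bindings. It assumes the NVIDIA driver and pynvml are installed; on a MIG-enabled A100, each GPU instance appears as a separate device that a scheduler or cloud layer can hand out independently.

```python
# Illustrative sketch only: list physical GPUs and the MIG devices carved
# out of them via the pynvml (nvidia-ml-py) bindings. Assumes an NVIDIA
# driver with NVML support is present on the host.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        gpu = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(gpu)
        if isinstance(name, bytes):          # older bindings return bytes
            name = name.decode()

        try:
            current_mode, _pending = pynvml.nvmlDeviceGetMigMode(gpu)
        except pynvml.NVMLError:             # pre-Ampere cards have no MIG support
            current_mode = pynvml.NVML_DEVICE_MIG_DISABLE

        mig_on = current_mode == pynvml.NVML_DEVICE_MIG_ENABLE
        print(f"GPU {i}: {name}, MIG enabled: {mig_on}")

        if mig_on:
            # Walk the MIG devices currently configured on this physical GPU.
            for j in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
                try:
                    mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, j)
                except pynvml.NVMLError:      # slot not populated
                    continue
                mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
                print(f"  MIG {j}: {mem.total // 2**20} MiB total memory")
finally:
    pynvml.nvmlShutdown()
```

This is only one way to inspect a MIG layout; the talk itself covers how such partitions are provisioned and exposed through CERN's cloud tooling rather than through ad hoc scripts.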
Syllabus
Introduction
Outline
Cloud Infrastructure
GPU jobs
GPU types
GPU deployment models
GPU deployment overview
GPU provisioning
Virtual GPUs
A100s
Puppet module
Use cases
Stability issues
Future plans
Conclusions
Questions
Recommendation
Taught by
OpenInfra Foundation