
Optimize Your AI Cloud Infrastructure: A Hardware Perspective

Linux Foundation via YouTube

Overview

Explore the intricacies of GPU cloud infrastructure optimization in this technical conference talk, which delves into hardware-level considerations for AI systems. Learn how various machine learning models are fine-tuned on an H100 cluster, with detailed analysis of critical components such as the Pod scheduler, device plugin, GPU/NUMA topology, and the RoCE/NCCL stack. Gain valuable insights from first-hand experimental results demonstrating the relationship between model performance and device and operator configurations within nodes, focusing in particular on the CNN, RNN, and Transformer models from MLPerf. Master the often-overlooked hardware aspects of AI infrastructure that can significantly affect the performance and efficiency of distributed machine learning.
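The GPU/NUMA topology theme mentioned above can be previewed with a short sketch. On Linux, the NUMA node a GPU is attached to is exposed in sysfs under the device's PCI address, and pinning a training process to CPUs on that same node avoids cross-socket memory traffic. The two-socket CPU layout below is an illustrative assumption, not taken from the talk.

```python
import os


def gpu_numa_node(pci_addr: str) -> int:
    """Read the NUMA node a GPU is attached to from sysfs.

    Returns -1 when the kernel does not report one (common in VMs).
    """
    path = f"/sys/bus/pci/devices/{pci_addr}/numa_node"
    try:
        with open(path) as f:
            return int(f.read().strip())
    except OSError:
        return -1


def local_cpus(numa_node: int, node_cpus: dict) -> set:
    """Pick the CPU set local to a GPU's NUMA node.

    `node_cpus` maps NUMA node id -> set of CPU ids; fall back to
    all CPUs when the node is unknown.
    """
    all_cpus = set().union(*node_cpus.values())
    return node_cpus.get(numa_node, all_cpus)


# Hypothetical 2-socket topology: 4 CPUs per NUMA node.
topology = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}

# Restrict this process to CPUs local to a GPU on NUMA node 1.
cpus = local_cpus(1, topology)
# os.sched_setaffinity(0, cpus)  # Linux-only; uncomment on a real node
print(sorted(cpus))  # [4, 5, 6, 7]
```

In practice the same affinity decision is made by the device plugin and Pod scheduler components the talk analyzes, rather than hand-rolled per process.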

Syllabus

Optimize Your AI Cloud Infrastructure: A Hardware Perspective - Liang Yan, CoreWeave

Taught by

Linux Foundation
