Infrastructure Software for Hyperscale AI and GPU Clusters

OpenInfra Foundation via YouTube

Overview

Explore a 31-minute conference talk on the infrastructure challenges, and their solutions, in large-scale AI deployments on GPU clusters. Learn about Moreh's MoAI platform, which addresses infrastructure-level problems facing the AI industry, including parallelization, cluster scalability, performance portability, orchestration of heterogeneous accelerators, and failover. Discover how MoAI provides virtual devices that can handle multi-billion- or multi-trillion-parameter models without complex manual parallelization, making LLM training more accessible. Understand the platform's GPU allocation system, which improves infrastructure utilization, its unified software interface for heterogeneous GPUs, and its built-in fault tolerance through checkpoint-recovery technology. Presented by Gangwon Jo, this talk offers insights for organizations that want to scale their AI infrastructure effectively while supporting both NVIDIA and non-NVIDIA accelerators.

Syllabus

Infrastructure Software for Hyperscale AI and GPU Clusters

Taught by

OpenInfra Foundation