Scaling Ray to 10,000 NPUs - Huawei's Hyperscale Journey

Anyscale via YouTube

Overview

Explore a technical talk from Ray Summit 2024 in which Huawei engineers Boyuan Chen, Chong Yin Tan, and Xiaoshuang Liu describe how they integrated 10,000 Ascend NPUs into a single Ray cluster. The speakers walk through the challenges they encountered while migrating existing business workloads to Ray and adding support for Huawei Ascend NPUs, and the solutions they developed. Topics include a custom full-stack Ray observability engine for debugging and optimizing very large clusters, scheduling NPU and GPU tasks side by side on the same infrastructure, strategies for maximizing resource utilization and maintaining stability in large-scale AI deployments, and the migration of a hyperscale inference pipeline to Ray. The talk is aimed at organizations and engineers scaling distributed computing and AI infrastructure to very large clusters.

Syllabus

Scaling Ray to 10K NPUs: Huawei's Hyperscale Journey | Ray Summit 2024

Taught by

Anyscale
