Overview
Dive deep into Ray's scalability internals in this 31-minute conference talk from Anyscale. Explore how Ray powers demanding machine learning workloads, such as training ChatGPT at OpenAI and processing terabytes of data daily at Amazon. Gain insight into tasks, actors, objects, and nodes, with concrete examples for writing scalable code that makes full use of Ray. Discover the improvements made since Ray 2.0 in health checks, resource broadcasting, and asynchronous actor creation. Learn about the challenges and opportunities of building an unprecedented 4,000-node cluster, and understand Ray's role in handling the exponential growth of modern ML workloads.
Syllabus
Ray Scalability Deep Dive: The Journey to Support 4,000 Nodes
Taught by
Anyscale