Overview
Explore how Ray, Google Kubernetes Engine (GKE), and ML accelerators combine to build scalable, advanced machine learning systems in this 34-minute conference talk. Examine the challenges of scaling complex ML workloads, including large-scale training of large language models (LLMs) and the distribution of reinforcement learning systems. Learn about Ray's scalable APIs and its integration with GKE and ML accelerators such as tensor processing units (TPUs), and see how this combination has been applied to LLMs and to a reimplementation of the MuZero reinforcement learning algorithm. Gain insights into meeting the computational demands and handling the synchronization issues of distributed ML environments.
Syllabus
Scalable advanced ML systems with Ray, Google Kubernetes Engine, and ML accelerators
Taught by
Google Cloud Tech