Scaling LLMs on Google Cloud - Synergy Between Ray, TPU, and GKE

Anyscale via YouTube

Overview

Watch a technical conference talk from Ray Summit 2024 where Google engineers Fanhai Lu and Richard Liu present an advanced serving stack for deploying Large Language Models (LLMs) at scale. Learn how to overcome key LLM deployment challenges by combining Ray's distributed computing capabilities with TPU acceleration and Google Kubernetes Engine (GKE) orchestration. Discover architectural strategies for optimizing latency and throughput, managing hardware memory constraints, and scaling cloud compute resources in production environments. Gain practical insights from real-world deployments of models like Llama 3 and explore best practices for implementing GenAI solutions on Google Cloud Platform using XLA+TPUs for computation, Ray for multi-host deployments, and GKE for TPU pod slice orchestration.
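As a rough illustration of the pattern the talk covers — Ray scheduling work onto the TPU hosts of a GKE-managed pod slice — a minimal sketch might look like the following. The "TPU" resource name, the chips-per-host count, the number of hosts, and the model identifier are assumptions for illustration, not details taken from the session.

```python
import ray

# Connect to an existing Ray cluster, e.g. one launched by KubeRay on a
# GKE TPU pod slice. "auto" assumes this script runs inside the cluster.
ray.init(address="auto")

# Each actor claims TPU chips through Ray's custom-resource mechanism.
# The resource name "TPU" and the count of 4 chips per host are
# assumptions; they depend on how the cluster advertises its hardware.
@ray.remote(resources={"TPU": 4})
class LlamaShard:
    def __init__(self, model_id: str):
        # A real deployment would load a sharded Llama 3 checkpoint here
        # with an XLA-backed framework such as JAX; this stub just records
        # the (hypothetical) model identifier.
        self.model_id = model_id

    def generate(self, prompt: str) -> str:
        # Placeholder for XLA-compiled generation on the local TPU chips.
        return f"[{self.model_id}] completion for: {prompt!r}"

# One actor per TPU host; Ray places each onto a host in the pod slice,
# which is how multi-host deployments are coordinated in this stack.
workers = [LlamaShard.remote("meta-llama/Meta-Llama-3-8B") for _ in range(2)]
print(ray.get(workers[0].generate.remote("Scaling LLMs on Google Cloud")))
```

In this split, GKE owns provisioning and repair of the TPU pod slice while Ray owns placement of work across its hosts — the division of labor the overview above describes.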

Syllabus

Scaling LLMs on Google Cloud: Synergy Between Ray, TPU, and GKE | Ray Summit 2024

Taught by

Anyscale
