Distributed Multi-Node Model Inference Using the LeaderWorkerSet API

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Learn about multi-node model inference deployment in this technical conference talk from KubeCon, which explores the LeaderWorkerSet API for managing large language models across distributed systems. Dive into the challenges of deploying computationally intensive LLMs like Gemini, Claude, and GPT-4, which are too large to fit on a single GPU/TPU device and therefore require multi-node serving solutions. Explore how this new Kubernetes API enables efficient orchestration of state-of-the-art model servers, including vLLM and JetStream, across both GPU and TPU infrastructures. Master practical approaches to handling distributed processes across multiple nodes while maximizing accelerator memory utilization for optimal model performance and response times.
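
The core idea behind LeaderWorkerSet is that a leader pod and its worker pods are created and scaled together as one unit, which is what lets a sharded model server span several nodes. As a rough illustration of the pattern the talk covers, here is a minimal sketch that creates such a resource with the Kubernetes Python client. It assumes the upstream lws controller (API group leaderworkerset.x-k8s.io, version v1) is installed in the cluster; the resource name vllm-multinode, the container image, and the GPU counts are illustrative placeholders, not details taken from the talk.

```python
# Minimal sketch: create a LeaderWorkerSet for a multi-node vLLM server.
# Assumes the lws controller (leaderworkerset.x-k8s.io/v1) is installed.
# Image names and GPU counts are placeholders for illustration only.
from kubernetes import client, config

lws_manifest = {
    "apiVersion": "leaderworkerset.x-k8s.io/v1",
    "kind": "LeaderWorkerSet",
    "metadata": {"name": "vllm-multinode", "namespace": "default"},
    "spec": {
        "replicas": 1,  # number of leader+worker groups (model replicas)
        "leaderWorkerTemplate": {
            "size": 4,  # pods per group: 1 leader + 3 workers
            "leaderTemplate": {
                "spec": {
                    "containers": [{
                        "name": "vllm-leader",
                        "image": "example.com/vllm:latest",  # placeholder
                        "resources": {"limits": {"nvidia.com/gpu": "8"}},
                    }]
                }
            },
            "workerTemplate": {
                "spec": {
                    "containers": [{
                        "name": "vllm-worker",
                        "image": "example.com/vllm:latest",  # placeholder
                        "resources": {"limits": {"nvidia.com/gpu": "8"}},
                    }]
                }
            },
        },
    },
}

def main() -> None:
    config.load_kube_config()  # use load_incluster_config() inside a cluster
    api = client.CustomObjectsApi()
    # LeaderWorkerSet is a custom resource, so it goes through the
    # dynamic custom-objects API rather than a typed client.
    api.create_namespaced_custom_object(
        group="leaderworkerset.x-k8s.io",
        version="v1",
        namespace="default",
        plural="leaderworkersets",
        body=lws_manifest,
    )

if __name__ == "__main__":
    main()
```

The leader typically runs the server frontend while the workers hold additional model shards; scaling the replicas field adds or removes whole leader-plus-worker groups at once, so a model replica is never left with only part of its shards.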

Syllabus

Distributed Multi-Node Model Inference Using the LeaderWorkerSet API - Abdullah Gharaibeh, Rupeng Liu

Taught by

CNCF [Cloud Native Computing Foundation]
