
Scaling LLM Inference - AWS Inferentia Meets Ray Serve on EKS

Anyscale via YouTube

Overview

Learn how to achieve high-performance, cost-effective inference for large language models in this 13-minute conference talk from Ray Summit 2024. Explore the combination of Ray Serve and AWS Inferentia on Amazon EKS for deploying models such as Llama 2 and Mistral-7B. Follow along as speakers Vara Bonthu and Ratnopam Chakrabarti demonstrate building scalable inference infrastructure that works around GPU availability constraints. Discover how integrating Ray Serve, the AWS Neuron SDK, and the Karpenter autoscaler on Amazon EKS creates a flexible environment for AI workloads. Learn strategies for optimizing costs while maintaining high performance, and gain insights into deploying and scaling large language models in production environments.
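As context for the Karpenter integration the talk describes, the sketch below shows roughly how a Karpenter NodePool could be scoped to Inferentia (Inf2) instances so that Ray Serve replicas requesting Neuron devices trigger Inf2 node provisioning. This is an illustrative fragment, not taken from the talk: the resource name `aws.amazon.com/neuron` is what the AWS Neuron device plugin advertises, while the pool name and the `default` EC2NodeClass reference are hypothetical placeholders.

```yaml
# Illustrative Karpenter NodePool limited to Inf2 (Inferentia2) instances.
# Assumes the Karpenter v1 API and the Neuron device plugin are installed;
# names ("inferentia", "default") are placeholders.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: inferentia
spec:
  template:
    spec:
      requirements:
        # Only provision instances from the Inf2 family
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["inf2"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    # Cap the total Neuron devices this pool may provision
    aws.amazon.com/neuron: 8
```

With a pool like this in place, a Ray Serve deployment whose pods request `aws.amazon.com/neuron` resources would cause Karpenter to scale Inf2 nodes up and down with inference demand.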

Syllabus

Scaling LLM Inference: AWS Inferentia Meets Ray Serve on EKS | Ray Summit 2024

Taught by

Anyscale
