Accelerated LLM Inference with Anyscale - Ray Summit 2024

Anyscale via YouTube

Overview

Explore cutting-edge advancements in LLM inference optimization and scalability in this 30-minute conference talk from Ray Summit 2024. Dive into Anyscale's latest enterprise and production features for accelerated LLM inference, presented by Co-Founder and CTO Philipp Moritz and Cody Yu. Learn about the team's collaborative work on the vLLM open-source project, including key improvements such as FP8 support, chunked prefill, multi-step decoding, and speculative decoding, and how these optimizations have roughly doubled vLLM's throughput while halving its latency. Gain insights into Anyscale-specific enhancements, including custom kernels, batch inference optimizations, and accelerated large-model loading for autoscaling deployments. This talk is essential viewing for anyone interested in state-of-the-art techniques for improving LLM inference efficiency and scalability in enterprise and production environments.
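
The talk presents these features at a conceptual level. As a rough illustration of what enabling some of them looks like in practice, the sketch below shows how FP8 quantization, chunked prefill, and multi-step scheduling are typically switched on through vLLM's offline LLM API. The model name, token counts, and parameter names here are illustrative assumptions reflecting vLLM's ~0.6.x interface around the time of the talk, not code from the presentation; names and valid feature combinations may differ in other releases.

from vllm import LLM, SamplingParams

# Illustrative configuration only; parameter names follow vLLM ~0.6.x
# and are not taken from the talk. Feature availability and valid
# combinations vary by vLLM release and hardware.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # assumed example model
    quantization="fp8",            # FP8 quantization support
    enable_chunked_prefill=True,   # split long prefills into chunks
    num_scheduler_steps=8,         # multi-step decoding/scheduling
    # Speculative decoding uses a separate draft model; in this era of
    # vLLM it was configured via, e.g.:
    #   speculative_model="<small-draft-model>", num_speculative_tokens=5,
    # and was not necessarily combinable with multi-step scheduling.
)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["Summarize speculative decoding in one sentence."], params)
print(outputs[0].outputs[0].text)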

Syllabus

Accelerated LLM Inference with Anyscale | Ray Summit 2024

Taught by

Anyscale

