Overview
Explore the optimization of vLLM for Intel CPUs and XPUs in this 30-minute conference talk from Ray Summit 2024. Dive into Ding Ke and Yuan Zhou's presentation on enhancing vLLM performance for Intel architectures, addressing the growing demands of GenAI inference. Gain insights into key technical advancements, challenges, and solutions encountered during the optimization process. Learn how collaboration with the open-source community helped refine approaches and accelerate progress. Examine initial performance data showcasing the efficiency improvements of vLLM on Intel hardware. The talk offers valuable guidance for developers and organizations aiming to maximize GenAI inference performance on Intel platforms, along with a technical perspective on hardware-specific optimizations for large language models, essential for those working on high-performance AI applications.
Syllabus
Optimizing vLLM for Intel CPUs and XPUs | Ray Summit 2024
Taught by
Anyscale