Uber's GenAI Leap: Batch Predictions Using Ray and vLLM - Ray Summit 2024

Overview

This Ray Summit 2024 presentation covers Uber's approach to large-scale Generative AI batch prediction. Uber integrates Ray and vLLM into its Michelangelo machine learning platform to support GenAI application development, addressing limitations of traditional Spark-based approaches, particularly for GPU-intensive tasks. The talk walks through the architecture of the new system, its integration with Kubernetes and Michelangelo's LLM evaluation workflow, and its application to various Uber services. It also presents benchmarking results and lessons learned from developing and implementing the solution, with takeaways for using Ray and vLLM to scale GenAI prediction tasks and reduce latency.

Syllabus

Uber's GenAI Leap: Batch Predictions Using Ray and vLLM | Ray Summit 2024

Taught by

Anyscale


