Overview
Explore the latest developments in Ray Serve 2.0 in this 32-minute conference talk from Anyscale. Discover the motivation behind Ray Serve, who uses it, and why teams adopt it. Review recent features and updates, then follow a working content-understanding example and its architecture. Learn the requirements for online inference and how Ray Serve addresses the challenges of multi-model inference. Examine model composition requirements and the solution provided by the Model Composition API, including common composition patterns and how Ray Serve implements them. Explore autoscaling for ML models and production-hardening techniques. Finally, learn about the chaos testing methods used to achieve 99.99% uptime for reliable performance in real-world applications.
Syllabus
Intro
Working Example: Content Understanding
Content Understanding Architecture
Requirements for Online Inference
Basic Solution: Multi-model Monolith
Ray Serve is built for Multi-model Inference
Model Composition Requirements
Solution: Model Composition API
Model Composition Pattern
Ray Serve Model Composition API
Autoscaling for ML Models
Production Hardening
Chaos Testing: 99.99% uptime
Taught by
Anyscale