State of Ray Serve in 2.0 - Features and Updates for Multi-model Inference

Overview

Explore the latest developments in Ray Serve as of Ray 2.0 in this 32-minute conference talk from Anyscale. The talk covers the motivation behind Ray Serve, who uses it, and why they adopted it, then walks through recent features and updates using a working example, a content understanding application, and its architecture. It outlines the requirements for online inference, contrasts a basic multi-model monolith with Ray Serve's approach to multi-model inference, and introduces the Model Composition API along with the composition patterns it supports. The talk closes with autoscaling for ML models, production hardening, and the chaos testing methods used to achieve 99.99% uptime.

Syllabus

Intro
Working Example: Content Understanding
Content Understanding Architecture
Requirements for Online Inference
Basic Solution: Multi-model Monolith
Ray Serve is built for Multi-model Inference
Model Composition Requirements
Solution: Model Composition API
Model Composition Pattern
Ray Serve Model Composition API
Autoscaling for ML Models
Production Hardening
Chaos Testing: 99.99% uptime

Taught by

Anyscale
