Scaling Inference Deployments with NVIDIA Triton Inference Server and Ray Serve

Overview

This Ray Summit 2024 conference talk covers the collaboration between Ray Serve and NVIDIA Triton Inference Server. It introduces the new Python API for Triton Inference Server and shows how it integrates with Ray Serve applications, combining the strengths of the two open-source platforms to scale inference deployments. A Stable Diffusion demo illustrates how to improve ML model performance, and the talk explains how Triton's optimization tools, Performance Analyzer and Model Analyzer, help tune model configurations to meet specific throughput and latency requirements.

Syllabus

Scaling Inference Deployments with NVIDIA Triton Inference Server and Ray Serve | Ray Summit 2024

Taught by

Anyscale


