Overview
Discover how to significantly reduce the cost of loading deep learning models for inference in production environments through zero-copy model loading with PyTorch and Ray. Learn how storing model weights in shared memory enables near-instantaneous access across processes, and explore practical code examples demonstrating the technique. Gain insights into the open-source zerocopy library, which applies zero-copy model loading to PyTorch models with minimal code changes. Examine a benchmark study showcasing the performance benefits of running NLP models with stateless Ray tasks, resulting in a self-tuning model deployment that outperforms a traditional Ray Serve deployment. Delve into topics such as model serving basics, loading PyTorch tensors without copying data, and implementing pre- and post-processing with Ray Serve.
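The core shared-memory idea behind zero-copy loading can be sketched with Python's standard library and NumPy arrays standing in for PyTorch tensors. This is an illustrative sketch, not the zerocopy library's API: the weights are copied into a named shared-memory segment once, and any other process can then attach a view of them by name without copying the data.

```python
import numpy as np
from multiprocessing import shared_memory

# Hypothetical "model weights" -- a stand-in for one tensor in a state_dict.
weights = np.arange(6, dtype=np.float32)

# One-time copy of the weights into a named shared-memory segment.
shm = shared_memory.SharedMemory(create=True, size=weights.nbytes)
staging = np.ndarray(weights.shape, dtype=weights.dtype, buffer=shm.buf)
staging[:] = weights

# A worker process would attach by name; here we simulate it in-process.
attached = shared_memory.SharedMemory(name=shm.name)
view = np.ndarray(weights.shape, dtype=np.float32, buffer=attached.buf)

# The view is backed by the same physical pages: no copy was made.
zero_copy = not view.flags["OWNDATA"]
values_match = bool(np.array_equal(view, weights))

# Drop the array views before closing, then free the segment.
del staging, view
attached.close()
shm.close()
shm.unlink()
```

In a PyTorch setting, such a buffer could be wrapped with `torch.from_numpy` (or `torch.frombuffer`), which shares the underlying storage rather than copying it, so model load time no longer scales with weight size.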
Syllabus
Intro
Model Serving 101
Loading PyTorch tensors without copying data
Model inference on Ray using stateless tasks
Summary: Model inference with zero-copy loading
A simple benchmark
Pre- and post-processing with Ray Serve
Benchmark implementation
Benchmark results
Taught by
Anyscale