
Zero-Copy Model Loading with Ray and PyTorch for Efficient Deep Learning Inference

Anyscale via YouTube

Overview

Discover how to significantly reduce the cost of loading deep learning models for inference in production through zero-copy model loading with PyTorch and Ray. Learn how storing model weights in shared memory gives near-instantaneous access across processes, and walk through practical code examples of the technique. Get an introduction to the open-source zerocopy library, which applies zero-copy model loading to PyTorch models with minimal code changes. Examine a benchmark study of NLP models running as stateless Ray tasks, yielding a self-tuning model deployment that outperforms a traditional Ray Serve deployment. Topics include model serving basics, loading PyTorch tensors without copying data, and pre- and post-processing with Ray Serve.

Syllabus

Intro
Model Serving 101
Loading PyTorch tensors without copying data
Model inference on Ray using stateless tasks
Summary: Model inference with zero-copy loading
A simple benchmark
Pre- and post-processing with Ray Serve
Benchmark implementation
Benchmark results

Taught by

Anyscale
