SHEPHERD - Serving DNNs in the Wild

Overview

This 15-minute conference talk from NSDI '23 presents SHEPHERD, a model serving system for DNN inference requests in interactive web services. SHEPHERD targets three goals: scalability, high system goodput, and maximum resource utilization across compute units. Its two-level design separates the planning module from the serving module and aggregates request streams to improve predictability and resource utilization. For serving, SHEPHERD uses a novel online algorithm that guarantees goodput under unpredictable workloads by exploiting preemptions and model-specific batching properties. According to the talk, SHEPHERD achieves up to 18.1× higher goodput and 1.8× better utilization than prior state-of-the-art solutions while scaling to hundreds of workers.

Syllabus

NSDI '23 - SHEPHERD: Serving DNNs in the Wild

Taught by

USENIX

Related Talks

Bamboo - Making Preemptible Instances Resilient for Affordable Training of Large DNNs

Data-Parallel Actors - A Programming Model for Scalable Query Serving Systems

Clipper - A Low-Latency Online Prediction Serving System

Nu - Achieving Microsecond-Scale Resource Fungibility with Logical Processes

ExoPlane - An Operating System for On-Rack Switch Resource Augmentation

Serving DNNs like Clockwork - Performance Predictability from the Bottom Up
