Explore a conference talk from OSDI '22 that introduces REEF, a GPU-accelerated DNN inference serving system with microsecond-scale kernel preemption and controlled concurrent execution, designed to run latency-critical and best-effort DNN inference tasks on the same GPU. Discover how REEF's reset-based preemption scheme kills and later restores best-effort kernels, exploiting the idempotence of DNN inference kernels to preempt in microseconds, and how its dynamic kernel padding mechanism fills the GPU resources a latency-critical kernel leaves idle with best-effort work to maximize utilization. Examine the evaluation, based on a new DNN inference serving benchmark (DISB) and a real-world trace, which shows that REEF keeps latency low for real-time tasks while significantly increasing overall throughput. Gain insights into applications of this technology in intelligent systems such as autonomous driving and virtual reality, and into its broader implications for GPU scheduling efficiency.
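To make the two mechanisms concrete before watching the talk, here is a minimal, hypothetical Python sketch of the scheduling logic described above. Names such as `Kernel`, `ToyScheduler`, and `schedule_step` are illustrative assumptions, not REEF's actual API (the real system is built on a GPU runtime, targeting AMD GPUs in the paper); the sketch only models the decision logic: reset-based preemption kills and re-enqueues idempotent best-effort kernels, and dynamic kernel padding back-fills the compute units a real-time kernel leaves idle.

```python
"""Toy sketch of REEF-style scheduling logic (illustrative only)."""
from collections import deque
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    cus_needed: int         # compute units this kernel would occupy
    realtime: bool = False  # latency-critical vs. best-effort


class ToyScheduler:
    def __init__(self, total_cus: int = 8):
        self.total_cus = total_cus
        self.rt_queue: deque = deque()  # latency-critical kernels
        self.be_queue: deque = deque()  # best-effort kernels
        self.running: list = []

    def submit(self, k: Kernel) -> None:
        (self.rt_queue if k.realtime else self.be_queue).append(k)

    def _pad(self, free_cus: int) -> None:
        # Greedily place queued best-effort kernels that fit in the
        # currently idle compute units (first-fit, purely illustrative).
        for k in list(self.be_queue):
            if k.cus_needed <= free_cus:
                self.be_queue.remove(k)
                self.running.append(k)
                free_cus -= k.cus_needed

    def schedule_step(self) -> None:
        if self.rt_queue:
            # Reset-based preemption: kill in-flight best-effort kernels
            # outright instead of draining them; because DNN inference
            # kernels are idempotent, they can simply be re-run later.
            killed = [k for k in self.running if not k.realtime]
            self.running = [k for k in self.running if k.realtime]
            self.be_queue.extendleft(reversed(killed))

            self.running.append(self.rt_queue.popleft())
            # Dynamic kernel padding: fill the compute units the RT
            # kernel leaves idle with best-effort work, raising
            # utilization without delaying the real-time kernel.
            self._pad(self.total_cus - sum(k.cus_needed for k in self.running))
        else:
            # Normal mode: best-effort kernels execute concurrently.
            self._pad(self.total_cus - sum(k.cus_needed for k in self.running))


# Example: a best-effort batch job is preempted the moment a
# (hypothetical) autonomous-driving detection kernel arrives, and a
# small best-effort kernel is padded into the leftover compute units.
sched = ToyScheduler(total_cus=8)
sched.submit(Kernel("be_batch", cus_needed=8))
sched.schedule_step()
sched.submit(Kernel("rt_detect", cus_needed=5, realtime=True))
sched.submit(Kernel("be_small", cus_needed=2))
sched.schedule_step()
print([k.name for k in sched.running])  # ['rt_detect', 'be_small']
```

Note that the sketch collapses what REEF does in hardware and runtime (killing kernels on the GPU, restoring their state) into simple queue operations; the talk explains how the real system makes the kill-and-restore path fast enough to count in microseconds.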