Overview
Explore a conference talk from OSDI '22 that introduces REEF, a GPU-accelerated DNN inference serving system supporting microsecond-scale kernel preemption and controlled concurrent execution in GPU scheduling, which addresses the challenge of co-running latency-critical and best-effort DNN inference tasks on the same GPU. Learn about the reset-based preemption scheme and the dynamic kernel padding mechanism that together enable microsecond-scale preemption while maximizing GPU utilization. Examine evaluation results on a new DNN inference serving benchmark (DISB) and a real-world trace, which show that REEF keeps latency low for real-time tasks while substantially increasing overall throughput. Gain insight into applications of this technology in intelligent systems such as autonomous driving and virtual reality, and into its implications for more efficient GPU scheduling across domains.
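A key observation behind reset-based preemption is that DNN inference kernels are largely idempotent, so a running best-effort kernel can simply be killed the moment a latency-critical request arrives and relaunched from scratch afterward, rather than drained or checkpointed. The following minimal C++ sketch simulates that idea on the host with threads and an atomic flag; the names (reset_flag, best_effort_kernel, real_time_kernel) are illustrative assumptions for this sketch, not REEF's actual API.

// Rough simulation of reset-based preemption (hypothetical, not REEF's API).
// Best-effort work is killed instantly when a real-time task arrives and is
// relaunched from scratch later, which is safe if the kernel is idempotent.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

std::atomic<bool> reset_flag{false}; // set to "kill" best-effort work

// Stand-in for a best-effort DNN inference kernel: polls the reset flag.
void best_effort_kernel(int id) {
    for (int step = 0; step < 1000; ++step) {
        if (reset_flag.load(std::memory_order_relaxed)) {
            std::printf("best-effort kernel %d killed at step %d\n", id, step);
            return; // partial state is discarded; an idempotent kernel
                    // relaunched from step 0 produces the same output
        }
        std::this_thread::sleep_for(std::chrono::microseconds(10)); // "compute"
    }
    std::printf("best-effort kernel %d finished\n", id);
}

// Stand-in for the latency-critical inference task.
void real_time_kernel() {
    std::printf("real-time inference runs with the GPU to itself\n");
}

int main() {
    std::thread be(best_effort_kernel, 0);

    // A latency-critical request arrives: reset (kill) best-effort work...
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    reset_flag.store(true);
    be.join();               // preemption completes almost immediately

    real_time_kernel();      // ...run the real-time task at once...

    reset_flag.store(false); // ...then relaunch the killed kernel from scratch
    std::thread relaunch(best_effort_kernel, 0);
    relaunch.join();
}

In this toy model the "preemption latency" is just one flag check, which mirrors why killing idempotent kernels can be orders of magnitude faster than waiting for them to drain; the real system performs the reset on the GPU itself.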
Syllabus
OSDI '22 - Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences
Taught by
USENIX