Power-aware Deep Learning Model Serving with μ-Serve

Overview

Explore power-aware deep learning model serving with μ-Serve in this 21-minute conference talk from USENIX ATC '24. Discover how researchers from the University of Illinois Urbana-Champaign and IBM Research reduce energy consumption in model-serving clusters while still meeting performance requirements. Learn how GPU frequency scaling saves power in model serving and why fine-grained model multiplexing should be co-designed with it. Examine μ-Serve, a power-aware model-serving system that optimizes both power consumption and performance when serving multiple ML models in a homogeneous GPU cluster. Review evaluation results showing that dynamic GPU frequency scaling yields significant power savings without violating service level objectives (SLOs).

Syllabus

USENIX ATC '24 - Power-aware Deep Learning Model Serving with μ-Serve

Taught by

USENIX
