Explore power-aware deep learning model serving with μ-Serve in this 21-minute conference talk from USENIX ATC '24. Discover how researchers from the University of Illinois Urbana-Champaign and IBM Research address the challenge of reducing energy consumption in model-serving clusters while maintaining performance requirements. Learn about the benefits of GPU frequency scaling for power saving in model serving and the importance of co-designing fine-grained model multiplexing with GPU frequency scaling. Examine μ-Serve, a novel power-aware model-serving system that optimizes power consumption and performance for serving multiple ML models in a homogeneous GPU cluster. Gain insights into evaluation results showing significant power savings through dynamic GPU frequency scaling without compromising service level objectives.