Overview
Explore a conference talk from USENIX ATC '23 that introduces EnvPipe, a DNN training framework designed to save energy while preserving performance. Learn how EnvPipe improves energy efficiency in multi-GPU DNN training by exploiting the slack time created by bubbles in pipeline parallelism. Discover how the framework lowers the SM frequency of pipeline units to stretch their execution time into that slack, while maintaining the original accuracy of the training task. Gain insights into EnvPipe's implementation as a PyTorch library and its reported results: energy savings of up to 25.2% in single-node training with 4 GPUs and 28.4% in multi-node training with 16 GPUs, with performance degradation below 1%. Understand the significance of this research for the energy consumption of data centers that run DNN training and inference services.
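To make the slack idea concrete, here is a minimal toy sketch in Python. It is not EnvPipe's actual algorithm or API: the function name, parameters, and numbers are hypothetical, and it assumes a pipeline unit's execution time is inversely proportional to SM frequency, which real GPUs follow only approximately.

```python
def stretched_frequency(base_freq_mhz, duration_ms, slack_ms, min_freq_mhz):
    """Return the lowest SM frequency that still finishes a pipeline unit
    within its original duration plus the available slack.

    Assumption (illustrative only): execution time scales as 1 / frequency.
    """
    if duration_ms <= 0 or slack_ms < 0:
        raise ValueError("duration must be positive and slack non-negative")
    # Stretching the unit from duration to duration + slack allows the
    # frequency to drop by the same ratio under the 1/f assumption.
    target = base_freq_mhz * duration_ms / (duration_ms + slack_ms)
    # Never go below the lowest frequency the (hypothetical) GPU supports.
    return max(min_freq_mhz, target)


# A unit that takes 10 ms at 1500 MHz and is followed by a 5 ms bubble
# could run at a lower frequency without delaying the next unit.
print(stretched_frequency(1500, 10, 5, 600))
```

A unit with no slack keeps its base frequency in this model, which matches the idea that units on the critical path are left untouched so overall training time is preserved.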
Syllabus
USENIX ATC '23 - EnvPipe: Performance-preserving DNN Training Framework for Saving Energy
Taught by
USENIX