Syllabus
Intro
What this talk is about. The problem: many GPUs are available for deep neural network (DNN) training, and each has a different cost and performance
A Cambrian explosion in hardware for training
Choosing a GPU: The paradox of choice
Key observations: deep learning users may already have an existing GPU
Habitat: A runtime-based performance predictor
One last wrinkle: kernel-varying operations. Wave scaling assumes the same kernel is used across GPUs
Evaluation
How accurate is Habitat?
Rent a GPU in the cloud? Scenario: you want to train GNMT and have access to a P4000. Which cloud GPU should you use, if any?
Key takeaways: DNN computation is special (repetitive), enabling new analysis opportunities
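The runtime-based prediction idea in the outline above can be sketched roughly as follows. This is an illustrative toy, not Habitat's actual implementation: the GPU spec numbers, function names, and the simple compute-bound/memory-bound split are all assumptions for the sake of the example.

```python
# Illustrative sketch (NOT Habitat's implementation): predict an
# operation's runtime on a target GPU by scaling a runtime measured
# on the GPU you already have. Spec numbers below are examples only.

GPU_SPECS = {
    # name: (peak TFLOPS, memory bandwidth in GB/s) -- example values
    "P4000": (5.3, 243.0),
    "V100": (14.0, 900.0),
}

def predict_runtime_ms(measured_ms, src, dst, compute_bound):
    """Scale a runtime measured on `src` to an estimate for `dst`."""
    src_flops, src_bw = GPU_SPECS[src]
    dst_flops, dst_bw = GPU_SPECS[dst]
    # Compute-bound ops scale with peak FLOPS; memory-bound ops
    # scale with memory bandwidth.
    ratio = (src_flops / dst_flops) if compute_bound else (src_bw / dst_bw)
    return measured_ms * ratio

# A 10 ms compute-bound kernel measured on a P4000, estimated on a V100:
print(round(predict_runtime_ms(10.0, "P4000", "V100", True), 2))
```

The talk's wave-scaling approach is more careful than a single peak-spec ratio (and falls back to learned models for kernel-varying operations), but the core idea is the same: reuse a measurement from the GPU you already own instead of benchmarking every candidate GPU.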
Taught by: USENIX