ZeRO-Offload - Democratizing Billion-Scale Model Training

USENIX via YouTube

Overview

This 14-minute conference talk from USENIX ATC '21 presents ZeRO-Offload, a technique for training billion-parameter models on limited GPU hardware. It enables training models with over 13 billion parameters on a single GPU, about ten times the model size that popular frameworks such as PyTorch can handle. The talk explains how ZeRO-Offload moves data and compute to the CPU while minimizing data movement between the two devices and maximizing GPU memory savings. It also covers how the approach maintains computational efficiency, scales near-linearly on up to 128 GPUs, and can be combined with model parallelism to train even larger models. Topics include the offload strategy and why it is optimal, the schedules for single-GPU and multi-GPU setups, and the optimized CPU execution. The evaluation reports results on model scale, training throughput, and scalability. The talk is aimed at data scientists who want to train large models but have access to only a few GPUs.
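To make the 13-billion-parameter claim more concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the talk's slides. It assumes the usual accounting for mixed-precision Adam training (2 bytes per parameter for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer state), and it assumes that with offloading only the fp16 weights stay resident on the GPU while gradients and optimizer state live in CPU memory. Activations and temporary buffers are ignored, and the function and constant names are hypothetical.

```python
# Back-of-the-envelope memory estimate for mixed-precision Adam training.
# Byte counts per parameter are assumptions (standard accounting),
# not numbers quoted on this page.
BYTES_FP16_PARAMS = 2      # fp16 copy of the weights
BYTES_FP16_GRADS = 2       # fp16 gradients
BYTES_FP32_OPTIMIZER = 12  # fp32 weights + Adam momentum + Adam variance (4 bytes each)


def memory_gb(num_params, bytes_per_param):
    """Return the memory needed for num_params values, in gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def estimate(num_params):
    """Compare keeping all model states on the GPU with an assumed offload split."""
    total = BYTES_FP16_PARAMS + BYTES_FP16_GRADS + BYTES_FP32_OPTIMIZER
    return {
        # Everything resident on the GPU (no offloading).
        "all_on_gpu_gb": memory_gb(num_params, total),
        # Assumed offload split: only fp16 weights stay on the GPU.
        "gpu_with_offload_gb": memory_gb(num_params, BYTES_FP16_PARAMS),
        # Gradients and optimizer state are held in CPU memory instead.
        "cpu_with_offload_gb": memory_gb(
            num_params, BYTES_FP16_GRADS + BYTES_FP32_OPTIMIZER
        ),
    }


if __name__ == "__main__":
    for name, gb in estimate(13_000_000_000).items():
        print(f"{name}: {gb:.0f} GB")
```

Under these assumptions, a 13-billion-parameter model needs about 208 GB for model states if everything stays on the GPU, but only about 26 GB of GPU memory for the fp16 weights once the remaining 182 GB is held on the CPU. Real training also needs GPU memory for activations and buffers, so the figures are a lower bound rather than a measurement.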

Syllabus

Intro
The Size of Deep Learning Models Is Increasing Quickly
Billion-Scale Model Training - Scale Out Large
Mixed-precision training
Limiting CPU Computation
Minimizing Communication Volume
ZeRO-Offload enables large model training by offloading data and compute to CPU
Unique Optimal Offload Strategy
ZeRO-Offload Single GPU Schedule
ZeRO-Offload Multi-GPU Schedule
Optimized CPU Execution
Evaluation
Model Scale
Training Throughput - Single GPU
Training Throughput - Multiple GPUs
Throughput Scalability
One-step Delayed Parameter Update (DPU)
Conclusions

Taught by

USENIX
