ZeRO-Offload - Democratizing Billion-Scale Model Training

USENIX via YouTube

Classroom Contents

  1. Intro
  2. The Size of Deep Learning Models Is Increasing Quickly
  3. Billion-Scale Model Training - Scale Out Large
  4. Mixed-precision training
  5. Limiting CPU Computation
  6. Minimizing Communication Volume
  7. ZeRO-Offload enables large model training, offloading data and compute to CPU
  8. Unique Optimal Offload Strategy
  9. ZeRO-Offload Single GPU Schedule
  10. ZeRO-Offload Multi-GPUs Schedule
  11. Optimized CPU Execution
  12. Evaluation
  13. Model Scale
  14. Training Throughput - Single GPU
  15. Training Throughput - Multiple GPUs
  16. Throughput Scalability
  17. One-step Delayed Parameter Update (DPU)
  18. Conclusions
