Classroom Contents
Check-N-Run - A Checkpointing System for Training Deep Learning Recommendation Models
- 1 Intro
- 2 Recommendation Models are important. Use cases include
- 3 Recommendation Model Architecture
- 4 High Performance Training at Meta
- 5 The Criticality of Checkpointing: failure recovery ensures progress
- 6 Checkpoint Challenges
- 7 Check-n-Run
- 8 Checkpointing Workflow
- 9 Reducing WB with Differential Checkpointing
- 10 Approaches for Differential Checkpointing: One-Shot Differential Checkpoint, Consecutive Incremental Checkpoint, Intermittent Differential Checkpoint
- 11 Checkpoint Quantization: compress checkpoints without degrading training accuracy
- 12 Comparing Quantization Strategies: uniform quantization, non-uniform quantization using k-means, adaptive uniform quantization
- 13 Quantization Bit-width Selection
- 14 Overall Reduction
- 15 Summary