Overview
Syllabus
Intro
Recommendation Models Are Important: Use Cases
Recommendation Model Architecture
High Performance Training at Meta
The Criticality of Checkpointing: failure recovery ensures progress
Checkpoint Challenges
Check-n-Run
Checkpointing Workflow
Reducing Write Bandwidth (WB) with Differential Checkpointing
Approaches for Differential Checkpointing: One-Shot Differential Checkpoint, Consecutive Incremental Checkpoint, Intermittent Differential Checkpoint
Checkpoint Quantization: compress checkpoints without degrading training accuracy
Comparing Quantization Strategies: uniform quantization, non-uniform quantization using k-means, adaptive uniform quantization
Quantization Bit-width Selection
Overall Reduction
Summary
Taught by
USENIX