Class Central Classrooms (beta): YouTube videos curated by Class Central.
Classroom Contents
Ultimate Guide to Scaling ML Models - Megatron-LM - ZeRO - DeepSpeed - Mixed Precision
- 1 Intro to training large ML models (trillions of params!)
- 2 (Sponsored) AssemblyAI's speech transcription API
- 3 Data parallelism
- 4 Pipeline/model parallelism
- 5 Megatron-LM paper (tensor/model parallelism)
- 6 Splitting the MLP block vertically
- 7 Splitting the attention block vertically
- 8 Activation checkpointing
- 9 Combining data + model parallelism
- 10 Scaling is all you need and 3D parallelism
- 11 Mixed precision training paper
- 12 Single vs. half vs. bfloat16 number formats
- 13 Storing master weights in single precision
- 14 Loss scaling
- 15 Arithmetic precision matters
- 16 ZeRO optimizer paper (DeepSpeed library)
- 17 Partitioning is all you need?
- 18 Where did all the memory go?
- 19 Outro
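Chapters 5-6 cover the core Megatron-LM idea: split the first MLP weight matrix column-wise across devices, so the elementwise GeLU can be applied to each shard independently with no communication until the second matmul. A minimal single-process sketch of that column split (shapes and names here are illustrative, not from the Megatron codebase):

```python
import numpy as np

# Toy illustration of Megatron-LM column-parallel MLP splitting:
# shard the first weight matrix A by output columns across two "devices",
# apply GeLU per shard, and check the result matches the unsharded MLP.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # (batch, hidden)
A = rng.standard_normal((8, 32))  # first MLP weight: hidden -> 4*hidden

# Column-parallel split: each shard owns half of A's output columns.
A1, A2 = np.split(A, 2, axis=1)

def gelu(z):
    # tanh approximation of GeLU (elementwise, so it commutes with the split)
    return 0.5 * z * (1 + np.tanh(np.sqrt(2 / np.pi) * (z + 0.044715 * z**3)))

# Each "device" computes its shard independently; concatenating recovers
# the full activation, identical to the serial computation.
y_parallel = np.concatenate([gelu(x @ A1), gelu(x @ A2)], axis=1)
y_serial = gelu(x @ A)
assert np.allclose(y_parallel, y_serial)
```

In the real setup the second MLP matrix is split row-wise, so each device consumes its own activation shard and a single all-reduce at the end sums the partial outputs.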
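Chapters 11-14 cover mixed precision training, where loss scaling keeps small fp16 gradients from underflowing to zero. A tiny numeric sketch of the idea (the scale factor here is an arbitrary illustrative choice, not a recommended value):

```python
import numpy as np

# A gradient value below fp16's smallest subnormal (~6e-8) underflows to 0.
small_grad = 1e-8
assert np.float16(small_grad) == 0.0

# Loss scaling: multiply the loss (hence all gradients) by a large constant
# before the fp16 backward pass, shifting values into representable range.
scale = 2.0 ** 16
scaled = np.float16(small_grad * scale)
assert scaled != 0.0  # survives in fp16

# Unscale in fp32 before the optimizer step, recovering the true gradient.
recovered = np.float32(scaled) / scale
```

This is also why master weights are kept in single precision (chapter 13): the unscaled update is applied to an fp32 copy, so tiny updates are not lost to fp16 rounding.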