Ultimate Guide to Scaling ML Models - Megatron-LM - ZeRO - DeepSpeed - Mixed Precision

Aleksa Gordić - The AI Epiphany via YouTube

Classroom Contents

  1. Intro to training large ML models (trillions of params!)
  2. (Sponsored) AssemblyAI's speech transcription API
  3. Data parallelism
  4. Pipeline/model parallelism
  5. Megatron-LM paper (tensor/model parallelism)
  6. Splitting the MLP block vertically (see the first sketch after this list)
  7. Splitting the attention block vertically
  8. Activation checkpointing (see the second sketch after this list)
  9. Combining data + model parallelism
  10. Scaling is all you need and 3D parallelism
  11. Mixed precision training paper
  12. Single vs. half vs. bfloat number formats
  13. Storing master weights in single precision (see the third sketch after this list)
  14. Loss scaling (also in the third sketch after this list)
  15. Arithmetic precision matters
  16. ZeRO optimizer paper (DeepSpeed library)
  17. Partitioning is all you need?
  18. Where did all the memory go? (see the final sketch after this list)
  19. Outro
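
A minimal sketch of the vertical MLP split from chapter 6: in Megatron-LM, the first MLP weight matrix is split column-wise and the second row-wise, so the GeLU can be applied shard-locally and the only communication needed is one all-reduce of the partial outputs. The "GPUs" here are simulated with plain tensors on one device; names and sizes are illustrative, not Megatron-LM's actual API.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_model, d_ff, shards = 8, 32, 2

x = torch.randn(4, d_model)        # [batch, d_model] activations
A = torch.randn(d_model, d_ff)     # first MLP weight
B = torch.randn(d_ff, d_model)     # second MLP weight

# Reference: the unsharded MLP block, Y = GeLU(X A) B.
y_ref = F.gelu(x @ A) @ B

# Tensor parallelism: split A by columns and B by rows.
A_shards = A.chunk(shards, dim=1)  # each [d_model, d_ff / shards]
B_shards = B.chunk(shards, dim=0)  # each [d_ff / shards, d_model]

# Each shard computes GeLU(x @ A_i) @ B_i independently; GeLU commutes with
# the column split because it is elementwise. Summing the partial outputs
# is what the all-reduce does across real GPUs.
partials = [F.gelu(x @ A_i) @ B_i for A_i, B_i in zip(A_shards, B_shards)]
y_tp = sum(partials)

print(torch.allclose(y_ref, y_tp, atol=1e-5))  # True: same result, sharded compute
```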
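
Chapter 8's activation checkpointing trades compute for memory: activations inside a wrapped segment are dropped after the forward pass and recomputed during backward. A minimal sketch using PyTorch's built-in torch.utils.checkpoint (layer sizes are arbitrary):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
x = torch.randn(16, 512, requires_grad=True)

# Forward without caching the block's intermediate activations...
y = checkpoint(block, x, use_reentrant=False)
# ...backward re-runs the block's forward to recover them on the fly.
y.sum().backward()
```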
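
Chapters 13 and 14 belong together: the mixed-precision recipe keeps an fp32 "master" copy of the weights while running forward/backward in fp16, and scales the loss up before backward so small gradients don't flush to zero in fp16, unscaling again before the fp32 update. Below is a manual sketch with a static scale and made-up sizes; in practice torch.cuda.amp's autocast and GradScaler (which scales dynamically) handle this on GPU:

```python
import torch

torch.manual_seed(0)
master_w = torch.randn(1000)  # fp32 master weights (chapter 13)
loss_scale = 1024.0           # static loss scale (chapter 14)

for step in range(3):
    w16 = master_w.detach().half().requires_grad_()  # fp16 working copy
    x = torch.randn(1000, dtype=torch.float16)
    loss = (w16 * x).mean()                  # fp16 forward pass
    (loss * loss_scale).backward()           # scale up so tiny fp16 grads survive
    grad32 = w16.grad.float() / loss_scale   # unscale in full precision
    master_w -= 0.01 * grad32                # fp32 weight update
```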
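
For chapters 16-18, a worked-arithmetic sketch of the ZeRO paper's memory accounting: with Adam and mixed precision, each parameter costs 16 bytes (2 for the fp16 weight, 2 for the fp16 gradient, and 12 for optimizer state: fp32 master weight, momentum, and variance), and ZeRO stages 1/2/3 partition optimizer states, then gradients, then weights across the data-parallel GPUs. The helper below is illustrative, not DeepSpeed's API:

```python
GB = 1e9  # the paper quotes decimal gigabytes

def model_state_gb(n_params: float, n_gpus: int = 1, stage: int = 0) -> float:
    """Per-GPU memory (GB) for model states under a given ZeRO stage."""
    weights, grads, opt_states = 2 * n_params, 2 * n_params, 12 * n_params
    if stage >= 1:
        opt_states /= n_gpus  # ZeRO-1: shard the optimizer states
    if stage >= 2:
        grads /= n_gpus       # ZeRO-2: shard the gradients too
    if stage >= 3:
        weights /= n_gpus     # ZeRO-3: shard the fp16 weights as well
    return (weights + grads + opt_states) / GB

print(model_state_gb(7.5e9))               # 120.0 GB: a 7.5B model won't fit on one GPU
print(model_state_gb(7.5e9, 64, stage=1))  # ~31.4 GB per GPU
print(model_state_gb(7.5e9, 64, stage=3))  # ~1.9 GB per GPU
```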
