Multi-GPU Distributed Training
YouTube videos curated by Class Central.
Classroom Contents
Multi-GPU Fine-tuning with DDP and FSDP
- 1 Multi-GPU Distributed Training
- 2 Video Overview
- 3 Choosing a GPU setup
- 4 Understanding VRAM requirements in detail - see the worked estimate below
- 5 Understanding Optimisation and Gradient Descent
- 6 How does the Adam optimiser work?
- 7 How the Adam optimiser affects VRAM requirements
- 8 Effect of activations, model context and batch size on VRAM
- 9 Tip for GPU setup - start with a small batch size
- 10 Reducing VRAM with LoRA and quantisation - see the LoRA sketch below
- 11 Quality trade-offs with quantisation and LoRA
- 12 Choosing between MP, DDP or FSDP
- 13 Distributed Data Parallel
- 14 Model Parallel and Fully Sharded Data Parallel (FSDP)
- 15 Trade-offs with DDP and FSDP
- 16 How does DeepSpeed compare to FSDP?
- 17 Using FSDP and DeepSpeed with Accelerate
- 18 Code examples for MP, DDP and FSDP
- 19 Using SSH with rented GPUs (Runpod)
- 20 Installation
- 21 (Slight detour) Setting a username and email for GitHub
- 22 Basic Model Parallel (MP) fine-tuning script
- 23 Fine-tuning script with Distributed Data Parallel (DDP) - see the DDP sketch below
- 24 Fine-tuning script with Fully Sharded Data Parallel (FSDP) - see the FSDP sketch below
- 25 Running ‘accelerate config’ for FSDP
- 26 Saving a model after FSDP fine-tuning
- 27 Quick demo of a complete FSDP LoRA training script
- 28 Quick demo of an inference script after training
- 29 Wrap up
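
Illustrative code sketches

For chapters 4 to 8, a rough back-of-the-envelope estimate helps make the VRAM numbers concrete. The sketch below is illustrative only and is not taken from the videos: it assumes full fine-tuning in bf16 mixed precision with Adam keeping fp32 first and second moments plus an fp32 master copy of the weights, and it ignores activations, context length and batch size.

```python
# Rough VRAM estimate for full fine-tuning with Adam.
# Illustrative assumptions: bf16 weights and gradients, fp32 Adam moments,
# fp32 master weights; activations, context and batch size are ignored.
def estimate_vram_gb(n_params_billion: float) -> float:
    n = n_params_billion * 1e9
    weights = 2 * n         # bf16 weights: 2 bytes per parameter
    gradients = 2 * n       # bf16 gradients: 2 bytes per parameter
    adam_states = 8 * n     # fp32 first and second moments: 4 + 4 bytes per parameter
    master_weights = 4 * n  # fp32 master copy used in mixed-precision training
    return (weights + gradients + adam_states + master_weights) / 1e9

if __name__ == "__main__":
    for size in (1, 7, 13):
        print(f"{size}B parameters -> ~{estimate_vram_gb(size):.0f} GB before activations")
```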
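
Chapters 10 and 11 cover cutting VRAM with LoRA and quantisation. The snippet below is a minimal sketch of the usual Hugging Face pattern (transformers + peft + bitsandbytes); the checkpoint name, LoRA rank and target modules are placeholder assumptions, not the settings used in the course.

```python
# Minimal LoRA + 4-bit quantisation sketch (assumes transformers, peft, bitsandbytes).
# The checkpoint and LoRA settings below are placeholders, not the course's values.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantise the frozen base weights to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",   # placeholder; any causal LM checkpoint works
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # train small adapters on attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the LoRA adapters are trainable
```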
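
Chapters 13 and 23 deal with Distributed Data Parallel, where every GPU holds a full copy of the model and gradients are all-reduced each step. The following is a minimal sketch assuming a torchrun launcher and a toy stand-in model, not the course's actual fine-tuning script.

```python
# Minimal DDP sketch; launch with: torchrun --nproc_per_node=2 ddp_sketch.py
# The Linear layer is a toy stand-in for a real language model.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")      # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # full replica per GPU, synced gradients
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        x = torch.randn(8, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        loss.backward()                          # gradients are all-reduced across GPUs
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```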
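
Chapters 24 to 26 move to Fully Sharded Data Parallel, which shards parameters, gradients and optimiser state across GPUs and gathers shards only when they are needed. The sketch below uses PyTorch's FSDP wrapper directly with default FULL_SHARD behaviour and a toy model; the course drives FSDP through 'accelerate config', so treat this as an illustration of the idea rather than the exact script. The final lines show one common way to gather a full state dict on rank 0 before saving, which is the concern of chapter 26.

```python
# Minimal FSDP sketch; launch with: torchrun --nproc_per_node=2 fsdp_sketch.py
# Toy model and default FULL_SHARD settings; not the course's accelerate-based script.
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    FullStateDictConfig,
    StateDictType,
)

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.ReLU(),
        torch.nn.Linear(4096, 1024),
    ).cuda(local_rank)

    # Default FULL_SHARD: parameters, gradients and optimiser state are split
    # across GPUs and gathered only for the layers currently being computed.
    model = FSDP(model)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        x = torch.randn(8, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    # Gather a full (unsharded) state dict on rank 0 before saving.
    save_cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, save_cfg):
        state = model.state_dict()
    if dist.get_rank() == 0:
        torch.save(state, "fsdp_model.pt")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```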