Overview
Syllabus
Multi-GPU Distributed Training
Video Overview
Choosing a GPU setup
Understanding VRAM requirements in detail
Understanding Optimisation and Gradient Descent
How does the Adam optimiser work?
How the Adam optimiser affects VRAM requirements
Effect of activations, model context and batch size on VRAM
Tip for GPU setup: start with a small batch size
Reducing VRAM with LoRA and quantisation
Quality trade-offs with quantisation and LoRA
Choosing between MP, DDP and FSDP
Distributed Data Parallel
Model Parallel and Fully Sharded Data Parallel (FSDP)
Trade-offs with DDP and FSDP
How does DeepSpeed compare to FSDP?
Using FSDP and DeepSpeed with Accelerate
Code examples for MP, DDP and FSDP
Using SSH with rented GPUs (Runpod)
Installation
(Slight detour) Setting a username and email for GitHub
Basic Model Parallel (MP) fine-tuning script
Fine-tuning script with Distributed Data Parallel (DDP)
Fine-tuning script with Fully Sharded Data Parallel (FSDP)
Running ‘accelerate config’ for FSDP
Saving a model after FSDP fine-tuning
Quick demo of a complete FSDP LoRA training script
Quick demo of an inference script after training
Wrap up
Taught by
Trelis Research