Generalized Pipeline Parallelism for DNN Training - PipeDream System Overview
Databricks via YouTube
Overview
Syllabus
Intro
Model Parallelism: An alternative to data parallelism
Pipelining in DNN training != Traditional pipelining
Challenge 1: Pipelining leads to weight version mismatches
Weight stashing: A solution to version mismatches
Challenge 2: How do we assign operators to pipeline stages?
PipeDream vs. Data Parallelism on Time-to-Accuracy
But modern Deep Neural Networks are becoming extremely large!
Double-buffered weight updates (2BW): weight semantics
2BW has weight update semantics similar to data parallelism
Taught by
Databricks