Overview
Explore a 14-minute conference talk from USENIX OSDI '24 that introduces nnScaler, a framework for generating efficient parallelization plans for deep learning training. Learn how nnScaler addresses the limitations of existing handcrafted search spaces, which can exclude efficient plans, by empowering domain experts to construct their own search spaces using three primitives: op-trans, op-assign, and op-order. Discover how these primitives capture model transformation and the temporal-spatial scheduling of the transformed model for any parallelization plan. Understand how constraints applied to the primitives during space construction prevent space explosion, and how nnScaler can compose both existing and new search spaces. Examine experimental results showing that plans found in the new search spaces achieve up to 3.5× speedup over solutions such as DeepSpeed, Megatron-LM, and Alpa on popular DNN models such as SwinTransformer and AlphaFold2.
Syllabus
OSDI '24 - nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training
Taught by
USENIX