Better Together - Jointly Optimizing ML Collective Scheduling and Execution Planning Using SYNDICATE

Overview

Explore a 22-minute conference talk from NSDI '23 that introduces SYNDICATE, a framework designed to reduce communication bottlenecks in large-scale machine learning training. Delve into the challenges posed by emerging ML training deployments, including larger models and hybrid-parallel training techniques. Discover how SYNDICATE addresses these challenges through a new abstraction called a "motif" and through joint optimization of scheduling and execution planning. Learn how the framework breaks large communication tasks into smaller pieces to maximize network utilization and overlap with compute. Understand how this approach speeds up training of state-of-the-art large models by 21-74%. Gain insight into where ML training optimization is headed and what it could mean for machine learning and distributed systems.
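The benefit of breaking a large communication task into smaller pieces can be illustrated with a toy pipeline model. The sketch below is not SYNDICATE's algorithm and does not use its motif abstraction. It assumes a single compute stream feeding a single network link, evenly sized chunks, and an optional fixed per-chunk overhead. The function name, parameters, and numbers are all hypothetical and chosen only for illustration.

```python
def makespan_ms(compute_ms, comm_ms, num_chunks, per_chunk_overhead_ms=0.0):
    """Toy model: time until the last chunk has been sent.

    Assumptions (illustrative only, not from the talk):
    - compute produces `num_chunks` equal pieces one after another;
    - a piece can be sent as soon as it is computed and the link is free;
    - each send pays a fixed overhead on top of its share of comm_ms.
    """
    compute_per_chunk = compute_ms / num_chunks
    send_per_chunk = comm_ms / num_chunks + per_chunk_overhead_ms

    compute_done = 0.0  # time at which the current chunk finishes computing
    link_free = 0.0     # time at which the link finishes its previous send
    for _ in range(num_chunks):
        compute_done += compute_per_chunk
        send_start = max(compute_done, link_free)
        link_free = send_start + send_per_chunk
    return link_free


if __name__ == "__main__":
    # One monolithic transfer: communication starts only after all compute.
    print(makespan_ms(100, 100, num_chunks=1))
    # Four chunks: most communication overlaps with the remaining compute.
    print(makespan_ms(100, 100, num_chunks=4))
    # A per-chunk overhead eats into the gain from finer splitting.
    print(makespan_ms(100, 100, num_chunks=4, per_chunk_overhead_ms=5))
```

In this model, splitting hides most of the transfer behind compute, while the per-chunk overhead shows why splitting ever more finely is not free. A real scheduler also has to decide the order and placement of the pieces across many links, which this sketch leaves out.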

Syllabus

NSDI '23 - Better Together: Jointly Optimizing ML Collective Scheduling and Execution Planning...

Taught by

USENIX

Related Courses

TACCL - Guiding Collective Algorithm Synthesis Using Communication Sketches

MLaaS in the Wild - Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters

TopoOpt - Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs

Accelerating Collective Communication in Data Parallel Training across Deep Learning Frameworks

Zeus - Understanding and Optimizing GPU Energy Consumption of DNN Training

On Modular Learning of Distributed Systems for Predicting End-to-End Latency
