Overview
Syllabus
Intro
Training in Distributed ML Systems
Parameters in Distributed ML Systems
Issues with Empirical Parameter Tuning
Proposals for Automatic Parameter Adaptation
Open Challenges
Existing Approaches for Adaptation
KungFu Overview
Adaptation Policies
Example: Adaptation Policy for GNS
Embedding Monitoring Inside Dataflow
  Problem: High monitoring cost reduces adaptation benefit
  Idea: Improve efficiency by adding monitoring operators to the dataflow graph
Challenges of Dataflow Collective Communication
Making Collective Communication Asynchronous
  Idea: Use asynchronous collective communication
Issues When Adapting System Parameters
Distributed Mechanism for Parameter Adaptation
How Effectively Does KungFu Adapt?
Conclusions: KungFu
Taught by
USENIX