KungFu - Making Training in Distributed Machine Learning Adaptive

USENIX via YouTube

Classroom Contents

  1. Intro
  2. Training in Distributed ML Systems
  3. Parameters in Distributed ML Systems
  4. Issues with Empirical Parameter Tuning
  5. Proposals for Automatic Parameter Adaptation
  6. Open Challenges
  7. Existing Approaches for Adaptation
  8. KungFu Overview
  9. Adaptation Policies
  10. Example: Adaptation Policy for GNS
  11. Embedding Monitoring Inside Dataflow. Problem: high monitoring cost reduces adaptation benefit. Idea: improve efficiency by adding monitoring operators to the dataflow graph.
  12. Challenges of Dataflow Collective Communication
  13. Making Collective Communication Asynchronous. Idea: use asynchronous collective communication.
  14. Issues When Adapting System Parameters
  15. Distributed Mechanism for Parameter Adaptation
  16. How Effectively Does KungFu Adapt?
  17. Conclusions: KungFu
