KungFu - Making Training in Distributed Machine Learning Adaptive

USENIX via YouTube

Classroom Contents

  1. Intro
  2. Training in Distributed ML Systems
  3. Parameters in Distributed ML Systems
  4. Issues with Empirical Parameter Tuning
  5. Proposals for Automatic Parameter Adaptation
  6. Open Challenges
  7. Existing Approaches for Adaptation
  8. KungFu Overview
  9. Adaptation Policies
  10. Example: Adaptation Policy for GNS
  11. Embedding Monitoring Inside Dataflow. Problem: high monitoring cost reduces adaptation benefit. Idea: improve efficiency by adding monitoring operators to the dataflow graph.
  12. Challenges of Dataflow Collective Communication
  13. Making Collective Communication Asynchronous. Idea: use asynchronous collective communication.
  14. Issues When Adapting System Parameters
  15. Distributed Mechanism for Parameter Adaptation
  16. How Effectively Does KungFu Adapt?
  17. Conclusions: KungFu