Inside TensorFlow - Parameter Server Training

TensorFlow via YouTube

Overview

Explore parameter server training, a data-parallel method for scaling model training across multiple machines, in this TensorFlow video. Learn about adaptive learning rates, synchronous parameter server training, and evaluation techniques. Compare the problems of the multi-client setup with the benefits and drawbacks of the single-client setup, and see how custom training loops work with parameter servers. Examine current API limitations, variable sharding, and ongoing work on runtime, performance, and scalability. Gain insights into distributed functions, large embedding models, and performance comparisons with Estimator. Understand worker profiles, multi-step packing, and fault tolerance strategies for preemptions and failures. Finally, look at large-scale fault tolerance testing, running jobs with preemptible resources, the multi-worker testing framework, and the MLCompass dashboard.

Syllabus

Intro
Parameter Server Training Overview
Adaptive Learning Rate
Synchronous Parameter Server Training
Evaluation by Estimator
Problems with Multi-Client Setup
Benefits of Single-Client Setup
Problems of Single-Client Setup
Schedule/Join APIs
Custom Training Loop with PS
Current Limitations of the APIs
Benefits of Inline Evaluation
Current Limitations of Inline Evaluation
Variable Sharding
Ongoing and Future Work
Runtime, Performance, and Scalability
Parameter server training in runtime
Invoke model func with async schedule API
Distributed functions in PS training
Large embedding model
Performance compared with Estimator
Worker profiles with multi-step packing
Multi-step packing: pros and cons
Preemptions and failures
Fault tolerance: worker failures
Large-scale fault tolerance testing
Run jobs with preemptible resources
Multi-worker testing framework
MLCompass dashboard

Taught by

TensorFlow
