Overview
Syllabus
Intro
A class with multiple implementations
Data parallelism
Parameter servers and workers
Central Storage
Mirrored Variables
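A minimal sketch (assuming the TF 2.x tf.distribute API): variables created under MirroredStrategy's scope become mirrored variables, one synchronized copy per device.

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()        # one replica per visible GPU (or CPU)
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Created under the scope, this becomes a mirrored variable:
    # identical copies kept in sync on every replica's device.
    v = tf.Variable(1.0, name="weight")

print(type(v).__name__)    # e.g. MirroredVariable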
All-reduce algorithm
Ring all-reduce
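To make the communication pattern concrete, here is a pure-NumPy simulation of ring all-reduce (a reduce-scatter phase followed by an all-gather phase). It is not TensorFlow's NCCL-backed implementation, and ring_all_reduce is a made-up helper name.

import numpy as np

def ring_all_reduce(chunks):
    """chunks[i][c] is worker i's copy of chunk c; sums every chunk across all workers."""
    n = len(chunks)
    # Phase 1, reduce-scatter: after n-1 steps, worker i owns the full sum of chunk (i+1) % n.
    for t in range(n - 1):
        sends = [(i, (i - t) % n, chunks[i][(i - t) % n].copy()) for i in range(n)]
        for src, c, data in sends:
            chunks[(src + 1) % n][c] += data      # neighbor accumulates the received chunk
    # Phase 2, all-gather: circulate each finished chunk once around the ring.
    for t in range(n - 1):
        sends = [(i, (i + 1 - t) % n, chunks[i][(i + 1 - t) % n].copy()) for i in range(n)]
        for src, c, data in sends:
            chunks[(src + 1) % n][c] = data       # neighbor overwrites with the finished chunk
    return chunks

# Each of 4 workers holds an 8-element gradient; afterwards every worker holds the same sum.
n, dim = 4, 8
grads = [np.random.randn(dim) for _ in range(n)]
result = ring_all_reduce([list(np.split(g.copy(), n)) for g in grads])
assert all(np.allclose(np.concatenate(r), sum(grads)) for r in result)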
Hierarchical all-reduce
OneDevice Strategy
Parallel input preprocessing: coming soon
What changes when you switch strategies?
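Mostly nothing: in the sketch below (TF 2.x names, make_strategy is a hypothetical helper) only the strategy constructor changes, while everything under strategy.scope() stays identical. Multi-machine variants (parameter servers and workers, multi-worker mirrored) follow the same pattern but additionally need a cluster configuration.

import tensorflow as tf

def make_strategy(kind):
    if kind == "mirrored":
        return tf.distribute.MirroredStrategy()                      # all-reduce sync training
    if kind == "central_storage":
        return tf.distribute.experimental.CentralStorageStrategy()   # variables on one device
    return tf.distribute.OneDeviceStrategy("/cpu:0")                 # everything on one device

strategy = make_strategy("one_device")
with strategy.scope():
    v = tf.Variable(0.0)          # this code does not change when the strategy does
print(strategy.num_replicas_in_sync)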
# Training with Keras
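A hedged sketch of the Keras path (TF 2.x API): build and compile the model inside strategy.scope(); model.fit() itself is unchanged and splits each batch across replicas.

import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="sgd", loss="mse")

x = np.random.rand(256, 10).astype("float32")
y = np.random.rand(256, 1).astype("float32")
model.fit(x, y, batch_size=64, epochs=2)    # each global batch is split across replicas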
# Training with Estimator
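A sketch using the legacy Estimator API (TF 1.x era, since deprecated): the strategy is handed over through RunConfig(train_distribute=...); the model_fn and input_fn here are toy examples.

import tensorflow as tf

def model_fn(features, labels, mode):
    logits = tf.compat.v1.layers.dense(features, 1)
    loss = tf.reduce_mean(tf.square(logits - labels))
    optimizer = tf.compat.v1.train.GradientDescentOptimizer(0.1)
    train_op = optimizer.minimize(
        loss, global_step=tf.compat.v1.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

def input_fn():
    return tf.data.Dataset.from_tensors(([[1.0] * 10], [[1.0]])).repeat(100)

config = tf.estimator.RunConfig(train_distribute=tf.distribute.MirroredStrategy())
estimator = tf.estimator.Estimator(model_fn=model_fn, config=config)
estimator.train(input_fn=input_fn, steps=10)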
Concept: Mirrored vs. per-replica values
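A small sketch of the distinction (TF 2.x API): a variable created under the scope is mirrored, i.e. the same value on every replica, while the result of strategy.run is a per-replica value that generally differs across replicas and has to be reduced before use.

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    w = tf.Variable(1.0)          # mirrored: identical on every replica

def step():
    ctx = tf.distribute.get_replica_context()
    # Each replica computes something different -> a per-replica value.
    return w * tf.cast(ctx.replica_id_in_sync_group, tf.float32)

per_replica = strategy.run(step)  # one value per replica
total = strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica, axis=None)
print(per_replica, total)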
Support computations following this pattern
# Custom training loop, part 1: setup
# Custom training loop, part 2: loss, optimizer
# Custom training loop, part 3: each replica
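A hedged sketch of the three pieces above (TF 2.x API): setup under the strategy scope, a per-example loss plus optimizer, and the step function each replica runs on its slice of the global batch.

import tensorflow as tf

# Part 1 - setup: create the strategy and everything stateful under its scope.
strategy = tf.distribute.MirroredStrategy()
GLOBAL_BATCH_SIZE = 64

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
    # Part 2 - loss, optimizer: per-example loss so we can average over the *global* batch.
    loss_fn = tf.keras.losses.MeanSquaredError(reduction=tf.keras.losses.Reduction.NONE)
    optimizer = tf.keras.optimizers.SGD(0.1)

# Part 3 - each replica: what one replica does with its slice of the batch.
def replica_step(features, labels):
    with tf.GradientTape() as tape:
        per_example_loss = loss_fn(labels, model(features))
        loss = tf.nn.compute_average_loss(per_example_loss,
                                          global_batch_size=GLOBAL_BATCH_SIZE)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss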
Concept: Modes
# Custom training loop, part 4: all replicas
# Custom training loop, part 5: outer loop
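Continuing the sketch above (it reuses strategy, replica_step and GLOBAL_BATCH_SIZE from the previous block): strategy.run executes the step on all replicas and the outer loop is an ordinary Python loop over a distributed dataset.

import numpy as np
import tensorflow as tf

# Part 4 - all replicas: run the per-replica step everywhere and combine the results.
@tf.function
def distributed_step(features, labels):
    per_replica_losses = strategy.run(replica_step, args=(features, labels))
    return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_losses, axis=None)

# Part 5 - outer loop: a plain Python loop over a distributed dataset.
x = np.random.rand(512, 10).astype("float32")
y = np.random.rand(512, 1).astype("float32")
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(GLOBAL_BATCH_SIZE)
dist_dataset = strategy.experimental_distribute_dataset(dataset)

for epoch in range(2):
    for features, labels in dist_dataset:
        loss = distributed_step(features, labels)
    print("epoch", epoch, "last step loss", float(loss))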
Default Strategy
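When no strategy scope is active, tf.distribute.get_strategy() returns a default strategy, so strategy-aware code still runs unchanged on a single device; a quick check:

import tensorflow as tf

strategy = tf.distribute.get_strategy()     # no scope entered -> default (no-op) strategy
print(type(strategy).__name__, strategy.num_replicas_in_sync)   # 1 replica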
# Average loss using the global batch size
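A worked example of why the divisor is the global batch size: with 2 replicas and a global batch of 8, each replica divides its summed loss by 8 (not by its local 4), so summing the per-replica results reproduces the true batch mean. tf.nn.compute_average_loss performs exactly this division.

import tensorflow as tf

per_example_loss = tf.constant([1.0, 2.0, 3.0, 4.0])   # one replica's 4 examples
replica_loss = tf.nn.compute_average_loss(per_example_loss, global_batch_size=8)
print(float(replica_loss))   # 10/8 = 1.25; the other replica contributes its own sum/8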
# Optimizer implementation, part 1
merge_call(fn, args) is our secret weapon
# Optimizer implementation, part 2
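A simplified, hypothetical sketch of the pattern (not the actual tf.keras optimizer source): inside the per-replica step, merge_call switches from replica mode to cross-replica mode, where the gradients from all replicas are reduced once and the mirrored variable is updated consistently via strategy.extended.update. The sgd_apply helper is made up for illustration.

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    w = tf.Variable(1.0)

def sgd_apply(grad, lr=0.1):
    def merge_fn(strategy, per_replica_grad):
        # Cross-replica mode: combine gradients from all replicas...
        reduced = strategy.extended.reduce_to(
            tf.distribute.ReduceOp.MEAN, per_replica_grad, destinations=w)
        # ...then update every copy of the mirrored variable the same way.
        return strategy.extended.update(w, lambda v, g: v.assign_sub(lr * g), args=(reduced,))

    # Replica mode -> cross-replica mode: all replicas "meet" inside merge_fn.
    return tf.distribute.get_replica_context().merge_call(merge_fn, args=(grad,))

def step():
    grad = 2.0 * w            # pretend gradient computed on this replica
    return sgd_apply(grad)

strategy.run(step)
print(float(w))               # 1.0 - 0.1 * 2.0 = 0.8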
Concept: Replica vs. variable locality
One standard pattern for updating state
# Example: Mean metric
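A simplified sketch of that pattern (not the actual tf.keras.metrics.Mean source): the metric's state lives in SUM-aggregated, ON_READ-synchronized variables; each replica updates only its local copies, and reading the result triggers the cross-replica aggregation.

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    total = tf.Variable(0.0, trainable=False,
                        synchronization=tf.VariableSynchronization.ON_READ,
                        aggregation=tf.VariableAggregation.SUM)
    count = tf.Variable(0.0, trainable=False,
                        synchronization=tf.VariableSynchronization.ON_READ,
                        aggregation=tf.VariableAggregation.SUM)

def update_state(values):
    total.assign_add(tf.reduce_sum(values))            # local, per-replica update
    count.assign_add(tf.cast(tf.size(values), tf.float32))

def result():
    return total / count                               # ON_READ: aggregated when read

strategy.run(update_state, args=(tf.constant([1.0, 2.0, 3.0]),))
print(float(result()))        # 2.0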
Questions?
Taught by TensorFlow