Distributed TensorFlow - TensorFlow at O'Reilly AI Conference, San Francisco '18
TensorFlow via YouTube
Overview
Learn distributed TensorFlow training using Keras high-level APIs in this 33-minute conference talk from the O'Reilly AI Conference in San Francisco. Explore TensorFlow's distributed architecture, set up a distributed cluster with Kubeflow and Kubernetes, and discover how to distribute Keras models. Dive into concepts like data parallelism, mirrored variables, ring all-reduce, and synchronous training. Understand performance on multi-GPU setups and learn to configure and deploy Kubernetes clusters. Gain insights into hierarchical all-reduce and how model code is automatically distributed. Access additional resources on distribution strategies and APIs to enhance your understanding of distributed TensorFlow training.
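The talk centers on distributing Keras models with minimal code changes via distribution strategies. As a minimal sketch of the idea (using the current tf.distribute API; the talk predates TensorFlow 2.x, so the exact API shown in the video may differ, and the model and data below are placeholders):

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy creates one replica per local GPU and keeps a mirrored
# copy of every variable on each device; gradients are combined with all-reduce.
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

# Variables created inside the strategy scope become mirrored variables.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Dummy in-memory data just to make the sketch runnable.
x = np.random.random((256, 10)).astype("float32")
y = np.random.random((256, 1)).astype("float32")

# fit() runs synchronous data-parallel training; the global batch is split
# across replicas, so each GPU sees batch_size / num_replicas examples per step.
model.fit(x, y, batch_size=64, epochs=2)
```

The model-building code itself is unchanged; only the strategy scope is added, which is the sense in which model code is "automatically distributed".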
Syllabus
Training can take a long time
Data parallelism
Mirrored Variables
Ring All-reduce
Synchronous training
Performance on Multi-GPU
Setting up multi-node Environment
Deploy your Kubernetes cluster
Hierarchical All-Reduce
Model Code is Automatically Distributed
Configuring Cluster (see the multi-worker configuration sketch below)
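For the multi-node and cluster-configuration topics above, a minimal sketch of how a worker describes the cluster to TensorFlow via the TF_CONFIG environment variable, as consumed by tf.distribute.MultiWorkerMirroredStrategy. This is not code from the talk; the host names and ports are hypothetical, and on Kubernetes/Kubeflow they would point at worker pods, with each pod running the same script under its own task index.

```python
import json
import os

import tensorflow as tf

# TF_CONFIG must be set before the strategy is created.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        # Two workers; worker 0 acts as the chief. Addresses are hypothetical.
        "worker": ["worker-0.example.com:2222", "worker-1.example.com:2222"],
    },
    # This process's role and index within the cluster.
    "task": {"type": "worker", "index": 0},
})

# The strategy reads TF_CONFIG and performs synchronous all-reduce of
# gradients across workers (ring or hierarchical, depending on topology).
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="sgd", loss="mse")
```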
Taught by
TensorFlow