Next-Generation Networks for Machine Learning
Scalable Parallel Computing Lab, SPCL @ ETH Zurich via YouTube

Overview
Explore cutting-edge techniques for accelerating distributed deep neural network (DNN) training in this 50-minute conference talk by Manya Ghobadi at SPCL_Bcast. Delve into the challenges posed by growing dataset and model sizes, and discover solutions to network bottlenecks in datacenter environments. Learn about a novel optical fabric that jointly optimizes network topology and parallelization strategy for DNN clusters. Examine the limitations of fair sharing in congestion control algorithms, and understand a new scheduling approach that strategically places jobs on network links to enhance performance. Gain insights into the future of machine learning infrastructure and network design for improved training efficiency.
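The overview's point about fair sharing can be made concrete with some back-of-the-envelope arithmetic: DNN training traffic is periodic (compute phases alternate with communication bursts), so when two jobs burst on the same link at the same time, fair sharing halves each job's bandwidth and stretches every iteration, whereas shifting one job so the bursts interleave lets each burst run at full line rate. The sketch below illustrates this with made-up numbers; the constants and the helper function are hypothetical and not taken from the talk.

```python
# Minimal sketch (assumed numbers): fair sharing vs. interleaving for two
# training jobs with periodic communication bursts on one shared link.

LINK_GBPS = 100.0    # link capacity (hypothetical)
COMPUTE_MS = 6.0     # compute phase per training iteration (hypothetical)
BURST_GBITS = 0.4    # data moved per communication phase (hypothetical)

def iteration_time_ms(active_sharers: int) -> float:
    """One iteration's duration when `active_sharers` bursts overlap on the link."""
    comm_ms = BURST_GBITS / (LINK_GBPS / active_sharers) * 1000.0
    return COMPUTE_MS + comm_ms

# Fair sharing: both jobs burst simultaneously, each gets half the link.
overlapped = iteration_time_ms(active_sharers=2)

# Interleaved: one job's schedule is shifted so bursts never collide. This is
# feasible here because a full-rate burst fits inside the other job's compute phase.
solo_burst_ms = BURST_GBITS / LINK_GBPS * 1000.0
assert solo_burst_ms <= COMPUTE_MS, "burst must fit in the other job's compute gap"
interleaved = iteration_time_ms(active_sharers=1)

print(f"fair-shared iteration: {overlapped:.1f} ms")   # 14.0 ms
print(f"interleaved iteration: {interleaved:.1f} ms")  # 10.0 ms
```

Under these assumed numbers, interleaving cuts iteration time from 14 ms to 10 ms without adding any capacity, which is the intuition behind scheduling jobs onto links by their communication phases rather than splitting bandwidth fairly.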
Syllabus
Introduction
Talk
Announcements
Taught by
Scalable Parallel Computing Lab, SPCL @ ETH Zurich