Overview
Explore a conference talk from OSDI '14 that introduces a parameter server framework for distributed machine learning. Learn how the framework manages asynchronous data communication between nodes and supports flexible consistency models, elastic scalability, and continuous fault tolerance. Discover how this approach distributes both data and workloads over worker nodes while maintaining globally shared parameters on server nodes. Examine experimental results demonstrating the framework's scalability on petabytes of real data with billions of examples and parameters, covering problems from Sparse Logistic Regression to Latent Dirichlet Allocation and Distributed Sketching.
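To make the architecture concrete, here is a minimal single-process sketch in Python of the push/pull pattern the talk describes: server nodes hold the globally shared parameters as (key, value) pairs, and worker nodes hold data shards, pull only the parameters they need, and push gradient updates back. All names here (ServerNode, WorkerNode, pull, push) are illustrative assumptions for this sketch, not the framework's actual C++ API, and a real deployment runs nodes as separate processes communicating asynchronously.

    # Minimal sketch of the parameter-server pattern: a server shard holds
    # globally shared parameters; workers hold data shards and exchange
    # updates via pull/push. Names are illustrative, not the paper's API.
    import random

    class ServerNode:
        """Holds a shard of the globally shared parameters."""
        def __init__(self, num_keys):
            self.params = {k: 0.0 for k in range(num_keys)}

        def pull(self, keys):
            # Workers fetch only the parameter keys they need.
            return {k: self.params[k] for k in keys}

        def push(self, grads, lr=0.1):
            # Apply a worker's (possibly stale) gradient update in place;
            # under asynchrony, pushes from different workers interleave.
            for k, g in grads.items():
                self.params[k] -= lr * g

    class WorkerNode:
        """Holds a shard of the training data; computes gradients locally."""
        def __init__(self, data):
            self.data = data  # list of (features: dict[key -> value], label)

        def compute_gradient(self, params):
            # Squared-error gradient for a linear model, one pass over the shard.
            grads = {}
            for features, label in self.data:
                err = sum(params[k] * v for k, v in features.items()) - label
                for k, v in features.items():
                    grads[k] = grads.get(k, 0.0) + err * v
            return grads

    # Toy run: one server shard, two workers, randomized visiting order
    # as a stand-in for asynchronous scheduling.
    server = ServerNode(num_keys=3)
    workers = [
        WorkerNode([({0: 1.0, 1: 0.5}, 1.0)]),
        WorkerNode([({1: 1.0, 2: 2.0}, -1.0)]),
    ]
    for _ in range(20):
        for w in random.sample(workers, len(workers)):
            keys = {k for features, _ in w.data for k in features}
            local = server.pull(keys)                # pull needed parameters
            server.push(w.compute_gradient(local))   # push gradients back
    print(server.params)

In the real system, pushes and pulls are asynchronous, and a flexible consistency model (such as bounded delay) governs how stale pulled parameters may be; the sequential loop above only approximates that behavior.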
Syllabus
OSDI '14 - Scaling Distributed Machine Learning with the Parameter Server
Taught by
USENIX