Linux Foundation

Horovod - Distributed Deep Learning for Reliable MLOps

Linux Foundation via YouTube

Overview

Explore distributed deep learning techniques and reliable MLOps practices at Uber in this 30-minute conference talk by Travis Addair. Dive into the early adoption of Horovod, understand distributed deep learning concepts, and compare parameter servers with the Allreduce technique. Examine benchmarking results, learn about deep learning applications in research and production environments, and discover feature stores for efficient model training. Investigate preprocessing techniques, Spark ML pipelines, and Petastorm for data access in deep learning. Address challenges of training on large datasets, explore Spark 3.0's resource-aware scheduling, and learn about Horovod Lambda for CPU-based data processing. Gain insights into online prediction using Neuropod, workflow authoring, and the process of ideating, defining, evaluating, and deploying deep learning models within a single script. Conclude with an overview of feature engineering, model construction, deployment, and Elastic Horovod's control flow capabilities.

Syllabus

Intro
Early Adoption of Horovod
Deep Learning Refresher
Distributed Deep Learning
Early Distributed Training - Parameter Servers
Parameter Servers - Tradeoffs
Horovod Technique: Allreduce
Benchmarking
Deep Learning in Research
Deep Learning in Production
Feature Store
Model Training
Preprocessing
Spark ML Pipelines
Petastorm: Data Access for Deep Learning Training
Challenges of Training on Large Datasets
Spark 3.0: Resource Aware Scheduling
What if my Spark cluster doesn't have GPUs? Horovod Lambda - Run data processing on CPUs with Spark
Online Prediction
Neuropod: Out-of-Process Execution
Workflow Authoring: Can we ideate, define, evaluate and deploy a Deep Learning model all within a single script?
Feature Engineering
Model Construction
Model Deployment
Elastic Horovod: Control Flow

Taught by

Linux Foundation
