Production Machine Learning Systems

Google via Google Cloud Skills Boost

Overview

This course covers how to implement the various flavors of production ML systems: static, dynamic, and continuous training; static and dynamic inference; and batch and online processing. You delve into TensorFlow abstraction levels, the options for distributed training, and how to write distributed training models with custom estimators. This is the second course in the Advanced Machine Learning on Google Cloud series. After completing this course, enroll in the Image Understanding with TensorFlow on Google Cloud course.

Syllabus

Introduction to Advanced Machine Learning on Google Cloud

Advanced Machine Learning on Google Cloud
Welcome

Architecting Production ML Systems

Architecting ML systems
Data extraction, analysis, and preparation
Model training, evaluation, and validation
Trained model, prediction service, and performance monitoring
Training design decisions
Serving design decisions
Designing from scratch
Using Vertex AI
Lab introduction: Structured data prediction
Structured data prediction using Vertex AI Platform
Quiz: Architecting production ML systems
Readings: Architecting production ML systems

Designing Adaptable ML Systems

Introduction
Adapting to data
Changing distributions
Lab: Adapting to data
Right and wrong decisions
System failure
Concept drift
Actions to mitigate concept drift
TensorFlow data validation
Components of TensorFlow data validation
Lab Introduction: Introduction to TensorFlow Data Validation
Introduction to TensorFlow Data Validation
Lab Introduction: Advanced Visualizations with TensorFlow Data Validation
Advanced Visualizations with TensorFlow Data Validation
Mitigating training-serving skew through design
Vertex AI: Training and Serving a Custom Model
Diagnosing a production model
Quiz: Designing adaptable ML systems
Readings: Designing adaptable ML systems

Designing High-Performance ML Systems

Introduction
Training
Predictions
Why distributed training is needed
Distributed training architectures
TensorFlow distributed training strategies
Mirrored strategy
Multi-worker mirrored strategy
TPU strategy
Parameter server strategy
Lab Introduction: Distributed Training with Keras
Distributed Training with Keras
Training on large datasets with tf.data API
Lab Introduction: TPU-speed Data Pipelines
TPU Speed Data Pipelines
Inference
Quiz: Designing high-performance ML systems
Readings: Designing high-performance ML systems

Building Hybrid ML Systems

Introduction
Machine Learning on Hybrid Cloud
Kubeflow
Lab Introduction: Kubeflow Pipelines with AI Platform
Running Pipelines on Vertex AI 2.5
TensorFlow Lite
Optimizing TensorFlow for mobile
Summary
Quiz: Hybrid ML systems
Readings: Hybrid ML systems

Summary

Course summary
Production Machine Learning Systems - readings
All quiz questions and answers

Course Resources

Architecting Production ML Systems Course Resources

Your Next Steps

Course Badge
