This course covers how to implement the various flavors of production ML systems: static, dynamic, and continuous training; static and dynamic inference; and batch and online processing. You explore TensorFlow abstraction levels, the options for distributed training, and how to write distributed training models with custom estimators. This is the second course of the Advanced Machine Learning on Google Cloud series. After completing this course, enroll in the Image Understanding with TensorFlow on Google Cloud course.
Syllabus
- Introduction to Advanced Machine Learning on Google Cloud
  - Advanced Machine Learning on Google Cloud
  - Welcome
- Architecting Production ML Systems
  - Architecting ML systems
  - Data extraction, analysis, and preparation
  - Model training, evaluation, and validation
  - Trained model, prediction service, and performance monitoring
  - Training design decisions
  - Serving design decisions
  - Designing from scratch
  - Using Vertex AI
  - Lab Introduction: Structured data prediction
  - Structured data prediction using Vertex AI Platform
  - Quiz: Architecting production ML systems
  - Readings: Architecting production ML systems
- Designing Adaptable ML Systems
  - Introduction
  - Adapting to data
  - Changing distributions
  - Lab: Adapting to data
  - Right and wrong decisions
  - System failure
  - Concept drift
  - Actions to mitigate concept drift
  - TensorFlow data validation
  - Components of TensorFlow data validation
  - Lab Introduction: Introduction to TensorFlow Data Validation
  - Introduction to TensorFlow Data Validation
  - Lab Introduction: Advanced Visualizations with TensorFlow Data Validation
  - Advanced Visualizations with TensorFlow Data Validation
  - Mitigating training-serving skew through design
  - Vertex AI: Training and Serving a Custom Model
  - Diagnosing a production model
  - Quiz: Designing adaptable ML systems
  - Readings: Designing adaptable ML systems
- Designing High-Performance ML Systems
  - Introduction
  - Training
  - Predictions
  - Why distributed training is needed
  - Distributed training architectures
  - TensorFlow distributed training strategies
  - Mirrored strategy
  - Multi-worker mirrored strategy
  - TPU strategy
  - Parameter server strategy
  - Lab Introduction: Distributed Training with Keras
  - Distributed Training with Keras
  - Training on large datasets with tf.data API
  - Lab Introduction: TPU-speed Data Pipelines
  - TPU-speed Data Pipelines
  - Inference
  - Quiz: Designing high-performance ML systems
  - Readings: Designing high-performance ML systems
- Building Hybrid ML Systems
  - Introduction
  - Machine Learning on Hybrid Cloud
  - Kubeflow
  - Lab Introduction: Kubeflow Pipelines with AI Platform
  - Running Pipelines on Vertex AI 2.5
  - TensorFlow Lite
  - Optimizing TensorFlow for mobile
  - Summary
  - Quiz: Hybrid ML systems
  - Readings: Hybrid ML systems
- Summary
  - Course summary
  - Production machine learning systems - readings
  - All quiz questions and answers
- Course Resources
  - Architecting Production ML Systems Course Resources
- Your Next Steps
  - Course Badge