Overview

In this last installment of the Dataflow course series, we introduce the components of the Dataflow operational model. We examine tools and techniques for troubleshooting and optimizing pipeline performance, then review testing, deployment, and reliability best practices for Dataflow pipelines. We conclude with a review of Templates, which make it easy to scale Dataflow pipelines to organizations with hundreds of users. These lessons will help ensure that your data platform is stable and resilient to unanticipated circumstances.

Syllabus
- Introduction
  - Course Introduction
- Monitoring
  - Job List
  - Job Info
  - Job Graph
  - Job Metrics
  - Metrics Explorer
  - Quiz
  - Additional Resources
- Logging and Error Reporting
  - Logging
  - Error Reporting
  - Quiz
  - Additional Resources
- Troubleshooting and Debug
  - Troubleshooting Workflow
  - Types of Troubles
  - Quiz
  - Serverless Data Processing with Dataflow - Monitoring, Logging and Error Reporting for Dataflow Jobs
  - Additional Resources
- Performance
  - Pipeline Design
  - Data Shape
  - Sources, Sinks & External Systems
  - Shuffle and Streaming Engine
  - Quiz
  - Additional Resources
- Testing and CI/CD
  - Testing and CI/CD Overview
  - Unit Testing
  - Integration Testing
  - Artifact Building
  - Deployment
  - Quiz
  - Serverless Data Processing with Dataflow - Testing with Apache Beam (Java)
  - Serverless Data Processing with Dataflow - Testing with Apache Beam (Python)
  - Serverless Data Processing with Dataflow - CI/CD with Dataflow
  - Additional Resources
- Reliability
  - Introduction to Reliability
  - Monitoring
  - Geolocation
  - Disaster Recovery
  - High Availability
  - Quiz
  - Additional Resources
- Flex Templates
  - Classic Templates
  - Flex Templates
  - Using Flex Templates
  - Google-provided Templates
  - Quiz
  - Serverless Data Processing with Dataflow - Custom Dataflow Flex Templates (Java)
  - Serverless Data Processing with Dataflow - Custom Dataflow Flex Templates (Python)
  - Additional Resources
- Summary
  - Course Summary
  - Your Next Steps
  - Course Badge