Learn how to design and build big data pipelines on Google Cloud Platform.
Syllabus
Introduction
- What goes into a data pipeline?
- Data science modules covered
- GCP data pipeline options
- Cloud Dataproc
- Cloud Dataflow
- Cloud Pub/Sub
- What is Apache Beam?
- Beam pipelines
- PCollections
- Transforms
- Pipeline I/O
- Runners
- Setting up GCP for Dataflow
- Setting up Python
- Creating a simple pipeline
- Executing in Dataflow
- Reading text files
- ParDo
- GroupBy
- Map
- Combine
- Writing data to text files
- Other capabilities
- What is Pub/Sub?
- Topics and messages
- Publishers
- Subscribers
- Create a topic
- Create a subscription
- Publish and receive
- Python SDK
- Streaming with Dataflow
- Windowing with Dataflow
- Streaming and windowing example
- Next steps
Taught by
Kumaran Ponnambalam