In addition to batch pipelines, Data Fusion also lets you create realtime pipelines that process events as they are generated. Currently, realtime pipelines execute using Apache Spark Streaming on Cloud Dataproc clusters. In this lab, you will learn how to build a streaming pipeline using Data Fusion.
Overview
Syllabus
- GSP808
- Overview
- Setup and requirements
- Task 1. Project permissions
- Task 2. Ensure that the Dataflow API is successfully enabled
- Task 3. Load the data
- Task 4. Setting up Pub/Sub Topic
- Task 5. Add a Pub/Sub subscription
- Task 6. Add necessary permissions for your Cloud Data Fusion instance
- Task 7. Navigate the Cloud Data Fusion UI
- Task 8. Build a realtime pipeline
- Task 9. Send messages into Cloud Pub/Sub
- Task 10. Viewing your pipeline metrics
- Congratulations!