Overview

This fundamental-level Quest offers hands-on practice with Cloud Data Fusion, a cloud-native, code-free data integration platform. ETL Developers, Data Engineers, and Analysts can use its pre-built transformations and connectors to build and deploy pipelines without writing code. The Quest starts with a quickstart lab that familiarises learners with the Cloud Data Fusion UI. Learners then try running batch and realtime pipelines, as well as using the built-in Wrangler plugin to perform some interesting transformations on data.

Syllabus
- Getting Started with Cloud Data Fusion
- In this lab you will learn how to create a Cloud Data Fusion instance and deploy a sample pipeline.
- Building Batch Pipelines in Cloud Data Fusion
- This lab will teach you how to use the Pipeline Studio in Cloud Data Fusion to build an ETL pipeline. Pipeline Studio exposes the building blocks and built-in plugins you need to assemble your batch pipeline, one node at a time. You will also use the Wrangler plugin to build and apply transformations to the data that flows through the pipeline.
- Building Transformations and Preparing Data with Wrangler in Cloud Data Fusion
- In this lab you'll work with Wrangler directives, which are used by the Wrangler plugin, the "Swiss Army knife" of plugins in the Data Fusion platform. Directives let you encapsulate your transformations in one place and group transformation tasks into manageable blocks (see the example recipe after this list).
- Building Realtime Pipelines in Cloud Data Fusion
- In addition to batch pipelines, Data Fusion also allows you to create realtime pipelines that can process events as they are generated. Currently, realtime pipelines execute using Apache Spark Streaming on Cloud Dataproc clusters. In this lab you will learn how to build a streaming pipeline using Data Fusion.
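
For context on the Wrangler lab, a Wrangler "recipe" is simply an ordered list of plain-text directives that the plugin applies to each record. The sketch below is illustrative only: the column names (:body, :body_1, :state) are hypothetical, and the directives are drawn from the open-source CDAP Wrangler directive set rather than from this Quest's specific exercises.

```
parse-as-csv :body ',' false
drop :body
rename :body_1 :state
trim :state
uppercase :state
fill-null-or-empty :state 'N/A'
```

Read top to bottom, this recipe parses the raw CSV line in :body into new columns, drops the original raw column, renames the first parsed column to :state, trims and uppercases it, and fills in 'N/A' wherever the value is missing. In the labs you build directives like these interactively in the Wrangler UI, and the resulting recipe is applied by the Wrangler plugin node inside your pipeline.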