Overview
Explore the fundamental concepts and skills required to design and develop data processing solutions, and prepare for the Microsoft Azure Data Engineer Associate (DP-203) certification exam.
Syllabus
1. Ingest and Transform Data
- Learning objectives
- Transform data by using Apache Spark
- Transform data by using Transact-SQL
- Transform data by using Data Factory
- Transform data by using Azure Synapse pipelines
- Transform data by using Stream Analytics
- Learning objectives
- Cleanse data
- Split data
- Shred JSON
- Encode and decode data
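Two of the topics above, shredding JSON and encoding/decoding data, can be sketched with plain Python standard-library calls (`json` and `base64`); the payload and field names below are illustrative, not taken from the course.

```python
import base64
import json

# A nested JSON document such as one landed in a data lake (illustrative sample).
raw = '{"order": {"id": 42, "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}}'

# "Shred" the JSON: flatten the nested structure into tabular rows.
doc = json.loads(raw)
rows = [
    {"order_id": doc["order"]["id"], "sku": item["sku"], "qty": item["qty"]}
    for item in doc["order"]["items"]
]
print(rows)  # one flat row per line item

# Encode and decode data: round-trip a value through Base64.
encoded = base64.b64encode("sensitive-value".encode("utf-8"))
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```

In Azure the same shredding is typically done with `OPENJSON` in T-SQL or `explode` in Spark; the list-comprehension here just shows the shape of the operation.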
- Learning objectives
- Configure error handling for the transformation
- Normalize and denormalize values
- Transform data by using Scala
- Perform data exploratory analysis
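In this context, normalizing and denormalizing refers to reshaping data between relational (normalized) and flattened (denormalized) forms. A minimal Python sketch with illustrative tables:

```python
# Illustrative normalized tables: orders reference customers only by key.
customers = {1: {"customer_id": 1, "name": "Contoso"}}
orders = [{"order_id": 10, "customer_id": 1, "amount": 99.0}]

# Denormalize: fold the referenced customer attribute into each order row,
# trading storage for simpler, join-free analytical queries.
denormalized = [
    {**o, "customer_name": customers[o["customer_id"]]["name"]} for o in orders
]

# Normalize back: strip the repeated attribute, keeping only the foreign key.
normalized = [
    {k: v for k, v in row.items() if k != "customer_name"} for row in denormalized
]
print(denormalized, normalized)
```

The same trade-off drives dimensional modeling in Synapse: denormalized tables read faster, normalized ones avoid duplicated attributes.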
- Learning objectives
- Develop batch processing solutions by using Data Factory, Data Lake, Spark, Azure Synapse pipelines, PolyBase, and Azure Databricks
- Create data pipelines
- Design and implement incremental data loads
- Design and develop slowly changing dimensions
- Handle security and compliance requirements
- Scale resources
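The incremental-load objective above is commonly implemented with a high-water-mark pattern: persist the newest modification timestamp from the last run and load only rows changed since. A minimal sketch with illustrative rows and column names:

```python
from datetime import datetime

# Illustrative source rows with a last-modified timestamp.
source = [
    {"id": 1, "name": "alpha", "modified": datetime(2024, 1, 1)},
    {"id": 2, "name": "beta",  "modified": datetime(2024, 2, 1)},
    {"id": 3, "name": "gamma", "modified": datetime(2024, 3, 1)},
]

# Watermark value persisted after the previous pipeline run.
last_watermark = datetime(2024, 1, 15)

# Incremental load: pick up only rows modified after the watermark.
delta = [row for row in source if row["modified"] > last_watermark]

# Advance the watermark to the newest row just loaded.
new_watermark = max(row["modified"] for row in delta)
print(len(delta), new_watermark)
```

In Data Factory the watermark would live in a control table and feed a parameterized source query; the filter logic is the same.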
- Learning objectives
- Configure the batch size
- Design and create tests for data pipelines
- Integrate Jupyter and Python Notebooks into a data pipeline
- Handle duplicate data
- Handle missing data
- Handle late-arriving data
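The last three objectives in this module can be sketched together: deduplicate on a key, impute missing values under an explicit policy, and route late-arriving records aside rather than dropping them silently. The records, cutoff, and default below are illustrative assumptions.

```python
# Illustrative event records; "ts" is event time, "value" may be missing (None).
events = [
    {"id": 1, "ts": 10, "value": 5.0},
    {"id": 1, "ts": 10, "value": 5.0},   # duplicate
    {"id": 2, "ts": 12, "value": None},  # missing value
    {"id": 3, "ts": 3,  "value": 7.0},   # late-arriving (older than the cutoff)
]

LATE_CUTOFF = 5       # events older than this are routed aside for reprocessing
DEFAULT_VALUE = 0.0   # simple imputation policy for missing values

seen, clean, late = set(), [], []
for e in events:
    key = (e["id"], e["ts"])
    if key in seen:
        continue          # handle duplicate data: keep the first occurrence only
    seen.add(key)
    if e["value"] is None:
        e = {**e, "value": DEFAULT_VALUE}  # handle missing data
    (late if e["ts"] < LATE_CUTOFF else clean).append(e)  # handle late-arriving data

print(len(clean), len(late))
```

Routing late rows to a side output (rather than discarding them) preserves the option to reconcile them in a later batch.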
- Learning objectives
- Upsert data
- Regress to a previous state
- Design and configure exception handling
- Configure batch retention
- Revisit batch processing solution design
- Debug Spark jobs by using the Spark UI
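Two objectives above, upserting data and regressing to a previous state, pair naturally: apply a batch as update-or-insert, but snapshot the target first so a failed batch can be rolled back. A minimal sketch with illustrative keys:

```python
import copy

# Illustrative target table keyed by id, plus a batch of incoming changes.
target = {1: {"id": 1, "name": "alpha"}, 2: {"id": 2, "name": "beta"}}
batch = [
    {"id": 2, "name": "beta-v2"},  # existing key -> update
    {"id": 3, "name": "gamma"},    # new key -> insert
]

# Snapshot so we can regress to the previous state if the batch fails.
snapshot = copy.deepcopy(target)

try:
    for row in batch:
        target[row["id"]] = row   # upsert: update when the key exists, else insert
except Exception:
    target = snapshot             # regress to the previous state on failure

print(sorted(target))
```

In the Azure stack the upsert itself would typically be a T-SQL `MERGE` or a Delta Lake `MERGE INTO`; the dictionary here just shows the semantics.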
- Learning objectives
- Develop a stream processing solution by using Stream Analytics, Azure Databricks, and Azure Event Hubs
- Process data by using Spark structured streaming
- Monitor for performance and functional regressions
- Design and create windowed aggregates
- Handle schema drift
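Handling schema drift means tolerating records whose columns have diverged from the expected schema. One common approach, sketched below with an assumed schema and sample records, is to project each record onto the expected columns, fill drifted-away fields with defaults, and set aside unexpected fields for inspection:

```python
# Expected schema with per-column defaults; incoming records may drift.
schema = {"id": None, "name": "", "score": 0.0}

incoming = [
    {"id": 1, "name": "alpha"},                           # missing "score"
    {"id": 2, "name": "beta", "score": 1.5, "tag": "x"},  # extra "tag" column
]

def align(record, schema):
    """Project a record onto the expected schema, filling missing fields with
    defaults and collecting unexpected fields for inspection."""
    aligned = {col: record.get(col, default) for col, default in schema.items()}
    extras = {k: v for k, v in record.items() if k not in schema}
    return aligned, extras

aligned_rows = [align(r, schema) for r in incoming]
print(aligned_rows)
```

Mapping data flows in Data Factory offer built-in schema-drift handling; this shows the underlying align-and-quarantine idea.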
- Learning objectives
- Process time series data
- Process across partitions
- Process within one partition
- Configure checkpoints and watermarking during processing
- Scale resources
- Design and create tests for data pipelines
- Optimize pipelines for analytical or transactional purposes
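Windowed aggregates with watermarking, covered across this module and the previous one, can be sketched without a streaming engine: assign each event to a tumbling window by its event time, and drop events that arrive too far behind the maximum time seen. The window width, lateness bound, and events below are illustrative assumptions.

```python
# Illustrative (event_time_seconds, value) stream, slightly out of order.
events = [(1, 2.0), (4, 3.0), (12, 1.0), (3, 5.0), (15, 4.0)]

WINDOW = 10      # tumbling-window width in seconds
LATENESS = 8     # allowed lateness before the watermark drops an event

windows, max_time = {}, 0
for ts, value in events:
    max_time = max(max_time, ts)
    if ts < max_time - LATENESS:
        continue                      # below the watermark: too late, drop
    start = (ts // WINDOW) * WINDOW   # assign the event to its tumbling window
    windows[start] = windows.get(start, 0.0) + value

print(sorted(windows.items()))
```

Stream Analytics expresses the same idea declaratively with `TUMBLINGWINDOW`, and Spark structured streaming with `window()` plus `withWatermark()`; the watermark bounds how long window state must be retained.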
- Learning objectives
- Handle interruptions
- Design and configure exception handling
- Upsert data
- Replay archived stream data
- Design a stream processing solution
- Learning objectives
- Trigger batches
- Handle failed batch loads
- Validate batch loads
- Manage data pipelines in Data Factory and Synapse pipelines
- Schedule data pipelines in Data Factory and Synapse pipelines
- Implement version control for pipeline artifacts
- Manage Spark jobs in a pipeline
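Triggering, retrying, and validating batch loads, the first objectives in this module, can be sketched as a bounded retry loop with a post-load validation check. The loader below is a stand-in that fails transiently on its first attempt; the function names and retry policy are illustrative.

```python
def load_batch(batch, attempt):
    """Illustrative loader that fails transiently on the first attempt."""
    if attempt == 0:
        raise ConnectionError("transient failure")
    return len(batch)  # rows loaded

def run_with_retry(batch, max_retries=3):
    # Handle failed batch loads: retry a bounded number of times.
    for attempt in range(max_retries):
        try:
            loaded = load_batch(batch, attempt)
            # Validate the batch load: loaded row count must match the submission.
            if loaded != len(batch):
                raise RuntimeError("row-count mismatch")
            return loaded
        except ConnectionError:
            continue  # transient error: try again
    raise RuntimeError("batch load failed after retries")

print(run_with_retry([1, 2, 3]))
```

Data Factory and Synapse pipelines expose the same policy declaratively through activity retry counts and validation activities; the loop shows what that policy does.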
Taught by
Microsoft Press and Tim Warner