Overview
Explore the fundamental concepts and skills required to design and develop data processing solutions, and prepare for the Microsoft Azure Data Engineer Associate (DP-203) certification exam.
Syllabus
1. Ingest and Transform Data
- Learning objectives
- Transform data by using Apache Spark
- Transform data by using Transact-SQL
- Transform data by using Data Factory
- Transform data by using Azure Synapse pipelines
- Transform data by using Stream Analytics
- Learning objectives
- Cleanse data
- Split data
- Shred JSON
- Encode and decode data
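Two of the topics above, shredding JSON and encoding/decoding data, can be sketched with plain Python standard-library calls (`json` and `base64`); the payload and field names below are illustrative, not taken from the course.

```python
import base64
import json

# A nested JSON document such as one landed in a data lake (illustrative sample).
raw = '{"order": {"id": 42, "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}}'

# "Shred" the JSON: flatten the nested structure into tabular rows.
doc = json.loads(raw)
rows = [
    {"order_id": doc["order"]["id"], "sku": item["sku"], "qty": item["qty"]}
    for item in doc["order"]["items"]
]
print(rows)  # one flat row per line item

# Encode and decode data: round-trip a value through Base64.
encoded = base64.b64encode("sensitive-value".encode("utf-8"))
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```

In Azure the same shredding is typically done with `OPENJSON` in T-SQL or `explode` in Spark; the list-comprehension here just shows the shape of the operation.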
- Learning objectives
- Configure error handling for the transformation
- Normalize and denormalize values
- Transform data by using Scala
- Perform data exploratory analysis
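In this context, normalizing and denormalizing refers to reshaping data between relational (normalized) and flattened (denormalized) forms. A minimal Python sketch with illustrative tables:

```python
# Illustrative normalized tables: orders reference customers only by key.
customers = {1: {"customer_id": 1, "name": "Contoso"}}
orders = [{"order_id": 10, "customer_id": 1, "amount": 99.0}]

# Denormalize: fold the referenced customer attribute into each order row,
# trading storage for simpler, join-free analytical queries.
denormalized = [
    {**o, "customer_name": customers[o["customer_id"]]["name"]} for o in orders
]

# Normalize back: strip the repeated attribute, keeping only the foreign key.
normalized = [
    {k: v for k, v in row.items() if k != "customer_name"} for row in denormalized
]
print(denormalized, normalized)
```

The same trade-off drives dimensional modeling in Synapse: denormalized tables read faster, normalized ones avoid duplicated attributes.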
- Learning objectives
- Develop batch processing solutions by using Data Factory, Data Lake, Spark, Azure Synapse pipelines, PolyBase, and Azure Databricks
- Create data pipelines
- Design and implement incremental data loads
- Design and develop slowly changing dimensions
- Handle security and compliance requirements
- Scale resources
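The incremental-load objective above is commonly implemented with a high-water-mark pattern: persist the newest modification timestamp from the last run and load only rows changed since. A minimal sketch with illustrative rows and column names:

```python
from datetime import datetime

# Illustrative source rows with a last-modified timestamp.
source = [
    {"id": 1, "name": "alpha", "modified": datetime(2024, 1, 1)},
    {"id": 2, "name": "beta",  "modified": datetime(2024, 2, 1)},
    {"id": 3, "name": "gamma", "modified": datetime(2024, 3, 1)},
]

# Watermark value persisted after the previous pipeline run.
last_watermark = datetime(2024, 1, 15)

# Incremental load: pick up only rows modified after the watermark.
delta = [row for row in source if row["modified"] > last_watermark]

# Advance the watermark to the newest row just loaded.
new_watermark = max(row["modified"] for row in delta)
print(len(delta), new_watermark)
```

In Data Factory the watermark would live in a control table and feed a parameterized source query; the filter logic is the same.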
- Learning objectives
- Configure the batch size
- Design and create tests for data pipelines
- Integrate Jupyter and Python Notebooks into a data pipeline
- Handle duplicate data
- Handle missing data
- Handle late-arriving data
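The last three objectives in this module can be sketched together: deduplicate on a key, impute missing values under an explicit policy, and route late-arriving records aside rather than dropping them silently. The records, cutoff, and default below are illustrative assumptions.

```python
# Illustrative event records; "ts" is event time, "value" may be missing (None).
events = [
    {"id": 1, "ts": 10, "value": 5.0},
    {"id": 1, "ts": 10, "value": 5.0},   # duplicate
    {"id": 2, "ts": 12, "value": None},  # missing value
    {"id": 3, "ts": 3,  "value": 7.0},   # late-arriving (older than the cutoff)
]

LATE_CUTOFF = 5       # events older than this are routed aside for reprocessing
DEFAULT_VALUE = 0.0   # simple imputation policy for missing values

seen, clean, late = set(), [], []
for e in events:
    key = (e["id"], e["ts"])
    if key in seen:
        continue          # handle duplicate data: keep the first occurrence only
    seen.add(key)
    if e["value"] is None:
        e = {**e, "value": DEFAULT_VALUE}  # handle missing data
    (late if e["ts"] < LATE_CUTOFF else clean).append(e)  # handle late-arriving data

print(len(clean), len(late))
```

Routing late rows to a side output (rather than discarding them) preserves the option to reconcile them in a later batch.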
- Learning objectives
- Upsert data
- Regress to a previous state
- Design and configure exception handling
- Configure batch retention
- Revisit batch processing solution design
- Debug Spark jobs by using the Spark UI
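Two objectives above, upserting data and regressing to a previous state, pair naturally: apply a batch as update-or-insert, but snapshot the target first so a failed batch can be rolled back. A minimal sketch with illustrative keys:

```python
import copy

# Illustrative target table keyed by id, plus a batch of incoming changes.
target = {1: {"id": 1, "name": "alpha"}, 2: {"id": 2, "name": "beta"}}
batch = [
    {"id": 2, "name": "beta-v2"},  # existing key -> update
    {"id": 3, "name": "gamma"},    # new key -> insert
]

# Snapshot so we can regress to the previous state if the batch fails.
snapshot = copy.deepcopy(target)

try:
    for row in batch:
        target[row["id"]] = row   # upsert: update when the key exists, else insert
except Exception:
    target = snapshot             # regress to the previous state on failure

print(sorted(target))
```

In the Azure stack the upsert itself would typically be a T-SQL `MERGE` or a Delta Lake `MERGE INTO`; the dictionary here just shows the semantics.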
- Learning objectives
- Develop a stream processing solution by using Stream Analytics, Azure Databricks, and Azure Event Hubs
- Process data by using Spark structured streaming
- Monitor for performance and functional regressions
- Design and create windowed aggregates
- Handle schema drift
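Handling schema drift means tolerating records whose columns have diverged from the expected schema. One common approach, sketched below with an assumed schema and sample records, is to project each record onto the expected columns, fill drifted-away fields with defaults, and set aside unexpected fields for inspection:

```python
# Expected schema with per-column defaults; incoming records may drift.
schema = {"id": None, "name": "", "score": 0.0}

incoming = [
    {"id": 1, "name": "alpha"},                           # missing "score"
    {"id": 2, "name": "beta", "score": 1.5, "tag": "x"},  # extra "tag" column
]

def align(record, schema):
    """Project a record onto the expected schema, filling missing fields with
    defaults and collecting unexpected fields for inspection."""
    aligned = {col: record.get(col, default) for col, default in schema.items()}
    extras = {k: v for k, v in record.items() if k not in schema}
    return aligned, extras

aligned_rows = [align(r, schema) for r in incoming]
print(aligned_rows)
```

Mapping data flows in Data Factory offer built-in schema-drift handling; this shows the underlying align-and-quarantine idea.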
- Learning objectives
- Process time series data
- Process across partitions
- Process within one partition
- Configure checkpoints and watermarking during processing
- Scale resources
- Design and create tests for data pipelines
- Optimize pipelines for analytical or transactional purposes
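Windowed aggregates with watermarking, covered across this module and the previous one, can be sketched without a streaming engine: assign each event to a tumbling window by its event time, and drop events that arrive too far behind the maximum time seen. The window width, lateness bound, and events below are illustrative assumptions.

```python
# Illustrative (event_time_seconds, value) stream, slightly out of order.
events = [(1, 2.0), (4, 3.0), (12, 1.0), (3, 5.0), (15, 4.0)]

WINDOW = 10      # tumbling-window width in seconds
LATENESS = 8     # allowed lateness before the watermark drops an event

windows, max_time = {}, 0
for ts, value in events:
    max_time = max(max_time, ts)
    if ts < max_time - LATENESS:
        continue                      # below the watermark: too late, drop
    start = (ts // WINDOW) * WINDOW   # assign the event to its tumbling window
    windows[start] = windows.get(start, 0.0) + value

print(sorted(windows.items()))
```

Stream Analytics expresses the same idea declaratively with `TUMBLINGWINDOW`, and Spark structured streaming with `window()` plus `withWatermark()`; the watermark bounds how long window state must be retained.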
- Learning objectives
- Handle interruptions
- Design and configure exception handling
- Upsert data
- Replay archived stream data
- Design a stream processing solution
- Learning objectives
- Trigger batches
- Handle failed batch loads
- Validate batch loads
- Manage data pipelines in Data Factory and Synapse pipelines
- Schedule data pipelines in Data Factory and Synapse pipelines
- Implement version control for pipeline artifacts
- Manage Spark jobs in a pipeline
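Triggering, retrying, and validating batch loads, the first objectives in this module, can be sketched as a bounded retry loop with a post-load validation check. The loader below is a stand-in that fails transiently on its first attempt; the function names and retry policy are illustrative.

```python
def load_batch(batch, attempt):
    """Illustrative loader that fails transiently on the first attempt."""
    if attempt == 0:
        raise ConnectionError("transient failure")
    return len(batch)  # rows loaded

def run_with_retry(batch, max_retries=3):
    # Handle failed batch loads: retry a bounded number of times.
    for attempt in range(max_retries):
        try:
            loaded = load_batch(batch, attempt)
            # Validate the batch load: loaded row count must match the submission.
            if loaded != len(batch):
                raise RuntimeError("row-count mismatch")
            return loaded
        except ConnectionError:
            continue  # transient error: try again
    raise RuntimeError("batch load failed after retries")

print(run_with_retry([1, 2, 3]))
```

Data Factory and Synapse pipelines expose the same policy declaratively through activity retry counts and validation activities; the loop shows what that policy does.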
Taught by
Microsoft Press and Tim Warner