What you'll learn:
- Learn how to model data management solutions on the Databricks Lakehouse
- Build data processing pipelines using the Spark and Delta Lake APIs
- Understand how to use the Databricks platform and its tools, and the benefits they offer
- Build production pipelines using best practices around security and governance
- Learn how to monitor and log production jobs
- Follow best practices for deploying code on Databricks
If you are interested in becoming a Databricks Certified Data Engineer Professional, you have come to the right place! This study guide will help you prepare for the certification exam.
By the end of this course, you should be able to:
- Model data management solutions, including:
  - Lakehouse (bronze/silver/gold architecture, tables, views, and the physical layout)
  - General data modeling concepts (constraints, lookup tables, slowly changing dimensions)
- Build data processing pipelines using the Spark and Delta Lake APIs, including:
  - Building batch-processed ETL pipelines
  - Building incrementally processed ETL pipelines
  - Deduplicating data
  - Using Change Data Capture (CDC) to propagate changes
  - Optimizing workloads
- Understand how to use the Databricks platform and its tools, and the benefits they offer, including:
  - Databricks CLI (deploying notebook-based workflows)
  - Databricks REST API (configuring and triggering production pipelines)
- Build production pipelines using best practices around security and governance, including:
  - Managing cluster and job permissions with ACLs
  - Creating row- and column-oriented dynamic views to control user and group access
  - Securely deleting data on request, as required by GDPR and CCPA
- Configure alerting and storage to monitor and log production jobs, including:
  - Recording logged metrics
  - Debugging errors
- Follow best practices for managing, testing, and deploying code, including:
  - Using relative imports
  - Scheduling jobs
  - Orchestrating jobs
With the knowledge you gain during this course, you will be ready to take the certification exam.
I am looking forward to meeting you!