Overview
Explore a comprehensive 26-minute talk on integrating Data Build Tool (DBT) with Databricks and Delta for efficient data lake management. Learn how this open-source, SQL-first technology enhances data quality and documentation throughout the data lake lifecycle. Discover the basics of DBT and its synergy with Databricks for powerful data processing. Examine how DBT supports Delta to enable SQL-based upserts. Investigate the integration of DBT and Databricks within the Azure cloud environment. Gain insights into emitting pipeline metrics to Azure Monitor for improved observability. Dive into topics such as DBT as a SQL runner and compiler, documentation generation, testing, incremental ingestion, DBT macros, and the use of Hive UDFs. Master the art of maintaining high-quality data pipelines using software engineering best practices.
Syllabus
Intro
GoDataDriven
Data Build Tool
SQL with some Jinja2 sauce
DBT as a SQL Runner
DBT as a SQL Compiler
Next to the SQL there is documentation
dbt docs generate and dbt docs serve
Testing
How does DBT communicate with Spark?
Switch to incremental ingestion
Switch to incremental Delta
In practice
DBT Macros
Observability is king
Very simple Hive UDF
Small snippet of Scala
Use the UDF in DBT
Be proactive
Feedback
Taught by
Databricks