Overview
Explore the "DAG Stack" - a robust data pipeline solution combining dbt, Airflow, and Great Expectations. Learn how to build a transformation layer with dbt, validate source data and add complex tests using Great Expectations, and orchestrate the entire pipeline with Apache Airflow. Discover practical examples of how these tools complement each other to ensure data quality, prevent "garbage in, garbage out" scenarios, and create comprehensive data documentation. Gain insights into automatic profiling, data testing, and validation techniques. Follow along with sample code demonstrations and technical pointers to implement this stack in your own data engineering projects.
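The overall flow the course describes - load source data, validate it, transform it, then validate the output - can be sketched in plain Python. This is a simplified illustration only: the function names are invented for this sketch, and in the real stack dbt performs the transformation, Great Expectations runs the validations, and Airflow orchestrates the step ordering.

```python
# Illustrative sketch of the load -> validate -> transform -> validate
# flow. All names here are hypothetical stand-ins, not real library APIs.

def load_source():
    # Stand-in for an extract/load step (e.g. a raw orders table).
    return [
        {"order_id": 1, "amount": 25.0},
        {"order_id": 2, "amount": 40.0},
    ]

def expect_column_values_not_null(rows, column):
    # Mimics the style of a Great Expectations check: every row must
    # have a non-null value in the given column.
    return all(row.get(column) is not None for row in rows)

def transform(rows):
    # Stand-in for a dbt model: aggregate order amounts.
    return {"total_amount": sum(r["amount"] for r in rows)}

def run_pipeline():
    rows = load_source()
    # Validate source data before transforming ("garbage in" guard).
    if not expect_column_values_not_null(rows, "amount"):
        raise ValueError("source validation failed")
    result = transform(rows)
    # Validate the transformed output as well.
    if result["total_amount"] < 0:
        raise ValueError("output validation failed")
    return result
```

Failing the source check before transformation is the key design point: bad input halts the pipeline early instead of propagating into downstream models.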
Syllabus
Intro
Who am I
Overview
dbt
sample code
dbt run
What dbt doesn't have
Apache Airflow
dbt in Airflow
Airflow DAG file
What is Great Expectations
What is Great Expectations Statement
Typical Great Expectations Workflow
Automatic Profiling
Databox
Great Expectations Operator
Recap
Test your data
Where do we start
Technical pointers
Data testing
Data validation
Putting it all together
Airflow DAG
Source data load validation
Running tests during development
Test integrity
Wrap up
Q&A
Taught by
Open Data Science