Overview
In this course, you'll build data pipelines that use Airflow DAGs to organize your tasks, together with AWS resources such as S3 and Redshift.
Syllabus
- Introduction to Automating Data Pipelines
- Welcome to Automating Data Pipelines. In this lesson, you'll be introduced to the topic, prerequisites for the course, and the environment and tools you'll be using to build data pipelines.
- Data Pipelines
- In this lesson, you'll learn about the components of a data pipeline, including Directed Acyclic Graphs (DAGs). You'll practice creating data pipelines with DAGs and Apache Airflow (a minimal DAG sketch follows this syllabus).
- Airflow and AWS
- This lesson connects Airflow to AWS: first by creating credentials, then by copying S3 data, leveraging connections and hooks, and building an S3-to-Redshift DAG (see the second sketch after this syllabus).
- Data Quality
- Students will learn how to track data lineage, set up data pipeline schedules, partition data to optimize pipelines, investigate data quality issues, and write tests to ensure data quality (see the data quality sketch after this syllabus).
- Production Data Pipelines
- In this last lesson, students will learn how to build pipelines with maintainability and reusability in mind. They will also learn about pipeline monitoring (a reusable-operator sketch follows this syllabus).
- Data Pipelines
- Students work on a music streaming company’s data infrastructure by creating and automating a set of data pipelines with Airflow, then monitoring and debugging those pipelines in production.
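To give a feel for the DAG-based pipelines the course builds, here is a minimal sketch, assuming Airflow 2.x; the DAG id, schedule, and task callables are illustrative assumptions, not course code.

```python
# A minimal Airflow DAG sketch (assumes Airflow 2.x).
# The dag_id, schedule, and task functions are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("extracting data")


def load():
    print("loading data")


with DAG(
    dag_id="example_pipeline",        # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # DAG edge: extract must finish before load starts
    extract_task >> load_task
```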
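The Airflow and AWS lesson revolves around connections and hooks; one common shape for an S3-to-Redshift copy task looks roughly like the sketch below. The connection ids "aws_credentials" and "redshift", the table, and the S3 path are assumptions for illustration.

```python
# Sketch: copy S3 data into Redshift using Airflow connections and hooks.
# Assumes two pre-configured Airflow connections: "aws_credentials" (AWS keys
# stored as login/password) and "redshift" (a Postgres-style connection).
from airflow.hooks.base import BaseHook
from airflow.providers.postgres.hooks.postgres import PostgresHook


def copy_s3_to_redshift(table: str, s3_path: str) -> None:
    aws_conn = BaseHook.get_connection("aws_credentials")  # hypothetical conn id
    redshift = PostgresHook(postgres_conn_id="redshift")   # hypothetical conn id
    copy_sql = f"""
        COPY {table}
        FROM '{s3_path}'
        ACCESS_KEY_ID '{aws_conn.login}'
        SECRET_ACCESS_KEY '{aws_conn.password}'
        FORMAT AS JSON 'auto';
    """
    redshift.run(copy_sql)
```

Wrapped in a PythonOperator, a function like this becomes one task in an S3-to-Redshift DAG.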
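A data quality test of the kind described in the Data Quality lesson can be as simple as a row-count rule that fails the task when a table is empty; this sketch reuses the hypothetical "redshift" connection from above.

```python
# Sketch: a simple data quality check that fails the task on an empty table.
from airflow.providers.postgres.hooks.postgres import PostgresHook


def check_has_rows(table: str) -> None:
    redshift = PostgresHook(postgres_conn_id="redshift")  # hypothetical conn id
    records = redshift.get_records(f"SELECT COUNT(*) FROM {table}")
    if not records or not records[0] or records[0][0] < 1:
        # Raising an exception makes Airflow mark the task as failed
        raise ValueError(f"Data quality check failed: {table} has no rows")
    print(f"Data quality check passed: {table} has {records[0][0]} rows")
```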
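For the maintainability and reusability themes of the Production Data Pipelines lesson, a common Airflow pattern is packaging a check like the one above into a custom operator so it can be dropped into any DAG; the class name and parameters here are illustrative.

```python
# Sketch: a reusable custom operator (illustrative class name and parameters).
from airflow.models import BaseOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


class HasRowsOperator(BaseOperator):
    """Fail the task if the target table is empty."""

    def __init__(self, redshift_conn_id: str, table: str, **kwargs):
        super().__init__(**kwargs)
        self.redshift_conn_id = redshift_conn_id
        self.table = table

    def execute(self, context):
        hook = PostgresHook(postgres_conn_id=self.redshift_conn_id)
        records = hook.get_records(f"SELECT COUNT(*) FROM {self.table}")
        if records[0][0] < 1:
            raise ValueError(f"No rows found in {self.table}")
```

Once defined, the operator is instantiated like any built-in one, e.g. `HasRowsOperator(task_id="check_songs", redshift_conn_id="redshift", table="songs")`.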
Taught by
Sean Murdock