Overview
Gain the knowledge you need to build data pipelines in a data-driven world.
Syllabus
Introduction
- Create an ETL in Python and SQL
- Tools used in this course
- What are ETLs and how do you create them?
- ETL process overview
- Exploring your data with pandas (Python) and SQL
- Understanding your data
- Challenge: Reading data using Python
- Solution: Reading data using Python
- Loading data from different sources
- Extracting your data
- Cleaning, preprocessing data, and data formatting
- Standardization, handling duplicates, and missing values
- Challenge: Extract and transform data using pandas
- Solution: Extract and transform data using pandas
- Introduction to data warehouses and data lakes
- Loading data into relational databases
- Data quality checks and validation with SQL
- Challenge: Transform the data and remove duplicates and nulls
- Solution: Transform the data and remove duplicates and nulls
- Querying your data with SQL
- Scheduling ETL jobs with Airflow: Part 1
- Scheduling ETL jobs with Airflow: Part 2
- Challenge: Load the data into a database and automate
- Solution: Load the data into a database and automate
- Expand your knowledge of ETLs
Taught by
Jennifer Ebe