Overview
Build an end-to-end Reddit data pipeline on AWS. Learn to extract data from Reddit's API, orchestrate ETL processes with Apache Airflow and Celery, and store the results in Amazon S3. Use AWS Glue for data cataloging and ETL jobs, query and transform data with Amazon Athena, and set up an Amazon Redshift cluster for analytics. Cover best practices for loading data into Redshift and techniques for visualizing it. Hands-on demonstrations show how these tools fit together into a single ETL process.
Syllabus
Introduction
Setting up Apache Airflow with Celery Backend and Postgres
Reddit Data Pipeline with Airflow
Cleaning and Transforming Reddit Data
Connecting to AWS from Airflow
AWS Glue Data Transformation
Querying Data with Athena
Setting up Redshift Data Warehouse
Redshift Data Warehouse Query Tool
Loading Data into Data Warehouse
Charting with Redshift Data Warehouse
Taught by
CodeWithYu