Reddit Data Pipeline Engineering with AWS - End-to-End Data Engineering

CodeWithYu via YouTube

Overview

Build an end-to-end Reddit data pipeline on AWS. Learn to extract data from Reddit's API, orchestrate ETL processes with Apache Airflow and Celery, and store the results in Amazon S3. Use AWS Glue for data cataloging and ETL jobs, query and transform data with Amazon Athena, and set up an Amazon Redshift cluster for analytics. The course also covers best practices for loading data into Redshift and introduces data visualization techniques. Hands-on demonstrations show how these tools fit together into a single ETL process.

Syllabus

Introduction
Setting up Apache Airflow with Celery Backend and Postgres
Reddit Data Pipeline with Airflow
Cleaning and Transforming Reddit Data
Connecting to AWS from Airflow
AWS Glue Data Transformation
Querying Data with Athena
Setting up Redshift Data Warehouse
Redshift Data Warehouse Query Tool
Loading Data into Data Warehouse
Charting with Redshift Data Warehouse
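
As a rough illustration of what the "Cleaning and Transforming Reddit Data" step might involve, here is a minimal Python sketch that normalizes raw post records before they are written to storage. It is not taken from the course: the function names `clean_post` and `clean_posts` are hypothetical, and the selection of fields (`id`, `title`, `score`, `num_comments`, `created_utc`, `over_18`) is an assumption about which Reddit post attributes a pipeline like this would keep.

```python
from datetime import datetime, timezone


def clean_post(raw: dict) -> dict:
    """Normalize one raw Reddit post record into consistently typed fields.

    The chosen fields are an assumption for illustration; the course's
    actual schema is not given in this listing.
    """
    return {
        "id": str(raw["id"]),
        "title": str(raw.get("title", "")).strip(),
        "score": int(raw.get("score", 0)),
        "num_comments": int(raw.get("num_comments", 0)),
        # Reddit reports creation time as Unix epoch seconds.
        "created_utc": datetime.fromtimestamp(
            float(raw["created_utc"]), tz=timezone.utc
        ).isoformat(),
        "over_18": bool(raw.get("over_18", False)),
    }


def clean_posts(raw_posts: list) -> list:
    """Drop records without an id, remove duplicate ids, and clean the rest."""
    seen = set()
    cleaned = []
    for raw in raw_posts:
        post_id = raw.get("id")
        if post_id is None or post_id in seen:
            continue
        seen.add(post_id)
        cleaned.append(clean_post(raw))
    return cleaned
```

In an Airflow-based pipeline, a function like this would typically be called from a task that runs between the extraction task and the upload to Amazon S3.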

Taught by

CodeWithYu
