Overview
- Discover the principles of data engineering and its role in building scalable, cloud-based systems.
- Explore the challenges posed by the end of Moore's Law and learn to develop distributed systems.
- Gain hands-on experience with big data technologies and best practices for implementing solutions.
- Learn to build serverless data engineering pipelines and apply effective data governance strategies.
- Develop expertise in key data engineering tasks, including ETL, cloud databases, and cloud storage.
Syllabus
1. Module 1: Methodologies in Data Engineering (12 hours)
   - Videos:
     - Introduction and Course Overview (4 minutes)
     - The End of Moore's Law and Concurrency in Python (7 minutes)
     - Using CUDA, Numba, and ASICs (13 minutes)
     - Exploring Colab Pro and Colab AI (9 minutes)
     - Distributed Systems Concepts (9 minutes)
     - Debugging Python Code (25 minutes)
     - Exploring Google BigQuery (12 minutes)
     - Introduction to Big Data and Data Lakes (4 minutes)
     - Big Data Processing (3 minutes)
     - AWS Data Engineering Design Principles (20 minutes)
     - Processing Big Data with AWS (25 minutes)
     - Transform Data with Databricks Spark SQL (5 minutes)
   - Readings (22 readings, 220 minutes)
   - Quizzes (5 quizzes, 150 minutes)
   - Discussion Prompts (4 discussion prompts, 40 minutes)
   - Ungraded Labs (3 ungraded labs, 180 minutes)
2. Module 2: Principles of Data Engineering (11 hours)
   - Videos:
     - Introduction to Data Engineering (1 minute)
     - Data Driven Organizations (19 minutes)
     - Batch vs. Streaming vs. Events (1 minute)
     - Ingesting by Batch or Stream (20 minutes)
     - Building CLI Tools with Click (33 minutes)
     - Building Containerized Command-line Tools (12 minutes)
     - Rust and Python (5 minutes)
     - Python Calculator CLI and Caesar Cipher CLI (7 minutes)
     - Advanced Testing with Amazon CodeGuru and AWS CodeBuild (44 minutes)
     - Mapping Functions to CLI (58 minutes)
     - AWS CodeWhisperer CLI and SDK (7 minutes)
   - Readings (10 readings, 100 minutes)
   - Quizzes (4 quizzes, 120 minutes)
   - Discussion Prompts (3 discussion prompts, 30 minutes)
   - Ungraded Labs (4 ungraded labs, 240 minutes)
3. Module 3: Building Data Engineering Pipelines (6 hours)
   - Videos:
     - Introduction to Serverless Data Engineering (0 minutes)
     - Automating Pipelines (21 minutes)
     - Serverless Concepts (17 minutes)
     - AWS Lambda (42 minutes)
     - Build a Serverless Data Pipeline (37 minutes)
     - Serverless Cookbook with AWS and GCP (49 minutes)
     - Introduction to Data Governance (0 minutes)
     - The Principle of Least Privilege (1 minute)
     - Cloud Security with IAM on AWS (30 minutes)
     - Encrypt at Rest and Transit (3 minutes)
   - Readings (7 readings, 70 minutes)
   - Quizzes (3 quizzes, 90 minutes)
   - Discussion Prompts (2 discussion prompts, 20 minutes)
4. Module 4: Applying Key Data Engineering Tasks (10 hours)
   - Videos:
     - Introduction to Extract, Transform, Load (ETL) (0 minutes)
     - Ingesting and Preparing Data on AWS (19 minutes)
     - Using Amazon Athena with AWS Glue (22 minutes)
     - Real-World Problems in ETL (13 minutes)
     - Introduction to Cloud Databases (6 minutes)
     - MySQL Overview and Usage (28 minutes)
     - BigQuery with Prompt Engineering and Colab Pipeline (14 minutes)
     - Introduction to Cloud Storage (0 minutes)
     - Cloud Storage Deep Dive (13 minutes)
     - Using Amazon S3 (4 minutes)
   - Readings (10 readings, 100 minutes)
   - Quizzes (4 quizzes, 120 minutes)
   - Discussion Prompts (3 discussion prompts, 30 minutes)
   - Ungraded Labs (4 ungraded labs, 240 minutes)
Taught by
Noah Gift