Advanced Data Engineering

Pragmatic AI Labs via edX

Overview

Master Scalable Data Engineering with Cutting-Edge Tools

  • Learn to handle massive datasets efficiently with this advanced course
  • Gain practical expertise in scaling data systems using modern technologies
  • Ideal for data scientists, engineers & professionals with data handling experience

Course Highlights:

  • Use Celery and RabbitMQ for scalable data consumption
  • Manage workflows efficiently with Apache Airflow
  • Use vector and graph databases to manage data at scale
  • Work through hands-on projects that address real-world data challenges
  • Build scalable systems and analyze their performance

Upskill to design, build & optimize data engineering pipelines that can handle complex, large-scale datasets. Prepare for demanding data roles by mastering advanced techniques with this comprehensive training.

Syllabus

Module 1: Queues and Databases - RabbitMQ and MySQL (6 hours)

- Video: Meet your instructor: Alfredo Deza (1 minute, Preview module)

- Video: About this course (2 minutes)

- Reading: Connect with your instructor (10 minutes)

- Reading: Meet your instructor: Noah Gift (10 minutes)

- Reading: Course structure and discussion etiquette (10 minutes)

- Video: Introduction (1 minute)

- Video: Overview of Queues (5 minutes)

- Video: What is Celery? (3 minutes)

- Reading: Key Terms (10 minutes)

- Reading: Introduction to Celery (10 minutes)

- Video: Use cases for RabbitMQ (3 minutes)

- Reading: Using RabbitMQ with Docker (10 minutes)

- Reading: External lab: Start RabbitMQ in a development environment (10 minutes)

- Video: Overview of a Flask and Celery application (3 minutes)

- Video: Summary (1 minute)

- Quiz: Introduction to RabbitMQ and Flask (30 minutes)

- Video: Introduction (0 minutes)

- Video: Configuring Celery with Flask (4 minutes)

- Video: Connecting Celery with RabbitMQ (5 minutes)

- Reading: Key Terms (10 minutes)

- Reading: Build a web app by using Python and Flask (10 minutes)

- Reading: Background tasks with Celery (10 minutes)

- Video: Defining a Celery task in Flask (3 minutes)

- Video: Fire and forget task in Flask (2 minutes)

- Video: Retrieve values from asynchronous tasks (3 minutes)

- Reading: External lab: Add a new Celery task for RabbitMQ (10 minutes)

- Video: Summary (1 minute)

- Quiz: RabbitMQ with Celery and Flask (30 minutes)

- Video: MySQL Overview (2 minutes)

- Reading: Key Terms (10 minutes)

- Reading: Getting Started with MySQL (10 minutes)

- Video: MySQL from Terminal (3 minutes)

- Video: Archive and Drop Database (5 minutes)

- Video: Import external database Sakila (7 minutes)

- Video: Modify database Sakila (4 minutes)

- Video: Bash pipelines with MySQL (5 minutes)

- Video: MySQL to Python Standard Library Web Server (4 minutes)

- Ungraded Lab: Linux Hacking with MySQL (60 minutes)

- Quiz: Quiz-MySQL for Data Engineering (30 minutes)

- Reading: Lesson Reflection (10 minutes)

- Discussion Prompt: Meet and greet (optional) (10 minutes)

- Quiz: Queues and Databases - Final week quiz (30 minutes)

Module 2: Optimizing Workflow Management at Scale with Apache Airflow (5 hours)

- Video: Introduction (1 minute, Preview module)

- Video: What is Apache Airflow? (6 minutes)

- Reading: Key Terms (10 minutes)

- Reading: What is Apache Airflow (10 minutes)

- Video: Installing Apache Airflow from PyPI (5 minutes)

- Video: Using Apache Airflow with Docker (6 minutes)

- Reading: Exploring the Airflow User Interface (10 minutes)

- Reading: External lab: Install Apache Airflow (10 minutes)

- Video: Exploring the Airflow UI (6 minutes)

- Quiz: Quiz-Installing Apache Airflow (30 minutes)

- Reading: Lesson Reflection (10 minutes)

- Video: Introduction (0 minutes)

- Video: Exploring directed acyclic graphs (DAG) (10 minutes)

- Reading: Key Terms (10 minutes)

- Reading: External lab: Create a DAG (10 minutes)

- Video: Creating a DAG (7 minutes)

- Video: Running a backfill (4 minutes)

- Reading: Architecture overview (10 minutes)

- Video: Testing and validation (7 minutes)

- Video: Summary (0 minutes)

- Quiz: Quiz-Apache Airflow Fundamentals (30 minutes)

- Reading: Lesson Reflection (10 minutes)

- Video: Introduction (1 minute)

- Video: Identifying a task to build a DAG (4 minutes)

- Reading: Key Terms (10 minutes)

- Reading: External Lab: Build a data pipeline for census data (10 minutes)

- Video: Retrieving remote data (4 minutes)

- Video: Cleaning and normalizing data (4 minutes)

- Video: Inspecting the UI for results (4 minutes)

- Reading: Build Data Pipelines with Apache Airflow (10 minutes)

- Video: Summary (1 minute)

- Reading: Lesson Reflection (10 minutes)

- Quiz: Quiz-Creating a pipeline (30 minutes)

- Quiz: Final Week Quiz-Optimizing Workflow Management at Scale with Apache Airflow (30 minutes)

Module 3: Achieving Scalability with Vector, Graph, and Key/Value Databases (5 hours)

- Video: Picking the proper database (3 minutes, Preview module)

- Video: What are vector databases and how they work (2 minutes)

- Reading: Key Terms (10 minutes)

- Reading: What is a Vector Database? (10 minutes)

- Video: Implementing Semantic search (4 minutes)

- Video: Quickstart Qdrant (3 minutes)

- Reading: External Lab: Run Quickstart of Qdrant (10 minutes)

- Video: Qdrant Rust Client (3 minutes)

- Reading: External Lab: Extend Semantic Search (10 minutes)

- Video: Vector Database Architectures (2 minutes)

- Video: Hands-on lab: Enhance Semantic Search (3 minutes)

- Reading: Jaccard index (10 minutes)

- Quiz: Quiz-Introduction to Vector Databases (30 minutes)

- Reading: Lesson Reflection (10 minutes)

- Video: Graph data models and database concepts (2 minutes)

- Reading: Key Terms (10 minutes)

- Reading: Rust CLI with Clap (10 minutes)

- Video: Introduction to Amazon Neptune (2 minutes)

- Reading: External Lab: Rust Graph CLI Tool (10 minutes)

- Video: Graph algorithms: UFC graph centrality in Rust (4 minutes)

- Video: Kosaraju Community Detection in Graphs (4 minutes)

- Video: Shortest Path with Graphs (3 minutes)

- Reading: Amazon Neptune (10 minutes)

- Video: Key Components of Rust CLI Tool (1 minute)

- Video: Lab Walkthrough: Building a Rust Graph CLI Tool (2 minutes)

- Quiz: Quiz-Introduction to Graph Databases (30 minutes)

- Reading: Lesson Reflection (10 minutes)

- Quiz: Final Quiz-Achieving Scalability with Vector, Graph, and Key/Value Databases (30 minutes)

- Ungraded Lab: Social Media Recommender (60 minutes)

Module 4: Real-world Advanced Data Engineering Projects (5 hours)

- Video: Learn AWS CloudShell for Dynamo Development (4 minutes, Preview module)

- Video: Learn AWS CodeCatalyst for Dynamo Development (5 minutes)

- Reading: Key Terms (10 minutes)

- Reading: Amazon CodeCatalyst (10 minutes)

- Video: Leveraging AWS CodeWhisperer for Dynamo Development (4 minutes)

- Video: Create a Table with CLI (1 minute)

- Video: Populate a Table With Batching Records (1 minute)

- Video: Query a Table with Records (2 minutes)

- Reading: External Lab: Extended DynamoDB (10 minutes)

- Video: Project Walkthrough (2 minutes)

- Quiz: Quiz-Building a solution with DynamoDB with the AWS CLI (30 minutes)

- Reading: Lesson Reflection (10 minutes)

- Video: Introduction (1 minute)

- Video: Overview of pipeline requirements (3 minutes)

- Reading: Key Terms (10 minutes)

- Reading: Quick start for SQLAlchemy (10 minutes)

- Video: Using SQLAlchemy with Pandas (6 minutes)

- Reading: Explore and analyze data with Python (10 minutes)

- Video: Persisting data in a task (6 minutes)

- Video: Reviewing the results (4 minutes)

- Video: Summary (1 minute)

- Quiz: Quiz-Persisting data through a multi-task DAG with Pandas (30 minutes)

- Reading: Lesson Reflection (10 minutes)

- Reading: Recommended Next Steps (10 minutes)

- Quiz: Final Quiz-Advanced Data Engineering (30 minutes)

- Ungraded Lab: Jupyter Sandbox (60 minutes)

- Ungraded Lab: VS Code Sandbox (60 minutes)

Taught by

Alfredo Deza and Noah Gift
