Overview
Explore role-based access control, SLA management, scheduling DAGs with datasets, building Airflow plugins, and scaling Airflow with the CeleryExecutor.
Syllabus
- Introduction
- Features for data engineering pipeline management
- Prerequisites
- Quick install overview
- Creating an admin user and exploring roles
- Creating users with different roles
- Executing a simple branching DAG
- Executing a simple SQL DAG
- The public and viewer roles
- The user role
- The op role
- Actions, resources, and permissions
- Adding permissions to the public role
- Creating and configuring a custom role
- Configuring emails for SLA management
- Configuring task-level SLAs
- Triggering and viewing SLA misses
- Configuring DAG-level SLAs
- Configuring DAG failed action
- Dataset producer pipeline
- Dataset consumer pipeline
- Data-aware scheduling
- Purchases producer pipeline and join pipeline
- Data-aware scheduling with multiple datasets
- Introducing plugins
- Adding menu items using plugins
- Exploring the CSV reader plugin
- Implementing the CSV reader plugin
- Scaling Apache Airflow
- Basic setup for the transformation pipeline
- DAG for the transformation pipeline
- Installing RabbitMQ on macOS and Linux
- Setting up an admin user for RabbitMQ
- Configuring the CeleryExecutor for Airflow
- Executing tasks on a single Celery worker
- Executing tasks on multiple Celery workers
- Assigning tasks to queues
- Summary and next steps
Taught by
Janani Ravi