Overview
Explore role-based access control, SLA management, scheduling DAGs with datasets, building Airflow plugins, and scaling Airflow with the CeleryExecutor.
Syllabus
- Introduction
- Features for data engineering pipeline management
- Prerequisites
- Quick install overview
- Creating an admin user and exploring roles
- Creating users with different roles
- Executing a simple branching DAG
- Executing a simple SQL DAG
- The public and viewer roles
- The user role
- The op role
- Actions, resources, and permissions
- Adding permissions to the public role
- Creating and configuring a custom role
- Configuring emails for SLA management
- Configuring task-level SLAs
- Triggering and viewing SLA misses
- Configuring DAG-level SLAs
- Configuring DAG failed action
- Dataset producer pipeline
- Dataset consumer pipeline
- Data-aware scheduling
- Purchases producer pipeline and join pipeline
- Data-aware scheduling with multiple datasets
- Introducing plugins
- Adding menu items using plugins
- Exploring the CSV reader plugin
- Implementing the CSV reader plugin
- Scaling Apache Airflow
- Basic setup for the transformation pipeline
- DAG for the transformation pipeline
- Installing RabbitMQ on macOS and Linux
- Setting up an admin user for RabbitMQ
- Configuring the CeleryExecutor for Airflow
- Executing tasks on a single Celery worker
- Executing tasks on multiple Celery workers
- Assigning tasks to queues
- Summary and next steps
Taught by
Janani Ravi