Discover how to scale and optimize data workflows for massive datasets with Apache Airflow in this 30-minute PyCon US talk. Learn to author, schedule, and monitor workflows programmatically without restricting the scope of your pipelines. Explore best practices for applying Airflow to use cases such as ETL pipelines, manufacturing process scheduling, and enterprise batch process coordination. Gain insight into dynamically generating thousands of DAGs from JSON configuration files, automating DAG and infrastructure updates through CI/CD pipelines, and running tasks concurrently. Suitable for beginner and intermediate developers, the talk offers practical guidance on scaling DAG factories beyond Airflow's out-of-the-box capabilities. A code repository and live demo accompany the talk, showing techniques for handling thousands of DAGs efficiently and leaving more time for big data exploration and analysis.
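To make the idea of generating DAGs from JSON configuration concrete, here is a minimal sketch of the factory pattern in plain Python. It is not the speaker's code: the config fields (`dag_id`, `schedule`, `tasks`), the `DagSpec` class, and the `build_specs` helper are all hypothetical names chosen for illustration. In an Airflow deployment, each spec would typically be turned into a DAG object and assigned to a module-level name so the scheduler can discover it; that step is left out so the sketch runs without Airflow installed.

```python
import json
from dataclasses import dataclass, field

# Hypothetical config shape; the JSON schema used in the talk is not given here.
SAMPLE_CONFIGS = [
    '{"dag_id": "etl_orders", "schedule": "@daily", "tasks": ["extract", "transform", "load"]}',
    '{"dag_id": "etl_users", "schedule": "@hourly", "tasks": ["extract", "load"]}',
]


@dataclass
class DagSpec:
    """Plain-Python stand-in for one workflow definition (assumed structure)."""

    dag_id: str
    schedule: str
    tasks: list = field(default_factory=list)

    def edges(self):
        # Chain the tasks linearly: each task depends on the one before it.
        return list(zip(self.tasks, self.tasks[1:]))


def build_specs(raw_configs):
    """Parse JSON strings into DagSpec objects keyed by dag_id."""
    specs = {}
    for raw in raw_configs:
        cfg = json.loads(raw)
        spec = DagSpec(cfg["dag_id"], cfg["schedule"], list(cfg["tasks"]))
        if spec.dag_id in specs:
            # Duplicate ids would silently overwrite each other, so fail loudly.
            raise ValueError(f"duplicate dag_id: {spec.dag_id}")
        specs[spec.dag_id] = spec
    return specs


specs = build_specs(SAMPLE_CONFIGS)
for dag_id, spec in specs.items():
    print(dag_id, spec.schedule, spec.edges())
```

Keeping parsing and validation separate from DAG construction is one possible design choice: the config handling can be tested quickly in CI without a running Airflow instance.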
Overview
Syllabus
Talks - Calvin Hendryx-Parker: Too Big for DAG Factories?
Taught by
PyCon US