Cross-Platform Data Lineage with OpenLineage - Tracing Data Across Apache Spark and Airflow
DataGalaxy via YouTube
Overview
Learn about cross-platform data lineage tracking in this technical conference talk from the DataGalaxy Tech Summit 2023. Explore how OpenLineage provides a standardized approach to lineage collection across multiple platforms, including Apache Airflow, Apache Spark, Apache Flink, and dbt. Discover how data lineage maps the relationships between datasets across distributed organizational environments, helping teams identify and resolve data quality and efficiency issues in real time. Through a live demonstration, see how to implement data lineage tracking between Apache Spark and Apache Airflow, and gain insight into the OpenLineage architecture and its practical applications in modern data environments. The talk is well suited to data engineers and architects who want to better understand and manage complex data relationships across their technology stack.
Syllabus
Cross-Platform Data Lineage with OpenLineage | DataGalaxy Tech Summit 2023
Taught by
DataGalaxy