Apache Arrow and Substrait - The Secret Foundations of Data Engineering
EuroPython Conference via YouTube
Overview
This 44-minute conference talk from EuroPython 2023 looks at the role Apache Arrow and Substrait play in data engineering. It explains how PyArrow, the Python library for Apache Arrow, is becoming the de facto standard for data transfer and interoperability across libraries and languages, and how Substrait is gaining adoption as the standard representation for query plans, which allows queries to be routed to, and decomposed across, different engines. The talk covers how popular Python libraries such as pandas and Polars use Arrow, and how compute engines such as Velox, DataFusion, and Acero are adopting both Arrow and Substrait. It also walks through building a basic database system on Arrow and Substrait with minimal code, showing the foundations these technologies provide for modern data engineering.
Syllabus
Apache Arrow and Substrait, the secret foundations of Data Engineering — Alessandro Molina
Taught by
EuroPython Conference