Overview
Explore the evolution and future of Apache Spark in this keynote from Spark + AI Summit 2020 featuring Matei Zaharia, the original creator of Apache Spark, and Brooke Wenig. Delve into the major community developments with the release of Apache Spark 3.0, designed to enhance usability, speed, and compatibility with various data sources and runtime environments. Discover how Spark 3.0 advances the project's goal of making data processing more accessible through improvements to SQL and Python APIs, as well as automatic tuning and optimization features. Reflect on Spark's 10-year journey since its initial open source release, examining the project's growth, user base expansion, and the evolving ecosystem around it, including Koalas, Delta Lake, and visualization tools. Gain insights into the latest developments in the open-source community, including Apache Spark 3.0 and Databricks Runtime (DBR) 7.0, and learn about Databricks' unified data analytics platform powered by Apache Spark.
Syllabus
This is a Special Year for Apache Spark
2008: Datacenter-scale computing
2009: Back to Berkeley
2010: Open Source Spark
2012-15: Expand Access to Spark
Apache Spark Today: Python
Apache Spark Today: SQL
Major Lessons
Apache Spark 3.0
Spark 3.0: SQL Engine
Spark 3.0: Python Usability (Python type hints for Pandas UDFs)
Spark 3.0: Python and R Performance
Spark 3.0: Other Features
Other Apache Spark Ecosystem Projects
Announcing Koalas 1.0!
Learning Spark 2nd Edition
OSS Spark Development Initiatives at Databricks
Taught by
Databricks