Overview
Explore the architectural tradeoffs between map/reduce and parallel databases in this 25-minute conference talk from Databricks. Dive deep into the architectures of Presto and Apache Spark, focusing on key differentiators like disaggregated shuffle. Learn about the Presto-on-Spark project, a specialized Data Frame application that combines Presto's low-latency evaluation with Spark's robust execution engine. Discover the motivation, design, and current status of this initiative aimed at enabling a unified SQL experience for both interactive and batch use cases. Gain insights into Facebook's experience scaling both Presto and Spark for large-scale batch workloads, and understand the potential for greater collaboration between the Spark and Presto communities.
Syllabus
Intro
SQL Use Cases @ Facebook
Towards a Unified SQL Experience
Presto and Spark Architecture
Why Presto (or Other MPPs) Doesn't Scale?
Presto Unlimited
Why Presto-on-Spark
Presto-on-Spark Design Principles
Planning
Translating to RDD
Columnar Format to Row Format Conversion
Broadcast Join
Spark DAG
Execution
Threading Model
Classloader Isolation
Current Status
Taught by
Databricks