Modernizing Apache Spark 3.0 Applications - Best Practices and Optimization Techniques
Databricks via YouTube
Syllabus
Intro
Denis Diderot and the Diderot effect
The Diderot effect in data processing systems
The Diderot effect in Spark: Project Tungsten (2015)
The Diderot effect, revised for 2021
What's your oldest Spark application?
Abstractions can leak in performance tuning
Choosing the right partition size is difficult
Adaptive query execution: coalescing
Sidebar: some basics on joins
Adaptive query execution: partition pruning
Enabling adaptive query execution
Accelerating Spark with NVIDIA GPUs
Case study: predicting customer churn
What's next?
Taught by
Databricks