

Modernizing Apache Spark 3.0 Applications - Best Practices and Optimization Techniques

Databricks via YouTube

Overview

This 25-minute talk by Databricks covers strategies for modernizing Apache Spark applications so that they take full advantage of Spark 3.0 and later releases. It identifies common sources of technical debt in mature Spark applications and how to address them, explains when Adaptive Query Execution can replace manual configuration, and shows how to optimize queries for columnar processing and GPU execution. The material draws on concrete examples from customer churn modeling, recent experience modernizing Spark applications, and lessons learned from maintaining Spark extensions across multiple versions. Topics include the Diderot effect in data processing systems, Project Tungsten, adaptive query execution techniques, and accelerating Spark with NVIDIA GPUs, with the aim of improving analytics workloads and incorporating accelerated ML training directly into Spark applications.
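As a rough illustration of what replacing manual configuration with Adaptive Query Execution can look like, the sketch below renders a few standard Spark 3.x SQL properties as `spark-submit` arguments. The settings used in the talk itself are not given in this listing, so the selection here is an assumption; `AQE_CONF` and `to_submit_args` are hypothetical names introduced only for this example.

```python
# Sketch only: the talk's actual settings are not given in this listing.
# The keys are standard Spark 3.x SQL configuration properties.
AQE_CONF = {
    # Re-optimize query plans at runtime using shuffle statistics.
    "spark.sql.adaptive.enabled": "true",
    # Merge small shuffle partitions instead of hand-tuning partition counts.
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    # Split heavily skewed partitions in sort-merge joins.
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def to_submit_args(conf):
    """Render a config mapping as spark-submit --conf arguments (hypothetical helper)."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(AQE_CONF)))
```

Adaptive Query Execution is off by default in Spark 3.0 and on by default from Spark 3.2.0, so an explicit setting like this matters mainly for applications still running on earlier 3.x releases.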

Syllabus

Intro
Denis Diderot and the Diderot effect
The Diderot effect in data processing systems
The Diderot effect in Spark: Project Tungsten (2015)
The Diderot effect, revised for 2021
What's your oldest Spark application?
Abstractions can leak in performance tuning
Choosing the right partition size is difficult
Adaptive query execution: coalescing
Sidebar: some basics on joins
Adaptive query execution: partition pruning
Enabling adaptive query execution
Accelerating Spark with NVIDIA GPUs
Case study: predicting customer churn
What's next?

Taught by

Databricks
