

Apache Spark Core - Practical Optimization Techniques - Partition Shaping and Job Optimization

Databricks via YouTube

Overview

Dive into a conference talk on practical optimization techniques for Apache Spark Core. Learn how to shape partitions and jobs so that Spark can apply its most powerful optimizations, eliminate skew, and maximize cluster utilization. Explore several Spark partition-shaping methods along with optimization strategies including join optimizations, aggregate optimizations, salting, and multi-dimensional parallelism. The talk covers the software hierarchy, hardware considerations, and practical demonstrations; techniques such as lazy loading, data skipping, and shuffle partition management; the roles of input and output partitions, workload balancing, and persistence strategies; and advanced topics such as the DBIO cache, join optimization, broadcast joins, and skew joins. Over this 1-hour-32-minute talk, build the skills needed to tune Apache Spark Core for better performance and efficiency in data analytics workloads.
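Shuffle partition management largely comes down to choosing a value for Spark's spark.sql.shuffle.partitions setting (default 200). The sketch below assumes a common rule of thumb rather than reproducing the talk's exact procedure: divide the input size of the largest shuffle stage by a target size per partition, then round up to a whole multiple of the cluster's total core count so that no task wave leaves cores idle. The function name and the example figures (210 GB of shuffle input, a 200 MB target, 96 cores) are hypothetical.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def shuffle_partition_count(shuffle_input_bytes: int,
                            target_partition_bytes: int,
                            total_cores: int) -> int:
    """Estimate a value for spark.sql.shuffle.partitions (hypothetical helper).

    Assumed rule of thumb: divide the largest shuffle stage's input size by a
    target partition size, then round up to a whole multiple of the core count
    so that every wave of tasks keeps all cores busy.
    """
    if target_partition_bytes <= 0 or total_cores <= 0:
        raise ValueError("target size and core count must be positive")
    # Partitions needed to keep each one at or below the target size.
    raw = math.ceil(shuffle_input_bytes / target_partition_bytes)
    # Round up to full waves; always use at least one wave.
    waves = max(1, math.ceil(raw / total_cores))
    return waves * total_cores


# Hypothetical cluster: 210 GB shuffle input, 200 MB target, 96 cores.
print(shuffle_partition_count(210 * GB, 200 * MB, 96))
```

The result would then be applied with spark.conf.set("spark.sql.shuffle.partitions", n) before the shuffle-heavy stage runs. Rounding up to a multiple of the core count accepts slightly smaller partitions in exchange for a final wave that uses every core.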

Syllabus

Introduction
About Daniel
Agenda
Software Hierarchy
Demo
Hardware
Baseline
CPU Utilization
Ganglia Reports
Lazy Loading
Code
Data Skipping
Optimizations
Output
Shuffle Partitions
Workload
Shuffle Partition Example
Shuffle Partition Summary
Input Partition Summary
What Does This Do?
Output Partitions
Workload Example
Partitions
Balance
Persistence
DBIO Cache
Join Optimization
Broadcast Join
Skew Joins
Group Bys
The Beast
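Salting, one of the strategies for skew joins, spreads a single hot join key across many partitions. The following pure-Python sketch only simulates the idea under stated assumptions: each row on the large, skewed side gets a random salt in the range [0, num_salts), the small side is replicated once per salt value, and the two sides are joined on the (key, salt) pair. It assumes join keys are unique on the small side; the function names are hypothetical and this is not Spark code. In Spark the same idea is usually expressed as a random salt column on the large DataFrame and an exploded range of salt values on the small one.

```python
import random


def salt_key(key, num_salts, rng):
    """Pair a key from the large, skewed side with a random salt in [0, num_salts)."""
    return (key, rng.randrange(num_salts))


def explode_key(key, num_salts):
    """Replicate a small-side key once per salt value so every salted key can match."""
    return [(key, salt) for salt in range(num_salts)]


def salted_join(large, small, num_salts, seed=0):
    """Inner-join (key, value) rows on key via salting (hypothetical helper).

    Assumes keys are unique on the small side.
    Returns (key, large_value, small_value) tuples.
    """
    rng = random.Random(seed)
    # Small side: index every (key, salt) combination.
    lookup = {}
    for key, value in small:
        for salted in explode_key(key, num_salts):
            lookup[salted] = value
    # Large side: each row draws one salt and joins on the salted key.
    joined = []
    for key, value in large:
        salted = salt_key(key, num_salts, rng)
        if salted in lookup:
            joined.append((key, value, lookup[salted]))
    return joined


# Hypothetical skewed input: one hot key with 1000 rows, one cold key.
large = [("hot", i) for i in range(1000)] + [("cold", 0)]
small = [("hot", "H"), ("cold", "C")]
rows = salted_join(large, small, num_salts=8)
print(len(rows))

# Count how many distinct join keys the hot key is split into after salting.
rng = random.Random(0)
hot_salts = {salt_key("hot", 8, rng) for _ in range(1000)}
print(len(hot_salts))
```

Every large-side row still finds exactly one match, but the hot key now appears as up to num_salts distinct join keys, so hash partitioning can send it to up to num_salts different tasks. The cost is replicating the small side num_salts times, which is why the salt count is usually kept modest.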

Taught by

Databricks
