Apache Spark Core - Practical Optimization Techniques - Partition Shaping and Job Optimization
Databricks via YouTube
Overview
Syllabus
Introduction
About Daniel
Agenda
Software Hierarchy
Demo
Hardware
Baseline
CPU Utilization
Ganglia reports
lazy loading
code
data skipping
optimizations
output
shuffle partitions
workload
shuffle partition example
shuffle partition summary
input partition summary
what does this do
output partitions
workload example
Partitions
Balance
Persistence
DBIO Cache
Join Optimization
Broadcast Join
Skew Joins
Group Bys
The Beast
Taught by
Databricks