Overview
Syllabus
Intro
Spark in Workday Prism Analytics
Example: Data Validation
About Complex Plans
Common Subexpression Elimination (CSE)
CSE Benchmark
Logging Complex Plans (Tens of MBs in Size)
Problems with Large Case Expressions
Handling Large Case Expressions in Catalyst
Large Case Expression Benchmark
Example: Generate New Filter
Example: Prune Redundant Filter
Example: New Filter on Other Side of Join
Current Constraint Propagation Algorithm
Current Algorithm Has High Memory Usage
Recall: Fix for Large Case Expressions
Optimized Constraint Propagation (SPARK-33152)
Constraint Propagation Algorithms Comparison
Constraint Propagation Benchmark
Effect on Customer Pipeline
Tuning Tips
Future Work
Taught by
Databricks