Overview
Syllabus
Intro
Kubernetes is a new cluster manager for Spark
The Spark on Kubernetes Journey
Spark on YARN: architecture & pain points
Spark on Kubernetes: architecture & benefits
Our background - Ocean for Apache Spark
Spot instances
How does Spark cope with spot interruptions?
Best practice: run driver on-demand (OD), execs on Spot
This is what your cluster may look like
Limitation: Avoid cross-AZ data transfer
We ran an experiment to measure the impact
Experiment results
Since Spark 3.1: Graceful Exec Decommissioning
Spark 3.1 - Graceful Exec Decommissioning
Graceful Exec Decommissioning - Experiment
Since Spark 3.2: Executor PVC Reuse
What's new in Spark 3.3 for Spark-on-k8s
DATA+AI SUMMIT 2022
Taught by
Databricks