Overview
Syllabus
Intro
Autoscaling on cloud
Upscale easy, downscale difficult
How are nodes used?
Factors affecting node downscaling
Terminology
Any cluster generally comprises the following entities:
• Resource Manager
Current resource allocation strategy
Example revisited with new allocation strategy
Downscale issues with Min Executors
Min executors distribution without packing
Min executors distribution with packing
How Shuffle data is produced / consumed?
External Shuffle Service
ESS at Qubole
Recap
Shuffle Cleanup
• Shuffle data is deleted by the ESS at the end of the application
Issues with long running applications
Shuffle reuse in Spark
Downscaling a Node
Spark - Disaggregation of Compute and Storage
• Mount an NFS endpoint on all nodes of the cluster
• Change the shuffle manager in Spark to one that can read/write shuffle data from the NFS mount point
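The two steps above can be sketched as a small configuration helper. This is an illustration under assumptions, not the setup used in the talk. `spark.shuffle.manager` is a real Spark setting, but the mount path `/mnt/shuffle-nfs` and the class name `com.example.shuffle.NfsShuffleManager` are hypothetical placeholders.

```python
import os

# Hypothetical values, not taken from the talk.
NFS_MOUNT = "/mnt/shuffle-nfs"
SHUFFLE_MANAGER_CLASS = "com.example.shuffle.NfsShuffleManager"


def shuffle_conf(manager_class):
    """Return Spark settings that select a custom shuffle manager."""
    # spark.shuffle.manager is a real Spark setting; the class is a placeholder.
    return {"spark.shuffle.manager": manager_class}


def to_submit_args(conf):
    """Render settings as `--conf key=value` arguments for spark-submit."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


def mount_ready(path):
    """Return True if the path is a mount point on this node."""
    return os.path.ismount(path)


if __name__ == "__main__":
    # Step 1: check that the (assumed) NFS endpoint is mounted on this node.
    print("mount ready:", mount_ready(NFS_MOUNT))
    # Step 2: point Spark at a shuffle manager that uses the mount.
    print(" ".join(to_submit_args(shuffle_conf(SHUFFLE_MANAGER_CLASS))))
```

The mount check would have to pass on every node before the application is submitted. How the shuffle manager finds the mount path depends on its implementation and is left out here.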
Summary and Future Work
Taught by
Databricks