Downscaling Apache Spark Clusters - Challenges and Solutions

Databricks via YouTube

Overview

Explore the challenges and solutions for downscaling Apache Spark clusters in this 36-minute conference talk by Prakhar Jain from Databricks. The talk examines why it is hard to remove nodes from a running Spark-on-YARN cluster when the workload decreases, focusing on two obstacles: container fragmentation (a few containers scattered across many nodes, so no node becomes empty) and shuffle data retained on nodes. Learn about approaches to improve downscaling, including changes to YARN's container allocation strategy and Spark's task scheduler that pack containers onto fewer nodes. Discover enhancements to the Spark driver and the External Shuffle Service (ESS) that proactively delete consumed shuffle data so that nodes can be reclaimed sooner. Gain insights into terminology, resource allocation strategies, and the impact of the minimum-executor setting on downscaling. Examine how shuffle data is produced and consumed, the role of the ESS, and possible solutions for long-running applications. The talk concludes with an overview of compute and storage disaggregation in Spark and future directions for optimizing cluster downscaling.
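
The container-packing idea can be illustrated with a toy model. The sketch below is not YARN or Spark scheduler code: it assumes identical nodes, each with a fixed number of equal-sized container slots, and the function names and the "spread"/"pack" policies are hypothetical simplifications ("spread" merely stands in for any allocation that does not pack). It counts the nodes left with no containers, which, ignoring shuffle data, are the nodes a downscaler could release.

```python
from typing import List


def allocate(num_nodes: int, capacity: int, num_containers: int, strategy: str) -> List[int]:
    """Place containers one at a time and return the container count per node."""
    loads = [0] * num_nodes
    for _ in range(num_containers):
        # Nodes that still have a free container slot, in index order.
        candidates = [i for i in range(num_nodes) if loads[i] < capacity]
        if not candidates:
            raise ValueError("no free container slots left")
        if strategy == "spread":
            # Hypothetical baseline: least-loaded node first; min() breaks ties by lowest index.
            target = min(candidates, key=lambda i: loads[i])
        elif strategy == "pack":
            # Packing: most-loaded node that still has room; max() breaks ties by lowest index.
            target = max(candidates, key=lambda i: loads[i])
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        loads[target] += 1
    return loads


def idle_nodes(loads: List[int]) -> int:
    """Nodes with no containers; ignoring shuffle data, these could be released."""
    return sum(1 for load in loads if load == 0)


for strategy in ("spread", "pack"):
    loads = allocate(num_nodes=4, capacity=4, num_containers=6, strategy=strategy)
    print(strategy, loads, "idle:", idle_nodes(loads))
# spread [2, 2, 1, 1] idle: 0
# pack [4, 2, 0, 0] idle: 2
```

With the same six containers, spreading touches every node, while packing leaves two nodes empty. Packing gives up some per-node load balance in exchange for reclaimable capacity, which is the motivation for packing-oriented allocation.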

Syllabus

Intro
Autoscaling on cloud
Upscale easy, downscale difficult
How are nodes used?
Factors affecting node downscaling
Terminology: Any cluster generally comprises the following entities • Resource Manager
Current resource allocation strategy
Example revisited with new allocation strategy
Downscale issues with Min Executors
Min executors distribution without packing
Min executors distribution with packing
How Shuffle data is produced / consumed?
External Shuffle Service
ESS at Qubole
Recap
Shuffle Cleanup • Shuffle data is deleted by the ESS at the end of the application
Issues with long running applications
Shuffle reuse in Spark
Downscaling a Node
Spark: Disaggregation of Compute and Storage • Mount an NFS endpoint on all nodes of the cluster • Change Spark's shuffle manager to one that can read and write shuffle data from the NFS mount point
Summary and Future Work
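
The shuffle-retention problem can be sketched the same way. In this hypothetical model (the Node class, can_reclaim, and cleanup_consumed are illustrative names, not Spark or ESS APIs), a node is reclaimable only when it runs no containers and its local disk holds no shuffle files. By default the ESS deletes shuffle data only when the application ends, so an idle node in a long-running application stays pinned. The cleanup function assumes the driver can report which shuffle IDs may still be read (Spark can reuse shuffle output across jobs) and deletes everything else.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Node:
    name: str
    running_containers: int = 0
    # Hypothetical: IDs of shuffles whose output files sit on this node's local disk
    # and are served by the External Shuffle Service (ESS).
    shuffle_ids: Set[int] = field(default_factory=set)


def can_reclaim(node: Node) -> bool:
    """Safe to remove only if nothing runs on the node and no shuffle files remain."""
    return node.running_containers == 0 and not node.shuffle_ids


def cleanup_consumed(node: Node, still_needed: Set[int]) -> None:
    """Hypothetical proactive cleanup: keep only shuffle files the driver still
    needs, instead of waiting for the application to end."""
    node.shuffle_ids &= still_needed


# An idle executor node that still holds output from shuffles 1 and 2.
node = Node("node-3", running_containers=0, shuffle_ids={1, 2})
print(can_reclaim(node))  # False: shuffle files pin the node

cleanup_consumed(node, still_needed={2})  # shuffle 1 has been fully consumed
print(can_reclaim(node))  # False: shuffle 2 may still be reused

cleanup_consumed(node, still_needed=set())  # nothing on this node is needed any more
print(can_reclaim(node))  # True: the node can be released
```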

Taught by

Databricks