

Tackling Scaling Challenges of Apache Spark at LinkedIn - Infrastructure Optimization and User Productivity

Databricks via YouTube

Overview

Explore the scaling challenges and solutions for Apache Spark at LinkedIn in this 26-minute conference talk from Databricks. Dive into the company's journey of transitioning Spark from an experiment to the dominant production compute engine, handling a 3X growth in daily applications. Learn how LinkedIn tackled major infrastructure scaling bottlenecks, balanced limited compute resources with increasing demands, improved user development productivity, and boosted job efficiency. Discover optimizations made to core Spark components, improvements to cluster resource scheduling, and automation of job failure root cause analysis. Gain insights into innovative solutions like Grid Bench for performance analysis, tuning heuristics and recommendations, scaling Spark History Server, and the next-generation Spark shuffle service with Push-Merge Shuffle. Walk away with valuable takeaways on managing large-scale Spark deployments and empowering users in a rapidly growing environment.

Syllabus

Intro
Challenges of Scaling Spark
Tackling Scaling Challenges
Typical Spark User Questions
Automatic Failure Root Cause Analysis
Platform Failure Reason Breakdown
Grid Bench - Performance Analysis
Tuning Heuristics & Recommendations
Scaling Spark History Server
A Low-Latency Solution
Issues with Spark Shuffle Service
Next-gen Spark shuffle service
Push-Merge Shuffle
Fetch Merged Shuffle Data
Magnet Shuffle Service Recap
Takeaways

Taught by

Databricks
