

Migrating Pinterest Apache Spark Clusters from HDFS to S3

Databricks via YouTube

Overview

Explore how Pinterest migrated its critical Apache Spark clusters from HDFS to S3 in this 30-minute presentation. Learn the motivations behind the transition, including the shift from Mesos to YARN as the resource scheduler. See the technical challenges the team faced with S3 performance, consistency, and access control, and how each was addressed to match HDFS capabilities. Discover the changes made to the job submission process to accommodate differences between Mesos and YARN. Gain insights into Spark performance optimization through profiling and EC2 instance type selection, and examine the resulting performance and the smooth migration Pinterest achieved. Understand the key takeaways, including solutions for read-after-write consistency, performance comparisons between S3 and HDFS, strategies for dealing with metadata operations, and improvements to S3Committer. Finally, explore the benefits of S3 over HDFS, the resulting cost savings, and the current state of Spark at Pinterest.

Syllabus

Intro
Agenda
Big Data Platform
Old vs. New Cluster
Old Cluster: Performance Bottleneck
A Simple Aggregation Query
9k Mappers * 9k Reducers
New Cluster: Choose the right EC2 instance
Key Takeaways
Read after write consistency
How often does this happen
Solution: Considerations
Our Approach
Performance Comparison: S3 vs HDFS
Dealing with Metadata Operations
Reduce Move Operations
Multipart Upload API
The Last Move Operation
Fix Bucket Rate Limit Issue (503)
Improving S3Committer
S3 Benefits Compared to HDFS
Things We Miss in Mesos
Cost Saving
Spark at Pinterest

Taught by

Databricks
