Classroom Contents
Migrating Pinterest Apache Spark Clusters from HDFS to S3
- 1 Intro
- 2 Agenda
- 3 Big Data Platform
- 4 Old vs New cluster
- 5 Old Cluster: Performance Bottleneck
- 6 A Simple Aggregation Query
- 7 9k Mappers * 9k Reducers
- 8 New Cluster: Choose the right EC2 instance
- 9 Key Takeaways
- 10 Read-After-Write Consistency
- 11 How Often Does This Happen?
- 12 Solution Considerations
- 13 Our Approach
- 14 Performance Comparison: S3 vs HDFS
- 15 Dealing with Metadata Operations
- 16 Reduce Move Operations
- 17 Multipart Upload API
- 18 The Last Move Operation
- 19 Fix Bucket Rate Limit Issue (503)
- 20 Improving S3Committer
- 21 S3 Benefits Compared to HDFS
- 22 Things We Miss in Mesos
- 23 Cost Saving
- 24 Spark at Pinterest