Overview
Explore how LinkedIn uses RocksDB in FollowFeed and Apache Samza in this 29-minute conference talk from the RocksDB Meetup in December 2014. Gain insight into FollowFeed, the backend system powering LinkedIn's feed applications, and Samza, a distributed stream processing framework built on Apache Kafka and YARN. See why stream processing is hard, how page view streams are assembled, and why YARN was chosen. Learn about remote stores and their problems, stateful tasks, and the Samza Store API. Examine usage patterns for stateful processing, RocksDB's write and read performance, and the Timeline Storage cluster. Walk through FollowFeed's write architecture, feed query path, and timeline storage database structure. Finish with backup strategies, the specifics of RocksDB usage and performance, and LinkedIn's overall experience running RocksDB in these systems.
Syllabus
Intro
Outline
Stream Processing is Hard
Assembling Page View
Streams
Jobs
Data flow
Why YARN?
Remote Stores
Problems with remote store
Stateful Tasks
Samza Store API
Usage patterns for stateful processing
RocksDB Write Performance
Read Performance
Timeline Storage cluster
FollowFeed write architecture Timeline storage
Feed query path for Person:1
Timeline storage DB structure
Backups
RocksDB usage
RocksDB Performance
RocksDB experience
Taught by
Meta Developers