Overview

Explore LinkedIn's derived data storage system, Venice, in this conference talk from Strange Loop 2022. Discover how Venice provides high-throughput ingestion of data from batch and stream processing jobs while offering low-latency online serving. Learn about its production usage, hosting ~1,500 datasets that are rewritten daily and used for AI model inference workloads. Understand Venice's role in the "People You May Know" feature, which performs online deep learning with millions of reads and computations per second. Examine how client applications can use Venice's data plane and APIs for both eager loading and remote network queries. Delve into Venice's architecture, designed for massive scale and operability, supporting self-healing, linear scalability, multi-tenancy, and multi-datacenter replication. Gain insights from Felix GV, Principal Staff Engineer at LinkedIn, as he shares his experience developing Venice from its inception to its current state as a crucial component of LinkedIn's data infrastructure.
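As a taste of the client APIs discussed in the talk, the sketch below shows a remote single-get read against a Venice store using the open-source Java thin client. The store name, router URL, key, and value types here are hypothetical placeholders, and the class and method names follow the public Venice client library, so they may differ across versions.

import java.util.concurrent.CompletableFuture;

import org.apache.avro.generic.GenericRecord;

import com.linkedin.venice.client.store.AvroGenericStoreClient;
import com.linkedin.venice.client.store.ClientConfig;
import com.linkedin.venice.client.store.ClientFactory;

public class VeniceSingleGetExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical store name and router URL; substitute values for a real Venice cluster.
    AvroGenericStoreClient<String, GenericRecord> client =
        ClientFactory.getAndStartGenericAvroClient(
            ClientConfig.defaultGenericClientConfig("people_you_may_know_features")
                .setVeniceURL("http://venice-router.example.com:7777"));
    try {
      // The thin client issues remote (network) queries to the Venice routers/servers.
      // Reads are asynchronous: a single get returns a future for the Avro value.
      CompletableFuture<GenericRecord> future = client.get("member:12345");
      GenericRecord value = future.get();
      System.out.println("Fetched value: " + value);
    } finally {
      client.close();
    }
  }
}

For the eager-loading path covered later in the talk, the Da Vinci client exposes a similar read API but serves it from data materialized locally on the application host after the client subscribes to the store's partitions.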
Syllabus
Intro
Derived Data Store
Hybrid Workloads
Stream Processing
Streaming Writes
Partial Updates
Correctness
Scale
Single Get Use Case
Read Compute Use Case
Eager Cache
Da Vinci Use Cases
Scalability
What is Venice for?
Taught by
Strange Loop Conference