Overview
Explore the development of Mnemosyne, a distributed indexing layer for big data, in this 52-minute Devoxx conference talk. Dive into the challenges faced by the Brandwatch Audiences product and learn how the team built a system capable of handling hundreds of millions of social network profiles, billions of posts, and tens of billions of follower graph edges in real time. Discover how succinct data structures, free text search, and in-memory computing were combined with the JVM, CUDA, and Kafka to create a high-performance solution. Gain insights into CAP theorem trade-offs, brute force approaches versus indexes, data structures for sorting billions of records in milliseconds, GPU problem-solving, and JVM implementation. Understand key design principles, data ingestion and storage techniques, in-memory computing concepts, and the use of bitmaps and sparse data structures. Evaluate the project's success and consider whether undertaking a similar endeavor in your own work is feasible and advisable.
Syllabus
Intro
PROBLEMS
WHAT TO DO?
OPTIONS?
DESIGN PRINCIPLES
KNOW YOUR DATA.
DON'T REINVENT THE WHEEL.
KEEP IT SIMPLE.
KNOW YOUR USERS.
KNOW YOUR HARDWARE.
DATA INGESTION AND STORAGE
WHAT TO STORE?
HOW TO BE RESILIENT?
HOW TO SCALE?
COMPACTED TOPICS
WINDOWED DATA
DATA MODEL
IN-MEMORY COMPUTING
RAM IS VOLATILE
ALGEBRA OF SETS
BITMAPS
BACK TO MNEMOSYNE
SPARSITY
AGGREGATIONS
WAS IT WORTH IT?
CAN YOU DO IT?
SHOULD YOU DO IT?
THANK YOU!
Taught by
Devoxx