Overview
Explore sketching algorithms for big data analysis in this 29-minute talk from Databricks. Examine the challenges of processing massive datasets and learn how specialized algorithms called 'sketches' can provide accurate approximate answers to difficult queries. Discover how this technology has helped Yahoo reduce data processing times from days to minutes and enabled sub-second queries on real-time platforms. Get an introduction to DataSketches, an open-source library of core sketching algorithms designed for large production analysis and AI systems. Understand the properties of sketches, including query space partitioning, query speed, and time windowing. Learn about the benefits of sketching, such as lower system costs and improved scalability. Gain insights into the future of sketching algorithms and their potential impact on big data analysis.
Syllabus
Introduction
Challenges with Big Data
Common Big Data Queries
Difficulty
Parallelization
Last 30 Days
The Sketch
Properties
Major Properties
Query Space
Partitioning
Query Speed
Time Windowing
Example
Lower System Cost
Team
Mission
Family Groups
The Future
Taught by
Databricks