DataSketches: A Production Quality Sketching Library for Big Data Analysis

Databricks via YouTube

Overview

Explore sketching algorithms for big data analysis in this 29-minute talk from Databricks. Learn why some queries over massive datasets are too costly to compute exactly, and how specialized algorithms called 'sketches' answer them approximately, with known error bounds. Discover how this technology has helped Yahoo reduce data processing times from days to minutes and enabled sub-second queries on real-time platforms. Get an introduction to DataSketches, an open-source library of core sketching algorithms designed for large production analysis and AI systems. Understand the properties of sketches, including query space partitioning, query speed, and time windowing. Learn about the benefits of sketching, such as lower system costs and improved scalability. Gain insights into the future of sketching algorithms and their potential impact on big data analysis.
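
To give a feel for what a sketch is, here is a toy illustration in Python of a K-Minimum-Values (KMV) distinct-count sketch. It is not the DataSketches API or implementation: the class name `KMVSketch`, the parameter `k`, and the choice of SHA-256 as the hash are all assumptions made for this example. It shows the properties the talk highlights: the summary stays small and fixed in size, and two sketches built separately (for example, one per day) can be merged to answer a query over the combined window.

```python
import hashlib


class KMVSketch:
    """Toy K-Minimum-Values sketch (illustrative only, not the DataSketches API)."""

    def __init__(self, k=1024):
        self.k = k          # maximum number of hash values retained
        self.mins = set()   # the smallest hash values seen so far, in [0, 1)
        self.theta = 1.0    # hashes at or above this cannot be among the k smallest

    @staticmethod
    def _hash(item):
        # Map the item to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def update(self, item):
        h = self._hash(item)
        if h >= self.theta or h in self.mins:
            return  # too large to matter, or a duplicate
        self.mins.add(h)
        if len(self.mins) > self.k:
            self.mins.remove(max(self.mins))
            self.theta = max(self.mins)

    def merge(self, other):
        # The k smallest hashes of the union are the k smallest of both retained sets.
        merged = KMVSketch(self.k)
        merged.mins = set(sorted(self.mins | other.mins)[: self.k])
        if len(merged.mins) == self.k:
            merged.theta = max(merged.mins)
        return merged

    def estimate(self):
        if len(self.mins) < self.k:
            return float(len(self.mins))  # fewer than k distinct items: count is exact
        return (self.k - 1) / max(self.mins)


# Hypothetical data: user IDs seen on two days, with 10,000 users in common.
day1, day2 = KMVSketch(), KMVSketch()
for user_id in range(0, 20_000):
    day1.update(user_id)
for user_id in range(10_000, 30_000):
    day2.update(user_id)

merged = day1.merge(day2)
print(f"estimated distinct users over both days: {merged.estimate():.0f} (true: 30000)")
```

Each daily sketch retains at most 1,024 numbers no matter how many events it processes, and merging touches only those retained values, not the raw data. That is the general idea behind faster queries over partitions and time windows.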

Syllabus

Introduction
Challenges with Big Data
Common Big Data Queries
Difficulty
Parallelization
Last 30 Days
The Sketch
Properties
Major Properties
Query Space
Partitioning
Query Speed
Time Windowing
Example
Lower System Cost
Team
Mission
Family Groups
The Future

Taught by

Databricks
