Overview
Explore probabilistic data structures in Python for handling large-scale data efficiently in this PyCon US talk. Discover how to count distinct items in a data firehose and how to determine whether an item has been seen before, trading a small amount of accuracy for speed and lower memory use. Learn about HyperLogLog and the Bloom filter, how each works at a high level, and how to apply them in Python. Gain insight into scenarios where exact answers are impractical and see how these structures provide fast, scalable solutions for problems such as counting social media likes or tracking user interactions on websites. Access the accompanying GitHub repository and slides for hands-on examples and further study.
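As a rough illustration of the "has this item been seen before?" question, here is a minimal Bloom filter sketch in Python. It is not taken from the talk or its repository: the class name, the default sizes, and the use of salted `hashlib.blake2b` hashes are all assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; names and defaults are hypothetical."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into a bytearray.
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        data = str(item).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly seen"; False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter()
seen.add("user-42")
print("user-42" in seen)  # True: an added item is never reported as absent
```

The trade-off is that a membership check can return a false positive for an item that was never added, but never a false negative, and the memory used stays fixed no matter how many items arrive.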
Syllabus
Introduction
The Problem
Probabilistic Data Structures
HyperLogLog
HyperLogLog Algorithm
HyperLogLog Example
Bloom Filter
Python Code
When to Use
Taught by
PyCon US