No, Maybe and Close Enough - Using Probabilistic Data Structures in Python

Overview

Explore probabilistic data structures in Python for handling large-scale data efficiently in this PyCon US talk. Learn how to count the distinct items in a data firehose and how to tell whether an item has been seen before, while balancing accuracy against speed and resource use. The talk introduces the HyperLogLog and the Bloom filter, explains how each works at a high level, and shows how to apply them in Python. It also covers when absolute accuracy is impractical and how these structures give fast, scalable answers to problems such as counting social media likes or tracking user interactions on a website. The accompanying GitHub repository and slides provide hands-on examples for further study.
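To make the two ideas concrete, here is a minimal pure-Python sketch of each structure. It is not the code from the talk's repository. The class names `BloomFilter` and `HyperLogLog`, the SHA-256-based hashing, and the default sizes are illustrative assumptions. The Bloom filter answers "have I seen this?" with "no" or "maybe": false positives are possible, false negatives are not. The HyperLogLog estimates how many distinct items it has seen using a fixed array of small registers, so its memory use does not grow with the number of items.

```python
import hashlib
import math

# Hypothetical illustrative classes; a sketch, not the talk's implementation.


def _digest(item) -> bytes:
    """SHA-256 of the item's string form; stable across runs, unlike hash()."""
    return hashlib.sha256(str(item).encode("utf-8")).digest()


class BloomFilter:
    """Membership test that answers "no" or "maybe" (no false negatives)."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        d = _digest(item)
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1  # odd stride, never zero
        # Double hashing: derive num_hashes bit positions from two base hashes.
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True only if every bit for this item is set; may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class HyperLogLog:
    """Estimates the number of distinct items using 2**p small registers."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        x = int.from_bytes(_digest(item)[:8], "big")  # 64-bit hash
        idx = x >> (64 - self.p)                      # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # bias-correction constant for m >= 128
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = m * math.log(m / zeros)
        return estimate


bf = BloomFilter()
bf.add("user:42")
print("user:42" in bf)  # True: items that were added are always reported

hll = HyperLogLog(p=12)
for i in range(100_000):
    hll.add(f"like:{i % 20_000}")  # 20,000 distinct values, each seen 5 times
print(round(hll.count()))  # an estimate close to 20000, not an exact count
```

SHA-256 is used rather than Python's built-in `hash()` because string hashing is randomized per process, which would make results differ between runs. With `p=12` the HyperLogLog uses 4,096 registers, and its standard error is roughly 1.04 / sqrt(4096), about 1.6%.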