Overview
This 30-minute EuroPython Conference talk explores realtime distributed computing at scale with Apache Storm and Streamparse. It shows how to run large data pipelines with low latency and high availability while processing thousands of items per second in pure Python. The talk covers the basics of Apache Storm and its approach to realtime distributed computing, explains how Streamparse makes it possible to write Storm components in Python, and describes how Parse.ly uses Storm in production to handle billions of realtime events each month. It also walks through Storm's architecture (spouts, bolts, and topologies) and its advantages over other Python data streaming solutions. By the end, viewers should understand what Storm is for and how to use it through Streamparse in high-availability, low-latency production environments.
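For readers new to the vocabulary, the relationship between spouts, bolts, and topologies can be sketched in a few lines of plain Python. This is only a single-process toy under stated assumptions: the function names (`sentence_spout`, `split_bolt`, `count_bolt`, `run_topology`) and the sample sentences are hypothetical, and none of this is the Storm or Streamparse API. In a real cluster, Storm would run many instances of each component in parallel across machines.

```python
from collections import Counter

# Hypothetical stand-ins for Storm concepts; NOT the Storm or Streamparse API.


def sentence_spout():
    """Spout: a source of tuples (a fixed list here, standing in for a queue)."""
    for sentence in ["storm is fast", "python is fun", "storm is distributed"]:
        yield (sentence,)


def split_bolt(stream):
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream):
    """Bolt: tallies each word it receives and returns the final totals."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
    return counts


def run_topology():
    """Topology: the wiring of spout -> split bolt -> count bolt."""
    return count_bolt(split_bolt(sentence_spout()))


word_counts = run_topology()
```

Chaining generators keeps the sketch short while preserving the core idea: each component only sees the tuples emitted by the one before it, and the topology is just the description of how they are connected.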
Syllabus
Intro
Queues and workers
Distributed Implications
A Storm Is Coming
Elegant data dashboards
Python can't do this
Storm Abstractions
Storm Topology Example
Types of sources for a spout
Streams, Grouping and Parallelism
Tuple Tree
Nimbus and Storm UI
So, Storm is pretty amazing!
Multi-Lang Protocol
Enter Streamparse
Running and debugging
Submitting to remote cluster: single command
Bolts for Real-Time ETL
Types of Bolts
Overhead considerations
Topology Considerations
Questions?
Taught by
EuroPython Conference