Overview
Explore pattern matching at scale using finite state machines in this conference talk from Strange Loop. Dive into the challenges of locating data that fits patterns within big data from non-homogeneous sources, focusing on Netflix's approach to improving the sign-up experience through experimentation. Learn about a framework for expressing user journey patterns translated into a Non-Deterministic Finite State Machine, inspired by Ken Thompson's 1968 CACM paper. Discover how this state machine is applied across billions of events using Spark, and how it's made accessible to Data Engineers, Scientists, and Analysts. Gain insights into the development of the "Conduit" framework, including design decisions and challenges encountered. The talk covers topics such as graph data models, wildcards, events in sequence, abstract syntax trees, regular expressions, Apache Spark optimizations, and matching multiple patterns simultaneously. Presented by Ajit Koti and Rashmi Shamprasad, experienced engineers from Netflix's Growth Data Engineering team, this session offers valuable knowledge for those interested in large-scale distributed systems, big data solutions, and data engineering.
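To make the idea concrete, here is a minimal sketch of Thompson-style matching applied to a sequence of events instead of characters. It is not the Conduit framework's API: the event names, the `ANY` wildcard token, the `(name, quantifier)` step format, and the function names are all assumptions made for illustration. The matcher tracks the set of states the machine could be in after each event, which is the core of the approach in Thompson's paper.

```python
from typing import List, Set, Tuple

ANY = "."  # hypothetical wildcard token: matches any single event

# A step is (event name or ANY, quantifier), quantifier in {"1", "?", "*", "+"}.
Step = Tuple[str, str]


def compile_pattern(pattern: List[Step]) -> List[Step]:
    """Rewrite 'x+' as 'x' followed by 'x*' so only "1", "?", "*" remain."""
    expanded: List[Step] = []
    for name, quant in pattern:
        if quant == "+":
            expanded += [(name, "1"), (name, "*")]
        else:
            expanded.append((name, quant))
    return expanded


def _closure(states: Set[int], steps: List[Step]) -> Set[int]:
    """Add states reachable without consuming an event (skippable steps)."""
    seen = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        if s < len(steps) and steps[s][1] in ("?", "*") and s + 1 not in seen:
            seen.add(s + 1)
            stack.append(s + 1)
    return seen


def matches(pattern: List[Step], events: List[str]) -> bool:
    """Return True if the whole event sequence fits the pattern."""
    steps = compile_pattern(pattern)
    accept = len(steps)  # state index len(steps) is the match state
    current = _closure({0}, steps)
    for event in events:
        nxt: Set[int] = set()
        for s in current:
            if s == accept:
                continue  # match state has no outgoing transitions
            name, quant = steps[s]
            if name == ANY or name == event:
                nxt.add(s + 1)
                if quant == "*":
                    nxt.add(s)  # unbounded repeat: stay on the same step
        current = _closure(nxt, steps)
        if not current:
            return False  # no live states left, so no match is possible
    return accept in current


# Hypothetical sign-up journey: start, any events, plan selection, payment.
journey = [("signup_start", "1"), (ANY, "*"),
           ("plan_selection", "1"), ("payment", "1")]
print(matches(journey, ["signup_start", "login", "plan_selection", "payment"]))
print(matches(journey, ["signup_start", "plan_selection"]))
```

Because each user's event sequence is matched independently, a function like `matches` could in principle be applied per user inside a Spark partition; how the talk's framework actually distributes the work is covered in the Apache Spark portion of the session.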
Syllabus
Introduction
Example
Challenges
Common Solutions
Graph Data Models
Requirements
Demo
Questions
Wildcard
Events
Events in Sequence
Results
Who did that
Changing the expression
Summary statistics
Conclusion
Ajit Koti
Guiding Principles
Building Blocks
Abstract Syntax Trees
Finite State Machine
Regular Expressions
Syntax Tree
State Machine
Bounded Repeat
Methodology
Unbounded Repeat
Match state
Evaluation
Plan Selection
Provide Payment
Login Event
Apache Spark
Map Partition
Optimizations
Matching multiple patterns simultaneously
Taught by
Strange Loop Conference