Overview
This 18-minute conference talk from OOPSLA2 2023 introduces semantic regexes, an extension of regular expressions that combines syntactic pattern matching with semantic reasoning to analyze textual data. It presents a learning algorithm that synthesizes semantic regexes from a small number of examples, using neural sketch generation and compositional type-directed synthesis. The talk describes Smore, a tool that implements these ideas, and reports that it outperforms existing solutions, including state-of-the-art neural networks and program synthesis tools, on complex data extraction tasks. It also covers practical applications of semantic regexes and how they compare with standard regular expressions across several textual datasets.
Syllabus
[OOPSLA23] Data Extraction via Semantic Regular Expression Synthesis
Taught by
ACM SIGPLAN