Extracting Structured Data from Multi-Modal Input: Neural and Neuro-Symbolic Approaches
Institute for Pure & Applied Mathematics (IPAM) via YouTube
Overview
Watch a technical lecture exploring methods for extracting structured data from images that contain both visual and textual elements, such as tables, charts, and maps. Discover how programs can serve as interpretable representations for accurately reproducing multi-modal content while preserving semantic relationships. Learn about approaches including neural networks, neuro-symbolic systems, and large language models for translating complex visual-textual data into code, with a particular focus on table structure extraction. Examine the strengths and limitations of different techniques, implementation challenges, and how programmatic representations enable precise manipulation and generalization across datasets. Delivered at UCLA's Institute for Pure & Applied Mathematics during its Naturalistic Approaches to Artificial Intelligence Workshop, this 27-minute presentation demonstrates how to effectively process and digitize information from sources that seamlessly integrate vision and language.
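The idea of programs as interpretable representations can be illustrated with a minimal sketch (hypothetical names and structure, not code from the lecture): instead of emitting a flat grid of cell strings, a table extracted from an image is represented as a small program whose structure preserves the relationship between header and rows and supports precise manipulation afterwards.

```python
# Hypothetical illustration of a programmatic table representation.
# None of these names come from the lecture; they only sketch the idea
# that structure (header/row alignment) is enforced by the program itself.
from dataclasses import dataclass, field


@dataclass
class Table:
    header: list[str]
    rows: list[list] = field(default_factory=list)

    def add_row(self, *cells):
        # The representation rejects rows that break the table's structure,
        # unlike a flat OCR dump of cell text.
        if len(cells) != len(self.header):
            raise ValueError("row width does not match header")
        self.rows.append(list(cells))

    def column(self, name: str) -> list:
        # Semantic access by column name, enabled by the preserved structure.
        i = self.header.index(name)
        return [row[i] for row in self.rows]


# A table "extracted" from an image would be emitted as calls like these:
t = Table(header=["year", "value"])
t.add_row(2021, 10)
t.add_row(2022, 13)
print(t.column("value"))  # -> [10, 13]
```

Because the output is executable, downstream tools can query, validate, or transform the extracted table directly rather than re-parsing rendered text.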
Syllabus
Aishni Parab - Extracting Structured Data from Multi-Modal Input - IPAM at UCLA
Taught by
Institute for Pure & Applied Mathematics (IPAM)