Extracting Structured Data from Multi-Modal Input: Neural and Neuro-Symbolic Approaches
Institute for Pure & Applied Mathematics (IPAM) via YouTube
Overview
Watch a technical lecture exploring methods for extracting structured data from images containing both visual and textual elements like tables, charts, and maps. Discover how programs can serve as interpretable representations for accurately reproducing multi-modal content while preserving semantic relationships. Learn about various approaches including neural networks, neuro-symbolic systems, and large language models for translating complex visual-textual data into code, with a particular focus on table structure extraction. Examine the strengths and limitations of different techniques, implementation challenges, and how programmatic representations enable precise manipulation and generalization across datasets. Delivered at UCLA's Institute for Pure & Applied Mathematics during their Naturalistic Approaches to Artificial Intelligence Workshop, this 27-minute presentation demonstrates how to effectively process and digitize information from sources that seamlessly integrate vision and language.
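To give a rough sense of what a programmatic representation of an extracted table might look like, here is a minimal sketch. It is not the method presented in the lecture: the `Cell` and `Table` classes and the HTML rendering below are illustrative assumptions, showing how encoding a table as code preserves structure (such as row and column spans) and allows it to be queried, edited, or re-rendered exactly.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Cell:
    text: str
    row_span: int = 1
    col_span: int = 1

@dataclass
class Table:
    rows: List[List[Cell]]

    def to_html(self) -> str:
        # Render the structured representation back to HTML,
        # keeping spans so the original layout can be reproduced.
        parts = ["<table>"]
        for row in self.rows:
            parts.append("  <tr>")
            for cell in row:
                attrs = ""
                if cell.row_span > 1:
                    attrs += f' rowspan="{cell.row_span}"'
                if cell.col_span > 1:
                    attrs += f' colspan="{cell.col_span}"'
                parts.append(f"    <td{attrs}>{cell.text}</td>")
            parts.append("  </tr>")
        parts.append("</table>")
        return "\n".join(parts)

# Hypothetical output of a table-extraction model: instead of raw pixels
# or flat text, the image is translated into editable structured code.
table = Table(rows=[
    [Cell("Quarter"), Cell("Revenue")],
    [Cell("Q1"), Cell("1.2M")],
    [Cell("Q2"), Cell("1.5M")],
])
print(table.to_html())
```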
Syllabus
Aishni Parab - Extracting Structured Data from Multi-Modal Input - IPAM at UCLA
Taught by
Institute for Pure & Applied Mathematics (IPAM)