Overview
Syllabus
Intro
Challenge for Automatic Speech Recognition
A Perspective on Spoken Language Processing: Most (-9%) of the world's languages have not been addressed by resource- and expert-intensive supervised
Crossing the Vision Language Boundary
Learning an Audio/Visual Embedding Space?
Joint Audio-Visual Analysis Architecture
Crowdsourcing Audio-Visual Data
Evaluation: Image Search and Annotation
Evaluating via Image Search
Evaluating via Image Annotation
Time-varying Audio-Visual Affiliation
Audio-Visual Grounding for Localization
Examples of Audio-Visual Clusters
Cluster Analysis
Spatial Distribution of Speech Clusters
Final Message
Taught by
MITCBMM