
Multimodal Medical Research of Vision and Language - Jean-Benoit Delbrouck

Stanford University via YouTube

Overview

Explore a 50-minute research talk on multimodal medical research at the intersection of vision and language, presented by Jean-Benoit Delbrouck at Stanford University. Delve into emerging multimodal medical tasks such as Medical Visual Question Answering, Radiology Report Generation, and Summarization using X-rays. Learn how multimodal architectures and pre-training techniques can improve results in these areas. Discover insights from Delbrouck's research, which applies both established and novel methods to multimodal medical tasks. Gain an understanding of supervised visual linguistic learning, VQA-MED, multilingual machine translation, and unsupervised machine translation in the context of medical imaging and natural language processing. Engage with the presentation's outline, covering topics like visual sensitivity, mixing hypotheses, contrastive pretraining, and systematic error discovery. Participate in the interactive discussion and Q&A session following the talk, part of the MedAI Group Exchange Sessions at Stanford University.

Syllabus

Introduction
Presentation
Outline
Supervised visual linguistic learning
VQA-MED
Motivation
Architecture
Visual question answering
Multilingual machine translation
Question
Results
Visual sensitivity
Mixing hypotheses
Contrastive pretraining
Systematic error discovery
Contrastive learning
Translation
Unsupervised Machine Translation
Unaligned Machine Translation
Outcome
Questions

Taught by

Stanford MedAI
