Overview
Syllabus
Intro
Interpretability in different stages of AI evolution
Approaches for visual explanations
Visualize any decision
Visualizing Image Captioning models
Visualizing Visual Question Answering models
Analyzing Failure modes
Grad-CAM for predicting patient outcomes
Extensions to Multi-modal Transformer-based Architectures
Desirable properties of Visual Explanations
Equalizer
Biases in Vision and Language models
Human Importance-aware Network Tuning (HINT)
Contrastive Self-Supervised Learning (SSL)
Why do SSL methods fail to generalize to arbitrary images?
Does improved SSL grounding transfer to downstream tasks?
CAST makes models resilient to background changes
VQA for visually impaired users
Sub-Question Importance-aware Network Tuning
Explaining Model Decisions and Fixing them via Human Feedback
Grad-CAM for multi-modal transformers
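
Several of the syllabus items above center on Grad-CAM (e.g., "Grad-CAM for predicting patient outcomes" and "Grad-CAM for multi-modal transformers"). The course itself does not ship code; the snippet below is only a minimal sketch of the standard Grad-CAM computation, assuming PyTorch and torchvision are available and using a torchvision ResNet-50 whose final convolutional block is `layer4` as an illustrative target layer (these choices are assumptions, not material from the course).

```python
# Minimal Grad-CAM sketch (assumes PyTorch + torchvision; the ResNet-50
# target layer "layer4" is an illustrative assumption).
import torch
import torch.nn.functional as F
from torchvision import models

def grad_cam(model, image, target_layer, class_idx=None):
    """Return a Grad-CAM heatmap in [0, 1] for `image` of shape 1x3xHxW."""
    activations, gradients = [], []

    # Hooks capture the target layer's feature maps and their gradients.
    fwd = target_layer.register_forward_hook(
        lambda m, inp, out: activations.append(out))
    bwd = target_layer.register_full_backward_hook(
        lambda m, gin, gout: gradients.append(gout[0]))

    logits = model(image)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()  # explain the top prediction
    model.zero_grad()
    logits[0, class_idx].backward()

    fwd.remove()
    bwd.remove()

    acts, grads = activations[0], gradients[0]            # 1 x C x h x w
    weights = grads.mean(dim=(2, 3), keepdim=True)         # channel weights via global average pooling
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))  # weighted sum of feature maps + ReLU
    cam = F.interpolate(cam, size=image.shape[2:],
                        mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
    return cam.squeeze().detach()

# Hypothetical usage:
# model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
# heatmap = grad_cam(model, preprocessed_image, model.layer4)
```

The same gradient-weighted-activation idea is what the later syllabus items extend to multi-modal transformer architectures; only the choice of target layer and the scalar being differentiated change.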
Taught by
Stanford MedAI