Visual Question Answering: Grounded Systems and Transformer Capsules
University of Central Florida via YouTube
Overview
Syllabus
Intro
Grounded Visual Question Answering
Limitations of Existing VQA Systems
Grounded VQA Systems
Problem Setup
Transformers with Capsules
Approach
Capsule-based Tokens
Input to Intermediate Transformer Layers
Text-based Residual Connection
Pre-training Tasks
Masked Language Modeling (MLM)
Image Text Matching
Pre-training Datasets
Fine-tuning on Downstream Task
Qualitative Comparison - GQA
Evaluation Metrics
Results - GQA
Conclusion and Future Work
Taught by
UCF CRCV