Generalization to Video Capsules - From Convolutional to Video Capsule Networks

Overview

This lecture by Kevin Duarte of the University of Central Florida covers how capsule networks generalize from convolutional capsule networks to video capsule networks. It begins with the computational cost of capsule voting, convolutional capsule layers, and capsule pooling, then presents the architecture and training of VideoCapsuleNet for action detection and localization, along with synthetic dataset experiments and qualitative results on entire videos. The lecture then combines video and text modalities using a multi-modal capsule routing algorithm. The last part addresses semi-supervised video object segmentation with attention routing and a memory module, and closes with quantitative results, a speed analysis, and the effects of individual modules on object segmentation.

Syllabus

Intro
Computational Cost of Capsule Voting
Conventional Convolutional Layers
Convolutional Capsule Layers
Capsule Pooling
Video Capsule Networks
Video Action Detection Networks
VideoCapsuleNet Architecture
Coordinate Addition
Capsule Masking
VideoCapsuleNet Training
Action Localization Accuracy
Qualitative Results - Entire Videos
Synthetic Dataset Experiments
Summary
Capsules in Multiple Modalities
Combining Video and Text
Overall Approach
Multi-modal Capsule Routing Algorithm
Full Architecture
Sentence Encoder
Merging Modalities and Masking
Upsampling Network
Quantitative Results - A2D Dataset
Semi-Supervised Video Object Segmentation
VOS using Capsules
Attention Routing
Video Encoder
Frame Encoder with Memory Module
Conv Capsule Layer and Decoder Network
Objective Function
Quantitative Results - Speed Analysis
Qualitative Results - Single Object
Qualitative Results - Multiple Objects
Effect of Memory Module
Effect of the Zooming Module
Effect of Zooming Module