Overview
Explore sensor fusion and representation learning for autonomous driving in this 33-minute conference talk. Examine why geometry-based fusion methods fall short, in particular their lack of global context, and how TransFuser, a Multi-Modal Fusion Transformer, integrates image and LiDAR data using attention mechanisms. Learn about NEural ATtention fields (NEAT), a representation for reasoning about the semantic, spatial, and temporal structure of driving scenes. Review state-of-the-art results on the CARLA simulator, along with attention map visualizations, BEV semantics, and the architectural details of both models.
Syllabus
Covered Papers
Collaborators
Motivation
Sensors
Research Questions
Geometric Fusion lacks global context
TransFuser Architecture
Attention-based Feature Fusion
Experiments
Infraction Analysis
CARLA Leaderboard
Qualitative Results
Attention Map Visualizations
BEV Semantics for Driving
Representation
Architecture: Encoder
Architecture: NEAT and Decoder
Architecture: Sampling and Control
Attention Visualizations
Summary and Conclusions
Taught by
Andreas Geiger