Towards High-Fidelity Open-Vocabulary 3D Scene Understanding

Overview

This conference talk surveys recent research on open-vocabulary 3D scene understanding. It covers Transformer-based networks and their application to several 3D scene understanding tasks, including object segmentation, human body part segmentation, and vectorized floorplan reconstruction. It also discusses the limitations of fully supervised models in real-world scenarios and introduces open-vocabulary approaches that leverage foundation models such as CLIP and SAM, along with the current challenges and future directions of the field. The talk is presented by Francis Engelmann, a postdoc at ETH Zurich and visiting researcher at Google, and gives an overview of recent advances in 3D scene understanding and their potential impact on computer vision applications.

Syllabus

Francis Engelmann: Towards High-Fidelity Open-Vocabulary 3D Scene Understanding

Taught by

Montreal Robotics
