Save Big on Coursera Plus. 7,000+ courses at $160 off. Limited Time Only!
Explore cutting-edge research on open-vocabulary 3D scene understanding in this insightful conference talk. Delve into Transformer-based networks and their applications in various 3D scene understanding tasks, including object segmentation, human body part segmentation, and vectorized floorplan reconstruction. Discover the limitations of fully-supervised models in real-world scenarios and learn about innovative open-vocabulary approaches that leverage foundation models like CLIP and SAM. Gain valuable insights into the current challenges and future directions of this rapidly evolving field. Presented by Francis Engelmann, a PostDoc at ETH Zurich and visiting researcher at Google, this talk offers a comprehensive overview of recent advancements in 3D scene understanding and their potential impact on computer vision applications.