Learning to See the Physical World

Overview

Explore research on physical scene understanding in this Stanford University lecture by Assistant Professor Jiajun Wu. The talk covers the development of versatile, data-efficient, and generalizable machines that learn to see, reason about, and interact with the physical world. It shows how integrating knowledge from computer graphics, physics, and language with deep learning can produce approximate simulation engines that exploit the generic, causal structure behind the world. Topics include building infant machines, scaling up data, and the challenges of inversion, intermediate representation, and physical modeling, followed by learning to augment, generation of average level networks, augmented graphics, and dynamic engines. Examples, a statement of the key principle, a list of collaborators, and audience questions round out the lecture. The overall aim of this research is to go beyond pattern recognition, enabling machines to explain, reconstruct, predict, and plan based on visual input.

Syllabus

Introduction
Building Infant Machines
Scaling Up Data
Challenges
Inversion
Intermediate Representation
Physical Model
Recap
Learning to Augment
Generation of Average Level Networks
Augmented Graphics
Dynamic Engine
Physics Engine
Summary
Example
Key Principle
Collaborators
Questions
Dynamics
Audience Questions
Object as Parts
Other Physical Properties