

NÜWA - Visual Synthesis Pre-training for Neural Visual World Creation

Yannic Kilcher via YouTube

Overview

Explore a comprehensive explanation of the NÜWA research paper, which introduces a unified multimodal pre-trained model for visual synthesis tasks. Delve into the architecture's ability to process text, images, and videos using a 3D transformer encoder-decoder framework and the novel 3D Nearby Attention mechanism. Learn about the model's applications in text-to-image generation, text-guided video manipulation, and sketch-to-video tasks. Examine the shared latent space creation, latent representation transformation, and pre-training objectives. Analyze experimental results across eight different visual generation tasks and gain insights into the model's state-of-the-art performance and zero-shot capabilities.
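As a rough illustration of the idea behind the 3D Nearby Attention mechanism mentioned above, the sketch below lists which positions of a 3D token grid fall inside a local window around one query token. It is a simplified assumption, not the paper's implementation: the function name `nearby_indices`, the (height, width, frames) grid layout, and the example grid and window sizes are all hypothetical, and the actual mechanism also handles mapping between grids of different sizes, which is omitted here.

```python
from itertools import product


def nearby_indices(pos, grid, extent):
    """Return the grid positions inside a local 3D window around `pos`.

    pos:    (h, w, f) coordinate of the query token
    grid:   (H, W, F) size of the token grid
    extent: (eh, ew, ef) window size along each axis (assumed odd)
    """
    ranges = []
    for p, size, e in zip(pos, grid, extent):
        half = e // 2
        lo = max(0, p - half)          # clip the window at the grid border
        hi = min(size - 1, p + half)
        ranges.append(range(lo, hi + 1))
    return list(product(*ranges))


# Illustrative shapes only: one (H, W, F) layout can describe all three modalities.
text_grid = (1, 1, 8)    # a sequence of 8 text tokens
image_grid = (4, 4, 1)   # a 4x4 grid of image tokens
video_grid = (4, 4, 3)   # 3 frames of 4x4 tokens

interior = nearby_indices((2, 2, 1), video_grid, (3, 3, 3))
corner = nearby_indices((0, 0, 0), video_grid, (3, 3, 3))
text_window = nearby_indices((0, 0, 4), text_grid, (3, 3, 3))

print(len(interior), len(corner), text_window)
```

Restricting each token to a window like this keeps the number of attended positions fixed by the window size rather than by the full grid, which is the usual motivation for local attention on videos.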

Syllabus

- Intro & Outline
- Sponsor: ClearML
- Tasks & Naming
- The problem with recurrent image generation
- Creating a shared latent space w/ Vector Quantization
- Transforming the latent representation
- Recap: Self- and Cross-Attention
- 3D Nearby Self-Attention
- Pre-Training Objective
- Experimental Results
- Conclusion & Comments
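
The vector quantization step named in the syllabus can be sketched as a nearest-neighbor lookup in a codebook, which turns continuous feature vectors into discrete token indices. This is a minimal sketch under stated assumptions: the `quantize` helper, the four-entry codebook, and the two-dimensional "patch" vectors are hypothetical toy values, and a real model would learn the codebook and use much higher-dimensional features.

```python
def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry."""
    def sq_dist(a, b):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
        for v in vectors
    ]


# Toy codebook and features (hypothetical values for illustration).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
patches = [(0.1, -0.2), (0.9, 0.8), (0.2, 0.7)]

tokens = quantize(patches, codebook)              # discrete token ids
reconstructed = [codebook[t] for t in tokens]     # lookup back to vectors

print(tokens)
```

Once every modality is expressed as token indices like these, a single transformer can in principle be trained over all of them.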

Taught by

Yannic Kilcher
