Understanding Text-to-Video Diffusion Models - From Core Concepts to Latest Developments
Neural Breakdown with AVB via YouTube
Overview
Explore a comprehensive 13-minute video lecture on video diffusion generative AI models. Learn about the fundamental challenges and solutions in text-to-video generation while examining influential papers from major labs, including Google's Imagen Video, Meta's Make-A-Video, Nvidia's Video Latent Diffusion Model (Video LDM), and OpenAI's Sora. Review the essential concepts behind image diffusion models (forward and reverse diffusion, the UNet architecture, convolution, and diffusion transformers) and trace the evolution of video generation through key developments such as VDM (2022) and factorized 3D UNet models. Supplementary materials include related videos on conditional image diffusion models, latent space, and LLM image generation, along with a collection of research papers and technical resources for deeper study.
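As a primer for the forward-diffusion concept the lecture covers, here is a minimal NumPy sketch of the DDPM-style noising process: a clean sample is progressively mixed with Gaussian noise according to a variance schedule. The linear beta schedule and all parameter values below are illustrative assumptions, not taken from the video.

```python
import numpy as np

# Illustrative DDPM-style linear beta schedule (values are assumptions).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative signal-retention factor

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0): x0 corrupted to noise level t."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))   # stand-in for an image (or one video frame)
x_early = q_sample(x0, 10, rng)    # mostly signal
x_late = q_sample(x0, T - 1, rng)  # nearly pure Gaussian noise
```

Reverse diffusion, which the lecture also introduces, is the learned inverse of this process: a network (typically a UNet or diffusion transformer) is trained to predict the added noise so samples can be denoised step by step.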
Syllabus
- Intro
- Text to Image Conditional Diffusion Models
- Challenges with Video Diffusion Models
- VDM 2022
- Factorized 3D UNet models
- Meta Make-A-Video
- Google Imagen Video
- Nvidia Video LDM
- OpenAI Sora
Taught by
Neural Breakdown with AVB