
Stable Video Diffusion: Model Architecture and Training Pipeline

AI Bites via YouTube

Overview

Explore a detailed 14-minute video analysis of Stability AI's Stable Video Diffusion model, examining the architecture, training procedure, and results from the accompanying research paper. Learn about the three-stage training process designed for video generation models, which produces videos of 14 or 25 frames at customizable frame rates between 3 and 30 frames per second. Delve into crucial components including the image pretraining and video curation stages, development of the LVD dataset, filtering mechanisms, optical flow, synthetic caption generation, and OCR detection. Understand the role of ablation studies and high-quality fine-tuning, and see practical text-to-video and image-to-video examples that demonstrate how this foundation model outperforms leading closed models from competitors such as Runway and Pika Labs.
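To make the motion-filtering idea concrete: the curation stage described in the video discards clips with too little movement, scored via dense optical flow. Below is a minimal sketch of that kind of filter using OpenCV's Farneback flow; the sampling stride and motion threshold are illustrative choices, not values from the paper.

```python
# Minimal sketch of motion-based clip filtering with dense optical flow.
# Stride and threshold are illustrative, not values from the SVD paper.
import cv2
import numpy as np

def mean_flow_magnitude(video_path: str, stride: int = 5) -> float:
    """Average dense optical-flow magnitude over sampled frame pairs."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        cap.release()
        return 0.0
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    magnitudes = []
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame_idx += 1
        if frame_idx % stride:
            continue  # sample every `stride`-th frame to keep this cheap
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(
            prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        # Per-pixel flow magnitude, averaged over the frame.
        magnitudes.append(np.linalg.norm(flow, axis=2).mean())
        prev_gray = gray
    cap.release()
    return float(np.mean(magnitudes)) if magnitudes else 0.0

def keep_clip(video_path: str, min_motion: float = 1.0) -> bool:
    """Drop near-static clips (e.g. slideshows) below a motion threshold."""
    return mean_flow_magnitude(video_path) >= min_motion
```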

Syllabus

- Intro
- Model Architecture
- Training Stages
- Image Pretraining Stage
- Motivation for Image Pretraining
- Video Curation Stage
- Video Data Curation Pipeline
- LVD Dataset
- Filtering Mechanisms
- Optical Flow
- Synthetic Captions
- OCR Detection
- LVD Dataset Summarised
- Ablation Studies
- High-Quality Fine-Tuning
- Base Model
- Text-to-Video Example
- Image-to-Video Example (see the sketch after this list)
- Conclusion
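For the image-to-video example near the end of the video, the released model can be run through Hugging Face's diffusers library. The sketch below assumes the public stabilityai/stable-video-diffusion-img2vid-xt checkpoint and a CUDA GPU; the input image path is a placeholder.

```python
# Minimal image-to-video sketch using the diffusers release of
# Stable Video Diffusion. Input image path is a placeholder.
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")

# Condition on a single still image; the model animates it.
image = load_image("input.png").resize((1024, 576))

frames = pipe(
    image,
    num_frames=25,         # the model generates 14- or 25-frame clips
    fps=7,                 # frame-rate conditioning, tunable at inference
    motion_bucket_id=127,  # higher values yield more motion
    decode_chunk_size=8,   # decode fewer frames at once to save VRAM
).frames[0]

export_to_video(frames, "generated.mp4", fps=7)
```

The fps and motion_bucket_id arguments expose the frame-rate and motion conditioning discussed in the video: the model supports frame rates in the 3 to 30 fps range covered in the overview.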

Taught by

AI Bites
