Understanding How Large Language Models Generate Images - From Autoencoders to Multimodal LLMs
Neural Breakdown with AVB via YouTube
Overview
Explore how Large Language Models generate images in this 18-minute educational video that builds from basic to advanced topics. It starts with fundamental concepts like latent spaces and autoencoders, progresses through Vector-Quantized Variational Autoencoders (VQ-VAEs) and their codebooks, and finishes with modern multimodal models such as Google's Gemini and Parti and OpenAI's DALL-E. Along the way, you will learn how these text-based models successfully generate images by understanding the underlying architecture and mechanisms. The video is supplemented with references to essential research papers, related educational content, and clear timestamps for easy navigation, making it suitable for both beginners and viewers looking to deepen their understanding of AI image generation technology.
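The codebook idea mentioned above can be illustrated with a small sketch. This is not the video's implementation, just a toy example of VQ-VAE-style quantization: a continuous latent vector is snapped to its nearest entry in a learned codebook, and the entry's index becomes a discrete "image token" that a text-style LLM can predict like any other token.

```python
# Toy VQ-VAE-style vector quantization (illustrative only; the codebook
# values and sizes here are made up, not from the video).

def quantize(latent, codebook):
    """Return (index, embedding) of the codebook entry closest to `latent`."""
    def sq_dist(a, b):
        # Squared Euclidean distance between two vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(codebook)), key=lambda i: sq_dist(latent, codebook[i]))
    return best, codebook[best]

# A tiny 4-entry codebook of 2-D embeddings; real models use far larger
# codebooks (e.g. thousands of entries) and higher-dimensional vectors.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# An encoder would produce this latent; here it is hard-coded for the demo.
index, embedding = quantize([0.9, 0.1], codebook)
```

In a full pipeline, an encoder maps the image to a grid of such latents, each is quantized to a codebook index, and a decoder reconstructs pixels from the chosen embeddings; the LLM only ever sees the integer indices.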
Syllabus
- Intro
- Autoencoders
- Latent Spaces
- VQ-VAE
- Codebook Embeddings
- Multimodal LLMs generating images
Taught by
Neural Breakdown with AVB