Understanding How Large Language Models Generate Images - From Autoencoders to Multimodal LLMs

Neural Breakdown with AVB via YouTube

Overview

Explore how Large Language Models generate images in this 18-minute educational video that builds from basic to advanced topics. Starting with fundamental concepts like latent spaces and autoencoders, progress through detailed explanations of Vector-Quantized Variational Autoencoders (VQ-VAEs) and their codebooks, and see how modern multimodal models such as Google's Gemini and Parti and OpenAI's DALL-E build on these ideas. Learn how these text-based models generate images by understanding the underlying architecture and mechanisms. The video is supplemented with references to key research papers, related educational content, and clear timestamps for easy navigation, making it suitable both for beginners and for those looking to deepen their understanding of AI image generation.
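As a rough illustration of the codebook idea mentioned above (not code from the video), the sketch below shows the quantization step of a VQ-VAE in NumPy: continuous encoder outputs are snapped to their nearest codebook entries, turning an image into a grid of discrete token ids that a text-style model can then predict. The array sizes and random data here are placeholder assumptions, not values from the video.

```python
import numpy as np

# Illustrative sketch of VQ-VAE codebook quantization.
# In a real model the codebook and encoder are learned; here both are random
# placeholders so the lookup step itself is easy to follow.

rng = np.random.default_rng(0)

codebook_size, latent_dim = 512, 64                       # hypothetical sizes
codebook = rng.normal(size=(codebook_size, latent_dim))    # learned in practice

# Pretend encoder output: an 8x8 grid of continuous latent vectors for one image.
encoder_output = rng.normal(size=(8, 8, latent_dim))

# Quantize: for every latent vector, find the closest codebook entry.
flat = encoder_output.reshape(-1, latent_dim)              # (64, latent_dim)
dists = np.linalg.norm(flat[:, None, :] - codebook[None, :, :], axis=-1)
token_ids = dists.argmin(axis=1).reshape(8, 8)             # discrete image tokens
quantized = codebook[token_ids]                            # vectors fed to the decoder

print(token_ids[0])  # first row of token ids for this image
```

The key takeaway is that after this step the image is just a grid of integer ids, which is what lets a multimodal LLM treat image generation as next-token prediction.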

Syllabus

- Intro
- Autoencoders
- Latent Spaces
- VQ-VAE
- Codebook Embeddings
- Multimodal LLMs generating images

Taught by

Neural Breakdown with AVB
