Emu Video Generation - From MAE Pre-training to Multimodal Embeddings
Aleksa Gordić - The AI Epiphany via YouTube
Overview
Explore a 55-minute talk in which Ishan Misra from Meta discusses self-supervised learning and multimodal data, with a focus on the recent Emu Video project. Dive into topics including the effectiveness of MAE pre-pretraining for billion-scale pretraining, ImageBind's unified embedding approach, and the Emu Video generation model. Learn about qualitative comparisons and human evaluations of the generated videos, and gain insights from the Q&A session. Discover cutting-edge developments in computer vision, multimodal AI, and video generation through this comprehensive discussion.
Syllabus
00:00 - Intro
00:42 - Hyperstack GPUs (sponsored)
02:23 - Talk intro
04:42 - The effectiveness of MAE pre-pretraining for billion-scale pretraining
12:58 - ImageBind: One Embedding Space to Bind Them All
29:26 - Emu Video
50:39 - Qualitative comparisons and human evaluation
54:30 - Q&A / outro
Taught by
Aleksa Gordić - The AI Epiphany