Multimodal Embeddings - Introduction and Use Cases with Python

Overview

Learn how multimodal embeddings represent different data types, such as text and images, in a single shared vector space in this 25-minute technical video. Start with the fundamentals of embeddings, then see how contrastive learning is used to build a multimodal embedding space. Follow along with Python demonstrations of two practical applications: zero-shot image classification and image search. The material moves from basic embedding principles to multimodal applications, cites the foundational BERT, ViT, and CLIP papers, and closes with a look at future directions for multimodal AI. A companion blog post and a GitHub repository with the implementation code are also provided.
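To make the two applications concrete, here is a minimal sketch (not the code from the video or its repository) of how zero-shot classification and image search both reduce to comparing vectors in a shared space. The embeddings below are hypothetical toy values written by hand; in practice they would come from a model such as CLIP, and the helper names `zero_shot_classify` and `search_images` are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical hand-written embeddings standing in for model outputs.
# Text and image vectors live in the same 3-dimensional space.
text_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embeddings = {
    "cat.jpg": [0.8, 0.2, 0.1],
    "dog.jpg": [0.2, 0.7, 0.1],
    "car.jpg": [0.1, 0.0, 0.95],
}


def zero_shot_classify(image_vec, label_vecs):
    """Return the text label whose embedding is most similar to the image."""
    return max(label_vecs, key=lambda label: cosine_similarity(image_vec, label_vecs[label]))


def search_images(query_vec, image_vecs, top_k=2):
    """Return the names of the top_k images most similar to the query."""
    ranked = sorted(
        image_vecs,
        key=lambda name: cosine_similarity(query_vec, image_vecs[name]),
        reverse=True,
    )
    return ranked[:top_k]


predicted_label = zero_shot_classify(image_embeddings["dog.jpg"], text_embeddings)
best_match = search_images(text_embeddings["a photo of a car"], image_embeddings, top_k=1)
```

The same similarity function serves both tasks: classification ranks text labels against one image, while search ranks images against one text query.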

Syllabus

Introduction
What are embeddings?
Multimodal Embeddings
Contrastive Learning
Contrastive Learning Details
Example 1: 0-shot Image Classification
Example 2: Image Search
What's Next?
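
For readers who want a feel for the contrastive learning chapters listed above, the following is a rough sketch of a CLIP-style symmetric contrastive loss in plain Python. It is an illustration under stated assumptions, not the video's implementation: the function names are hypothetical, the temperature of 0.07 is an assumed default, and matching image-text pairs are assumed to share the same index in the batch.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(image_vecs, text_vecs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Assumes image_vecs[i] and text_vecs[i] form a matching pair; every
    other combination in the batch is treated as a negative.
    """
    images = [normalize(v) for v in image_vecs]
    texts = [normalize(v) for v in text_vecs]
    # logits[i][j] = similarity of image i and text j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy batch: pairs that line up give a low loss, shuffled pairs a high one.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing this loss pulls matching image and text embeddings together and pushes mismatched ones apart, which is what makes the similarity comparisons in the two examples meaningful.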