Overview
Explore scalable multimodal search applications in this 25-minute conference talk from EuroPython 2024. The talk covers multimodal data processing, including spoken language, gestures, and the sensory inputs used in robotics. Learn how to leverage open-source multimodal embedding models and large generative multimodal models to perform cross-modal search and multimodal retrieval-augmented generation (MM-RAG) at a billion-object scale. Discover techniques for real-time cross-modal retrieval, allowing LLMs to reason over enterprise multimodal data, and gain insights into scaling multimodal embedding and generative models in production through live code demonstrations and practical examples.
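The cross-modal search idea described above can be sketched in a few lines: embed queries and indexed objects into a shared vector space, then rank by cosine similarity. The character-frequency "embedding" below is a toy, deterministic stand-in for a real multimodal model (e.g. a CLIP-style encoder), and the function names and captions are purely illustrative, not from the talk:

```python
import math

# Toy stand-in for a multimodal embedding model. In a real system,
# text and images would be encoded by the same model into one
# shared vector space; here we fake it with character frequencies.
def embed_text(text: str) -> list[float]:
    vec = [0.0] * 8
    for ch in text.lower():
        vec[ord(ch) % 8] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # unit-normalized

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def search(query: str, index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Rank indexed objects by cosine similarity to the query embedding."""
    q = embed_text(query)
    ranked = sorted(index.items(), key=lambda kv: cosine(q, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Index a few "objects" (captions standing in for images, audio, etc.).
index = {caption: embed_text(caption) for caption in [
    "a dog running on the beach",
    "city skyline at night",
    "a cat sleeping on a sofa",
]}

print(search("dog on the beach", index, k=1))
# → ['a dog running on the beach']
```

At billion-object scale, the brute-force `sorted` scan would be replaced by an approximate nearest-neighbor index (e.g. HNSW) inside a vector database; the embedding-then-rank structure stays the same.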
Syllabus
Building Scalable Multimodal Search Applications with Python — Zain Hasan
Taught by
EuroPython Conference