Real-Time Data Streaming Architectures for Generative AI

Overview

Explore how data processing architectures for Generative AI are evolving in this 12-minute talk by Emily Ekdahl at the MLOps.community event. Discover how real-time data streaming with Apache Kafka and Apache Flink changes the way organizations supply data to large language models and GenAI applications. Learn about the benefits of shifting from batch processing and lakehouse models to real-time data products, which enable more responsive and context-aware AI applications. Gain insights into combining streaming data with real-time model inference and Retrieval-Augmented Generation (RAG) to reduce latency and improve the accuracy of LLM responses. Examine key architectural patterns, potential challenges, and best practices for moving to a real-time data streaming architecture, illustrated with real-world examples of integrating Kafka and Flink with vector databases for advanced NLP applications.
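
The pattern named above (streamed events kept current in a vector store, then retrieved at question time for RAG) can be illustrated with a minimal sketch. This is not code from the talk. Everything in it is an assumption made for illustration: a plain Python list stands in for a Kafka topic, a loop stands in for a Flink job, `InMemoryVectorIndex` is a hypothetical stand-in for a vector database, and `embed` is a toy bag-of-words function rather than a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): bag-of-words counts, not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class InMemoryVectorIndex:
    """Hypothetical stand-in for a vector database."""

    def __init__(self):
        self._docs = {}

    def upsert(self, doc_id, text):
        # A later event with the same key replaces the earlier one,
        # so retrieval always sees the latest state.
        self._docs[doc_id] = (text, embed(text))

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(
            self._docs.values(), key=lambda d: cosine(q, d[1]), reverse=True
        )
        return [text for text, _ in ranked[:k]]


def consume(stream, index):
    """Stand-in for the streaming job: index each event as it arrives."""
    for event in stream:
        index.upsert(event["key"], event["value"])


def build_prompt(question, index, k=2):
    """Retrieve the most similar documents and place them in the prompt."""
    context = "\n".join(f"- {t}" for t in index.search(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


# Illustrative events; the keys and values are made up.
events = [
    {"key": "order-17", "value": "Order 17 status: payment received"},
    {"key": "flight-204", "value": "Flight 204 departs gate B4 at 18:30"},
    {"key": "order-17", "value": "Order 17 status: shipped to customer"},
]

index = InMemoryVectorIndex()
consume(events, index)
prompt = build_prompt("What is the status of order 17?", index, k=1)
print(prompt)
```

Because the index is updated per event rather than in a periodic batch, the prompt built for the order question carries the most recent status. In a real deployment the list and loop would be replaced by a Kafka consumer or Flink job, and the prompt would be sent to an LLM.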