Production-Scale Retrieval Augmented Generation for Real-Time News Distillation
Qdrant - Vector Database & Search Engine via YouTube
Overview
Learn how to build and deploy a production-scale Retrieval-Augmented Generation (RAG) system for real-time news processing in this technical talk from Vector Space Talks. Discover the architecture that lets AskNews.app process over 1 million news articles per day by combining four open-source technologies: Flowdapt for cluster orchestration, Qdrant as the vector database, vLLM for language model serving, and TEI for embedding generation. Explore key features such as efficient batch upserting, fast vector search, filtering, and multi-node scaling, and see how these tools enable real-time news distillation and enriched chat experiences for thousands of simultaneous users. Gain insight into why startups building on these foundational tools can hold a competitive advantage over established tech companies, and learn practical strategies for deploying production-ready RAG systems at scale.
Syllabus
Intro
Robert Caulk
Context engineering
Text embedding inference
Microservice orchestration
Startups vs incumbents
Timestamp filtering
Database retrieval evaluation
All-in-one options
Recommendations
Taught by
Qdrant - Vector Database & Search Engine