Production-Scale Retrieval Augmented Generation for Real-Time News Distillation

Overview

This technical talk from Vector Space Talks shows how to build and deploy a production-scale Retrieval Augmented Generation (RAG) system for real-time news processing. It covers the architecture that lets AskNews.app process over 1 million news articles per day by combining four open-source technologies: Flowdapt for cluster orchestration, Qdrant as the vector database, vLLM for language model serving, and TEI for embedding generation. Topics include efficient batch upserting, fast vector search, filtering, and multi-node scaling, and how these tools support real-time news distillation and enriched chat experiences for thousands of simultaneous users. The talk also argues that startups building on these foundational tools hold a competitive advantage over established tech companies, and it offers practical strategies for deploying production-ready RAG systems at scale.
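To make the batch upserting and filtered search ideas concrete, here is a minimal sketch in plain Python. It is not the AskNews.app implementation and it does not use the Qdrant API; the TinyVectorStore class, the published_at field, and the batched helper are all hypothetical names invented for illustration. It only shows the general pattern: write points in batches, then narrow candidates by timestamp before ranking by similarity.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    id: int
    vector: list[float]
    published_at: int  # Unix timestamp in seconds (hypothetical payload field)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def batched(items: list, size: int):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class TinyVectorStore:
    """Toy in-memory stand-in for a vector database (illustration only)."""

    def __init__(self) -> None:
        self.points: dict[int, Point] = {}

    def upsert_batch(self, batch: list[Point]) -> None:
        # Insert new points, or overwrite existing points that share an id.
        for point in batch:
            self.points[point.id] = point

    def search(self, query: list[float], since: int, limit: int = 3):
        # Apply the timestamp filter first, then rank the survivors.
        candidates = [p for p in self.points.values() if p.published_at >= since]
        scored = [(p.id, cosine(query, p.vector)) for p in candidates]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]


# Usage: four toy "articles" written in batches of two.
articles = [
    Point(1, [1.0, 0.0], 100),
    Point(2, [0.9, 0.1], 200),
    Point(3, [0.0, 1.0], 300),
    Point(4, [1.0, 0.0], 50),  # older than the cutoff used below
]
store = TinyVectorStore()
for batch in batched(articles, 2):
    store.upsert_batch(batch)

results = store.search([1.0, 0.0], since=100, limit=2)
result_ids = [point_id for point_id, _ in results]
```

Point 4 matches the query exactly but falls before the cutoff, so it never reaches the ranking step. A production vector database would use an approximate index rather than this linear scan; the sketch is only meant to show why filtering and search are discussed together.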

Syllabus

Intro
Robert Caulk
Context Engineering
Text embedding inference
Microservice orchestration
Startups vs incumbents
Timestamp filtering
Database retrieval evaluation
All-in-one options
Recommendations

Taught by

Qdrant - Vector Database & Search Engine
