

Building a Production Scale, Totally Private, OSS RAG Pipeline with DBRX, Spark, and LanceDB

Databricks via YouTube

Overview

This 22-minute conference talk shows how to build a production-scale, fully private, open-source RAG pipeline with DBRX, Spark, and LanceDB. It opens with the challenges enterprises face when putting AI into production, chiefly data security and the need to rely on external services for LLMs, embedding models, and vector databases. The talk presents the latest DBRX release as a breakthrough in open-source model quality that gives enterprises a viable option for high-quality, self-hosted generative AI responses.

The talk then covers LanceDB, an open-source vector database that serves billion-scale embedding datasets in real time with lower resource requirements than alternatives. LanceDB stores data in the Lance columnar format, so large-scale updates can be written quickly through Lance's Spark DataSource, and the same dataset can back both offline analytics and online serving for AI retrieval in RAG, agents, and more. LanceDB's embedding function registry can target custom embedding models served from MLflow without sending data off-premises. Combined, Spark, DBRX, and LanceDB make it possible to build a completely private generative AI pipeline within the lakehouse environment.
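For readers new to RAG, the retrieval flow described in the overview can be sketched in plain Python. This is a minimal illustration only, not code from the talk: it does not call LanceDB, Spark, DBRX, or MLflow. The `embed` function is a toy stand-in for a self-hosted embedding model, the in-memory `table` stands in for a vector table, and all names (`embed`, `build_table`, `top_k`, `build_prompt`, `DIM`) are hypothetical.

```python
import hashlib
import math

DIM = 64  # hypothetical embedding width, chosen only for this sketch


def embed(text: str) -> list[float]:
    """Toy stand-in for a self-hosted embedding model: a hashed bag of words."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # Hash each token into one of DIM buckets (deterministic across runs).
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    # Normalize to unit length so a dot product equals cosine similarity.
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_table(docs: list[str]) -> list[dict]:
    """Stand-in for a vector table: each row keeps the text and its vector."""
    return [{"text": d, "vector": embed(d)} for d in docs]


def top_k(query: str, table: list[dict], k: int = 2) -> list[dict]:
    """Brute-force nearest-neighbor search by cosine similarity."""
    q = embed(query)

    def score(row: dict) -> float:
        return sum(a * b for a, b in zip(q, row["vector"]))

    return sorted(table, key=score, reverse=True)[:k]


def build_prompt(question: str, rows: list[dict]) -> str:
    """Assemble the retrieved rows into a prompt for a self-hosted LLM."""
    context = "\n".join(f"- {row['text']}" for row in rows)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    table = build_table([
        "Lance is a columnar data format.",
        "Spark can write large batches of rows.",
        "DBRX is an open model that can be self-hosted.",
    ])
    question = "Which format is columnar?"
    print(build_prompt(question, top_k(question, table, k=2)))
```

In a real deployment along the lines the talk describes, each stand-in would be replaced by a component running inside the organization's own environment, so that no text or embeddings leave it.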

Syllabus

Building a Production Scale, Totally Private, OSS RAG Pipeline with DBRX, Spark, and LanceDB

Taught by

Databricks
