Private RAG with Open Source and Custom LLMs - BentoML and OpenLLM

Overview

In this talk, Chaoyu Yang, Founder and CEO at BentoML, covers practical considerations for building private Retrieval-Augmented Generation (RAG) applications with open source and custom LLMs. Topics include the benefits of self-hosting open source LLMs or embedding models for RAG, common best practices for optimizing inference performance, and how BentoML can be used to build RAG as a service. The talk also covers chaining language models with other components, including text and multi-modal embedding, OCR pipelines, semantic chunking, classification models, and reranking models, and introduces OpenLLM and its role in LLM deployments. This 51-minute session, presented by LLMOps Space, is aimed at practitioners interested in deploying LLMs into production.