Overview
Dive into an extensive video tutorial on mastering retrieval techniques for Large Language Models (LLMs). Explore various methods including BM25, fine-tuned embeddings, and re-rankers to enhance LLM performance. Begin with an overview of baseline performance without retrieval, then delve into document chunking techniques. Learn about BM25 and semantic search methods, comparing cosine and dot product similarity. Generate chunks and embeddings, and analyze the performance differences between BM25 and similarity retrieval. Discover the process of fine-tuning embeddings and encoders, including dataset preparation and training. Examine the impact of fine-tuning on performance. Investigate re-rankers, including cross-encoders and LLM re-rankers, and assess their effectiveness. Conclude with valuable tips for implementing these advanced retrieval techniques in your LLM projects.
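The cosine-versus-dot-product comparison mentioned above can be illustrated with a small sketch (not taken from the course; the toy vectors are purely illustrative). It shows how the dot product grows with vector magnitude while cosine similarity depends only on direction:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity = dot product of L2-normalized vectors,
    # so it compares direction only and ignores magnitude.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two toy "embeddings" pointing in the same direction at different scales.
doc = np.array([1.0, 2.0, 3.0])
query = np.array([2.0, 4.0, 6.0])

print(np.dot(doc, query))      # 28.0 — dot product scales with magnitude
print(cosine_sim(doc, query))  # 1.0  — parallel vectors are maximally similar
```

In practice the choice matters mainly when embeddings are not normalized: with unit-length vectors the two rankings coincide, since the dot product of normalized vectors is the cosine similarity.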
Syllabus
Mastering Retrieval RAG for LLMs
Video Overview
Baseline Performance with No Retrieval
Document Chunking - Naive vs Sentence based
BM25
Semantic / Vector / Embeddings Search
Cosine vs Dot Product Similarity
Generating Chunks and Embeddings
Running BM25 and Similarity Retrieval
Performance with BM25 vs Similarity
Fine-tuning embeddings / encoders
Preparing fine-tuning datasets
Embeddings Training Continued
Performance after Fine-tuning
Re-rankers
Cross-encoders
LLM re-rankers
Re-ranking performance
Final Tips
Taught by
Trelis Research