Syllabus
- Introduction
- Tatoeba and Multilingual Semantic Search
- What is Multilingual Semantic Search?
- Applications of Multilingual Semantic Search
- How do we achieve multilingual semantic search?
- A Crash Course in LLMs
- What are Vectors and Vector Embeddings?
- Distributional Hypothesis
- What are LLMs anyway?
- How does XLM-RoBERTA work?
- XLM-R: Big Multilingual Datasets
- XLM-R: Tokenization
- XLM-R: Masked Language Modeling
- Getting Document Embeddings
- Why XLM-R Isn't Enough
- Multilingual E5 for Multilingual Search Embeddings
- mE5: Training Data
- mE5: Weakly Supervised Contrastive Pretraining
- mE5: Supervised Finetuning and Dataset Distribution
- Basics of Vector Search with Pinecone
- Using Pinecone Inference
- Querying with Pinecone
- Demo Time: Language Learning with Multilingual Semantic Search
- Demo Architecture
- Live walkthrough of Notebook
- Embedding with Pinecone Inference
- Batch Embedding and Upserting
- Query Embeddings and Cross-Lingual Search
- Tips and Tricks for Multilingual Semantic Search
- Q&A Time
- Evaluating Semantic Search
- Language Embedding Theory
- What Happens for Out-of-Domain Languages? Transfer Theory
- Why isn't Translation Sufficient?
- Handling Negation in Queries
- Handling Cultural Nuance
- Low Resource Languages
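The syllabus above centers on one core mechanic: embedding text from many languages into a shared vector space, then ranking stored documents by similarity to a query vector. The sketch below illustrates that mechanic with made-up toy vectors standing in for real model outputs (a real system would use a multilingual embedding model such as multilingual-E5 and a vector database such as Pinecone), so the numbers are purely illustrative:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings" (hypothetical values, NOT real model outputs).
# With a multilingual model, translations of the same sentence land
# near each other, which is what makes cross-lingual search work.
index = {
    "The cat sleeps on the mat.": [0.90, 0.10, 0.00],   # English
    "Le chat dort sur le tapis.": [0.88, 0.12, 0.05],   # French, same meaning
    "Stocks fell sharply today.": [0.00, 0.20, 0.95],   # unrelated topic
}

def search(query_vec, index, top_k=1):
    """Return the top_k stored documents ranked by cosine similarity."""
    ranked = sorted(index.items(),
                    key=lambda kv: cosine_similarity(query_vec, kv[1]),
                    reverse=True)
    return ranked[:top_k]

# A Spanish query ("El gato duerme") would embed near the cat sentences;
# we fake that with a nearby toy vector.
query_vec = [0.92, 0.10, 0.02]
for doc, _score in search(query_vec, index, top_k=2):
    print(doc)
```

Both cat sentences outrank the unrelated document even though they are in different languages from the query, which is the cross-lingual behavior the course builds toward with real embeddings and a hosted index.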
Taught by
Pinecone