

Scaling Vector Database Usage Without Breaking the Bank - Quantization and Adaptive Retrieval

Toronto Machine Learning Series (TMLS) via YouTube

Overview

Learn how to optimize the cost and performance of vector search deployments in this technical talk from the Toronto Machine Learning Series. Explore practical techniques for scaling vector databases efficiently, focusing on quantization methods and adaptive retrieval strategies. Discover how to run real-time, billion-scale vector searches on modest hardware using product, binary, scalar, and Matryoshka quantization. Learn to implement adaptive retrieval, which pairs a fast, lower-accuracy search over compressed vectors with targeted, high-accuracy rescoring of the candidates it returns. Understand how to cut memory costs by up to 32x while keeping retrieval performance strong, with only minimal accuracy trade-offs in RAG applications. Senior ML Developer Advocate Zain Hassan shares insights on balancing memory cost, latency, and retrieval accuracy in production-level vector search deployments.
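The two-stage idea behind adaptive retrieval can be sketched in a few lines of Python. This is a minimal illustration only, not the speaker's implementation or any vector database's API: it assumes binary quantization (one sign bit per dimension), a brute-force scan instead of a real index, and unit-length vectors scored by dot product. All function names and the `oversample` parameter are hypothetical. The 32x figure presumably corresponds to going from 32-bit floats to 1 bit per dimension, which `memory_ratio` spells out.

```python
import math
import random


def make_unit_vectors(n, dim, seed=0):
    """Generate n random unit-length vectors (stand-ins for real embeddings)."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v])
    return vectors


def binary_quantize(vec):
    """Pack the sign of each dimension into a single int, 1 bit per dimension."""
    code = 0
    for x in vec:
        code = (code << 1) | (1 if x > 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def adaptive_search(query, vectors, codes, k=5, oversample=4):
    """Coarse search over binary codes, then rescore with full-precision vectors."""
    q_code = binary_quantize(query)
    # Stage 1: cheap, approximate ranking by Hamming distance on compressed codes.
    n_candidates = min(len(codes), k * oversample)
    candidates = sorted(
        range(len(codes)), key=lambda i: hamming(q_code, codes[i])
    )[:n_candidates]
    # Stage 2: exact rescoring of the short candidate list only.
    candidates.sort(key=lambda i: dot(query, vectors[i]), reverse=True)
    return candidates[:k]


def memory_ratio(bits_full=32, bits_compressed=1):
    """Compression factor per dimension, e.g. float32 to 1 bit."""
    return bits_full / bits_compressed
```

In a deployment along these lines, only the compact codes would need to sit in memory while the full-precision vectors could live on cheaper storage and be fetched just for the candidates. Raising `oversample` trades extra rescoring work for a better chance that the true nearest neighbors survive the coarse stage.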

Syllabus

Scaling Vector Database Usage Without Breaking the Bank - Quantization and Adaptive Retrieval

Taught by

Toronto Machine Learning Series (TMLS)
