Overview
Explore the rise of vector databases and their significance in handling unstructured data in this 26-minute conference talk by Frank Liu and Charles Xie from Zilliz. Gain insight into the challenges of storing and analyzing rapidly growing volumes of unstructured data, including text, images, and IoT streams. Learn about embeddings as a way to represent semantic content, and why this calls for cloud-native, distributed vector databases. Discover real-world production use cases of Milvus, the popular open-source vector database, and understand the potential pitfalls of integrating it into data and ML stacks. Delve into approximate nearest neighbor search, the complexities of building a purpose-built database, and the role vector databases can play in addressing hallucinations in large language models such as ChatGPT. Conclude with key takeaways on the future of vector databases and their impact on managing unstructured data in the mobile/IoT era.
Syllabus
Intro
Speaker
Zilliz
What is Unstructured Data?
The Evolution of Data
Digits
Approximate Nearest Neighbor Search
Vector Database Overview
Why Purpose-built?
Purpose-built is Complex
ChatGPT Craziness
GPTs are Stochastic
Hallucination Example
The Solution to Hallucination
The CVP Framework
Using Vectors to Represent Data
Key Takeaways
Taught by
Linux Foundation