Overview
Syllabus
Continuing discussion around the recursive crawler
GitHub Copilot and the tasks it excels at
What do we do with the HTML we extract? How the seeder works
The different types of document splitters you can use
embedDocument and how it works
Why do we split documents when working with a vector database?
Problems that occur if you don’t split documents
Proper chunking improves relevance
You still need to tweak and experiment with your chunk parameters
Chunked upserts
Chat endpoint - how we use the context at runtime
Injecting context into LLM prompts
Does it make a measurable difference where you put the context in the prompt?
Reviewing the end-to-end RAG workflow
LLMs have conditioned us to be okay with pretty slow responses!
Cool UX anecdote around what humans consider too long
You have an opportunity to associate chunks with metadata
UI cards - selecting one to show it was used as context in the response
How we make it visually clear which chunks and context were used by the LLM
Auditability and why it matters
Testing the live app
Outro chat - Thursday AI sessions on Twitter Spaces
Review GitHub project - this is all open-source!
Inaugural stream conclusion
Vim / VS Code / Cursor AI IDE discussion
Setting up Devtools on macOS
Upcoming stream ideas - Image search / Pokémon search
Taught by
Pinecone