Overview
Learn advanced techniques and best practices for deploying and monitoring large language models in production environments.
Syllabus
- Introduction
- Deploying LLMs for production
- Working in Google Colab
- Overview of deployment options
- Deploying via APIs
- Using fine-tuned models for deployment
- Custom models: Building and deploying
- Understanding API limitations
- Strategies to handle endpoint uptime limitations
- Mitigating latency issues in LLM deployment
- Challenge: API limitations for LLM deployment
- Solution: API limitations for LLM deployment
- Vector databases for LLM deployment
- Agents in LLM deployment
- Chains in LLM deployment
- Challenge: Deploying a simple RAG application using an API
- Solution: Deploying a simple RAG application using an API
- Introduction to LLM performance monitoring
- Addressing hallucinations in LLMs
- Prompt management for LLM deployment
- Evaluating LLMs in production
- Challenge: Evaluating LLM systems
- Solution: Evaluating LLM systems
- Security considerations for LLMs in production
- Balancing costs and performance in LLM deployment
- Strategies for cost-effective LLM deployment
- Challenge: Estimating costs of an LLM API
- Solution: Estimating costs of an LLM API
- Next steps
Taught by
Soham Chatterjee and Archana Vaidheeswaran