Overview
Explore the economics of Large Language Models (LLMs) in production through this insightful conference talk from the LLMs in Production Conference. Dive deep into the costs involved in building LLM-based applications, comparing expenses for RAG versus fine-tuning approaches and for open-source versus commercial LLMs. Discover eye-opening examples, such as the $360,000 price tag for summarizing Wikipedia using GPT-4's 8k context window. Gain valuable insights into optimizing LLM costs, understand the trade-offs between different approaches, and learn strategies for maintaining cost-effectiveness as LLM applications move beyond the honeymoon phase into the practical realities of production environments.
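The $360,000 Wikipedia figure is straightforward token-cost arithmetic. Below is a minimal sketch of how such an estimate could be produced, using GPT-4 8k's list pricing at the time ($0.03 per 1K input tokens, $0.06 per 1K output tokens) and round, illustrative token counts of my own choosing; the talk's exact assumptions may differ.

```python
def llm_api_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Estimate API cost in dollars for a given volume of input/output tokens."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k

# Illustrative assumptions (not taken from the talk):
#   - treat Wikipedia as roughly 10 billion input tokens
#   - assume summaries run ~10% of the input length (1 billion output tokens)
#   - GPT-4 8k list pricing: $0.03 / 1K input, $0.06 / 1K output
total = llm_api_cost(10e9, 1e9, 0.03, 0.06)
print(f"${total:,.0f}")  # in the ballpark of $360,000
```

Changing any single assumption (token count, summary ratio, or model pricing) shifts the total proportionally, which is why the talk stresses re-running this arithmetic for your own workload before committing to an approach.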
Syllabus
Intro
Presentation
Introduction
Goal of the talk
Math Presentation
Problem Statement
Disclaimer
GPT-4 Model
Self-hosted models
Fine-tuning
OpenAI Fine-tuning
Key takeaways
Moveworks example
Open source vs commercial
Offloading tasks
True Foundry
Total Cost
Lossless Compression
Open Source Models
Outro
Taught by
MLOps.community