Overview
Discover the critical role of high-quality data in developing high-performing large language models (LLMs) for production environments in this 34-minute conference talk from the LLMs in Production Conference. Explore the challenges of building LLMs that work effectively at scale, and learn why data quality is becoming the key differentiator in model performance. Delve into the importance of pre-training, common pitfalls to avoid, and strategies for ensuring data scientists work with top-notch data throughout the machine learning workflow. Gain insights on data-centric AI, fine-tuning techniques, and the significance of predictability in model outcomes. Examine modern ML trends, including open-source models, chain-of-thought prompting, and context retrieval, while understanding the shift towards instruction-based approaches in AI development.
Syllabus
Intro
AI is mainstream
The Hype
The Transformer Paper
Data-Centric AI
Modern ML Wave
Open Source Models
Good Quality Data
Importance of Data
Fine-Tuning
Predictability
Data-Centric Development
Chain of Thought Prompt
Context Retrieval
Instruction
Taught by
MLOps.community