Overview
Syllabus
- Introduction to Mark Kim-Huang
- Join the LLMs in Production Conference Part 2 on June 15-16!
- Fine-Tuning LLMs: Best Practices and When to Go Small
- Model approaches
- You might think you could just use OpenAI, but only older base models are available
- Why custom LLMs over closed-source models?
- Small models work well for simple tasks
- Types of Fine-Tuning
- Strategies for improving fine-tuning performance
- Challenges
- Define your task
- Task framework
- Defining tasks
- Task clustering diversifies training data and improves out-of-domain performance
- Prompt engineering
- Constructing a prompt
- Synthesize more data
- Constructing a prompt
- Increase fine-tuning efficiency with LoRA
- Naive data parallelism with mixed precision is inefficient
- Further reading on mixed precision
- Parameter-efficient fine-tuning with LoRA
- LoRA Data Parallelism with Mixed Precision
- Summary
- Q&A
- Mark's journey to LLMs
- Mixing task clusters with existing data sets
- Evaluating LLMs with the LangChain Auto Evaluator
- Cloud platform costs
- The vector database used at Preemo
- Finding a model's reasoning path through prompting
- When to fine-tune versus prompting with a context window
- Wrap up
Taught by
MLOps.community