Synthetic Data Generation and Fine-tuning for OpenAI GPT-4 or Llama 3

YouTube videos curated by Class Central.

Classroom Contents
- 1 How to generate synthetic data for fine-tuning
- 2 Video Overview fine-tune OpenAI or Llama 3
- 3 Synthetic Question Generation
- 4 Synthetic Answer Generation
- 5 Why chain of thought is important in synthetic data
- 6 Augmented Synthetic Data
- 7 Generating Synthetic Data from Documents
- 8 Synthetic Data from Structured Data
- 9 Generating data from user conversations
- 10 GPU and notebook setup
- 11 OpenAI Notebook: Data Generation and Fine-tuning
- 12 Data extraction from PDFs
- 13 Synthetic Data Generation for GPT-4o-mini fine-tuning
- 14 Generating synthetic questions using structured outputs (Sketch 1 below)
- 15 Generating synthetic answers
- 16 Saving data in JSONL format for OpenAI fine-tuning (Sketch 2 below)
- 17 How to fine-tune an OpenAI model on a synthetic dataset (Sketch 3 below)
- 18 Using an LLM as a judge for evaluation (Sketch 4 below)
- 19 Evaluation of GPT-4o-mini versus the fine-tuned model
- 20 How to increase and improve the training data
- 21 Fine-tuning Open Source Models like Llama 3
- 22 Pushing a synthetic dataset to HuggingFace
- 23 Loading a model with transformers or Unsloth (Sketch 5 below)
- 24 Setting generation parameters incl. temperature and top-p (Sketch 6 below)
- 25 Batch generation with transformers or Unsloth, incl. padding and chat templating (Sketch 6 below)
- 26 Llama 3.2 1B model performance before fine-tuning
- 27 Fine-tuning on synthetic data with Unsloth or transformers
- 28 LoRA adapter setup, rescaled LoRA, choice of rank and alpha (Sketch 7 below)
- 29 Dataset preparation for fine-tuning, incl. prompt formatting
- 30 SFTTrainer setup incl. epochs, batch size, gradient accumulation (Sketch 8 below)
- 31 Defining a custom learning rate schedule with annealing (Sketch 9 below)
- 32 How to train on completions only, like OpenAI's default (Sketch 10 below)
- 33 Running training on Llama 3.2 1B
- 34 Performance evaluation after fine-tuning Llama 3.2
- 35 Using augmented synthetic data to improve maths performance (Advanced / Speculative)
- 36 Evaluating the baseline maths performance of Llama 3.2 1B
- 37 Fine-tuning on a training split of the lighteval/MATH dataset
- 38 Training on synthetic data from Llama 3.1 8B instead of the training split
- 39 Comparing results of training on a training split vs on synthetic Llama 3.1 8B answers
- 40 Training on an augmented synthetic dataset generated with Llama 3.1 8B and ground truth answers
- 41 Comparing all results: base vs. fine-tuned on the raw dataset vs. 8B synthetic vs. 8B synthetic with augmentation
- 42 How to use augmented data if you have access to user conversations or feedback
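
Code Sketches

Sketch 1: Synthetic question generation with structured outputs. A minimal sketch assuming the openai Python SDK's structured-output parsing; the Pydantic schema, system prompt, and document variable are illustrative assumptions, not the course's own code.

```python
# Illustrative sketch, not the course's exact code. Schema and prompt are assumptions.
from openai import OpenAI
from pydantic import BaseModel

class QuestionSet(BaseModel):
    questions: list[str]

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_questions(document: str, n: int = 5) -> list[str]:
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": f"Write {n} exam-style questions answerable only from the provided text."},
            {"role": "user", "content": document},
        ],
        response_format=QuestionSet,  # constrains the reply to the Pydantic schema
    )
    return completion.choices[0].message.parsed.questions
```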
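Sketch 2: Saving data as JSONL. OpenAI fine-tuning expects one chat-format example per line; the Q/A pair here is a placeholder.

```python
# Illustrative sketch: one chat-format JSON object per line, as OpenAI fine-tuning expects.
import json

pairs = [("What is LoRA?", "LoRA adds low-rank adapter matrices ...")]  # assumed Q/A data

with open("train.jsonl", "w") as f:
    for question, answer in pairs:
        example = {"messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]}
        f.write(json.dumps(example) + "\n")
```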
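Sketch 3: Launching an OpenAI fine-tuning job on the JSONL file. The model snapshot name is an assumption; check the current fine-tuning docs.

```python
# Illustrative sketch: upload the JSONL file, then launch the fine-tuning job.
from openai import OpenAI

client = OpenAI()

training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # snapshot name is an assumption; check the current docs
)
print(job.id)  # poll client.fine_tuning.jobs.retrieve(job.id) until status == "succeeded"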
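Sketch 4: LLM-as-judge evaluation. A minimal sketch; the rubric wording and 1-5 scale are assumptions, not the course's rubric.

```python
# Illustrative sketch; the rubric and 1-5 scale are assumptions, not the course's rubric.
from openai import OpenAI

client = OpenAI()

def judge(question: str, reference: str, candidate: str) -> int:
    prompt = (
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {candidate}\n"
        "Score the candidate from 1 (wrong) to 5 (equivalent to the reference). "
        "Reply with the digit only."
    )
    reply = client.chat.completions.create(
        model="gpt-4o",  # judge with a stronger model than the one being evaluated
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep grading deterministic
    )
    return int(reply.choices[0].message.content.strip())
```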
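Sketch 5: Loading the model. Shown with transformers, which the course lists as one of two options; Unsloth's FastLanguageModel.from_pretrained returns a (model, tokenizer) pair in much the same way.

```python
# Illustrative sketch using transformers; Unsloth is the drop-in alternative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # gated repo; assumes access has been granted
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
```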
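Sketch 6: Batch generation with chat templating, left padding, and sampling parameters. Continues Sketch 5; the prompts and sampling values are placeholder assumptions.

```python
# Illustrative sketch, continuing Sketch 5; prompts and sampling values are assumptions.
prompts = ["What is 12 * 7?", "Name the capital of France."]

tokenizer.padding_side = "left"  # left-pad so generation continues from real tokens
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

chats = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": p}], tokenize=False, add_generation_prompt=True
    )
    for p in prompts
]
batch = tokenizer(chats, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(
    **batch, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
)
new_tokens = outputs[:, batch["input_ids"].shape[1]:]  # strip the prompt tokens
print(tokenizer.batch_decode(new_tokens, skip_special_tokens=True))
```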
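Sketch 7: LoRA adapter setup with rescaled (rank-stabilized) LoRA via peft. The rank and alpha values are common defaults, not the course's choices.

```python
# Illustrative sketch; rank/alpha values are common defaults, not the course's choices.
# Rank-stabilized LoRA (use_rslora=True) scales adapters by alpha/sqrt(r) instead of alpha/r.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,              # adapter rank
    lora_alpha=16,     # scaling numerator
    use_rslora=True,   # rescaled (rank-stabilized) LoRA
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```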
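Sketch 8: SFTTrainer setup with trl. Hyperparameters are placeholders, the dataset is assumed to be prepared, and the SFTConfig/SFTTrainer API varies across trl versions.

```python
# Illustrative sketch; hyperparameters are placeholders and the trl API varies by version.
from trl import SFTConfig, SFTTrainer

config = SFTConfig(
    output_dir="llama-3.2-1b-sft",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size of 16
    learning_rate=2e-4,
)
trainer = SFTTrainer(model=model, args=config, train_dataset=dataset)  # dataset assumed prepared
trainer.train()
```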
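Sketch 9: A custom learning rate schedule with annealing, built with PyTorch's LambdaLR; the warmup/step counts are placeholder assumptions. Pass optimizers=(optimizer, scheduler) to the trainer to use it.

```python
# Illustrative sketch: linear warmup followed by cosine annealing to zero.
import math
import torch

def warmup_cosine(step: int, warmup_steps: int = 50, total_steps: int = 1000) -> float:
    if step < warmup_steps:
        return step / max(1, warmup_steps)             # linear warmup from 0 to 1
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))  # anneal multiplier from 1 down to 0

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_cosine)
```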
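Sketch 10: Training on completions only, mirroring OpenAI's default of not computing loss on prompt tokens. Uses trl's collator (present up to trl 0.11; newer versions configure this differently, so check your version's docs). The response template shown is the Llama 3 assistant header.

```python
# Illustrative sketch: mask prompt tokens so loss is computed on completions only.
from trl import DataCollatorForCompletionOnlyLM

response_template = "<|start_header_id|>assistant<|end_header_id|>"  # Llama 3 chat marker
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
# Pass data_collator=collator to SFTTrainer; prompt tokens get label -100 and are
# ignored by the cross-entropy loss.
```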