Synthetic Data Generation and Fine-tuning for OpenAI GPT-4o or Llama 3
YouTube videos curated by Class Central.

Classroom Contents
- 1 How to generate synthetic data for fine-tuning
- 2 Video overview: fine-tune OpenAI or Llama 3
- 3 Synthetic Question Generation
- 4 Synthetic Answer Generation
- 5 Why chain of thought is important in Synthetic Data
- 6 Augmented Synthetic Data
- 7 Generating Synthetic Data from Documents
- 8 Synthetic Data from Structured Data
- 9 Generating data from user conversations
- 10 GPU and notebook setup
- 11 OpenAI Notebook: Data Generation and Fine-tuning
- 12 Data extraction from PDFs
- 13 Synthetic Data Generation for GPT-4o-mini fine-tuning
- 14 Generating synthetic questions using structured outputs (see the sketch after this list)
- 15 Generating synthetic answers
- 16 Saving data in JSONL format for OpenAI fine-tuning (see the sketch after this list)
- 17 How to fine-tune an OpenAI model on a synthetic dataset
- 18 Using an LLM as a judge for evaluation (see the sketch after this list)
- 19 Evaluation of gpt-4o-mini versus the fine-tuned model
- 20 How to increase and improve the training data
- 21 Fine-tuning Open Source Models like Llama 3
- 22 Pushing a synthetic dataset to HuggingFace
- 23 Loading a model with transformers or Unsloth
- 24 Setting generation parameters incl. temperature and top-p (see the sketch after this list)
- 25 Batch generation with transformers or Unsloth, incl. padding and chat templating
- 26 Llama 3.2 1B model performance before fine-tuning
- 27 Fine-tuning on synthetic data with Unsloth or transformers
- 28 LoRA adapter setup, rescaled LoRA (rsLoRA), choice of rank and alpha (see the sketch after this list)
- 29 Dataset preparation for fine-tuning, incl. prompt formatting
- 30 SFTTrainer setup incl. epochs, batch size, and gradient accumulation (see the sketch after this list)
- 31 Defining a custom learning-rate schedule with annealing (see the sketch after this list)
- 32 How to train on completions only, like OpenAI's default (covered in the SFTTrainer sketch after this list)
- 33 Running training on Llama 3.2 1B
- 34 Performance evaluation after fine-tuning Llama 3.2
- 35 Using augmented synthetic data to improve maths performance (advanced / speculative)
- 36 Evaluating the baseline maths performance of Llama 3.2 1B
- 37 Fine-tuning on a training split of the lighteval/MATH dataset
- 38 Training on synthetic data from Llama 3.1 8B instead of the training split
- 39 Comparing results of training on a training split vs on synthetic Llama 3.1 8B answers
- 40 Training on an augmented synthetic dataset generated with Llama 3.1 8B and ground truth answers
- 41 Comparing all results: base vs fine-tuned on the raw dataset vs 8B synthetic vs 8B synthetic with augmentation
- 42 How to use augmented data if you have access to user conversations or feedback
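
Code Sketches

The sketches below accompany the classroom items that reference them. They are minimal, hedged illustrations of each technique, not the course's actual notebooks; model names, prompts, and hyperparameters are assumptions unless stated otherwise.

For item 14, a sketch of synthetic question generation with OpenAI structured outputs. The Pydantic schema, system prompt, and `make_questions` helper are illustrative.

```python
# Sketch: synthetic question generation via OpenAI structured outputs.
# Assumes openai>=1.40 and OPENAI_API_KEY set; schema and prompt are
# illustrative, not the course's exact code.
from openai import OpenAI
from pydantic import BaseModel

class QuestionSet(BaseModel):
    questions: list[str]  # questions answerable from the source text alone

client = OpenAI()

def make_questions(chunk: str, n: int = 5) -> list[str]:
    """Ask the model for n questions grounded only in `chunk`."""
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": f"Write {n} questions answerable using only the provided text."},
            {"role": "user", "content": chunk},
        ],
        response_format=QuestionSet,  # response parsed into the Pydantic model
    )
    return completion.choices[0].message.parsed.questions
```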
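For items 16 and 17, a sketch of the JSONL layout OpenAI fine-tuning expects (one chat per line under a `messages` key) and of launching a job via the Python SDK. The file name and base-model snapshot are assumptions.

```python
# Sketch: write chat-format training data to JSONL and start an OpenAI
# fine-tuning job. File name and model snapshot are assumptions.
import json
from openai import OpenAI

pairs = [("What is LoRA?", "LoRA adds trainable low-rank matrices to frozen weights.")]

with open("train.jsonl", "w") as f:
    for question, answer in pairs:
        record = {"messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]}
        f.write(json.dumps(record) + "\n")  # one JSON object per line

client = OpenAI()
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # a fine-tunable snapshot; check current availability
)
print(job.id)
```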
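For item 18, a sketch of LLM-as-a-judge evaluation: a strong model scores a candidate answer against a reference. The rubric prompt, 1-5 scale, and judge model are assumptions.

```python
# Sketch: LLM-as-a-judge scoring of a model answer against a reference.
# The rubric and scale are illustrative, not the course's exact setup.
from openai import OpenAI

client = OpenAI()

def judge(question: str, reference: str, candidate: str) -> int:
    """Return a 1-5 correctness score for `candidate` versus `reference`."""
    prompt = (
        f"Question: {question}\nReference answer: {reference}\n"
        f"Candidate answer: {candidate}\n"
        "Score the candidate from 1 (wrong) to 5 (matches the reference). "
        "Reply with the number only."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic judging
    )
    return int(response.choices[0].message.content.strip())
```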
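For items 24 and 25, a sketch of batched sampling with Hugging Face transformers: left padding so prompts align for decoder-only generation, the tokenizer's chat template, and temperature / top-p sampling. The model ID and sampling values are assumptions.

```python
# Sketch: batched generation with transformers, incl. left padding,
# chat templating, and temperature / top-p sampling. Values are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "left"  # pad on the left for decoder-only models
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompts = ["What is synthetic data?", "Why use chain of thought?"]
chats = [tokenizer.apply_chat_template(
            [{"role": "user", "content": p}],
            tokenize=False, add_generation_prompt=True) for p in prompts]

batch = tokenizer(chats, return_tensors="pt", padding=True).to(model.device)
out = model.generate(**batch, max_new_tokens=256,
                     do_sample=True, temperature=0.7, top_p=0.9)
# Strip the prompt tokens before decoding (works because padding is on the left).
replies = tokenizer.batch_decode(out[:, batch["input_ids"].shape[1]:],
                                 skip_special_tokens=True)
```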
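For item 28, a sketch of a PEFT LoRA configuration covering rank, alpha, and rank-stabilised (rescaled) LoRA. The rank/alpha values and target modules are illustrative choices.

```python
# Sketch: LoRA adapter setup with peft; r, alpha, and target modules are
# illustrative. use_rslora enables rank-stabilised scaling (alpha / sqrt(r)
# rather than alpha / r), which tends to behave better at higher ranks.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

lora_config = LoraConfig(
    r=16,                      # adapter rank
    lora_alpha=16,             # scaling numerator
    use_rslora=True,           # rescaled LoRA: scale = alpha / sqrt(r)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```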
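For items 30 and 32, a sketch of a TRL SFTTrainer run with the loss masked to assistant completions only, mirroring OpenAI's default of not training on prompt tokens. The response template matches Llama 3 chat formatting; the dataset and hyperparameters are stand-ins, and trl's API has shifted across versions (newer releases expose a completion-only option on SFTConfig instead), so pin a version when reproducing.

```python
# Sketch: SFTTrainer with completion-only loss. Dataset, model, and
# hyperparameters are stand-ins; pin your trl version, as this API has moved.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, DataCollatorForCompletionOnlyLM

model_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tiny stand-in dataset; in the course this comes from the synthetic Q&A pairs.
train_dataset = Dataset.from_list([
    {"text": "<|start_header_id|>user<|end_header_id|>\n\nWhat is LoRA?<|eot_id|>"
             "<|start_header_id|>assistant<|end_header_id|>\n\nLow-rank adapters.<|eot_id|>"},
])

# Mask the loss to tokens after the assistant header, so only completions train.
response_template = "<|start_header_id|>assistant<|end_header_id|>"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)

config = SFTConfig(
    output_dir="llama-3.2-1b-sft",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size of 16
    learning_rate=1e-4,
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    data_collator=collator,
)
trainer.train()
```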
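For item 31, a sketch of a custom learning-rate schedule with annealing: hold the rate constant, then anneal linearly to zero. The 50% split, step count, and optimizer settings are assumptions; a Hugging Face trainer accepts the pre-built pair via its `optimizers` argument.

```python
# Sketch: constant learning rate for the first half of training, then linear
# annealing to zero. Split point and AdamW settings are assumptions.
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(8, 8)  # stand-in; in practice, the LLM being fine-tuned
num_steps = 200                # assumed total optimizer steps

def constant_then_anneal(step: int) -> float:
    """LR multiplier: 1.0 for the first half, then linear decay to 0."""
    half = num_steps // 2
    if step < half:
        return 1.0
    return max(0.0, (num_steps - step) / (num_steps - half))

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = LambdaLR(optimizer, lr_lambda=constant_then_anneal)

# A Hugging Face Trainer (incl. SFTTrainer) accepts a pre-built pair:
# trainer = SFTTrainer(..., optimizers=(optimizer, scheduler))
```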