Synthetic Data Generation and Fine-tuning for OpenAI GPT-4 or Llama 3

Trelis Research via YouTube

Classroom Contents

  1. How to generate synthetic data for fine-tuning
  2. Video overview: fine-tune OpenAI or Llama 3
  3. Synthetic Question Generation
  4. Synthetic Answer Generation
  5. Why chain of thought is important in synthetic data
  6. Augmented Synthetic Data
  7. Generating Synthetic Data from Documents
  8. Synthetic Data from Structured Data
  9. Generating data from user conversations
  10. GPU and notebook setup
  11. OpenAI Notebook: Data Generation and Fine-tuning
  12. Data extraction from PDFs
  13. Synthetic Data Generation for GPT-4o-mini fine-tuning
  14. Generating synthetic questions using structured outputs
  15. Generating synthetic answers
  16. Saving data in JSONL format for OpenAI fine-tuning (sketched after this list)
  17. How to fine-tune an OpenAI model on a synthetic dataset
  18. Using an LLM as a judge for evaluation
  19. Evaluation of GPT-4o-mini versus the fine-tuned model
  20. How to increase and improve the training data
  21. Fine-tuning open-source models like Llama 3
  22. Pushing a synthetic dataset to HuggingFace
  23. Loading a model with transformers or Unsloth
  24. Setting generation parameters incl. temperature and top-p
  25. Batch generation with transformers or Unsloth, incl. padding and chat templating
  26. Llama 3.2 1B model performance before fine-tuning
  27. Fine-tuning on synthetic data with Unsloth or transformers
  28. LoRA adapter setup, rescaled LoRA, choice of rank and alpha (sketched after this list)
  29. Dataset preparation for fine-tuning, incl. prompt formatting
  30. SFTTrainer setup incl. epochs, batch size, gradient accumulation
  31. Defining a custom learning rate schedule with annealing
  32. How to train on completions only, like OpenAI's default
  33. Running training on Llama 3.2 1B
  34. Performance evaluation after fine-tuning Llama 3.2
  35. Using augmented synthetic data to improve maths performance (Advanced / Speculative)
  36. Evaluating the baseline maths performance of Llama 3.2 1B
  37. Fine-tuning on a training split of the lighteval/MATH dataset
  38. Training on synthetic data from Llama 3.1 8B instead of the training split
  39. Comparing results of training on a training split vs. on synthetic Llama 3.1 8B answers
  40. Training on an augmented synthetic dataset generated with Llama 3.1 8B and ground-truth answers
  41. Comparing all results: base vs. fine-tuned on the raw dataset vs. 8B synthetic vs. 8B synthetic with augmentation
  42. How to use augmented data if you have access to user conversations or feedback
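
Two of the listed topics lend themselves to short illustrations. Item 16 covers saving synthetic question/answer pairs in the JSONL chat format that OpenAI fine-tuning expects; the minimal sketch below shows that format with placeholder data (the example pairs, system prompt, and file name are illustrative, not taken from the course):

    import json

    # Placeholder synthetic question/answer pairs; in practice these would be
    # generated by prompting a model over your source documents.
    synthetic_pairs = [
        {"question": "What does LoRA stand for?",
         "answer": "Low-Rank Adaptation."},
        {"question": "What file format does OpenAI fine-tuning expect?",
         "answer": "JSONL, with one chat example per line."},
    ]

    # OpenAI chat fine-tuning expects one JSON object per line, each containing
    # a "messages" list of system/user/assistant turns.
    with open("train.jsonl", "w") as f:
        for pair in synthetic_pairs:
            record = {
                "messages": [
                    {"role": "system", "content": "You are a concise, factual assistant."},
                    {"role": "user", "content": pair["question"]},
                    {"role": "assistant", "content": pair["answer"]},
                ]
            }
            f.write(json.dumps(record) + "\n")

Item 28 covers LoRA adapter setup, including rescaled LoRA and the choice of rank and alpha. As a rough illustration only, assuming the Hugging Face peft library (the course itself works with Unsloth or transformers, and these hyperparameter values are arbitrary):

    from peft import LoraConfig

    # Rank (r) and alpha set the adapter's capacity and scaling; use_rslora
    # switches to the rescaled-LoRA scaling of alpha / sqrt(r) instead of alpha / r.
    lora_config = LoraConfig(
        r=16,
        lora_alpha=16,
        use_rslora=True,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )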
