Create a Large Language Model from Scratch with Python – Tutorial

freeCodeCamp.org via freeCodeCamp

Standard Deviation for model parameters (46 of 72)

Classroom Contents

  1. Intro
  2. Install Libraries
  3. Pylzma build tools
  4. Jupyter Notebook
  5. Download Wizard of Oz
  6. Experimenting with text file
  7. Character-level tokenizer (see the sketch after this list)
  8. Types of tokenizers
  9. Tensors instead of Arrays
  10. Linear Algebra heads up
  11. Train and validation splits
  12. Premise of Bigram Model
  13. Inputs and Targets
  14. Inputs and Targets Implementation
  15. Batch size hyperparameter
  16. Switching from CPU to CUDA
  17. PyTorch Overview
  18. CPU vs GPU performance in PyTorch
  19. More PyTorch Functions
  20. Embedding Vectors
  21. Embedding Implementation
  22. Dot Product and Matrix Multiplication
  23. Matmul Implementation
  24. Int vs Float
  25. Recap and get_batch (see the sketch after this list)
  26. nn.Module subclass
  27. Gradient Descent
  28. Logits and Reshaping
  29. Generate function and giving the model some context
  30. Logits Dimensionality
  31. Training loop + Optimizer + zero_grad explanation
  32. Optimizers Overview
  33. Applications of Optimizers
  34. Loss reporting + Train vs Eval mode
  35. Normalization Overview
  36. ReLU, Sigmoid, Tanh Activations
  37. Transformer and Self-Attention
  38. Transformer Architecture
  39. Building a GPT, not a full Transformer model
  40. Self-Attention Deep Dive
  41. GPT architecture
  42. Switching to MacBook
  43. Implementing Positional Encoding
  44. GPTLanguageModel initialization
  45. GPTLanguageModel forward pass
  46. Standard Deviation for model parameters
  47. Transformer Blocks
  48. FeedForward network
  49. Multi-head Attention
  50. Dot product attention
  51. Why we scale by 1/sqrt(d_k) (see the sketch after this list)
  52. Sequential vs ModuleList Processing
  53. Overview of Hyperparameters
  54. Fixing errors, refining
  55. Begin training
  56. OpenWebText download and Survey of LLMs paper
  57. How the dataloader/batch getter will have to change
  58. Extract corpus with WinRAR
  59. Python data extractor
  60. Adjusting for train and val splits
  61. Adding dataloader
  62. Training on OpenWebText
  63. Training works well, model loading/saving
  64. Pickling
  65. Fixing errors + GPU memory in Task Manager
  66. Command line argument parsing
  67. Porting code to script
  68. Prompt: Completion feature + more errors
  69. nn.Module inheritance + generation cropping
  70. Pretraining vs Finetuning
  71. R&D pointers
  72. Outro
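For item 7 (Character-level tokenizer), a character-level tokenizer of the kind covered early in the course can be sketched in a few lines. The filename `wizard_of_oz.txt` and the names `string_to_int`, `encode`, and `decode` are illustrative assumptions, not necessarily the exact code used in the video.

```python
# Minimal character-level tokenizer sketch (names and filename are assumptions).
with open("wizard_of_oz.txt", "r", encoding="utf-8") as f:
    text = f.read()

chars = sorted(set(text))  # vocabulary = every unique character in the corpus
string_to_int = {ch: i for i, ch in enumerate(chars)}
int_to_string = {i: ch for i, ch in enumerate(chars)}

encode = lambda s: [string_to_int[c] for c in s]              # string -> list of integer ids
decode = lambda ids: "".join(int_to_string[i] for i in ids)   # list of ids -> string

print(decode(encode("hello")))  # round-trips to "hello"
```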
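For items 13, 25, and related sections (inputs/targets and `get_batch`), the usual pattern is to sample random windows of the encoded text and use the same window shifted by one character as the target. This is a minimal sketch under that assumption; the hyperparameter values are placeholders, not the tutorial's.

```python
import torch

block_size = 8   # context length (illustrative value)
batch_size = 4   # sequences per batch (illustrative value)

def get_batch(data: torch.Tensor):
    """Sample a batch of (input, target) pairs; targets are the inputs shifted by one token."""
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    return x, y

# usage (assuming `encode` and `text` from the tokenizer sketch above):
# data = torch.tensor(encode(text), dtype=torch.long)
# x, y = get_batch(data)   # x, y each have shape (batch_size, block_size)
```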
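For item 51 (why attention is scaled by 1/sqrt(d_k)), the standard scaled dot-product attention formula is softmax(QKᵀ / √d_k)V: dividing by √d_k keeps the dot products from growing with the head dimension, so the softmax does not saturate. A self-contained sketch, not the course's exact implementation:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (..., T, T) similarity scores
    weights = F.softmax(scores, dim=-1)                # rows sum to 1
    return weights @ v                                 # weighted sum of values

# usage with random tensors of shape (batch, time, head_dim)
q = k = v = torch.randn(1, 8, 16)
out = scaled_dot_product_attention(q, k, v)  # shape (1, 8, 16)
```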
