Overview
Syllabus
Intro
Install Libraries
Pylzma build tools
Jupyter Notebook
Download Wizard of Oz
Experimenting with text file
Character-level tokenizer
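A minimal sketch of a character-level tokenizer of the kind built in this chapter: the vocabulary is the set of unique characters, and encode/decode map between strings and lists of integer ids. Names like string_to_int are illustrative, not necessarily the course's exact code.

```python
text = "hello world"
chars = sorted(set(text))                             # unique characters = vocabulary
string_to_int = {ch: i for i, ch in enumerate(chars)}
int_to_string = {i: ch for i, ch in enumerate(chars)}

encode = lambda s: [string_to_int[c] for c in s]             # string -> token ids
decode = lambda ids: "".join(int_to_string[i] for i in ids)  # token ids -> string

assert decode(encode("hello")) == "hello"
```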
Types of tokenizers
Tensors instead of Arrays
Linear Algebra heads-up
Train and validation splits
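A sketch of the train/validation split, assuming the corpus has already been encoded into a 1-D tensor of token ids; the 80/20 ratio is illustrative.

```python
import torch

data = torch.randint(0, 65, (1000,))   # placeholder for the encoded corpus

n = int(0.8 * len(data))               # e.g. an 80/20 split
train_data = data[:n]                  # the model learns from this part
val_data = data[n:]                    # held out to detect overfitting
```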
Premise of Bigram Model
Inputs and Targets
Inputs and Targets Implementation
Batch size hyperparameter
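A hedged sketch tying these two chapters together: targets are the inputs shifted one position to the right, and batch_size independent sequences are stacked per batch. The block_size and batch_size values are illustrative hyperparameters.

```python
import torch

block_size = 8      # context length (hyperparameter)
batch_size = 4      # independent sequences per batch (hyperparameter)
data = torch.randint(0, 65, (1000,))   # placeholder encoded corpus

def get_batch(data):
    # Random starting offsets, then parallel slices for inputs and targets.
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])           # inputs
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])   # shifted by one
    return x, y

x, y = get_batch(data)   # both (4, 8); y[b, t] is the token that follows x[b, t]
```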
Switching from CPU to CUDA
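A minimal device-selection sketch; falling back to the CPU when CUDA is unavailable keeps the same code running on both.

```python
import torch

# Prefer the GPU when one is visible; the same code then runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(2, 3).to(device)   # tensors and the model must share a device
```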
PyTorch Overview
CPU vs GPU performance in PyTorch
More PyTorch Functions
Embedding Vectors
Embedding Implementation
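A sketch of embedding lookup with nn.Embedding: each token id indexes a learnable dense vector. The sizes are illustrative.

```python
import torch
import torch.nn as nn

vocab_size, n_embd = 65, 32                 # illustrative sizes
table = nn.Embedding(vocab_size, n_embd)    # one learnable vector per token id

ids = torch.tensor([[1, 5, 7]])             # (B, T) token ids
vectors = table(ids)                        # (1, 3, 32): ids become dense vectors
```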
Dot Product and Matrix Multiplication
Matmul Implementation
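A small worked example of the dot product and matrix multiplication with the @ operator, which torch.matmul also implements.

```python
import torch

u = torch.tensor([1., 2.])
v = torch.tensor([3., 4.])
dot = u @ v                 # dot product: 1*3 + 2*4 = 11.0

a = torch.randn(2, 3)
b = torch.randn(3, 4)
c = a @ b                   # matrix product, same as torch.matmul(a, b); shape (2, 4)
```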
Int vs Float
Recap and get_batch
nn.Module subclass
Gradient Descent
Logits and Reshaping
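A hedged sketch of the bigram model as an nn.Module subclass, showing why the (B, T, C) logits are reshaped: cross_entropy expects (N, C) logits and (N,) targets.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BigramLanguageModel(nn.Module):
    # Each token id indexes a row of logits over the next token.
    def __init__(self, vocab_size):
        super().__init__()
        self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets=None):
        logits = self.token_embedding_table(idx)     # (B, T, C) with C = vocab_size
        loss = None
        if targets is not None:
            B, T, C = logits.shape
            # cross_entropy wants (N, C) logits and (N,) targets,
            # hence the reshape from (B, T, C).
            loss = F.cross_entropy(logits.view(B * T, C), targets.view(B * T))
        return logits, loss

model = BigramLanguageModel(vocab_size=65)
```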
Generate function and giving the model some context
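A sketch of an autoregressive generate function; it assumes a model whose forward returns (logits, loss), such as the bigram sketch above.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens):
    # idx is a (B, T) tensor of token ids used as context.
    for _ in range(max_new_tokens):
        logits, _ = model(idx)                   # assumes forward returns (logits, loss)
        logits = logits[:, -1, :]                # only the final time step matters
        probs = F.softmax(logits, dim=-1)        # logits -> probabilities
        idx_next = torch.multinomial(probs, num_samples=1)   # sample one token
        idx = torch.cat((idx, idx_next), dim=1)              # grow the context
    return idx
```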
Logits Dimensionality
Training loop + Optimizer + zero_grad explanation
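A minimal training-loop sketch showing the zero_grad / backward / step cycle with AdamW; model, get_batch, and train_data are assumed from the sketches above, and the learning rate and iteration count are illustrative.

```python
import torch

max_iters = 1000
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(max_iters):
    xb, yb = get_batch(train_data)
    logits, loss = model(xb, yb)               # forward pass computes the loss
    optimizer.zero_grad(set_to_none=True)      # clear gradients from the previous step
    loss.backward()                            # backpropagate
    optimizer.step()                           # apply the parameter update
```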
Optimizers Overview
Applications of Optimizers
Loss reporting + train vs eval mode
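A sketch of loss estimation under torch.no_grad, switching to eval() before measuring and back to train() afterwards; eval_iters and the get_batch/val_data names are assumed from the earlier sketches.

```python
import torch

@torch.no_grad()
def estimate_loss(model, eval_iters=50):
    model.eval()                       # disable training-only behaviour (e.g. dropout)
    losses = torch.zeros(eval_iters)
    for k in range(eval_iters):
        xb, yb = get_batch(val_data)   # assumes get_batch/val_data from above
        _, loss = model(xb, yb)
        losses[k] = loss.item()
    model.train()                      # restore training mode
    return losses.mean()
```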
Normalization Overview
ReLU, Sigmoid, Tanh Activations
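A quick comparison of the three activations on the same inputs:

```python
import torch

x = torch.linspace(-3, 3, 5)   # [-3.0, -1.5, 0.0, 1.5, 3.0]
print(torch.relu(x))           # max(0, x): negatives become 0
print(torch.sigmoid(x))        # squashed into (0, 1)
print(torch.tanh(x))           # squashed into (-1, 1), zero-centred
```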
Transformer and Self-Attention
Transformer Architecture
Building a GPT, not a full Transformer model
Self-Attention Deep Dive
GPT architecture
Switching to MacBook
Implementing Positional Encoding
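A sketch of learned positional embeddings, the GPT-style approach: position indices look up a second embedding table whose output is added to the token embeddings. Sizes are illustrative.

```python
import torch
import torch.nn as nn

block_size, n_embd, vocab_size = 8, 32, 65
token_emb = nn.Embedding(vocab_size, n_embd)
pos_emb = nn.Embedding(block_size, n_embd)   # one learned vector per position

idx = torch.randint(0, vocab_size, (4, block_size))   # (B, T) token ids
T = idx.shape[1]
x = token_emb(idx) + pos_emb(torch.arange(T))   # token identity + position
```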
GPTLanguageModel initialization
GPTLanguageModel forward pass
Standard Deviation for model parameters
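A hedged sketch of GPT-style parameter initialization with a small standard deviation (0.02 is the value GPT-2 popularized); the helper name _init_weights is illustrative.

```python
import torch.nn as nn

def _init_weights(module):
    # Small std keeps early logits tame so training starts stably.
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Embedding):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)

# Applied with model.apply(_init_weights) after constructing the model.
```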
Transformer Blocks
FeedForward network
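A sketch of the position-wise feed-forward block: expand by 4x, apply a nonlinearity, project back. The 4x factor and dropout rate are common choices, not requirements.

```python
import torch.nn as nn

class FeedForward(nn.Module):
    # Position-wise MLP applied independently at every time step.
    def __init__(self, n_embd, dropout=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.net(x)
```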
Multi-head Attention
Dot product attention
Why we scale by 1/sqrt(d_k)
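A hedged single-head sketch of masked scaled dot-product attention, covering the previous three chapters: the 1/sqrt(d_k) factor keeps the pre-softmax scores at roughly unit variance so the softmax does not saturate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Head(nn.Module):
    # One head of masked (causal) scaled dot-product attention.
    def __init__(self, n_embd, head_size, block_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        # Scale by 1/sqrt(d_k): without it, scores grow with head_size,
        # softmax saturates, and gradients vanish.
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5           # (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float("-inf"))  # causal mask
        wei = F.softmax(wei, dim=-1)
        return wei @ v                                                # (B, T, head_size)

head = Head(n_embd=32, head_size=16, block_size=8)
out = head(torch.randn(4, 8, 32))    # (4, 8, 16)
```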
Sequential vs ModuleList processing
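A sketch contrasting the two containers: nn.Sequential chains the calls for you, while nn.ModuleList registers the submodules but leaves the forward loop to you, which is useful when heads or blocks need per-module control.

```python
import torch.nn as nn

blocks = [nn.Linear(32, 32) for _ in range(4)]

seq = nn.Sequential(*blocks)     # calls each block in order automatically
mlist = nn.ModuleList(blocks)    # registers blocks; you write the loop yourself

def forward_with_modulelist(x):
    for block in mlist:          # explicit loop gives per-block control
        x = block(x)
    return x
```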
Overview Hyperparameters
Fixing errors, refining
Begin training
OpenWebText download and Survey of LLMs paper
How the dataloader/batch getter will have to change
Extract corpus with WinRAR
Python data extractor
Adjusting for train and val splits
Adding dataloader
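One way the batch getter can change for a corpus too large for RAM: memory-map the file and read a random chunk per batch. This is a hedged sketch; the course's exact chunking and cleanup may differ.

```python
import mmap
import random

def get_random_chunk(path, block_size, batch_size):
    # Memory-map the corpus so a random offset can be read
    # without loading the whole file into RAM.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            span = block_size * batch_size
            start = random.randint(0, len(mm) - span)
            chunk = mm[start:start + span].decode("utf-8", errors="ignore")
    return chunk
```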
Training on OpenWebText
Training works well, model loading/saving
Pickling
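A sketch of the two saving options these chapters touch on: pickling the whole model object versus saving only the state_dict, which is the more portable PyTorch idiom. The filename and the `model` variable are assumptions from the earlier sketches.

```python
import pickle
import torch

# Pickle the whole model object (assumes `model` from earlier sketches)...
with open("model-01.pkl", "wb") as f:
    pickle.dump(model, f)

with open("model-01.pkl", "rb") as f:
    model = pickle.load(f)

# ...or, more portably, save only the learned parameters.
torch.save(model.state_dict(), "model-01.pt")
model.load_state_dict(torch.load("model-01.pt"))
```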
Fixing errors + GPU memory in Task Manager
Command line argument parsing
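A minimal argparse sketch for overriding a hyperparameter from the command line; the flag name and default are illustrative.

```python
import argparse

parser = argparse.ArgumentParser(description="Train or chat with the model")
parser.add_argument("-batch_size", type=int, default=32,
                    help="batch size for this run")
args = parser.parse_args()
print(f"batch size: {args.batch_size}")
```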
Porting code to script
Prompt: Completion feature + more errors
nn.Module inheritance + generation cropping
Pretraining vs Finetuning
R&D pointers
Outro
Taught by freeCodeCamp.org