The Spelled-Out Intro to Language Modeling - Building Makemore

Overview

This video tutorial builds a bigram character-level language model, the foundation for more complex Transformer language models such as GPT. It has two main threads: torch.Tensor and its subtleties, including its use in evaluating neural networks efficiently, and the overall framework of language modeling, which covers model training, sampling, and loss evaluation. Topics include dataset exploration, bigram counting, tensor visualization, sampling from the model, vectorized normalization with broadcasting, the negative log likelihood loss, model smoothing, the neural network approach, one-hot encodings, and the softmax. The tutorial is implemented hands-on from start to finish and comes with resources and exercises for further practice.
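
As a rough illustration of the count-based half of the tutorial (bigram counting, row normalization, smoothing with fake counts, and the negative log likelihood loss), here is a minimal sketch. It is not the tutorial's code: it uses plain Python dictionaries instead of torch.Tensor so that it runs without dependencies, and the three-name dataset and the helper names prob and average_nll are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy dataset; the tutorial works with a much larger list of names.
words = ["emma", "ava", "mia"]

# Count bigrams, wrapping each word in a single "." boundary token.
counts = Counter()
for w in words:
    chars = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chars, chars[1:]):
        counts[(ch1, ch2)] += 1

# All characters seen in the data, including the "." token.
vocab = sorted({ch for pair in counts for ch in pair})


def prob(ch1, ch2, smoothing=1):
    """Estimate P(ch2 | ch1), adding `smoothing` fake counts to every bigram."""
    row_total = sum(counts[(ch1, c)] for c in vocab) + smoothing * len(vocab)
    return (counts[(ch1, ch2)] + smoothing) / row_total


def average_nll(word_list):
    """Average negative log likelihood per bigram (lower is better)."""
    log_likelihood, n = 0.0, 0
    for w in word_list:
        chars = ["."] + list(w) + ["."]
        for ch1, ch2 in zip(chars, chars[1:]):
            log_likelihood += math.log(prob(ch1, ch2))
            n += 1
    return -log_likelihood / n


print(f"average NLL on the toy data: {average_nll(words):.4f}")
```

On this toy data, prob("a", ".") is 0.4: "a" is followed by "." three times and by "v" once, and one fake count is added for each of the six possible next characters, giving (3 + 1) / (4 + 6). Because of the fake counts, no bigram has zero probability, so the loss stays finite even for unseen pairs.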

Syllabus

intro
reading and exploring the dataset
exploring the bigrams in the dataset
counting bigrams in a python dictionary
counting bigrams in a 2D torch tensor ("training the model")
visualizing the bigram tensor
deleting spurious (S) and (E) tokens in favor of a single . token
sampling from the model
efficiency! vectorized normalization of the rows, tensor broadcasting
loss function (the negative log likelihood of the data under our model)
model smoothing with fake counts
PART 2: the neural network approach: intro
creating the bigram dataset for the neural net
feeding integers into neural nets? one-hot encodings
the "neural net": one linear layer of neurons implemented with matrix multiplication
transforming neural net outputs into probabilities: the softmax
summary, preview to next steps, reference to micrograd
vectorized loss
backward and update, in PyTorch
putting everything together
note 1: one-hot encoding really just selects a row of the next Linear layer's weight matrix
note 2: model smoothing as regularization loss
sampling from the neural net
conclusion
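
The neural network half of the syllabus (one-hot encodings, a single linear layer, the softmax, the negative log likelihood loss, and gradient updates) can be sketched roughly as follows. This is an illustration under stated assumptions, not the tutorial's implementation: plain Python lists stand in for torch.Tensor, a hand-derived gradient for softmax with negative log likelihood replaces PyTorch's backward pass, and the toy dataset, learning rate, step count, and helper names (one_hot, matvec_row, softmax, loss) are all hypothetical.

```python
import math
import random

# Hypothetical toy dataset; the tutorial works with a much larger list of names.
words = ["emma", "ava", "mia"]
vocab = sorted(set("".join(words)) | {"."})
stoi = {ch: i for i, ch in enumerate(vocab)}
V = len(vocab)

# Bigram training pairs: xs[i] is the current character index, ys[i] the next one.
xs, ys = [], []
for w in words:
    chars = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chars, chars[1:]):
        xs.append(stoi[ch1])
        ys.append(stoi[ch2])


def one_hot(i, n):
    """Length-n vector with 1.0 at position i and 0.0 elsewhere."""
    return [1.0 if j == i else 0.0 for j in range(n)]


def matvec_row(x, W):
    """Row vector x (length n) times matrix W (n x m)."""
    return [sum(x[k] * W[k][j] for k in range(len(x))) for j in range(len(W[0]))]


def softmax(logits):
    """Turn logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# One linear layer with no bias: a V x V weight matrix of random values.
rng = random.Random(0)
W = [[rng.gauss(0.0, 1.0) for _ in range(V)] for _ in range(V)]


def loss(W):
    """Average negative log likelihood of the targets.

    A one-hot input times W just selects row x of W, so W[x] is used directly.
    """
    return -sum(math.log(softmax(W[x])[y]) for x, y in zip(xs, ys)) / len(xs)


initial_loss = loss(W)
for _ in range(200):
    # For softmax + NLL, d(loss)/d(logits) = probs - one_hot(target).
    grad = [[0.0] * V for _ in range(V)]
    for x, y in zip(xs, ys):
        probs = softmax(W[x])
        for j in range(V):
            grad[x][j] += (probs[j] - (1.0 if j == y else 0.0)) / len(xs)
    # Plain gradient descent with an assumed learning rate of 1.0.
    for i in range(V):
        for j in range(V):
            W[i][j] -= 1.0 * grad[i][j]
final_loss = loss(W)

print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

The loss function indexes W[x] instead of multiplying by a one-hot vector, which is the point of note 1 in the syllabus: matvec_row(one_hot(x, V), W) returns the same values as row x of W.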