Overview
Syllabus
intro
reading and exploring the dataset
exploring the bigrams in the dataset
counting bigrams in a python dictionary
counting bigrams in a 2D torch tensor ("training the model")
visualizing the bigram tensor
deleting spurious S and E tokens in favor of a single "." token
sampling from the model
efficiency! vectorized normalization of the rows, tensor broadcasting
loss function (the negative log likelihood of the data under our model)
model smoothing with fake counts
PART 2: the neural network approach: intro
creating the bigram dataset for the neural net
feeding integers into neural nets? one-hot encodings
the "neural net": one linear layer of neurons implemented with matrix multiplication
transforming neural net outputs into probabilities: the softmax
summary, preview to next steps, reference to micrograd
vectorized loss
backward and update, in PyTorch
putting everything together
note 1: one-hot encoding really just selects a row of the next Linear layer's weight matrix
note 2: model smoothing as regularization loss
sampling from the neural net
conclusion
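As a rough illustration of the count-based half of the syllabus above (counting bigrams, the single "." token, smoothing with fake counts, and the negative log likelihood loss), here is a minimal sketch. It is not the course's code: it uses plain Python dictionaries instead of a torch tensor, and the word list and the names `prob`, `avg_nll`, and `fake` are hypothetical, chosen only for this example.

```python
import math
from collections import Counter

# Hypothetical toy dataset; the course works with a much larger list of names.
words = ["emma", "olivia", "ava"]

# Count bigrams, using "." as the single start/end token.
counts = Counter()
for w in words:
    chars = ["."] + list(w) + ["."]
    for a, b in zip(chars, chars[1:]):
        counts[(a, b)] += 1

# Vocabulary: every character seen in the data, plus the "." token.
vocab = sorted({c for w in words for c in w} | {"."})

def prob(a, b, fake=1):
    """Smoothed estimate of P(b | a): `fake` is added to every count in row a."""
    row_total = sum(counts[(a, c)] + fake for c in vocab)
    return (counts[(a, b)] + fake) / row_total

def avg_nll(data, fake=1):
    """Average negative log likelihood of the bigrams in `data` under the model."""
    total, n = 0.0, 0
    for w in data:
        chars = ["."] + list(w) + ["."]
        for a, b in zip(chars, chars[1:]):
            total -= math.log(prob(a, b, fake))
            n += 1
    return total / n

loss = avg_nll(words)
print(f"average NLL on the toy data: {loss:.4f}")
```

Because of the fake counts, `prob` never returns zero, so the loss stays finite even for bigrams that never appear in the data.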
Taught by
Andrej Karpathy