Class Central Classrooms (beta)
YouTube videos curated by Class Central.
Classroom Contents
The Spelled-Out Intro to Language Modeling - Building Makemore
- 1 intro
- 2 reading and exploring the dataset
- 3 exploring the bigrams in the dataset
- 4 counting bigrams in a python dictionary
- 5 counting bigrams in a 2D torch tensor "training the model"
- 6 visualizing the bigram tensor
- 7 deleting spurious <S> and <E> tokens in favor of a single . token
- 8 sampling from the model
- 9 efficiency! vectorized normalization of the rows, tensor broadcasting
- 10 loss function: the negative log likelihood of the data under our model
- 11 model smoothing with fake counts
- 12 PART 2: the neural network approach: intro
- 13 creating the bigram dataset for the neural net
- 14 feeding integers into neural nets? one-hot encodings
- 15 the "neural net": one linear layer of neurons implemented with matrix multiplication
- 16 transforming neural net outputs into probabilities: the softmax
- 17 summary, preview to next steps, reference to micrograd
- 18 vectorized loss
- 19 backward and update, in PyTorch
- 20 putting everything together
- 21 note 1: one-hot encoding really just selects a row of the next Linear layer's weight matrix
- 22 note 2: model smoothing as regularization loss
- 23 sampling from the neural net
- 24 conclusion
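
The chapters above follow the lecture's two passes over the same bigram character-level language model: first by counting, then with a one-layer neural net. The sketches below illustrate the main steps in chapter order; they are minimal paraphrases, not the lecture's exact code. This first one covers chapters 2-7 and assumes a local `names.txt` file (one name per line), as used in the makemore repository.

```python
import torch

# Chapter 2: read the dataset; names.txt (one name per line) is assumed to
# be available locally, as in the makemore repository.
words = open('names.txt', 'r').read().splitlines()

# Chapter 7: a single '.' token marks both the start and the end of a name,
# so the vocabulary is the 26 lowercase letters plus '.', i.e. 27 characters.
chars = sorted(set(''.join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi['.'] = 0
itos = {i: s for s, i in stoi.items()}

# Chapters 4-5: count every bigram into a 27x27 tensor. N[i, j] is how many
# times character j follows character i in the training data.
N = torch.zeros((27, 27), dtype=torch.int32)
for w in words:
    chs = ['.'] + list(w) + ['.']
    for ch1, ch2 in zip(chs, chs[1:]):
        N[stoi[ch1], stoi[ch2]] += 1
```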
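
A follow-on sketch for chapters 8-11, reusing `N`, `stoi`, and `itos` from the previous snippet: smooth the counts, normalize each row with broadcasting, sample names, and score the data with the average negative log likelihood.

```python
# Chapter 11: add fake counts (+1) so no bigram has zero probability, then
# chapter 9: normalize each row with broadcasting, shape (27, 27) / (27, 1).
P = (N + 1).float()
P /= P.sum(dim=1, keepdim=True)

g = torch.Generator().manual_seed(2147483647)

# Chapter 8: sample names by starting at '.', drawing the next character from
# the current row of P, and stopping when '.' comes up again.
for _ in range(5):
    out, ix = [], 0
    while True:
        ix = torch.multinomial(P[ix], num_samples=1, replacement=True, generator=g).item()
        if ix == 0:
            break
        out.append(itos[ix])
    print(''.join(out))

# Chapter 10: average negative log likelihood of the data under the model.
# Lower is better; a uniform model would score -log(1/27), about 3.3.
log_likelihood, n = 0.0, 0
for w in words:
    chs = ['.'] + list(w) + ['.']
    for ch1, ch2 in zip(chs, chs[1:]):
        log_likelihood += torch.log(P[stoi[ch1], stoi[ch2]])
        n += 1
print('nll =', -log_likelihood.item() / n)
```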
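
Chapters 12-23 redo the same model as a one-layer neural net. Here is a minimal training-loop sketch in that spirit, reusing `words` and `stoi` from the first snippet; the step count, learning rate, and regularization strength are illustrative choices rather than prescriptions from the lecture.

```python
import torch.nn.functional as F

# Chapter 13: the training set is just (current character, next character)
# index pairs, built the same way as the bigram counts.
xs, ys = [], []
for w in words:
    chs = ['.'] + list(w) + ['.']
    for ch1, ch2 in zip(chs, chs[1:]):
        xs.append(stoi[ch1])
        ys.append(stoi[ch2])
xs, ys = torch.tensor(xs), torch.tensor(ys)
num = xs.nelement()

g = torch.Generator().manual_seed(2147483647)
W = torch.randn((27, 27), generator=g, requires_grad=True)  # chapter 15: the whole "neural net"

for step in range(100):
    # Chapter 14: one-hot encode the inputs; chapter 15: one matrix multiply;
    # chapter 16: softmax turns the logits into next-character probabilities.
    xenc = F.one_hot(xs, num_classes=27).float()
    logits = xenc @ W
    counts = logits.exp()
    probs = counts / counts.sum(dim=1, keepdim=True)
    # Chapter 18: vectorized negative log likelihood; chapter 22: the W**2
    # regularization term plays the same role as smoothing with fake counts.
    loss = -probs[torch.arange(num), ys].log().mean() + 0.01 * (W ** 2).mean()

    # Chapter 19: backward pass and gradient-descent update.
    W.grad = None
    loss.backward()
    W.data += -50 * W.grad
```

Chapter 21's observation falls out of this code directly: because `xenc` is one-hot, `xenc @ W` simply selects one row of `W` per example, so the trained `W` ends up approximating the log of the count matrix from the first approach.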