Overview

A walkthrough of building a multilayer perceptron (MLP) character-level language model in PyTorch, following Bengio et al. 2003: constructing the training dataset, implementing the embedding lookup table, the hidden and output layers, training with minibatches and the cross-entropy loss, finding a good learning rate, splitting the data into train/val/test sets, and finally sampling from the trained model.
Syllabus
intro
Bengio et al. 2003 MLP language model paper walkthrough
re-building our training dataset (see sketch below)
implementing the embedding lookup table (see the forward-pass sketch below)
implementing the hidden layer + internals of torch.Tensor: storage, views
implementing the output layer
implementing the negative log likelihood loss
summary of the full network
introducing F.cross_entropy and why
implementing the training loop, overfitting one batch (see sketch below)
training on the full dataset, minibatches
finding a good initial learning rate (see sketch below)
splitting up the dataset into train/val/test splits and why (see sketch below)
experiment: larger hidden layer
visualizing the character embeddings (see sketch below)
experiment: larger embedding size
summary of our final code, conclusion
sampling from the model (see sketch below)
google Colab (new!!) notebook announcement
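
Code sketches

The sketches below illustrate several of the syllabus steps. They are minimal reconstructions written for this summary, not the lecture's exact code; the file name, layer sizes, and hyperparameters are assumptions.

Building the dataset of (context, next character) pairs, assuming a names.txt word list with one name per line and a context of block_size = 3 characters:

    import torch

    words = open('names.txt').read().splitlines()
    chars = sorted(set(''.join(words)))
    stoi = {s: i + 1 for i, s in enumerate(chars)}   # characters -> integers
    stoi['.'] = 0                                    # '.' marks the start/end of a name
    itos = {i: s for s, i in stoi.items()}

    block_size = 3                                   # how many characters predict the next one
    X, Y = [], []
    for w in words:
        context = [0] * block_size
        for ch in w + '.':
            ix = stoi[ch]
            X.append(context)                        # the current context window...
            Y.append(ix)                             # ...predicts this character
            context = context[1:] + [ix]             # slide the window forward
    X = torch.tensor(X)                              # shape (N, block_size)
    Y = torch.tensor(Y)                              # shape (N,)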
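The forward pass, continuing from the sketch above. Integer indexing into the table C performs the embedding lookup; .view() reshapes emb for free because a torch.Tensor view only reinterprets the same underlying storage, without copying; and F.cross_entropy replaces a hand-rolled softmax plus negative log likelihood with a fused, numerically stable version. The sizes (2-dimensional embeddings, 100 hidden units) are assumptions; the "larger hidden layer" and "larger embedding size" experiments amount to changing them.

    import torch.nn.functional as F

    g = torch.Generator().manual_seed(2147483647)
    C  = torch.randn((27, 2),   generator=g)      # embedding table: 27 chars -> 2-dim vectors
    W1 = torch.randn((6, 100),  generator=g)      # hidden layer: block_size * 2 = 6 inputs
    b1 = torch.randn(100,       generator=g)
    W2 = torch.randn((100, 27), generator=g)      # output layer: 100 units -> 27 logits
    b2 = torch.randn(27,        generator=g)
    parameters = [C, W1, b1, W2, b2]

    emb = C[X]                                    # (N, 3, 2): indexing is the lookup
    h = torch.tanh(emb.view(-1, 6) @ W1 + b1)     # (N, 100); view reshapes without copying
    logits = h @ W2 + b2                          # (N, 27)
    loss = F.cross_entropy(logits, Y)             # stable softmax + mean negative log likelihood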
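The training loop with random minibatches, continuing from above. Overfitting one small batch first (say, the first 32 examples) is the usual sanity check that the network and loss are wired correctly; the loop below is the full-dataset version. Batch size, step count, and learning rate are assumptions.

    for p in parameters:
        p.requires_grad = True

    for step in range(10000):
        ix = torch.randint(0, X.shape[0], (32,))   # random minibatch of 32 examples
        emb = C[X[ix]]
        h = torch.tanh(emb.view(-1, 6) @ W1 + b1)
        logits = h @ W2 + b2
        loss = F.cross_entropy(logits, Y[ix])
        for p in parameters:
            p.grad = None                          # zero the gradients
        loss.backward()
        for p in parameters:
            p.data += -0.1 * p.grad                # plain SGD, learning rate 0.1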
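One way to find a good initial learning rate, matching the syllabus item: try exponentially spaced rates, one per step, and record the loss at each. The sweep range 10^-3 to 10^0 is an assumption.

    lre = torch.linspace(-3, 0, 1000)              # exponents: rates from 0.001 to 1.0
    lrs = 10 ** lre
    lri, lossi = [], []
    for i in range(1000):
        ix = torch.randint(0, X.shape[0], (32,))
        emb = C[X[ix]]
        h = torch.tanh(emb.view(-1, 6) @ W1 + b1)
        logits = h @ W2 + b2
        loss = F.cross_entropy(logits, Y[ix])
        for p in parameters:
            p.grad = None
        loss.backward()
        for p in parameters:
            p.data += -lrs[i] * p.grad             # a different learning rate at every step
        lri.append(lre[i].item())
        lossi.append(loss.item())
    # plotting lossi against lri shows the exponent where the loss drops fastest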
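Splitting the words into train/dev/test before building the tensors. The 80/10/10 ratios are conventional and the build_dataset helper (wrapping the dataset sketch above) is a name chosen here. The training split fits the parameters, the dev (validation) split tunes hyperparameters, and the test split is evaluated only once at the end, so it remains an honest estimate of generalization.

    import random

    def build_dataset(words):                      # same logic as the dataset sketch above
        X, Y = [], []
        for w in words:
            context = [0] * block_size
            for ch in w + '.':
                ix = stoi[ch]
                X.append(context)
                Y.append(ix)
                context = context[1:] + [ix]
        return torch.tensor(X), torch.tensor(Y)

    random.seed(42)
    random.shuffle(words)
    n1, n2 = int(0.8 * len(words)), int(0.9 * len(words))
    Xtr,  Ytr  = build_dataset(words[:n1])         # 80% train
    Xdev, Ydev = build_dataset(words[n1:n2])       # 10% dev/validation
    Xte,  Yte  = build_dataset(words[n2:])         # 10% test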
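Visualizing the learned character embeddings, which only plots directly while they are 2-dimensional (a matplotlib sketch):

    import matplotlib.pyplot as plt

    plt.figure(figsize=(8, 8))
    plt.scatter(C[:, 0].data, C[:, 1].data, s=200)
    for i in range(C.shape[0]):
        plt.text(C[i, 0].item(), C[i, 1].item(), itos[i],
                 ha='center', va='center', color='white')
    plt.show()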
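Sampling from the trained model: start from an all-'.' context, repeatedly sample the next character from the softmax distribution, and stop when '.' is produced again. The seed and the number of samples are arbitrary.

    g = torch.Generator().manual_seed(2147483647 + 10)
    for _ in range(5):                             # draw five names
        out = []
        context = [0] * block_size                 # begin with '...' padding
        while True:
            emb = C[torch.tensor([context])]       # (1, block_size, 2)
            h = torch.tanh(emb.view(1, -1) @ W1 + b1)
            logits = h @ W2 + b2
            probs = F.softmax(logits, dim=1)
            ix = torch.multinomial(probs, num_samples=1, generator=g).item()
            context = context[1:] + [ix]           # advance the context window
            if ix == 0:                            # '.' terminates the name
                break
            out.append(itos[ix])
        print(''.join(out))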
Taught by
Andrej Karpathy