Overview
Syllabus
Intro
Language Modeling: Calculating the Probability of a Sentence
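The starting point is the chain-rule decomposition of a sentence probability into per-word conditional probabilities; the models in the rest of the lecture differ only in how they estimate each conditional term:

```latex
P(w_1, \dots, w_N) = \prod_{t=1}^{N} P(w_t \mid w_1, \dots, w_{t-1})
```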
Count-based Language Models
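A minimal sketch of a count-based model, assuming a bigram model with simple linear interpolation against the unigram distribution (function names and the interpolation weight are illustrative, not from the lecture):

```python
from collections import Counter

def train_bigram_lm(corpus, alpha=0.1):
    """corpus: list of sentences, each a list of tokens."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())

    def prob(prev, word):
        # Interpolation: alpha * unigram MLE + (1 - alpha) * bigram MLE,
        # so unseen bigrams still get nonzero probability.
        uni = unigrams[word] / total
        bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
        return alpha * uni + (1 - alpha) * bi

    return prob

prob = train_bigram_lm([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(prob("the", "cat"))
```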
A Refresher on Evaluation
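The standard intrinsic evaluation measure for language models is perplexity, the exponentiated average negative log-likelihood per token; a minimal sketch:

```python
import math

def perplexity(log_probs):
    """log_probs: natural-log probabilities, one per token in the test set.
    Perplexity = exp(-(1/N) * sum(log p)); lower is better."""
    return math.exp(-sum(log_probs) / len(log_probs))

print(perplexity([math.log(0.25)] * 4))  # uniform over 4 choices -> 4.0
```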
Problems and Solutions?
An Alternative: Featurized Models
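A sketch of a featurized (log-linear) language model: each context fires a set of binary features, scores over the vocabulary are a linear function of those features, and a softmax normalizes them (all names and sizes here are illustrative):

```python
import numpy as np

vocab_size, num_features = 1000, 5000
W = np.zeros((vocab_size, num_features))  # feature weights
b = np.zeros(vocab_size)                  # per-word bias

def next_word_probs(active_features):
    """active_features: indices of binary features that fire for this
    context (e.g. 'previous word is "the"')."""
    scores = W[:, active_features].sum(axis=1) + b
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    return exp / exp.sum()

probs = next_word_probs([12, 345])
print(probs.sum())  # 1.0
```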
A Computation Graph View
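A tiny illustration of the computation-graph view, here using PyTorch autograd: operations on tensors with `requires_grad=True` are recorded as a graph, and `backward()` traverses it to accumulate gradients:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x ** 2).sum()   # graph: x -> square -> sum -> y
y.backward()         # backprop through the recorded graph
print(x.grad)        # dy/dx = 2x -> tensor([2., 4.])
```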
A Note: "Lookup"
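The point of the "lookup" note: multiplying a one-hot vector by an embedding matrix just selects a row, so in practice lookup is implemented as indexing. A small demonstration:

```python
import numpy as np

vocab_size, emb_dim = 10, 4
E = np.random.randn(vocab_size, emb_dim)  # embedding matrix

word_id = 3
one_hot = np.zeros(vocab_size)
one_hot[word_id] = 1.0

# Same result either way; direct indexing avoids the matrix multiply.
assert np.allclose(one_hot @ E, E[word_id])
```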
Training a Model
Parameter Update
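As a concrete illustration of the training loop and the parameter update it performs, a minimal PyTorch sketch (the model and data are placeholders, not the lecture's):

```python
import torch

model = torch.nn.Linear(8, 2)            # stand-in for a language model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

x = torch.randn(16, 8)                   # placeholder batch
y = torch.randint(0, 2, (16,))           # placeholder labels

optimizer.zero_grad()                    # clear old gradients
loss = loss_fn(model(x), y)              # forward pass
loss.backward()                          # backward pass: fills p.grad
optimizer.step()                         # update: p -= lr * p.grad
```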
Unknown Words
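A common recipe, sketched below with an illustrative threshold: replace rare training words with an `<unk>` token so the model can assign probability to words never seen in training:

```python
from collections import Counter

def unkify(corpus, min_count=2):
    """Replace words occurring fewer than min_count times with <unk>."""
    counts = Counter(w for sent in corpus for w in sent)
    return [[w if counts[w] >= min_count else "<unk>" for w in sent]
            for sent in corpus]
```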
Evaluation and Vocabulary
Linear Models Can't Learn Feature Combinations
Neural Language Models (see Bengio et al. 2003)
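A sketch of a Bengio-style feed-forward language model in PyTorch (layer sizes and names are illustrative): embed the previous n words, concatenate, pass through a tanh hidden layer, and predict the next word:

```python
import torch
import torch.nn as nn

class FeedForwardLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden=128, context=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.hidden = nn.Linear(context * emb_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, context_ids):       # (batch, context)
        e = self.emb(context_ids)          # (batch, context, emb_dim)
        h = torch.tanh(self.hidden(e.flatten(1)))  # concat + nonlinearity
        return self.out(h)                 # logits over the next word
```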
Tying Input/Output Embeddings
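Tying reuses the input embedding matrix as the output softmax weights, cutting parameters and often improving perplexity; it requires the embedding dimension to match the size of the vector feeding the output layer. A minimal PyTorch sketch:

```python
import torch.nn as nn

vocab_size, emb_dim = 10000, 256
emb = nn.Embedding(vocab_size, emb_dim)
out = nn.Linear(emb_dim, vocab_size, bias=False)
out.weight = emb.weight   # shapes match: both are (vocab_size, emb_dim)
```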
Standard SGD
SGD With Momentum
Adagrad
Adam
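The four optimizers above can be summarized by their update rules (standard formulations, which may differ in detail from the slides; \(g_t\) is the gradient at step \(t\), \(\eta\) the learning rate, \(\epsilon\) a small constant):

```latex
\begin{aligned}
\textbf{SGD:}\quad & \theta_{t+1} = \theta_t - \eta\, g_t \\
\textbf{Momentum:}\quad & v_t = \mu v_{t-1} + g_t, \quad
  \theta_{t+1} = \theta_t - \eta\, v_t \\
\textbf{Adagrad:}\quad & r_t = r_{t-1} + g_t^2, \quad
  \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{r_t} + \epsilon}\, g_t \\
\textbf{Adam:}\quad & m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \quad
  v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2, \\
& \hat m_t = \frac{m_t}{1 - \beta_1^t}, \quad
  \hat v_t = \frac{v_t}{1 - \beta_2^t}, \quad
  \theta_{t+1} = \theta_t - \eta\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}
\end{aligned}
```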
Shuffling the Training Data
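The idea in miniature: reshuffle once per epoch so mini-batches do not follow corpus order (the corpus and loop bounds here are placeholders):

```python
import random

train_sentences = [["a"], ["b"], ["c"]]  # placeholder corpus
for epoch in range(2):
    random.shuffle(train_sentences)      # reorder in place each epoch
    for sent in train_sentences:
        pass                             # training step goes here
```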
Neural nets have lots of parameters and are prone to overfitting
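Common countermeasures are dropout on hidden activations and L2 weight decay in the optimizer; a minimal PyTorch sketch with illustrative hyperparameters:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 128),
    nn.Tanh(),
    nn.Dropout(p=0.5),        # randomly zero activations during training
    nn.Linear(128, 10),
)
# weight_decay adds an L2 penalty on the parameters to the update
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-5)
```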
Efficiency Tricks: Mini-batching
Mini-batching
Manual Mini-batching
Mini-batched Code Example
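A minimal version of what such a mini-batched example looks like: pad variable-length sentences to a common length and keep a mask so padding does not affect the loss (function name and pad id are illustrative):

```python
import torch

def make_batch(sentences, pad_id=0):
    """sentences: list of lists of word ids, possibly different lengths."""
    max_len = max(len(s) for s in sentences)
    ids = torch.full((len(sentences), max_len), pad_id, dtype=torch.long)
    mask = torch.zeros(len(sentences), max_len)
    for i, s in enumerate(sentences):
        ids[i, :len(s)] = torch.tensor(s)
        mask[i, :len(s)] = 1.0
    return ids, mask

ids, mask = make_batch([[5, 3, 9], [7, 2], [4]])
# per-token losses would be multiplied by `mask` before summing
```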
Automatic Mini-batching!
Code-level Optimization (e.g., TorchScript provides a restricted representation of a PyTorch module that can be run efficiently in C++)
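For concreteness, a minimal TorchScript sketch; `torch.jit.script` and `save` are real PyTorch APIs, while the module itself is a placeholder:

```python
import torch

class TwoLayer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 4)

    def forward(self, x):
        return torch.relu(self.fc(x))

scripted = torch.jit.script(TwoLayer())  # compile to TorchScript
scripted.save("two_layer.pt")            # loadable from C++ via torch::jit::load
print(scripted(torch.randn(1, 4)))
```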
Regularizing and Optimizing LSTM Language Models (Merity et al. 2017)
In-class Discussion
Taught by
Graham Neubig