Overview
Syllabus
Intro
Language Modeling: Calculating
Count-based Language Models
A Refresher on Evaluation
Problems and Solutions? (Cannot share strength among similar words)
Example
Softmax
A Computation Graph View
A Note: "Lookup"
Training a Model
Parameter Update
Unknown Words
Evaluation and Vocabulary
Linear Models Can't Learn Feature Combinations
Neural Language Models (See Bengio et al. 2004)
Tying Input/Output Embeddings
Standard SGD
SGD With Momentum
Adagrad
Adam (Most standard optimization option in NLP and beyond; considers a rolling average of the gradient, and momentum)
Shuffling the Training Data
Taught by
Graham Neubig