Overview
Syllabus
Intro
NLP and Sequential Data
Long-distance Dependencies in Language
Can be Complicated!
Recurrent Neural Networks (Elman 1990)
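As a rough illustration (not the lecture's own code), the Elman recurrence h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) fits in a few lines of NumPy; all sizes below are arbitrary toy values:

```python
import numpy as np

def elman_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One Elman RNN step: h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b_h)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
d_in, d_h = 4, 8                             # toy sizes, chosen arbitrarily
W_xh = rng.normal(scale=0.1, size=(d_h, d_in))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
b_h = np.zeros(d_h)

h = np.zeros(d_h)                            # initial hidden state
for x_t in rng.normal(size=(5, d_in)):       # a length-5 input sequence
    h = elman_step(x_t, h, W_xh, W_hh, b_h)  # same weights reused at every step
print(h.shape)                               # (8,)
```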
Training RNNs
Parameter Tying
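Parameter tying here means the same weights are applied at every time step, so gradients from all steps accumulate into one set of parameters. A minimal PyTorch illustration of that accumulation (the cell below is a stand-in, not the lecture's model):

```python
import torch

torch.manual_seed(0)
cell = torch.nn.Linear(8, 8)            # one shared set of weights, tied across time
h = torch.zeros(8)
xs = torch.randn(5, 8)

for x_t in xs:                          # the SAME cell is applied at every step
    h = torch.tanh(cell(x_t) + h)
h.sum().backward()

# cell.weight.grad now holds the sum of gradient contributions from
# all 5 time steps: that accumulation is what parameter tying implies.
print(cell.weight.grad.shape)           # torch.Size([8, 8])
```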
What Can RNNs Do?
Representing Sentences
e.g. Language Modeling
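A sketch of an RNN language model in PyTorch, assuming toy vocabulary and layer sizes: read words 0..T-1, score each next word, and train with cross-entropy (average next-word negative log-likelihood):

```python
import torch
import torch.nn as nn

vocab, d_emb, d_h = 100, 16, 32              # hypothetical toy sizes

embed = nn.Embedding(vocab, d_emb)
rnn = nn.RNN(d_emb, d_h, batch_first=True)   # Elman-style recurrence
out = nn.Linear(d_h, vocab)                  # scores over the next word

tokens = torch.randint(0, vocab, (1, 10))    # one sequence of 10 word ids
hs, _ = rnn(embed(tokens[:, :-1]))           # read words 0..8
logits = out(hs)                             # predict words 1..9
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))
print(loss.item())                           # average next-word NLL
```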
Vanishing Gradient: Gradients Decrease as They Get Pushed Back
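A small numeric illustration of the effect (sizes and scales are arbitrary): backprop through h_t = tanh(W_hh h_{t-1} + ...) multiplies the gradient by W_hh^T and by tanh' <= 1 at every step, so its norm shrinks geometrically:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W_hh = rng.normal(scale=0.2, size=(d, d))   # modest recurrent weights (assumed)

g = np.ones(d)                              # gradient arriving at the last step
for t in range(1, 51):
    # One step of backprop through h_t = tanh(W_hh @ h_prev + ...):
    # multiply by a typical tanh' value (0.5 here) and by W_hh^T.
    g = W_hh.T @ (0.5 * g)
    if t % 10 == 0:
        print(t, np.linalg.norm(g))         # norm shrinks geometrically
```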
A Solution: Long Short-term Memory (Hochreiter and Schmidhuber 1997)
LSTM Structure
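One way to write the standard LSTM cell update in NumPy (the gate ordering and toy sizes are assumptions of this sketch): the additive cell update c_t = f * c_{t-1} + i * tanh(g) is the path that lets gradients survive many steps:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4h, d_in), U: (4h, h), b: (4h,), gate order i,f,o,g."""
    z = W @ x_t + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c_prev + i * np.tanh(g)   # additive cell update: the "memory" path
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
W = rng.normal(scale=0.1, size=(4 * d_h, d_in))
U = rng.normal(scale=0.1, size=(4 * d_h, d_h))
b = np.zeros(4 * d_h)
h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```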
What can LSTMs Learn? (1)
Handling Mini-batching
Mini-batching Method
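A minimal sketch of the usual padding-plus-masking recipe, with hypothetical token-id sequences: pad every sequence to the batch maximum and keep a 0/1 mask so padding tokens contribute nothing to the loss:

```python
import numpy as np

seqs = [[4, 9, 2], [7, 1], [3, 8, 5, 6]]    # made-up sequences of unequal length
PAD = 0

max_len = max(len(s) for s in seqs)
batch = np.full((len(seqs), max_len), PAD, dtype=np.int64)
mask = np.zeros((len(seqs), max_len), dtype=np.float32)
for i, s in enumerate(seqs):
    batch[i, :len(s)] = s
    mask[i, :len(s)] = 1.0                  # 1 on real tokens, 0 on padding

# Per-token losses get multiplied by `mask`, so padding positions
# contribute nothing to the batch loss.
print(batch)
print(mask)
```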
Bucketing/Sorting
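Bucketing/sorting reduces that padding waste: sort sentences by length, cut consecutive runs into mini-batches, then shuffle the batches. A sketch on a made-up corpus:

```python
import random

# Hypothetical corpus: sentences as token-id lists of varying length.
corpus = [[0] * n for n in [3, 17, 4, 16, 3, 18, 5, 15]]

# Sort by length, slice consecutive sentences into mini-batches,
# so each batch mixes similar lengths and padding is minimized.
corpus.sort(key=len)
batch_size = 2
batches = [corpus[i:i + batch_size] for i in range(0, len(corpus), batch_size)]
random.shuffle(batches)                # shuffle batch order, not sentence order
for b in batches:
    print([len(s) for s in b])         # lengths within a batch stay close
```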
Optimized Implementations of LSTMs (Appleyard 2015)
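In practice, fused kernels of the kind Appleyard describes are reached by calling a monolithic LSTM implementation rather than looping over cells in Python; for instance, PyTorch's nn.LSTM dispatches to cuDNN on GPU. A rough comparison, with toy sizes:

```python
import torch
import torch.nn as nn

x = torch.randn(8, 20, 32)                # (batch, time, features), toy sizes

# Per-step Python loop over an LSTMCell: flexible but slow.
cell = nn.LSTMCell(32, 64)
h = c = torch.zeros(8, 64)
for t in range(x.size(1)):
    h, c = cell(x[:, t], (h, c))

# Monolithic nn.LSTM runs the whole sequence in one call; on GPU it
# uses cuDNN's fused kernels, the kind of optimization Appleyard describes.
lstm = nn.LSTM(32, 64, batch_first=True)
out, (h_n, c_n) = lstm(x)
print(out.shape)                          # torch.Size([8, 20, 64])
```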
Gated Recurrent Units (Cho et al. 2014)
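The GRU merges the LSTM's gates into an update gate z and a reset gate r. A NumPy sketch following Cho et al.'s convention h_t = z * h_{t-1} + (1 - z) * h~_t (the parameter layout is this sketch's assumption):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step. W: (3h, d_in), U: (3h, h), b: (3h,), gate order z,r,g."""
    d_h = h_prev.shape[0]
    z = sigmoid(W[:d_h] @ x_t + U[:d_h] @ h_prev + b[:d_h])                  # update gate
    r = sigmoid(W[d_h:2*d_h] @ x_t + U[d_h:2*d_h] @ h_prev + b[d_h:2*d_h])  # reset gate
    g = np.tanh(W[2*d_h:] @ x_t + U[2*d_h:] @ (r * h_prev) + b[2*d_h:])     # candidate
    return z * h_prev + (1 - z) * g       # interpolate old state and candidate

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
W = rng.normal(scale=0.1, size=(3 * d_h, d_in))
U = rng.normal(scale=0.1, size=(3 * d_h, d_h))
b = np.zeros(3 * d_h)
h = np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):
    h = gru_step(x_t, h, W, U, b)
```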
Soft Hierarchical Structure
Handling Long Sequences
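One standard option for long sequences is truncated backpropagation through time: carry the hidden state across fixed-length segments but detach it at segment boundaries, so no backward pass spans the whole sequence. A PyTorch sketch with an assumed segment length and a stand-in loss:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(16, 32, batch_first=True)
head = nn.Linear(32, 16)
opt = torch.optim.SGD(list(lstm.parameters()) + list(head.parameters()), lr=0.1)

long_seq = torch.randn(1, 1000, 16)            # toy "long" sequence
state = None
for start in range(0, 1000, 100):              # segments of length 100 (assumed)
    seg = long_seq[:, start:start + 100]
    out, state = lstm(seg, state)
    loss = head(out).pow(2).mean()             # stand-in loss for illustration
    opt.zero_grad()
    loss.backward()
    opt.step()
    state = tuple(s.detach() for s in state)   # truncate the gradient here
```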
Taught by
Graham Neubig