MIT: Recurrent Neural Networks
Alexander Amini and Massachusetts Institute of Technology via YouTube
Overview
Syllabus
Intro
Sequences in the wild
A sequence modeling problem: predict the next word
use a fixed window
can't model long-term dependencies
use entire sequence as set of counts
counts don't preserve order
use a really big fixed window
no parameter sharing
Sequence modeling: design criteria
Standard feed-forward neural network
Recurrent neural networks: sequence modeling
A standard "vanilla" neural network
A recurrent neural network (RNN)
RNN state update and output
RNNs: computational graph across time
Recall: backpropagation in feed-forward models
RNNs: backpropagation through time
Standard RNN gradient flow: exploding gradients
Standard RNN gradient flow: vanishing gradients
The problem of long-term dependencies
Trick #1: activation functions
Trick #2: parameter initialization
Standard RNN: repeating modules contain a simple computation node
Long Short-Term Memory (LSTMs)
LSTMs: forget irrelevant information
LSTMs: output filtered version of cell state
LSTM gradient flow
Example task: music generation
Example task: sentiment classification
Example task: machine translation
Attention mechanisms
Recurrent neural networks (RNNs)
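To make the "RNN state update and output" item above concrete, the sketch below unrolls a vanilla RNN cell in NumPy. The weight names, dimension sizes, and random inputs are illustrative assumptions for this listing, not code taken from the lecture.

# Minimal NumPy sketch of a vanilla RNN step: h_t = tanh(W_hh h_{t-1} + W_xh x_t), y_t = W_hy h_t.
# All names and sizes below are assumptions made for illustration.
import numpy as np

hidden_size, input_size, output_size = 8, 4, 3

rng = np.random.default_rng(0)
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # recurrent (hidden-to-hidden) weights
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input-to-hidden weights
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))  # hidden-to-output weights

def rnn_step(x_t, h_prev):
    # One time step: update the hidden state, then read out an output.
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

# Unroll across a short input sequence. The same weights are reused at every
# step, which is the parameter sharing the syllabus contrasts with a fixed window.
h = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):  # sequence of 5 illustrative input vectors
    h, y = rnn_step(x, h)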
Taught by
Alexander Amini (https://www.youtube.com/@AAmini/videos)