MIT: Recurrent Neural Networks
Alexander Amini and Massachusetts Institute of Technology via YouTube
Overview
Syllabus
Intro
Sequences in the wild
A sequence modeling problem: predict the next word
use a fixed window
can't model long-term dependencies
use entire sequence as set of counts
counts don't preserve order
use a really big fixed window
no parameter sharing
Sequence modeling: design criteria
Standard feed-forward neural network
Recurrent neural networks: sequence modeling
A standard "vanilla" neural network
A recurrent neural network (RNN)
RNN state update and output
RNNs: computational graph across time
Recall: backpropagation in feed-forward models
RNNs: backpropagation through time
Standard RNN gradient flow: exploding gradients
Standard RNN gradient flow: vanishing gradients
The problem of long-term dependencies
Trick #1: activation functions
Trick #2: parameter initialization
Standard RNN: repeating modules contain a simple computation node
Long Short-Term Memory (LSTMs)
LSTMs: forget irrelevant information
LSTMs: output filtered version of cell state
LSTM gradient flow
Example task: music generation
Example task: sentiment classification
Example task: machine translation
Attention mechanisms
Recurrent neural networks (RNNs)
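To make the "RNN state update and output" item above concrete, the sketch below unrolls a vanilla RNN cell in NumPy. The weight names, dimension sizes, and random inputs are illustrative assumptions for this listing, not code taken from the lecture.

# Minimal NumPy sketch of a vanilla RNN step: h_t = tanh(W_hh h_{t-1} + W_xh x_t), y_t = W_hy h_t.
# All names and sizes below are assumptions made for illustration.
import numpy as np

hidden_size, input_size, output_size = 8, 4, 3

rng = np.random.default_rng(0)
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # recurrent (hidden-to-hidden) weights
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input-to-hidden weights
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))  # hidden-to-output weights

def rnn_step(x_t, h_prev):
    # One time step: update the hidden state, then read out an output.
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

# Unroll across a short input sequence. The same weights are reused at every
# step, which is the parameter sharing the syllabus contrasts with a fixed window.
h = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):  # sequence of 5 illustrative input vectors
    h, y = rnn_step(x, h)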
Taught by
Alexander Amini (https://www.youtube.com/@AAmini/videos)