Overview
Syllabus
Intro
Encoder-decoder Models
Sentence Representations
Basic Idea (Bahdanau et al. 2015)
Calculating Attention (2)
A Graphical Example
Attention Score Functions (1)
Input Sentence
Hierarchical Structures (Yang et al. 2016)
Multiple Sources
Coverage • Problem: Neural models tend to drop or repeat content
Incorporating Markov Properties (Cohn et al. 2015)
Bidirectional Training
Hard Attention
Summary of the Transformer (Vaswani et al. 2017)
Attention Tricks
Training Tricks
Masking for Training • We want to perform training in as few operations as possible
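The core attention computation the outline covers (Bahdanau et al. 2015, with dot-product scoring as one of the score functions) can be sketched in a few lines. This is an illustrative sketch only; the function and variable names are not from the course materials.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def dot_product_attention(query, keys, values):
    """Return a context vector: values weighted by the softmax
    of query-key dot products (one score per source position)."""
    scores = keys @ query      # shape: (num_positions,)
    weights = softmax(scores)  # attention distribution, sums to 1
    context = weights @ values # weighted average of value vectors
    return context, weights

# toy example: 3 source positions, hidden dimension 4
rng = np.random.default_rng(0)
keys = rng.normal(size=(3, 4))
values = rng.normal(size=(3, 4))
query = rng.normal(size=4)
context, weights = dot_product_attention(query, keys, values)
```

Other score functions from the lecture (multi-layer perceptron, bilinear) differ only in how `scores` is computed; the softmax-and-weighted-sum steps stay the same.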
Taught by
Graham Neubig