Overview
Syllabus
Intro
In Neural Networks, Tuning is Paramount!
A Typical Situation
Possible Causes
Identifying Training Time Problems
Is My Model Too Weak? Your model needs to be big enough to learn . Model size depends on task . For language modeling, at least 512 nodes • For natural language analysis, 128 or so may do . Multiple layers are often better
Be Careful of Deep Models
Trouble w/ Optimization
Reminder: Optimizers
Initialization
Bucketing/Sorting • If we use sentences of different lengths, too much padding and sorting can result in slow training • To remedy this sort sentences so similarly-lengthed sentences are in the same batch • But this can affect performance! (Morishita et al. 2017)
Debugging Decoding
Beam Search
Debugging Search
Look At Your Data!
Symptoms of Overfitting
Reminder: Dev-driven Learning Rate Decay Start w/ a high learning rate, then degrade learning rate when start overfitting the development set (the newbob learning rate schedule)
Taught by
Graham Neubig