Learning Neural Network Hyperparameters for Machine Translation - 2019
Center for Language & Speech Processing (CLSP), JHU via YouTube
Overview
Syllabus
Intro
Statistical Machine Translation
Motivation
Grid Search
Method Overview
Common Regularization
Objective Function
Proximal Gradient Methods
Experiments: 5-gram Language Modeling
5-gram Perplexity
Behavior During Training
Key Takeaways
Optimal Hyperparameters Not Universal
Auto-Sizing Transformer Layers
PyTorch Implementation
Beam Search
Perceptron Tuning
Experiment: Tuned Reward
Questions?
Taught by
Center for Language & Speech Processing (CLSP), JHU