Overview
Syllabus
Supervised, Unsupervised, Semi-supervised
Learning Features vs. Learning Discrete Structure
Unsupervised Feature Learning (Review)
How do we Use Learned Features?
What About Discrete Structure?
A Simple First Attempt
Unsupervised Hidden Markov Models • Replace the labeled states (e.g., POS tags) with unlabeled state numbers
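A minimal sketch of the idea, assuming numpy and integer-coded words (all names here are hypothetical): the states are just numbers 0..K-1, a sentence is scored by summing over all state sequences with the forward algorithm, and fitting the randomly initialized parameters with EM (Baum-Welch) induces tag-like clusters without any labels.

    import numpy as np

    K, V = 5, 1000  # number of unlabeled states, vocabulary size
    rng = np.random.default_rng(0)

    # Randomly initialized HMM parameters (each row is a normalized distribution)
    pi = rng.dirichlet(np.ones(K))          # initial state probabilities
    A = rng.dirichlet(np.ones(K), size=K)   # A[i, j] = p(next state j | state i)
    B = rng.dirichlet(np.ones(V), size=K)   # B[i, w] = p(word w | state i)

    def sentence_logprob(word_ids):
        """Marginal log-probability of a word-id sequence (forward algorithm)."""
        alpha = pi * B[:, word_ids[0]]
        s = alpha.sum()
        logp, alpha = np.log(s), alpha / s  # rescale to avoid underflow
        for w in word_ids[1:]:
            alpha = (alpha @ A) * B[:, w]
            s = alpha.sum()
            logp += np.log(s)
            alpha /= s
        return logp

    print(sentence_logprob([3, 14, 159]))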
Hidden Markov Models w/ Gaussian Emissions • Instead of parameterizing each state with a categorical distribution, we can use a Gaussian (or Gaussian mixture)!
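A sketch of the emission swap, assuming each word is represented by a pretrained embedding (names hypothetical): every state holds a mean and covariance, and the emission probability becomes the Gaussian density of the word's embedding under that state.

    import numpy as np
    from scipy.stats import multivariate_normal

    K, D = 5, 50  # number of states, embedding dimension
    rng = np.random.default_rng(0)
    mu = rng.normal(size=(K, D))        # one mean vector per state
    sigma = np.stack([np.eye(D)] * K)   # one covariance per state (identity here)

    def emission_prob(state, word_embedding):
        """p(x | z=state): Gaussian density of the word's embedding."""
        return multivariate_normal.pdf(word_embedding,
                                       mean=mu[state], cov=sigma[state])

    x = rng.normal(size=D)  # embedding of some observed word
    print([emission_prob(k, x) for k in range(K)])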
Featurized Hidden Markov Models (Tran et al. 2016) • Calculate the transition and emission probabilities with neural networks! • Emission: calculate a representation of each word w in the vocabulary
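A sketch of the neural emission parameterization (names hypothetical, following the bullet rather than the paper's exact architecture): score every word representation in the vocabulary against a state representation, then softmax over the vocabulary to get a proper emission distribution.

    import numpy as np

    K, V, D = 5, 1000, 50
    rng = np.random.default_rng(0)
    state_emb = rng.normal(size=(K, D))  # learned representation of each state
    word_emb = rng.normal(size=(V, D))   # learned representation of each word

    def emission_dist(state):
        """p(. | z=state): softmax over the vocabulary of state-word scores."""
        scores = word_emb @ state_emb[state]  # one score per vocabulary word
        scores -= scores.max()                # stabilize the softmax
        p = np.exp(scores)
        return p / p.sum()

    print(emission_dist(0)[:5])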
CRF Autoencoders (Ammar et al. 2014)
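For reference, the objective has roughly this shape (a condensed form reconstructed from Ammar et al. 2014): an encoding CRF predicts a latent structure $y$ from the sentence $x$, a simple reconstruction model regenerates the sentence from $y$, and we marginalize over all structures:

    \[
    p(\hat{x} \mid x) = \sum_{y} p_{\mathrm{CRF}}(y \mid x) \prod_{i=1}^{|x|} p(\hat{x}_i \mid y_i)
    \]

where $\hat{x}$ is the reconstructed sentence and $y_i$ is the latent label at position $i$.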
Soft vs. Hard Tree Structure
One Other Paradigm: Weak Supervision
Gated Convolution (Cho et al. 2014)
Learning with RL (Yogatama et al. 2016)
Phrase Structure vs. Dependency Structure
Dependency Model w/ Valence (Klein and Manning 2004)
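A sketch of how a dependency tree is scored under the model (hypothetical toy parameter tables; the real tables are estimated with EM, and root generation is omitted here): each head decides, per direction and conditioned on whether it has already generated a dependent there (the valence), whether to stop, and which dependent tag to attach.

    import math
    from collections import defaultdict

    # Toy placeholder parameters; really estimated with EM:
    # p_stop[(head_tag, direction, has_child)] = probability of stopping
    # p_choose[(head_tag, direction, child_tag)] = probability of attaching
    p_stop = defaultdict(lambda: 0.5)
    p_choose = defaultdict(lambda: 0.1)

    def dmv_logprob(tags, deps):
        """deps[i] = list of (direction, child_index) for head i."""
        logp = 0.0
        for head, children in enumerate(deps):
            for direction in ("left", "right"):
                kids = [c for d, c in children if d == direction]
                for n, child in enumerate(kids):
                    # continue (1 - stop), conditioned on valence (n > 0)
                    logp += math.log(1.0 - p_stop[(tags[head], direction, n > 0)])
                    logp += math.log(p_choose[(tags[head], direction, tags[child])])
                # finally stop generating in this direction
                logp += math.log(p_stop[(tags[head], direction, len(kids) > 0)])
        return logp

    tags = ["DT", "NN", "VBD"]                  # "the dog barked"
    deps = [[], [("left", 0)], [("left", 1)]]   # DT <- NN, NN <- VBD
    print(dmv_logprob(tags, deps))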
Unsupervised Dependency Induction w/ Neural Nets (Jiang et al. 2016)
Learning Dependency Heads w/ Attention (Kuncoro et al. 2017)
Learning Segmentations w/ Reconstruction Loss (Elsner and Shain 2017)
Learning Language-level Features (Malaviya et al. 2017) • All previous work learned features of a single sentence; this work instead learns features of an entire language
Taught by Graham Neubig