Class Central Classrooms beta
YouTube videos curated by Class Central.
Classroom Contents
Neural Nets for NLP - Structured Prediction with Local Independence Assumptions
- 1 Intro
- 2 Sequence Labeling • One tag for one word, e.g. part-of-speech tagging
- 3 Sequence Labeling as Independent Classification
- 4 Problems
- 5 Exposure Bias (Teacher Forcing)
- 6 Label Bias
- 7 Models w/ Local Dependencies
- 8 Reminder: Globally Normalized Models
- 9 Conditional Random Fields • General form of globally normalized model (the general form is written out after this list)
- 10 Potential Functions
- 11 BiLSTM-CRF for Sequence Labeling
- 12 CRF Training & Decoding
- 13 Interactions
- 14 Forward Step: Initial Part • First, calculate the transition from <s> and the emission of the first word for every POS
- 15 Forward Step: Middle Parts
- 16 Forward Step: Final Part • Finish up the sentence with the sentence-final symbol
- 17 Computing the Partition Function • α_t(y|X) is the partition of the sequences of length t that end with label y (see the forward-algorithm sketch after this list)
- 18 Decoding and Gradient Calculation
- 19 CNN for Character-level Representation • We use a CNN to extract morphological information, such as the prefix or suffix of a word (a character-CNN sketch follows this list)
- 20 Training Details
- 21 Experiments
- 22 Reward Functions in Structured Prediction
- 23 Previous Methods to Consider Reward
- 24 Minimizing Risk by Enumeration • Simple idea: directly calculate the risk of all hypotheses in the space
- 25 Enumeration + Sampling (Shen+ 2016) • Enumerating all hypotheses is intractable! Instead of enumerating over everything, only enumerate over a sample, and re-normalize
- 26 Token-wise Minimum Risk • If we can come up with a decomposable error function, we can calculate the risk for each word (see the minimum-risk sketch after this list)
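
As a companion to items 8 and 9 (and the partition function of item 17), the globally normalized model of a linear-chain CRF can be written out as below, with y_0 taken to be the sentence-start symbol <s>. The split of the potential into an emission term ψ_E and a transition term ψ_T is one common choice for the BiLSTM-CRF of item 11; the notation here is ours, not the lecture's.

\[
P(Y \mid X) = \frac{\exp\Big(\sum_{t=1}^{T} \big(\psi_E(y_t, X) + \psi_T(y_{t-1}, y_t)\big)\Big)}{Z(X)},
\qquad
Z(X) = \sum_{Y'} \exp\Big(\sum_{t=1}^{T} \big(\psi_E(y'_t, X) + \psi_T(y'_{t-1}, y'_t)\big)\Big)
\]

The denominator Z(X) sums over all possible tag sequences, which is exactly what the forward algorithm of items 14–17 computes efficiently.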
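
A minimal forward-algorithm sketch for items 14–17, working in log space; the function name, array shapes, and NumPy usage are illustrative assumptions, not code from the lecture.

```python
import numpy as np

def log_partition(emit, trans, start, end):
    """Compute log Z(X) of a linear-chain CRF with the forward algorithm.

    emit:  [T, K] emission score of each of the K tags at each position
    trans: [K, K] transition score trans[i, j] from tag i to tag j
    start: [K]    score of transitioning from <s> into each tag
    end:   [K]    score of transitioning from each tag to </s>
    """
    T, K = emit.shape
    # Initial part: transition from <s> plus emission of the first word.
    alpha = start + emit[0]                                # [K]
    # Middle parts: marginalize over the previous tag in log space.
    for t in range(1, T):
        scores = alpha[:, None] + trans                    # [K, K]
        alpha = np.logaddexp.reduce(scores, axis=0) + emit[t]
    # Final part: finish the sentence with the sentence-final symbol.
    return np.logaddexp.reduce(alpha + end)

# Toy usage: random scores for a 5-word sentence and 4 POS tags.
rng = np.random.default_rng(0)
print(log_partition(rng.normal(size=(5, 4)), rng.normal(size=(4, 4)),
                    rng.normal(size=4), rng.normal(size=4)))
```

Exponentiating the gap between a sequence's score and this log partition gives its CRF probability; decoding (item 18) replaces the log-sum-exp with a max to recover the best-scoring path.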
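
Item 19's character-level CNN builds a word representation from its characters so that convolution filters can pick up on prefix and suffix patterns. A small PyTorch sketch of that idea, with assumed dimensions and hyperparameters (not the authors' code):

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    def __init__(self, n_chars, char_dim=30, n_filters=50, kernel_size=3):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size, padding=1)

    def forward(self, char_ids):
        # char_ids: [n_words, max_word_len] character indices per word
        x = self.embed(char_ids)         # [n_words, max_word_len, char_dim]
        x = x.transpose(1, 2)            # [n_words, char_dim, max_word_len]
        x = torch.relu(self.conv(x))     # [n_words, n_filters, max_word_len]
        # Max-pool over character positions: one fixed-size vector per word.
        return x.max(dim=2).values       # [n_words, n_filters]
```

In a BiLSTM-CRF tagger this per-word vector is typically concatenated with the word embedding before it enters the BiLSTM.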
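
For items 24–26, a small Python sketch of risk computation under assumed interfaces: `score(y)` is the model's unnormalized log score of hypothesis `y`, and `error(y)` is its task error against the reference (both hypothetical helpers, not lecture code).

```python
import math

def expected_risk(hypotheses, score, error):
    """Risk = sum over the given hypotheses of P(y|x) * error(y),
    with probabilities normalized over exactly that set."""
    logps = [score(y) for y in hypotheses]
    m = max(logps)
    log_z = m + math.log(sum(math.exp(lp - m) for lp in logps))  # log-sum-exp
    return sum(math.exp(lp - log_z) * error(y)
               for lp, y in zip(logps, hypotheses))

# Item 24 (enumeration): pass every hypothesis in the space -- exact, but
# intractable for real structured output spaces.
# Item 25 (Shen+ 2016): pass only a sampled subset; the same normalization
# then re-normalizes the probabilities over that sample.
# Item 26 (token-wise): with a decomposable error function, the same
# expectation can be taken word by word instead of over whole sequences.
```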