Classroom Contents
Neural Nets for NLP - Structured Prediction with Local Independence Assumptions
- 1 Intro
- 2 Sequence Labeling • One tag for one word, e.g. part-of-speech tagging
- 3 Sequence Labeling as Independent Classification
- 4 Problems
- 5 Exposure Bias / Teacher Forcing
- 6 Label Bias
- 7 Models w/ Local Dependencies
- 8 Reminder: Globally Normalized Models
- 9 Conditional Random Fields • General form of globally normalized model
- 10 Potential Functions
- 11 BiLSTM-CRF for Sequence Labeling
- 12 CRF Training & Decoding
- 13 Interactions
- 14 Forward Step: Initial Part • First, calculate the transition from the start symbol and the emission of the first word for every POS
- 15 Forward Step: Middle Parts
- 16 Forward Step: Final Part • Finish up the sentence with the sentence-final symbol
- 17 Computing the Partition Function • The forward score of label y given X is the partition over sequences of length t that end with label y (forward-algorithm and Viterbi sketches follow this list)
- 18 Decoding and Gradient Calculation
- 19 CNN for Character-level Representation • Use a CNN to extract morphological information such as the prefix or suffix of a word (sketch after this list)
- 20 Training Details
- 21 Experiments
- 22 Reward Functions in Structured Prediction
- 23 Previous Methods to Consider Reward
- 24 Minimizing Risk by Enumeration • Simple idea: directly calculate the risk of all hypotheses in the space
- 25 Enumeration + Sampling (Shen+ 2016) • Enumerating all hypotheses is intractable! Instead of enumerating over everything, only enumerate over a sample and re-normalize (sketch after this list)
- 26 Token-wise Minimum Risk • If we can come up with a decomposable error function, we can calculate the risk for each word (sketch after this list)
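
The forward-step chapters (items 14–17) walk through computing the CRF partition function. Below is a minimal NumPy sketch of that recursion, assuming per-word emission scores and a tag-transition matrix are already available; the array names `emissions`, `transitions`, `start_trans`, and `end_trans` are placeholders, not notation from the lecture.

```python
import numpy as np

def crf_log_partition(emissions, transitions, start_trans, end_trans):
    """Forward algorithm for a linear-chain CRF, in log space.

    emissions:   (T, K) emission score of each of the K tags for each of T words
    transitions: (K, K) transitions[i, j] = score of moving from tag i to tag j
    start_trans: (K,)   score of starting the sentence with each tag
    end_trans:   (K,)   score of ending the sentence with each tag
    Returns log Z(X), the log partition function.
    """
    T, K = emissions.shape
    # Initial part: transition from the start symbol plus emission of the first word.
    alpha = start_trans + emissions[0]                                   # (K,)
    # Middle parts: fold in one word at a time.
    for t in range(1, T):
        # alpha[j] = logsumexp_i( alpha[i] + transitions[i, j] ) + emissions[t, j]
        scores = alpha[:, None] + transitions + emissions[t][None, :]    # (K, K)
        alpha = np.logaddexp.reduce(scores, axis=0)                      # (K,)
    # Final part: finish up the sentence with the sentence-final symbol.
    return np.logaddexp.reduce(alpha + end_trans)
```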
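
For the "CRF Training & Decoding" and "Decoding and Gradient Calculation" chapters, replacing the sum in the same recursion with a max recovers the highest-scoring tag sequence (Viterbi decoding). A sketch under the same assumed score arrays:

```python
import numpy as np

def crf_viterbi(emissions, transitions, start_trans, end_trans):
    """Viterbi decoding for a linear-chain CRF: returns the best tag sequence."""
    T, K = emissions.shape
    score = start_trans + emissions[0]          # best score of a path ending in each tag
    backptr = np.zeros((T, K), dtype=int)       # best previous tag for each position/tag
    for t in range(1, T):
        cand = score[:, None] + transitions     # cand[i, j]: previous tag i -> current tag j
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emissions[t]
    # Finish with the sentence-final transition, then follow back-pointers.
    best_last = int(np.argmax(score + end_trans))
    tags = [best_last]
    for t in range(T - 1, 0, -1):
        tags.append(int(backptr[t, tags[-1]]))
    return tags[::-1]
```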
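
Chapter 19 describes extracting morphological cues (prefixes, suffixes) with a character-level CNN. A minimal PyTorch sketch of that idea; the layer sizes here are illustrative assumptions, not the exact configuration discussed in the lecture.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Character-level CNN: convolve over the character embeddings of one word and
    max-pool, so character n-gram features (prefix/suffix-like cues) end up in a
    fixed-size vector that can be concatenated to the word embedding."""
    def __init__(self, n_chars, char_dim=30, n_filters=30, kernel_size=3):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size, padding=kernel_size // 2)

    def forward(self, char_ids):                  # (batch, max_word_len) character ids
        x = self.embed(char_ids).transpose(1, 2)  # (batch, char_dim, max_word_len)
        x = torch.relu(self.conv(x))              # (batch, n_filters, max_word_len)
        return x.max(dim=2).values                # (batch, n_filters) per-word char feature
```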
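
Chapters 24–25 cover minimizing risk by enumerating hypotheses, and approximating that by enumerating only a sample and re-normalizing (Shen+ 2016). A hedged PyTorch-style sketch; `model.sample`, `model.log_prob`, and `error` are hypothetical helpers standing in for a sequence model's sampler, scorer, and a task error such as 1 − BLEU.

```python
import torch

def sampled_min_risk_loss(model, src, ref, error, n_samples=5):
    """Approximate minimum-risk training: draw a few hypotheses instead of
    enumerating all of them, re-normalize their model probabilities over the
    sample, and minimize the expected error under that distribution."""
    hyps = [model.sample(src) for _ in range(n_samples)]
    log_probs = torch.stack([model.log_prob(src, h) for h in hyps])   # (n_samples,)
    probs = torch.softmax(log_probs, dim=0)          # re-normalize over the sample only
    errors = torch.tensor([error(h, ref) for h in hyps], dtype=probs.dtype)
    return (probs * errors).sum()                    # expected risk over the sample
```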
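
Chapter 26's token-wise minimum risk applies when the error decomposes over words. Assuming per-word tag distributions from a tagger and a 0/1 per-token error (both assumptions made here for illustration), the expected risk reduces to the expected number of mis-tagged words:

```python
import torch

def tokenwise_min_risk_loss(tag_log_probs, gold_tags):
    """tag_log_probs: (T, K) log-probabilities over K tags for each of T words
    gold_tags:      (T,)   gold tag indices
    Returns the expected number of token errors under the model."""
    probs = tag_log_probs.exp()                                       # (T, K)
    p_correct = probs.gather(1, gold_tags.unsqueeze(1)).squeeze(1)    # (T,)
    return (1.0 - p_correct).sum()                                    # expected errors
```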