Overview
Syllabus
Intro
Text Classification
Sequence Labeling: given an input text X, predict an output label sequence of equal length
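To make the task concrete, a toy tagging instance (the sentence and tag set here are illustrative, not taken from the lecture):

```python
# Sequence labeling as POS tagging: one label per input token.
tokens = ["I", "saw", "a", "girl", "with", "a", "telescope"]
tags   = ["PRP", "VBD", "DT", "NN", "IN", "DT", "NN"]
assert len(tokens) == len(tags)  # output sequence has equal length
```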
Reminder: Bi-RNNs - a simple and standard model for sequence labeling and classification
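A minimal sketch of such a tagger, assuming PyTorch; the class name, layer sizes, and the choice of LSTM are illustrative, not the lecture's exact model:

```python
import torch
import torch.nn as nn

class BiRNNTagger(nn.Module):
    """Bi-directional LSTM tagger: predicts one label per input token."""
    def __init__(self, vocab_size, num_tags, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # bidirectional=True concatenates forward and backward hidden states
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                           bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):                 # (batch, seq_len)
        h, _ = self.rnn(self.embed(token_ids))    # (batch, seq_len, 2*hidden)
        return self.out(h)                        # per-token tag scores
```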
Issues w/ Simple Bi-RNNs
Alternative: Bag of n-grams
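One reading of this alternative: represent the input by counts of its n-grams rather than by running an RNN over it. A toy featurizer, with the function name and n_max parameter as hypothetical choices:

```python
from collections import Counter

def bag_of_ngrams(tokens, n_max=3):
    """Count all 1..n_max-grams in a token sequence (a simple featurizer)."""
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[tuple(tokens[i:i + n])] += 1
    return feats
```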
Unknown Words
Sub-word Segmentation
Unsupervised Subword Segmentation Algorithms
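Byte-pair encoding (BPE) is one widely used algorithm in this family: start from single characters and repeatedly merge the most frequent adjacent symbol pair. A toy sketch (the function name and data structures are illustrative):

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Toy BPE: repeatedly merge the corpus's most frequent symbol pair."""
    vocab = Counter(tuple(w) for w in words)  # word-as-symbols -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)      # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():   # apply the merge everywhere
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges  # ordered merge rules, e.g. [('e', 's'), ('es', 't'), ...]
```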
Sub-word Based Embeddings
Sub-word Based Embedding Models
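fastText-style composition is a common instance: a word vector is the sum of vectors for its character n-grams, which also yields a representation for unknown words. A hedged sketch, where the boundary markers, n-gram range, and dimensionality are assumptions:

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams with boundary markers, fastText-style."""
    w = "<" + word + ">"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def subword_embedding(word, ngram_vectors, dim=100):
    """Compose a word vector by summing its n-gram vectors (zeros if unseen)."""
    vecs = [ngram_vectors.get(g, np.zeros(dim)) for g in char_ngrams(word)]
    return np.sum(vecs, axis=0)
```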
Embeddings for Cross-lingual Learning: Soft Decoupled Encoding
Labeled/Unlabeled Data - Problem: we have very little labeled data for most analysis tasks in most languages
Joint Multi-task Learning
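A common realization is a shared encoder with one small output head per task; a sketch assuming PyTorch, with POS tagging and NER as purely illustrative task choices:

```python
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    """Shared Bi-LSTM encoder; each task gets its own output head."""
    def __init__(self, vocab_size, n_pos, n_ner, emb_dim=128, hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid, batch_first=True,
                               bidirectional=True)
        self.pos_head = nn.Linear(2 * hid, n_pos)  # task-specific heads
        self.ner_head = nn.Linear(2 * hid, n_ner)

    def forward(self, token_ids, task):
        h, _ = self.encoder(self.embed(token_ids))  # shared representation
        return self.pos_head(h) if task == "pos" else self.ner_head(h)
```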
Pre-training
Masked Language Modeling
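The core training signal: hide a fraction of input tokens and train the model to predict the originals at those positions. A simplified sketch of the masking step (the 15% rate follows BERT; the 80/10/10 keep/replace refinement is noted but omitted):

```python
import random

def mask_tokens(token_ids, mask_id, mask_prob=0.15):
    """Simplified BERT-style masking: hide ~15% of tokens; the model is
    trained to predict the original token at each masked position."""
    inputs, targets = [], []
    for t in token_ids:
        if random.random() < mask_prob:
            inputs.append(mask_id)   # replace with the [MASK] id
            targets.append(t)        # supervise only masked positions
        else:
            inputs.append(t)
            targets.append(-100)     # a common "ignore" value for the loss
    return inputs, targets
```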
Thinking about Multi-tasking and Pre-trained Representations
Other Monolingual BERTs
XTREME: Comparing Multilingual Representations
Why Call it "Structured" Prediction?
Why Model Interactions in Output?
Local Normalization vs. Global Normalization
Potential Functions
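One standard way to write the contrast drawn in the two items above, assuming a linear-chain model over a label sequence Y with potential functions ψ (the notation is a common convention, not necessarily the lecture's):

```latex
% Locally normalized (e.g., an RNN tagger): a softmax over labels at each step
P(Y \mid X) = \prod_{t=1}^{T}
  \frac{\exp s(y_t \mid X, y_{<t})}{\sum_{y'} \exp s(y' \mid X, y_{<t})}

% Globally normalized (e.g., a CRF): one partition function over all sequences
P(Y \mid X) = \frac{\prod_{t=1}^{T} \psi(y_{t-1}, y_t, X)}
                   {\sum_{Y'} \prod_{t=1}^{T} \psi(y'_{t-1}, y'_t, X)}
```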
Discussion
Taught by
Graham Neubig