Overview
Explore active learning techniques for natural language processing in this 29-minute lecture from CMU's Multilingual NLP course. Delve into token-level and sequence-level active learning, examining uncertainty paradigms, query by committee, and various uncertainty measures. Gain insights into human effort considerations in active learning, including cost assessment and annotation reusability. Learn how to implement an effective active learning pipeline and understand its importance in multilingual NLP applications. Engage with a discussion question to reinforce your understanding of the presented concepts.
Syllabus
Intro
Types of Learning
Active Learning Pipeline
Why Active Learning?
Fundamental Ideas
Uncertainty Paradigms
Query by Committee
Sequence-level Uncertainty Measures
Training on Token Level
Token-level Representativeness Metrics
Sequence-to-sequence Uncertainty Metrics
Human Effort and Active Learning: in simulation, it is common to assess active learning by the number of words or sentences annotated
Considering Cost in Active Learning
Reusability of Active Learning Annotations
Discussion Question
Taught by
Graham Neubig