RoBERTa: Scaling BERT
Classroom Contents
KDD2020 - Transfer Learning Joshi
- 1 Transfer Learning via Pre-training
- 2 Pre-trained Contextualized Representations
- 3 BERT [Devlin et al. (2018)]
- 4 How can we do better?
- 5 Span-based Efficient Pre-training
- 6 Pre-training Span Representations
- 7 Why is this more efficient?
- 8 Random subword masks can be too easy
- 9 Which spans to mask?
- 10 Why SBO?
- 11 Single-sequence Inputs
- 12 Evaluation
- 13 Baselines
- 14 Extractive QA: SQuAD
- 15 GLUE
- 16 RoBERTa: Scaling BERT
- 17 The RoBERTa Recipe
- 18 What is still hard?
- 19 Next Big Thing: Few-Shot Learning?
- 20 Next Big Thing: Non-parametric Memories?