KDD2020 - Transfer Learning Joshi


Association for Computing Machinery (ACM) via YouTube

RoBERTa: Scaling BERT (16 of 20)

Classroom Contents


  1. Transfer Learning via Pre-training
  2. Pre-trained Contextualized Representations
  3. BERT [Devlin et al. (2018)]
  4. How can we do better?
  5. Span-based Efficient Pre-training
  6. Pre-training Span Representations
  7. Why is this more efficient?
  8. Random subword masks can be too easy
  9. Which spans to mask?
  10. Why SBO?
  11. Single-sequence Inputs
  12. Evaluation
  13. Baselines
  14. Extractive QA: SQuAD
  15. GLUE
  16. RoBERTa: Scaling BERT
  17. The RoBERTa Recipe
  18. What is still hard?
  19. Next Big Thing: Few-Shot Learning?
  20. Next Big Thing: Non-parametric Memories?
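The span-based pre-training items in the outline (5 through 10) center on masking contiguous spans of subwords rather than individual tokens, in the style of SpanBERT. As a minimal, hedged sketch of that masking step (the function name and parameters are illustrative, not taken from the talk; SpanBERT itself draws span lengths from a geometric distribution clipped at 10):

```python
import random

MASK = "[MASK]"

def mask_random_span(tokens, max_len=10, p=0.2, rng=random):
    """Replace one contiguous span of tokens with [MASK] symbols.

    Span length is drawn from a geometric(p) distribution clipped to
    [1, max_len], as in span-based pre-training schemes like SpanBERT.
    Illustrative sketch only; real pipelines mask many spans per sequence.
    """
    # Sample a span length: P(length = k) = (1 - p)^(k - 1) * p, clipped.
    length = 1
    while length < max_len and rng.random() > p:
        length += 1
    length = min(length, len(tokens))
    # Pick a start position so the span fits inside the sequence.
    start = rng.randrange(len(tokens) - length + 1)
    masked = list(tokens)
    for i in range(start, start + length):
        masked[i] = MASK
    # Return the masked sequence and the span boundaries, which the
    # span boundary objective (SBO) uses to predict the hidden tokens.
    return masked, (start, start + length)

tokens = "an American football game was played yesterday".split()
masked, (start, end) = mask_random_span(tokens)
```

Masking whole spans makes the prediction task harder than random subword masks (item 8 in the outline), and the returned boundaries are exactly what the SBO needs: it predicts each masked token from the two unmasked boundary tokens plus a position embedding.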
