Overview
Explore an in-depth analysis of the ExT5 model, which pushes the limits of T5 by pre-training on ExMix, a mixture of 107 supervised NLP tasks. Learn about the model's architecture, how each task is cast into a text-to-text format, and how it performs against T5 baselines. Discover insights on multi-task scaling, co-training transfer among task families, and the role of self-supervised data in pre-training. Gain an understanding of ExT5's improved performance across a range of NLP tasks and its greater sample efficiency during pre-training. This comprehensive video covers topics such as task selection, pre-training versus pre-finetuning, and experimental results, offering valuable insights for researchers and practitioners in natural language processing and transfer learning.
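The text-to-text formulation at the heart of T5 and ExT5 can be illustrated with a minimal sketch: every supervised task is rewritten as an (input string, target string) pair, so that many heterogeneous tasks can be mixed into a single pre-training stream. The task prefixes, toy examples, and uniform sampling below are illustrative assumptions, not the exact recipe or code from the paper.

```python
# Illustrative sketch (not the authors' code): casting heterogeneous supervised
# tasks into the single text-to-text format used by T5-style models, then
# sampling a multi-task mixture for pre-training.
import random

def to_text_to_text(task, example):
    """Convert a task-specific example into an (input, target) string pair."""
    if task == "nli":
        inp = f"nli premise: {example['premise']} hypothesis: {example['hypothesis']}"
        tgt = example["label"]  # e.g. "entailment"
    elif task == "summarization":
        inp = f"summarize: {example['document']}"
        tgt = example["summary"]
    elif task == "qa":
        inp = f"question: {example['question']} context: {example['context']}"
        tgt = example["answer"]
    else:
        raise ValueError(f"unknown task: {task}")
    return inp, tgt

# Toy per-task datasets standing in for the much larger ExMix collection.
datasets = {
    "nli": [{"premise": "A man is cooking.", "hypothesis": "Someone prepares food.",
             "label": "entailment"}],
    "summarization": [{"document": "Long article text ...", "summary": "Short summary."}],
    "qa": [{"question": "Who proposed T5?", "context": "T5 was proposed by Raffel et al.",
            "answer": "Raffel et al."}],
}

def sample_mixture(datasets, n):
    """Sample a multi-task mixture of text-to-text pairs (uniform over tasks here)."""
    tasks = list(datasets)
    batch = []
    for _ in range(n):
        task = random.choice(tasks)
        example = random.choice(datasets[task])
        batch.append(to_text_to_text(task, example))
    return batch

for inp, tgt in sample_mixture(datasets, 3):
    print(inp, "->", tgt)
```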
Syllabus
- Intro & Overview
- Recap: The T5 model
- The ExT5 model and task formulations
- ExMix dataset
- Do different tasks help each other?
- Which tasks should we include?
- Pre-Training vs Pre-Finetuning
- A few hypotheses about what's going on
- How much self-supervised data to use?
- More experimental results
- Conclusion & Summary
Taught by
Yannic Kilcher