Overview
Explore a comprehensive research seminar that investigates efficient training algorithms for Transformer-based language models, focusing on the computational costs and actual effectiveness of various optimization methods. Learn about three key categories of algorithms: dynamic architectures (layer stacking, layer dropping), batch selection (selective backprop, RHO loss), and efficient optimizers (Lion, Sophia). Discover the findings from pre-training BERT and T5 models under fixed computation budgets, and understand the proposed evaluation protocol based on reference system time. Delve into potential pitfalls, experimental setups, and practical implications for model training efficiency. Gain insights from speakers Jean Kaddour and Oscar Key as they present their research findings, supported by publicly available code and their published paper. Master concepts including model stacking, selective backprop, and efficient optimizers, and understand the overheads and conclusions drawn from their extensive experimentation.
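As a concrete illustration of the batch-selection category discussed in the seminar, below is a minimal PyTorch sketch of the selective-backprop idea: score every example in a batch with a cheap forward pass, then backpropagate only on the highest-loss subset. This is a simplified sketch, not the speakers' implementation; the helper name selective_backprop_step and the keep_fraction parameter are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def selective_backprop_step(model, optimizer, inputs, targets, keep_fraction=0.5):
    """One training step that backpropagates only the highest-loss examples.

    Hypothetical helper illustrating the selective-backprop idea:
    forward the whole batch cheaply, rank examples by loss, then
    recompute and backpropagate on the top slice only.
    """
    model.eval()
    with torch.no_grad():
        # Cheap forward pass to score every example by its loss.
        logits = model(inputs)
        per_example_loss = F.cross_entropy(logits, targets, reduction="none")

    # Keep the hardest examples (highest loss) for the expensive backward pass.
    k = max(1, int(keep_fraction * inputs.size(0)))
    top_idx = per_example_loss.topk(k).indices

    model.train()
    optimizer.zero_grad()
    selected_logits = model(inputs[top_idx])
    loss = F.cross_entropy(selected_logits, targets[top_idx])
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that the extra scoring pass is itself an overhead, which is exactly the kind of cost the seminar's reference-system-time evaluation protocol is designed to account for.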
Syllabus
Introduction
Outline
Story
Potential pitfalls
What could go wrong
Scenarios
Job Interference
Measuring Reference System Time
Experimental Setup
Model Stacking
Selective Backprop
Question
Efficient Optimizers
Results
What goes wrong
Overheads
Conclusions
Taught by
AutoML Seminars