Overview
Syllabus
- Intro
- Brief recap of the paper, setup & idea
- Main experimental results & high standard deviations
- Why is there no clear winner?
- Why are bigger models not much better?
- What’s behind the name ChibiT?
- Why is iGPT underperforming?
- How are tokens distributed in Reinforcement Learning?
- What other domains could have good properties to transfer?
- A deeper dive into the models' attention patterns
- Codebase, model sizes, and compute requirements
- Scaling behavior of pre-trained models
- What did not work out in this project?
- How can people get started and where to go next?
Taught by
Yannic Kilcher