Classroom Contents
Non-Parametric Transformers - Paper Explained
1. Key ideas of the paper
2. Abstract
3. Note on k-NN non-parametric machine learning
4. Data and NPT setup explained
5. NPT loss is inspired by BERT
6. A high-level architecture overview
7. NPT jointly learns imputation and prediction
8. Architecture deep dive: input embeddings, etc.
9. More details on the stochastic masking loss
10. Connections to Graph Neural Networks and CNNs
11. NPT achieves great results on tabular data benchmarks
12. NPT learns the underlying relational, causal mechanisms
13. NPT does rely on other datapoints
14. NPT attends to similar vectors
15. Conclusions