Explore language acquisition in masked language models through a lecture by Naomi Saphra of the Kempner Institute at Harvard University. Delve into Syntactic Attention Structure (SAS) and its role in how Transformer models develop linguistic capabilities. Examine how analyzing the training process, rather than only the finished model, yields insights into model behavior, focusing on a brief window in which SAS emerges alongside a sharp drop in the loss. Investigate the causal relationship between SAS and grammatical abilities through experiments that directly manipulate SAS during training. Discover how SAS competes with other beneficial traits during training, and how briefly suppressing SAS can improve final model quality. Gain insight into simplicity bias and breakthrough training dynamics in language model development.
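The description does not spell out how SAS is measured or manipulated, so the following is only an illustrative sketch, not the lecture's actual method. It assumes SAS-like structure can be quantified as the average attention mass each token assigns to its dependency parent, and that this quantity can be added to the MLM loss as a regularizer whose sign promotes or suppresses it. All names here (`syntactic_attention_mass`, `regularized_loss`, `lam`) and the parent-index encoding are hypothetical.

```python
import torch

def syntactic_attention_mass(attn, parents, mask):
    """Average attention each token pays to its dependency parent (a hypothetical SAS proxy).

    attn:    (batch, heads, seq, seq) softmaxed attention weights from one layer
    parents: (batch, seq) index of each token's dependency parent
    mask:    (batch, seq) 1.0 for tokens that have a parent, 0.0 otherwise (e.g. the root)
    """
    batch, heads, seq, _ = attn.shape
    # Gather, for every head and token, the attention weight on that token's parent.
    idx = parents.clamp(min=0).view(batch, 1, seq, 1).expand(batch, heads, seq, 1)
    to_parent = attn.gather(-1, idx).squeeze(-1)          # (batch, heads, seq)
    # Average over heads and over tokens that actually have a parent.
    masked = to_parent * mask.view(batch, 1, seq)
    return masked.sum() / (mask.sum() * heads + 1e-9)

def regularized_loss(mlm_loss, attn, parents, mask, lam=1.0):
    # Positive lam penalizes attention to parents (suppressing SAS);
    # a negative lam would instead encourage it.
    return mlm_loss + lam * syntactic_attention_mass(attn, parents, mask)

# Toy usage: one sentence, 2 heads, 4 tokens; parent indices are made up.
attn = torch.softmax(torch.randn(1, 2, 4, 4), dim=-1)
parents = torch.tensor([[1, 1, 1, 2]])          # token 1 acts as the root
mask = torch.tensor([[1.0, 0.0, 1.0, 1.0]])     # the root has no parent
print(float(syntactic_attention_mass(attn, parents, mask)))
```

Under these assumptions, briefly training with a positive `lam` and then dropping the penalty would correspond to the "brief suppression of SAS" the lecture describes; the actual experimental setup may differ.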