Challenges in Aligning Language Models

Overview

Explore the challenges of aligning language models in this conference talk by Assistant Professor He He of New York University. The talk examines what it takes to bring large language models (LMs) closer to developers' intentions and users' needs, and reviews current alignment approaches, including finetuning and Reinforcement Learning from Human Feedback (RLHF). It then presents the speaker's research on two questions: how supervised finetuning can improve model truthfulness, and how feedback-tuned models affect co-writing applications. Learn why model robustness, truthfulness, and human-AI collaboration matter for building more adaptable and factually accurate AI systems.