Forecasting and Aligning AI - Jacob Steinhardt

Overview

Modern ML systems sometimes undergo qualitative shifts in behavior simply by “scaling up” the number of parameters and training examples. Given this, how can we extrapolate the behavior of future ML systems and ensure that they behave safely and are aligned with humans? I’ll argue that we can often study (potential) capabilities of future ML systems through well-controlled experiments run on current systems, and use this as a laboratory for designing alignment techniques. I’ll also discuss some recent work on “medium-term” AI forecasting.

Syllabus

Introduction.
Rest of Talk.
Reward Hacking: Motivation.
Reward Hacking Example.
Reward Hacking: Example.
Summary of Full Results.
Reward Hacking: Summary.
Making NLP Models Truthful.
Contrastive Representation Clustering.
Results on Unified QA.
Caveat: True Answers Work Too.
Forecasting: Motivation.
Forecasting Competition.
Forecasting Questions.
Summary of Benchmark Forecasts.
Results So Far.
Forecasting: Lessons Learned.
Forecasting Class.

Taught by

Stanford Online
