

Chain-of-Thought Reasoning in Large Language Models - Exploring SFT and RL Relevance

Discover AI via YouTube

Overview

Explore a technical analysis of the Chain-of-Thought (CoT) reasoning mechanisms that OpenAI's o3 model employs at inference time in this 27-minute research presentation. Dive deep into the relationship between Supervised Fine-Tuning (SFT), Reinforcement Learning (RL), and test-time reasoning capabilities. Learn how language models are taught to explicitly reason over safety specifications through CoT, breaking complex problems into intermediate steps rather than relying solely on pattern recognition. Examine the Alignment Fine-Tuning (AFT) paradigm, which addresses assessment misalignment through a three-step process of CoT training, response generation, and score calibration. Understand the implications of explicit reasoning versus implicit pattern learning in language models, drawing on research from Hugging Face and OpenAI on scaling test-time compute and deliberative alignment strategies.
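To make the three-step AFT loop described above concrete, here is a minimal, self-contained Python sketch. The model is a stub, and the function names (cot_train, generate_responses, calibrate_scores), the toy reward, and the min-max score calibration are illustrative assumptions rather than the exact method covered in the video.

```python
# Hypothetical sketch of a three-step AFT-style loop: (1) CoT training,
# (2) response generation, (3) score calibration. The "model" is stubbed;
# nothing here reproduces the actual training procedure from the video.
from typing import Callable, List, Tuple

def cot_train(examples: List[Tuple[str, str]]) -> Callable[[str], str]:
    """Step 1 (stub): supervised fine-tuning on (question, chain-of-thought) pairs.
    Here we simply memorize the pairs to stand in for a fine-tuned model."""
    lookup = dict(examples)
    return lambda q: lookup.get(q, "Let's think step by step. ... Answer: unknown")

def generate_responses(model: Callable[[str], str], question: str, k: int = 4) -> List[str]:
    """Step 2: sample k candidate chains of thought for the same question.
    A real model would sample with temperature; the stub repeats its output."""
    return [model(question) for _ in range(k)]

def calibrate_scores(responses: List[str],
                     reward: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Step 3: score candidates and rescale so rankings are comparable across
    questions (a simple min-max rescaling as an illustrative calibration)."""
    raw = [reward(r) for r in responses]
    lo, hi = min(raw), max(raw)
    span = (hi - lo) or 1.0
    return sorted(((r, (s - lo) / span) for r, s in zip(responses, raw)),
                  key=lambda pair: -pair[1])

if __name__ == "__main__":
    # Toy reward: prefer responses that show explicit intermediate steps.
    reward = lambda r: float(r.lower().count("step"))
    model = cot_train([("What is 17 * 3?",
                        "Step 1: 17 * 3 = 10*3 + 7*3. Step 2: 30 + 21 = 51. Answer: 51")])
    ranked = calibrate_scores(generate_responses(model, "What is 17 * 3?"), reward)
    for resp, score in ranked:
        print(f"{score:.2f}  {resp[:60]}")
```

The point of the sketch is only the shape of the pipeline: explicit intermediate reasoning is produced first, multiple candidates are generated, and their scores are calibrated before being used, rather than relying on a single pattern-matched answer.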

Syllabus

o3 Inference Time CoT Reasoning: How relevant is SFT and RL?

Taught by

Discover AI

