Overview
Syllabus
OpenAI o1-style techniques for scaling test-time compute
Video Overview: temperature, chain of thought
Training compute versus test-time compute
Why spend more compute at test time / inference?
Using verifiers to select the best answers
Exploring and critiquing/verifying answers during inference
Understanding Temperature for sampling
Should you set temperature to zero?
Beam search
Problems with setting a non-zero temperature
Using top-p, top-k, min-p, and best-of
Recap on choosing temperature for sampling
How to implement chain of thought (prompt sketch after this syllabus)
Setup for notebook run-through on GSM8K and HotpotQA (dataset-loading sketch below)
Using sampling and chain of thought on HotpotQA and GSM8K
Running vLLM in a Jupyter notebook allows for batching (see the batched-generation sketch below)
Scoring / grading with OpenAI GPT-4o-mini using regex enforcement (grading sketch below)
Multi-threading the scoring / grading for speed
Running the dataset multiple times to get the mean and mean absolute deviation of correct answers (ablation-loop sketch below)
Controlling sampling parameters: min-p, top-p, top-k, beam search, temperature (sampling sketch below)
Running temperature / sampling ablations WITHOUT chain of thought
Chain of Thought Setup
Running ablations WITH chain of thought
GSM8K Results Charts
HotpotQA Results Charts
Recommendations on sampling, temperature and chain of thought
Video resources
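
The sketches below illustrate several of the steps listed in the syllabus. They are minimal, hedged examples rather than the exact code from the video: model names, parameter values, prompts, and helper names are assumptions unless stated otherwise.

First, a sketch of controlling the sampling parameters covered above (temperature, top-p, top-k, min-p) with vLLM, and of passing a list of prompts so that generation is batched inside a notebook. The model and parameter values here are illustrative.

```python
# Minimal sketch of controlling sampling with vLLM in a notebook.
# The model name and parameter values are illustrative, not those used in the video.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # assumed model

# temperature=0 gives greedy decoding: the most likely token is always picked.
greedy = SamplingParams(temperature=0.0, max_tokens=512)

# Non-zero temperature combined with top-p, top-k and min-p filtering.
sampled = SamplingParams(
    temperature=0.8,  # higher values flatten the token distribution, adding randomness
    top_p=0.95,       # nucleus sampling: keep the smallest token set with cumulative prob >= 0.95
    top_k=50,         # keep only the 50 most likely tokens
    min_p=0.05,       # drop tokens whose prob is below 5% of the top token's prob
    max_tokens=512,
)
# Note: how beam search and best-of are exposed depends on the vLLM version
# (older releases accept extra SamplingParams fields; newer releases use a separate API).

# Passing a list of prompts batches them in a single call, which is what makes
# running these ablations from a Jupyter notebook practical.
prompts = ["What is 17 * 24?", "Who wrote Pride and Prejudice?"]
outputs = llm.generate(prompts, sampled)
for out in outputs:
    print(out.outputs[0].text)
```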
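Next, a sketch of the notebook setup for the two evaluation sets, assuming the standard Hugging Face dataset identifiers, configs, and splits; the exact slicing used in the video may differ.

```python
# Minimal sketch of loading GSM8K and HotpotQA with Hugging Face `datasets`.
# Dataset identifiers, configs and splits are assumptions based on the standard hub versions.
from datasets import load_dataset

gsm8k = load_dataset("openai/gsm8k", "main", split="test")
hotpotqa = load_dataset("hotpot_qa", "distractor", split="validation")

print(gsm8k[0]["question"])     # GSM8K rows have 'question' and 'answer' fields
print(hotpotqa[0]["question"])  # HotpotQA rows include 'question' and 'answer' fields
```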
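A sketch of the direct versus chain-of-thought prompt formats, plus regex extraction of the final answer. The prompt wording is an assumption; the video's exact prompts may differ.

```python
# Minimal sketch of direct versus chain-of-thought prompts and regex answer extraction.
# Prompt wording is illustrative, not the exact wording from the video.
import re

def direct_prompt(question: str) -> str:
    return f"Question: {question}\nRespond with only the final answer, as 'Answer: <answer>'."

def cot_prompt(question: str) -> str:
    return (
        f"Question: {question}\n"
        "Think step by step. After your reasoning, give the final answer "
        "on its own line in the form 'Answer: <answer>'."
    )

def extract_answer(completion: str) -> str | None:
    # Take the last 'Answer: ...' occurrence so answers mentioned mid-reasoning are ignored.
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None
```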
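A sketch of grading completions with GPT-4o-mini, enforcing a parseable verdict with a regex and multi-threading the calls for speed. The grading prompt and the 'Score: 0/1' convention are assumptions.

```python
# Minimal sketch of grading with gpt-4o-mini plus a thread pool for speed.
# The grading prompt and the 'Score: 0/1' convention are assumptions.
import re
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def grade(question: str, candidate: str, reference: str) -> int:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[{
            "role": "user",
            "content": (
                f"Question: {question}\nReference answer: {reference}\n"
                f"Candidate answer: {candidate}\n"
                "Reply with exactly 'Score: 1' if the candidate matches the reference, "
                "otherwise 'Score: 0'."
            ),
        }],
    )
    # Enforce the expected format with a regex; anything unparseable counts as incorrect.
    match = re.search(r"Score:\s*([01])", response.choices[0].message.content)
    return int(match.group(1)) if match else 0

def grade_all(rows: list[dict]) -> list[int]:
    # Grading calls are I/O-bound, so a thread pool gives a large speed-up.
    with ThreadPoolExecutor(max_workers=16) as pool:
        return list(pool.map(
            lambda r: grade(r["question"], r["candidate"], r["reference"]), rows
        ))
```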
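Finally, a sketch of the ablation loop: each sampling configuration is evaluated with and without chain of thought, repeated several times, and summarized by the mean accuracy and the mean absolute deviation across runs. `run_eval` is a hypothetical callable wrapping the prompt construction, generation, and grading sketched above.

```python
# Minimal sketch of the ablation loop with repeated runs and mean / mean absolute deviation.
# `run_eval` is a hypothetical callable: given sampling kwargs and a chain-of-thought flag,
# it runs one pass over the dataset and returns the fraction of correct answers.
import statistics
from typing import Callable

def summarize(accuracies: list[float]) -> tuple[float, float]:
    mean = statistics.mean(accuracies)
    mad = statistics.mean(abs(a - mean) for a in accuracies)  # mean absolute deviation
    return mean, mad

def run_ablations(run_eval: Callable[[dict, bool], float], n_runs: int = 5) -> dict:
    # Example sampling configurations; values are illustrative.
    configs = {
        "greedy": dict(temperature=0.0),
        "temp_0.8_top_p_0.95": dict(temperature=0.8, top_p=0.95),
        "temp_0.8_min_p_0.05": dict(temperature=0.8, min_p=0.05),
    }
    results = {}
    for name, sampling_kwargs in configs.items():
        for use_cot in (False, True):  # WITHOUT and WITH chain of thought
            accuracies = [run_eval(sampling_kwargs, use_cot) for _ in range(n_runs)]
            results[(name, use_cot)] = summarize(accuracies)
    return results
```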
Taught by
Trelis Research