YouTube videos curated by Class Central.
Classroom Contents
Test Time Compute: Sampling and Chain of Thought Techniques
- 1 OpenAI o1-type techniques for scaling test-time compute
- 2 Video overview: temperature, chain of thought
- 3 Training compute versus test-time compute
- 4 Why spend more compute at test time (inference)?
- 5 Using verifiers to select the best answers
- 6 Exploring and critiquing/verifying answers during inference
- 7 Understanding temperature for sampling
- 8 Should you set temperature to zero?
- 9 Beam search
- 10 Problems with setting a non-zero temperature
- 11 Using top-p, top-k, min-p, and best-of
- 12 Recap on choosing temperature for sampling
- 13 How to implement chain of thought
- 14 Setup for notebook run-through on GSM8K and HotpotQA
- 15 Using sampling and chain of thought on HotpotQA and GSM8K
- 16 Running vLLM in a Jupyter notebook allows batching
- 17 Scoring/grading with OpenAI GPT-4o mini using regex enforcement
- 18 Multithreading the scoring/grading for speed
- 19 Running the dataset multiple times to get the mean and mean absolute deviation of correct answers
- 20 Controlling sampling parameters: min-p, top-p, top-k, beam search, temperature
- 21 Running temperature / sampling ablations WITHOUT chain of thought
- 22 Chain of thought setup
- 23 Running ablations WITH chain of thought
- 24 GSM8K results charts
- 25 HotpotQA results charts
- 26 Recommendations on sampling, temperature and chain of thought
- 27 Video resources
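The sampling controls listed above (temperature, top-k, top-p, min-p) can be sketched in plain Python. This is a minimal illustration, not the code from the videos or vLLM's implementation: the function names are hypothetical, temperature is applied before the filters, and each filter is computed on the same temperature-scaled distribution with the kept tokens intersected (real libraries may apply the filters one after another). Temperature 0 is treated as greedy argmax decoding; beam search and best-of are not covered.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by temperature (> 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_probs(probs, top_k=0, top_p=1.0, min_p=0.0):
    """Zero out tokens rejected by top-k, top-p or min-p, then renormalize.

    Simplification (assumption): every filter looks at the same input
    distribution and only tokens kept by all active filters survive.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order)
    if top_k > 0:  # keep only the k most likely tokens
        keep &= set(order[:top_k])
    if top_p < 1.0:  # smallest set of top tokens whose mass reaches top_p
        nucleus, cumulative = set(), 0.0
        for i in order:
            nucleus.add(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        keep &= nucleus
    if min_p > 0.0:  # drop tokens below min_p times the top token's probability
        threshold = min_p * probs[order[0]]
        keep &= {i for i in order if probs[i] >= threshold}
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)  # for min_p in (0, 1] the top token always survives, so total > 0
    return [p / total for p in kept]


def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0, rng=random):
    """Return the index of the sampled token; temperature 0 means greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = filter_probs(softmax(logits, temperature), top_k, top_p, min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0, -1.0]
print(softmax(logits, 0.5)[0] > softmax(logits, 1.0)[0])  # True: lower temperature sharpens
print(filter_probs(softmax(logits), top_p=0.5))  # [1.0, 0.0, 0.0, 0.0]
print(sample_token(logits, temperature=0))  # 0
```

Lowering the temperature concentrates probability on the top token before any filter runs, which is why a small top-p can collapse sampling to a single candidate.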
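Item 19 runs the dataset several times and summarizes the share of correct answers by its mean and mean absolute deviation. A minimal sketch, assuming each run is stored as a list of booleans (one per question, True where the grader marked the answer correct) and that the deviation is taken around the mean; the function name is hypothetical and the video may compute it differently.

```python
def mean_and_mad(runs):
    """Mean accuracy across runs and mean absolute deviation around that mean.

    runs: a list of runs; each run is a list of booleans, one per question,
    True where the answer was graded correct (hypothetical data layout).
    """
    scores = [sum(run) / len(run) for run in runs]  # accuracy of each run
    mu = sum(scores) / len(scores)
    mad = sum(abs(s - mu) for s in scores) / len(scores)
    return mu, mad


runs = [
    [True, True, False, True],   # 3/4 correct
    [True, False, False, True],  # 2/4 correct
    [True, True, True, True],    # 4/4 correct
]
mu, mad = mean_and_mad(runs)
print(f"mean accuracy = {mu:.3f}, MAD = {mad:.3f}")  # mean accuracy = 0.750, MAD = 0.167
```

A MAD that is large relative to the gap between two sampling settings suggests the difference may be run-to-run noise rather than a real effect.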