Class Central Classrooms beta
YouTube videos curated by Class Central.
Classroom Contents
Test Time Compute: Sampling and Chain of Thought Techniques
1. OpenAI o1-style techniques for scaling test-time compute
2. Video overview: temperature, chain of thought
3. Training compute versus test-time compute
4. Why spend more compute at test time / inference?
5. Using verifiers to select the best answers
6. Exploring and critiquing/verifying answers during inference
7. Understanding temperature for sampling
8. Should you set temperature to zero?
9. Beam search
10. Problems with setting a non-zero temperature
11. Using top-p, top-k, min-p, and best-of
12. Recap on choosing a temperature for sampling
13. How to implement chain of thought
14. Setup for the notebook run-through on GSM8K and HotpotQA
15. Using sampling and chain of thought on HotpotQA and GSM8K
16. Running vLLM in a Jupyter notebook to allow batching
17. Scoring/grading with OpenAI GPT-4o-mini using regex enforcement
18. Multi-threading the scoring/grading for speed
19. Running the dataset multiple times to get the mean and mean absolute deviation of correct answers
20. Controlling sampling parameters: min-p, top-p, top-k, beam search, temperature
21. Running temperature/sampling ablations WITHOUT chain of thought
22. Chain-of-thought setup
23. Running ablations WITH chain of thought
24. GSM8K results charts
25. HotpotQA results charts
26. Recommendations on sampling, temperature, and chain of thought
27. Video resources
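The sampling controls covered in the chapters above (temperature, top-p, min-p) can be sketched in plain Python. This is a simplified illustration of what those parameters do to a token distribution, not vLLM's actual implementation; the function names and the toy logit values are this sketch's own.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Temperature < 1 sharpens the distribution toward the top token;
    # temperature > 1 flattens it toward uniform.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def apply_top_p(probs, top_p):
    # Keep the smallest set of most-likely tokens whose cumulative
    # probability reaches top_p; return their (sorted) indices.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep, cum = [], 0.0
    for i in order:
        keep.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return sorted(keep)

def apply_min_p(probs, min_p):
    # Keep tokens whose probability is at least min_p times the
    # top token's probability (a relative, not absolute, cutoff).
    cutoff = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= cutoff]
```

Note the qualitative difference the video draws on: top-p uses a fixed cumulative mass, while min-p adapts its cutoff to how confident the model is about its top token.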
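Chapters 13 and 17 pair chain-of-thought prompting with regex-based answer extraction for grading. A minimal sketch of that pattern is below; the prompt template and the answer regex are illustrative assumptions, not the exact ones used in the video's notebook.

```python
import re

COT_SUFFIX = "Let's think step by step."  # classic zero-shot CoT trigger

def build_cot_prompt(question):
    # Append the CoT trigger so the model emits reasoning before its answer.
    return f"Question: {question}\n{COT_SUFFIX}\n"

# Pull a final numeric answer out of free-form reasoning text
# (illustrative pattern; a grader would enforce/match this with regex).
ANSWER_RE = re.compile(
    r"(?:answer is|Answer:)\s*\$?(-?[\d,]+(?:\.\d+)?)", re.IGNORECASE
)

def extract_answer(completion):
    m = ANSWER_RE.search(completion)
    return m.group(1).replace(",", "") if m else None
```

Forcing the answer into a regex-matchable format is what makes automated scoring on GSM8K-style numeric answers reliable.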
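Chapter 18 multi-threads the grading step. Because calling an external grader like GPT-4o-mini is I/O-bound, a thread pool gives near-linear speedup; the sketch below substitutes a hypothetical local `grade_one` stub for the real API call.

```python
from concurrent.futures import ThreadPoolExecutor

def grade_one(pair):
    # Hypothetical stub: the video instead sends (predicted, gold) to
    # GPT-4o-mini and parses its verdict. Here we just string-compare.
    predicted, gold = pair
    return predicted.strip() == gold.strip()

def grade_all(pairs, max_workers=8):
    # map() preserves input order, so results line up with the dataset rows.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(grade_one, pairs))
```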
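Chapter 19 reports accuracy as a mean with a mean absolute deviation (MAD) over repeated runs, since sampled generations vary from run to run. The two statistics are simple to compute:

```python
def mean(xs):
    return sum(xs) / len(xs)

def mean_abs_dev(xs):
    # Average absolute distance from the mean: a robust, easy-to-read
    # spread measure for a handful of repeated accuracy scores.
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)
```

Reporting accuracy as, say, 0.80 ± 0.07 (mean ± MAD) makes clear whether a temperature or CoT ablation actually moved the needle beyond run-to-run noise.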