Overview
Syllabus
OpenAI o1-style techniques for scaling test-time compute
Video Overview: temperature, chain of thought
Training compute versus test-time compute
Why spend more compute at test time / inference?
Using verifiers to select the best answers
Exploring and critiquing/verifying answers during inference
Understanding Temperature for sampling
Should you set temperature to zero?
Beam search
Problems with setting a non-zero temperature
Using top-p, top-k, min-p, and best-of
Recap on choosing temperature for sampling
How to implement chain of thought (prompt sketch after this syllabus)
Setup for notebook run-through on GSM8K and HotpotQA (dataset-loading sketch below)
Using sampling and chain of thought on HotpotQA and GSM8K
Running vLLM in a Jupyter notebook allows for batching (see the batched-generation sketch below)
Scoring / grading with OpenAI GPT-4o-mini using regex enforcement (grading sketch below)
Multi-threading the scoring / grading for speed
Running the dataset multiple times to get the mean and mean absolute deviation of correct answers (ablation-loop sketch below)
Controlling sampling parameters: min-p, top-p, top-k, beam search, temperature (sampling sketch below)
Running temperature / sampling ablations WITHOUT chain of thought
Chain of Thought Setup
Running ablations WITH chain of thought
GSM8K Results Charts
HotpotQA Results Charts
Recommendations on sampling, temperature and chain of thought
Video resources
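
The sketches below illustrate several of the steps listed in the syllabus. They are minimal, hedged examples rather than the exact code from the video: model names, parameter values, prompts, and helper names are assumptions unless stated otherwise.

First, a sketch of controlling the sampling parameters covered above (temperature, top-p, top-k, min-p) with vLLM, and of passing a list of prompts so that generation is batched inside a notebook. The model and parameter values here are illustrative.

```python
# Minimal sketch of controlling sampling with vLLM in a notebook.
# The model name and parameter values are illustrative, not those used in the video.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # assumed model

# temperature=0 gives greedy decoding: the most likely token is always picked.
greedy = SamplingParams(temperature=0.0, max_tokens=512)

# Non-zero temperature combined with top-p, top-k and min-p filtering.
sampled = SamplingParams(
    temperature=0.8,  # higher values flatten the token distribution, adding randomness
    top_p=0.95,       # nucleus sampling: keep the smallest token set with cumulative prob >= 0.95
    top_k=50,         # keep only the 50 most likely tokens
    min_p=0.05,       # drop tokens whose prob is below 5% of the top token's prob
    max_tokens=512,
)
# Note: how beam search and best-of are exposed depends on the vLLM version
# (older releases accept extra SamplingParams fields; newer releases use a separate API).

# Passing a list of prompts batches them in a single call, which is what makes
# running these ablations from a Jupyter notebook practical.
prompts = ["What is 17 * 24?", "Who wrote Pride and Prejudice?"]
outputs = llm.generate(prompts, sampled)
for out in outputs:
    print(out.outputs[0].text)
```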
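Next, a sketch of the notebook setup for the two evaluation sets, assuming the standard Hugging Face dataset identifiers, configs, and splits; the exact slicing used in the video may differ.

```python
# Minimal sketch of loading GSM8K and HotpotQA with Hugging Face `datasets`.
# Dataset identifiers, configs and splits are assumptions based on the standard hub versions.
from datasets import load_dataset

gsm8k = load_dataset("openai/gsm8k", "main", split="test")
hotpotqa = load_dataset("hotpot_qa", "distractor", split="validation")

print(gsm8k[0]["question"])     # GSM8K rows have 'question' and 'answer' fields
print(hotpotqa[0]["question"])  # HotpotQA rows include 'question' and 'answer' fields
```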
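A sketch of the direct versus chain-of-thought prompt formats, plus regex extraction of the final answer. The prompt wording is an assumption; the video's exact prompts may differ.

```python
# Minimal sketch of direct versus chain-of-thought prompts and regex answer extraction.
# Prompt wording is illustrative, not the exact wording from the video.
import re

def direct_prompt(question: str) -> str:
    return f"Question: {question}\nRespond with only the final answer, as 'Answer: <answer>'."

def cot_prompt(question: str) -> str:
    return (
        f"Question: {question}\n"
        "Think step by step. After your reasoning, give the final answer "
        "on its own line in the form 'Answer: <answer>'."
    )

def extract_answer(completion: str) -> str | None:
    # Take the last 'Answer: ...' occurrence so answers mentioned mid-reasoning are ignored.
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None
```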
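A sketch of grading completions with GPT-4o-mini, enforcing a parseable verdict with a regex and multi-threading the calls for speed. The grading prompt and the 'Score: 0/1' convention are assumptions.

```python
# Minimal sketch of grading with gpt-4o-mini plus a thread pool for speed.
# The grading prompt and the 'Score: 0/1' convention are assumptions.
import re
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def grade(question: str, candidate: str, reference: str) -> int:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[{
            "role": "user",
            "content": (
                f"Question: {question}\nReference answer: {reference}\n"
                f"Candidate answer: {candidate}\n"
                "Reply with exactly 'Score: 1' if the candidate matches the reference, "
                "otherwise 'Score: 0'."
            ),
        }],
    )
    # Enforce the expected format with a regex; anything unparseable counts as incorrect.
    match = re.search(r"Score:\s*([01])", response.choices[0].message.content)
    return int(match.group(1)) if match else 0

def grade_all(rows: list[dict]) -> list[int]:
    # Grading calls are I/O-bound, so a thread pool gives a large speed-up.
    with ThreadPoolExecutor(max_workers=16) as pool:
        return list(pool.map(
            lambda r: grade(r["question"], r["candidate"], r["reference"]), rows
        ))
```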
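Finally, a sketch of the ablation loop: each sampling configuration is evaluated with and without chain of thought, repeated several times, and summarized by the mean accuracy and the mean absolute deviation across runs. `run_eval` is a hypothetical callable wrapping the prompt construction, generation, and grading sketched above.

```python
# Minimal sketch of the ablation loop with repeated runs and mean / mean absolute deviation.
# `run_eval` is a hypothetical callable: given sampling kwargs and a chain-of-thought flag,
# it runs one pass over the dataset and returns the fraction of correct answers.
import statistics
from typing import Callable

def summarize(accuracies: list[float]) -> tuple[float, float]:
    mean = statistics.mean(accuracies)
    mad = statistics.mean(abs(a - mean) for a in accuracies)  # mean absolute deviation
    return mean, mad

def run_ablations(run_eval: Callable[[dict, bool], float], n_runs: int = 5) -> dict:
    # Example sampling configurations; values are illustrative.
    configs = {
        "greedy": dict(temperature=0.0),
        "temp_0.8_top_p_0.95": dict(temperature=0.8, top_p=0.95),
        "temp_0.8_min_p_0.05": dict(temperature=0.8, min_p=0.05),
    }
    results = {}
    for name, sampling_kwargs in configs.items():
        for use_cot in (False, True):  # WITHOUT and WITH chain of thought
            accuracies = [run_eval(sampling_kwargs, use_cot) for _ in range(n_runs)]
            results[(name, use_cot)] = summarize(accuracies)
    return results
```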
Taught by
Trelis Research