Watch a Harvard CMSA conference talk exploring the theoretical foundations and practical implications of data splitting in statistical analysis. Delve into cross-validation methods and their asymptotic properties across various models, understanding how to determine optimal fold numbers and calculate accurate confidence intervals. Learn about the statistical advantages of cross-validation compared to train-test splits, and explore the role of cross-fitting in the generalized method of moments. Discover when sample splitting is necessary for machine learning estimators and when data can be reused effectively, particularly in moderate sample sizes. Through examples including cross-validation and cross-fitting, gain insights into stability conditions, central limit theorems, and Berry-Esseen bounds that guide data splitting decisions in modern statistical applications.
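The k-fold cross-validation idea discussed in the talk can be sketched as follows. This is a generic illustration, not the speaker's own construction: the `fit`/`predict` callables and the toy least-squares example are assumptions for demonstration, and the naive standard error shown here ignores the dependence between folds, which is precisely the issue the talk's asymptotic analysis (stability conditions, central limit theorems, Berry-Esseen bounds) addresses.

```python
import numpy as np

def kfold_cv_error(X, y, fit, predict, k=5, seed=0):
    """Estimate prediction error by k-fold cross-validation.

    fit(X_train, y_train) -> model; predict(model, X_test) -> predictions.
    Returns the mean held-out squared error and a naive standard error
    computed over the n per-observation errors (it treats the held-out
    errors as independent, which the talk shows requires justification).
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)           # shuffle once, then split into k folds
    folds = np.array_split(idx, k)
    errors = np.empty(n)
    for fold in folds:
        train = np.setdiff1d(idx, fold)            # all indices outside the fold
        model = fit(X[train], y[train])            # train on k-1 folds
        errors[fold] = (y[fold] - predict(model, X[fold])) ** 2
    return errors.mean(), errors.std(ddof=1) / np.sqrt(n)

# Toy example (hypothetical data): least squares on a linear model with noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)

fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda beta, X: X @ beta

err, se = kfold_cv_error(X, y, fit, predict, k=5)
# err estimates the mean squared prediction error; a naive 95% interval
# would be err +/- 1.96 * se, whose actual coverage is what the talk studies.
```

Choosing the number of folds k trades off bias (small k trains on less data) against the cost and variance behavior of the estimate; the talk's asymptotic results are one way to make that trade-off precise.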
To Split or Not to Split: From Cross Validation to Debiased Machine Learning
Harvard CMSA via YouTube
Overview
Syllabus
Intro
Data streaming
Addition method
Naive ideas
Natural questions
Dependence
Intuition
Remarks
Generalized method of moments
Data splitting
Data augmentation
Two examples
Double descent curve
Taught by
Harvard CMSA