Watch a Harvard CMSA conference talk exploring the theoretical foundations and practical implications of data splitting in statistical analysis. Delve into cross-validation methods and their asymptotic properties across various models, understanding how to determine optimal fold numbers and calculate accurate confidence intervals. Learn about the statistical advantages of cross-validation compared to train-test splits, and explore the role of cross-fitting in the generalized method of moments. Discover when sample splitting is necessary for machine learning estimators and when data can be reused effectively, particularly in moderate sample sizes. Through examples including cross-validation and cross-fitting, gain insights into stability conditions, central limit theorems, and Berry-Esseen bounds that guide data splitting decisions in modern statistical applications.
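The k-fold cross-validation idea discussed in the talk can be sketched as follows. This is a generic illustration, not the speaker's own construction: the `fit`/`predict` callables and the toy least-squares example are assumptions for demonstration, and the naive standard error shown here ignores the dependence between folds, which is precisely the issue the talk's asymptotic analysis (stability conditions, central limit theorems, Berry-Esseen bounds) addresses.

```python
import numpy as np

def kfold_cv_error(X, y, fit, predict, k=5, seed=0):
    """Estimate prediction error by k-fold cross-validation.

    fit(X_train, y_train) -> model; predict(model, X_test) -> predictions.
    Returns the mean held-out squared error and a naive standard error
    computed over the n per-observation errors (it treats the held-out
    errors as independent, which the talk shows requires justification).
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)           # shuffle once, then split into k folds
    folds = np.array_split(idx, k)
    errors = np.empty(n)
    for fold in folds:
        train = np.setdiff1d(idx, fold)            # all indices outside the fold
        model = fit(X[train], y[train])            # train on k-1 folds
        errors[fold] = (y[fold] - predict(model, X[fold])) ** 2
    return errors.mean(), errors.std(ddof=1) / np.sqrt(n)

# Toy example (hypothetical data): least squares on a linear model with noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)

fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda beta, X: X @ beta

err, se = kfold_cv_error(X, y, fit, predict, k=5)
# err estimates the mean squared prediction error; a naive 95% interval
# would be err +/- 1.96 * se, whose actual coverage is what the talk studies.
```

Choosing the number of folds k trades off bias (small k trains on less data) against the cost and variance behavior of the estimate; the talk's asymptotic results are one way to make that trade-off precise.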
To Split or Not to Split: From Cross Validation to Debiased Machine Learning
Harvard CMSA via YouTube
Overview
Syllabus
Intro
Data streaming
Addition method
Naive ideas
Natural questions
Dependence
Intuition
Remarks
Generalized method of moments
Data splitting
Data augmentation
Two examples
Double descent curve
Taught by
Harvard CMSA