
To Split or Not to Split: From Cross Validation to Debiased Machine Learning

Harvard CMSA via YouTube

Overview

Watch a Harvard CMSA conference talk exploring the theoretical foundations and practical implications of data splitting in statistical analysis. Delve into cross-validation methods and their asymptotic properties across various models, learning how to choose the number of folds and construct valid confidence intervals. Learn about the statistical advantages of cross-validation over a single train-test split, and explore the role of cross-fitting in the generalized method of moments. Discover when sample splitting is necessary for machine learning estimators and when data can be reused safely, particularly at moderate sample sizes. Through examples including cross-validation and cross-fitting, gain insight into the stability conditions, central limit theorems, and Berry-Esseen bounds that guide data-splitting decisions in modern statistical applications.
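To make the central idea concrete, here is a minimal sketch of K-fold cross-validation with a naive normal-approximation confidence interval for the estimated risk. The function name and the choice of a simple sample-mean predictor are illustrative assumptions, not taken from the talk; the talk's subject is precisely when intervals like this one are asymptotically valid.

```python
import random
import statistics

def k_fold_cv(data, k=5, seed=0):
    """Illustrative K-fold cross-validation (hypothetical helper).

    Estimates the out-of-sample squared error of the sample-mean
    predictor, plus a naive 95% normal-approximation interval.
    """
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    # Partition the shuffled indices into k roughly equal folds.
    folds = [idx[i::k] for i in range(k)]

    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in idx if i not in held]
        mu = sum(train) / len(train)               # fit on K-1 folds
        errors += [(data[i] - mu) ** 2 for i in held_out]  # score held-out fold

    est = statistics.mean(errors)
    # Naive CI treating per-point errors as independent; whether this
    # is justified (errors share training folds) is the talk's topic.
    se = statistics.stdev(errors) / len(errors) ** 0.5
    return est, (est - 1.96 * se, est + 1.96 * se)
```

For standard normal data the risk estimate should land near 1 (the variance of the data), and the interval brackets the point estimate by construction.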

Syllabus

Intro
Data streaming
Addition method
Naive ideas
Natural questions
Dependence
Intuition
Remarks
Generalized method of moments
Data splitting
Data augmentation
Two examples
Double descent curve

Taught by

Harvard CMSA
