Overview
Explore a 30-minute AutoML seminar introducing Context-Aware Automated Feature Engineering (CAAFE), an approach that uses large language models (LLMs) to engineer features for tabular datasets. Learn how CAAFE draws on the domain knowledge captured by an LLM to iteratively generate semantically meaningful features, producing both executable Python code and a plain-language explanation for each one. Discover how this methodologically simple approach improved performance on 11 of 14 datasets, raising mean ROC AUC from 0.798 to 0.822, a gain comparable to that of switching from logistic regression to a random forest. Presented by Noah Hollmann, the talk demonstrates CAAFE's potential to advance semi-automated data science and to extend AutoML with semantic understanding; an implementation is available in the accompanying GitHub repository.
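The iterative loop described above (propose a feature, compute it, keep it only if a validation metric improves) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the CAAFE implementation: the LLM call is replaced by a hard-coded `CANDIDATES` list, the model is a toy scorer rather than a real classifier, and every name here (`evaluate`, `select_features`, `make_rows`, the columns `a` and `b`) is hypothetical.

```python
def roc_auc(labels, scores):
    """Chance that a random positive outscores a random negative; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(train, test, columns):
    """Validation ROC AUC of a toy scorer that weights each column by its class-mean gap."""
    weights = {}
    for col in columns:
        pos = [r[col] for r in train if r["y"] == 1]
        neg = [r[col] for r in train if r["y"] == 0]
        weights[col] = sum(pos) / len(pos) - sum(neg) / len(neg)
    scores = [sum(weights[c] * r[c] for c in columns) for r in test]
    return roc_auc([r["y"] for r in test], scores)


def make_rows(values):
    """Synthetic grid of (a, b) pairs, labelled 1 when a and b share a sign."""
    return [{"a": a, "b": b, "y": int(a * b > 0)} for a in values for b in values]


# Assumption: stands in for LLM output as (name, explanation, feature function).
CANDIDATES = [
    ("a_plus_b", "Sum of the two measurements.", lambda r: r["a"] + r["b"]),
    ("a_times_b", "Interaction term: positive when the signs agree.",
     lambda r: r["a"] * r["b"]),
]


def select_features(train, test, base_columns, candidates):
    """Greedily keep each candidate feature only if validation ROC AUC improves."""
    columns = list(base_columns)
    best = evaluate(train, test, columns)
    history = [("baseline", best, True)]
    for name, _explanation, fn in candidates:
        for row in train + test:
            row[name] = fn(row)  # compute the proposed feature on every row
        score = evaluate(train, test, columns + [name])
        keep = score > best
        if keep:
            columns.append(name)
            best = score
        history.append((name, score, keep))
    return columns, best, history


train = make_rows([-2, -1, 1, 2])
test = make_rows([-3.0, -1.5, 1.5, 3.0])
columns, best, history = select_features(train, test, ["a", "b"], CANDIDATES)
print(columns, best)
```

On this synthetic data the raw columns carry no linear signal, so the sum is rejected and the interaction term is kept. In a real setting the candidate list would come from an LLM prompted with the dataset description, and the toy scorer would be replaced by a cross-validated classifier.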
Syllabus
LLMs for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering
Taught by
AutoML Seminars