Overview
Explore machine learning techniques for handling non-curated data in this 43-minute EuroPython Conference talk. Delve into practical solutions for two common dirty-data problems: missing values and non-normalized entries. Learn how to apply standard machine learning tools like scikit-learn when the data contain these errors. Discover why imputation combined with missingness indicators matters for handling missing values, and understand how to build vector representations for non-normalized categories. Gain insights from theoretical analyses and recent machine learning publications to improve your data science workflow and efficiency when working with imperfect datasets.
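The two ideas named in the overview can be illustrated with a small, dependency-free sketch. It is not code from the talk: the function names `impute_with_indicator`, `char_ngrams`, `ngram_similarity`, and `similarity_encode` are hypothetical, mean imputation is only one possible imputation strategy, and Jaccard similarity on character 3-grams is an assumed stand-in for whichever category encoding the speaker presents. In scikit-learn, `SimpleImputer(add_indicator=True)` provides imputation together with missingness indicators.

```python
import math
from statistics import mean


def impute_with_indicator(rows):
    """Mean-impute NaNs column by column and append one 0/1 flag per column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    # Column means are computed over observed (non-NaN) values only.
    col_means = [
        mean(r[j] for r in rows if not math.isnan(r[j])) for j in range(n_cols)
    ]
    out = []
    for r in rows:
        filled = [col_means[j] if math.isnan(v) else v for j, v in enumerate(r)]
        # The flags keep the information that a value was missing,
        # which imputation alone would erase.
        flags = [1.0 if math.isnan(v) else 0.0 for v in r]
        out.append(filled + flags)
    return out


def char_ngrams(text, n=3):
    """Set of lowercase character n-grams of a string."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard similarity between the character n-gram sets of two strings."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def similarity_encode(value, prototypes, n=3):
    """Represent a non-normalized category as similarities to reference categories."""
    return [ngram_similarity(value, p, n) for p in prototypes]


if __name__ == "__main__":
    nan = float("nan")
    print(impute_with_indicator([[1.0, nan], [nan, 4.0], [3.0, 6.0]]))
    # Hypothetical dirty category values, for illustration only.
    print(similarity_encode("Police Officer II", ["police officer", "fire fighter"]))
```

Unlike a one-hot encoding, the similarity vector places near-duplicate spellings of the same category close together, so a downstream model can treat them alike without manual cleaning.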
Syllabus
Gael Varoquaux - Machine learning on non curated data
Taught by
EuroPython Conference