Overview
Explore statistical typing as a runtime typing system for data science and machine learning in this 22-minute PyCon US talk. Discover how statistical typing extends primitive data types to create multivariate schemas, enabling more effective data validation and testing. Learn how to use pandera, a pandas data testing library, to implement statistical typing concepts, validate real-world data with reusable schemas, and isolate units of processing, analysis, and model-training code. Gain insights into overcoming barriers in testing data processing, analysis, and model-training code, and understand how statistical typing can improve the quality of datasets for visualization, statistical inference, and modeling.
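As a rough illustration of the idea, the sketch below pairs each column's primitive type with a value-level check, then adds a dataset-level hypothesis that spans more than one column. It uses only the Python standard library and is not pandera's API. The `Column` and `Schema` classes, the `treatment_mean_is_higher` helper, and the sample data are all hypothetical names invented for this example. The hypothesis is a plain comparison of group means, standing in for a proper statistical test.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Any, Callable


@dataclass
class Column:
    """A primitive type plus a per-value property check (hypothetical helper)."""
    dtype: type
    check: Callable[[Any], bool]


@dataclass
class Schema:
    """Column definitions plus dataset-level hypotheses (hypothetical helper)."""
    columns: dict[str, Column]
    hypotheses: dict[str, Callable[[list[dict]], bool]] = field(default_factory=dict)

    def validate(self, rows: list[dict]) -> list[dict]:
        # Per-value checks: primitive type first, then the value-level property.
        for index, row in enumerate(rows):
            for name, column in self.columns.items():
                value = row[name]
                if not isinstance(value, column.dtype):
                    raise TypeError(f"row {index}: {name!r} is not {column.dtype.__name__}")
                if not column.check(value):
                    raise ValueError(f"row {index}: {name!r} failed its value check")
        # Dataset-level checks: properties of the data as a whole.
        for description, hypothesis in self.hypotheses.items():
            if not hypothesis(rows):
                raise ValueError(f"hypothesis failed: {description}")
        return rows


def treatment_mean_is_higher(rows: list[dict]) -> bool:
    # Crude stand-in for a two-sample test: compare group means directly.
    treated = [r["score"] for r in rows if r["group"] == "treatment"]
    control = [r["score"] for r in rows if r["group"] == "control"]
    return mean(treated) > mean(control)


schema = Schema(
    columns={
        "group": Column(str, lambda v: v in {"treatment", "control"}),
        "score": Column(float, lambda v: 0.0 <= v <= 1.0),
    },
    hypotheses={"treatment mean exceeds control mean": treatment_mean_is_higher},
)

good_rows = [
    {"group": "treatment", "score": 0.9},
    {"group": "treatment", "score": 0.8},
    {"group": "control", "score": 0.5},
    {"group": "control", "score": 0.6},
]

# Every value is well-typed and in range, but the group means are reversed.
bad_rows = [
    {"group": "treatment", "score": 0.4},
    {"group": "treatment", "score": 0.3},
    {"group": "control", "score": 0.5},
    {"group": "control", "score": 0.6},
]

validated = schema.validate(good_rows)
```

Because the schema is an ordinary object, the same definition can be reused at the boundary of each processing, analysis, or model-training step. `bad_rows` shows why the dataset-level check matters: it passes every per-value check and is rejected only by the hypothesis.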
Syllabus
TALK / Niels Bantilan / Statistical Typing: A Runtime Typing System for Data Science & Machine Learning
Taught by
PyCon US