Improve the overall analytic workflow of your organization by boosting your data cleaning skills in Python.
Syllabus
Introduction
- Why is clean data important?
- What you should know
- Using GitHub Codespaces with this course
- Types of errors
- Missing values
- Bad values
- Duplicates
- Human errors
- Machine errors
- Design errors
- Challenge: UI design
- Solution: UI design
- Schemas
- Validation
- Finding missing data
- Domain knowledge
- Subgroups
- Challenge: Find bad data
- Solution: Find bad data
- Serialization formats
- Digital signature
- Data pipelines and automation
- Transactions
- Data organization and tidy data
- Process and data quality metrics
- Challenge: ETL
- Solution: ETL
- Renaming fields
- Fixing types
- Joining and splitting data
- Deleting bad data
- Filling missing values
- Reshaping data
- Challenge: Workshop earnings
- Solution: Workshop earnings
- Next steps
Taught by
Miki Tebeka