Learn Python programming skills for data science and machine learning. Discover how to clean, transform, analyze, and visualize data as you build a practical, real-world project.
Overview
Syllabus
Introduction
- Data science life hacks
- What you should know
- How to use Codespaces with this course
- Introduction to the data professions
- Data science careers: Identifying where and how you'll thrive
- Why use Python for analytics
- High-level course road map
- Intro to data preparation
- NumPy and pandas basics
- Filtering and selecting
- Treating missing values
- Removing duplicates
- Concatenating and transforming
- Grouping and aggregation
- Importance of visualization in data science
- The three types of data visualization
- Selecting optimal data graphics
- Communicating with color and context
- Introduction to the Matplotlib and Seaborn libraries
- Creating standard data graphics
- Defining elements of a plot
- Plot formatting
- Creating labels and annotations
- Visualizing time series
- Creating statistical data graphics in Seaborn
- Simple arithmetic
- Generating summary statistics
- Summarizing categorical data
- Pearson correlation analysis
- Spearman rank correlation and Chi-square
- Extreme value analysis for outliers
- Multivariate analysis for outliers
- Cleaning and treating categorical variables
- Transforming data set distributions
- Applied machine learning: Starter problem
- Introduction to web scraping
- Python requests for automating data collection
- BeautifulSoup object
- NavigableString objects
- Data parsing
- Web scraping in practice
- Asynchronous scraping
- Introduction to Streamlit
- Environment setup
- Create basic charts
- Line charts in Streamlit
- Bar charts and pie charts in Streamlit
- Create statistical charts
- Next steps
Taught by
Lillian Pierson, P.E.