Dive into one of the most important data science tools: SQL. Learn how to use joins and subqueries, statistical functions, window functions, and much more.
Overview
Syllabus
Introduction
- The need for SQL in data science
- What you should know
- Overview of data science operations
- Data manipulation commands
- Data definition commands
- SQL standards
- Installing PostgreSQL
- Loading data
- Basic aggregate functions
- Statistical aggregate functions
- Grouping and filtering data
- Joining and filtering data
- Challenge: Write a Query with a Filter
- Solution: Write a Query with a Filter
- Reformat character data
- Extract strings from character data
- Filter with regular expressions
- Reformat numeric data
- Use SOUNDEX with misspelled text
- Challenge: Prepare a data set for analysis
- Solution: Prepare a data set for analysis
- Use the HAVING clause to find subgroups
- Subqueries for column values
- Subqueries in FROM clauses
- Subqueries in WHERE clauses
- Use ROLLUP to create subtotals
- Use CUBE to total across dimensions
- Use Top-N queries to find top results
- Challenge: Filter and aggregate a data set
- Solution: Filter and aggregate a data set
- Introduction to window functions
- NTH_VALUE and NTILE
- RANK, LEAD, and LAG
- WIDTH_BUCKET and CUME_DIST
- Challenge: Segment a data set using Window functions
- Solution: Segment a data set using Window functions
- Introduction to common table expressions (CTEs)
- Multiple table common table expressions
- Hierarchical tables
- Recursive common table expressions
- Challenge: Rewrite a complex query to use CTEs
- Solution: Rewrite a complex query to use CTEs
- Next steps
Taught by
Dan Sullivan