Use Scala in your data science work. Explore the Scala features most useful to data scientists, including custom functions, parallel processing, and programming Spark with Scala.
Syllabus
Introduction
- Welcome
- What you should know
- Using the exercise files
- The advantages of Scala for data science
- Installing Scala
- Scala data types
- Scala collections
- Scala sets
- Scala arrays, vectors, and ranges
- Scala maps
- Scala expressions
- Scala functions
- Scala objects
- Advantages of parallel collections
- Creating parallel collections
- Mapping functions over parallel collections
- Filtering parallel collections
- When and when not to use parallel collections
- Installing PostgreSQL
- Loading data into PostgreSQL
- Connecting to PostgreSQL
- Querying with SQL strings
- Querying with prepared statements
- Summary of SQL in Scala
- Introduction to Spark
- Installing Spark
- Getting started with Spark RDDs
- Mapping functions over RDDs
- Statistics over RDDs
- Summary of Scala and Spark RDDs
- Creating DataFrames
- Grouping and filtering on DataFrames
- Joining DataFrames
- Working with JSON files
- Summary of Scala and Spark DataFrames
- Review of Scala for data science
Taught by
Dan Sullivan