Apache PySpark by Example

via LinkedIn Learning

Overview

Get up and running with Apache Spark quickly. This practical, hands-on course shows Python users how to work with PySpark, the Python API for Apache Spark, and apply Spark to data science tasks.

Syllabus

Introduction
  • Apache PySpark
  • What you should know
1. Introduction to Apache Spark
  • The Apache Spark ecosystem
  • Why Spark?
  • Spark origins and Databricks
  • Spark components
  • Partitions, transformations, lazy evaluations, and actions
2. Technical Setup
  • Set up the lab environment
  • Download a dataset
  • Importing
3. Working with the DataFrame API
  • The DataFrame API
  • Working with DataFrames
  • Schemas
  • Working with columns
  • Working with rows
  • Challenge
  • Solution
4. Functions
  • Built-in functions
  • Working with dates
  • User-defined functions
  • Working with joins
  • Challenge
  • Solution
5. Resilient Distributed Datasets (RDDs)
  • RDDs
  • Working with RDDs
Conclusion
  • Next steps

Taught by

Jonathan Fernandes

Reviews

4.7 rating at LinkedIn Learning, based on 1,259 ratings
