Overview
Explore how R programming integrates with Databricks and Apache Spark through the SparkR package. The tutorial covers the architecture, Apache Arrow, and the SparkR API, along with open-source libraries and example code. It introduces masking, SparkSQL, and SparkR SQL, then shows how to create local DataFrames, print schemas, and transform data using piping, mutate, and aggregate functions. Practice DataFrame operations, including joins and merges, and finish with a coding challenge that reinforces the material.
Syllabus
Intro
Architecture
Apache Arrow
What is SparkR
SparkR API
Open Source Libraries
Example Code
Masking
SparkSQL
SparkR SQL
SparkR Display
Masking Objects
Creating a Local Dataframe
SparkR Print Schema
SparkR Piping
Transform
Mutate
DataFrame
Aggregate Functions
DataFrame Operations
Join
Merge
Challenge
Code
Notebook
Taught by
Bryan Cafferky