Introduction to Spark Datasets

Scala Days Conferences via YouTube

Overview

Explore Apache Spark's Dataset API in this 43-minute conference talk from Scala Days Copenhagen 2017. The talk covers the basics of working with Spark Datasets, which mix functional and relational programming styles. Learn about Spark's components, including machine learning and streaming, and how they are being rewritten to support Dataset-compatible APIs. See the performance and space-efficiency benefits of Spark SQL, then follow along as the speaker loads JSON data, applies a schema with a case class, and writes relational transformations. Understand what the optimizer can do with those transformations and how to mix functional and relational styles effectively. Examine windowed operations and window specifications, and learn why Datasets are becoming increasingly important in the Spark ecosystem. No prior Spark knowledge is required, but a basic understanding of Scala is recommended.

Syllabus

Intro
What is Spark?
The different pieces of Spark
Why should we consider Spark SQL?
What is the performance like?
How is it so fast?
How much more space efficient?
Getting started
Loading some simple JSON data
Sample case class for schema
Then apply some type magic
What do relational transforms look like?
Writing a relational transformation
What can the optimizer do now?
Using Datasets to mix functional & relational style
And functional style maps
What is DS functional perf like?
Build the recipe for each query
Windowed operations
Window specs
Summary: Why to use Datasets
The next book.....

Taught by

Scala Days Conferences
