Simplifying Testing of Spark Applications

Linux Foundation via YouTube

Overview

In this 46-minute conference talk, Megan Yow of Sobeys and Han Wang of Lyft present techniques for simplifying the testing of Spark applications. The talk opens with the case for using Spark and a comparison of Python Spark with pandas, then works through scenarios involving Python UDFs, Pandas UDFs, and Pandas type hints. It goes on to cover undecorated UDFs, the Functional API, and Spark DataFrames, along with the use of Databricks and Kaggle Notebooks for Spark development. The testing portion addresses exploration goals and mindset, and the talk closes with a demo in a notebook environment covering tokenization and sentiment scores.
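
As a rough illustration of why undecorated UDFs can make testing easier, the sketch below keeps tokenization and sentiment scoring as plain, type-hinted Python functions that can be unit-tested without a Spark session. This is not code from the talk: the function names, the toy lexicon, and the scoring rule are all hypothetical and chosen only to echo the demo topics.

```python
from typing import Dict, List

# Hypothetical toy lexicon, invented for this sketch (not from the talk).
LEXICON: Dict[str, int] = {"good": 1, "great": 2, "bad": -1, "awful": -2}

_PUNCTUATION = ".,!?;:"


def tokenize(text: str) -> List[str]:
    """Lowercase the text, split on whitespace, and strip edge punctuation."""
    stripped = (word.strip(_PUNCTUATION) for word in text.lower().split())
    return [word for word in stripped if word]


def sentiment_score(tokens: List[str]) -> int:
    """Sum the lexicon values of the tokens; unknown tokens count as 0."""
    return sum(LEXICON.get(token, 0) for token in tokens)


def score_text(text: str) -> int:
    """Plain function with no Spark dependency, so it is testable locally."""
    return sentiment_score(tokenize(text))


if __name__ == "__main__":
    # Ordinary assertions exercise the logic with no cluster involved.
    assert tokenize("Great talk, bad audio!") == ["great", "talk", "bad", "audio"]
    assert score_text("Great talk, bad audio!") == 1
```

Because the logic carries no decorator, one option is to wrap it for Spark only at the point of use, for example with `pyspark.sql.functions.udf`, while the tests continue to call the plain function directly.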

Syllabus

Intro
Why use Spark
Python Spark
pandas
Python vs Spark
Scenarios
Python UDF
Pandas UDF
Pandas Type Hints
Undecorated UDFs
Functional API
Spark DataFrame
Databricks
Kaggle Notebook
Testing in Spark
Exploration Goals
Mindset
Demo
Notebook Environment
Tokenization
Sentiment Scores

Taught by

Linux Foundation
