
Beginning Data Exploration and Analysis with Apache Spark

via Pluralsight

Overview

Data preparation is often said to make up 80% of a data scientist's job. This course is all about data preparation with Spark: cleaning, transforming, and summarizing data.

Data preparation is a staple task for any data professional, whether you just want to explore data or develop sophisticated machine learning models. Spark is an engine that makes this work intuitive, using functional constructs that shield the user from the messiness of working with large datasets. In this course, Beginning Data Exploration and Analysis with Apache Spark, you'll go through exploratory data analysis and data munging with Spark, step by step. First, you'll explore RDDs (Resilient Distributed Datasets) and the functional constructs that make processing in Spark intuitive. Next, you'll discover how to transform and clean unstructured data. Finally, you'll learn how to summarize data along dimensions and how to model relationships to build co-occurrence networks. By the end of this course, you'll be able to use Spark to clean, transform, and summarize data for your own analyses.
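To give a feel for the workflow described above, here is a minimal sketch in plain Python rather than Spark itself. It is not taken from the course: the records, the field layout, and the `parse` helper are made up for illustration. The steps mirror the kind of functional pipeline that Spark's RDD operations such as `filter`, `map`, `flatMap`, and `reduceByKey` support: filter out bad records, map each record to a cleaned form, count along one dimension, and count pairs to build a co-occurrence network.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records: "book_id,character names separated by |".
# The data is invented for this sketch and is not from the course.
raw_lines = [
    "b1, Spider-Man | Iron Man | Hulk ",
    "b2,Iron Man|Hulk",
    "",                       # blank line that should be dropped
    "b3,spider-man|IRON MAN",
]

def parse(line):
    """Split one record into (book_id, sorted list of normalized names)."""
    book_id, names = line.split(",", 1)
    cleaned = {n.strip().lower() for n in names.split("|") if n.strip()}
    return book_id.strip(), sorted(cleaned)

# filter + map: drop blank lines, then clean each remaining record.
records = list(map(parse, filter(lambda l: l.strip(), raw_lines)))

# Summarize along one dimension: number of books per character.
appearances = Counter(name for _, names in records for name in names)

# Co-occurrence: count each unordered pair of characters sharing a book.
pairs = Counter(
    pair for _, names in records for pair in combinations(names, 2)
)

print(appearances.most_common())
print(pairs.most_common())
```

Sorting the names inside `parse` means each pair always appears in the same order, so `("hulk", "iron man")` and `("iron man", "hulk")` are never counted separately. On a real cluster the same steps would run over distributed data instead of an in-memory list.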

Syllabus

  • Course Overview 1min
  • Getting Started with Spark's Resilient Distributed Datasets 27mins
  • Transforming and Cleaning Unstructured Data 32mins
  • Summarizing Data Along Dimensions 30mins
  • Modeling Relationships in the Marvel Social Universe 25mins

Taught by

Swetha Kolalapudi

Reviews

4.5 rating at Pluralsight based on 125 ratings
