DataCamp

Big Data Fundamentals with PySpark

via DataCamp

Overview

Learn the fundamentals of working with big data with PySpark.

Big Data has generated a lot of buzz over the past few years, and it has now become mainstream for many companies. But what is Big Data? This course covers the fundamentals of Big Data via PySpark. Spark is a "lightning fast cluster computing" framework for Big Data. It provides a general-purpose data processing engine and lets you run programs up to 100x faster in memory, or 10x faster on disk, than Hadoop MapReduce. You’ll use PySpark, a Python package for Spark programming, and its powerful, higher-level libraries such as Spark SQL and MLlib (for machine learning). You will explore the works of William Shakespeare, analyze FIFA 2018 data, and perform clustering on genomic datasets. By the end of this course, you will have gained an in-depth understanding of PySpark and its application to general Big Data analysis.

Syllabus

  • Introduction to Big Data analysis with Spark
    • This chapter introduces the exciting world of Big Data, along with the concepts and frameworks used to process it. You will understand why Apache Spark is considered the best framework for Big Data.
  • Programming in PySpark RDDs
    • The main abstraction Spark provides is the resilient distributed dataset (RDD), the fundamental data type and backbone of the engine. This chapter introduces RDDs and shows how to create them and operate on them using RDD transformations and actions.
  • PySpark SQL & DataFrames
    • In this chapter, you'll learn about Spark SQL, a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as a distributed SQL query engine. This chapter shows how Spark SQL allows you to use DataFrames in Python.
  • Machine Learning with PySpark MLlib
    • PySpark MLlib is Apache Spark's scalable machine learning library for Python, consisting of common learning algorithms and utilities. In this last chapter, you'll learn important machine learning algorithms. You will build a movie recommendation engine and a spam filter, and use k-means clustering.
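
The split between RDD transformations and actions mentioned in the syllabus can be sketched in plain Python. This is a toy model under stated assumptions, not PySpark itself: the `ToyRDD` class and its internals are hypothetical, and it ignores partitioning, distribution across a cluster, and fault tolerance. Only the method names (`map`, `filter`, `collect`, `count`, `reduce`) mirror real RDD operations. The point it illustrates is that transformations merely record work, while actions trigger it.

```python
import functools


class ToyRDD:
    """Hypothetical single-machine stand-in for an RDD (not the PySpark API)."""

    def __init__(self, data, steps=()):
        self._data = list(data)
        self._steps = tuple(steps)  # pending transformations, not yet applied

    # Transformations: return a new ToyRDD and do no work yet.
    def map(self, fn):
        return ToyRDD(self._data, self._steps + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._data, self._steps + (("filter", pred),))

    def _run(self):
        # Build a lazy pipeline over the data from the recorded steps.
        items = iter(self._data)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: evaluate the pipeline and return a plain Python value.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())

    def reduce(self, fn):
        return functools.reduce(fn, self._run())


words = ToyRDD(["to", "be", "or", "not", "to", "be"])

# Two chained transformations; nothing has been computed at this point.
lengths = words.filter(lambda w: w != "or").map(len)

# Each action below runs the recorded pipeline from the start.
print(lengths.collect())
print(lengths.count())
print(lengths.reduce(lambda a, b: a + b))
```

In PySpark the equivalent chain would start from an RDD created by a `SparkContext` rather than from the hypothetical `ToyRDD` constructor used here.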

Taught by

Upendra Kumar Devisetty

Reviews

1.0 rating, based on 1 Class Central review

4 rating at DataCamp based on 19 ratings

  • Anonymous
    This is probably the worst course in all of DataCamp - a definite decline from the first course. For beginners with not much background, there is a lot of jargon put in. In addition, the lecturer is incredibly boring to listen to. Halfway through, I decided that I would rather look at the subtitles on mute instead. It is perhaps a cheap shot, but his credentials were totally unimpressive - I don't know why DataCamp could not find a better teacher for the course.
