Overview
Generate genuine business insights from big data. Learn to implement Apache Hadoop and Spark workflows on AWS and other cloud platforms.
Syllabus
Introduction
- Scaling Apache Hadoop and Spark
- What you should know
- Using cloud services
- Modern Hadoop and Spark
- File systems used with Hadoop and Spark
- Apache or commercial Hadoop distros
- Hadoop and Spark libraries
- Hadoop on Google Cloud Platform
- Spark job on Google Cloud Platform
- Sign up for Databricks Community Edition
- Add Hadoop libraries
- Databricks AWS Community Edition
- Load data into tables
- Hadoop and Spark cluster on AWS EMR
- Run Spark job on AWS EMR
- Review batch architecture for ETL on AWS
- Apache Spark libraries
- Spark data interfaces
- Select your programming language
- Spark session objects (sketch below)
- Spark shell
- Tour the Databricks environment
- Tour the notebook
- Import and export notebooks
- Calculate Pi on Spark (sketch below)
- Run WordCount on Spark with Scala (sketch below)
- Import data
- Transformations and actions
- Caching and the DAG (sketch below)
- Architecture: Streaming for prediction
- Spark SQL (sketch below)
- SparkR
- Spark ML: Preparing data
- Spark ML: Building the model
- Spark ML: Evaluating the model (combined sketch below)
- Advanced machine learning on Spark
- MXNet
- Spark with ADAM for genomics
- Spark architecture for genomics
- Reexamine streaming pipelines
- Spark Streaming (sketch below)
- Streaming ingest services
- Advanced Spark Streaming with MLeap
- Scale Spark on the cloud by example
- Build a quick start with Databricks AWS
- Scale Spark cloud compute with VMs
- Optimize cloud Spark virtual machines
- Use AWS EKS containers and a data lake
- Optimize Spark cloud data tiers on Kubernetes
- Build reproducible cloud infrastructure
- Scale on GCP Dataproc or on Terra.bio
- Continue learning for scaling
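The sketches below illustrate a few of the syllabus topics in Scala, the language used in the WordCount lesson; they are minimal examples under stated assumptions, not the course's own exercise solutions. First, the Spark session object: a small sketch of creating a SparkSession, the entry point for DataFrame and SQL work. The application name and the local[*] master setting are placeholder assumptions for a local test; on AWS EMR or Databricks the cluster supplies the master.

```scala
import org.apache.spark.sql.SparkSession

// Build (or reuse) a SparkSession; appName and master are placeholders.
val spark = SparkSession.builder()
  .appName("scaling-hadoop-spark-demo")
  .master("local[*]")           // all local cores; omit on EMR/Databricks
  .getOrCreate()

println(spark.version)          // confirm the session is live
spark.stop()
```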
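For the "Calculate Pi on Spark" lesson, a minimal Monte Carlo sketch in the style of the stock Spark Pi example: random points are thrown at the unit square, and the fraction landing inside the quarter circle estimates Pi/4. The sample count is an arbitrary assumption.

```scala
import scala.util.Random
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("spark-pi").getOrCreate()
val numSamples = 1000000         // arbitrary sample size

val inside = spark.sparkContext
  .parallelize(1 to numSamples)
  .filter { _ =>
    val x = Random.nextDouble()
    val y = Random.nextDouble()
    x * x + y * y < 1.0          // did the point land inside the quarter circle?
  }
  .count()

println(s"Pi is roughly ${4.0 * inside / numSamples}")
spark.stop()
```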
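For the WordCount lesson, a minimal RDD-based sketch. The S3 input path is a hypothetical placeholder; on Databricks a DBFS path would be used instead.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("wordcount").getOrCreate()

val counts = spark.sparkContext
  .textFile("s3://my-bucket/input.txt")             // hypothetical input location
  .flatMap(line => line.toLowerCase.split("\\W+"))  // split lines into words
  .filter(_.nonEmpty)
  .map(word => (word, 1))
  .reduceByKey(_ + _)                               // sum the 1s per word

counts.take(20).foreach(println)                    // preview a few counts on the driver
spark.stop()
```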
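For the transformations, actions, and caching lessons, one short sketch: transformations such as filter are lazy and only extend the DAG, actions such as count trigger execution, and cache() keeps a reused dataset in memory so later actions skip recomputation. The numeric range is an arbitrary illustration.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("caching-demo").getOrCreate()

val nums  = spark.sparkContext.parallelize(1 to 1000000)
val evens = nums.filter(_ % 2 == 0)  // transformation: only extends the DAG
evens.cache()                        // mark the result for in-memory reuse

println(evens.count())               // action: runs the DAG and fills the cache
println(evens.sum())                 // second action: served from cached partitions
spark.stop()
```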
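For the Spark SQL lesson, a sketch that loads a CSV file into a DataFrame, registers a temporary view, and queries it with SQL. The file path and the country/amount columns are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("spark-sql-demo").getOrCreate()

val sales = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/data/sales.csv")            // hypothetical dataset

sales.createOrReplaceTempView("sales")

spark.sql(
  """SELECT country, SUM(amount) AS total
    |FROM sales
    |GROUP BY country
    |ORDER BY total DESC""".stripMargin
).show()

spark.stop()
```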
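For the three Spark ML lessons (preparing data, building the model, evaluating the model), one combined sketch: assemble feature columns into a vector, fit a logistic regression inside a Pipeline, and score it with AUC on a held-out split. The dataset path and the f1/f2/label column names are assumptions for illustration only.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("spark-ml-demo").getOrCreate()
val raw = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/data/training.csv")         // hypothetical training data

// Prepare: combine raw columns into the single features vector Spark ML expects.
val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2"))
  .setOutputCol("features")

// Build: a simple logistic regression wired into a Pipeline.
val lr = new LogisticRegression().setLabelCol("label").setFeaturesCol("features")
val Array(train, test) = raw.randomSplit(Array(0.8, 0.2), seed = 42L)
val model = new Pipeline().setStages(Array(assembler, lr)).fit(train)

// Evaluate: area under the ROC curve on the held-out split.
val auc = new BinaryClassificationEvaluator()
  .setLabelCol("label")
  .evaluate(model.transform(test))
println(s"AUC = $auc")
spark.stop()
```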
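For the Spark Streaming lesson, a minimal Structured Streaming sketch that counts words arriving on a local socket and prints running totals to the console. The localhost:9999 source (fed, for example, by `nc -lk 9999`) is a placeholder for a quick demo; production pipelines would read from an ingest service such as Kinesis or Kafka.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("streaming-wordcount").getOrCreate()
import spark.implicits._

// Unbounded DataFrame of lines arriving on the socket.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// Split lines into words and keep a running count per word.
val counts = lines.as[String]
  .flatMap(_.split("\\s+"))
  .groupBy("value")
  .count()

// Emit the full updated result table to the console after each micro-batch.
counts.writeStream
  .outputMode("complete")
  .format("console")
  .start()
  .awaitTermination()
```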
Taught by
Lynn Langit