Cloud Hadoop: Scaling Apache Spark

via LinkedIn Learning

Overview

Generate genuine business insights from big data. Learn to implement Apache Hadoop and Spark workflows on AWS.

Syllabus

Introduction
  • Scaling Apache Hadoop and Spark
  • What you should know
  • Using cloud services
1. Hadoop and Spark Fundamentals
  • Modern Hadoop and Spark
  • File systems used with Hadoop and Spark
  • Apache or commercial Hadoop distros
  • Hadoop and Spark libraries
  • Hadoop on Google Cloud Platform
  • Spark job on Google Cloud Platform
2. AWS Cloud Spark Environments
  • Sign up for Databricks Community Edition
  • Add Hadoop libraries
  • Databricks AWS Community Edition
  • Load data into tables
  • Hadoop and Spark cluster on AWS EMR
  • Run Spark job on AWS EMR
  • Review batch architecture for ETL on AWS
3. Spark Basics
  • Apache Spark libraries
  • Spark data interfaces
  • Select your programming language
  • Spark session objects
  • Spark shell
4. Using Spark
  • Tour the Databricks environment
  • Tour the notebook
  • Import and export notebooks
  • Calculate Pi on Spark
  • Run WordCount on Spark with Scala
  • Import data
  • Transformations and actions
  • Caching and the DAG
  • Architecture: Streaming for prediction
5. Spark Libraries
  • Spark SQL
  • SparkR
  • Spark ML: Preparing data
  • Spark ML: Building the model
  • Spark ML: Evaluating the model
  • Advanced machine learning on Spark
  • MXNet
  • Spark with ADAM for genomics
  • Spark architecture for genomics
6. Spark Streaming
  • Reexamine streaming pipelines
  • Spark Streaming
  • Streaming ingest services
  • Advanced Spark Streaming with MLeap
7. Scaling Spark on AWS and GCP
  • Scale Spark on the cloud by example
  • Build a quick start with Databricks AWS
  • Scale Spark cloud compute with VMs
  • Optimize cloud Spark virtual machines
  • Use AWS EKS containers and data lake
  • Optimize Spark cloud data tiers on Kubernetes
  • Build reproducible cloud infrastructure
  • Scale on GCP Dataproc or on Terra.bio
Conclusion
  • Continue learning for scaling

Taught by

Lynn Langit

Reviews

4.6 rating at LinkedIn Learning based on 192 ratings
