Learn the essentials of Apache Hadoop, a key framework for storing and processing big data at scale.
Overview
Syllabus
Introduction
- What and why Hadoop?
- What you should know
- Use cloud services
- What is Hadoop?
- Review Hadoop distributions and cloud services
- Set up GCP Dataproc Metastore and VM cluster
- Verify GCP Dataproc VM cluster
- Understand Hadoop components
- Understand Java virtual machines (JVMs)
- Explore Hadoop file systems: HDFS
- Explore Hadoop file systems: AWS S3
- Review Hadoop cluster components
- Review test jobs
- Review job output
- Verify Hadoop web interfaces in your test environment
- Verify Hadoop Spark web interfaces in your test environment
- Use the Jupyter interface for Hadoop
- What is MapReduce?
- What is MapReduce word count?
- Review MapReduce word count job
- Prepare for MapReduce Java coding
- Review MapReduce WordCount job code
- Tune by physical methods
- Tune a Mapper
- Understand data types
- Tune a Reducer
- Use MR 2.0 and 3.0
- Review MR optimization examples
- Migrate to Cloud Hadoop
- Scale VM-based clusters
- Use autoscale policies
- Scale Kubernetes Spark clusters
- Understand Hive and HBase
- Create and query tables with Hive
- Understand Pig
- Run WordCount using Pig
- Review Spark architecture
- Scale a Spark job to calculate Pi
- Learn more about using Hadoop
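The WordCount lessons above center on the classic MapReduce example. As a conceptual sketch only, here is the map → shuffle → reduce flow in plain Python standing in for Hadoop's Java API (the function names are illustrative, not part of any Hadoop library):

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle/sort: group values by key, as the framework does between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reducer: sum the counts emitted for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["the quick brown fox", "the lazy dog", "the quick dog"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["the"])    # 3
print(counts["quick"])  # 2
```

In a real Hadoop job, mappers and reducers run as separate JVM tasks across the cluster and the shuffle moves data over the network; the logic per record, however, is exactly this simple.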
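The "Scale a Spark job to calculate Pi" lesson uses the standard Monte Carlo estimate: sample random points in the unit square and count how many land inside the quarter circle. A minimal single-machine sketch of that computation (no Spark involved; `estimate_pi` is an illustrative helper, not a Spark API):

```python
import random

def estimate_pi(num_samples, seed=42):
    # Monte Carlo: the fraction of points with x^2 + y^2 <= 1
    # approximates the quarter-circle area, Pi/4.
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

pi_estimate = estimate_pi(1_000_000)
print(pi_estimate)  # close to 3.14159
```

Spark scales this by parallelizing the sampling loop across executors and summing the per-partition counts, which is why the job distributes so cleanly.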
Taught by
Lynn Langit