This lab demonstrates how to launch an Amazon Elastic MapReduce (EMR) cluster for big data processing and use Hive with SQL-style queries to analyze data. You will create a Hadoop cluster using Amazon EMR, which lets you run interactive Hive queries against data stored in Amazon S3. You will use Hive to normalize the data into a more useful form, and then run queries to analyze it.
Level
Advanced
Duration
1 Hour 15 Minutes
Course Objectives
In this course, you will learn how to:
- Create an Amazon EMR cluster running Hive
- Use Hive statements to create tables from Google Ngram input data stored in Amazon S3
- Run Hive queries to drill down into and analyze data
Intended Audience
This course is intended for:
- Architects
- Data Engineers
Prerequisites
We recommend that attendees of this course have the following prerequisites:
- None
Course Outline
- Task 1: Launch an Amazon EMR Cluster
- Task 2: Connect to Your Cluster
- Task 3: Analyze Data