Data Processing with Azure

Overview

This Azure training course is designed to equip students with the knowledge needed to process, store, and analyze data for making informed business decisions. Through this course, students will learn what big data is and why big data analytics matters, and will strengthen their mathematical and programming skills. Students will also learn effective ways to use essential analytical tools such as Python, R, and Apache Spark.

Syllabus

Introduction

This Azure training course is designed to equip students with the knowledge needed to process, store, and analyze data for making informed business decisions. Through this course, students will learn what big data is and why big data analytics matters, and will strengthen their mathematical and programming skills. Students will also learn effective ways to use essential analytical tools such as Python, R, and Apache Spark.

Section 1 - Batch Processing with Databricks and Data Factory on Azure

One of the primary benefits of Azure Databricks is its ability to integrate with many other data environments to pull data through an ETL or ELT process. In this section, we examine each of the E, L, and T steps to learn how Azure Databricks can help ease us into a cloud solution.

Section 2 - Creating Pipelines and Activities

Processing big data in real-time is now an operational necessity for many businesses. Azure Stream Analytics is Microsoft’s serverless real-time analytics offering for complex event processing. In this section we examine how customers unlock valuable insights and gain competitive advantage by harnessing the power of big data.

Section 3 - Linked Services and Datasets

A data factory can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a task. The activities in a pipeline define actions to perform on your data. Before you create a dataset, you must create a linked service to link your data store to the data factory. This section deals with linked services and datasets within Azure Blob Storage.
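The relationships described above can be sketched as a small data model. This is only a conceptual illustration in plain Python, not the Azure Data Factory SDK or its JSON schema; every class and name here (LinkedService, Dataset, Activity, Pipeline, "ExampleBlobStore", and so on) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LinkedService:
    """Connection information for a data store (hypothetical model)."""
    name: str
    store_type: str


@dataclass
class Dataset:
    """A named view of data; it must point at an existing linked service."""
    name: str
    linked_service: LinkedService
    path: str


@dataclass
class Activity:
    """One action performed on data, here a copy from a source to a sink."""
    name: str
    source: Dataset
    sink: Dataset


@dataclass
class Pipeline:
    """A logical grouping of activities that together perform a task."""
    name: str
    activities: List[Activity] = field(default_factory=list)


# Step 1: the linked service comes first, because datasets refer to it.
blob_store = LinkedService(name="ExampleBlobStore", store_type="AzureBlobStorage")

# Step 2: datasets are defined on top of the linked service.
raw_files = Dataset(name="RawFiles", linked_service=blob_store, path="input/")
clean_files = Dataset(name="CleanFiles", linked_service=blob_store, path="output/")

# Step 3: a pipeline groups the activities that act on those datasets.
pipeline = Pipeline(
    name="ExamplePipeline",
    activities=[Activity(name="CopyRawToClean", source=raw_files, sink=clean_files)],
)

activity_names = [activity.name for activity in pipeline.activities]
print(activity_names)
```

Because a Dataset cannot be constructed without a LinkedService, the model enforces the same ordering the section describes: linked service first, then dataset, then pipeline.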

Section 4 - Schedules and Triggers

Azure Data Factory is a fully managed, cloud-based data orchestration service that enables data movement and transformation. In this section, we explore scheduling triggers for Azure Data Factory to automate your pipeline execution.

Section 5 - Selecting Windowing Functions

In time-streaming scenarios, performing operations on the data contained in temporal windows is a common pattern. Stream Analytics has native support for windowing functions, enabling developers to author complex stream processing jobs with minimal effort. In this section, we study windowing functions related to in-stream analytics.
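To make the idea of a temporal window concrete, here is a minimal sketch of a tumbling window (fixed-size, non-overlapping) written in plain Python. It is not Stream Analytics query syntax. The function name, the sample events, and the choice to label each window by its start time are assumptions made for illustration, and the service's own window-boundary conventions may differ.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per fixed-size, non-overlapping time window.

    `events` is an iterable of (timestamp_seconds, payload) pairs.
    Returns a dict mapping each window's start time to its event count.
    """
    counts = Counter()
    for timestamp, _payload in events:
        # Each event falls into exactly one window, identified by its start.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))


# Made-up sample events: (seconds since stream start, payload).
sample_events = [(1, "a"), (4, "b"), (9, "c"), (10, "d"), (12, "e"), (25, "f")]

window_counts = tumbling_window_counts(sample_events, window_seconds=10)
print(window_counts)  # {0: 3, 10: 2, 20: 1}
```

Other window types, such as overlapping or gap-based windows, differ mainly in how an event is assigned to one or more windows; the aggregation step stays the same.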

Section 6 - Configuring Input and Output for Streaming Data Solutions

This section teaches how to analyze phone call data using Azure Stream Analytics. The phone call data, generated by a client application, contains some fraudulent calls, which will be filtered by the Stream Analytics job.

Section 7 - ELT versus ETL in PolyBase

Traditional SMP data warehouses use an Extract, Transform, and Load (ETL) process for loading data. Azure SQL Data Warehouse uses a massively parallel processing (MPP) architecture that takes advantage of the scalability and flexibility of compute and storage resources. Using an Extract, Load, and Transform (ELT) process instead takes advantage of MPP and eliminates the resources needed to transform the data prior to loading.
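The ELT ordering can be sketched on a small scale as follows. This example uses Python's built-in sqlite3 module purely as a stand-in for a warehouse engine; it does not use PolyBase or Azure SQL Data Warehouse, and the table names and sample rows are made up. The point is only the order of steps: raw data is loaded untouched, and the transformation then runs as SQL inside the engine.

```python
import sqlite3

# Made-up raw extract: amounts arrive as untrimmed text.
raw_rows = [
    ("2024-01-01", " 19.99 "),
    ("2024-01-02", "5.00"),
    ("2024-01-02", " 7.50"),
]

conn = sqlite3.connect(":memory:")

# Extract + Load: land the data as-is in a staging table, with no cleanup.
conn.execute("CREATE TABLE staging_sales (sale_date TEXT, amount_text TEXT)")
conn.executemany("INSERT INTO staging_sales VALUES (?, ?)", raw_rows)

# Transform: clean and aggregate inside the engine, after loading.
conn.execute(
    """
    CREATE TABLE daily_sales AS
    SELECT sale_date,
           ROUND(SUM(CAST(TRIM(amount_text) AS REAL)), 2) AS total
    FROM staging_sales
    GROUP BY sale_date
    """
)

daily_totals = conn.execute(
    "SELECT sale_date, total FROM daily_sales ORDER BY sale_date"
).fetchall()
print(daily_totals)  # [('2024-01-01', 19.99), ('2024-01-02', 12.5)]
conn.close()
```

In an ETL flow, the trimming and casting would instead happen in a separate tool before the insert; in ELT, that work is pushed to the engine that already holds the data.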