Distributed programming on the cloud

Microsoft via Microsoft Learn

Overview

  • Module 1: Carnegie Mellon University's cloud developer course. Learn about distributed programming and why it's useful for the cloud, including programming models, types of parallelism, and symmetrical vs. asymmetrical architecture.
  • In this module, you will:

    • Classify programs as sequential, concurrent, parallel, and distributed
    • Indicate why programmers usually parallelize sequential programs
    • Explain why cloud programs are important for solving complex computing problems
    • Define distributed systems, and indicate the relationship between distributed systems and clouds
    • Define distributed programming models
    • Indicate why synchronization is needed in shared-memory systems
    • Describe how tasks can communicate by using the message-passing programming model
    • Outline the difference between synchronous and asynchronous programs
    • Explain the bulk synchronous parallel (BSP) model
    • Outline the difference between data parallelism and graph parallelism
    • Distinguish between these distributed programs: single program, multiple data (SPMD); and multiple program, multiple data (MPMD)
    • Discuss the two main techniques that can be incorporated in distributed programs so as to address the communication bottleneck in the cloud
    • Define heterogeneous and homogeneous clouds, and identify the main reasons for heterogeneity in the cloud
    • State when and why synchronization is required in the cloud
    • Identify the main technique that can be used to tolerate faults in clouds
    • Outline the difference between task scheduling and job scheduling

    In partnership with Dr. Majd Sakr and Carnegie Mellon University.
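
The objective on synchronization in shared-memory systems can be illustrated with a minimal Python sketch. It is not taken from the course material: the function name `run_counter` and its parameters are hypothetical, and threads in one process stand in for tasks sharing memory. Each thread performs a read-modify-write on the same variable, and the lock ensures that no update is lost.

```python
import threading


def run_counter(num_threads=4, increments=10_000):
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # The lock makes the read-modify-write below atomic with
            # respect to the other threads sharing `counter`.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


total = run_counter()
print(total)
```

Without the lock, two threads could read the same old value and both write back the same new one, so the final count could fall short of `num_threads * increments`.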

  • Module 2: Carnegie Mellon University's cloud developer course. MapReduce was a breakthrough in big data processing that has since become mainstream and been improved upon significantly. Learn how MapReduce works.
  • In this module, you will:

    • Identify the underlying distributed programming model of MapReduce
    • Explain how MapReduce can exploit data parallelism
    • Identify the input and output of map and reduce tasks
    • Define task elasticity, and indicate its importance for effective job scheduling
    • Explain the map and reduce task-scheduling strategies in Hadoop MapReduce
    • List the elements of the YARN architecture, and identify the role of each element
    • Summarize the lifecycle of a MapReduce job in YARN
    • Compare and contrast the architectures and the resource allocators of YARN and the previous Hadoop MapReduce
    • Indicate how job and task scheduling differ in YARN as opposed to the previous Hadoop MapReduce

    In partnership with Dr. Majd Sakr and Carnegie Mellon University.
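
As a rough illustration of the inputs and outputs of map and reduce tasks named in the objectives above, here is a single-process Python sketch of a word count. It is not Hadoop code and not taken from the course: `map_task`, `shuffle`, `reduce_task`, and `word_count` are hypothetical names, and a real framework would run many map and reduce tasks in parallel across machines.

```python
from collections import defaultdict


def map_task(line):
    """Map: take one input record and emit (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce: fold the values for one key into a single output pair."""
    return key, sum(values)


def word_count(lines):
    intermediate = []
    for line in lines:
        # Each line is independent of the others, which is the
        # data parallelism a real framework would exploit.
        intermediate.extend(map_task(line))
    grouped = shuffle(intermediate)
    return dict(reduce_task(key, values) for key, values in grouped.items())


counts = word_count(["the cloud runs the job", "the job ends"])
print(counts)
```

Because each map call depends only on its own input record, and each reduce call only on the values for its own key, both phases can be split across workers without coordination inside a phase.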

  • Module 3: Carnegie Mellon University's cloud developer course. GraphLab is a big data tool developed by Carnegie Mellon University to help with data mining. Learn how GraphLab works and why it's useful.
  • In this module, you will:

    • Describe the unique features in GraphLab and the application types that it targets
    • Recall the features of a graph-parallel distributed programming framework
    • Recall the three main parts in the GraphLab engine
    • Describe the steps that are involved in the GraphLab execution engine
    • Discuss the architectural model of GraphLab
    • Recall the scheduling strategy of GraphLab
    • Describe the programming model of GraphLab
    • List and explain the consistency levels in GraphLab
    • Describe the in-memory data placement strategy in GraphLab and its performance implications for certain types of graphs
    • Discuss the computational model of GraphLab
    • Discuss the fault-tolerance mechanisms in GraphLab
    • Identify the steps that are involved in the execution of a GraphLab program
    • Compare and contrast MapReduce, Spark, and GraphLab in terms of their programming, computation, parallelism, architectural, and scheduling models
    • Identify a suitable analytics engine given an application's characteristics

    In partnership with Dr. Majd Sakr and Carnegie Mellon University.

  • Module 4: Carnegie Mellon University's cloud developer course. Spark is an open-source cluster-computing framework whose strengths differ from those of MapReduce. Learn how Spark works.
  • In this module, you will:

    • Recall the features of an iterative programming framework
    • Describe the architecture and job flow in Spark
    • Recall the role of resilient distributed datasets (RDDs) in Spark
    • Describe the properties of RDDs in Spark
    • Compare and contrast RDDs with distributed shared-memory systems
    • Describe fault-tolerance mechanics in Spark
    • Describe the role of lineage in RDDs for fault tolerance and recovery
    • Understand the different types of dependencies between RDDs
    • Understand the basic operations on Spark RDDs
    • Step through a simple iterative Spark program
    • Recall the various Spark libraries and their functions

    In partnership with Dr. Majd Sakr and Carnegie Mellon University.

  • Module 5: Carnegie Mellon University's cloud developer course. The growth in available data has produced continuous streams of real-time data that must be processed. Learn about different systems and techniques for consuming and processing real-time data streams.
  • In this module, you will:

    • Define a message queue and recall a basic architecture
    • Recall the characteristics, and present the advantages and disadvantages, of a message queue
    • Explain the basic architecture of Apache Kafka
    • Discuss the roles of topics and partitions, as well as how scalability and fault tolerance are achieved
    • Discuss general requirements of stream processing systems
    • Recall the evolution of stream processing
    • Explain the basic components of Apache Samza
    • Discuss how Apache Samza achieves stateful stream processing
    • Discuss the differences between the Lambda and Kappa architectures
    • Discuss the motivation for the adoption of message queues and stream processing in the LinkedIn use case

    In partnership with Dr. Majd Sakr and Carnegie Mellon University.

Syllabus

  • Module 1: What is distributed programming?
    • Introduction
    • Categories of computer programs
    • Why use distributed programming?
    • Distributed programming on the cloud
    • Programming models for clouds
    • Synchronous vs. asynchronous computation
    • Types of parallelism
    • Symmetrical vs. asymmetrical architecture
    • Cloud challenges: Scalability
    • Cloud challenges: Communication
    • Cloud challenges: Heterogeneity
    • Cloud challenges: Synchronization
    • Cloud challenges: Fault tolerance
    • Cloud challenges: Scheduling
    • Summary
  • Module 2: Distributed computing on the cloud: MapReduce
    • Introduction
    • Programming model
    • Data structure
    • Example MapReduce programs
    • Computation and architectural models
    • Job and task scheduling
    • Fault tolerance
    • YARN
    • Summary
  • Module 3: Distributed computing on the cloud: GraphLab
    • Introduction
    • Data structure and graph flow
    • Architectural model
    • Programming model
    • Computational model
    • Fault tolerance
    • An example application in GraphLab
    • Comparison of distributed analytics engines
    • Summary
  • Module 4: Distributed computing on the cloud: Spark
    • Introduction
    • Spark overview
    • Resilient distributed datasets
    • Lineage, fault tolerance, and recovery
    • Programming in Spark
    • The Spark ecosystem
    • Summary
  • Module 5: Message queues and stream processing
    • Introduction
    • Message queues
    • Message queues: Case study
    • Stream processing systems
    • Streaming architectures: Case study
    • Big data processing architectures
    • Real-time architectures in practice
    • Summary
