Fault Tolerance in Distributed Systems - A Case Study in Apache Spark

Scala Days Conferences via YouTube

Overview

Explore fault tolerance in distributed systems through a case study of Apache Spark in this Scala Days conference talk. Examine the challenges of building robust distributed computing platforms and take away lessons that apply to developing your own systems. See how Spark implements fault tolerance, including its use of Scala and functional programming principles, as well as the places where it deviates from concepts like immutability. Learn about the Spark computation model, how it resembles MapReduce, and how it extends beyond that paradigm. Understand what fault tolerance means in practice, including how to handle hardware failures and why fault injection testing matters. Discover the limits of platform guarantees and the questions to ask when evaluating distributed systems. Real-world examples and code analysis illustrate how fault tolerance is implemented and what it implies for both system developers and users.

Syllabus

Intro
Spark Computation Model: Like MapReduce
Beyond MapReduce
What is Fault Tolerance?
One Bad Disk Spoils The Whole Bunch
Handling Flaky Hardware (SPARK-8425)
Miracles do happen
Testing Fault Tolerance
Fault Injection Testing
What Fault Tolerance might mean to you

Taught by

Scala Days Conferences
