Overview
Explore a PLDI 2023 conference talk that presents a fault-tolerant programming model for cloud applications based on actors, retry orchestration, and tail calls. See how the model uses persistent data stores and message queues to stay resilient to failures and interruptions. Learn the three guarantees of retry orchestration: a failed actor invocation is always retried, a completed invocation is never repeated, and happen-before relationships within call stacks are preserved across failures. Discover how tail calls break complex tasks into simpler steps, so that less work is re-executed during recovery. Examine key application patterns, failure scenarios, and a process calculus that formalizes the fault-tolerance mechanisms. Gain insight into the implementation and its functional correctness, validated through an enterprise-inspired application scenario. Assess how fault preparedness and recovery affect performance in cloud-based systems.
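The retry and tail-call guarantees described in this overview can be illustrated with a minimal Python sketch. It is not the implementation presented in the talk. The names `Orchestrator`, `TailCall`, `reserve`, and `charge` are hypothetical, an in-memory dict and deque stand in for the persistent data store and message queue, and a raised `RuntimeError` stands in for a crash. The sketch shows a failed invocation being retried, a completed invocation being skipped, and a tail call limiting re-execution to the step that failed. It does not model the preservation of happen-before relationships across nested calls.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any


@dataclass
class TailCall:
    """Returned by a step to hand the rest of the task to another step."""
    step: str
    arg: Any


class Orchestrator:
    """Toy retry orchestrator (hypothetical, for illustration only)."""

    def __init__(self, steps):
        self.steps = steps        # step name -> function
        self.queue = deque()      # stand-in for a durable message queue
        self.completed = {}       # stand-in for a persistent store: id -> result
        self.next_id = 0

    def submit(self, step, arg):
        inv_id = self.next_id
        self.next_id += 1
        self.queue.append((inv_id, step, arg))
        return inv_id

    def run(self):
        last = None
        while self.queue:
            inv_id, step, arg = self.queue.popleft()
            if inv_id in self.completed:
                continue  # a completed invocation is never repeated
            try:
                result = self.steps[step](arg)
            except RuntimeError:
                # A failed invocation is put back on the queue and retried.
                self.queue.append((inv_id, step, arg))
                continue
            if isinstance(result, TailCall):
                # Completing this step and enqueuing the next one happen
                # together, so a later failure re-runs only the later step.
                self.completed[inv_id] = None
                self.submit(result.step, result.arg)
            else:
                self.completed[inv_id] = result
                last = result
        return last


calls = {"reserve": 0, "charge": 0}


def reserve(order):
    calls["reserve"] += 1
    return TailCall("charge", order + ["reserved"])


def charge(order):
    calls["charge"] += 1
    if calls["charge"] == 1:
        raise RuntimeError("simulated crash")  # first attempt fails
    return order + ["charged"]


orch = Orchestrator({"reserve": reserve, "charge": charge})
orch.submit("reserve", ["order-1"])
final = orch.run()
print(final, calls)
```

In this run `charge` executes twice because its first attempt fails, while `reserve` executes only once: the tail call marked it complete before `charge` started, so recovery does not repeat it.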
Syllabus
[PLDI'23] Reliable Actors with Retry Orchestration
Taught by
ACM SIGPLAN