Learn strategies for post-incident management in this AWS re:Invent conference session, which covers the phases that follow system failures and incidents. The session explores approaches to root cause analysis, preventive action planning, and implementing lasting fixes in cloud architectures. It presents mental models and real-world experiences for identifying true root causes, including the complications that shared responsibility models and third-party vendor relationships introduce. It also covers turning incidents into learning opportunities, establishing a robust Correction of Error (COE) practice, and building a culture of continuous improvement in your organization. The focus is on operational practices that go beyond immediate incident resolution to support long-term system reliability and organizational learning.
Syllabus
AWS re:Invent 2024 - The incident is over: Now what? (ARC207)
Taught by
AWS Events