Overview
Explore principles of cognition, teaming, and coordination to support high-performance and resilient Site Reliability Engineering in this 57-minute conference talk from SREcon22 APAC. Discover how to design work systems that enhance SREs' ability to recognize anomalies, adapt to changing conditions, and effectively coordinate across organizational boundaries. Learn about cognitive and coordinative mechanisms underlying resilient software engineering, drawing from engineering psychology, design thinking, cognitive systems engineering, and contemporary management theory. Gain insights from practical case studies and engaging stories collected over five years of studying software engineers at work. Delve into topics such as finely tuned skills, team dynamics, resilient organizations, incident management, and the Howie Guide for incident reviews. Understand the importance of cognition and coordination in SRE practice, explore multifaceted uses of incident reports, and examine the concept of learning in this context. Discuss organizational culture, change management, and experimentation while addressing the challenges of avoiding silos, normalizing knowledge gaps, and fostering a culture of knowledge sharing.
Syllabus
Introduction
Agenda
Cognitive Work
Finely Tuned Skills
Coordination
Team Change
Resilient Organizations
Incidents
Howie Guide
Incident Review
Why Does a Focus on Cognition and Coordination Matter
Multifaceted Use of Incident Reports
What Does Learning Mean
Interaction
Culture
Building Culture
Organizational Change Management
Experimentation
Avoiding Silos of Expert Knowledge
Normalizing Knowledge Gaps
Hand Raises Are Changing
Who Knows
Everyone Is Sharing
Taught by
USENIX