Overview
Syllabus
intro
preamble
sre 2.0: amplifying reliability with genai
agenda
quick intro about myself
gartner sre hype cycle
sre
navigating digital transformation: managing ever-growing complexity
operations is a software problem
genai emerges: unveiling the power of next-gen artificial intelligence
unveiling the potential: the capabilities of llms
navigating challenges: risks associated with llms
addressing model challenges: finding effective solutions
retrieval-augmented generation (rag) / knowledge bases
llm agents
prompt engineering best practices
prompt engineering properties
sre 2.0
genai in observability
use case - analyze log data to automatically identify root causes of performance issues
genai in sli, slo, and error budgets
use case - recommend optimal error budget allocations based on business priorities and user expectations
genai in system architecture and recovery objectives
use case - predict the impact of different failure scenarios on system availability and performance
genai in release & incident engineering
use case - provide real-time incident response recommendations based on the current situation and historical data
genai in automation
use case - analyze the effectiveness of automation workflows and recommend improvements based on performance metrics
genai in resilience engineering
use case - automate the execution of chaos experiments based on identified risk factors and failure scenarios
genai in blameless postmortems
use case - analyze historical post-mortem data to identify recurring patterns and trends in incidents
measure progress with business outcomes
best practices
pitfalls to avoid
thank you.
Taught by
Conf42