Explore all talks and presentations from SREcon. Dive deep into the latest insights, research, and trends from the world's leading experts.
Explore distributed tracing in real-time data streaming systems, focusing on challenges and solutions for trading platforms, including session tracking, data flow management, and storage optimization.
Explore principles and tools for safer production environments through automation, safe proxies, and audited break-glass, reducing human errors and insider threats in system operations.
Explore the limitations of machine learning in production engineering, debunking common misconceptions and discussing feasible applications for SREs.
Learn how Squarespace's team adopted SRE practices to transform their unreliable logging platform into a trusted system with 99.9% uptime, with insights and strategies for improving service reliability.
Explore systems thinking for safety and cybersecurity, integrating approaches to manage emergent properties and control problems in complex systems.
Explore strategies for efficient systems data management, including sampling and aggregation techniques, to maintain crucial information while reducing data volume and costs.
Practical strategies for implementing Site Reliability Engineering principles in resource-constrained environments, focusing on gradual improvements and stress reduction for engineering teams.
Explore Stripe's approach to prioritizing technical infrastructure investments, balancing firefighting and innovation, and enabling long-term success through strategic decision-making and resource allocation.
Explore the journey of defining effective SLOs for data-intensive services, focusing on search engines. Learn about monitoring processes, consistency, and automated mitigation strategies for complex systems.
Explore Google's SRE training program, featuring hands-on exercises in a safe environment. Learn how SRE principles were applied to improve the curriculum, minimize toil, and enhance reliability through automation and monitoring.
Learn to quickly estimate system performance using base rates and napkin math, enabling informed decision-making in technical discussions and design processes without building systems first.
Learn to create a PID controller for autoscaling Kubernetes deployments, ensuring smooth scaling based on custom targets. Explore control theory principles and their application in SRE practices.
Transforming engineering culture: One SRE's journey from chaos to improved reliability, featuring practical tips on implementing SLIs, reducing incident response times, and fostering organizational change.
Explore Instagram's global infrastructure scaling challenges and solutions, focusing on cross-continent deployment, latency issues, and strategies for managing stateful services.
Explore Linux memory management for reliable, large-scale systems. A Facebook SRE covers swap, cgroups, PSI, and memory protection while debunking common myths.