Explore all talks and presentations from SREcon. Dive deep into the latest insights, research, and trends from the world's leading experts.
Explore adaptive concurrency control for real-time analytics, addressing timeouts in mixed workloads. Learn iterative improvements for better user experience in query services.
Discover how Wikipedia uses W3C's Network Error Logging to detect user-facing outages in real-time. Learn about implementation strategies and case studies of outages missed by traditional monitoring methods.
Explore how SREs can align mental models with system reality using resilience stress testing and decision trees. Learn practical tools for documenting, visualizing, and improving complex software systems.
Insights on platform engineering best practices, including product management, developer retention, trust-building, and re-skilling ops staff, drawn from diverse organizations' experiences over the past decade.
Explore the pitfalls of blindly adopting others' platform strategies. Learn to retain agency, identify needs, and avoid common mistakes in platform engineering for your unique business context.
Explore J.P. Morgan's transition to public cloud, focusing on SRE's role in overcoming regulatory, technical, and organizational challenges while ensuring stability and reliability.
Explore the evolution of Honeycomb's Kafka cluster and telemetry systems, covering scaling strategies, infrastructure choices, and best practices for handling 10x data volume growth.
Demystifying OpenTelemetry metrics: Learn about different metric instruments, their implementation, and how they can enhance your understanding of system performance and error rates.
Explore scaling Prometheus for massive metrics installations, covering field hint indices, query pushdown, GitOps deployment, and lessons learned from eBay's journey to planet-scale observability.
Discover how Spotify transforms incident reports into valuable insights for improving system operations and work processes, extracting meaningful data from seemingly mundane paperwork.
Explore building an open-source APM using OpenTelemetry, Prometheus, and Jaeger. Learn implementation strategies, risks, and upcoming improvements in the OTel community for cost-effective application performance monitoring.
Explore chaos experimentation as a test-driven development approach for distributed systems, enhancing reliability and validating changes throughout the software lifecycle.
Explore how SRE teams evolve as startups grow, focusing on organizational changes, advocacy strategies, and overcoming technical debt to support rapid scaling and maintain operational excellence.
Explore SRE as a cultural force for change: how it has evolved, how it can drive reliability in organizations, and what parallels it shares with social movements and people-centric approaches.
Explore TLA+ for reliable system design through a hands-on approach, solving a subtle concurrency issue to demonstrate its power in preventing critical failures and reducing debugging time.