Explore all talks and presentations from SREcon. Dive deep into the latest insights, research, and trends from the world's leading experts.
Exploring burnout in SRE: causes, impacts, and strategies for recovery. Learn to apply system reliability principles to personal resilience and mental health management.
Explore Sloth, a Go tool for simulating network failures in cloud infrastructure. Learn how it helps discover and fix issues in monitoring, graceful degradation, and usability through controlled network disruptions.
Explore time series-based alerting methods and practical examples in Prometheus to reduce alert spam and improve production hygiene for more efficient monitoring at scale.
Explore the concept of SREs taking joint operational responsibility for customer-built systems on platforms, addressing challenges and discussing future implications for SRE as a discipline.
Explore Twitter's Aperture algorithm for non-cooperative, client-side load balancing in large-scale RPC frameworks, addressing scalability challenges in service clusters with thousands of instances.
Explore SLO limitations in complex services, understand failure modes, and learn best practices for robust SLO construction to improve reliability and decision-making in modern systems.
Strategies for engineering leaders to identify and overcome organizational friction, enhancing team efficiency and driving positive change in complex technical environments.
Explore the evolution of Site Reliability Engineering (SRE) from its Google origins to widespread adoption, and ponder its future as the field matures and faces new challenges in the tech industry.
Explore zero-downtime techniques for rebalancing and migrating data in multi-shard platforms, focusing on MySQL binlog-based tools for load distribution and tenant isolation.
Explore the importance of high-quality reliability measurements in SRE, learn tips for implementing them effectively, and understand why they are crucial to the engineering work of SRE organizations.
Learn how to overhaul and centralize a complex monitoring system, creating a user-friendly solution that encourages developer engagement and continuous improvement.
Explore effective methods for implementing Latency Service Level Objectives (SLOs), addressing common pitfalls in latency measurement and aggregation for improved service quality tracking.
Explore how CERN's new monitoring system improved service operations and fostered SRE practices, discussing design decisions, operational challenges, and the benefits of implementing SLIs/SLOs.
Explore HBase architecture, components, and key metrics for efficient high-availability operations in this comprehensive overview of the Hadoop-based key-value datastore.
Comprehensive overview of Linux network interfaces, from traditional to modern cloud-native types. Explores VMs, containers, and service meshes as an aid to debugging cloud platforms.