Explore all talks and presentations from SREcon. Dive deep into the latest insights, research, and trends from the world's leading experts.
Veteran systems engineer shares invaluable lessons from two decades of failures, offering insights on designing, running, and troubleshooting large-scale online services to help others avoid costly mistakes.
Explore secure pod-to-pod communication using Mutual TLS, focusing on certificate management, implementation in Golang and Java applications, and integration with Kubernetes for streamlined security.
Insights from a seasoned systems engineer on designing and running large-scale online services, drawing from 20+ years of failures and lessons learned in the tech industry.
Explore Shopify's journey from a single-database Rails app to a multi-datacenter setup, focusing on scalability, resiliency, and disaster recovery strategies for multi-tenant architectures.
Explore Google's Doorman system for global distributed client-side rate limiting, coordinating resource usage across multiple clients to prevent capacity overload.
Explore how restaurant operations parallel computer system reliability, drawing insights from dining experiences to enhance understanding of complex system management and fault tolerance.
Optimize high-scale production systems with Kafka and MySQL binlog for fresh, fast data distribution. Learn to balance caching tradeoffs and improve performance across thousands of services.
Strategies for managing metrics growth and cardinality in cloud-native environments, focusing on best practices, KPIs, and team collaboration to enhance observability and streamline remediation processes.
Explore reliability engineering beyond traditional SRE practices. Learn about concrete models, underlying mechanisms, and new strategies to enhance service reliability and tackle complex challenges.
Explore hidden challenges in incident management, including diagnostic work, coordination costs, and decision-making dilemmas, with insights on recognizing and handling them.
Explore nine key questions for building effective infrastructure automation pipelines, focusing on modular design, intent-driven approaches, and seamless tool integration for continuous delivery.
Explore machine learning-driven automation for optimizing Kubernetes microservices and JVM settings, enhancing performance, efficiency, and cost-effectiveness in complex tech stacks.
Explore complex systems' traits and learn better approaches to incident analysis beyond linear root-cause methods. Gain insights from history, science, and philosophy to enhance understanding of resilience in modern organizations.
Explore Slack's evolution in incident management, covering strategies for handling numerous incidents, team-wide capability building, and future directions in maintaining platform reliability.
Explore techniques for managing and improving reliability in large-scale machine learning production systems, focusing on common failure modes, best practices, and practical strategies for SREs.