Stories from the SRE, DevOps, and programming world about the reliability of software systems and best practices for achieving it.

Pinned
Event Report: Best Practices for Using AWS Distro for OpenTelemetry
Downsampling in Prometheus: Practical Strategies to Manage Cardinality and Query Performance
SRE vs. Platform Engineering
Managed Prometheus Services: The Superiority of Levitate
Telemetry in the Cloud Native World: An In-Depth Look
Understanding Mean Time Between Incidents (MTBI)