Last9 of Reliability

Stories from the SRE/DevOps and programming world about the Reliability of Software systems, and best practices.

Pinned

Event Report: Best Practices for using Amazon Distro for OpenTelemetry

Prathamesh Sonpatki

Jun 8, 2023 · 1 min read · 70 views

On 7th June 2023, we hosted Michael Hausenblas on our Discord Community for a talk on "Best Practices for using Amazon Distro for...

Downsampling in Prometheus: Practical Strategies to Manage Cardinality and Query Performance

Prathamesh Sonpatki

Nov 12, 2023 · 4 min read · 803 views

Prometheus is an open-source monitoring and alerting toolkit widely used for collecting and querying metrics from various systems and...

SRE vs. Platform Engineering

Ujjwal Goyal

Nov 12, 2023 · 2 min read · 12 views

Unveiling the Distinctions: SRE vs. Platform Engineering. In the ever-evolving software development and operations landscape, Site...

Managed Prometheus Services: The Superiority of Levitate

Prathamesh Sonpatki

Sep 13, 2023 · 2 min read · 30 views

Prometheus, an open-source monitoring and alerting toolkit, has become integral to the cloud-native ecosystem. Its robust capabilities and flexible...

Telemetry in the Cloud Native World: An In-Depth Look

Prathamesh Sonpatki

Aug 22, 2023 · 3 min read · 16 views

Cloud-native architectures have revolutionized how we develop, deploy, and manage applications. Understanding their behavior, performance, and health...

Understanding Mean Time Between Incidents (MTBI)

Prathamesh Sonpatki

Aug 13, 2023 · 2 min read · 31 views

The term "Mean Time Between Incidents" or MTBI might sound technical, but it is fundamental to any organization that seeks continuous improvement in...
