Slight Reliability Episode 87 - Measuring the value of SRE with Artem Yakimenko
Thursday, Jul 25, 2024 | 2 minute read | Updated at Thursday, Jul 25, 2024
Podcast - Slight Reliability
Summary:
- Podcast Title and Host: The podcast is titled “Slight Reliability” and is hosted by Stephen Townshend.
- Guest Background: The guest, Artem Yakimenko, is from Culture Amp. He has a background in operations and reliability engineering, including time at Google working on Google Cloud Storage and Customer Reliability Engineering.
- Discussion Topic: The main discussion is about making reliability meaningful in organizations, specifically how it affects business value and customer experience.
- Guest’s Role and Activities: Although he is an engineering director, Artem stays hands-on by taking support shifts to understand his teams' challenges and deployment issues. He says this gives him valuable insight into day-to-day operational problems.
- Measuring Reliability: Artem argues that although the impact of reliability engineering is hard to express in simple monetary terms, it can be gauged by combining metrics such as uptime, service level agreements (SLAs), and service level objectives (SLOs) with qualitative analysis of support ticket data to understand the effect on user experience.
- Concrete Example: Artem describes searching support tickets for keywords such as “latency” or “slow” to find and prioritize areas where reliability should improve. This surfaces the most common issues affecting users, which can then be addressed to improve system performance and user satisfaction.
- Importance of Financial Ties: The discussion also covers the business side, emphasizing that reliability work should be aligned with revenue-generating activities. Artem suggests focusing improvement efforts on the areas most critical to customer experience and, in turn, to revenue.
- Reliability in Business Context: Finally, the conversation turns to communicating the value of reliability engineering to business stakeholders, so that they see it as important not only for preventing downtime but also for building the company's reputation and maintaining customer trust and satisfaction.

This summary gives an overview of the key themes of the episode: connecting technical roles with business outcomes, and continuous improvement driven by practical, data-driven insights.
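The support-ticket keyword analysis described in the episode can be sketched in a few lines. This is a minimal illustration, not Culture Amp's actual tooling: it assumes tickets are available as plain-text strings, and the keyword groups and the `tally_ticket_themes` function are hypothetical. Each ticket is counted at most once per group, so the most frequently reported theme comes out on top and can be prioritized.

```python
import re
from collections import Counter

# Hypothetical keyword groups; only "latency" and "slow" come from the episode.
KEYWORD_GROUPS = {
    "performance": ["latency", "slow", "timeout"],
    "availability": ["down", "outage", "error"],
}


def tally_ticket_themes(tickets, keyword_groups=KEYWORD_GROUPS):
    """Count how many tickets mention at least one keyword from each group."""
    counts = Counter()
    for text in tickets:
        # Lowercase and split into alphabetic words so "Slow," matches "slow".
        words = set(re.findall(r"[a-z]+", text.lower()))
        for group, keywords in keyword_groups.items():
            # A ticket counts once per group, however many keywords it contains.
            if words.intersection(keywords):
                counts[group] += 1
    return counts


# Hypothetical sample tickets for illustration.
sample_tickets = [
    "Dashboard is really slow this morning",
    "Report export hit a timeout",
    "High latency when loading the home page",
    "Login page shows an error",
]

if __name__ == "__main__":
    for theme, count in tally_ticket_themes(sample_tickets).most_common():
        print(f"{theme}: {count}")
```

Plain keyword matching is crude (it misses phrasings like "takes forever"), but it is cheap to run over a large ticket backlog and gives a first ranking to pair with SLO data.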
Listen to the episode: YouTube