Slight Reliability Episode 89 - Blameless Post-mortems with Karanveer Anand

Wednesday, Sep 4, 2024 | 2 minute read | Updated at Wednesday, Sep 4, 2024

Podcast - Slight Reliability


Summary:

  1. Introduction of the Podcast:
    • The podcast series, named “Slight Reliability,” focuses on learning about Site Reliability Engineering (SRE) and observability.
    • Host Stephen Townshend introduces guest Karanveer Anand, a Technical Program Manager at Google, who has a background in software reliability.
  2. Topic Discussion - Blameless Post-Mortems:
    • The podcast specifically addresses the concept of blameless post-mortems.
    • Anand discusses the importance of a blameless approach, highlighting that it promotes a learning culture rather than a focus on individual mistakes.
    • Blameless post-mortems are described as examining products and processes in order to improve them, rather than assigning personal blame.
  3. Benefits of Blameless Post-Mortem:
    • Blameless post-mortems lead to well-documented records of incidents, which help in identifying preventive measures and reducing recovery times for future incidents.
  4. Process of Conducting a Post-Mortem:
    • Anand describes the process in three phases: preparation before the post-mortem, the post-mortem itself, conducted with an emphasis on psychological safety, and follow-up activities afterwards, which include sharing the lessons learned widely.
    • A collaborative document is essential for gathering input from all relevant stakeholders during the post-mortem.
  5. Implementation and Follow-up:
    • Effective post-mortems require assignment of clear action items and ownership to ensure follow-through on identified improvements.
    • Public accountability mechanisms, such as sharing action items widely, are suggested to ensure commitments are met.
  6. Importance of Regular Practice:
    • Anand stresses the importance of treating post-mortems as a regular, integral activity to continuously improve systems and prevent recurring issues.

This summary covers the key points discussed about blameless post-mortems as a way to improve organizational learning and reliability in engineering practice.
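
The follow-up points in the summary above (clear action items, ownership, and sharing them widely) could be tracked with a small record like the one below. This is only a minimal sketch, not something described in the episode: the `ActionItem` and `Postmortem` classes, their fields, and the team names are all hypothetical, and owners are modelled as teams rather than individuals to stay in the blameless spirit.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ActionItem:
    """One follow-up task from a post-mortem (hypothetical structure)."""
    description: str
    owner: Optional[str] = None  # team responsible for follow-through
    done: bool = False


@dataclass
class Postmortem:
    """A post-mortem record holding its action items (hypothetical structure)."""
    title: str
    action_items: List[ActionItem] = field(default_factory=list)

    def unowned(self) -> List[ActionItem]:
        # Items nobody has taken on yet; these are the ones likely to be dropped.
        return [a for a in self.action_items if a.owner is None]

    def open_items(self) -> List[ActionItem]:
        # Items that still need work, whether or not they have an owner.
        return [a for a in self.action_items if not a.done]

    def share_summary(self) -> str:
        # Plain-text status list suitable for sharing widely.
        lines = [f"Post-mortem: {self.title}"]
        for a in self.action_items:
            status = "done" if a.done else "open"
            owner = a.owner or "UNASSIGNED"
            lines.append(f"- [{status}] {a.description} (owner: {owner})")
        return "\n".join(lines)


pm = Postmortem(
    title="Checkout latency incident",
    action_items=[
        ActionItem("Add alert on queue depth", owner="payments-team", done=True),
        ActionItem("Document rollback procedure", owner="platform-team"),
        ActionItem("Review retry configuration"),
    ],
)
print(pm.share_summary())
```

Checking `pm.unowned()` before the post-mortem is closed is one way to make sure every action item has an owner, and publishing the output of `share_summary()` is one possible form of the public accountability the summary mentions.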

Listen to the episode: YouTube

About this site

This site is a list of summaries of Ops and SRE related podcast episodes.

I built this to fulfill a personal need.

There are so many podcasts with valuable content out there, but it’s impossible for me to listen to them all in their entirety. These summaries give me a starting point for deciding which episodes cover things I need to know more about. Based on that, I go and listen to the episode.

The summaries are auto-generated by an LLM from the episodes, so it’s possible there are minor errors. I try my best to correct any that I notice. Please reach out to let me know if you come across any.

I would encourage users of this site to go and listen to the actual podcast episodes that they find interesting based on the summaries.

I am not affiliated with any of the podcasts or their authors.

All feedback is welcome. My contact info