Done Criteria
We have a runbook for SP approvers that covers:
How often to check whether approved SPs are still meeting the approval criteria
What to do when they aren't
Which channel to post in, or where to open an issue tracking the action
How long to wait before unapproving them
Commands to run to unapprove them
Why Important
This increases the bus factor for a critical area that @TippyFlitsUK has been handling by default.
User/Customer
Notes
We can allow a time window (e.g., 1–4 hours) to diagnose the problem and see whether the SP recovers.
Reasons to give a window:
Avoid flapping the approved list on minor or transient issues.
Allow James to account for known short maintenance periods, etc.
However, from a risk standpoint:
If an SP is failing for longer than that window, it is better to unapprove it to protect users.
We'll get automated alerting via dealbot#280 (Automated "alerting" if an SP should get approved or unapproved), but when an alert fires, we still need a runbook for how to handle it.
Until automated alerting is in place, we expect to check manually twice per day.
We need to onboard a couple of people onto this process (e.g., Beck, Orjan).
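The grace-window rule in the notes above could be written down as a small decision helper, so that every approver applies it the same way. This is a minimal sketch under stated assumptions: the 4-hour window is simply the upper end of the suggested 1–4 hour range, and `decide_action` and `Action` are hypothetical names, not part of dealbot or any existing tooling.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional

# Assumption: grace window set to the upper end of the 1-4 hour range in the notes.
GRACE_WINDOW = timedelta(hours=4)


class Action(Enum):
    KEEP = "keep"                # SP is meeting the approval criteria
    INVESTIGATE = "investigate"  # failing, but still inside the grace window
    UNAPPROVE = "unapprove"      # failing for longer than the grace window


def decide_action(failing_since: Optional[datetime], now: datetime,
                  window: timedelta = GRACE_WINDOW) -> Action:
    """Hypothetical helper: map when an SP started failing to a runbook action."""
    if failing_since is None:
        # No current failure recorded for this SP.
        return Action.KEEP
    if now - failing_since > window:
        # Failing longer than the window: unapprove to protect users.
        return Action.UNAPPROVE
    # Still within the window: diagnose and give it a chance to recover.
    return Action.INVESTIGATE


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(decide_action(None, now))                       # Action.KEEP
    print(decide_action(now - timedelta(hours=1), now))   # Action.INVESTIGATE
    print(decide_action(now - timedelta(hours=5), now))   # Action.UNAPPROVE
```

The window is measured from when the failure started rather than from when someone noticed it. While checks are manual and only happen twice per day, an SP may already be past the window by the time an approver sees it.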