mirror of
https://github.com/fleetdm/fleet
synced 2026-04-21 13:37:30 +00:00
Update handbook incident response sections (#43049)
parent 5652731f89
commit a8dae187f9
2 changed files with 4 additions and 1 deletion
@@ -825,6 +825,8 @@ Each product group maintains two engineers assigned to incident on-call. Enginee

#### Incident on-call responsibilities

The incident on-call engineer leads the incident from acknowledgment through resolution and owns internal communication. Don't assume anyone is already aware. Mention the right people in the incident channel to pull them in: someone from CS (the reporter, if CS reported the issue), the relevant engineering manager, and any engineers or QA needed for investigation. Post regular status updates in the incident channel and keep the incident response issue up to date.

**Outside of business hours**

The incident on-call engineer is responsible for stabilizing the issue well enough to pick it back up in the morning. They should file P1 issues for any immediate follow-up items that need to be addressed during the next business day.

@@ -176,6 +176,7 @@ The incident on-call engineer is responsible for:
- Knowing [the incident on-call rotation](https://fleetdm.com/handbook/company/product-groups#incident-on-call-engineer).
- Completing the [incident.io on-call engineer onboarding steps](https://help.incident.io/articles/3472064049-get-started-as-an-on-call-responder) sent via email when invited to incident.io.
- Confirming incident pages push through Do Not Disturb.
- Assuming the incident lead in incident.io.
- Performing the [incident on-call responsibilities](https://fleetdm.com/handbook/company/product-groups#incident-on-call-responsibilities).

@@ -199,7 +200,7 @@ Incident notifications are sent 24/7/365 via incident.io, triggered by creating

Mitigating the outage may require writing and merging code. The current infrastructure on-call engineer is first line for all reviews and QA required to deploy a hot-fix. If additional code review or engineering support is needed, the responding engineer should escalate to their manager.

-> If outside of business hours, the incident on-call engineer is responsible for stabilizing the issue well enough to pick it back up in the morning, and should file P1 issues for any immediate follow-up items. During business hours, the incident on-call engineer triages the incident and coordinates a response across engineering, QA, CS, and infrastructure until the incident has been resolved.
+> If outside of business hours, the incident on-call engineer is responsible for stabilizing the issue well enough to pick it back up in the morning, and should file P1 issues for any immediate follow-up items. During business hours, the incident on-call engineer triages the incident and coordinates a response across engineering, QA, CS, and infrastructure until the incident has been resolved. See [incident on-call responsibilities](https://fleetdm.com/handbook/company/product-groups#incident-on-call-responsibilities) for details.

### Participate in QA Day