doc updates for oncall and mdm migration light/dark logo feature (#12964)

quick doc change to oncall and product feature mdm migration light/dark
logos

---------

Co-authored-by: Noah Talerman <47070608+noahtalerman@users.noreply.github.com>
Gabriel Hernandez 2023-08-03 10:38:41 +01:00 committed by GitHub
parent f4bf8ba8bf
commit 926bdd30af
2 changed files with 87 additions and 67 deletions

@ -27,7 +27,25 @@ This section provides instructions for migrating your hosts away from your old M
2. In ABM, assign these hosts' MDM server to Fleet: In ABM, select **Devices** and then select **All Devices**. Then, select **Edit** next to **Edit MDM Server**, select **Assign to the following MDM:**, select your Fleet server in the dropdown, and select **Continue**.
5. In your old MDM solution, unenroll the hosts to be migrated. macOS does not allow multiple MDMs to be installed at once.
6. Send [these guided instructions](#instructions-for-end-users) to your end users to complete the final few steps via Fleet Desktop.
* Note that there will be a gap in MDM coverage between when the host is unenrolled from the old MDM and when the host turns on MDM in Fleet.
* Note that there will be a gap in MDM coverage between when the host is unenrolled from the old
MDM and when the host turns on MDM in Fleet. Use Fleet's [end user migration workflow](#end-user-migration-workflow) to reduce the gap in MDM coverage.
### End user migration workflow
> Available in Fleet Premium or Ultimate
You can use Fleet's end user migration workflow to reduce the gap in MDM coverage during migration.
The migration workflow is supported for automatically enrolled (DEP) hosts.
During the end user migration workflow, an end user's device will have their selected system
theme (light or dark) applied. If your logo does not look good on either light or dark backgrounds,
you can optionally set an alternate logo for the themes.
You can do this in the Fleet UI by going to **Settings** > **Organization settings** >
**Organization info** and adding a URL to the desired image in the **Organization avatar URL (for
dark backgrounds)** and **Organization avatar URL (for light backgrounds)** inputs. The appropriate
image will show depending on the selected system theme.
## FileVault recovery keys
@ -118,4 +136,4 @@ Want to know what your organization can see? Read about [transparency](https://f
<meta name="pageOrderInSection" value="1501">
<meta name="title" value="MDM migration guide">
<meta name="description" value="Instructions for migrating hosts away from an old MDM solution to Fleet.">
<meta name="navSection" value="Device management">

@ -50,7 +50,7 @@ Our scrum boards are exclusively composed of four types of scrum items:
- [Sprint ceremonies](#sprint-ceremonies)
- [Eng together](#eng-together)
- [Group weeklies](#group-weeklies)
- [Eng leadership weekly](#eng-leadership)
- [Eng product weekly](#eng-product-weekly)
### Goals
@ -90,7 +90,7 @@ A chance for deeper, synchronous discussion on topics relevant across product gr
#### Participants
Anyone who wishes to participate.
#### Sample Agenda (Frontend weekly)
@ -98,7 +98,7 @@ Anyone who wishes to participate.
- Review difficult frontend bugs
- Write engineering-initiated stories
### Eng leadership weekly
Engineering leaders discuss topics of importance that week. Prepare agenda, announcements, and tech talks before the monthly [Eng Together](#eng-together) meeting.
@ -116,7 +116,7 @@ Engineering leaders discuss topics of importance that week. Prepare agenda, anno
### Eng product weekly
Engineering and product weekly sync to discuss process, roadmap, and scheduling.
#### Participants
@ -134,25 +134,25 @@ Engineering and product weekly sync to discuss process, roadmap, and scheduling.
## Engineering-initiated stories
- [Creating an engineering-initiated story](#creating-an-engineering-initiated-story)
Engineering-initiated stories are types of user stories created by engineers to make technical changes to Fleet. Technical changes should improve the user experience or contributor experience. For example, optimizing SQL that improves the response time of an API endpoint improves user experience by reducing latency. A script that generates common boilerplate, or automated tests to cover important business logic, improves the quality of life for contributors, making them happier and more productive, resulting in faster delivery of features to our customers.
It is important to frame engineering-initiated user stories the same way we frame all user stories. Stay focused on how this technical change will drive value for our users.
Engineering-initiated stories follow the [user story drafting process](https://fleetdm.com/handbook/company/development-groups#drafting). Once your user story is created using the [new story template](https://github.com/fleetdm/fleet/issues/new?assignees=&labels=story%2C%3Aproduct&projects=&template=story.md&title=), add the `~engineering-initiated` label, assign it to yourself, and work with an EM or PM to progress the story through the drafting process.
> We prefer the term engineering-initiated stories over technical debt because the user story format helps keep us focused on our users.
### Creating an engineering-initiated story
1. Create a [new feature request issue](https://github.com/fleetdm/fleet/issues/new?assignees=&labels=~engineering-initiated&projects=&template=feature-request.md&title=) in GitHub.
2. Ensure it is labeled with `~engineering-initiated` and the relevant product group. Remove any `~customer-request` label.
3. Assign it to yourself. You will own this user story until it is either prioritized or closed.
4. Schedule a time with an EM and/or PM to present your story. Iterate based on feedback.
5. You, your EM or PM can bring this to Feature Fest for consideration. All engineering-initiated changes go through the same [drafting process](https://fleetdm.com/handbook/product#intake) as any other story.
> We aspire to dedicate 20% of each sprint to technical changes, but may allocate less based on customer needs and business priorities.
## Documentation for contributors
@ -169,33 +169,35 @@ The current release cadence is once every three weeks and is concentrated around
### Release freeze period
To ensure release quality, Fleet has a freeze period for testing beginning the Thursday before the release at 9:00 AM Pacific. Effective at the start of the freeze period, new feature work will not be merged into `main`.
Bugs are exempt from the release freeze period.
### Freeze day
To begin the freeze, [open the repo on Merge Freeze](https://www.mergefreeze.com/installations/3704/branches/6847) and click the "Freeze now" button. This will freeze the `main` branch and require any PRs to be manually unfrozen before merging. PRs can be manually unfrozen in Merge Freeze using the PR number.
> Any Fleetie can [unfreeze PRs on Merge Freeze](https://www.mergefreeze.com/installations/3704/branches) if the PR contains documentation changes or bug fixes only. If the PR contains other changes, please confirm with your manager before unfreezing.
#### Check dependencies
Before kicking off release QA, confirm that we are using the latest versions of dependencies we want to keep up-to-date with each release. Currently, those dependencies are:
1. **Go**: Latest minor release
* Check the [version included in Fleet](https://github.com/fleetdm/fleet/blob/main/.github/workflows/build-binaries.yaml#L30).
* Check the [latest minor version of Go](https://go.dev/dl/). For example, if we are using `go1.19.8`, and there is a new minor version `go1.19.9`, we will upgrade.
* If the latest minor version is greater than the version included in Fleet, [file a bug](https://github.com/fleetdm/fleet/issues/new?assignees=&labels=bug%2C%3Areproduce&projects=&template=bug-report.md&title=) and assign it to the [release ritual DRI](https://fleetdm.com/handbook/engineering#rituals) and the [current oncall engineer](https://fleetdm.com/handbook/engineering#how-to-reach-the-oncall-engineer). Add the `~release blocker` label. We must upgrade to the latest minor version before publishing the next release.
* If the latest major version is greater than the version included in Fleet, [create a story](https://github.com/fleetdm/fleet/issues/new?assignees=&labels=story%2C%3Aproduct&projects=&template=story.md&title=) and assign it to the [release ritual DRI](https://fleetdm.com/handbook/engineering#rituals) and the [current oncall engineer](https://fleetdm.com/handbook/engineering#how-to-reach-the-oncall-engineer). This will be considered for an upcoming sprint. The release can proceed without upgrading the major version.
- Check the [version included in Fleet](https://github.com/fleetdm/fleet/blob/main/.github/workflows/build-binaries.yaml#L30).
- Check the [latest minor version of Go](https://go.dev/dl/). For example, if we are using `go1.19.8`, and there is a new minor version `go1.19.9`, we will upgrade.
- If the latest minor version is greater than the version included in Fleet, [file a bug](https://github.com/fleetdm/fleet/issues/new?assignees=&labels=bug%2C%3Areproduce&projects=&template=bug-report.md&title=) and assign it to the [release ritual DRI](https://fleetdm.com/handbook/engineering#rituals) and the [current oncall engineer](https://fleetdm.com/handbook/engineering#how-to-reach-the-oncall-engineer). Add the `~release blocker` label. We must upgrade to the latest minor version before publishing the next release.
- If the latest major version is greater than the version included in Fleet, [create a story](https://github.com/fleetdm/fleet/issues/new?assignees=&labels=story%2C%3Aproduct&projects=&template=story.md&title=) and assign it to the [release ritual DRI](https://fleetdm.com/handbook/engineering#rituals) and the [current oncall engineer](https://fleetdm.com/handbook/engineering#how-to-reach-the-oncall-engineer). This will be considered for an upcoming sprint. The release can proceed without upgrading the major version.
> In Go versioning, the number after the first dot is the "major" version, while the number after the second dot is the "minor" version. For example, in Go 1.19.9, "19" is the major version and "9" is the minor version. Major version upgrades are assessed separately by engineering.
2. **macadmins-extension**: Latest release
* Check the [latest version of the macadmins-extension](https://github.com/macadmins/osquery-extension/releases).
* Check the [version included in Fleet](https://github.com/fleetdm/fleet/blob/main/go.mod#L60).
* If the latest stable version of the macadmins-extension is greater than the version included in Fleet, [file a bug](https://github.com/fleetdm/fleet/issues/new?assignees=&labels=bug%2C%3Areproduce&projects=&template=bug-report.md&title=) and assign it to the [release ritual DRI](https://fleetdm.com/handbook/engineering#rituals) and the [current oncall engineer](https://fleetdm.com/handbook/engineering#how-to-reach-the-oncall-engineer).
* Add the `~release blocker` label.
- Check the [latest version of the macadmins-extension](https://github.com/macadmins/osquery-extension/releases).
- Check the [version included in Fleet](https://github.com/fleetdm/fleet/blob/main/go.mod#L60).
- If the latest stable version of the macadmins-extension is greater than the version included in Fleet, [file a bug](https://github.com/fleetdm/fleet/issues/new?assignees=&labels=bug%2C%3Areproduce&projects=&template=bug-report.md&title=) and assign it to the [release ritual DRI](https://fleetdm.com/handbook/engineering#rituals) and the [current oncall engineer](https://fleetdm.com/handbook/engineering#how-to-reach-the-oncall-engineer).
- Add the `~release blocker` label.
>**Note:** Some new versions of the macadmins-extension include updates that require code changes in Fleet. Make sure to note in the bug that the update should be checked for any changes, like new tables, that require code changes in Fleet.
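
The Go dependency check above can be sketched as a small helper. This is only an illustration under stated assumptions: it follows this handbook's own naming (the number after the first dot is the "major" version, the number after the second dot is the "minor" version), and the function names and the `"bug"`, `"story"`, and `"none"` labels are hypothetical, not part of any Fleet tooling.

```python
import re


def parse_go_version(tag: str) -> tuple:
    """Parse a tag such as 'go1.19.8' into (1, 19, 8).

    A missing third component (for example 'go1.20') is treated as 0.
    """
    match = re.fullmatch(r"go(\d+)\.(\d+)(?:\.(\d+))?", tag)
    if match is None:
        raise ValueError(f"unrecognized Go version tag: {tag!r}")
    first, major, minor = match.groups()
    return int(first), int(major), int(minor or 0)


def classify_upgrade(current: str, latest: str) -> str:
    """Return the hypothetical follow-up for a newer Go release.

    'bug'   -> newer minor version in the same major line (upgrade before release)
    'story' -> newer major version (considered for an upcoming sprint)
    'none'  -> nothing newer
    """
    cur = parse_go_version(current)
    new = parse_go_version(latest)
    if new[:2] > cur[:2]:
        return "story"
    if new[:2] == cur[:2] and new[2] > cur[2]:
        return "bug"
    return "none"
```

For example, `classify_upgrade("go1.19.8", "go1.19.9")` returns `"bug"`. Because the helper compares only two tags, a caller would check the latest minor release of the current major line separately from the newest major release.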
@ -222,7 +224,7 @@ How to deploy a new release to dogfood:
5. Select **Run workflow** and paste the image name in the **The image tag wished to be deployed.** field.
> Note that this action will not handle down migrations. Always deploy a newer version than is currently deployed.
>
> Note that "fleetdm/fleet:main" is not an image name. Instead, use the commit hash in place of "main".
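
As a rough sketch of the note above, the image name can be built from a commit hash and checked before it is pasted into the workflow field. The function name, the default repository argument, and the accepted hash length (7 to 40 hexadecimal characters) are assumptions made for illustration. The handbook does not say whether the short or the full hash is expected.

```python
import re


def dogfood_image_name(commit_hash: str, repository: str = "fleetdm/fleet") -> str:
    """Build '<repository>:<commit hash>' and reject branch names such as 'main'."""
    # Assumption: a usable tag is a lowercase hexadecimal Git hash, short or full.
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit_hash):
        raise ValueError(f"expected a commit hash, got {commit_hash!r}")
    return f"{repository}:{commit_hash}"
```

Calling `dogfood_image_name("main")` raises `ValueError`, which mirrors the warning that "fleetdm/fleet:main" is not an image name.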
## Oncall rotation
@ -259,13 +261,13 @@ We respond within 1-hour (during business hours) for interactions and ask the on
#### PR reviews
PRs from Fleeties are reviewed by auto-assignment of codeowners, or by selecting the group or reviewer manually.
All PRs from the community are routed through the oncall engineer. For documentation changes, the community contact ([Kathy](https://github.com/ksatter)) is assigned by the oncall engineer. For code changes, if the oncall engineer has the knowledge and confidence to review, they should do so. Otherwise, they should request a review from an engineer with the appropriate domain knowledge. It is the oncall engineer's responsibility to monitor community PRs and make sure that they are moved forward (either by review with feedback or merge).
#### Customer success meetings
The oncall engineer is encouraged to attend some of the customer success meetings during the week. Post a message to the #g-customer-experience Slack channel requesting invitations to upcoming meetings.
The oncall engineer is encouraged to attend some of the customer success meetings during the week. Post a message to the #g-cx Slack channel requesting invitations to upcoming meetings.
This has a dual purpose of providing more context for how our customers use Fleet. The engineer should actively participate and provide input where appropriate (if not sure, please ask your manager or organizer of the call).
@ -281,11 +283,11 @@ The remaining time after fulfilling the responsibilities of oncall is free for t
Some ideas:
* Do training/learning relevant to your work.
* Improve the Fleet developer experience.
* Hack on a product idea. Note: Experiments are encouraged, but not all experiments will ship! Check in with the product team before shipping user-visible changes.
* Create a blog post (or other content) for fleetdm.com.
* Try out an experimental refactor.
- Do training/learning relevant to your work.
- Improve the Fleet developer experience.
- Hack on a product idea. Note: Experiments are encouraged, but not all experiments will ship! Check in with the product team before shipping user-visible changes.
- Create a blog post (or other content) for fleetdm.com.
- Try out an experimental refactor.
At the end of your oncall shift, you will be asked to share about how you spent your time.
@ -342,13 +344,13 @@ At Fleet, we do postmortem meetings for every production incident, whether it's
### Postmortem document
Before running the postmortem meeting, copy this [Postmortem Template](https://docs.google.com/document/d/1Ajp2LfIclWfr4Bm77lnUggkYNQyfjePiWSnBv1b1nwM/edit?usp=sharing) document and populate it with some initial data to enable a productive conversation.
### Postmortem meeting
Invite all stakeholders, typically the team involved and QA representatives.
Follow the document topic by topic. Keep the goal in mind: to take action items that address the root cause and make sure a similar incident will not happen again.
Distinguish between the root cause of the bug, which by that time was solved and released, and the root cause of why this issue reached our customers. These could be different issues. (e.g. the root cause of the bug was a coding issue, but the root causes (plural) of the event may be that the test plan did not cover a specific scenario, a lack of testing, and a lack of metrics to identify the issue quickly).
@ -368,7 +370,7 @@ At Fleet, we consider an outage to be a situation where new features or previous
## Scaling Fleet
Fleet, as a Go server, scales horizontally very well. It's not very CPU or memory intensive. However, there are some specific gotchas to be aware of when implementing new features. Visit our [scaling Fleet page](https://fleetdm.com/handbook/engineering/scaling-fleet) for tips on scaling Fleet as efficiently and effectively as possible.
## Load testing
@ -474,7 +476,7 @@ For each bug found, please use the [bug report template](https://github.com/flee
For unreleased bugs in an active sprint, a new bug is created with the `~unreleased bug` label. The `:release` label and associated product group label is added, and the engineer responsible for the feature is assigned. If QA is unsure who the bug should be assigned to, it is assigned to the EM. Fixing the bug becomes part of the story.
### Debugging
You can read our guide to diagnosing issues in Fleet on the [debugging page](https://fleetdm.com/handbook/engineering/debugging).
@ -485,22 +487,22 @@ You can read our guide to diagnosing issues in Fleet on the [debugging page](htt
- [Outages](#outages)
- [All bugs](#all-bugs)
All bugs in Fleet are tracked by QA on the [bugs board](https://app.zenhub.com/workspaces/-bugs-647f6d382e171b003416f51a/board) in ZenHub.
### Bug states
The lifecycle stages of a bug at Fleet are:
1. [Inbox](#inbox)
2. [Reproduced](#reproduced)
3. [In product drafting (as needed)](#in-product-drafting-as-needed)
4. [In engineering](#in-engineering)
5. [Awaiting QA](#awaiting-qa)
The above are all the possible states for a bug as envisioned in this process. These states each correspond to a set of GitHub labels, assignees, and boards.
See [Bug states and filters](#bug-states-and-filters) at the end of this document for descriptions of these states and links to each GitHub filter.
#### Inbox
When a new bug is created using the [bug report form](https://github.com/fleetdm/fleet/issues/new?assignees=&labels=bug%2C%3Areproduce&template=bug-report.md&title=), it is in the "inbox" state.
At this state, the [bug review DRI](#rituals) (QA) is responsible for going through the inbox and documenting reproduction steps, asking for more reproduction details from the reporter, or asking the product team for more guidance. QA has one week to move the bug to the next step (reproduced).
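
The lifecycle stages and the one-week inbox limit can be modeled in a few lines. This is a minimal sketch, not Fleet tooling: the `BugState` enum, the `inbox_overdue` helper, and the choice to count "one week" as seven calendar days are all assumptions made for illustration.

```python
from datetime import date, timedelta
from enum import Enum


class BugState(Enum):
    """Lifecycle stages in the order listed above."""
    INBOX = 1
    REPRODUCED = 2
    IN_PRODUCT_DRAFTING = 3  # only as needed
    IN_ENGINEERING = 4
    AWAITING_QA = 5


# Assumption: "one week" means seven calendar days.
INBOX_LIMIT = timedelta(weeks=1)


def inbox_overdue(state: BugState, entered_inbox: date, today: date) -> bool:
    """True when a bug has stayed in the inbox longer than the one-week limit."""
    return state is BugState.INBOX and (today - entered_inbox) > INBOX_LIMIT
```

A bug that entered the inbox on 2023-08-01 and is still there on 2023-08-09 would be flagged, while one that has already moved to `REPRODUCED` would not.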
@ -514,7 +516,7 @@ QA has weekly check-in with product to go over the inbox items. QA is responsibl
QA may also propose that a reported bug is not actually a bug. A bug is defined as “behavior that is not according to spec or implied by spec.” If agreed that it is not a bug, then it's assigned to the relevant product manager to determine its priority.
#### Reproduced
QA has reproduced the issue successfully. It should now be transferred to engineering.
Remove the “reproduce” label, add the label of the relevant team (e.g. #g-cx, #g-mdm, #g-infra, #g-website), and assign it to the relevant engineering manager. (Make your best guess as to which team. The EM will re-assign if they think it belongs to another team.) [See on GitHub](https://github.com/fleetdm/fleet/issues?q=archived%3Afalse+org%3Afleetdm+is%3Aissue+is%3Aopen+label%3Abug+label%3A%3Aproduct%2C%3Arelease+-label%3A%3Areproduce+sort%3Aupdated-asc+).
@ -529,11 +531,11 @@ A bug is in engineering after it has been reproduced and assigned to an EM. If a
If the bug does not meet the criteria of a critical bug, the EM will determine if there is capacity in the current sprint for this bug. If so, the `:release` label is added, and it is moved to the "Current release" column on the bugs board. If there is no available capacity in the current sprint, the EM will move the bug to the "Sprint backlog" column where it will be prioritized for the next sprint.
When fixing the bug, if the proposed solution requires changes that would affect the user experience (UI, API, or CLI), notify the EM and PM to align on the acceptability of the change.
Fleet [always prioritizes bugs](https://fleetdm.com/handbook/product#prioritizing-improvements) into a release within six weeks. If a bug is not prioritized in the current release, and it is not prioritized in the next release, it is removed from the "Sprint backlog" and placed back in the "Product drafting" column with the `:product` label. Product will determine if the bug should be closed as accepted behavior, or if further drafting is necessary.
#### Awaiting QA
Bugs will be verified as fixed by QA when they are placed in the "Awaiting QA" column of the relevant product group's sprint board. If the bug is verified as fixed, it is moved to the "Ready for release" column of the sprint board. Otherwise, the remaining issues are noted in a comment, and it is moved back to the "In progress" column of the sprint board.
### All bugs
@ -556,17 +558,17 @@ This filter returns all "bug" issues closed after the specified date. Simply rep
When a release is in testing, QA should use the Slack channel #help-qa to keep everyone aware of issues found. All bugs found should be reported in the channel after creating the bug first.
When a critical bug is found, the Fleetie who labels the bug as critical is responsible for following the [critical bug notification process](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/Releasing-Fleet.md#critical-bug-notification-process) below.
All unreleased bugs are addressed before publishing a release. Released bugs that are not critical may be addressed during the next release per the standard [bug process](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/Releasing-Fleet.md#bug-process).
### Release blockers
Product may add the `~release blocker` label to user stories to indicate that the story must be completed to publish the next version of Fleet. Bugs are never labeled as release blockers.
### Critical bugs
A critical bug is a bug with the `~critical bug` label. A critical bug is defined as behavior that:
* Blocks the normal use of a workflow
* Prevents upgrades to Fleet
* Causes irreversible damage, such as data loss
@ -590,10 +592,10 @@ When a critical bug is identified, we will then follow the patch release process
## Measurement
We track the success of this process by observing the throughput of issues through the system and identifying where buildups (and therefore bottlenecks) are occurring.
The metrics are:
* Number of bugs opened this week
* Total # bugs open
* Bugs in each state (inbox, acknowledged, reproduced)
* Number of bugs closed this week
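
The metrics listed above can be computed from a simple list of bug records. The sketch below assumes a hypothetical `Bug` record with an opened date, an optional closed date, and a state name. In practice the data would come from the issue tracker, which this example does not query.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Bug:
    """Hypothetical bug record used only for this example."""
    opened: date
    state: str
    closed: Optional[date] = None


def weekly_metrics(bugs: list, week_start: date) -> dict:
    """Compute the four process metrics for the 7-day window starting at week_start."""
    week_end = week_start + timedelta(days=7)

    def in_week(day: Optional[date]) -> bool:
        return day is not None and week_start <= day < week_end

    open_bugs = [bug for bug in bugs if bug.closed is None]
    return {
        "opened_this_week": sum(in_week(bug.opened) for bug in bugs),
        "total_open": len(open_bugs),
        "open_by_state": dict(Counter(bug.state for bug in open_bugs)),
        "closed_this_week": sum(in_week(bug.closed) for bug in bugs),
    }
```

Comparing `open_by_state` from week to week is one way to see where bugs are building up, which is the bottleneck signal this section describes.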
@ -643,22 +645,22 @@ Escalations (in order):
- Eric Shaw (fleetdm.com)
- Mike McNeil
The first responder on-call will take ownership of the @infrastructure-oncall alias in Slack first thing Monday morning. The previous week's on-call will provide a summary in the #g-infra Slack channel with an update on alarms that came up the week before, open issues with or without direct end-user impact, and other issues to keep an eye out for.
Expected response times: during business hours, 1 hour. Outside of business hours <4 hours.
For fleetdm.com and sandbox alarms, if the issue is not user-facing (e.g. provisioner/deprovisioner/temporary errors in osquery/etc), the on-call engineer will proceed to address the issue. If the issue is user-facing (e.g. the user noticed this error first-hand through the Fleet UI), then the on-call engineer will proceed to identify the user and contact them letting them know that we are aware of the issue and working on a resolution. They may also request more information from the user if it is needed. They will cc the EM and PM of the #g-infra group on any user correspondence.
For Fleet managed cloud alarms that are user-facing, the first responder should collect the email address of the customer and all available information on the error. If the error occurs during business hours, the first responder should make their best effort to understand where in the app the error might have occurred. Assistance can be requested in `#help-engineering` by including the data they know regarding the issue, and when available, a frontend or backend engineer can help identify what might be causing the problem. If the error occurs outside of business hours, the on-call engineer will contact the user letting them know that we are aware of the issue and working on a resolution. It's more helpful to say something like “we saw that you received an error while trying to create a query” than to say “your POST /api/blah failed”.
Escalation of issues will be done manually by the first responder according to the escalation contacts mentioned above. An outage issue (template available) should be created in the Fleet confidential repo addressing:
1. Who was affected and for how long?
2. What expected behavior occurred?
3. How do you know?
4. What near-term resolution can be taken to recover the affected user?
5. What is the underlying reason or suspected reason for the outage?
6. What are the next steps Fleet will take to address the root cause?
All infrastructure alarms (fleetdm.com, Fleet managed cloud, and sandbox) will go to #help-p1.
@ -668,7 +670,7 @@ When an infrastructure on-call engineer is out of the office, Zach Wasserman wil
## Accounts
Engineering is responsible for managing third-party accounts required to support engineering infrastructure.
### Apple developer account
@ -678,11 +680,11 @@ When this occurs, we will begin receiving the following error message when attem
1. Visit the [Apple developer account login page](https://appleid.apple.com/account?appId=632&returnUrl=https%3A%2F%2Fdeveloper.apple.com%2Fcontact%2F).
2. Log in using the credentials stored in 1Password under "Apple developer account".
3. Contact the Head of Business Operations to determine which phone number to use for 2FA.
4. Complete the 2FA process to log in.
5. Accept the new terms of service.