fleet/server/datastore
Jordan Montgomery 612d3de968
Mark setup experience installs as "cancelled" and later fail them when certain bulk actions happen (#29355)
Still adding tests, but I wanted to get this up for review of the overall
"shape" of the fix.

When certain things happen, like installer updates, we delete pending
upcoming_activities (UA) and host_software_install (HSI) entries. If
setup experience depends on one of those UA/HSI entries, we need to mark
the corresponding setup_experience_status_results (SESR) entry cancelled
and make sure that setup experience result eventually gets marked
failed.

I went back and forth a few times on how best to do this and avoid race
conditions. One thing I tried was checking for the existence of the
UA/HSI, but naively comparing that against the SESR entry had a few race
conditions that were hard to resolve. There are a few possible states we
need to account for here, such as:

- un-activated, totally not yet running software install cancelled
- activated but not yet running on the host software install cancelled
- activated and running on the host software install cancelled before
  results are completely reported back

What I eventually came around to is that we want to mark the SESR
cancelled in the same transaction in which we delete the HSI/UA. We then
finalize it by marking it failed and sending the activity the next time
the host fetches setup experience results. The new cancelled status
never leaves Fleet. This is a bit ugly, but in my testing it avoided the
race conditions and works well.
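
As a rough illustration of that flow (not the actual datastore code), here is a minimal in-memory sketch in Go. Every name in it (`Store`, `Enqueue`, `CancelInstall`, `FetchResults`, and the status constants) is hypothetical, and a mutex-guarded critical section stands in for the single database transaction.

```go
package main

import (
	"fmt"
	"sync"
)

// Status is a hypothetical stand-in for the SESR status column.
type Status string

const (
	StatusPending   Status = "pending"
	StatusRunning   Status = "running"
	StatusCancelled Status = "cancelled" // internal only, never returned to the host
	StatusFailure   Status = "failure"
)

// Store is an in-memory stand-in for the datastore. The mutex plays the
// role of a database transaction in this sketch.
type Store struct {
	mu              sync.Mutex
	pendingInstalls map[string]bool   // stands in for UA/HSI rows, keyed by execution ID
	results         map[string]Status // stands in for SESR rows, keyed by execution ID
}

func NewStore() *Store {
	return &Store{
		pendingInstalls: map[string]bool{},
		results:         map[string]Status{},
	}
}

// Enqueue records a pending install and its setup experience result.
func (s *Store) Enqueue(execID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pendingInstalls[execID] = true
	s.results[execID] = StatusPending
}

// CancelInstall deletes the pending install and marks the dependent result
// cancelled inside the same critical section, so no reader can observe one
// change without the other.
func (s *Store) CancelInstall(execID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.pendingInstalls, execID)
	if st, ok := s.results[execID]; ok && (st == StatusPending || st == StatusRunning) {
		s.results[execID] = StatusCancelled
	}
}

// FetchResults is what the host-facing fetch would call. It finalizes any
// cancelled result as failed and returns the IDs that were just failed so
// the caller can emit an activity for each exactly once.
func (s *Store) FetchResults() (map[string]Status, []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	visible := make(map[string]Status, len(s.results))
	var newlyFailed []string
	for id, st := range s.results {
		if st == StatusCancelled {
			st = StatusFailure
			s.results[id] = st
			newlyFailed = append(newlyFailed, id)
		}
		visible[id] = st
	}
	return visible, newlyFailed
}

func main() {
	s := NewStore()
	s.Enqueue("exec-1")
	s.CancelInstall("exec-1")
	visible, failed := s.FetchResults()
	fmt.Println(visible["exec-1"], failed)
}
```

In this sketch the host only ever sees the finalized status, and a second fetch returns no newly failed IDs, so the activity would not be sent twice.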

Note that to actually avoid setup experience hanging entirely, we still
need to fix #29357, which encompasses several cases where the unified
queue can get completely stuck for a host.

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

<!-- Note that API documentation changes are now addressed by the
product design team. -->

- [ ] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files)
for more information.
- [ ] Input data is properly validated, `SELECT *` is avoided, SQL
injection is prevented (using placeholders for values in statements)
- [ ] Added support on fleet's osquery simulator `cmd/osquery-perf` for
new osquery data ingestion features.
- [ ] If paths of existing endpoints are modified without backwards
compatibility, checked the frontend/CLI for any necessary changes
- [ ] If database migrations are included, checked table schema to
confirm autoupdate
- For database migrations:
  - [ ] Checked schema for all modified tables for columns that will
    auto-update timestamps during migration.
  - [ ] Confirmed that updating the timestamps is acceptable, and will not
    cause unwanted side effects.
  - [ ] Ensured the correct collation is explicitly set for character
    columns (`COLLATE utf8mb4_unicode_ci`).
- [ ] Added/updated automated tests
- [ ] Manual QA for all new/changed functionality
- For Orbit and Fleet Desktop changes:
  - [ ] Make sure fleetd is compatible with the latest released version of
    Fleet (see [Must
    rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md)).
  - [ ] Orbit runs on macOS, Linux and Windows. Check if the orbit
    feature/bugfix should only apply to one platform (`runtime.GOOS`).
  - [ ] Manual QA must be performed in the three main OSs, macOS, Windows
    and Linux.
  - [ ] Auto-update manual QA, from released version of component to new
    version (see [tools/tuf/test](../tools/tuf/test/README.md)).
- [ ] For unreleased bug fixes in a release candidate, confirmed that
the fix is not expected to adversely impact load test results or alerted
the release DRI if additional load testing is needed.
2025-05-27 16:52:51 -04:00
..
cached_mysql Enable staticcheck Go linter. (#23487) 2024-11-05 11:16:24 -06:00
filesystem Added signed URLs (#25197) 2025-01-09 12:56:54 -06:00
mysql Mark setup experience installs as "cancelled" and later fail them when certain bulk actions happen (#29355) 2025-05-27 16:52:51 -04:00
mysqlredis Update to Go 1.24.1 (#27506) 2025-03-31 11:14:09 -05:00
redis Add macOS redis cluster support (#29433) 2025-05-27 11:38:59 -04:00
s3 Remove unused code (from Fleet's sandbox implementation) (#26645) 2025-02-27 17:37:56 -03:00