<!-- Add the related story/sub-task/bug number, like Resolves#123, or
remove if NA -->
**Related issue:** Resolves#42836
This is another hot path optimization.
## Before
When a host submits policy results via `SubmitDistributedQueryResults`,
the system needed to determine which policies "flipped" (changed from
passing to failing or vice versa). Each consumer computed this
independently:
```
SubmitDistributedQueryResults(policyResults)
|
+-- processScriptsForNewlyFailingPolicies
| filter to failing policies with scripts
| BUILD SUBSET of results
| CALL FlippingPoliciesForHost(subset) <-- DB query #1
| convert result to set, filter, queue scripts
|
+-- processSoftwareForNewlyFailingPolicies
| filter to failing policies with installers
| BUILD SUBSET of results
| CALL FlippingPoliciesForHost(subset) <-- DB query #2
| convert result to set, filter, queue installs
|
+-- processVPPForNewlyFailingPolicies
| filter to failing policies with VPP apps
| BUILD SUBSET of results
| CALL FlippingPoliciesForHost(subset) <-- DB query #3
| convert result to set, filter, queue VPP
|
+-- webhook filtering
| filter to webhook-enabled policies
| CALL FlippingPoliciesForHost(subset) <-- DB query #4
| register flipped policies in Redis
|
+-- RecordPolicyQueryExecutions
CALL FlippingPoliciesForHost(all results) <-- DB query #5
reset attempt counters for newly passing
INSERT/UPDATE policy_membership
```
Each `FlippingPoliciesForHost` call runs `SELECT policy_id, passes FROM
policy_membership WHERE host_id = ? AND policy_id IN (?)`. All 5 queries
hit the same table for the same host before `policy_membership` is
updated, so they all see identical state.
Each consumer also built intermediate maps to narrow down to its subset
before calling `FlippingPoliciesForHost`, then converted the result into
yet another set for filtering. This meant 3-4 temporary maps per
consumer.
## After
```
SubmitDistributedQueryResults(policyResults)
|
CALL FlippingPoliciesForHost(all results) <-- single DB query
build newFailingSet, normalize newPassing
|
+-- processScriptsForNewlyFailingPolicies
| filter to failing policies with scripts
| CHECK newFailingSet (in-memory map lookup)
| queue scripts
|
+-- processSoftwareForNewlyFailingPolicies
| filter to failing policies with installers
| CHECK newFailingSet (in-memory map lookup)
| queue installs
|
+-- processVPPForNewlyFailingPolicies
| filter to failing policies with VPP apps
| CHECK newFailingSet (in-memory map lookup)
| queue VPP
|
+-- webhook filtering
| filter to webhook-enabled policies
| FILTER newFailing/newPassing by policy IDs (in-memory)
| register flipped policies in Redis
|
+-- RecordPolicyQueryExecutions
USE pre-computed newPassing (skip DB query)
reset attempt counters for newly passing
INSERT/UPDATE policy_membership
```
The intermediate subset maps and per-consumer set conversions are
removed. Each process function goes directly from "policies with
associated automation" to "is this policy in newFailingSet?" in a single
map lookup.
# Checklist for submitter
If some of the following don't apply, delete the relevant line.
- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
## Testing
- [x] Added/updated automated tests
- [x] QA'd all new/changed functionality manually
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Performance Improvements**
* Reduced redundant database queries during policy result submissions by
computing flipping policies once per host check-in instead of multiple
times.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- Add the related story/sub-task/bug number, like Resolves#123, or
remove if NA -->
**Related issue:** Resolves#40540
# Checklist for submitter
- [ ] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
- Changes present in previous PR
## Testing
- [x] Added/updated automated tests
- [x] QA'd all new/changed functionality manually
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Refactor**
* Updated internal logging infrastructure to improve consistency and
maintainability across the application.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- Add the related story/sub-task/bug number, like Resolves#123, or
remove if NA -->
**Related issue:** Resolves#40054
# Checklist for submitter
- [ ] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
- Changes included in previous PR
## Testing
- [x] Added/updated automated tests
- [x] QA'd all new/changed functionality manually
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Refactor**
* Consolidated and standardized internal logging infrastructure across
the application by adopting a unified logging package throughout the
codebase, replacing previous external logging dependencies.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Fixes#32313
OpenTelemetry Tracing
- Added tracing to async task collectors: FlushHostsLastSeen,
collectHostsLastSeen, collectLabelQueryExecutions,
collectPolicyQueryExecutions, collectScheduledQueryStats
- Updated HTTP middleware to use OTEL semantic convention for span names
({method} {route})
- Added OTELEnabled() helper to FleetConfig
Optimizations
- Reduced OTEL batch size from 512 to 256 spans to prevent gRPC message
size errors
- Enabled gzip compression for trace exports
NOTE: I tried to improve OTEL instrumentation for cron jobs, but it got
too complicated due to goroutines in `schedule.go` so that effort should
be separate. We do have SQL instrumentation for cron jobs, but we are
missing root spans for cron jobs as a whole.
# Checklist for submitter
If some of the following don't apply, delete the relevant line.
- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
## Testing
- [x] QA'd all new/changed functionality manually
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Expanded OpenTelemetry tracing for async tasks (host last seen, label
membership, policy membership, scheduled query stats) to provide richer
observability.
* More descriptive HTTP span names using “METHOD /route” for clearer
trace analysis.
* **Bug Fixes**
* Improved OTLP gRPC exporter reliability by enabling gzip compression
and reducing export batch size, mitigating intermittent gRPC errors.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
For #26713
# Details
This PR updates Fleet and its related tools and binaries to use Go
version 1.24.1.
Scanning through the changelog, I didn't see anything relevant to Fleet
that requires action. The only possible breaking change I spotted was:
> As [announced](https://tip.golang.org/doc/go1.23#linux) in the Go 1.23
release notes, Go 1.24 requires Linux kernel version 3.2 or later.
Linux kernel 3.2 was released in January of 2012, so I think we can
commit to dropping support for earlier kernel versions.
The new [tools directive](https://tip.golang.org/doc/go1.24#tools) is
interesting as it means we can move away from using `tools.go` files,
but it's not a required update.
# Checklist for submitter
If some of the following don't apply, delete the relevant line.
<!-- Note that API documentation changes are now addressed by the
product design team. -->
- [X] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
- [x] Manual QA for all new/changed functionality
- For Orbit and Fleet Desktop changes:
- [X] Make sure fleetd is compatible with the latest released version of
Fleet
- [x] Orbit runs on macOS ✅ , Linux ✅ and Windows.
- [x] Manual QA must be performed in the three main OSs, macOS ✅,
Windows and Linux ✅.
`go-kit/kit/log` was deprecated and generating warnings
# Checklist for submitter
If some of the following don't apply, delete the relevant line.
<!-- Note that API documentation changes are now addressed by the
product design team. -->
- [x] Manual QA for all new/changed functionality
#16480
# Checklist for submitter
- [x] Changes file added for user-visible changes in `changes/` or
`orbit/changes/`.
See [Changes
files](https://fleetdm.com/docs/contributing/committing-changes#changes-files)
for more information.
- [x] Added/updated tests
- [x] Manual QA for all new/changed functionality
#13643
Updating the `policies` table to use a checksum column for uniqueness.
The checksum is computed with team_id (which may be null) and name. This
change is modeled on the checksum in the software table.
# Checklist for submitter
If some of the following don't apply, delete the relevant line.
<!-- Note that API documentation changes are now addressed by the
product design team. -->
- [x] Changes file added for user-visible changes in `changes/` or
`orbit/changes/`.
See [Changes
files](https://fleetdm.com/docs/contributing/committing-changes#changes-files)
for more information.
- [x] Added/updated tests
- [x] If database migrations are included, checked table schema to
confirm autoupdate
- For database migrations:
- [x] Checked schema for all modified table for columns that will
auto-update timestamps during migration.
- [x] Confirmed that updating the timestamps is acceptable, and will not
cause unwanted side effects.
- [x] Manual QA for all new/changed functionality
#15472
# Checklist for submitter
- [x] Changes file added for user-visible changes in `changes/` or
`orbit/changes/`.
- [x] Added/updated tests
- [x] Manual QA for all new/changed functionality
For #13715, this:
- Upgrades the Go version to `1.21.1`, infrastructure changes are
addressed separately at https://github.com/fleetdm/fleet/pull/13878
- Upgrades the linter version, as the current version doesn't work well
after the Go upgrade
- Fixes new linting errors (we now get errors for memory aliasing in
loops! 🎉 )
After this is merged people will need to:
1. Update their Go version. I use `gvm` and I did it like:
```
$ gvm install go1.21.1
$ gvm use go1.21.1 --default
```
2. Update the local version of `golangci-lint`:
```
$ go install github.com/golangci/golangci-lint/cmd/golangci-lint@v1.54.2
```
3. (optional) depending on your setup, you might need to re-install some
packages, for example:
```
# goimports to automatically import libraries
$ go install golang.org/x/tools/cmd/goimports@latest
# gopls for the language server
$ go install golang.org/x/tools/gopls@latest
# etc...
```
#13527
(Adding @mna to double check the changes in the async implementation of
policy result storage)
This PR also adds the osquery-perf changes needed to define the count of
macOS and Windows hosts.
- [X] Changes file added for user-visible changes in `changes/` or
`orbit/changes/`.
See [Changes
files](https://fleetdm.com/docs/contributing/committing-changes#changes-files)
for more information.
- ~[ ] Documented any API changes (docs/Using-Fleet/REST-API.md or
docs/Contributing/API-for-contributors.md)~
- ~[ ] Documented any permissions changes (docs/Using
Fleet/manage-access.md)~
- [X] Input data is properly validated, `SELECT *` is avoided, SQL
injection is prevented (using placeholders for values in statements)
- [X] Added support on fleet's osquery simulator `cmd/osquery-perf` for
new osquery data ingestion features.
- [X] Added/updated tests
- [X] Manual QA for all new/changed functionality
- ~For Orbit and Fleet Desktop changes:~
- ~[ ] Manual QA must be performed in the three main OSs, macOS, Windows
and Linux.~
- ~[ ] Auto-update manual QA, from released version of component to new
version (see [tools/tuf/test](../tools/tuf/test/README.md)).~
Test with 80k hosts: 70k simulated macOS, 10k simulated Windows.
Apply Windows policies first, then apply macOS policies:
```
fleetctl apply -f ee/cis/win-10/cis-policy-queries.yml
# Leave running for some time
fleetctl apply -f ee/cis/macos-13/cis-policy-queries.yml
```
After applying CIS policies previous to these changes:

After applying these changes and applying the same policies:

* Add sentry
* Fix gosum
* More gosum fixes
* Add missing def for config
* Enrich sentry scope a bit
* Add changes file
* Add goroutine safe scope to errors
* Encapsulate sentry logic
* Add documentation for new flag
* Add sentry capturing to crons and other background tasks
* Only send to sentry when enabled
* Serialize hosts writes per instance
* Write hosts asynchronously
* Dont make the save in a goroutine
* Revert "Dont make the save in a goroutine"
This reverts commit 4a890c5271.
* Make all savehosts async
* Address review comments and make this approach configurable
* Address review comments
* Disable bulk seen time marking for a test
* Move host seen times to a new table
* Remove unused
* Add seen_time to list hosts
* Add some jitter to seen time flushing
* Remove unused
* Add timeout to deferred save host
* Add tests for serialSaveHost
* Update hosts in labels and policy executions in a serial way
* Address review comments and remove fk constraints in host software
* Make errCh buffered
* Add changes file
* Readd key