fleet

mirror of https://github.com/fleetdm/fleet synced 2026-05-24 09:28:54 +00:00

Author	SHA1	Message	Date
Victor Lyuboslavsky	8af94af14b	Removed duplicate FlippingPoliciesForHost DB calls (#42845)

Related issue: Resolves #42836

This is another hot path optimization.

## Before

When a host submits policy results via `SubmitDistributedQueryResults`, the system needed to determine which policies "flipped" (changed from passing to failing or vice versa). Each consumer computed this independently:

```
SubmitDistributedQueryResults(policyResults)
  |
  +-- processScriptsForNewlyFailingPolicies
  |     filter to failing policies with scripts
  |     BUILD SUBSET of results
  |     CALL FlippingPoliciesForHost(subset)        <-- DB query #1
  |     convert result to set, filter, queue scripts
  |
  +-- processSoftwareForNewlyFailingPolicies
  |     filter to failing policies with installers
  |     BUILD SUBSET of results
  |     CALL FlippingPoliciesForHost(subset)        <-- DB query #2
  |     convert result to set, filter, queue installs
  |
  +-- processVPPForNewlyFailingPolicies
  |     filter to failing policies with VPP apps
  |     BUILD SUBSET of results
  |     CALL FlippingPoliciesForHost(subset)        <-- DB query #3
  |     convert result to set, filter, queue VPP
  |
  +-- webhook filtering
  |     filter to webhook-enabled policies
  |     CALL FlippingPoliciesForHost(subset)        <-- DB query #4
  |     register flipped policies in Redis
  |
  +-- RecordPolicyQueryExecutions
        CALL FlippingPoliciesForHost(all results)   <-- DB query #5
        reset attempt counters for newly passing
        INSERT/UPDATE policy_membership
```

Each `FlippingPoliciesForHost` call runs `SELECT policy_id, passes FROM policy_membership WHERE host_id = ? AND policy_id IN (?)`. All 5 queries hit the same table for the same host before `policy_membership` is updated, so they all see identical state.

Each consumer also built intermediate maps to narrow down to its subset before calling `FlippingPoliciesForHost`, then converted the result into yet another set for filtering. This meant 3-4 temporary maps per consumer.

## After

```
SubmitDistributedQueryResults(policyResults)
  |
  CALL FlippingPoliciesForHost(all results)         <-- single DB query
  build newFailingSet, normalize newPassing
  |
  +-- processScriptsForNewlyFailingPolicies
  |     filter to failing policies with scripts
  |     CHECK newFailingSet (in-memory map lookup)
  |     queue scripts
  |
  +-- processSoftwareForNewlyFailingPolicies
  |     filter to failing policies with installers
  |     CHECK newFailingSet (in-memory map lookup)
  |     queue installs
  |
  +-- processVPPForNewlyFailingPolicies
  |     filter to failing policies with VPP apps
  |     CHECK newFailingSet (in-memory map lookup)
  |     queue VPP
  |
  +-- webhook filtering
  |     filter to webhook-enabled policies
  |     FILTER newFailing/newPassing by policy IDs (in-memory)
  |     register flipped policies in Redis
  |
  +-- RecordPolicyQueryExecutions
        USE pre-computed newPassing (skip DB query)
        reset attempt counters for newly passing
        INSERT/UPDATE policy_membership
```

The intermediate subset maps and per-consumer set conversions are removed. Each process function goes directly from "policies with associated automation" to "is this policy in newFailingSet?" in a single map lookup.

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

- [x] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`.

## Testing

- [x] Added/updated automated tests
- [x] QA'd all new/changed functionality manually

## Summary by CodeRabbit

* Performance Improvements
  * Reduced redundant database queries during policy result submissions by computing flipping policies once per host check-in instead of multiple times.
	2026-04-06 10:11:07 -05:00
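The "compute once, look up in memory" shape described in this commit can be sketched in Go as follows. This is a minimal illustration, not Fleet's code: `computeFlips`, `newlyFailingWith`, and the `flips` type are hypothetical names, the database read is replaced by an in-memory `prev` map, and the rule that a first-time failure counts as "newly failing" is an assumption made for the example.

```go
package main

import "fmt"

// flips holds the policies whose state changed during one host check-in.
type flips struct {
	newFailing map[uint]struct{}
	newPassing map[uint]struct{}
}

// computeFlips is a hypothetical stand-in for the single
// FlippingPoliciesForHost call. prev is the stored pass/fail state
// (what policy_membership would return); incoming is what the host
// just reported, where nil means the policy did not produce a result.
func computeFlips(prev map[uint]bool, incoming map[uint]*bool) flips {
	f := flips{
		newFailing: map[uint]struct{}{},
		newPassing: map[uint]struct{}{},
	}
	for id, res := range incoming {
		if res == nil {
			continue
		}
		was, seen := prev[id]
		switch {
		case !*res && (!seen || was):
			// Assumption: a first-time failure counts as newly failing.
			f.newFailing[id] = struct{}{}
		case *res && seen && !was:
			f.newPassing[id] = struct{}{}
		}
	}
	return f
}

// newlyFailingWith keeps the candidate policy IDs that are newly failing.
// Each consumer (scripts, installers, VPP apps) would pass its own candidates.
func newlyFailingWith(f flips, candidates []uint) []uint {
	var out []uint
	for _, id := range candidates {
		if _, ok := f.newFailing[id]; ok {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	pass, fail := true, false
	prev := map[uint]bool{1: true, 2: false, 3: true}
	incoming := map[uint]*bool{1: &fail, 2: &pass, 3: &pass, 4: &fail}

	f := computeFlips(prev, incoming) // computed once per check-in

	fmt.Println(newlyFailingWith(f, []uint{1, 3})) // e.g. policies with scripts
	fmt.Println(newlyFailingWith(f, []uint{4}))    // e.g. policies with installers
	fmt.Println(len(f.newPassing))                 // policies that recovered
}
```

Every consumer reads the same precomputed sets, so adding another automation type costs one more map lookup rather than one more query.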
Victor Lyuboslavsky	4dfdc870bd	slog migration: service layer + subsystem libraries (#40661) Related issue: Resolves #40540 # Checklist for submitter - [ ] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. - Changes present in previous PR ## Testing - [x] Added/updated automated tests - [x] QA'd all new/changed functionality manually ## Summary by CodeRabbit * Refactor * Updated internal logging infrastructure to improve consistency and maintainability across the application.	2026-02-26 17:40:46 -06:00
Victor Lyuboslavsky	c14bea44de	Replaced all kitlog.Logger instances with the intermediate logging.Logger (#40425) Related issue: Resolves #40054 # Checklist for submitter - [ ] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. - Changes included in previous PR ## Testing - [x] Added/updated automated tests - [x] QA'd all new/changed functionality manually ## Summary by CodeRabbit * Refactor * Consolidated and standardized internal logging infrastructure across the application by adopting a unified logging package throughout the codebase, replacing previous external logging dependencies.	2026-02-24 18:52:45 -06:00
Victor Lyuboslavsky	84e45f6fa1	OpenTelemetry minor improvements (#32324)

Fixes #32313

OpenTelemetry Tracing
- Added tracing to async task collectors: FlushHostsLastSeen, collectHostsLastSeen, collectLabelQueryExecutions, collectPolicyQueryExecutions, collectScheduledQueryStats
- Updated HTTP middleware to use OTEL semantic convention for span names ({method} {route})
- Added OTELEnabled() helper to FleetConfig

Optimizations
- Reduced OTEL batch size from 512 to 256 spans to prevent gRPC message size errors
- Enabled gzip compression for trace exports

NOTE: I tried to improve OTEL instrumentation for cron jobs, but it got too complicated due to goroutines in `schedule.go` so that effort should be separate. We do have SQL instrumentation for cron jobs, but we are missing root spans for cron jobs as a whole.

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

- [x] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`.

## Testing

- [x] QA'd all new/changed functionality manually

## Summary by CodeRabbit

* New Features
  * Expanded OpenTelemetry tracing for async tasks (host last seen, label membership, policy membership, scheduled query stats) to provide richer observability.
  * More descriptive HTTP span names using “METHOD /route” for clearer trace analysis.
* Bug Fixes
  * Improved OTLP gRPC exporter reliability by enabling gzip compression and reducing export batch size, mitigating intermittent gRPC errors.
	2025-08-28 19:32:46 -05:00
Scott Gress	59f96651b6	Update to Go 1.24.1 (#27506) For #26713 # Details This PR updates Fleet and its related tools and binaries to use Go version 1.24.1. Scanning through the changelog, I didn't see anything relevant to Fleet that requires action. The only possible breaking change I spotted was: > As [announced](https://tip.golang.org/doc/go1.23#linux) in the Go 1.23 release notes, Go 1.24 requires Linux kernel version 3.2 or later. Linux kernel 3.2 was released in January of 2012, so I think we can commit to dropping support for earlier kernel versions. The new [tools directive](https://tip.golang.org/doc/go1.24#tools) is interesting as it means we can move away from using `tools.go` files, but it's not a required update. # Checklist for submitter If some of the following don't apply, delete the relevant line. - [X] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. - [x] Manual QA for all new/changed functionality - For Orbit and Fleet Desktop changes: - [X] Make sure fleetd is compatible with the latest released version of Fleet - [x] Orbit runs on macOS ✅, Linux ✅ and Windows. - [x] Manual QA must be performed in the three main OSs, macOS ✅, Windows and Linux ✅.	2025-03-31 11:14:09 -05:00
Victor Lyuboslavsky	f85b6f776f	Updating golangci-lint to 1.61.0 (#22973)	2024-10-18 12:38:26 -05:00
Roberto Dip	1cc13a09fb	🧹 friday cleanup party: substitute deprecated import of go-kit (#19774) `go-kit/kit/log` was deprecated and generating warnings # Checklist for submitter If some of the following don't apply, delete the relevant line. - [x] Manual QA for all new/changed functionality	2024-06-17 10:27:31 -03:00
Martin Angers	c5b988d600	Fix stack trace of captured errors in Sentry, capture errors in more code paths (#16966) #16480 # Checklist for submitter - [x] Changes file added for user-visible changes in `changes/` or `orbit/changes/`. See [Changes files](https://fleetdm.com/docs/contributing/committing-changes#changes-files) for more information. - [x] Added/updated tests - [x] Manual QA for all new/changed functionality	2024-02-22 15:10:28 -03:00
Victor Lyuboslavsky	dbf53cae6a	Policies are now unique for (team_id, name). (#16501) #13643 Updating the `policies` table to use a checksum column for uniqueness. The checksum is computed with team_id (which may be null) and name. This change is modeled on the checksum in the software table. # Checklist for submitter If some of the following don't apply, delete the relevant line. - [x] Changes file added for user-visible changes in `changes/` or `orbit/changes/`. See [Changes files](https://fleetdm.com/docs/contributing/committing-changes#changes-files) for more information. - [x] Added/updated tests - [x] If database migrations are included, checked table schema to confirm autoupdate - For database migrations: - [x] Checked schema for all modified table for columns that will auto-update timestamps during migration. - [x] Confirmed that updating the timestamps is acceptable, and will not cause unwanted side effects. - [x] Manual QA for all new/changed functionality	2024-02-02 17:41:32 -06:00
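A checksum over a nullable `team_id` and `name` can be sketched in Go as below. One plausible motivation, stated here as an assumption rather than taken from the commit, is that a MySQL UNIQUE index permits multiple rows whose indexed column is NULL, so a plain unique key on (team_id, name) would not stop duplicate global policies. The helper `policyChecksum`, the SHA-256 hash, the empty-string encoding of NULL, and the NUL separator are all illustrative choices; the commit does not say how Fleet computes the column.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
)

// policyChecksum is a hypothetical helper: it folds a nullable team ID
// and a policy name into one fixed-size value that a UNIQUE index can cover.
// A nil teamID stands for a global policy (NULL team_id).
func policyChecksum(teamID *uint, name string) string {
	team := "" // assumption: NULL is encoded as the empty string
	if teamID != nil {
		team = strconv.FormatUint(uint64(*teamID), 10)
	}
	// The NUL separator keeps ("1", "2x") distinct from ("12", "x").
	sum := sha256.Sum256([]byte(team + "\x00" + name))
	return hex.EncodeToString(sum[:])
}

func main() {
	team1 := uint(1)
	name := "Disk encryption"

	// Two global policies with the same name collide on the checksum.
	fmt.Println(policyChecksum(nil, name) == policyChecksum(nil, name))
	// The same name in a team is a different policy.
	fmt.Println(policyChecksum(nil, name) == policyChecksum(&team1, name))
}
```

Because NULL is mapped to a concrete value before hashing, two global policies with the same name produce the same checksum and would be rejected by a unique index on that column.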
Victor Lyuboslavsky	9236a19342	Changed query performance statistics to uint64 to match osquery reports. (#15505) #15472 # Checklist for submitter - [x] Changes file added for user-visible changes in `changes/` or `orbit/changes/`. - [x] Added/updated tests - [x] Manual QA for all new/changed functionality	2023-12-11 11:29:17 -06:00
Roberto Dip	ea6b59f179	upgrade Go version to 1.21.1 (#13877)

For #13715, this:

- Upgrades the Go version to `1.21.1`, infrastructure changes are addressed separately at https://github.com/fleetdm/fleet/pull/13878
- Upgrades the linter version, as the current version doesn't work well after the Go upgrade
- Fixes new linting errors (we now get errors for memory aliasing in loops! 🎉)

After this is merged people will need to:

1. Update their Go version. I use `gvm` and I did it like:

   ```
   $ gvm install go1.21.1
   $ gvm use go1.21.1 --default
   ```

2. Update the local version of `golangci-lint`:

   ```
   $ go install github.com/golangci/golangci-lint/cmd/golangci-lint@v1.54.2
   ```

3. (optional) depending on your setup, you might need to re-install some packages, for example:

   ```
   # goimports to automatically import libraries
   $ go install golang.org/x/tools/cmd/goimports@latest

   # gopls for the language server
   $ go install golang.org/x/tools/gopls@latest

   # etc...
   ```
	2023-09-13 15:59:35 -03:00
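For readers unfamiliar with the "memory aliasing in loops" lint error mentioned in this commit, the usual fix looks roughly like the sketch below. The function `pointersTo` is a made-up example, not code from the Fleet repository. Before Go 1.22 a `range` loop reused one variable across iterations, so taking its address stored the same pointer every time; copying the value first avoids that.

```go
package main

import "fmt"

// pointersTo returns a pointer to a separate copy of each element.
func pointersTo(vals []int) []*int {
	out := make([]*int, 0, len(vals))
	for _, v := range vals {
		v := v // per-iteration copy: required before Go 1.22, harmless after
		out = append(out, &v)
	}
	return out
}

func main() {
	for _, p := range pointersTo([]int{1, 2, 3}) {
		fmt.Println(*p)
	}
}
```

Go 1.22 changed loop variables to be per-iteration, so the explicit copy is only needed on older toolchains such as the 1.21.1 version this commit moves to.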
Lucas Manuel Rodriguez	9142c5de79	Prevent thundering herd when applying large number of policies on large number of hosts (#13552)

#13527

(Adding @mna to double check the changes in the async implementation of policy result storage)

This PR also adds the osquery-perf changes needed to define the count of macOS and Windows hosts.

- [X] Changes file added for user-visible changes in `changes/` or `orbit/changes/`. See [Changes files](https://fleetdm.com/docs/contributing/committing-changes#changes-files) for more information.
- ~[ ] Documented any API changes (docs/Using-Fleet/REST-API.md or docs/Contributing/API-for-contributors.md)~
- ~[ ] Documented any permissions changes (docs/Using Fleet/manage-access.md)~
- [X] Input data is properly validated, `SELECT *` is avoided, SQL injection is prevented (using placeholders for values in statements)
- [X] Added support on fleet's osquery simulator `cmd/osquery-perf` for new osquery data ingestion features.
- [X] Added/updated tests
- [X] Manual QA for all new/changed functionality
- ~For Orbit and Fleet Desktop changes:~
  - ~[ ] Manual QA must be performed in the three main OSs, macOS, Windows and Linux.~
  - ~[ ] Auto-update manual QA, from released version of component to new version (see [tools/tuf/test](../tools/tuf/test/README.md)).~

Test with 80k hosts: 70k simulated macOS, 10k simulated Windows. Apply Windows policies first, then apply macOS policies:

```
fleetctl apply -f ee/cis/win-10/cis-policy-queries.yml
# Leave running for some time
fleetctl apply -f ee/cis/macos-13/cis-policy-queries.yml
```

After applying CIS policies previous to these changes:

![Screenshot 2023-08-23 at 11 36 18](https://github.com/fleetdm/fleet/assets/2073526/72c1dc7d-e601-4248-be35-93c85b749f5d)

After applying these changes and applying the same policies:

![Screenshot 2023-08-28 at 15 42 57](https://github.com/fleetdm/fleet/assets/2073526/6b6d76b8-6acb-4893-a913-bf603a68f1a4)
	2023-08-31 10:58:50 -03:00
Lucas Manuel Rodriguez	2afbd24021	Combine Schedules and Queries: API changes (#12778) Combining schedules and queries API changes.	2023-07-24 20:17:20 -04:00
Juan Fernandez	6df0768803	Fixed broken tests	2023-07-07 09:59:16 -04:00
gillespi314	94dd1c3745	Ingest pending MDM hosts (#9065) Co-authored-by @roperzh	2022-12-26 15:32:39 -06:00
gillespi314	6fb3a87ae9	Enable `errcheck` linter for `golangci-lint` (#8899)	2022-12-05 16:50:49 -06:00
Martin Angers	9755eb2e27	Support async saving of scheduled query statistics (#7012)	2022-08-10 10:01:05 -04:00
Lucas Manuel Rodriguez	de1717291d	Set authz checked when rate limiting device endpoints (#6702) * Set authz checked when rate limiting device endpoints * Unexport var and attempt to fix flaky test	2022-07-18 14:22:49 -03:00
Roberto Dip	4a867d53dc	use a single context for background jobs and HTTP handlers (#6313)	2022-06-21 15:09:00 -03:00
Martin Angers	e6b90ca8b9	Support per-task configuration for async host processing configuration (#5700)	2022-05-16 09:44:50 -04:00
Martin Angers	1fa7bb7a19	Support async saving of hosts' last seen time (#5640)	2022-05-10 11:29:17 -04:00
Tomas Touceda	9d572309ae	Add sentry (#3669) * Add sentry * Fix gosum * More gosum fixes * Add missing def for config * Enrich sentry scope a bit * Add changes file * Add goroutine safe scope to errors * Encapsulate sentry logic * Add documentation for new flag * Add sentry capturing to crons and other background tasks * Only send to sentry when enabled	2022-01-20 16:41:02 -03:00
Martin Angers	afb3310937	Migrate team-related endpoints to new pattern (#3740)	2022-01-19 10:52:14 -05:00
Martin Angers	f19e676e62	Refactor async host processing to avoid redis SCAN keys (for policies) (#3657)	2022-01-18 09:56:43 -05:00
Martin Angers	1f185a7a8b	Refactor async host processing to avoid redis SCAN keys (for labels only) (#3639)	2022-01-17 14:53:59 -05:00
Martin Angers	69a4985cac	Use new error handling approach in other packages (#2954)	2021-11-22 09:13:26 -05:00
Martin Angers	b57b64ccb2	Add total and per platform counts to host summary endpoint (#2845)	2021-11-09 09:35:36 -05:00
Tomas Touceda	7db6de7278	Serialize hosts writes per instance (#2753) * Serialize hosts writes per instance * Write hosts asynchronously * Dont make the save in a goroutine * Revert "Dont make the save in a goroutine" This reverts commit `4a890c5271`. * Make all savehosts async * Address review comments and make this approach configurable * Address review comments * Disable bulk seen time marking for a test * Move host seen times to a new table * Remove unused * Add seen_time to list hosts * Add some jitter to seen time flushing * Remove unused * Add timeout to deferred save host * Add tests for serialSaveHost * Update hosts in labels and policy executions in a serial way * Address review comments and remove fk constraints in host software * Make errCh buffered * Add changes file * Readd key	2021-11-08 11:42:37 -03:00
Martin Angers	a8735d55bb	Implement async processing of hosts for label queries (#2288)	2021-11-01 14:13:16 -04:00

29 commits