Commit graph

6 commits

Author SHA1 Message Date
Sharon Katz
3e38592fda
Fix FD leak in goval_dictionary Analyze (#42741) (#43983)
**Related issue:** Resolves #42741

## Problem
`goval_dictionary.Analyze` opened a `*sql.DB` via `LoadDb` but never
closed it. `pkg/download/download.go` atomically renames the goval
sqlite on each refresh, unlinking the old inode while the pool still
held FDs on it. lsof showed them as `(deleted)`, accumulating over days
until Fleet server restart.
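
The mechanism can be sketched outside Fleet with the standard library alone. This is a minimal illustration, assuming a POSIX filesystem; the file names and the `staleHandleDemo` helper are hypothetical and are not taken from `pkg/download/download.go`. A plain `*os.File` stands in for a pooled SQLite connection.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// staleHandleDemo opens a file, atomically replaces it via rename, and then
// reads through the original handle. On POSIX systems the handle still refers
// to the old, now-unlinked inode until it is closed.
func staleHandleDemo() (string, string, error) {
	dir, err := os.MkdirTemp("", "fd-demo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "data.db") // hypothetical file name
	if err := os.WriteFile(path, []byte("old"), 0o600); err != nil {
		return "", "", err
	}

	// Long-lived handle, standing in for a connection held by a pool.
	f, err := os.Open(path)
	if err != nil {
		return "", "", err
	}
	// Without this Close the unlinked inode stays pinned by the process.
	defer f.Close()

	// "Refresh": write a new file next to the old one and rename it into place.
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, []byte("new"), 0o600); err != nil {
		return "", "", err
	}
	if err := os.Rename(tmp, path); err != nil {
		return "", "", err
	}

	viaHandle, err := io.ReadAll(f) // reads the orphaned inode
	if err != nil {
		return "", "", err
	}
	viaPath, err := os.ReadFile(path) // reads the replacement
	if err != nil {
		return "", "", err
	}
	return string(viaHandle), string(viaPath), nil
}

func main() {
	viaHandle, viaPath, err := staleHandleDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("old handle sees %q, path sees %q\n", viaHandle, viaPath)
}
```

The old handle keeps returning the pre-refresh content, which is why each refresh strands one more descriptor when nothing closes the previous handle.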

## Fix
- New `Database.Close()` that delegates to the underlying `*sql.DB`.
- `defer func() { _ = db.Close() }()` in `Analyze` right after `LoadDb`.
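
A rough sketch of the shape of this fix, not the actual Fleet code: the `Database` struct layout, the `LoadDb` signature, the `analyze` function, and the stub driver are all assumptions made so the example runs without a real SQLite driver.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// stubDriver is a stand-in driver (hypothetical) so the sketch needs no
// third-party SQLite package.
type stubDriver struct{}

func (stubDriver) Open(string) (driver.Conn, error) { return stubConn{}, nil }

type stubConn struct{}

func (stubConn) Prepare(string) (driver.Stmt, error) {
	return nil, errors.New("stub: statements not supported")
}
func (stubConn) Close() error { return nil }
func (stubConn) Begin() (driver.Tx, error) {
	return nil, errors.New("stub: transactions not supported")
}

func init() { sql.Register("stub", stubDriver{}) }

// Database wraps the connection pool. The field name is an assumption.
type Database struct {
	sqlite *sql.DB
}

// LoadDb opens the pool. The real function's signature may differ.
func LoadDb(dsn string) (*Database, error) {
	db, err := sql.Open("stub", dsn)
	if err != nil {
		return nil, err
	}
	return &Database{sqlite: db}, nil
}

// Close delegates to the underlying *sql.DB, releasing pooled connections.
func (d *Database) Close() error { return d.sqlite.Close() }

// analyze mirrors the pattern in the fix: load, defer Close, then use the pool.
// It returns the Database only so the caller can observe that it is closed.
func analyze(dsn string) (*Database, error) {
	db, err := LoadDb(dsn)
	if err != nil {
		return nil, err
	}
	defer func() { _ = db.Close() }()

	// Force a pool connection, standing in for the analyzer's queries.
	if err := db.sqlite.Ping(); err != nil {
		return nil, err
	}
	return db, nil
}

func main() {
	db, err := analyze("ignored-by-stub")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// After analyze returns, the pool is closed and rejects further use.
	fmt.Println("ping after close:", db.sqlite.Ping())
}
```

Placing the `defer` immediately after the load means every return path of the function, including early error returns, closes the pool.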

## How this was tested
- New unit test `TestDatabaseCloseReleasesFileHandle` opens a
file-backed sqlite, runs a query to force a pool connection, then
asserts Close drains the pool and blocks further queries.
- `go test ./server/vulnerabilities/goval_dictionary/...` passes.
- Standalone Go program reproduced the leak mechanism: `sql.Open` +
query + unlink left the FD on the orphaned inode; adding Close released
it.

## Confidence and QA
~90% confident. I did not reproduce end-to-end through Fleet's vuln cron
locally (the analyzer never entered its query loop; likely
`HostIDsByOSVersion` hadn't populated for the Rocky test host).
Reviewer: flag anything that drops your confidence. @xpkoala for QA
after merge: please exercise in a production-like env with enrolled RHEL
hosts and confirm no `(deleted)` FDs after goval refreshes.
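
For that check, one possible approach (a sketch, assuming a Linux host with `/proc` mounted; `deletedFDs` and `leakDemo` are hypothetical helpers, not part of Fleet) is to list a process's descriptors and keep those whose link target ends in ` (deleted)`, which is the same marker lsof reports.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// deletedFDs returns the link targets of open descriptors whose backing file
// has been unlinked. pid is a process ID as a string, or "self". Linux only.
func deletedFDs(pid string) ([]string, error) {
	fdDir := filepath.Join("/proc", pid, "fd")
	entries, err := os.ReadDir(fdDir)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, e := range entries {
		target, err := os.Readlink(filepath.Join(fdDir, e.Name()))
		if err != nil {
			continue // descriptor was closed between listing and readlink
		}
		if strings.HasSuffix(target, " (deleted)") {
			out = append(out, target)
		}
	}
	return out, nil
}

// containsName reports whether any target mentions the given file name.
func containsName(targets []string, name string) bool {
	for _, t := range targets {
		if strings.Contains(t, name) {
			return true
		}
	}
	return false
}

// leakDemo unlinks an open temp file and reports whether it shows up as
// deleted before and after the handle is closed.
func leakDemo() (bool, bool, error) {
	f, err := os.CreateTemp("", "leak-demo-*")
	if err != nil {
		return false, false, err
	}
	name := filepath.Base(f.Name())
	if err := os.Remove(f.Name()); err != nil {
		f.Close()
		return false, false, err
	}
	before, err := deletedFDs("self")
	if err != nil {
		f.Close()
		return false, false, err
	}
	if err := f.Close(); err != nil {
		return false, false, err
	}
	after, err := deletedFDs("self")
	if err != nil {
		return false, false, err
	}
	return containsName(before, name), containsName(after, name), nil
}

func main() {
	before, after, err := leakDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("listed before close: %v, after close: %v\n", before, after)
}
```

Passing the Fleet server's PID instead of `"self"` would inspect that process, given sufficient permissions to read its `/proc` entry.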

# Checklist for submitter
- [x] Changes file added for user-visible changes in `changes/`
(`changes/42741-fix-goval-dictionary-fd-leak`).
- [x] Input data is properly validated, `SELECT *` is avoided, SQL
injection is prevented (N/A, no new input paths).
- [x] Timeouts are implemented and retries are limited to avoid infinite
loops (N/A, no new network calls).
- [x] If paths of existing endpoints are modified without backwards
compatibility, checked the frontend/CLI for any necessary changes (N/A,
no endpoint changes).

## Testing
- [x] Added/updated automated tests
- [x] Where appropriate, automated tests simulate multiple hosts and
test for host isolation (N/A, package-level unit test).
- [ ] QA'd all new/changed functionality manually (pending, post-merge
by @xpkoala).

## Database migrations
- [x] Checked schema for modified tables for auto-updating timestamp
columns (N/A, no schema changes).
- [x] Confirmed timestamp updates are acceptable (N/A, no schema
changes).
- [x] Ensured correct collation is explicitly set for character columns
(N/A, no schema changes).

## New Fleet configuration settings
- [x] Setting(s) is/are explicitly excluded from GitOps (N/A, no new
settings).

## fleetd/orbit/Fleet Desktop
- [x] Verified compatibility with the latest released version of Fleet
(N/A, server-only change).
- [x] If the change applies to only one platform, confirmed
`runtime.GOOS` is used (N/A).
- [x] Verified fleetd runs on macOS, Linux and Windows (N/A, server-only
change).
- [x] Verified auto-update works (N/A, server-only change).

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
  * Fixed a file-descriptor leak in vulnerability processing so deleted
    SQLite database files are properly closed without requiring a server
    restart, improving stability and resource usage.

* **Tests**
  * Added a regression test to ensure database handles are released after
    close.

* **Documentation**
  * Documented the fix for the file-descriptor leak.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-29 12:31:04 -04:00
Victor Lyuboslavsky
ae4ccdf6d3
Migrating vulnerabilities pkgs to slog. (#40106)
**Related issue:** Resolves #40054 

# Checklist for submitter

- [ ] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
  - Included in previous PR

## Testing

- [x] Added/updated automated tests
- [x] QA'd all new/changed functionality manually


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
  * Migrated logging infrastructure from external framework to standard
    library structured logging, enabling improved context-aware operations
    and error tracking across vulnerability detection and synchronization
    workflows.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-20 15:36:38 -06:00
Victor Lyuboslavsky
092b51f1c2
Vulnerabilities cron optimization (#39820)
**Related issue:** Resolves #31820 and #39898

Vulnerability processing performance improvements, and added OTEL spans
to the vulnerabilities cron job.
Optimized the two main bottlenecks in the vulnerability cron job: CPE
matching and CVE insertion. In my load testing (10K hosts), the
overall initial vulnerabilities job went from over 2 hours down to 53
minutes, and the number of spans (DB accesses) went from ~2 million to
~90K.

1. CPE matching (TranslateSoftwareToCPE): replaced the goqu query
builder with hand-written SQL using raw database/sql queries. Replaced
UNION with separate queries because case number 3 was an expensive full
text match operation and in most cases we did not need to do that.

2. CVE insertion (TranslateCPEToCVE and other places): replaced
individual INSERT INTO software_cve ... VALUES (?,?,?,?) calls with
batch inserts of 500 rows each, using the existing BatchProcessSimple
helper. Same pattern applied to OS vulnerability inserts using the
existing InsertOSVulnerabilities batch method.
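
As an illustration of the batching idea in item 2, here is a minimal sketch. It does not use Fleet's `BatchProcessSimple` helper, whose signature is not shown here; `forEachBatch`, `buildInsert`, `planBatches`, and the table and column names in the usage are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// forEachBatch calls fn with consecutive sub-slices of at most size items.
func forEachBatch[T any](items []T, size int, fn func([]T) error) error {
	if size <= 0 {
		return fmt.Errorf("batch size must be positive, got %d", size)
	}
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		if err := fn(items[start:end]); err != nil {
			return err
		}
	}
	return nil
}

// buildInsert renders one multi-row INSERT with a placeholder group per row
// and returns the flattened arguments in matching order.
func buildInsert(table string, cols []string, rows [][]any) (string, []any) {
	group := "(" + strings.TrimSuffix(strings.Repeat("?,", len(cols)), ",") + ")"
	groups := make([]string, len(rows))
	args := make([]any, 0, len(rows)*len(cols))
	for i, row := range rows {
		groups[i] = group
		args = append(args, row...)
	}
	query := fmt.Sprintf("INSERT INTO %s (%s) VALUES %s",
		table, strings.Join(cols, ","), strings.Join(groups, ","))
	return query, args
}

// planBatches reports how many rows each statement would carry.
func planBatches(total, size int) ([]int, error) {
	rows := make([][]any, total)
	for i := range rows {
		rows[i] = []any{i, "a", "b", "c"}
	}
	var sizes []int
	err := forEachBatch(rows, size, func(batch [][]any) error {
		_, args := buildInsert("t", []string{"a", "b", "c", "d"}, batch)
		sizes = append(sizes, len(args)/4) // four columns per row
		return nil
	})
	return sizes, err
}

func main() {
	sizes, err := planBatches(1200, 500)
	fmt.Println(sizes, err)
}
```

With 500 rows per statement, 1,200 rows become three round trips instead of 1,200, which is the kind of reduction in DB accesses the span counts above reflect.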

Functional equivalence verified using osquery perf dataset locally. Both
changes produce identical output (22,366 CPEs, 131,233 CVEs) when
compared against the old code using a before/after comparison tool.
- CPE caveats: found bugs #39898 and
https://github.com/fleetdm/fleet/issues/39899

# Checklist for submitter

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files)
for more information.

## Testing

- [x] Added/updated automated tests
- [x] QA'd all new/changed functionality manually


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Expanded tracing for automated vulnerability workflows to improve
    observability.

* **Performance**
  * Bulk/batched processing for software and OS vulnerability inserts to
    speed ingestion and downstream tasks.
  * More efficient CPE lookup and read-optimized database access for
    faster translations.

* **Bug Fixes**
  * Improved error recording and read-after-write consistency to reduce
    missed or duplicate vulnerability notifications.

* **Tests**
  * Test suite updated to support batch insertion semantics.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-18 13:59:15 -06:00
Tim Lee
fb2ddde9bf
Scan goval-dict for rhel kernel vulnerabilities (#39749)
2026-02-12 15:21:59 -07:00
Konstantin Sykulev
bd2b2bcd3b
validate generate-cve.yml outputs (#26752)
https://github.com/fleetdm/fleet/issues/21300

- [x] Added/updated automated tests
- [x] A detailed QA plan exists on the associated ticket (if it isn't
there, work with the product group's QA engineer to add it)
- [x] Manual QA for all new/changed functionality
2025-03-12 14:49:47 -05:00
Ian Littman
e96c70e4c0
Pull xz'd goval-dictionary sqlite files to evaluate vulnerabilities on Amazon Linux hosts (#21506)
#20934

This is tied to https://github.com/fleetdm/vulnerabilities/pull/14; for
supported OS versions (currently Amazon Linux 1/2/2022/2023) we'll pull
XZ'd sqlite files from the vulnerabilities repo and query them to
determine what's vulnerable. See the associated issue for how I
self-QA'd this.

This replaced OVAL parsing for Amazon Linux 2, as we were using the
wrong data source there (Amazon has backported a bunch of fixes to their
own-named releases, so any RHEL fixes don't match).

Some checklist items are missing here; getting this set up in draft to
get code feedback now, and I'll push updates with e.g. docs changes, as
well as an addition to the changes file.

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

<!-- Note that API documentation changes are now addressed by the
product design team. -->

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/Committing-Changes.md#changes-files)
for more information.
- [x] Input data is properly validated, `SELECT *` is avoided, SQL
injection is prevented (using placeholders for values in statements)
- [x] Added/updated tests
    - [x] Add tests to oval_platform
    - [x] Add sync_test
    - [x] Add database_test
- [x] Manual QA for all new/changed functionality
- [x] Update vulnerability management docs
2024-08-26 14:07:42 -05:00