**Related issue:** Resolves #28584

The correct fix for this bug would have been a migration that updates existing software rows to match the new naming convention. However, that migration should have shipped in Fleet 4.67, and that ship has already sailed. See the `Name changes and the rename problem` section of the doc for a description of the issue.
Software inventory architecture
This document provides an overview of Fleet's Software Inventory architecture.
Introduction
Software Inventory in Fleet provides visibility into the software installed on devices across the fleet. This document provides insights into the design decisions, system components, and interactions specific to the Software Inventory functionality.
Software identity
A software row in the software table is uniquely identified by the combination of these fields:
- name - the software name as reported by osquery
- version
- source - e.g., apps, deb_packages, programs, chrome_extensions
- bundle_identifier - macOS bundle ID (e.g., org.mozilla.firefox)
- release, vendor, arch - included if any is non-empty
- extension_id, extension_for - included if any is non-empty
- application_id - included if non-empty
- upgrade_code - included if non-empty (Windows MSI)
These fields are combined into a checksum (ComputeRawChecksum in server/fleet/software.go) stored as a unique index on the software table. The same fields (in a different format) produce the ToUniqueStr() used for in-memory comparisons during ingestion.
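As a rough illustration of how identity fields can be folded into a checksum, the sketch below hashes them in a fixed order. The Software struct, identityFields, rawChecksum, the NUL separator, and the choice of MD5 are all assumptions made for this example; the real logic lives in ComputeRawChecksum and may differ in field order, separator, and hash function.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"strings"
)

// Software is a hypothetical, trimmed-down stand-in for Fleet's software struct.
type Software struct {
	Name, Version, Source, BundleIdentifier string
	Release, Vendor, Arch                   string
	ExtensionID, ExtensionFor               string
	ApplicationID, UpgradeCode              string
}

// identityFields returns the fields that make up the identity, adding the
// optional groups only when they carry data (mirroring the list above).
func identityFields(s Software) []string {
	cols := []string{s.Name, s.Version, s.Source, s.BundleIdentifier}
	if s.Release != "" || s.Vendor != "" || s.Arch != "" {
		cols = append(cols, s.Release, s.Vendor, s.Arch)
	}
	if s.ExtensionID != "" || s.ExtensionFor != "" {
		cols = append(cols, s.ExtensionID, s.ExtensionFor)
	}
	if s.ApplicationID != "" {
		cols = append(cols, s.ApplicationID)
	}
	if s.UpgradeCode != "" {
		cols = append(cols, s.UpgradeCode)
	}
	return cols
}

// rawChecksum hashes the identity fields joined by a NUL separator.
// Separator and hash function are assumptions, not Fleet's actual choices.
func rawChecksum(s Software) string {
	sum := md5.Sum([]byte(strings.Join(identityFields(s), "\x00")))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := Software{Name: "Firefox.app", Version: "128.0", Source: "apps", BundleIdentifier: "org.mozilla.firefox"}
	b := a
	b.Name = "Firefox" // same app, different derived name

	fmt.Println(rawChecksum(a) == rawChecksum(a)) // identical fields give an identical checksum
	fmt.Println(rawChecksum(a) == rawChecksum(b)) // a name change gives a different checksum
}
```

Because the name feeds the hash, the renamed entry cannot match the existing unique index value, which is what makes name changes look like new software.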
The software table is shared across hosts
The software table is a global catalog: each row represents a unique piece of software, and multiple hosts link to the same row via the host_software join table. This means modifying a software row affects all hosts that reference it.
Software titles
Software titles (software_titles table) group related software versions under a single name for display in the UI. For macOS apps with a bundle_identifier, the title is matched by bundle ID (not name), so different software rows with different names but the same bundle ID share a single title.
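The grouping rule can be pictured as choosing a key per software row. This is only a sketch: titleKey and the Software struct are hypothetical, and the fallback to name plus source for rows without a bundle ID is an assumption not stated in this document. The actual matching happens in SQL against software_titles.

```go
package main

import "fmt"

// Software is a hypothetical, minimal view of a software row.
type Software struct {
	Name, Source, BundleIdentifier string
}

// titleKey returns the grouping key used in this sketch: the bundle ID when
// present, otherwise name plus source (the fallback is an assumption).
func titleKey(s Software) string {
	if s.BundleIdentifier != "" {
		return "bundle:" + s.BundleIdentifier
	}
	return "name:" + s.Name + "|" + s.Source
}

func main() {
	rows := []Software{
		{Name: "Firefox.app", Source: "apps", BundleIdentifier: "org.mozilla.firefox"},
		{Name: "Firefox", Source: "apps", BundleIdentifier: "org.mozilla.firefox"},
		{Name: "curl", Source: "deb_packages"},
	}

	// Group row names under their title key.
	titles := map[string][]string{}
	for _, r := range rows {
		k := titleKey(r)
		titles[k] = append(titles[k], r.Name)
	}

	fmt.Println(len(titles))                          // 2: both Firefox rows share one title
	fmt.Println(titles["bundle:org.mozilla.firefox"]) // [Firefox.app Firefox]
}
```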
Why name is part of software identity
Name is included in the software identity (checksum and unique string) because multiple distinct software entries can share the same bundle_identifier and version. For example, macOS helper binaries inside an app bundle:
"Postman Helper (GPU)", version="", bundle_id="com.postmanlabs.mac.helper"
"Postman Helper (Renderer)", version="", bundle_id="com.postmanlabs.mac.helper"
These are different executables that osquery discovers independently. Without name in the identity, they would collapse into a single row, losing visibility into what's actually installed.
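A small sketch makes the collapse concrete by counting how many rows survive deduplication with and without the name in the key. The struct and key functions are hypothetical simplifications that use only three of the identity fields.

```go
package main

import "fmt"

// Software is a hypothetical, minimal view of a software row.
type Software struct {
	Name, Version, BundleIdentifier string
}

// distinctRows counts how many rows survive deduplication under a key function.
func distinctRows(items []Software, key func(Software) string) int {
	seen := map[string]struct{}{}
	for _, s := range items {
		seen[key(s)] = struct{}{}
	}
	return len(seen)
}

// withName includes the name in the identity key.
func withName(s Software) string {
	return s.Name + "\x00" + s.Version + "\x00" + s.BundleIdentifier
}

// withoutName drops the name from the identity key.
func withoutName(s Software) string {
	return s.Version + "\x00" + s.BundleIdentifier
}

func main() {
	helpers := []Software{
		{Name: "Postman Helper (GPU)", BundleIdentifier: "com.postmanlabs.mac.helper"},
		{Name: "Postman Helper (Renderer)", BundleIdentifier: "com.postmanlabs.mac.helper"},
	}
	fmt.Println(distinctRows(helpers, withName))    // 2
	fmt.Println(distinctRows(helpers, withoutName)) // 1: the helpers collapse into one row
}
```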
Name changes and the rename problem
Because name is part of software identity, changing how Fleet computes the name (e.g., modifying the osquery query to use display_name instead of the raw filename) creates a mismatch between what's stored in the database and what osquery reports on the next check-in. This triggers the ingestion pipeline to treat the software as "uninstalled" (old name) and "newly installed" (new name), which:
- Creates a new software row with a new ID
- Deletes the old host_software link and creates a new one
- Orphans the old software row (cleaned up later by SyncHostsSoftware)
- Loses vulnerability associations until the next vulnerability scan
This is normally a non-issue during regular operation because osquery consistently reports the same name for a given app. It only becomes a problem when Fleet changes how software names are derived, such as modifying the osquery query to prefer display_name over the raw filename (see #28584). In those cases, a database migration should also update existing software rows to match the new naming convention.
There is no clean way to distinguish "same software, name changed" from "different software, same bundle_id" in the general case. A 1:1 rename heuristic (match by bundle_id+version when there's exactly one entry on each side) works for most apps but fails for the helper binary case described above.
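To make the limitation concrete, here is one possible shape of the 1:1 heuristic. It is a sketch under assumptions: detectRenames, Rename, and groupKey are hypothetical names, and nothing here is taken from Fleet's code.

```go
package main

import "fmt"

// Software is a hypothetical, minimal view of a software row.
type Software struct {
	Name, Version, BundleIdentifier string
}

// Rename pairs an old row with the incoming row believed to be the same software.
type Rename struct{ Old, New Software }

func groupKey(s Software) string { return s.BundleIdentifier + "\x00" + s.Version }

// detectRenames applies the 1:1 heuristic: within each (bundle_id, version)
// group, report a rename only when exactly one row was removed and exactly
// one was added. Rows without a bundle ID are never paired.
func detectRenames(removed, added []Software) []Rename {
	oldBy := map[string][]Software{}
	newBy := map[string][]Software{}
	for _, s := range removed {
		if s.BundleIdentifier != "" {
			oldBy[groupKey(s)] = append(oldBy[groupKey(s)], s)
		}
	}
	for _, s := range added {
		if s.BundleIdentifier != "" {
			newBy[groupKey(s)] = append(newBy[groupKey(s)], s)
		}
	}
	var out []Rename
	for _, s := range removed { // iterate the slice to keep output order stable
		k := groupKey(s)
		if len(oldBy[k]) == 1 && len(newBy[k]) == 1 {
			out = append(out, Rename{Old: s, New: newBy[k][0]})
		}
	}
	return out
}

func main() {
	// A genuine rename: same app, new derived name.
	ok := detectRenames(
		[]Software{{Name: "Firefox.app", Version: "128.0", BundleIdentifier: "org.mozilla.firefox"}},
		[]Software{{Name: "Firefox", Version: "128.0", BundleIdentifier: "org.mozilla.firefox"}},
	)
	fmt.Println(len(ok)) // 1

	// Different helpers sharing a bundle ID: one removed, another added.
	wrong := detectRenames(
		[]Software{{Name: "Postman Helper (GPU)", BundleIdentifier: "com.postmanlabs.mac.helper"}},
		[]Software{{Name: "Postman Helper (Renderer)", BundleIdentifier: "com.postmanlabs.mac.helper"}},
	)
	fmt.Println(len(wrong)) // 1: reported as a rename although the executables differ
}
```

The second case shows the failure mode: the inputs look structurally identical to a real rename, so the heuristic has no signal to tell them apart.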
Ingestion pipeline
The software ingestion pipeline runs when a host checks in (approximately once per hour). The entry point is applyChangesForNewSoftwareDB in server/datastore/mysql/software.go.
Flow
- Read current state - listSoftwareByHostIDShort reads the host's current software from the replica DB
- Diff - nothingChanged compares current vs incoming by ToUniqueStr(). If identical, no DB writes occur
- Lookup existing - getExistingSoftware computes checksums for new incoming software and looks them up in the DB
- Phase 1 (outside transaction) - preInsertSoftwareInventory inserts new software titles and software rows via INSERT IGNORE in small batches to reduce lock contention
- Phase 2 (transaction) - Deletes host_software links for uninstalled software, creates links for new software, and updates last_opened_at timestamps
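The diff step can be sketched as a set comparison over unique strings. This is an in-memory illustration only: uniqueStr is a hypothetical stand-in for ToUniqueStr(), the nothingChanged shown here is a simplified guess at the idea rather than Fleet's function, and no database access is modeled.

```go
package main

import "fmt"

// Software is a hypothetical, minimal view of a software entry.
type Software struct {
	Name, Version, Source string
}

// uniqueStr is a hypothetical stand-in for ToUniqueStr().
func uniqueStr(s Software) string {
	return s.Name + "\x00" + s.Version + "\x00" + s.Source
}

// diff returns incoming entries missing from current (to insert and link)
// and current entries missing from incoming (to unlink).
func diff(current, incoming []Software) (added, removed []Software) {
	cur := make(map[string]struct{}, len(current))
	for _, s := range current {
		cur[uniqueStr(s)] = struct{}{}
	}
	inc := make(map[string]struct{}, len(incoming))
	for _, s := range incoming {
		k := uniqueStr(s)
		if _, dup := inc[k]; dup {
			continue // ignore duplicates reported in the same check-in
		}
		inc[k] = struct{}{}
		if _, ok := cur[k]; !ok {
			added = append(added, s)
		}
	}
	for _, s := range current {
		if _, ok := inc[uniqueStr(s)]; !ok {
			removed = append(removed, s)
		}
	}
	return added, removed
}

// nothingChanged reports whether both sides hold the same set of entries,
// in which case the pipeline can skip all writes.
func nothingChanged(current, incoming []Software) bool {
	added, removed := diff(current, incoming)
	return len(added) == 0 && len(removed) == 0
}

func main() {
	current := []Software{{"Firefox.app", "128.0", "apps"}, {"curl", "8.5.0", "deb_packages"}}
	incoming := []Software{{"Firefox", "128.0", "apps"}, {"curl", "8.5.0", "deb_packages"}}

	fmt.Println(nothingChanged(current, current)) // true
	added, removed := diff(current, incoming)
	fmt.Println(added[0].Name, removed[0].Name) // Firefox Firefox.app
}
```

The usage example also reproduces the rename problem: a changed name surfaces as one removal plus one addition, never as an update.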
Visibility
The UI endpoint (ListHostSoftware) uses hostInstalledSoftware which joins host_software → software → software_titles with INNER JOIN.