mirror of
https://github.com/fleetdm/fleet
synced 2026-05-24 09:28:54 +00:00
<!-- Add the related story/sub-task/bug number, like Resolves #123, or remove if NA -->
**Related issue:** Resolves #43928

This PR adds a Redis-backed cache in front of the two host-by-key lookups on the agent auth paths.

Docs: https://github.com/fleetdm/fleet/pull/44504

## What changes

**Read path (osquery/orbit auth):**
- `LoadHostByNodeKey` and `LoadHostByOrbitNodeKey` now check Redis before falling through to MySQL.
- Successful lookups are cached for 60s ± 10% jitter (configurable via `FLEET_REDIS_HOST_CACHE_TTL`).
- `NotFound` results are cached for 5s as a negative entry, dampening repeated probes for keys that do not exist (deleted hosts whose agents are still polling, attacker scans, retry storms).
- Concurrent lookups for the same key collapse into one DB query via `singleflight`. The shared query runs under a context detached from any one caller's deadline, so the leader giving up does not abort the work for joiners. The shared query is itself bounded by a 30s timeout, so a wedged DB call cannot pin the singleflight slot indefinitely.

**Write path (invalidations):**
- These methods now invalidate the cache after a successful inner call: `UpdateHost`, `SerialUpdateHost`, `UpdateHostOsqueryIntervals`, `UpdateHostRefetchRequested`, `UpdateHostRefetchCriticalQueriesUntil`, `UpdateHostIdentityCertHostIDBySerial`, `EnrollOsquery`, `EnrollOrbit`, `NewHost`, `DeleteHost`, `DeleteHosts`, `CleanupExpiredHosts`, `CleanupIncomingHosts`, `AddHostsToTeam`.
- `AddHostsToTeam`, `DeleteHosts`, `CleanupExpiredHosts`, and `CleanupIncomingHosts` use a pipelined batch invalidator so 10k-host operations stay in the millisecond range instead of taking minutes of sequential round-trips.
- Inner-call errors do not trigger invalidation: a failing write leaves cached state intact.

**Configuration:**
- New flags `FLEET_REDIS_HOST_CACHE_ENABLED` (default `true`) and `FLEET_REDIS_HOST_CACHE_TTL` (default `60s`).
- Server refuses to start if the cache is enabled with `TTL <= 0`.

**Observability:**
- Three new OTEL counters under the `fleet` meter:
  - `fleet.host_cache.lookups{result=hit|negative_hit|miss}`
  - `fleet.host_cache.errors{op=get|set|del}`
  - `fleet.host_cache.invalidations{reason=update|enroll|team|delete|cert}`
- A pre-built SigNoz dashboard ships in `tools/signoz/host_cache_dashboard.json`.

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

- [x] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files) for more information.
- [x] Timeouts are implemented and retries are limited to avoid infinite loops

## Testing

- [x] Added/updated automated tests
- [x] Where appropriate, [automated tests simulate multiple hosts and test for host isolation](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/reference/patterns-backend.md#unit-testing) (updates to one host's records do not affect another)
- [x] QA'd all new/changed functionality manually

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Optional Redis-backed host lookup cache for osquery and orbit auth, with automatic invalidation and metrics/monitoring dashboard.
* **Bug Fixes**
  * Fixed host-removal batching so cache-related removals use correct chunks.
* **Tests**
  * Added comprehensive host-cache unit tests covering hits, negative cache, invalidation, concurrency, and JSON round-trips.
* **Chores**
  * New config flags to enable the cache and set TTL (default 60s ±10% jitter).

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
62 lines
2.6 KiB
Go
package mysqlredis

import (
	"github.com/fleetdm/fleet/v4/server/fleet"
)

// hostCacheEnvelope is the JSON wire format for cached host lookups. It
// embeds fleet.Host so every normally-serializable field rides along
// automatically, then shadows the four fields fleet.Host tags `json:"-"` to
// keep out of HTTP responses: OsqueryHostID, NodeKey, OrbitNodeKey, and
// HasHostIdentityCert. These four MUST round-trip or auth breaks (a cached
// host with HasHostIdentityCert=nil would cause AuthenticateHost to skip the
// httpsig check for up to TTL).
//
// Why embedding works without collision: the embedded fleet.Host has those
// four fields tagged `json:"-"`, so encoding/json skips them entirely. Our
// outer fields carry the real JSON names. On unmarshal, the tagged JSON keys
// map to the outer fields; toHost() then copies them back onto the embedded
// Host so downstream code can read them in their natural positions.
//
// One envelope serves both LoadHostByNodeKey and LoadHostByOrbitNodeKey
// because their SELECT lists differ only in which fleet.Host fields they
// populate; unpopulated pointer/slice fields fall out via omitempty, and
// the handful of non-pointer orbit-specific fields (MDM.EncryptionKeyAvailable)
// are small enough that the constant overhead doesn't matter.
//
// When fleet.Host gains a new `json:"-"` field that downstream auth code
// reads, add a shadow here in lockstep. TestPBT_HostCacheEnvelopeRoundTrip
// catches drift by asserting full-struct equivalence after marshal/unmarshal.
type hostCacheEnvelope struct {
	fleet.Host

	OsqueryHostID       *string `json:"osquery_host_id,omitempty"`
	NodeKey             *string `json:"node_key,omitempty"`
	OrbitNodeKey        *string `json:"orbit_node_key,omitempty"`
	HasHostIdentityCert *bool   `json:"has_host_identity_cert,omitempty"`
}

// envelopeFromHost builds an envelope suitable for JSON marshaling by copying
// the four json:"-" shadow fields out of the embedded Host. Caller must
// ensure h is non-nil.
func envelopeFromHost(h *fleet.Host) *hostCacheEnvelope {
	return &hostCacheEnvelope{
		Host:                *h,
		OsqueryHostID:       h.OsqueryHostID,
		NodeKey:             h.NodeKey,
		OrbitNodeKey:        h.OrbitNodeKey,
		HasHostIdentityCert: h.HasHostIdentityCert,
	}
}

// toHost returns a fresh *fleet.Host populated from the envelope, with the
// shadow fields copied back onto the embedded Host so downstream auth code
// reads them in their natural positions.
func (e *hostCacheEnvelope) toHost() *fleet.Host {
	h := e.Host
	h.OsqueryHostID = e.OsqueryHostID
	h.NodeKey = e.NodeKey
	h.OrbitNodeKey = e.OrbitNodeKey
	h.HasHostIdentityCert = e.HasHostIdentityCert
	return &h
}