fleet/changes/43034-optimize-policy-queries-for-host
Victor Lyuboslavsky 2ddc2ae90a
Optimized PolicyQueriesForHost and ListPoliciesForHost SQL queries (#43035)
<!-- Add the related story/sub-task/bug number, like Resolves #123, or
remove if NA -->
**Related issue:** Resolves #43034

## Before (correlated subqueries):

The old query scans the policies table and for each policy row, MySQL
executes up to 3 separate subqueries against policy_labels +
label_membership:

```sql
  -- For EACH policy row p:

  -- Subquery 1: Does this policy have any include labels?
  NOT EXISTS (
      SELECT 1 FROM policy_labels pl
      WHERE pl.policy_id = p.id AND pl.exclude = 0
  )

  -- Subquery 2: Is the host in at least one include label?
  OR EXISTS (
      SELECT 1 FROM policy_labels pl
      INNER JOIN label_membership lm ON (lm.host_id = ? AND lm.label_id = pl.label_id)
      WHERE pl.policy_id = p.id AND pl.exclude = 0
  )

  -- Subquery 3: Is the host in any exclude label?
  AND NOT EXISTS (
      SELECT 1 FROM policy_labels pl
      INNER JOIN label_membership lm ON (lm.host_id = ? AND lm.label_id = pl.label_id)
      WHERE pl.policy_id = p.id AND pl.exclude = 1
  )
  ```

  With 200 policies, MySQL executes up to 600 subquery probes into policy_labels and label_membership.

## After (single aggregated LEFT JOIN):

The new query first builds one aggregated result set from policy_labels + label_membership for this host, grouped by policy_id, then joins it once:

```sql
  LEFT JOIN (
      SELECT pl.policy_id,
MAX(CASE WHEN pl.exclude = 0 THEN 1 ELSE 0 END) AS has_include_labels,
MAX(CASE WHEN pl.exclude = 0 AND lm.host_id IS NOT NULL THEN 1 ELSE 0
END) AS host_in_include,
MAX(CASE WHEN pl.exclude = 1 AND lm.host_id IS NOT NULL THEN 1 ELSE 0
END) AS host_in_exclude
      FROM policy_labels pl
LEFT JOIN label_membership lm ON lm.label_id = pl.label_id AND
lm.host_id = ?
      GROUP BY pl.policy_id
  ) pl_agg ON pl_agg.policy_id = p.id
```

  The subquery scans policy_labels once, LEFT JOINs to label_membership for the specific host, and aggregates per policy. Each policy gets three booleans:
  - has_include_labels: 1 if any policy_labels row with exclude=0 exists
  - host_in_include: 1 if any include label row matched a label_membership row for this host
  - host_in_exclude: 1 if any exclude label row matched a label_membership row for this host

  Then the WHERE clause uses these:
```sql
(COALESCE(pl_agg.has_include_labels, 0) = 0 OR pl_agg.host_in_include =
1)
  AND COALESCE(pl_agg.host_in_exclude, 0) = 0
```

The COALESCE handles policies with no policy_labels rows at all (the LEFT JOIN produces NULL).

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

- [x] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`.

## Testing

- [x] QA'd all new/changed functionality manually


<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

## Release Notes

* **Refactor**
  * Optimized database query efficiency for policy operations, delivering approximately 77% faster query execution at scale while improving support for label-based policy scoping.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-06 08:50:18 -05:00

1 line
213 B
Text

* Optimized `PolicyQueriesForHost` and `ListPoliciesForHost` SQL queries by replacing correlated subqueries with a single aggregated LEFT JOIN for label-based policy scoping, reducing query time by ~77% at scale.