fleet/changes/32313-otel-improvements
Victor Lyuboslavsky 84e45f6fa1
OpenTelemetry minor improvements (#32324)
Fixes #32313

  OpenTelemetry Tracing

- Added tracing to async task collectors: FlushHostsLastSeen,
collectHostsLastSeen, collectLabelQueryExecutions,
collectPolicyQueryExecutions, collectScheduledQueryStats
- Updated HTTP middleware to use OTEL semantic convention for span names
({method} {route})
  - Added OTELEnabled() helper to FleetConfig

  Optimizations

- Reduced OTEL batch size from 512 to 256 spans to prevent gRPC message
size errors
  - Enabled gzip compression for trace exports

NOTE: I tried to improve OTEL instrumentation for cron jobs, but it got
too complicated due to goroutines in `schedule.go` so that effort should
be separate. We do have SQL instrumentation for cron jobs, but we are
missing root spans for cron jobs as a whole.

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.

## Testing

- [x] QA'd all new/changed functionality manually


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Expanded OpenTelemetry tracing for async tasks (host last seen, label
membership, policy membership, scheduled query stats) to provide richer
observability.
* More descriptive HTTP span names using “METHOD /route” for clearer
trace analysis.

* **Bug Fixes**
* Improved OTLP gRPC exporter reliability by enabling gzip compression
and reducing export batch size, mitigating intermittent gRPC errors.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-08-28 19:32:46 -05:00

1 line
203 B
Text

Minor OpenTelemetry improvements: added tracing to async tasks (host seen, labels, policies, query stats). Improved HTTP span naming, enabled gzip compression, reduced batch size to prevent gRPC errors.