Commit graph

14 commits

Author SHA1 Message Date
Konstantin Sykulev
6ed3ba6801
Added OTEL DB stats metrics, renamed trace attributes to expected OTEL names (#42097)
1. Added DB metrics via otelsql.RegisterDBStatsMetrics()
`db.sql.connection.open`
`db.sql.connection.max_open`
`db.sql.connection.wait`
`db.sql.connection.wait_duration`
`db.sql.connection.closed_max_idle`
`db.sql.connection.closed_max_idle_time`
`db.sql.latency.*`
2. renamed these metrics to signoz convention/expected names
`db.sql.connection.open` -> `db.client.connection.usage`
`db.sql.connection.max_open` -> `db.client.connection.max`
`db.sql.connection.wait` -> `db.client.connection.wait_count`
`db.sql.connection.wait_duration` -> `db.client.connection.wait_time`
`db.sql.connection.closed_max_idle` -> `db.client.connection.idle.max`
`db.sql.connection.closed_max_idle_time` ->
`db.client.connection.idle.min`
3. created custom dashboard to display these metrics, (import via json)
<img width="1580" height="906" alt="Screenshot 2026-03-19 at 2 44 43 PM"
src="https://github.com/user-attachments/assets/f1b64ed6-e534-4490-8955-bc1205dd21d4"
/>
4. Fixed metrics for service db dashboards
Signoz expects

`db.system` : Identifies the database type (e.g., postgresql, mysql,
mongodb).
`db.statement` : The actual query being executed (e.g., SELECT * FROM
users).
`db.operation` : The type of operation (e.g., SELECT, INSERT).
`service.name` : The name of the service making the call.

We needed to set the `db.system` attribute explicitly.

`db.operation` is missing because otelsql doesn't capture this by
default. Decided not to add this for now as the dashboards work without.
Can be a future enhancement.

<img width="1563" height="487" alt="Screenshot 2026-03-19 at 2 45 18 PM"
src="https://github.com/user-attachments/assets/51028e16-ee2c-45a9-9025-26f17b0db67a"
/>


# Checklist for submitter

## Testing
- [x] QA'd all new/changed functionality manually

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
* Added a new observability dashboard for database and connection
performance metrics, including RPS, latency, connection pool saturation,
and queue statistics.
* Enhanced database metrics collection with automatic registration of
connection and query performance indicators.
* Standardized OpenTelemetry metric naming to align with industry
conventions for improved observability compatibility.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-03-20 11:07:58 -05:00
Victor Lyuboslavsky
8c2ad7f901
Run multiple independent Fleet dev servers in parallel (#41865)
<!-- Add the related story/sub-task/bug number, like Resolves #123, or
remove if NA -->
**Related issue:** Resolves #41848 

Docs updates: https://github.com/fleetdm/fleet/pull/41868/changes

# Checklist for submitter

- changes not needed since this is a dev environment and test issue

## Testing

- [x] QA'd all new/changed functionality manually

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Tests**
* Enhanced test infrastructure to support environment-variable-based
configuration for SAML, mail, database, and S3 services, enabling more
flexible and dynamic test setups.

* **Chores**
* Updated Docker Compose configuration to use environment variables for
service ports, allowing runtime customization while maintaining backward
compatibility with default values.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-03-18 13:58:58 -05:00
Victor Lyuboslavsky
bf9180e6e3
slog migration: initLogger + serve.go + cron + schedule (#40699)
<!-- Add the related story/sub-task/bug number, like Resolves #123, or
remove if NA -->
**Related issue:** Resolves #40540 

Almost done with slog migration.

# Checklist for submitter

- [ ] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
  - Changes present in previous PR

## Testing

- [x] Added/updated automated tests
- [x] QA'd all new/changed functionality manually


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Updated internal logging infrastructure to use Go's standard logging
library, modernizing the logging system while maintaining existing
functionality and error handling behavior.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-27 14:29:27 -06:00
Victor Lyuboslavsky
92bc1c650e
Move PostJSONWithTimeout to platform/http package and activity cleanup (#40561)
<!-- Add the related story/sub-task/bug number, like Resolves #123, or
remove if NA -->
**Related issue:** Resolves #38536

- Moved PostJSONWithTimeout to platform/http
- Created platform/errors package with only types needed by ctxerr. This
way, ctxerr did not need to import fleethttp.
- Made activity bounded context use PostJSONWithTimeout directly
- Removed some activity types from legacy code that were no longer
needed

# Checklist for submitter

- [ ] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
- Changes file `38536-new-activity-bc` already present, and this is just
cleanup from that work.

## Testing

- [x] Added/updated automated tests
- [x] QA'd all new/changed functionality manually


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **Refactor**
* Reorganized error handling utilities for improved clarity and
decoupling.
* Consolidated HTTP utilities to centralize JSON posting functionality
with timeout support.
* Simplified activity service initialization by removing unused internal
parameters.
* Cleaned up test utilities and removed webhook-related test
scaffolding.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-26 17:39:10 -06:00
Victor Lyuboslavsky
ccc36a9cb3
Finishing mysql package migration to slog (#40350)
<!-- Add the related story/sub-task/bug number, like Resolves #123, or
remove if NA -->
**Related issue:** Resolves #40054

# Checklist for submitter

- [ ] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
  - Already present in previous PR

## Testing

- [x] Added/updated automated tests
- [x] QA'd all new/changed functionality manually


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Migrated logging to a structured, context-aware backend for clearer,
richer diagnostics and consistent log formatting.
* Introduced broader context propagation and adjusted internal
interfaces to support the new logging approach (no end-user behavior
changes).
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-24 16:52:36 -06:00
Tim Lee
3fd665e200
Order By Vulnerability (#40143) 2026-02-23 09:42:36 -07:00
Victor Lyuboslavsky
6b3bb8a961
slog migration: platform/mysql and related logic (#40072)
<!-- Add the related story/sub-task/bug number, like Resolves #123, or
remove if NA -->
**Related issue:** Resolves #40054 

# Checklist for submitter

- [ ] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
  - already included in previous PR

## Testing

- [x] Added/updated automated tests
- [x] QA'd all new/changed functionality manually


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
* Standardized logging infrastructure across the database and storage
layers for improved consistency and maintainability.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-19 08:27:24 -06:00
Robert Fairburn
73dba23392
Allow MySQL IAM authentication when a custom TLS CA/TLS config is set (#39808)
# Checklist for submitter

If some of the following don't apply, delete the relevant line.

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files)
for more information.


## Testing

- [x] Added/updated automated tests

- [x] QA'd all new/changed functionality manually

For unreleased bug fixes in a release candidate, one of:

- [x] Confirmed that the fix is not expected to adversely impact load
test results


Note: this solves https://github.com/fleetdm/fleet/issues/39832

---------

Co-authored-by: Scott Gress <scott@fleetdm.com>
Co-authored-by: Scott Gress <scottmgress@gmail.com>
2026-02-18 08:15:29 -06:00
Victor Lyuboslavsky
abaeeec6b8
Change Datastore.logger type to *logging.Logger (#39938)
<!-- Add the related story/sub-task/bug number, like Resolves #123, or
remove if NA -->
**Related issue:** Resolves #38889

This is preparatory work before incrementally converting datastore/mysql
files to directly use *slog.Logger.
This will be done by using `logger.SlogLogger()` to get the underlying
`*slog.Logger`

# Checklist for submitter

- [ ] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
  - Changes file already exists from previous PR

## Testing

- [x] Added/updated automated tests
- [x] QA'd all new/changed functionality manually


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Updated internal logging infrastructure to use a standardized platform
logging package across database and utility components. This
consolidates logging dependencies and improves system consistency
without affecting user-facing functionality.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-17 15:29:52 -06:00
Victor Lyuboslavsky
42d5f1fda6
Improve error handling on AWS DB failover (#39841)
<!-- Add the related story/sub-task/bug number, like Resolves #123, or
remove if NA -->
**Related issue:** Resolves #39228 

Manually tested by triggering a failover on loadtest.

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.

## Testing

- [x] Added/updated automated tests
- [x] QA'd all new/changed functionality manually


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* Health checks now detect a primary DB becoming read-only and report
failure so the service restarts and reconnects to a writable primary.
* Write failures due to DB read-only state now trigger immediate fatal
handling to prompt graceful shutdown and recovery.
* Improved detection and handling of read-only DB conditions to increase
stability during failovers.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-17 07:10:52 -06:00
Victor Lyuboslavsky
a10f05486f
Added OTEL log export support (#39279)
<!-- Add the related story/sub-task/bug number, like Resolves #123, or
remove if NA -->
**Related issue:** Resolves #38607

Contributor docs update:
https://github.com/fleetdm/fleet/pull/39285/changes
Another contributor docs update:
https://github.com/fleetdm/fleet/pull/39402/changes

Also:
- renamed OtelHandler to OtelTracingHandler
- made "opentelemetry" be the default when tracing is enabled
- updated OTEL dependencies

# Checklist for submitter

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.

## Testing

- [x] Added/updated automated tests
- [x] QA'd all new/changed functionality manually

## New Fleet configuration settings

- [x] Setting(s) is/are explicitly excluded from GitOps

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added OpenTelemetry log export capability, enabling logs to be sent to
OpenTelemetry collectors.
* New configuration option `logging.otel_logs_enabled` (requires tracing
to be enabled).

* **Chores**
* Updated OpenTelemetry dependencies to v1.40.0 with latest OTLP
exporters and logging support.
* Updated dependencies including gRPC (v1.78.0), Google libraries, and
cryptography packages.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-02-06 18:57:28 -06:00
Victor Lyuboslavsky
07949df530
Improved OpenTelemetry error handling (#38757)
<!-- Add the related story/sub-task/bug number, like Resolves #123, or
remove if NA -->
**Related issue:** Resolves #38756 

- Changed to NOT mark many client errors as exceptions
- Instead, added client_error and server_error metrics that can be used
to alert on unusual error rates

# Checklist for submitter

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.

## Testing

- [x] Added/updated automated tests
- [x] QA'd all new/changed functionality manually

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added separate metrics for distinguishing between client and server
errors, enhancing observability and monitoring capabilities.

* **Bug Fixes**
* Client request errors no longer incorrectly appear in error tracking
as exceptions; improved accuracy of error reporting to external
services.
* Adjusted logging levels for authentication and enrollment operations
to provide clearer diagnostics.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-26 17:07:32 -06:00
Victor Lyuboslavsky
089cf9a3ba
Added LoadDefaultSchema to platform testing utils to make test setup easier. (#38706)
<!-- Add the related story/sub-task/bug number, like Resolves #123, or
remove if NA -->
**Related issue:** Resolves #38234

Minor developer experience improvement for tests using MySQL.
This is a code review follow up.
2026-01-26 16:02:12 -06:00
Victor Lyuboslavsky
506901443d
Moved common_mysql package to server/platform/mysql (#38017)
<!-- Add the related story/sub-task/bug number, like Resolves #123, or
remove if NA -->
**Related issue:** Resolves #37244

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files)
for more information.

## Testing

- [x] QA'd all new/changed functionality manually



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Refactor**
* Internal MySQL utility package reorganized and all internal imports
updated to the new platform location; no changes to end-user
functionality or behavior.

* **Documentation**
* Added platform package documentation describing infrastructure
responsibilities and architectural boundaries to guide maintainers.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-01-08 13:17:19 -06:00