- Enable `standard` RDS database performance insights
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **Chores**
* Enhanced database monitoring capabilities by enabling Database
Insights for load testing infrastructure.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
- Disable performance insights
- Allow redis instance count >=1
- Properly set ecs_cluster logging config path
- Targeted apply with auto approve for pre-creating fleet and execution
roles
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Enhanced ECS cluster logging with CloudWatch integration and extended
log retention to 365 days.
* Adjusted RDS monitoring configuration and disabled performance
insights for operational optimization.
* Reduced minimum Redis instance requirement from 3 to 1 for greater
deployment flexibility.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Resolves#43671.
Bumps the Alpine base image from 3.23.3 to 3.23.4 in the Dockerfiles
that produce published images, picking up patched openssl, musl, and
zlib packages. Follows the same pattern as #38977.
### CVEs resolved
- HIGH: CVE-2026-28388, CVE-2026-28389, CVE-2026-28390, CVE-2026-31790,
CVE-2026-2673, CVE-2026-40200
- MEDIUM: CVE-2026-27171, CVE-2026-6042, CVE-2026-22184
### Test plan
- CI image build passes.
- Trivy/ECR scan on the resulting fleetdm/fleet image confirms the nine
listed CVEs are gone.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated Docker base images to Alpine 3.23.4 across infrastructure and
deployment components for improved stability and security.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Golang 1.26.2 has been released. It fixes some CVEs:
https://github.com/golang/go/issues?q=milestone%3AGo1.26.2+label%3ACherryPickApproved
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Updated Go toolchain to 1.26.2 across the repository and build
configs.
* Updated Docker build images to use Go 1.26.2.
* Expanded the set of tracked modules for the Go version update so
additional module files are included in automated updates.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
- Updates wording in `.github/workflows/loadtest-osquery-perf.yml`
- `4098` -> `4096`
- Removes: `(should be a multiple of 8, if setting
loadtest_containers_starting_index)`
- Updates `infrastructure/loadtesting/terraform/osquery_perf/enroll.sh`
to handle values that are not multiples of 8. If the value is not a
multiple of 8, logic has been added to apply the remainder.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **Documentation**
* Updated load testing workflow configuration input descriptions for
improved clarity of parameters and their usage examples.
* **Bug Fixes**
* Fixed container count allocation logic in the load testing process to
ensure the final target count is always properly applied, even when
using increment values that don't divide evenly into the specified total
range.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
- Configures internal alb to log to the same bucket as the public alb
- Adds support for osquery-perf task size (cpu/memory) configuration
- Updates defaults for osquery-perf extra_flags
- Updates default enroll.sh loop sleep_time from 60s -> 300s
<!-- Add the related story/sub-task/bug number, like Resolves#123, or
remove if NA -->
**Related issue:** Resolves # N/A
- Resolves an issue that prevents some locally pulled docker images from
being pushed to ECR.
<!-- Add the related story/sub-task/bug number, like Resolves#123, or
remove if NA -->
**Related issue:** Resolves#41749
# Checklist for submitter
- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
- Adds require_secure_transport for mysql connections to the db_cluster
parameter group for dogfood and loadtest environments.
```
db_cluster_parameters = {
require_secure_transport = "ON"
}
```
<!-- Add the related story/sub-task/bug number, like Resolves#123, or
remove if NA -->
**Related issue:** Resolves#34677 and #35932
Adding ~450K software to the loadtest, including scripts to add more
software in the future.
Software is held in a `software.sql` file, which is used to create a
sqlite DB during osquery perf run/deployment.
# Checklist for submitter
## Testing
- [x] QA'd all new/changed functionality manually
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added support for loading software data from an external SQLite
database via a new `--software_db_path` command-line flag for more
realistic simulation scenarios.
* Added import and SQL generation tools to build and manage custom
software libraries.
* **Documentation**
* Added comprehensive README with setup instructions, tool usage, and
end-to-end workflow guidance for the software library.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- Add the related story/sub-task/bug number, like Resolves#123, or
remove if NA -->
**Related issue:** Resolves#34677
Changing OTEL set up to group exceptions by type, which is an
OTEL/industry best practice.
- Removes timestamp from osquery_perf image
- Adds `default: 0` to loadtest osquery_perf workflow, `variable:
loadtest_containers_starting_index`
- Adds `variable: sleep_time` to loadtest osquery_perf workflow
- Adds osquery_perf docker repository in ECR
- Adds support for `sleep_time` to `enroll.sh`
- Updates terraform variables to enforce `git_branch` or `git_tag` for
osquery_perf
- Removes `firehose` logging from loadtesting environment
- Sets `filesystem` logging in loadtesting environment
- Updates fleet image to 4.76.0 as the default value
- Updates `migrations` and `logging_alb` modules with latest versions
- Removes `(Coming Soon)` from
`infrastructure/loadtesting/infra/README.md` with regards to deployment
via Github Actions
- Moves Signoz steps to `.header.md` to preserve steps in generated
`README.md`
- Adds support for `enroll.sh`, to deploy osquery_perf in batches
- Merges variables `tag` and `git_branch` into `git_tag_branch`. Only
one tag or git_branch should be specified.
- Still used for osquery_perf to check out the correct tag/branch.
- Removes fleet_image requirement for cutting osquery_perf images
---------
Co-authored-by: Robert Fairburn <8029478+rfairburn@users.noreply.github.com>
<!-- Add the related story/sub-task/bug number, like Resolves#123, or
remove if NA -->
**Related issue:** Resolves#34500
Terraform changes after my latest loadtest.
VPC consolidation: updated (and deployed) shared VPC so that Signoz
backend can now use it
- Removed eks-vpc/ directory
- Moved VPC management to shared/vpc.tf
- Updated shared/init.tf to reflect VPC changes
Infra improvements
- infra/internal_alb.tf - changed suffix from -internal to -int since I
hit max 32 characters issue
OTEL
- OTEL Collector configuration overrides for production stability
untested end-to-end but works as a replacement for plans and doesn't
require a local arm64 build to work.
Co-authored-by: Jorge Falcon <22119513+BCTBB@users.noreply.github.com>
SigNoz converted from child module to standalone root module with
independent state.
**Critical Impact**
Deployment order is now required:
1. Deploy infrastructure/loadtesting/terraform/signoz/ FIRST
2. Then deploy infrastructure/loadtesting/terraform/infra/
Communication between modules via Terraform remote state.
**Key Configuration Changes**
- SigNoz creates its own EKS cluster: signoz-${workspace}
- Instance type: t3.xlarge (upgraded from t3.large for resource
headroom)
- ClickHouse disk: 200Gi (was 20Gi) with 2-day retention
- Resource limits configured to prevent OOMKills during loadtest
- wait_for_jobs = false to avoid Helm deployment deadlock
<!-- Add the related story/sub-task/bug number, like Resolves#123, or
remove if NA -->
**Related issue:** Resolves#32331
<!-- Add the related story/sub-task/bug number, like Resolves#123, or
remove if NA -->
**Related issue:** Resolves#32331
This PR allows us to run loadtest with SigNoz OTEL backend by adding
`-var=enable_otel=true`
SigNoz is deployed via Helm chart.
Enhancements needed (in future PR):
- put SigNoz UI behind VPN
- combine the new eks-vpc with shared fleet-vpc
- make SigNoz shared, so multiple loadtests use the same instance? (But
what about updating to it to latest version?)
Next steps:
- Enable SigNoz in Dogfood environment
- SigNoz by default [keeps 15 days of logs and
traces](https://signoz.io/docs/userguide/retention-period), which is
quite a bit. How much would that cost us and should we reduce it?
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- New Features
- Optional OpenTelemetry tracing with SigNoz via a new enable_otel flag.
- Conditional deployment of a SigNoz stack (managed EKS, storage,
Helm-based apps) with internal OTLP collector endpoint.
- New outputs to retrieve OTLP endpoint, cluster name, and a kubectl
configuration command.
- Documentation
- Added guidance for deploying and using SigNoz with load testing.
- Updated examples to include -var=enable_otel=true.
- Chores
- Introduced required providers to support Helm and Kubernetes
resources.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
# Github Actions (New)
- New workflow to deploy/destroy loadtest infrastructure with one-click
(Needs to be tested)
- Common inputs drive configuration and deployment of loadtest
infrastructure
- tag
- fleet_task_count
- fleet_task_memory
- fleet_task_cpu
- fleet_database_instance_size
- fleet_database_instance_count
- fleet_redis_instance_size
- fleet_redis_instance_count
- terraform_workspace
- terraform_action
- New workflow to deploy/destroy osquery-perf to loadtest infrastructure
with one-click (Needs to be tested)
- Common inputs drive configuration and deployment of osquery-perf
resources
- tag
- git_branch
- loadtest_containers
- extra_flags
- terraform_workspace
- terraform_action
- New workflow to deploy shared loadtest resources with one-click (Needs
to be tested)
# Loadtest Infrastructure (New)
- New directory (`infrastructure/loadtesting/terraform/infra`) for
one-click deployment
- Loadtest environment updated to use [fleet-terraform
modules](https://github.com/fleetdm/fleet-terraform)
- [Deployment documentation
updated](0c254bca40/infrastructure/loadtesting/terraform/infra/README.md)
to reflect new steps
# Osquery-perf deployment (New)
- New directory (`infrastructure/loadtesting/terraform/osquery-perf`)
for the deployment of osquery-perf
- osquery-perf updated to use [fleet-terraform
modules](https://github.com/fleetdm/fleet-terraform)
- [Deployment documentation
updated](0c254bca40/infrastructure/loadtesting/terraform/osquery_perf)
to reflect new steps
Bumping memory and cpu on aws load test containers Creating multiple ecs
services with a single task. This allows us to specify different
settings per osquery perf container/task.
**Related issue:** No issue.
Ran
```
make update-go version=1.24.6
```
And then updated the `sha256`s manually in the Dockerfiles.
Fixes https://nvd.nist.gov/vuln/detail/CVE-2025-47907
```
Cancelling a query (e.g. by cancelling the context passed to one of the query methods) during a call
to the Scan method of the returned Rows can result in unexpected results if other queries are being
made in parallel. This can result in a race condition that may overwrite the expected results with those
of another query, causing the call to Scan to return either unexpected results from the other
query or an error.
```
# Added
- Added kms.tf to support encrypting keys, specifically cloudfront keys.
- Added template/cloudfront.tf.disabled for use in enabling cloudfront.-
Modified ecs-iam.tf to support log-alb.tf, cloudfront.tf policies that
are injected into `local.extra_execution_iam_policies` and `local.iam`.
- Added log-alb.tf to enable logging alb, required by cloudfront.tf.
# Changed
- Modified ecs.tf to support adding of additional secrets from
`local.secrets`.
- Modified firehose.tf to support provider required updates for
deprecated resource configurations.
- Modified init.tf to support `> v5.0` of `hashicorp/aws` provider.
- Modified locals.tf to add `extra_execution_iam_policies`, `iam`,
`software_installers_kms_policy`, `extra_secrets`, secrets, and
`cloudfront_key_basename`, to support cloudfront.
- Modified readme.md with instructions on how to enable cloudfront.tf
- Modified redis.tf to support provider required updates for deprecated
resource configurations
- Modified s3.tf to support kms keys and add kms iam.
- Modified terraform version in .github/workflows/tfvalidate.yml - 1.9.0
-> 1.10.4
## #30730
- Update Go version
- Update the docs for this process
- Confirmed `fleet`, `fleetctl`, and related docker images build
successfully
- Note that failing tests are unrelated: see [Slack
thread](https://fleetdm.slack.com/archives/C019WG4GH0A/p1752175318523689)
---------
Co-authored-by: Jacob Shandling <jacob@fleetdm.com>