Commit graph

143 commits

Author SHA1 Message Date
Jorge Falcon
34cb7ab6d1
Loadtest internal alb logging and osquery-perf scaling updates (#42581)
- Configures internal alb to log to the same bucket as the public alb
- Adds support for osquery-perf task size (cpu/memory) configuration
- Updates defaults for osquery-perf extra_flags
- Updates default enroll.sh loop sleep_time from 60s -> 300s
2026-03-31 11:15:07 -04:00
Jorge Falcon
2d09916f60
Fix loadtest/infra docker_image resource (#42537)
**Related issue:** Resolves # N/A

- Resolves an issue that prevents some locally pulled docker images from
being pushed to ECR.
2026-03-27 01:17:37 -04:00
Jorge Falcon
42b02483d4
Dogfood & Loadtest - Updating mysql engine version to 8.0.mysql_aurora.3.10.3 (#42120)
- Bumps Dogfood and Loadtest environment Aurora MySQL engine version
from `8.0.mysql_aurora.3.08.2` -> `8.0.mysql_aurora.3.10.3`
2026-03-19 21:05:24 -05:00
Jorge Falcon
115e00decd
Configure software_installers defaults in Loadtest terraform (#41207)
- Adds software_installers {} configuration to loadtest terraform
- Modifies template/cloudfront.tf.disabled to use pkcs#8 format for the
private key
2026-03-19 20:17:54 -04:00
Victor Lyuboslavsky
ecee908157
Bumping signoz resources for 100K hosts loadtest. (#41961) 2026-03-19 12:49:36 -05:00
Victor Lyuboslavsky
fbc5b9d8b6
Updated go to 1.26.1 (#42027)
**Related issue:** Resolves #41749

# Checklist for submitter

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
2026-03-19 07:01:00 -05:00
Jorge Falcon
45c4e47fab
Dogfood and loadtest - mysql require secure transport on (#40211)
- Adds require_secure_transport for mysql connections to the db_cluster
parameter group for dogfood and loadtest environments.

```
    db_cluster_parameters = {
      require_secure_transport = "ON"
    }
```
2026-02-20 15:57:10 -05:00
Robert Fairburn
dac2ef18f0
Ensure terraform docker compatibility with github actions (#39988)
Co-authored-by: Jorge Falcon <22119513+BCTBB@users.noreply.github.com>
2026-02-17 15:09:50 -05:00
Robert Fairburn
9f60dadae0
Allow gzip responses (#39700) 2026-02-12 10:24:49 -06:00
Jorge Falcon
502351dcde
Add FLEET_MYSQL_READ_REPLICA_TLS_CONFIG environment variable to dogfood and loadtesting (#39692)
- Adds `FLEET_MYSQL_READ_REPLICA_TLS_CONFIG = "custom"` to dogfood and
loadtesting environments.
2026-02-11 13:05:11 -05:00
Ian Littman
d4906dd3d6
Update to Go 1.25.7 (#39584)
- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files)
for more information.
2026-02-09 17:47:51 -06:00
Victor Lyuboslavsky
0ae909fedf
Updated loadtest OTEL config to match dogfood (#38991)
**Related issue:** Resolves #36494

I tried this with loadtest.
2026-01-29 10:18:02 -06:00
Ian Littman
ec06952245
Bump Alpine (to 3.23.3), Go (to 1.25.6) to resolve vulns (#38973) 2026-01-28 18:51:15 -06:00
Jorge Falcon
7ac24d8752
Loadtest (new) - MDM Updates (#37420)
- Adds `FLEET_DEV_MDM_APPLE_DISABLE_PUSH = 1`
- Adds `FLEET_DEV_MDM_APPLE_DISABLE_DEVICE_INFO_CERT_VERIFY = 1`
- Updates osquery_perf/README.md, providing an example fetching and
using mdm scep challenge secret.
2025-12-17 17:55:13 -05:00
Ian Littman
62755cbd82
Bump Go to 1.25.5, Alpine to 3.23.0 where relevant, bump Trivy to current version (#36848)
Fixes vulns reported in
https://github.com/fleetdm/fleet/actions/runs/19999992703. We'll
definitely want to at least cherry-pick this.
2025-12-07 20:04:14 -06:00
Victor Lyuboslavsky
6ab79dd5a7
Add more software to loadtest (#35756)
**Related issue:** Resolves #34677 and #35932

Adding ~450K software to the loadtest, including scripts to add more
software in the future.
Software is held in a `software.sql` file, which is used to create a
sqlite DB during osquery perf run/deployment.
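
A minimal sketch of the DB-creation step described above, assuming the `sqlite3` CLI is installed and that `software.sql` is a plain SQL script. The `build_software_db` helper and the output file name are hypothetical; the actual osquery-perf deployment may build the database differently.

```sh
# Hypothetical helper, not taken from the repository.
# Usage: build_software_db SQL_FILE DB_FILE
build_software_db() {
  sql_file="$1"
  db_file="$2"
  # Start from a clean file so repeated runs do not duplicate rows.
  rm -f "$db_file"
  # Feed the SQL script to the sqlite3 CLI, which creates DB_FILE on demand.
  sqlite3 "$db_file" < "$sql_file"
}

# Example (file names are assumptions):
#   build_software_db software.sql software.sqlite
```

The resulting file is presumably what the `--software_db_path` flag mentioned in the summary below would point at.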

# Checklist for submitter

## Testing

- [x] QA'd all new/changed functionality manually


## Summary by CodeRabbit

* **New Features**
* Added support for loading software data from an external SQLite
database via a new `--software_db_path` command-line flag for more
realistic simulation scenarios.
* Added import and SQL generation tools to build and manage custom
software libraries.

* **Documentation**
* Added comprehensive README with setup instructions, tool usage, and
end-to-end workflow guidance for the software library.

2025-11-21 10:42:19 -06:00
Jorge Falcon
e0be06fa76
Loadtesting - osquery deployment session timeout increase (#36097) 2025-11-20 21:08:52 -05:00
Jorge Falcon
fb5c90ad9c
Dogfood and loadtest module updates (#35990)
Updates `module.main` to `1.18.3` (dogfood)
- Adds `memory_tracking_target_value = 70` (dogfood and loadtesting)
- Adds `cpu_tracking_target_value = 70`  (dogfood and loadtesting)

Updates `module.migrations` to `2.2.1` (dogfood)
- Adds `max_capacity`

Updates `module.logging_alb` to `1.6.2` (dogfood)

Updates `module.monitoring` to `1.8.0` (dogfood)
- Adds `log_monitoring` configuration
2025-11-19 22:04:17 -05:00
Victor Lyuboslavsky
f8ce47ec88
Grouping OTEL exceptions by type. (#35794)
**Related issue:** Resolves #34677 

Changing the OTEL setup to group exceptions by type, which is an
OTEL/industry best practice.
2025-11-19 10:24:19 -06:00
George Karr
ca5d02d471
Adding changes for Fleet v4.76.1 (#35760) 2025-11-18 14:35:31 -06:00
Victor Lyuboslavsky
3029a3ac44
Adjusting OTEL resources for high throughput. (#35878) 2025-11-18 06:53:49 -06:00
Jorge Falcon
0b0c67a5d5
Loadtest - osquery_perf scaling fixes (#35798)
- Removes timestamp from osquery_perf image
- Adds `default: 0` to loadtest osquery_perf workflow, `variable:
loadtest_containers_starting_index`
- Adds `variable: sleep_time` to loadtest osquery_perf workflow
- Adds osquery_perf docker repository in ECR
- Adds support for `sleep_time` to `enroll.sh`
- Updates terraform variables to enforce `git_branch` or `git_tag` for
osquery_perf
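
A rough sketch of how a batched rollout with a starting index and a `sleep_time` pause could work. This is not the real `enroll.sh`: the function names, the argument order, and the batch-size parameter are assumptions made for illustration.

```sh
# Hypothetical sketch, NOT the repository's enroll.sh.
# plan_batches START TOTAL STEP
# Prints the cumulative container count targeted by each batch.
plan_batches() {
  start="$1"
  total="$2"
  step="$3"
  count="$start"
  while [ "$count" -lt "$total" ]; do
    count=$((count + step))
    # Clamp the last batch so it never overshoots the requested total.
    if [ "$count" -gt "$total" ]; then
      count="$total"
    fi
    echo "$count"
  done
}

# enroll_in_batches START TOTAL STEP SLEEP_TIME
enroll_in_batches() {
  sleep_time="$4"
  for count in $(plan_batches "$1" "$2" "$3"); do
    echo "scaling osquery_perf to $count containers"
    # The real deployment step for this batch would run here.
    if [ "$count" -lt "$2" ]; then
      # Pause between batches so the server can absorb the new enrollments.
      sleep "$sleep_time"
    fi
  done
}
```

Starting from a non-zero index lets an interrupted rollout resume without redeploying the earlier batches.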
2025-11-17 10:21:18 -05:00
Robert Fairburn
caf9e83968
Configure osquery-perf memory and cpu (#35786) 2025-11-14 14:35:42 -06:00
Jorge Falcon
776cd67647
Loadtest - Firehose logging removal, adds filesystem logging, and module updates (#35735)
- Removes `firehose` logging from loadtesting environment
- Sets `filesystem` logging in loadtesting environment
- Updates fleet image to 4.76.0 as the default value
- Updates `migrations` and `logging_alb` modules with latest versions
2025-11-13 19:16:00 -05:00
Jorge Falcon
e2085bfd86
Loadtesting documentation - Removes (Coming Soon) from README (#35649)
- Removes `(Coming Soon)` from
`infrastructure/loadtesting/infra/README.md` with regards to deployment
via Github Actions
- Moves Signoz steps to `.header.md` to preserve steps in generated
`README.md`
2025-11-12 16:54:14 -05:00
Jorge Falcon
0471b8ce19
Loadtest - osquery_perf - Removal of fleet_image requirement (#35365)
- Adds support for `enroll.sh`, to deploy osquery_perf in batches
- Merges variables `tag` and `git_branch` into `git_tag_branch`. Only
one tag or git_branch should be specified.
  - Still used for osquery_perf to check out the correct tag/branch.
- Removes fleet_image requirement for cutting osquery_perf images

---------

Co-authored-by: Robert Fairburn <8029478+rfairburn@users.noreply.github.com>
2025-11-10 16:16:20 -05:00
Victor Lyuboslavsky
73501e5755
Infra changes after latest loadtest (#35083)
**Related issue:** Resolves #34500

Terraform changes after my latest loadtest.

VPC consolidation: updated (and deployed) shared VPC so that Signoz
backend can now use it

  - Removed eks-vpc/ directory
  - Moved VPC management to shared/vpc.tf
  - Updated shared/init.tf to reflect VPC changes

  Infra improvements

- infra/internal_alb.tf - changed suffix from -internal to -int since I
hit max 32 characters issue

OTEL

- OTEL Collector configuration overrides for production stability
2025-11-03 11:02:15 -06:00
Victor Lyuboslavsky
072ee68eda
Updating to Go 1.25.3 (#35082) 2025-11-03 09:47:07 -06:00
Jorge Falcon
6ea9185c1c
Loadtesting - osquery_perf docker image build fixes (#34901)
- Bumps docker provider from 2.16.0 to 3.6.x
- Moves builds from `docker_registry_image` to new `docker_image`
resource
2025-10-29 08:33:46 -04:00
Robert Fairburn
1fedabe7a8
Update alpine base image to latest (#34864)
Resolves openssl:3.3.3/CVE-2025-9230 in base images.
2025-10-28 11:24:05 -05:00
Robert Fairburn
30c4798ec6
Switch git providers for loadtesting tf (#34180)
untested end-to-end but works as a replacement for plans and doesn't
require a local arm64 build to work.

Co-authored-by: Jorge Falcon <22119513+BCTBB@users.noreply.github.com>
2025-10-23 14:53:13 -04:00
Victor Lyuboslavsky
e4e3c3f9ff
Fix issues with OTEL SigNoz deployments for loadtests (#34694)
SigNoz converted from child module to standalone root module with
independent state.

  **Critical Impact**

  Deployment order is now required:
  1. Deploy infrastructure/loadtesting/terraform/signoz/ FIRST
  2. Then deploy infrastructure/loadtesting/terraform/infra/

  Communication between modules via Terraform remote state.

  **Key Configuration Changes**

  - SigNoz creates its own EKS cluster: signoz-${workspace}
  - Instance type: t3.xlarge (upgraded from t3.large for resource
    headroom)
  - ClickHouse disk: 200Gi (was 20Gi) with 2-day retention
  - Resource limits configured to prevent OOMKills during loadtest
  - wait_for_jobs = false to avoid Helm deployment deadlock
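
The required deployment order can be sketched as a small wrapper, assuming both root modules use the same Terraform workspace name. The `deploy_loadtest` function and the `TF_CMD` override are hypothetical; the two directory paths are the ones listed above.

```sh
# Hypothetical helper, not part of the repository.
# Usage: deploy_loadtest WORKSPACE
# Set TF_CMD="echo terraform" to print the commands instead of running them.
deploy_loadtest() {
  workspace="$1"
  tf="${TF_CMD:-terraform}"
  # SigNoz must be applied first; infra reads its outputs via remote state.
  for dir in \
    infrastructure/loadtesting/terraform/signoz \
    infrastructure/loadtesting/terraform/infra
  do
    # -chdir is a Terraform global option; stop at the first failure so
    # infra is never applied on top of a broken SigNoz deployment.
    $tf -chdir="$dir" init || return 1
    $tf -chdir="$dir" workspace select "$workspace" || return 1
    $tf -chdir="$dir" apply || return 1
  done
}
```

Destroying the environment would presumably run in the reverse order, infra before SigNoz, though the commit does not say so.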


**Related issue:** Resolves #32331
2025-10-23 12:49:36 -05:00
Victor Lyuboslavsky
aef9b8400c
Added terraform files for Signoz OTEL backend. (#34058)
**Related issue:** Resolves #32331 

This PR allows us to run loadtest with SigNoz OTEL backend by adding
`-var=enable_otel=true`
SigNoz is deployed via Helm chart.
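
A minimal sketch of toggling the flag from a wrapper script, assuming it is run from the loadtest Terraform directory. The `apply_loadtest` function and the `TF_CMD` override are hypothetical; only `-var=enable_otel=true` comes from this PR.

```sh
# Hypothetical wrapper, not part of the repository.
# Usage: apply_loadtest [true|false]
# Set TF_CMD="echo terraform" to print the command instead of running it.
apply_loadtest() {
  enable_otel="${1:-false}"
  tf="${TF_CMD:-terraform}"
  if [ "$enable_otel" = "true" ]; then
    # Opt in to the SigNoz OTEL backend for this run.
    $tf apply -var=enable_otel=true
  else
    $tf apply
  fi
}
```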

Enhancements needed (in future PR):
- put SigNoz UI behind VPN
- combine the new eks-vpc with shared fleet-vpc
- make SigNoz shared, so multiple loadtests use the same instance? (But
what about updating it to the latest version?)

Next steps:
- Enable SigNoz in Dogfood environment
- SigNoz by default [keeps 15 days of logs and
traces](https://signoz.io/docs/userguide/retention-period), which is
quite a bit. How much would that cost us and should we reduce it?


## Summary by CodeRabbit

- New Features
- Optional OpenTelemetry tracing with SigNoz via a new enable_otel flag.
- Conditional deployment of a SigNoz stack (managed EKS, storage,
Helm-based apps) with internal OTLP collector endpoint.
- New outputs to retrieve OTLP endpoint, cluster name, and a kubectl
configuration command.

- Documentation
  - Added guidance for deploying and using SigNoz with load testing.
  - Updated examples to include -var=enable_otel=true.

- Chores
- Introduced required providers to support Helm and Kubernetes
resources.

2025-10-10 21:53:04 -05:00
Jorge Falcon
c0f753cb83
Updated permissions for GHA role - load test environment (#34059)
* Fixes missing STS permission on the load test environment GHA role
2025-10-09 15:10:52 -04:00
Jorge Falcon
e952ef06c0
Loadtesting IAC updates (#32629)
# Github Actions (New)
- New workflow to deploy/destroy loadtest infrastructure with one-click
(Needs to be tested)
- Common inputs drive configuration and deployment of loadtest
infrastructure
    - tag
    - fleet_task_count
    - fleet_task_memory
    - fleet_task_cpu
    - fleet_database_instance_size
    - fleet_database_instance_count
    - fleet_redis_instance_size
    - fleet_redis_instance_count
    - terraform_workspace
    - terraform_action
- New workflow to deploy/destroy osquery-perf to loadtest infrastructure
with one-click (Needs to be tested)
- Common inputs drive configuration and deployment of osquery-perf
resources
    - tag
    - git_branch
    - loadtest_containers
    - extra_flags
    - terraform_workspace
    - terraform_action
- New workflow to deploy shared loadtest resources with one-click (Needs
to be tested)

# Loadtest Infrastructure (New)
- New directory (`infrastructure/loadtesting/terraform/infra`) for
one-click deployment
- Loadtest environment updated to use [fleet-terraform
modules](https://github.com/fleetdm/fleet-terraform)
- [Deployment documentation
updated](0c254bca40/infrastructure/loadtesting/terraform/infra/README.md)
to reflect new steps

# Osquery-perf deployment (New)
- New directory (`infrastructure/loadtesting/terraform/osquery-perf`)
for the deployment of osquery-perf
- osquery-perf updated to use [fleet-terraform
modules](https://github.com/fleetdm/fleet-terraform)
- [Deployment documentation
updated](0c254bca40/infrastructure/loadtesting/terraform/osquery_perf)
to reflect new steps
2025-10-08 15:31:37 -04:00
Konstantin Sykulev
9e5c632c4c
Updating osquery perf loadtest infrastructure (#34003)
Bumping memory and CPU on AWS load test containers. Creating multiple
ECS services with a single task each. This allows us to specify
different settings per osquery perf container/task.

**Related issue:** No issue.
2025-10-08 13:28:33 -05:00
Victor Lyuboslavsky
abc912bd03
Updated go to 1.25.1 (#32833) 2025-09-11 18:31:39 -05:00
Lucas Manuel Rodriguez
d849e01add
Update Go to 1.24.6 (#31784)
Ran
```
make update-go version=1.24.6
```
And then updated the `sha256`s manually in the Dockerfiles.

Fixes https://nvd.nist.gov/vuln/detail/CVE-2025-47907
```
Cancelling a query (e.g. by cancelling the context passed to one of the query methods) during a call
to the Scan method of the returned Rows can result in unexpected results if other queries are being
made in parallel. This can result in a race condition that may overwrite the expected results with those
of another query, causing the call to Scan to return either unexpected results from the other
query or an error.
```
2025-08-12 08:10:05 -03:00
Jorge Falcon
9618d72b54
Loadtesting MySQL engine_version update (#31351)
- MySQL engine version bumped from 8.0.mysql_aurora.3.07.1 ->
8.0.mysql_aurora.3.08.2
2025-07-29 12:02:49 -04:00
Janis Watts
7085ad2a74
Update enable cloudfront directions (#31152)
Just a couple small changes to help with the instructions
2025-07-22 16:31:12 -05:00
Jorge Falcon
dcf68ccd09
Loadtesting - Cloudfront iam fix (#31145)
- Added missed IAM permission for tasks to access cloudfront secret
2025-07-22 15:07:26 -04:00
Jorge Falcon
3a112afdb6
Loadtesting - Enable Cloudfront (#31073)
# Added
- Added kms.tf to support encrypting keys, specifically cloudfront keys.
- Added template/cloudfront.tf.disabled for use in enabling cloudfront.
- Modified ecs-iam.tf to support log-alb.tf, cloudfront.tf policies that
are injected into `local.extra_execution_iam_policies` and `local.iam`.
- Added log-alb.tf to enable logging alb, required by cloudfront.tf.

# Changed
- Modified ecs.tf to support adding of additional secrets from
`local.secrets`.
- Modified firehose.tf to support provider required updates for
deprecated resource configurations.
- Modified init.tf to support `> v5.0` of `hashicorp/aws` provider.
- Modified locals.tf to add `extra_execution_iam_policies`, `iam`,
`software_installers_kms_policy`, `extra_secrets`, `secrets`, and
`cloudfront_key_basename`, to support cloudfront.
- Modified readme.md with instructions on how to enable cloudfront.tf
- Modified redis.tf to support provider required updates for deprecated
resource configurations
- Modified s3.tf to support kms keys and add kms iam.
- Modified terraform version in .github/workflows/tfvalidate.yml - 1.9.0
-> 1.10.4
2025-07-21 16:41:06 -04:00
Jorge Falcon
91cedf039d
Allow Loadtesting environment non-empty s3 bucket cleanup on terraform destroy (#30899)
* Modified resource aws_s3_bucket blocks to include `force_destroy =
true` in firehose.tf and s3.tf.
2025-07-16 12:15:27 -04:00
jacobshandling
555ae5441e
Update Go to 1.24.5 (#30770)
## #30730 
- Update Go version
- Update the docs for this process
- Confirmed `fleet`, `fleetctl`, and related docker images build
successfully
- Note that failing tests are unrelated: see [Slack
thread](https://fleetdm.slack.com/archives/C019WG4GH0A/p1752175318523689)

---------

Co-authored-by: Jacob Shandling <jacob@fleetdm.com>
2025-07-15 10:59:17 -07:00
Lucas Manuel Rodriguez
5646062c85
Update go to 1.24.4 and add some automation (#29954)
Fixes CVE-2025-22874 reported by
https://github.com/fleetdm/fleet/actions/runs/15601368321/job/43941793647.

(IMO not a critical CVE, so it doesn't need to be cherry-picked into
v4.69.0.)

Added automation to make this easier next time.
2025-06-13 13:08:14 -05:00
Janis Watts
d1dbdfb0e0
Update load test instructions for migration testing (#29347)
Added additional information for performing loadtest migrations for
minor releases.
2025-05-22 10:05:05 -05:00
Lucas Manuel Rodriguez
bfe3b186d3
Fix detected CVEs and docker scout exit code to fail the Github Action (#28836)
For #28837.

Fixing all of this because we got multiple reports from the
community and customers, and these were also detected by Amazon
Inspector.

- Fixes CVE-2025-22871 by upgrading Go from 1.24.1 to 1.24.2.
- `docker scout` now fails the daily scheduled action if there are
CRITICAL,HIGH CVEs (we missed setting `exit-code: true`).
- Report CVE-2025-46569 as not affected by it because of our use of
OPA's go package.
- Report CVE-2024-8260 as not affected by it because Fleet doesn't run
on Windows.
- The `security/status.md` shows a lot of changes because we are now
sorting CVEs so that newest come first.

---

- [X] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/Committing-Changes.md#changes-files)
for more information.
- [ ] Manual QA for all new/changed functionality
- For Orbit and Fleet Desktop changes:
- [ ] Make sure fleetd is compatible with the latest released version of
Fleet (see [Must
rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/fleetd-development-and-release-strategy.md)).
- [ ] Orbit runs on macOS, Linux and Windows. Check if the orbit
feature/bugfix should only apply to one platform (`runtime.GOOS`).
- [ ] Manual QA must be performed in the three main OSs, macOS, Windows
and Linux.
- [ ] Auto-update manual QA, from released version of component to new
version (see [tools/tuf/test](../tools/tuf/test/README.md)).
- [ ] For unreleased bug fixes in a release candidate, confirmed that
the fix is not expected to adversely impact load test results or alerted
the release DRI if additional load testing is needed.
2025-05-06 13:35:27 -03:00
Scott Gress
59f96651b6
Update to Go 1.24.1 (#27506)
For #26713 

# Details

This PR updates Fleet and its related tools and binaries to use Go
version 1.24.1.

Scanning through the changelog, I didn't see anything relevant to Fleet
that requires action. The only possible breaking change I spotted was:

> As [announced](https://tip.golang.org/doc/go1.23#linux) in the Go 1.23
release notes, Go 1.24 requires Linux kernel version 3.2 or later.

Linux kernel 3.2 was released in January of 2012, so I think we can
commit to dropping support for earlier kernel versions.

The new [tools directive](https://tip.golang.org/doc/go1.24#tools) is
interesting as it means we can move away from using `tools.go` files,
but it's not a required update.

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

<!-- Note that API documentation changes are now addressed by the
product design team. -->

- [X] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
- [x] Manual QA for all new/changed functionality
- For Orbit and Fleet Desktop changes:
- [X] Make sure fleetd is compatible with the latest released version of
Fleet
- [x] Orbit runs on macOS, Linux and Windows.
- [x] Manual QA must be performed in the three main OSs, macOS,
Windows and Linux.
2025-03-31 11:14:09 -05:00
Robert Fairburn
3e3b773e38
Add athena to loadtesting (#27437) 2025-03-24 11:55:28 -05:00
Lucas Manuel Rodriguez
ae00add76e
Update alpine to patch vulnerability with severity "HIGH" (#26593)
The vulnerability was posted by a prospect.

Posting manual command until we get #25902 done.
```sh
trivy image --ignore-unfixed --pkg-types os,library --severity CRITICAL,HIGH --show-suppressed fleetdm/fleet:v4.64.1
[...]
fleetdm/fleet:v4.64.1 (alpine 3.21.0)

Total: 2 (HIGH: 2, CRITICAL: 0)

┌────────────┬────────────────┬──────────┬────────┬───────────────────┬───────────────┬──────────────────────────────────────────────────────────┐
│  Library   │ Vulnerability  │ Severity │ Status │ Installed Version │ Fixed Version │                          Title                           │
├────────────┼────────────────┼──────────┼────────┼───────────────────┼───────────────┼──────────────────────────────────────────────────────────┤
│ libcrypto3 │ CVE-2024-12797 │ HIGH     │ fixed  │ 3.3.2-r4          │ 3.3.3-r0      │ openssl: RFC7250 handshakes with unauthenticated servers │
│            │                │          │        │                   │               │ don't abort as expected                                  │
│            │                │          │        │                   │               │ https://avd.aquasec.com/nvd/cve-2024-12797               │
├────────────┤                │          │        │                   │               │                                                          │
│ libssl3    │                │          │        │                   │               │                                                          │
│            │                │          │        │                   │               │                                                          │
│            │                │          │        │                   │               │                                                          │
└────────────┴────────────────┴──────────┴────────┴───────────────────┴───────────────┴──────────────────────────────────────────────────────────┘
```
2025-02-25 18:33:24 -03:00