Commit graph

485 commits

Author SHA1 Message Date
Robert Fairburn
ffe6df25be
Signoz action fixes (#38656) 2026-01-22 19:10:44 -06:00
Robert Fairburn
e0631aff76
Dogfood signoz (#38569) 2026-01-22 12:33:27 -06:00
George Karr
68452b8a1d
Adding changes for Fleet v4.79.1 (#38487) 2026-01-21 12:18:49 -06:00
Luke Heath
d0fd8e06e9
update main 4.79.0 changes (#38253) 2026-01-13 14:12:17 -06:00
George Karr
c2a913a4c7
Adding changes for Fleet v4.78.3 (#38201) 2026-01-13 14:01:48 -06:00
George Karr
d820f800c6
Adding changes for Fleet v4.78.2 (#38150) 2026-01-10 21:24:15 -06:00
George Karr
dc5f1cb753
Adding changes for Fleet v4.78.1 (#37874)
Co-authored-by: Luke Heath <luke@fleetdm.com>
2026-01-06 16:54:45 -06:00
Luke Heath
8648105fe3
Adding changes for Fleet v4.78.0 (#36813) (#37584) 2025-12-19 17:25:22 -06:00
Jorge Falcon
7ac24d8752
Loadtest (new) - MDM Updates (#37420)
- Adds `FLEET_DEV_MDM_APPLE_DISABLE_PUSH = 1`
- Adds `FLEET_DEV_MDM_APPLE_DISABLE_DEVICE_INFO_CERT_VERIFY = 1`
- Updates osquery_perf/README.md with an example of fetching and
using the MDM SCEP challenge secret.
2025-12-17 17:55:13 -05:00
Jorge Falcon
3247b86aa0
Dogfood FLEET_SERVER_VPP_VERIFY_TIMEOUT and Inconsistent Plan Fix (#37163)
- Adds `FLEET_SERVER_VPP_VERIFY_TIMEOUT = "20m"`
- Updates `mysql_password_secret_name` to search by a string, rather
than by the ID forced by the AWS Terraform provider (v6), to fix
inconsistent final plan errors
2025-12-12 13:00:12 -05:00
Luke Heath
70ab8c2925
Adding changes for Fleet v4.77.0 (#35382) (#36614) 2025-12-08 16:32:47 -06:00
Ian Littman
62755cbd82
Bump Go to 1.25.5, Alpine to 3.23.0 where relevant, bump Trivy to current version (#36848)
Fixes vulns reported in
https://github.com/fleetdm/fleet/actions/runs/19999992703. We'll
definitely want to at least cherry-pick this.
2025-12-07 20:04:14 -06:00
Jorge Falcon
e8c3e26d60
cloudfront module update for dogfood to 1.1.0 (#36548)
- Updates dogfood cloudfront-software-installers module to 1.1.0
2025-12-01 20:08:33 -05:00
Victor Lyuboslavsky
6ab79dd5a7
Add more software to loadtest (#35756)
**Related issue:** Resolves #34677 and #35932

Adding ~450K software to the loadtest, including scripts to add more
software in the future.
Software is held in a `software.sql` file, which is used to create a
sqlite DB during osquery perf run/deployment.
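
As a rough illustration of that flow, the sketch below loads a SQL script into a SQLite database using Python's standard `sqlite3` module. The table name, columns, and sample rows are hypothetical; the actual `software.sql` schema is not shown here and may differ.

```python
import sqlite3

# Hypothetical stand-in for software.sql; the real file's schema may differ.
SAMPLE_SQL = """
CREATE TABLE software (name TEXT NOT NULL, version TEXT NOT NULL, source TEXT NOT NULL);
INSERT INTO software VALUES ('curl', '8.5.0', 'deb_packages');
INSERT INTO software VALUES ('Firefox.app', '121.0', 'apps');
"""


def build_software_db(sql_script: str, db_path: str = ":memory:") -> sqlite3.Connection:
    """Create a SQLite database at db_path by running the given SQL script."""
    conn = sqlite3.connect(db_path)
    conn.executescript(sql_script)  # executes every statement in the script
    conn.commit()
    return conn


def count_software(conn: sqlite3.Connection) -> int:
    """Return how many rows were loaded into the (hypothetical) software table."""
    return conn.execute("SELECT COUNT(*) FROM software").fetchone()[0]
```

Passing a file path instead of `":memory:"` would persist the database to disk, which is closer to what a deployment step would need.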

# Checklist for submitter

## Testing

- [x] QA'd all new/changed functionality manually


## Summary by CodeRabbit

* **New Features**
* Added support for loading software data from an external SQLite
database via a new `--software_db_path` command-line flag for more
realistic simulation scenarios.
* Added import and SQL generation tools to build and manage custom
software libraries.

* **Documentation**
* Added comprehensive README with setup instructions, tool usage, and
end-to-end workflow guidance for the software library.

2025-11-21 10:42:19 -06:00
Jorge Falcon
e0be06fa76
Loadtesting - osquery deployment session timeout increase (#36097) 2025-11-20 21:08:52 -05:00
Jorge Falcon
fb5c90ad9c
Dogfood and loadtest module updates (#35990)
Updates `module.main` to `1.18.3` (dogfood)
- Adds `memory_tracking_target_value = 70` (dogfood and loadtesting)
- Adds `cpu_tracking_target_value = 70`  (dogfood and loadtesting)

Updates `module.migrations` to `2.2.1` (dogfood)
- Adds `max_capacity`

Updates `module.logging_alb` to `1.6.2` (dogfood)

Updates `module.monitoring` to `1.8.0` (dogfood)
- Adds `log_monitoring` configuration
2025-11-19 22:04:17 -05:00
Victor Lyuboslavsky
f8ce47ec88
Grouping OTEL exceptions by type. (#35794)
**Related issue:** Resolves #34677 

Changing the OTEL setup to group exceptions by type, which is an
OTEL/industry best practice.
2025-11-19 10:24:19 -06:00
George Karr
ca5d02d471
Adding changes for Fleet v4.76.1 (#35760) 2025-11-18 14:35:31 -06:00
Victor Lyuboslavsky
3029a3ac44
Adjusting OTEL resources for high throughput. (#35878) 2025-11-18 06:53:49 -06:00
Jorge Falcon
0b0c67a5d5
Loadtest - osquery_perf scaling fixes (#35798)
- Removes timestamp from osquery_perf image
- Adds `default: 0` for `variable: loadtest_containers_starting_index`
in the loadtest osquery_perf workflow
- Adds `variable: sleep_time` to loadtest osquery_perf workflow
- Adds osquery_perf docker repository in ECR
- Adds support for `sleep_time` to `enroll.sh`
- Updates terraform variables to enforce `git_branch` or `git_tag` for
osquery_perf
2025-11-17 10:21:18 -05:00
Robert Fairburn
caf9e83968
Configure osquery-perf memory and cpu (#35786) 2025-11-14 14:35:42 -06:00
Jorge Falcon
776cd67647
Loadtest - Firehose logging removal, adds filesystem logging, and module updates (#35735)
- Removes `firehose` logging from loadtesting environment
- Sets `filesystem` logging in loadtesting environment
- Updates fleet image to 4.76.0 as the default value
- Updates `migrations` and `logging_alb` modules with latest versions
2025-11-13 19:16:00 -05:00
Jorge Falcon
e2085bfd86
Loadtesting documentation - Removes (Coming Soon) from README (#35649)
- Removes `(Coming Soon)` from
`infrastructure/loadtesting/infra/README.md` with regard to deployment
via GitHub Actions
- Moves Signoz steps to `.header.md` to preserve steps in generated
`README.md`
2025-11-12 16:54:14 -05:00
Jorge Falcon
0471b8ce19
Loadtest - osquery_perf - Removal of fleet_image requirement (#35365)
- Adds support for `enroll.sh`, to deploy osquery_perf in batches
- Merges variables `tag` and `git_branch` into `git_tag_branch`. Only
one tag or git_branch should be specified.
  - Still used for osquery_perf to check out the correct tag/branch.
- Removes fleet_image requirement for cutting osquery_perf images

---------

Co-authored-by: Robert Fairburn <8029478+rfairburn@users.noreply.github.com>
2025-11-10 16:16:20 -05:00
Luke Heath
0056d36d81
Adding changes for Fleet v4.76.0 (#34486) (#35380) 2025-11-07 19:19:12 -06:00
Robert Fairburn
12a19cbb42
Add fleet try account to state bucket (#35092) 2025-11-03 11:18:29 -06:00
Victor Lyuboslavsky
73501e5755
Infra changes after latest loadtest (#35083)
**Related issue:** Resolves #34500

Terraform changes after my latest loadtest.

VPC consolidation: updated (and deployed) shared VPC so that Signoz
backend can now use it

  - Removed eks-vpc/ directory
  - Moved VPC management to shared/vpc.tf
  - Updated shared/init.tf to reflect VPC changes

  Infra improvements

- infra/internal_alb.tf - changed suffix from -internal to -int because I
hit the 32-character maximum for the name
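
A minimal sketch of the length check behind that rename, assuming the limit in question is AWS's 32-character cap on load balancer names. The prefix values and the helper name are hypothetical and are not taken from `internal_alb.tf`.

```python
MAX_ALB_NAME_LENGTH = 32  # assumed: AWS cap on load balancer names


def alb_name(prefix: str, suffix: str = "-int") -> str:
    """Join prefix and suffix, rejecting names longer than the assumed limit."""
    name = f"{prefix}{suffix}"
    if len(name) > MAX_ALB_NAME_LENGTH:
        raise ValueError(
            f"{name!r} is {len(name)} characters; the maximum is {MAX_ALB_NAME_LENGTH}"
        )
    return name
```

With a 25-character prefix, `-internal` yields 34 characters and is rejected, while `-int` yields 29 and passes.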

OTEL

- OTEL Collector configuration overrides for production stability
2025-11-03 11:02:15 -06:00
Victor Lyuboslavsky
072ee68eda
Updating to Go 1.25.3 (#35082) 2025-11-03 09:47:07 -06:00
Jorge Falcon
6ea9185c1c
Loadtesting - osquery_perf docker image build fixes (#34901)
- Bumps docker provider from 2.16.0 to 3.6.x
- Moves builds from `docker_registry_image` to new `docker_image`
resource
2025-10-29 08:33:46 -04:00
Robert Fairburn
1fedabe7a8
Update alpine base image to latest (#34864)
Resolves openssl:3.3.3/CVE-2025-9230 in base images.
2025-10-28 11:24:05 -05:00
Robert Fairburn
30c4798ec6
Switch git providers for loadtesting tf (#34180)
Untested end-to-end, but works as a replacement for plans and doesn't
require a local arm64 build to work.

Co-authored-by: Jorge Falcon <22119513+BCTBB@users.noreply.github.com>
2025-10-23 14:53:13 -04:00
Victor Lyuboslavsky
e4e3c3f9ff
Fix issues with OTEL SigNoz deployments for loadtests (#34694)
SigNoz is converted from a child module to a standalone root module with
independent state.

  **Critical Impact**

  Deployment order is now required:
  1. Deploy infrastructure/loadtesting/terraform/signoz/ FIRST
  2. Then deploy infrastructure/loadtesting/terraform/infra/

  Communication between modules via Terraform remote state.
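
The required order could be scripted roughly as follows. This is a sketch under stated assumptions: the two directories come from the list above, while the use of `init`, `workspace select`, and `apply` for each root module (and the workspace name passed in) is assumed rather than taken from the repository's documentation.

```python
import subprocess

# Directories are from the commit message; SigNoz must be applied before infra.
DEPLOY_ORDER = [
    "infrastructure/loadtesting/terraform/signoz",
    "infrastructure/loadtesting/terraform/infra",
]


def deploy_commands(workspace: str) -> list[list[str]]:
    """Build the terraform invocations for each root module, in deployment order."""
    commands = []
    for directory in DEPLOY_ORDER:
        chdir = f"-chdir={directory}"
        commands.append(["terraform", chdir, "init"])
        commands.append(["terraform", chdir, "workspace", "select", workspace])
        commands.append(["terraform", chdir, "apply"])
    return commands


def deploy(workspace: str, dry_run: bool = True) -> None:
    """Print each command, and run it unless dry_run is set; stop on the first failure."""
    for command in deploy_commands(workspace):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

Because `check=True` raises on a non-zero exit, a failed SigNoz apply stops the run before the infra module is touched.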

  **Key Configuration Changes**

  - SigNoz creates its own EKS cluster: signoz-${workspace}
- Instance type: t3.xlarge (upgraded from t3.large for resource
headroom)
  - ClickHouse disk: 200Gi (was 20Gi) with 2-day retention
  - Resource limits configured to prevent OOMKills during loadtest
  - wait_for_jobs = false to avoid Helm deployment deadlock


**Related issue:** Resolves #32331
2025-10-23 12:49:36 -05:00
George Karr
304d581d87
Adding changes for Fleet v4.75.1 (#34571) 2025-10-22 10:11:58 -05:00
Luke Heath
2c8ae8cc78
Adding changes for Fleet v4.75.0 (#33583) (#34483) 2025-10-17 21:51:17 -05:00
George Karr
dcefbc4efa
Adding changes for Fleet v4.74.1 (#34227) 2025-10-15 10:00:27 -05:00
Victor Lyuboslavsky
aef9b8400c
Added terraform files for Signoz OTEL backend. (#34058)
**Related issue:** Resolves #32331 

This PR allows us to run loadtest with SigNoz OTEL backend by adding
`-var=enable_otel=true`
SigNoz is deployed via Helm chart.

Enhancements needed (in future PR):
- put SigNoz UI behind VPN
- combine the new eks-vpc with shared fleet-vpc
- make SigNoz shared, so multiple loadtests use the same instance? (But
what about updating it to the latest version?)

Next steps:
- Enable SigNoz in Dogfood environment
- SigNoz by default [keeps 15 days of logs and
traces](https://signoz.io/docs/userguide/retention-period), which is
quite a bit. How much would that cost us and should we reduce it?


## Summary by CodeRabbit

- New Features
- Optional OpenTelemetry tracing with SigNoz via a new enable_otel flag.
- Conditional deployment of a SigNoz stack (managed EKS, storage,
Helm-based apps) with internal OTLP collector endpoint.
- New outputs to retrieve OTLP endpoint, cluster name, and a kubectl
configuration command.

- Documentation
  - Added guidance for deploying and using SigNoz with load testing.
  - Updated examples to include -var=enable_otel=true.

- Chores
- Introduced required providers to support Helm and Kubernetes
resources.

2025-10-10 21:53:04 -05:00
Jorge Falcon
c0f753cb83
Updated permissions for GHA role - load test environment (#34059)
* Fixes missing STS permission on the load test environment GHA role
2025-10-09 15:10:52 -04:00
Jorge Falcon
e952ef06c0
Loadtesting IAC updates (#32629)
# GitHub Actions (New)
- New workflow to deploy/destroy loadtest infrastructure with one-click
(Needs to be tested)
- Common inputs drive configuration and deployment of loadtest
infrastructure
    - tag
    - fleet_task_count
    - fleet_task_memory
    - fleet_task_cpu
    - fleet_database_instance_size
    - fleet_database_instance_count
    - fleet_redis_instance_size
    - fleet_redis_instance_count
    - terraform_workspace
    - terraform_action
- New workflow to deploy/destroy osquery-perf to loadtest infrastructure
with one-click (Needs to be tested)
- Common inputs drive configuration and deployment of osquery-perf
resources
    - tag
    - git_branch
    - loadtest_containers
    - extra_flags
    - terraform_workspace
    - terraform_action
- New workflow to deploy shared loadtest resources with one-click (Needs
to be tested)

# Loadtest Infrastructure (New)
- New directory (`infrastructure/loadtesting/terraform/infra`) for
one-click deployment
- Loadtest environment updated to use [fleet-terraform
modules](https://github.com/fleetdm/fleet-terraform)
- [Deployment documentation
updated](0c254bca40/infrastructure/loadtesting/terraform/infra/README.md)
to reflect new steps

# Osquery-perf deployment (New)
- New directory (`infrastructure/loadtesting/terraform/osquery-perf`)
for the deployment of osquery-perf
- osquery-perf updated to use [fleet-terraform
modules](https://github.com/fleetdm/fleet-terraform)
- [Deployment documentation
updated](0c254bca40/infrastructure/loadtesting/terraform/osquery_perf)
to reflect new steps
2025-10-08 15:31:37 -04:00
Konstantin Sykulev
9e5c632c4c
Updating osquery perf loadtest infrastructure (#34003)
Bumping memory and CPU on AWS load test containers. Creating multiple ECS
services with a single task. This allows us to specify different
settings per osquery perf container/task.

**Related issue:** No issue.
2025-10-08 13:28:33 -05:00
Robert Fairburn
5f98be0f08
Allow RW of state for github/infra (#33852) 2025-10-05 19:40:49 -05:00
Robert Fairburn
c63f3ca183
Update iam rules for github on infra account (#33812) 2025-10-05 18:45:50 -05:00
Robert Fairburn
b5ba6da738
Add github access for check cloudflare action (#33797) 2025-10-02 19:37:29 -05:00
Luke Heath
53b3479d94
Prepare Fleet v4.74.0 (#33579) 2025-09-29 13:27:42 -05:00
Luke Heath
437a1f563c
Prepare Fleet v4.73.3 (#33527) (#33575) 2025-09-29 12:23:36 -05:00
George Karr
611cf8cc2b
Adding changes for Fleet v4.73.2 (#33118)
Co-authored-by: Luke Heath <luke@fleetdm.com>
2025-09-24 08:02:17 -05:00
George Karr
a81b0b868e
Adding changes for Fleet v4.73.1 (#32889) (#33116) 2025-09-17 10:38:19 -05:00
Luke Heath
7a6f57bc36
update main 4.72.1 4.73.0 (#32755) 2025-09-11 22:00:41 -05:00
Victor Lyuboslavsky
abc912bd03
Updated go to 1.25.1 (#32833) 2025-09-11 18:31:39 -05:00
Jorge Falcon
fc94901cac
Dogfood & Dogfood Free - Terraform deprecation fixes (#32101)
Added support to allow terraform plan (dry-run) without apply for
dogfood deployment action

Updated infrastructure/dogfood/terraform/aws-tf-module/docker/main.tf
- Allow hashicorp/aws `>= 5.68.0` instead of `~> 5.0`

Updated infrastructure/dogfood/terraform/aws-tf-module/main.tf
- Updated occurrences of `data.aws_region.current.id` ->
`data.aws_region.current.region`
- Updated occurrences of `data.aws_region.current.name` ->
`data.aws_region.current.region`
- Allow hashicorp/aws `>= 5.68.0` instead of `~> 5.0`
- `tf-mod-root-v1.15.2` -> `tf-mod-root-v1.17.0`
- `tf-mod-addon-migrations-v2.0.1` -> `tf-mod-addon-migrations-v2.1.0`
- `tf-mod-addon-osquery-carve-v1.1.0` ->
`tf-mod-addon-osquery-carve-v1.1.1`
- `tf-mod-addon-logging-alb-v1.3.0` -> `tf-mod-addon-logging-alb-v1.4.0`
- `tf-mod-addon-ses-v1.3.0` -> `tf-mod-addon-ses-v1.4.0`
- `tf-mod-addon-external-vuln-scans-v2.2.1` ->
`tf-mod-addon-external-vuln-scans-v2.3.0`

Updated infrastructure/dogfood/terraform/aws-tf-module/free.tf
- Updated occurrences of `data.aws_region.current.id` ->
`data.aws_region.current.region`
- Updated occurrences of `data.aws_region.current.name` ->
`data.aws_region.current.region`
- `tf-mod-byo-vpc-v1.13.0` -> `tf-mod-byo-vpc-v1.18.3`
- `tf-mod-addon-ses-v1.3.0` -> `tf-mod-addon-ses-v1.4.0`
- `tf-mod-addon-migrations-v2.0.1` -> `tf-mod-addon-migrations-v2.1.0`

Updated infrastructure/dogfood/terraform/aws-tf-module/free-ecs-hosts.tf
- Updated occurrences of `data.aws_region.current.name` ->
`data.aws_region.current.region`
2025-08-19 22:48:19 -04:00
George Karr
ecc173deeb
Adding changes for Fleet v4.72.0 (#31273) (#31975) 2025-08-15 12:31:18 -05:00