462 commits

Author SHA1 Message Date
Jorge Falcon
0471b8ce19
Loadtest - osquery_perf - Removal of fleet_image requirement (#35365)
- Adds support for `enroll.sh`, to deploy osquery_perf in batches
- Merges variables `tag` and `git_branch` into `git_tag_branch`. Specify
either a tag or a branch, not both.
  - Still used for osquery_perf to check out the correct tag/branch.
- Removes fleet_image requirement for cutting osquery_perf images

---------

Co-authored-by: Robert Fairburn <8029478+rfairburn@users.noreply.github.com>
2025-11-10 16:16:20 -05:00
Luke Heath
0056d36d81
Adding changes for Fleet v4.76.0 (#34486) (#35380) 2025-11-07 19:19:12 -06:00
Robert Fairburn
12a19cbb42
Add fleet try account to state bucket (#35092) 2025-11-03 11:18:29 -06:00
Victor Lyuboslavsky
73501e5755
Infra changes after latest loadtest (#35083)
**Related issue:** Resolves #34500

Terraform changes after my latest loadtest.

VPC consolidation: updated (and deployed) the shared VPC so that the SigNoz
backend can now use it

  - Removed eks-vpc/ directory
  - Moved VPC management to shared/vpc.tf
  - Updated shared/init.tf to reflect VPC changes

Infra improvements

- infra/internal_alb.tf - changed suffix from -internal to -int because I
hit the 32-character maximum

OTEL

- OTEL Collector configuration overrides for production stability
2025-11-03 11:02:15 -06:00
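The 32-character limit behind the `-int` suffix change in #35083 can be caught at plan time. This is a minimal sketch; the variable name `internal_alb_name` and its default are hypothetical and not taken from the repository.

```hcl
# Hypothetical variable; the real configuration may build the name differently.
variable "internal_alb_name" {
  type        = string
  description = "Name of the internal ALB (AWS limits load balancer names to 32 characters)."
  default     = "fleet-loadtest-example-int"

  validation {
    # Fail early instead of letting the AWS API reject the name during apply.
    condition     = length(var.internal_alb_name) <= 32
    error_message = "The ALB name must be 32 characters or fewer."
  }
}
```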
Victor Lyuboslavsky
072ee68eda
Updating to Go 1.25.3 (#35082) 2025-11-03 09:47:07 -06:00
Jorge Falcon
6ea9185c1c
Loadtesting - osquery_perf docker image build fixes (#34901)
- Bumps docker provider from 2.16.0 to 3.6.x
- Moves builds from `docker_registry_image` to new `docker_image`
resource
2025-10-29 08:33:46 -04:00
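A sketch of the pattern #34901 describes, where the build moves to `docker_image` and `docker_registry_image` only pushes. It assumes the `kreuzwerker/docker` provider; the image name and build context are placeholders, not the repository's values.

```hcl
terraform {
  required_providers {
    docker = {
      # Assumed provider source; the commit only names the version bump.
      source  = "kreuzwerker/docker"
      version = "~> 3.6.0"
    }
  }
}

# In provider 3.x the build happens in docker_image.
resource "docker_image" "osquery_perf" {
  # Placeholder registry and tag.
  name = "registry.example.com/osquery-perf:example"

  build {
    # Placeholder path to the directory containing the Dockerfile.
    context = "${path.module}/docker"
  }
}

# docker_registry_image pushes the locally built image.
resource "docker_registry_image" "osquery_perf" {
  name          = docker_image.osquery_perf.name
  keep_remotely = true
}
```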
Robert Fairburn
1fedabe7a8
Update alpine base image to latest (#34864)
Resolves openssl:3.3.3/CVE-2025-9230 in base images.
2025-10-28 11:24:05 -05:00
Robert Fairburn
30c4798ec6
Switch git providers for loadtesting tf (#34180)
Untested end-to-end, but works as a replacement for plans and doesn't
require a local arm64 build.

Co-authored-by: Jorge Falcon <22119513+BCTBB@users.noreply.github.com>
2025-10-23 14:53:13 -04:00
Victor Lyuboslavsky
e4e3c3f9ff
Fix issues with OTEL SigNoz deployments for loadtests (#34694)
SigNoz was converted from a child module to a standalone root module with
independent state.

  **Critical Impact**

  Deployment order is now required:
  1. Deploy infrastructure/loadtesting/terraform/signoz/ FIRST
  2. Then deploy infrastructure/loadtesting/terraform/infra/

  The modules communicate via Terraform remote state.

  **Key Configuration Changes**

  - SigNoz creates its own EKS cluster: signoz-${workspace}
  - Instance type: t3.xlarge (upgraded from t3.large for resource
    headroom)
  - ClickHouse disk: 200Gi (was 20Gi) with 2-day retention
  - Resource limits configured to prevent OOMKills during loadtest
  - wait_for_jobs = false to avoid Helm deployment deadlock


**Related issue:** Resolves #32331
2025-10-23 12:49:36 -05:00
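A sketch of how the infra module could read values from the separately applied SigNoz root module described in #34694, which is why SigNoz has to be deployed first. The backend type, bucket, key, region, and output name below are all placeholders chosen for illustration.

```hcl
# Read outputs published by the SigNoz root module's state.
# This fails if the SigNoz state does not exist yet.
data "terraform_remote_state" "signoz" {
  backend   = "s3"
  workspace = terraform.workspace

  config = {
    # Placeholder state location.
    bucket = "example-terraform-state"
    key    = "loadtesting/signoz/terraform.tfstate"
    region = "us-east-2"
  }
}

locals {
  # Hypothetical output name exposed by the SigNoz module.
  otel_collector_endpoint = data.terraform_remote_state.signoz.outputs.otel_collector_endpoint
}
```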
George Karr
304d581d87
Adding changes for Fleet v4.75.1 (#34571) 2025-10-22 10:11:58 -05:00
Luke Heath
2c8ae8cc78
Adding changes for Fleet v4.75.0 (#33583) (#34483) 2025-10-17 21:51:17 -05:00
George Karr
dcefbc4efa
Adding changes for Fleet v4.74.1 (#34227) 2025-10-15 10:00:27 -05:00
Victor Lyuboslavsky
aef9b8400c
Added terraform files for Signoz OTEL backend. (#34058)
**Related issue:** Resolves #32331 

This PR allows us to run a loadtest with the SigNoz OTEL backend by adding
`-var=enable_otel=true`.
SigNoz is deployed via a Helm chart.

Enhancements needed (in future PR):
- put SigNoz UI behind VPN
- combine the new eks-vpc with shared fleet-vpc
- make SigNoz shared, so multiple loadtests use the same instance? (But
what about updating it to the latest version?)

Next steps:
- Enable SigNoz in Dogfood environment
- SigNoz by default [keeps 15 days of logs and
traces](https://signoz.io/docs/userguide/retention-period), which is
quite a bit. How much would that cost us and should we reduce it?

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- New Features
  - Optional OpenTelemetry tracing with SigNoz via a new enable_otel flag.
  - Conditional deployment of a SigNoz stack (managed EKS, storage,
    Helm-based apps) with internal OTLP collector endpoint.
  - New outputs to retrieve OTLP endpoint, cluster name, and a kubectl
    configuration command.

- Documentation
  - Added guidance for deploying and using SigNoz with load testing.
  - Updated examples to include -var=enable_otel=true.

- Chores
  - Introduced required providers to support Helm and Kubernetes
    resources.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-10-10 21:53:04 -05:00
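A minimal sketch of the flag-gated deployment that #34058 describes, as it worked before SigNoz became a standalone module. Only the `enable_otel` variable name comes from the commit; the module path and output name are hypothetical.

```hcl
variable "enable_otel" {
  type        = bool
  default     = false
  description = "Deploy the SigNoz OTEL backend alongside the loadtest."
}

# Hypothetical module path; count makes the whole stack conditional.
module "signoz" {
  count  = var.enable_otel ? 1 : 0
  source = "./signoz"
}

output "otlp_endpoint" {
  # one() yields null when the module is disabled (empty list).
  value = one(module.signoz[*].otlp_endpoint)
}
```

With this shape, `terraform apply -var=enable_otel=true` creates the stack and a plain `terraform apply` leaves it out.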
Jorge Falcon
c0f753cb83
Updated permissions for GHA role - load test environment (#34059)
* Fixes missing STS permission on the load test environment GHA role
2025-10-09 15:10:52 -04:00
Jorge Falcon
e952ef06c0
Loadtesting IAC updates (#32629)
# Github Actions (New)
- New workflow to deploy/destroy loadtest infrastructure with one-click
(Needs to be tested)
- Common inputs drive configuration and deployment of loadtest
infrastructure
    - tag
    - fleet_task_count
    - fleet_task_memory
    - fleet_task_cpu
    - fleet_database_instance_size
    - fleet_database_instance_count
    - fleet_redis_instance_size
    - fleet_redis_instance_count
    - terraform_workspace
    - terraform_action
- New workflow to deploy/destroy osquery-perf to loadtest infrastructure
with one-click (Needs to be tested)
- Common inputs drive configuration and deployment of osquery-perf
resources
    - tag
    - git_branch
    - loadtest_containers
    - extra_flags
    - terraform_workspace
    - terraform_action
- New workflow to deploy shared loadtest resources with one-click (Needs
to be tested)

# Loadtest Infrastructure (New)
- New directory (`infrastructure/loadtesting/terraform/infra`) for
one-click deployment
- Loadtest environment updated to use [fleet-terraform
modules](https://github.com/fleetdm/fleet-terraform)
- [Deployment documentation
updated](0c254bca40/infrastructure/loadtesting/terraform/infra/README.md)
to reflect new steps

# Osquery-perf deployment (New)
- New directory (`infrastructure/loadtesting/terraform/osquery-perf`)
for the deployment of osquery-perf
- osquery-perf updated to use [fleet-terraform
modules](https://github.com/fleetdm/fleet-terraform)
- [Deployment documentation
updated](0c254bca40/infrastructure/loadtesting/terraform/osquery_perf)
to reflect new steps
2025-10-08 15:31:37 -04:00
Konstantin Sykulev
9e5c632c4c
Updating osquery perf loadtest infrastructure (#34003)
Bumping memory and CPU on AWS load test containers. Creating multiple ECS
services, each with a single task. This allows us to specify different
settings per osquery perf container/task.

**Related issue:** No issue.
2025-10-08 13:28:33 -05:00
Robert Fairburn
5f98be0f08
Allow RW of state for github/infra (#33852) 2025-10-05 19:40:49 -05:00
Robert Fairburn
c63f3ca183
Update iam rules for github on infra account (#33812) 2025-10-05 18:45:50 -05:00
Robert Fairburn
b5ba6da738
Add github access for check cloudflare action (#33797) 2025-10-02 19:37:29 -05:00
Luke Heath
53b3479d94
Prepare Fleet v4.74.0 (#33579) 2025-09-29 13:27:42 -05:00
Luke Heath
437a1f563c
Prepare Fleet v4.73.3 (#33527) (#33575) 2025-09-29 12:23:36 -05:00
George Karr
611cf8cc2b
Adding changes for Fleet v4.73.2 (#33118)
Co-authored-by: Luke Heath <luke@fleetdm.com>
2025-09-24 08:02:17 -05:00
George Karr
a81b0b868e
Adding changes for Fleet v4.73.1 (#32889) (#33116) 2025-09-17 10:38:19 -05:00
Luke Heath
7a6f57bc36
update main 4.72.1 4.73.0 (#32755) 2025-09-11 22:00:41 -05:00
Victor Lyuboslavsky
abc912bd03
Updated go to 1.25.1 (#32833) 2025-09-11 18:31:39 -05:00
Jorge Falcon
fc94901cac
Dogfood & Dogfood Free - Terraform deprecation fixes (#32101)
Added support to allow terraform plan (dry-run) without apply for
dogfood deployment action

Updated infrastructure/dogfood/terraform/aws-tf-module/docker/main.tf
- Allow hashicorp/aws `>= 5.68.0` instead of `~> 5.0`

Updated infrastructure/dogfood/terraform/aws-tf-module/main.tf
- Updated occurrences of `data.aws_region.current.id` ->
`data.aws_region.current.region`
- Updated occurrences of `data.aws_region.current.name` ->
`data.aws_region.current.region`
- Allow hashicorp/aws `>= 5.68.0` instead of `~> 5.0`
- `tf-mod-root-v1.15.2` -> `tf-mod-root-v1.17.0`
- `tf-mod-addon-migrations-v2.0.1` -> `tf-mod-addon-migrations-v2.1.0`
- `tf-mod-addon-osquery-carve-v1.1.0` ->
`tf-mod-addon-osquery-carve-v1.1.1`
- `tf-mod-addon-logging-alb-v1.3.0` -> `tf-mod-addon-logging-alb-v1.4.0`
- `tf-mod-addon-ses-v1.3.0` -> `tf-mod-addon-ses-v1.4.0`
- `tf-mod-addon-external-vuln-scans-v2.2.1` ->
`tf-mod-addon-external-vuln-scans-v2.3.0`

Updated infrastructure/dogfood/terraform/aws-tf-module/free.tf
- Updated occurrences of `data.aws_region.current.id` ->
`data.aws_region.current.region`
- Updated occurrences of `data.aws_region.current.name` ->
`data.aws_region.current.region`
- `tf-mod-byo-vpc-v1.13.0` -> `tf-mod-byo-vpc-v1.18.3`
- `tf-mod-addon-ses-v1.3.0` -> `tf-mod-addon-ses-v1.4.0`
- `tf-mod-addon-migrations-v2.0.1` -> `tf-mod-addon-migrations-v2.1.0`

Updated infrastructure/dogfood/terraform/aws-tf-module/free-ecs-hosts.tf
- Updated occurrences of `data.aws_region.current.name` ->
`data.aws_region.current.region`
2025-08-19 22:48:19 -04:00
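A sketch of the two changes from #32101 side by side: the relaxed provider constraint and the `region` attribute on the `aws_region` data source. To my knowledge the `region` attribute is only available in newer major versions of the AWS provider, so treat the pairing with `>= 5.68.0` as an assumption about which version actually gets selected. The output name is hypothetical.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # ">= 5.68.0" permits newer major versions; "~> 5.0" would pin to 5.x.
      version = ">= 5.68.0"
    }
  }
}

data "aws_region" "current" {}

output "current_region" {
  # Replaces the earlier .name and .id references.
  value = data.aws_region.current.region
}
```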
George Karr
ecc173deeb
Adding changes for Fleet v4.72.0 (#31273) (#31975) 2025-08-15 12:31:18 -05:00
Lucas Manuel Rodriguez
d849e01add
Update Go to 1.24.6 (#31784)
Ran
```
make update-go version=1.24.6
```
And then updated the `sha256`s manually in the Dockerfiles.

Fixes https://nvd.nist.gov/vuln/detail/CVE-2025-47907
```
Cancelling a query (e.g. by cancelling the context passed to one of the query methods) during a call
to the Scan method of the returned Rows can result in unexpected results if other queries are being
made in parallel. This can result in a race condition that may overwrite the expected results with those
of another query, causing the call to Scan to return either unexpected results from the other
query or an error.
```
2025-08-12 08:10:05 -03:00
George Karr
7d8f17f53a
gkarr update changelog (#31585)
- **Adding changes for Fleet v4.71.1 (#31531)**
- **updating changelog**
2025-08-04 15:41:10 -05:00
Jorge Falcon
e2340385a9
Dogfood - Fix error in log on empty cert.00.pem, when retrieving rds tls certificate (#31515)
Fix for error log that is generated by csplit when determining the
correct certificate to use.

```
Could not find certificate from cert.00.pem
```
2025-08-01 12:54:11 -04:00
Jorge Falcon
9618d72b54
Loadtesting MySQL engine_version update (#31351)
- MySQL engine version bumped from 8.0.mysql_aurora.3.07.1 ->
8.0.mysql_aurora.3.08.2
2025-07-29 12:02:49 -04:00
Jorge Falcon
d964e124cc
Dogfood - Enable Fleet TLS connectivity to MySQL (#31201)
- Added tls certificate retriever sidecar configuration and
dependencies, for dogfood
- Added tls certificate retriever sidecar configuration and
dependencies, for dogfood (free)
2025-07-23 22:01:26 -04:00
Luke Heath
99a0217db6
Adding changes for Fleet v4.71.0 (#30599) (#31198) 2025-07-23 16:04:33 -06:00
Jorge Falcon
2c773ae346
Dogfood - Increasing instance size for fleetbot from small -> medium (#31177)
- Modifying fleetbot instance size t3.small -> t3.medium to match
manual instance resize
2025-07-23 11:08:28 -05:00
Janis Watts
7085ad2a74
Update enable cloudfront directions (#31152)
Just a couple small changes to help with the instructions
2025-07-22 16:31:12 -05:00
Jorge Falcon
dcf68ccd09
Loadtesting - Cloudfront iam fix (#31145)
- Added missed IAM permission for tasks to access cloudfront secret
2025-07-22 15:07:26 -04:00
Jorge Falcon
a87ec09e16
Dogfood - Fleetbot ec2 instance deployment (#31120)
* Create fleetbot ec2 instance
* Create security group for fleetbot ec2 instance
* Create ingress/egress security group rules for fleetbot ec2 instance
2025-07-22 09:27:24 -04:00
Jorge Falcon
3a112afdb6
Loadtesting - Enable Cloudfront (#31073)
# Added
- Added kms.tf to support encrypting keys, specifically cloudfront keys.
- Added template/cloudfront.tf.disabled for use in enabling cloudfront.
- Modified ecs-iam.tf to support log-alb.tf, cloudfront.tf policies that
are injected into `local.extra_execution_iam_policies` and `local.iam`.
- Added log-alb.tf to enable logging alb, required by cloudfront.tf.

# Changed
- Modified ecs.tf to support adding of additional secrets from
`local.secrets`.
- Modified firehose.tf to support provider required updates for
deprecated resource configurations.
- Modified init.tf to support `> v5.0` of `hashicorp/aws` provider.
- Modified locals.tf to add `extra_execution_iam_policies`, `iam`,
`software_installers_kms_policy`, `extra_secrets`, `secrets`, and
`cloudfront_key_basename`, to support cloudfront.
- Modified readme.md with instructions on how to enable cloudfront.tf
- Modified redis.tf to support provider required updates for deprecated
resource configurations
- Modified s3.tf to support kms keys and add kms iam.
- Modified terraform version in .github/workflows/tfvalidate.yml - 1.9.0
-> 1.10.4
2025-07-21 16:41:06 -04:00
Jorge Falcon
91cedf039d
Allow Loadtesting environment non-empty s3 bucket cleanup on terraform destroy (#30899)
* Modified resource aws_s3_bucket blocks to include `force_destroy =
true` in firehose.tf and s3.tf.
2025-07-16 12:15:27 -04:00
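A minimal sketch of the `force_destroy` setting from #30899. The resource name and bucket prefix are placeholders; the actual buckets live in firehose.tf and s3.tf.

```hcl
resource "aws_s3_bucket" "loadtest_example" {
  # Placeholder prefix; not the repository's bucket name.
  bucket_prefix = "loadtest-example-"

  # Lets terraform destroy delete the bucket even when it still holds objects.
  force_destroy = true
}
```

This is convenient for short-lived loadtest environments, but it also means a destroy deletes the bucket contents without further confirmation.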
jacobshandling
555ae5441e
Update Go to 1.24.5 (#30770)
## #30730 
- Update Go version
- Update the docs for this process
- Confirmed `fleet`, `fleetctl`, and related docker images build
successfully
- Note that failing tests are unrelated: see [Slack
thread](https://fleetdm.slack.com/archives/C019WG4GH0A/p1752175318523689)

---------

Co-authored-by: Jacob Shandling <jacob@fleetdm.com>
2025-07-15 10:59:17 -07:00
Robert Fairburn
ad28be9623
Fix maintenance window and rds engine version dogfood (#30791) 2025-07-14 17:46:13 -05:00
Robert Fairburn
6e52b61ef9
Fix secretsmanager policies in dogfood (#30765) 2025-07-10 16:25:20 -05:00
Robert Fairburn
372d31bfd0
Dogfood env var fixes (#30737) 2025-07-10 11:20:50 -05:00
George Karr
39e381be96
Adding changes for Fleet v4.70.1 (#30606) (#30733)
Co-authored-by: Dante Catalfamo <43040593+dantecatalfamo@users.noreply.github.com>
2025-07-10 10:57:37 -05:00
Luke Heath
6c7d103fcd
Adding changes for Fleet v4.70.0 (#30048) (#30729)
Co-authored-by: Lucas Manuel Rodriguez <lucas@fleetdm.com>
Co-authored-by: Gabriel Hernandez <ghernandez345@gmail.com>
Co-authored-by: Ian Littman <iansltx@gmail.com>
Co-authored-by: jacobshandling <61553566+jacobshandling@users.noreply.github.com>
Co-authored-by: Jacob Shandling <jacob@fleetdm.com>
Co-authored-by: Dante Catalfamo <43040593+dantecatalfamo@users.noreply.github.com>
Co-authored-by: RachelElysia <71795832+RachelElysia@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: RachelElysia <RachelElysia@users.noreply.github.com>
Co-authored-by: Noah Talerman <47070608+noahtalerman@users.noreply.github.com>
Co-authored-by: Juan Fernandez <juan-fdz-hawa@users.noreply.github.com>
Co-authored-by: George Karr <georgekarrv@gmail.com>
2025-07-10 10:31:41 -05:00
Jorge Falcon
bc9c2b48ad
Adding support to dogfood for FLEET_MICROSOFT_COMPLIANCE_PARTNER_PROXY_API_KEY (#30709)
- Adding `FLEET_MICROSOFT_COMPLIANCE_PARTNER_PROXY_API_KEY` to dogfood
- Adding creation of secret and secret version for
`FLEET_MICROSOFT_COMPLIANCE_PARTNER_PROXY_API_KEY` value
2025-07-10 00:59:06 -04:00
Jorge Falcon
aa2a080711
Dogfood - re-enabling webhook log destination (#30690)
- Disabling firehose log destination
- Re-enabling webhook log destination

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
  * Enabled webhook logging by activating environment variables for
    webhook URLs.
  * Webhook log plugin is now conditionally set based on the presence of a
    webhook URL.

* **Chores**
  * Updated environment variable management by removing firehose-logging
    addon variables from the configuration.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-07-09 12:43:08 -04:00
Jorge Falcon
e2827199b9
Dogfood - re-enabling firehose (#30688)
- Disabled webhook variables
- Re-enabled firehose variables

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
  * Disabled certain environment variables related to webhook logging.
  * Updated environment variable configuration to include additional
    logging settings.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-07-09 12:21:38 -04:00
Luke Heath
653291c6b4
Prepare Fleet v4.69.0 (#30024) 2025-06-16 10:43:20 -05:00
Benjamin Edwards
e3711d0b11
added env vars for webhook osquery results logging destination (#29809)
Update dogfood deployment to utilize webhooks for the osquery results
logging destination configuration

@BCTBB already added a tines.io webhook URL to the repo secrets
`DOGFOOD_WEBHOOK_URL` where the value was provided by @harrisonravazzolo

Co-authored-by: Harrison Ravazzolo <38767391+harrisonravazzolo@users.noreply.github.com>
2025-06-16 10:22:31 -05:00