Mirror of https://github.com/fleetdm/fleet, synced 2026-04-21 13:37:30 +00:00
Reorg infrastructure and add changes for frontend's loadtesting environment (#4947)
* Reorganized infrastructure, updated for frontend's loadtesting
* Add changes suggested by @chiiph
* Moved files per suggestion by Ben
* Update docs with new links
* Add config for multi account assume role
This commit is contained in:
parent
67ca6d37dd
commit
2fbe53b6c9
71 changed files with 47 additions and 22 deletions
@@ -8,7 +8,7 @@ go.mod @fleetdm/go
 # Infra/terraform
 *.tf @edwardsb @zwinnerman-fleetdm
-/tools/loadtesting/terraform @zwinnerman-fleetdm
+/infrastructure/ @zwinnerman-fleetdm @edwardsb
 
 # GitHub settings + actions
 /.github/ @zwass
@@ -16,7 +16,7 @@ Note: Please prefix versions with `fleet-v` (eg. `fleet-v4.0.0`) in git tags, He
 - [package.json](https://github.com/fleetdm/fleet/blob/main/tools/fleetctl-npm/package.json) (do not yet `npm publish`)
 - [Helm chart](https://github.com/fleetdm/fleet/blob/main/charts/fleet/Chart.yaml) and [values file](https://github.com/fleetdm/fleet/blob/main/charts/fleet/values.yaml)
-- [Terraform variables](https://github.com/fleetdm/fleet/blob/main/tools/terraform/variables.tf)
+- [Terraform variables](https://github.com/fleetdm/fleet/blob/main/infrastructure/dogfood/terraform/aws/variables.tf)
 
 Commit these changes via Pull Request and pull the changes on the `main` branch locally. Check that
 `HEAD` of the `main` branch points to the commit with these changes.
@@ -1,7 +1,7 @@
 # Reference architectures
 
 You can easily run Fleet on a single VPS that would be capable of supporting hundreds if not thousands of hosts, but
-this page details an [opinionated view](https://github.com/fleetdm/fleet/tree/main/tools/terraform) of running Fleet in a production environment, as
+this page details an [opinionated view](https://github.com/fleetdm/fleet/tree/main/infrastructure/dogfood/terraform/aws) of running Fleet in a production environment, as
 well as different configuration strategies to enable High Availability (HA).
 
 ## Availability components
@@ -16,7 +16,7 @@ Fleet recommends RDS Aurora MySQL when running on AWS. More details about backup
 [here](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Managing.Backups.html). It is also
 possible to dynamically scale read replicas to increase performance and [enable database fail-over](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Concepts.AuroraHighAvailability.html).
 It is also possible to use [Aurora Global](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-global-database.html) to
-span multiple regions for more advanced configurations(_not included in the [reference terraform](https://github.com/fleetdm/fleet/tree/main/tools/terraform)_).
+span multiple regions for more advanced configurations(_not included in the [reference terraform](https://github.com/fleetdm/fleet/tree/main/infrastructure/dogfood/terraform/aws)_).
 
 In some cases adding a read replica can increase database performance for specific access patterns. In scenarios when automating the API or with `fleetctl`
 there can be benefits to read performance.
@@ -26,7 +26,7 @@ Load balancing enables distributing request traffic over many instances of the b
 Load Balancer can also [offload SSL termination](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/create-https-listener.html), freeing Fleet to spend the majority of it's allocated compute dedicated
 to its core functionality. More details about ALB can be found [here](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html).
 
-_**Note if using [terraform reference architecture](https://github.com/fleetdm/fleet/tree/main/tools/terraform#terraform) all configurations can dynamically scale based on load(cpu/memory) and all configurations
+_**Note if using [terraform reference architecture](https://github.com/fleetdm/fleet/tree/main/infrastructure/dogfood/terraform/aws#terraform) all configurations can dynamically scale based on load(cpu/memory) and all configurations
 assume On-Demand pricing (savings are available through Reserved Instances). Calculations do not take into account NAT gateway charges or other networking related ingress/egress costs.**_
 
 ## Cloud providers
@@ -79,7 +79,7 @@ assume On-Demand pricing (savings are available through Reserved Instances). Cal
 | Redis | 6 | m6g.large | 3 |
 | MySQL | 5.7.mysql_aurora.2.10.0 | db.r6g.16xlarge | 2 |
 
-AWS reference architecture can be found [here](https://github.com/fleetdm/fleet/tree/main/tools/terraform). This configuration includes:
+AWS reference architecture can be found [here](https://github.com/fleetdm/fleet/tree/main/infrastructure/dogfood/terraform/aws). This configuration includes:
 
 - VPC
 - Subnets
@@ -93,7 +93,7 @@ AWS reference architecture can be found [here](https://github.com/fleetdm/fleet/
 - Elasticache Redis Engine
 - Firehose osquery log destination
 - S3 bucket sync to allow further ingestion/processing
-- [Monitoring via Cloudwatch alarms](https://github.com/fleetdm/fleet/tree/main/tools/terraform/monitoring)
+- [Monitoring via Cloudwatch alarms](https://github.com/fleetdm/fleet/tree/main/infrastructure/dogfood/terraform/aws/monitoring)
 
 Some AWS services used in the provider reference architecture are billed as pay-per-use such as Firehose. This means that osquery scheduled query frequency can have
 a direct correlation to how much these services cost, something to keep in mind when configuring Fleet in AWS.
@@ -561,25 +561,25 @@ Once you have the public IP address for the load balancer, create an A record in
 
 ## Deploying Fleet on AWS ECS
 
-Terraform reference architecture can be found [here](https://github.com/fleetdm/fleet/tree/main/tools/terraform)
+Terraform reference architecture can be found [here](https://github.com/fleetdm/fleet/tree/main/infrastructure/dogfood/terraform/aws)
 
 ### Infrastructure dependencies
 
 #### MySQL
 
-In AWS we recommend running Aurora with MySQL Engine, see [here for terraform details](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/tools/terraform/rds.tf#L62).
+In AWS we recommend running Aurora with MySQL Engine, see [here for terraform details](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/infrastructure/dogfood/terraform/aws/rds.tf#L62).
 
 #### Redis
 
-In AWS we recommend running ElastiCache (Redis Engine) see [here for terraform details](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/tools/terraform/redis.tf#L13)
+In AWS we recommend running ElastiCache (Redis Engine) see [here for terraform details](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/infrastructure/dogfood/terraform/aws/redis.tf#L13)
 
 #### Fleet server
 
-Running Fleet in ECS consists of two main components the [ECS Service](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/tools/terraform/ecs.tf#L79) & [Load Balancer](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/tools/terraform/ecs.tf#L41). In our example the ALB is [handling TLS termination](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/tools/terraform/ecs.tf#L46)
+Running Fleet in ECS consists of two main components the [ECS Service](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/infrastructure/dogfood/terraform/aws/ecs.tf#L79) & [Load Balancer](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/infrastructure/dogfood/terraform/aws/ecs.tf#L41). In our example the ALB is [handling TLS termination](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/infrastructure/dogfood/terraform/aws/ecs.tf#L46)
 
 #### Fleet migrations
 
-Migrations in ECS can be achieved (and is recommended) by running [dedicated ECS tasks](https://github.com/fleetdm/fleet/tree/main/tools/terraform#migrating-the-db) that run the `fleet prepare --no-prompt=true db` command. See [terraform for more details](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/tools/terraform/ecs.tf#L229)
+Migrations in ECS can be achieved (and is recommended) by running [dedicated ECS tasks](https://github.com/fleetdm/fleet/tree/main/infrastructure/dogfood/terraform/aws#migrating-the-db) that run the `fleet prepare --no-prompt=true db` command. See [terraform for more details](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/infrastructure/dogfood/terraform/aws/ecs.tf#L229)
 
 Alternatively you can bake the prepare command into the same task definition see [here for a discussion](https://github.com/fleetdm/fleet/pull/1761#discussion_r697599457), but this not recommended for production environments.
@@ -29,7 +29,7 @@ Note that Firehose logging has limits [discussed in the documentation](https://d
 
 To send logs to Snowflake, you must first configure Fleet to send logs to [Firehose](#firehose). This is because you'll use the Snowflake Snowpipe integration to direct logs to Snowflake.
 
-If you're using Fleet's [terraform reference architecture](https://github.com/fleetdm/fleet/blob/main/tools/terraform/firehose.tf), Firehose is already configured as your log destination.
+If you're using Fleet's [terraform reference architecture](https://github.com/fleetdm/fleet/blob/main/infrastructure/dogfood/terraform/aws/firehose.tf), Firehose is already configured as your log destination.
 
 With Fleet configured to send logs to Firehose, you then want to load the data from Firehose into a Snowflake database. AWS provides instructions on how to direct logs to a Snowflake database [here in the AWS documentation](https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/automate-data-stream-ingestion-into-a-snowflake-database-by-using-snowflake-snowpipe-amazon-s3-amazon-sns-and-amazon-kinesis-data-firehose.html)
 
@@ -41,7 +41,7 @@ To send logs to Splunk, you must first configure Fleet to send logs to [Firehose
 
 With Fleet configured to send logs to Firehose, you then want to load the data from Firehose into Splunk. AWS provides instructions on how to enable Firehose to forward directly to Splunk [here in the AWS documentation](https://docs.aws.amazon.com/firehose/latest/dev/create-destination.html#create-destination-splunk).
 
-If you're using Fleet's [terraform reference architecture](https://github.com/fleetdm/fleet/blob/main/tools/terraform), you want to replace the S3 destination with a Splunk destination. Hashicorp provides instructions on how to send Firehose data to Splunk [here in the Terraform documentation](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/kinesis_firehose_delivery_stream#splunk-destination).
+If you're using Fleet's [terraform reference architecture](https://github.com/fleetdm/fleet/blob/main/infrastructure/dogfood/terraform/aws), you want to replace the S3 destination with a Splunk destination. Hashicorp provides instructions on how to send Firehose data to Splunk [here in the Terraform documentation](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/kinesis_firehose_delivery_stream#splunk-destination).
 
 Splunk provides instructions on how to prepare the Splunk platform for Firehose data [here in the Splunk documentation](https://docs.splunk.com/Documentation/AddOns/latest/Firehose/ConfigureFirehose).
 
@@ -0,0 +1,8 @@
+bucket               = "fleet-terraform-state20220408141538466600000002"
+key                  = "frontend-loadtesting/loadtesting/terraform.tfstate" # This should be set to account_alias/unique_key/terraform.tfstate
+workspace_key_prefix = "frontend-loadtesting" # This should be set to the account alias
+region               = "us-east-2"
+encrypt              = true
+kms_key_id           = "9f98a443-ffd7-4dbe-a9c3-37df89b2e42a"
+dynamodb_table       = "tf-remote-state-lock"
+role_arn             = "arn:aws:iam::353365949058:role/terraform-frontend-loadtesting"
@@ -0,0 +1,8 @@
+bucket               = "fleet-terraform-state20220408141538466600000002"
+key                  = "loadtesting/loadtesting/terraform.tfstate" # This should be set to account_alias/unique_key/terraform.tfstate
+workspace_key_prefix = "loadtesting" # This should be set to the account alias
+region               = "us-east-2"
+encrypt              = true
+kms_key_id           = "9f98a443-ffd7-4dbe-a9c3-37df89b2e42a"
+dynamodb_table       = "tf-remote-state-lock"
+role_arn             = "arn:aws:iam::353365949058:role/terraform-loadtesting"
@@ -3,7 +3,7 @@ resource "aws_ecs_service" "loadtest" {
   launch_type                        = "FARGATE"
   cluster                            = aws_ecs_cluster.fleet.id
   task_definition                    = aws_ecs_task_definition.loadtest.arn
-  desired_count                      = var.scale_down ? 0 : 0
+  desired_count                      = var.scale_down ? 0 : var.loadtest_containers
   deployment_minimum_healthy_percent = 100
   deployment_maximum_percent         = 200
 
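The `desired_count` change above replaces a hard-coded zero with a conditional expression, so one variable can pause the loadtest while another controls its size. A minimal standalone sketch of the same pattern (variable names and defaults taken from this commit; the resource body is abbreviated, with other required `aws_ecs_service` arguments omitted):

```hcl
variable "scale_down" {
  description = "When true, scale the loadtest service to zero tasks"
  type        = bool
  default     = false
}

variable "loadtest_containers" {
  description = "The number of containers to loadtest with"
  type        = number
  default     = 0
}

resource "aws_ecs_service" "loadtest" {
  # ... other arguments omitted for brevity ...

  # Scale to zero when scale_down is set; otherwise run as many
  # tasks as loadtest_containers requests.
  desired_count = var.scale_down ? 0 : var.loadtest_containers
}
```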
@@ -28,12 +28,7 @@ terraform {
       version = "~> 0.1.0"
     }
   }
-  backend "s3" {
-    bucket         = "fleet-loadtesting-tfstate"
-    key            = "loadtesting"
-    region         = "us-east-2"
-    dynamodb_table = "fleet-loadtesting-tfstate"
-  }
+  backend "s3" {}
}

data "aws_caller_identity" "current" {}
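Replacing the hard-coded S3 backend with an empty `backend "s3" {}` block switches the workspace to Terraform's partial backend configuration: the bucket, key, KMS key, and assume-role ARN are no longer baked into the module, which is what lets the two per-account config files above select different accounts at init time. A minimal sketch of the pattern (the config file path in the comment is illustrative, not taken from this commit):

```hcl
terraform {
  # Partial backend configuration: all settings (bucket, key, region,
  # role_arn, ...) are supplied when initializing, e.g.
  #   terraform init -backend-config=path/to/loadtesting.config
  backend "s3" {}
}
```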
@@ -4,9 +4,17 @@ The interface into this code is designed to be minimal.
 If you require changes beyond whats described here, contact @zwinnerman-fleetdm.
 
 ### Deploying your code to the loadtesting environment
 1. Push your branch to https://github.com/fleetdm/fleet and wait for the build to complete (https://github.com/fleetdm/fleet/actions)
+1. Initialize your terraform environment with `terraform init`
-2. Apply terraform with your branch name with `terraform apply -var tag=BRANCH_NAME`
+1. Apply terraform with your branch name with `terraform apply -var tag=BRANCH_NAME`
 
 ### Running migrations
 After applying terraform with the commands above:
 `aws ecs run-task --region us-east-2 --cluster fleet-backend --task-definition fleet-migrate:"$(terraform output -raw fleet_migration_revision)" --launch-type FARGATE --network-configuration "awsvpcConfiguration={subnets="$(terraform output -raw fleet_migration_subnets)",securityGroups="$(terraform output -raw fleet_migration_security_groups)"}"`
+
+### Running a loadtest
+We run simulated hosts in containers of 5,000 at a time. Once the infrastructure is running, you can run the following command:
+
+`terraform apply -var tag=BRANCH_NAME -var loadtest_containers=8`
+
+With the variable `loadtest_containers` you can specify how many containers of 5,000 hosts you want to start. In the example above, it will run 40,000.
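Each loadtest container simulates 5,000 hosts, so the total simulated fleet is just the container count times 5,000. A quick sanity check of the 40,000 figure quoted in the README hunk above:

```shell
containers=8
hosts_per_container=5000
total=$((containers * hosts_per_container))
echo "$total"  # 40000
```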
@@ -13,3 +13,9 @@ variable "scale_down" {
   type        = bool
   default     = false
 }
+
+variable "loadtest_containers" {
+  description = "The number of containers to loadtest with"
+  type        = number
+  default     = 0
+}