Reorg infrastructure and add changes for frontend's loadtesting environment (#4947)

* Reorganized infrastructure, updated for frontend's loadtesting

* Add changes suggested by @chiiph

* Moved files per suggestion by Ben

* Update docs with new links

* Add config for multi account assume role
Zachary Winnerman 2022-04-12 12:49:00 -04:00 committed by GitHub
parent 67ca6d37dd
commit 2fbe53b6c9
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
71 changed files with 47 additions and 22 deletions

@@ -8,7 +8,7 @@ go.mod @fleetdm/go
# Infra/terraform
*.tf @edwardsb @zwinnerman-fleetdm
/tools/loadtesting/terraform @zwinnerman-fleetdm
/infrastructure/ @zwinnerman-fleetdm @edwardsb
# GitHub settings + actions
/.github/ @zwass

@@ -16,7 +16,7 @@ Note: Please prefix versions with `fleet-v` (eg. `fleet-v4.0.0`) in git tags, He
- [package.json](https://github.com/fleetdm/fleet/blob/main/tools/fleetctl-npm/package.json) (do not yet `npm publish`)
- [Helm chart](https://github.com/fleetdm/fleet/blob/main/charts/fleet/Chart.yaml) and [values file](https://github.com/fleetdm/fleet/blob/main/charts/fleet/values.yaml)
- [Terraform variables](https://github.com/fleetdm/fleet/blob/main/tools/terraform/variables.tf)
- [Terraform variables](https://github.com/fleetdm/fleet/blob/main/infrastructure/dogfood/terraform/aws/variables.tf)
Commit these changes via Pull Request and pull the changes on the `main` branch locally. Check that
`HEAD` of the `main` branch points to the commit with these changes.

@@ -1,7 +1,7 @@
# Reference architectures
You can easily run Fleet on a single VPS that would be capable of supporting hundreds if not thousands of hosts, but
this page details an [opinionated view](https://github.com/fleetdm/fleet/tree/main/tools/terraform) of running Fleet in a production environment, as
this page details an [opinionated view](https://github.com/fleetdm/fleet/tree/main/infrastructure/dogfood/terraform/aws) of running Fleet in a production environment, as
well as different configuration strategies to enable High Availability (HA).
## Availability components
@@ -16,7 +16,7 @@ Fleet recommends RDS Aurora MySQL when running on AWS. More details about backup
[here](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Managing.Backups.html). It is also
possible to dynamically scale read replicas to increase performance and [enable database fail-over](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Concepts.AuroraHighAvailability.html).
It is also possible to use [Aurora Global](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-global-database.html) to
span multiple regions for more advanced configurations(_not included in the [reference terraform](https://github.com/fleetdm/fleet/tree/main/tools/terraform)_).
span multiple regions for more advanced configurations(_not included in the [reference terraform](https://github.com/fleetdm/fleet/tree/main/infrastructure/dogfood/terraform/aws)_).
In some cases adding a read replica can increase database performance for specific access patterns. In scenarios when automating the API or with `fleetctl`
there can be benefits to read performance.
@@ -26,7 +26,7 @@ Load balancing enables distributing request traffic over many instances of the b
Load Balancer can also [offload SSL termination](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/create-https-listener.html), freeing Fleet to spend the majority of its allocated compute dedicated
to its core functionality. More details about ALB can be found [here](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html).
_**Note if using [terraform reference architecture](https://github.com/fleetdm/fleet/tree/main/tools/terraform#terraform) all configurations can dynamically scale based on load(cpu/memory) and all configurations
_**Note if using [terraform reference architecture](https://github.com/fleetdm/fleet/tree/main/infrastructure/dogfood/terraform/aws#terraform) all configurations can dynamically scale based on load(cpu/memory) and all configurations
assume On-Demand pricing (savings are available through Reserved Instances). Calculations do not take into account NAT gateway charges or other networking related ingress/egress costs.**_
## Cloud providers
@@ -79,7 +79,7 @@ assume On-Demand pricing (savings are available through Reserved Instances). Cal
| Redis | 6 | m6g.large | 3 |
| MySQL | 5.7.mysql_aurora.2.10.0 | db.r6g.16xlarge | 2 |
AWS reference architecture can be found [here](https://github.com/fleetdm/fleet/tree/main/tools/terraform). This configuration includes:
AWS reference architecture can be found [here](https://github.com/fleetdm/fleet/tree/main/infrastructure/dogfood/terraform/aws). This configuration includes:
- VPC
- Subnets
@@ -93,7 +93,7 @@ AWS reference architecture can be found [here](https://github.com/fleetdm/fleet/
- Elasticache Redis Engine
- Firehose osquery log destination
- S3 bucket sync to allow further ingestion/processing
- [Monitoring via Cloudwatch alarms](https://github.com/fleetdm/fleet/tree/main/tools/terraform/monitoring)
- [Monitoring via Cloudwatch alarms](https://github.com/fleetdm/fleet/tree/main/infrastructure/dogfood/terraform/aws/monitoring)
Some AWS services used in the provider reference architecture are billed as pay-per-use such as Firehose. This means that osquery scheduled query frequency can have
a direct correlation to how much these services cost, something to keep in mind when configuring Fleet in AWS.

@@ -561,25 +561,25 @@ Once you have the public IP address for the load balancer, create an A record in
## Deploying Fleet on AWS ECS
Terraform reference architecture can be found [here](https://github.com/fleetdm/fleet/tree/main/tools/terraform)
Terraform reference architecture can be found [here](https://github.com/fleetdm/fleet/tree/main/infrastructure/dogfood/terraform/aws)
### Infrastructure dependencies
#### MySQL
In AWS we recommend running Aurora with MySQL Engine, see [here for terraform details](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/tools/terraform/rds.tf#L62).
In AWS we recommend running Aurora with MySQL Engine, see [here for terraform details](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/infrastructure/dogfood/terraform/aws/rds.tf#L62).
#### Redis
In AWS we recommend running ElastiCache (Redis Engine) see [here for terraform details](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/tools/terraform/redis.tf#L13)
In AWS we recommend running ElastiCache (Redis Engine) see [here for terraform details](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/infrastructure/dogfood/terraform/aws/redis.tf#L13)
#### Fleet server
Running Fleet in ECS consists of two main components the [ECS Service](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/tools/terraform/ecs.tf#L79) & [Load Balancer](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/tools/terraform/ecs.tf#L41). In our example the ALB is [handling TLS termination](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/tools/terraform/ecs.tf#L46)
Running Fleet in ECS consists of two main components the [ECS Service](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/infrastructure/dogfood/terraform/aws/ecs.tf#L79) & [Load Balancer](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/infrastructure/dogfood/terraform/aws/ecs.tf#L41). In our example the ALB is [handling TLS termination](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/infrastructure/dogfood/terraform/aws/ecs.tf#L46)
#### Fleet migrations
Migrations in ECS can be achieved (and is recommended) by running [dedicated ECS tasks](https://github.com/fleetdm/fleet/tree/main/tools/terraform#migrating-the-db) that run the `fleet prepare --no-prompt=true db` command. See [terraform for more details](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/tools/terraform/ecs.tf#L229)
Migrations in ECS can be achieved (and is recommended) by running [dedicated ECS tasks](https://github.com/fleetdm/fleet/tree/main/infrastructure/dogfood/terraform/aws#migrating-the-db) that run the `fleet prepare --no-prompt=true db` command. See [terraform for more details](https://github.com/fleetdm/fleet/blob/589e11ebca40949fb568b2b68928450eecb718bf/infrastructure/dogfood/terraform/aws/ecs.tf#L229)
Alternatively, you can bake the prepare command into the same task definition (see [here for a discussion](https://github.com/fleetdm/fleet/pull/1761#discussion_r697599457)), but this is not recommended for production environments.

@@ -29,7 +29,7 @@ Note that Firehose logging has limits [discussed in the documentation](https://d
To send logs to Snowflake, you must first configure Fleet to send logs to [Firehose](#firehose). This is because you'll use the Snowflake Snowpipe integration to direct logs to Snowflake.
If you're using Fleet's [terraform reference architecture](https://github.com/fleetdm/fleet/blob/main/tools/terraform/firehose.tf), Firehose is already configured as your log destination.
If you're using Fleet's [terraform reference architecture](https://github.com/fleetdm/fleet/blob/main/infrastructure/dogfood/terraform/aws/firehose.tf), Firehose is already configured as your log destination.
With Fleet configured to send logs to Firehose, you then want to load the data from Firehose into a Snowflake database. AWS provides instructions on how to direct logs to a Snowflake database [here in the AWS documentation](https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/automate-data-stream-ingestion-into-a-snowflake-database-by-using-snowflake-snowpipe-amazon-s3-amazon-sns-and-amazon-kinesis-data-firehose.html)
@@ -41,7 +41,7 @@ To send logs to Splunk, you must first configure Fleet to send logs to [Firehose
With Fleet configured to send logs to Firehose, you then want to load the data from Firehose into Splunk. AWS provides instructions on how to enable Firehose to forward directly to Splunk [here in the AWS documentation](https://docs.aws.amazon.com/firehose/latest/dev/create-destination.html#create-destination-splunk).
If you're using Fleet's [terraform reference architecture](https://github.com/fleetdm/fleet/blob/main/tools/terraform), you want to replace the S3 destination with a Splunk destination. Hashicorp provides instructions on how to send Firehose data to Splunk [here in the Terraform documentation](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/kinesis_firehose_delivery_stream#splunk-destination).
If you're using Fleet's [terraform reference architecture](https://github.com/fleetdm/fleet/blob/main/infrastructure/dogfood/terraform/aws), you want to replace the S3 destination with a Splunk destination. Hashicorp provides instructions on how to send Firehose data to Splunk [here in the Terraform documentation](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/kinesis_firehose_delivery_stream#splunk-destination).
Splunk provides instructions on how to prepare the Splunk platform for Firehose data [here in the Splunk documentation](https://docs.splunk.com/Documentation/AddOns/latest/Firehose/ConfigureFirehose).

@@ -0,0 +1,8 @@
bucket = "fleet-terraform-state20220408141538466600000002"
key = "frontend-loadtesting/loadtesting/terraform.tfstate" # This should be set to account_alias/unique_key/terraform.tfstate
workspace_key_prefix = "frontend-loadtesting" # This should be set to the account alias
region = "us-east-2"
encrypt = true
kms_key_id = "9f98a443-ffd7-4dbe-a9c3-37df89b2e42a"
dynamodb_table = "tf-remote-state-lock"
role_arn = "arn:aws:iam::353365949058:role/terraform-frontend-loadtesting"
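
The `role_arn` in this backend config is the role Terraform assumes to reach the shared state bucket, presumably what the "multi account assume role" item in the commit message refers to. A minimal sketch for checking, before `terraform init`, that your current AWS credentials are allowed to assume it, assuming AWS CLI credentials are already configured; the `check_backend_role` name, the session name, and the `AWS_CLI` override (for dry runs) are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: check that the current AWS credentials may assume the backend role.
# AWS_CLI can be set (e.g. AWS_CLI=echo) to print the command instead of running it.
check_backend_role() {
  local role_arn="$1"
  # On success, prints the ARN of the assumed-role session.
  "${AWS_CLI:-aws}" sts assume-role \
    --role-arn "$role_arn" \
    --role-session-name "tf-backend-check" \
    --query 'AssumedRoleUser.Arn' \
    --output text
}
```

For example, `check_backend_role "arn:aws:iam::353365949058:role/terraform-frontend-loadtesting"`. An `AccessDenied` error here likely means the S3 backend will fail the same way.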

@@ -0,0 +1,8 @@
bucket = "fleet-terraform-state20220408141538466600000002"
key = "loadtesting/loadtesting/terraform.tfstate" # This should be set to account_alias/unique_key/terraform.tfstate
workspace_key_prefix = "loadtesting" # This should be set to the account alias
region = "us-east-2"
encrypt = true
kms_key_id = "9f98a443-ffd7-4dbe-a9c3-37df89b2e42a"
dynamodb_table = "tf-remote-state-lock"
role_arn = "arn:aws:iam::353365949058:role/terraform-loadtesting"

@@ -3,7 +3,7 @@ resource "aws_ecs_service" "loadtest" {
launch_type = "FARGATE"
cluster = aws_ecs_cluster.fleet.id
task_definition = aws_ecs_task_definition.loadtest.arn
desired_count = var.scale_down ? 0 : 0
desired_count = var.scale_down ? 0 : var.loadtest_containers
deployment_minimum_healthy_percent = 100
deployment_maximum_percent = 200
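
With `desired_count` now driven by `var.loadtest_containers`, `scale_down` acts as an override that forces the loadtest service to zero tasks. A sketch of using it to stop the simulated hosts while keeping the environment applied; the function name and the `TERRAFORM` override are hypothetical, and other resources that read `scale_down` (not shown in this hunk) may change as well:

```bash
# Sketch: scale the loadtest service to zero tasks.
# With scale_down=true, `var.scale_down ? 0 : var.loadtest_containers` evaluates to 0.
# TERRAFORM can be overridden (e.g. TERRAFORM=echo) for a dry run.
scale_down_loadtest() {
  local tag="$1"
  "${TERRAFORM:-terraform}" apply -var "tag=${tag}" -var "scale_down=true"
}
```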

@@ -28,12 +28,7 @@ terraform {
version = "~> 0.1.0"
}
}
backend "s3" {
bucket = "fleet-loadtesting-tfstate"
key = "loadtesting"
region = "us-east-2"
dynamodb_table = "fleet-loadtesting-tfstate"
}
backend "s3" {}
}
data "aws_caller_identity" "current" {}
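
Because the `backend "s3"` block is now empty, Terraform treats it as a partial configuration and expects the remaining settings (bucket, key, region, role, and so on) from `-backend-config`, presumably how the per-account backend files can target different accounts. A minimal sketch for switching between environments, assuming the files are saved as `<account_alias>-backend.conf` (a hypothetical naming; the file names are not shown here) and using a hypothetical `TERRAFORM` override for dry runs:

```bash
# Sketch: (re)initialize Terraform against one account's state.
# Assumes backend files named "<account_alias>-backend.conf" (hypothetical naming).
init_environment() {
  local alias="$1"
  case "$alias" in
    loadtesting|frontend-loadtesting) ;;
    *) echo "unknown environment: ${alias}" >&2; return 1 ;;
  esac
  # -reconfigure ignores the previously saved backend settings instead of
  # offering to migrate state, so switching accounts does not copy state across.
  "${TERRAFORM:-terraform}" init -reconfigure -backend-config="${alias}-backend.conf"
}
```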

@@ -4,9 +4,17 @@ The interface into this code is designed to be minimal.
If you require changes beyond what's described here, contact @zwinnerman-fleetdm.
### Deploying your code to the loadtesting environment
1. Push your branch to https://github.com/fleetdm/fleet and wait for the build to complete (https://github.com/fleetdm/fleet/actions)
1. Initialize your terraform environment with `terraform init`
2. Apply terraform with your branch name with `terraform apply -var tag=BRANCH_NAME`
1. Apply terraform with your branch name with `terraform apply -var tag=BRANCH_NAME`
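
The deploy steps above can be wrapped in a small script. This is a sketch only: pushing the branch and waiting for the build stay manual. Since the backend block is now empty, it passes a backend config file to `terraform init`; plain `terraform init` would prompt interactively for the missing backend settings. `deploy_branch`, its argument order, and the `TERRAFORM` override are hypothetical:

```bash
# Sketch of the deploy steps: init against a backend config, then apply
# with the branch name as the image tag. Push the branch and wait for the
# build first; that part is not automated here.
deploy_branch() {
  local backend_config="$1"
  # Default to the currently checked-out branch if no name is given.
  local branch="${2:-$(git rev-parse --abbrev-ref HEAD)}"
  local tf="${TERRAFORM:-terraform}"
  "$tf" init -backend-config="$backend_config" || return
  "$tf" apply -var "tag=${branch}"
}
```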
### Running migrations
After applying terraform with the commands above:
`aws ecs run-task --region us-east-2 --cluster fleet-backend --task-definition fleet-migrate:"$(terraform output -raw fleet_migration_revision)" --launch-type FARGATE --network-configuration "awsvpcConfiguration={subnets="$(terraform output -raw fleet_migration_subnets)",securityGroups="$(terraform output -raw fleet_migration_security_groups)"}"`
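
The same `run-task` call can be written with each Terraform output captured into a variable first. In the one-liner above, the `$(...)` substitutions sit outside the double quotes, so their values undergo word splitting and globbing; building the `--network-configuration` value as one quoted string avoids that. A sketch: `run_migration` and the `TERRAFORM`/`AWS_CLI` overrides are hypothetical, while the region, cluster, task family, and output names come from the command above:

```bash
# Sketch: the migration run-task call, with each Terraform output captured
# into a variable and the network configuration passed as one quoted argument.
# TERRAFORM and AWS_CLI can be overridden for a dry run.
run_migration() {
  local tf="${TERRAFORM:-terraform}" aws="${AWS_CLI:-aws}"
  # Declared separately so a failing `terraform output` is not masked by `local`.
  local rev subnets sgs
  rev="$("$tf" output -raw fleet_migration_revision)" || return
  subnets="$("$tf" output -raw fleet_migration_subnets)" || return
  sgs="$("$tf" output -raw fleet_migration_security_groups)" || return
  "$aws" ecs run-task \
    --region us-east-2 \
    --cluster fleet-backend \
    --task-definition "fleet-migrate:${rev}" \
    --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={subnets=${subnets},securityGroups=${sgs}}"
}
```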
### Running a loadtest
We run simulated hosts in containers of 5,000 at a time. Once the infrastructure is running, you can run the following command:
`terraform apply -var tag=BRANCH_NAME -var loadtest_containers=8`
With the variable `loadtest_containers` you can specify how many containers of 5,000 hosts you want to start. In the example above, it will run 40,000.
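
Since each container simulates 5,000 hosts, the `loadtest_containers` value for a target host count is a ceiling division. A small sketch (the function name is hypothetical):

```bash
# Sketch: number of loadtest containers needed for a target host count,
# given 5,000 simulated hosts per container. Rounds up.
containers_for_hosts() {
  local hosts="$1" per_container=5000
  echo $(( (hosts + per_container - 1) / per_container ))
}
```

For example, `terraform apply -var tag=BRANCH_NAME -var loadtest_containers="$(containers_for_hosts 40000)"` passes `8`, matching the 40,000-host example above.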

@@ -13,3 +13,9 @@ variable "scale_down" {
type = bool
default = false
}
variable "loadtest_containers" {
description = "The number of containers to loadtest with"
type = number
default = 0
}
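
The new variable defaults to `0`, so no simulated hosts start unless it is set. Like any Terraform input variable, it can also be supplied through a `TF_VAR_loadtest_containers` environment variable instead of `-var`, which can be convenient when re-running `apply`. A sketch, with a hypothetical function name and `TERRAFORM` override:

```bash
# Sketch: supply the loadtest inputs through TF_VAR_* environment variables,
# which Terraform reads for input variables of the same name.
apply_with_env() {
  local tag="$1" containers="$2"
  TF_VAR_tag="$tag" TF_VAR_loadtest_containers="$containers" \
    "${TERRAFORM:-terraform}" apply
}
```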