
Monitoring addon

This addon enables CloudWatch monitoring for Fleet.

This includes:

  • 5XX Errors on ALB
  • ECS Service Monitoring
  • RDS Monitoring
  • Redis Monitoring
  • ACM Certificate Monitoring
  • A custom Lambda to check the Fleet DB for Cron runs

Preparation

Some of the for_each and count expressions in this module depend on values that are not known until the main Fleet module has been applied.

Assuming a configuration matching the example at https://github.com/fleetdm/fleet/blob/main/terraform/example/main.tf, you will need to run terraform apply -target module.main prior to applying the monitoring module.

Support for multiple ALBs was added to allow monitoring saml-auth-proxy. See https://github.com/fleetdm/fleet/tree/main/terraform/addons/saml-auth-proxy
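
As a rough sketch of what a multi-ALB setup might look like, the albs list can hold one entry per load balancer. The first entry mirrors the example configuration in this README. The second entry uses placeholder values throughout: the local name saml_proxy_alb and every string inside it are hypothetical and are not outputs of the saml-auth-proxy addon, so substitute the values from your own deployment.

```hcl
locals {
  # Hypothetical placeholder values for a second load balancer.
  # None of these names come from the saml-auth-proxy addon.
  saml_proxy_alb = {
    name                    = "example-saml-proxy-alb"
    target_group_name       = "example-saml-proxy-tg"
    target_group_arn_suffix = "targetgroup/example-saml-proxy-tg/0123456789abcdef"
    arn_suffix              = "app/example-saml-proxy-alb/0123456789abcdef"
    ecs_service_name        = "example-saml-proxy"
    min_containers          = 1
  }
}

module "monitoring" {
  source          = "github.com/fleetdm/fleet//terraform/addons/monitoring?ref=tf-mod-addon-monitoring-v1.4.0"
  customer_prefix = local.customer
  albs = [
    # Fleet's own ALB, as in the example configuration.
    {
      name                    = module.main.byo-vpc.byo-db.alb.lb_dns_name
      target_group_name       = module.main.byo-vpc.byo-db.alb.target_group_names[0]
      target_group_arn_suffix = module.main.byo-vpc.byo-db.alb.target_group_arn_suffixes[0]
      arn_suffix              = module.main.byo-vpc.byo-db.alb.lb_arn_suffix
      ecs_service_name        = module.main.byo-vpc.byo-db.byo-ecs.service.name
      min_containers          = module.main.byo-vpc.byo-db.byo-ecs.appautoscaling_target.min_capacity
    },
    # Second ALB (placeholder values); alert_thresholds is omitted, so the
    # defaults listed under Inputs apply.
    local.saml_proxy_alb,
  ]
}
```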

Example configuration

This assumes your Fleet module is named main and is configured as described in its default documentation. See https://github.com/fleetdm/fleet/blob/main/terraform/example/main.tf for details.

Note: if you haven't defined local.customer or customized service names, the default is "fleet" anywhere local.customer appears below.

module "monitoring" {
  source                 = "github.com/fleetdm/fleet//terraform/addons/monitoring?ref=tf-mod-addon-monitoring-v1.4.0"
  customer_prefix        = local.customer
  fleet_ecs_service_name = module.main.byo-vpc.byo-db.byo-ecs.service.name
  albs = [
    {
      name                    = module.main.byo-vpc.byo-db.alb.lb_dns_name,
      target_group_name       = module.main.byo-vpc.byo-db.alb.target_group_names[0]
      target_group_arn_suffix = module.main.byo-vpc.byo-db.alb.target_group_arn_suffixes[0]
      arn_suffix              = module.main.byo-vpc.byo-db.alb.lb_arn_suffix
      ecs_service_name        = module.main.byo-vpc.byo-db.byo-ecs.service.name
      min_containers          = module.main.byo-vpc.byo-db.byo-ecs.appautoscaling_target.min_capacity
      alert_thresholds = {
        HTTPCode_ELB_5XX_Count = {
          period    = 3600
          threshold = 2
        },
        HTTPCode_Target_5XX_Count = {
          period    = 120
          threshold = 0
        }
      }
    },
  ]
  sns_topic_arns_map = {
    alb_httpcode_5xx = [var.slack_topic_arn]
    cron_monitoring  = [var.slack_topic_arn]
  }
  mysql_cluster_members = module.main.byo-vpc.rds.cluster_members
  # The cloudposse module seems to have a nested list here.
  redis_cluster_members = module.main.byo-vpc.redis.member_clusters[0]
  acm_certificate_arn   = module.acm.acm_certificate_arn
  cron_monitoring = {
    mysql_host                 = module.main.byo-vpc.rds.cluster_reader_endpoint
    mysql_database             = module.main.byo-vpc.rds.cluster_database_name
    mysql_user                 = module.main.byo-vpc.rds.cluster_master_username
    mysql_password_secret_name = module.main.byo-vpc.secrets.secret_ids["${local.customer}-database-password"]
    rds_security_group_id      = module.main.byo-vpc.rds.security_group_id
    subnet_ids                 = module.main.vpc.private_subnets
    vpc_id                     = module.main.vpc.vpc_id
    # Format of https://pkg.go.dev/time#ParseDuration
    delay_tolerance = "4h"
    # Interval format for: https://docs.aws.amazon.com/scheduler/latest/UserGuide/schedule-types.html#rate-based
    run_interval          = "1 hour"
    log_retention_in_days = 365
  }
}
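
The example above references local.customer and var.slack_topic_arn without declaring them. A minimal sketch of supporting declarations might look like the following; the description text is illustrative, and the SNS topic itself is assumed to exist already. module.acm, which the example also references, is not shown here.

```hcl
locals {
  # "fleet" is the default prefix noted above; change it if you
  # customized service names.
  customer = "fleet"
}

variable "slack_topic_arn" {
  type        = string
  description = "ARN of an existing SNS topic that receives alert notifications."
}
```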

SNS topic ARNs map

Valid targets for sns_topic_arns_map:

  • acm_certificate_expired
  • alb_helthyhosts
  • alb_httpcode_5xx
  • backend_response_time
  • cron_monitoring
  • rds_cpu_untilizaton_too_high
  • rds_db_event_subscription
  • redis_cpu_engine_utilization
  • redis_cpu_utilization
  • redis_current_connections
  • redis_database_memory_percentage
  • redis_replication_lag

If you want every alarm to publish to the same topics, use default_sns_topic_arns instead and include your notification ARNs there.
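
For instance, a configuration that routes all alarms to a single topic might be sketched as follows. This is an illustration only: var.slack_topic_arn is assumed to be declared elsewhere, and the monitored resources (albs, mysql_cluster_members, and so on) are left out for brevity.

```hcl
module "monitoring" {
  source          = "github.com/fleetdm/fleet//terraform/addons/monitoring?ref=tf-mod-addon-monitoring-v1.4.0"
  customer_prefix = local.customer

  # One list of notification ARNs for all alarms, in place of per-target
  # entries in sns_topic_arns_map.
  default_sns_topic_arns = [var.slack_topic_arn]
}
```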

Requirements

No requirements.

Providers

Name Version
archive 2.4.0
aws 5.22.0
null 3.2.1

Modules

No modules.

Resources

Name Type
aws_cloudwatch_event_rule.cron_monitoring_lambda resource
aws_cloudwatch_event_target.cron_monitoring_lambda resource
aws_cloudwatch_log_group.cron_monitoring_lambda resource
aws_cloudwatch_metric_alarm.acm_certificate_expired resource
aws_cloudwatch_metric_alarm.alb_healthyhosts resource
aws_cloudwatch_metric_alarm.cpu_utilization_too_high resource
aws_cloudwatch_metric_alarm.lb resource
aws_cloudwatch_metric_alarm.redis-current-connections resource
aws_cloudwatch_metric_alarm.redis-database-memory-percentage resource
aws_cloudwatch_metric_alarm.redis-replication-lag resource
aws_cloudwatch_metric_alarm.redis_cpu resource
aws_cloudwatch_metric_alarm.redis_cpu_engine_utilization resource
aws_cloudwatch_metric_alarm.target_response_time resource
aws_db_event_subscription.default resource
aws_iam_policy.cron_monitoring_lambda resource
aws_iam_role.cron_monitoring_lambda resource
aws_iam_role_policy_attachment.cron_monitoring_lambda resource
aws_iam_role_policy_attachment.cron_monitoring_lambda_managed resource
aws_lambda_function.cron_monitoring resource
aws_lambda_permission.cron_monitoring_cloudwatch resource
aws_security_group.cron_monitoring resource
aws_security_group_rule.cron_monitoring_to_rds resource
null_resource.cron_monitoring_build resource
archive_file.cron_monitoring_lambda data source
aws_caller_identity.current data source
aws_iam_policy_document.cron_monitoring_lambda data source
aws_iam_policy_document.cron_monitoring_lambda_assume_role data source
aws_region.current data source
aws_secretsmanager_secret.mysql_database_password data source

Inputs

Name Description Type Default Required
acm_certificate_arn n/a string null no
albs n/a
list(object({
  name                    = string
  arn_suffix              = string
  target_group_name       = string
  target_group_arn_suffix = string
  min_containers          = optional(string, 1)
  ecs_service_name        = string
  alert_thresholds = optional(
    object({
      HTTPCode_ELB_5XX_Count = object({
        period    = number
        threshold = number
      })
      HTTPCode_Target_5XX_Count = object({
        period    = number
        threshold = number
      })
    }),
    {
      HTTPCode_ELB_5XX_Count = {
        period    = 120
        threshold = 0
      },
      HTTPCode_Target_5XX_Count = {
        period    = 120
        threshold = 0
      }
    }
  )
}))
[] no
cron_monitoring n/a
object({
  mysql_host                 = string
  mysql_database             = string
  mysql_user                 = string
  mysql_password_secret_name = string
  vpc_id                     = string
  subnet_ids                 = list(string)
  rds_security_group_id      = string
  delay_tolerance            = string
  run_interval               = string
  log_retention_in_days      = optional(number, 7)
})
null no
customer_prefix n/a string "fleet" no
default_sns_topic_arns n/a list(string) [] no
fleet_ecs_service_name n/a string null no
mysql_cluster_members n/a list(string) [] no
redis_cluster_members n/a list(string) [] no
sns_topic_arns_map n/a map(list(string)) {} no

Outputs

No outputs.