mirror of https://github.com/fleetdm/fleet synced 2026-04-21 21:47:20 +00:00

SigNoz for Fleet Loadtesting

SigNoz provides OpenTelemetry tracing for Fleet loadtest environments. It's deployed as a standalone Terraform root module to ensure it's available before Fleet starts up.

Architecture

EKS Cluster: Per-workspace (e.g., signoz-victor-baseline)
Kubernetes: v1.31
Node group: 2x t3.xlarge nodes
Components:
- SigNoz UI (public LoadBalancer on port 8080)
- OTLP Collector (internal LoadBalancer on port 4317)
- ClickHouse (200Gi storage)

Deployment order

IMPORTANT: SigNoz must be deployed BEFORE the main Fleet infrastructure to capture telemetry from Fleet's initial bootup.

1. Deploy shared EKS VPC (one time, shared across workspaces, should already be deployed)
2. Deploy SigNoz (this directory)
3. Deploy Fleet infrastructure (../infra)

Usage

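# 1. Initialize and create the workspace
#    (if the workspace already exists, use `terraform workspace select <workspace_name>` instead)
cd infrastructure/loadtesting/terraform/signoz
terraform init
terraform workspace new <workspace_name>  # Must match your infra workspace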

# 2. Deploy SigNoz
terraform apply

# 3. Wait for the apply to complete (~10-15 minutes)
# The OTLP collector endpoint is printed in the Terraform outputs when the apply finishes

# 4. Now deploy Fleet infrastructure
cd ../infra
terraform apply
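Because Fleet depends on the collector endpoint, it can help to confirm the output exists before moving on to step 4. The following is a minimal sketch, not part of the Fleet repo: `require_otel_endpoint` is a hypothetical helper name, and it assumes it is run from the signoz directory with the correct workspace selected.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the Fleet repo): fail early if the SigNoz
# deployment has not produced the otel_collector_endpoint output yet.
require_otel_endpoint() {
  local endpoint
  # `terraform output -raw NAME` exits non-zero when the output is undefined.
  if ! endpoint="$(terraform output -raw otel_collector_endpoint 2>/dev/null)" || [ -z "$endpoint" ]; then
    echo "otel_collector_endpoint is not set; run 'terraform apply' in the signoz directory first" >&2
    return 1
  fi
  printf '%s\n' "$endpoint"
}
```

For example, `require_otel_endpoint && cd ../infra && terraform apply` only proceeds to the Fleet deployment when the endpoint is available.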

Accessing SigNoz UI

# Get the SigNoz UI URL
terraform output -raw get_signoz_ui_url | bash

# Or configure kubectl and access directly
$(terraform output -raw configure_kubectl)
kubectl get svc -n signoz signoz -o jsonpath='http://{.status.loadBalancer.ingress[0].hostname}:8080'
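Right after a deployment, the LoadBalancer hostname may not be assigned yet, in which case the jsonpath command above prints a URL with an empty host. A small sketch that guards against this, reusing the same service name, namespace, and port as above (`signoz_ui_url` is a hypothetical helper name, and kubectl is assumed to be configured already):

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the SigNoz UI URL, or fail if the
# LoadBalancer has no hostname yet.
signoz_ui_url() {
  local host
  host="$(kubectl get svc -n signoz signoz \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')" || return 2
  if [ -z "$host" ]; then
    echo "LoadBalancer hostname not assigned yet; retry shortly" >&2
    return 1
  fi
  printf 'http://%s:8080\n' "$host"
}
```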

Managing storage and retention

IMPORTANT: ClickHouse storage is limited (200Gi, see Architecture). To avoid running out of space:

Reduce trace retention period in the SigNoz UI:
- Navigate to Settings → Retention Period
- Lower the retention period for traces (default may be too long for loadtesting)
- Consider 1-3 days for active loadtest environments

Monitor ClickHouse storage:

# Check ClickHouse pod storage usage
kubectl exec -n signoz chi-signoz-clickhouse-cluster-0-0-0 -- df -h /var/lib/clickhouse

# Check database sizes
kubectl exec -n signoz chi-signoz-clickhouse-cluster-0-0-0 -- clickhouse-client --query "SELECT database, formatReadableSize(sum(bytes_on_disk)) AS size FROM system.parts WHERE active GROUP BY database ORDER BY sum(bytes_on_disk) DESC"
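The storage check can also be scripted so that it warns before the volume fills up. This is a sketch under stated assumptions: the function names and the 80% default threshold are illustrative, and it assumes the `df` in the ClickHouse container accepts `-P` (POSIX output, one line per filesystem), which makes the usage column easy to parse.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names and the default threshold are illustrative.

# Read `df -P` output on stdin and print the usage percentage of the
# last filesystem line as a bare integer (for example 85).
clickhouse_usage_pct() {
  awk 'NR > 1 { gsub(/%/, "", $5); pct = $5 }
       END { if (pct == "") exit 1; print pct }'
}

# Return 0 below the threshold, 1 at or above it, 2 if usage is unknown.
check_clickhouse_storage() {
  local threshold="${1:-80}" pct
  pct="$(kubectl exec -n signoz chi-signoz-clickhouse-cluster-0-0-0 -- \
    df -P /var/lib/clickhouse | clickhouse_usage_pct)" || return 2
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARNING: ClickHouse volume is ${pct}% full (threshold ${threshold}%)" >&2
    return 1
  fi
  echo "ClickHouse volume is ${pct}% full"
}
```

Calling `check_clickhouse_storage 90` before a long loadtest run gives an early signal that retention should be lowered.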

What happens when storage is full:
- ClickHouse will reject new writes
- New traces will NOT be captured
- OTEL collector will log errors about failed writes
- Fleet will continue running but traces will be lost
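When writes start failing, a per-table breakdown can show what is consuming the disk. The sketch below extends the database-size query above by also grouping on the `table` column of `system.parts`; `clickhouse_top_tables` is a hypothetical helper name and the default limit of 10 is arbitrary.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the largest ClickHouse tables by on-disk size.
clickhouse_top_tables() {
  local limit="${1:-10}"
  # Only allow digits, since the value is interpolated into the query.
  case "$limit" in
    ''|*[!0-9]*) echo "limit must be a non-negative integer" >&2; return 2 ;;
  esac
  kubectl exec -n signoz chi-signoz-clickhouse-cluster-0-0-0 -- \
    clickhouse-client --query "SELECT database, table, formatReadableSize(sum(bytes_on_disk)) AS size FROM system.parts WHERE active GROUP BY database, table ORDER BY sum(bytes_on_disk) DESC LIMIT ${limit}"
}
```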

Outputs

The main Fleet infrastructure reads these outputs via remote state:

cluster_name: EKS cluster name
otel_collector_endpoint: Internal OTLP endpoint for Fleet to send traces
configure_kubectl: Command to configure kubectl access

Destroying

terraform destroy
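Since each workspace has its own cluster, it is worth confirming which workspace is selected before destroying. A minimal sketch using `terraform workspace show`; the `destroy_signoz` wrapper is hypothetical and not part of the Fleet repo.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: only run `terraform destroy` when the selected
# workspace matches the one the caller expects.
destroy_signoz() {
  local expected="$1" current
  current="$(terraform workspace show)" || return 2
  if [ "$current" != "$expected" ]; then
    echo "refusing to destroy: current workspace is '$current', expected '$expected'" >&2
    return 1
  fi
  terraform destroy
}
```

Usage: `destroy_signoz <workspace_name>` from the signoz directory.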