Commit graph

2 commits

Author SHA1 Message Date
Victor Lyuboslavsky
e4e3c3f9ff
Fix issues with OTEL SigNoz deployments for loadtests (#34694)
SigNoz converted from child module to standalone root module with
independent state.

  **Critical Impact**

  Deployment order is now required:
  1. Deploy infrastructure/loadtesting/terraform/signoz/ FIRST
  2. Then deploy infrastructure/loadtesting/terraform/infra/

  Communication between modules via Terraform remote state.

  **Key Configuration Changes**

  - SigNoz creates its own EKS cluster: signoz-${workspace}
- Instance type: t3.xlarge (upgraded from t3.large for resource
headroom)
  - ClickHouse disk: 200Gi (was 20Gi) with 2-day retention
  - Resource limits configured to prevent OOMKills during loadtest
  - wait_for_jobs = false to avoid Helm deployment deadlock


<!-- Add the related story/sub-task/bug number, like Resolves #123, or
remove if NA -->
**Related issue:** Resolves #32331
2025-10-23 12:49:36 -05:00
Victor Lyuboslavsky
aef9b8400c
Added terraform files for Signoz OTEL backend. (#34058)
<!-- Add the related story/sub-task/bug number, like Resolves #123, or
remove if NA -->
**Related issue:** Resolves #32331 

This PR allows us to run loadtest with SigNoz OTEL backend by adding
`-var=enable_otel=true`
SigNoz is deployed via Helm chart.

Enhancements needed (in future PR):
- put SigNoz UI behind VPN
- combine the new eks-vpc with shared fleet-vpc
- make SigNoz shared, so multiple loadtests use the same instance? (But
what about updating to it to latest version?)

Next steps:
- Enable SigNoz in Dogfood environment
- SigNoz by default [keeps 15 days of logs and
traces](https://signoz.io/docs/userguide/retention-period), which is
quite a bit. How much would that cost us and should we reduce it?

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- New Features
- Optional OpenTelemetry tracing with SigNoz via a new enable_otel flag.
- Conditional deployment of a SigNoz stack (managed EKS, storage,
Helm-based apps) with internal OTLP collector endpoint.
- New outputs to retrieve OTLP endpoint, cluster name, and a kubectl
configuration command.

- Documentation
  - Added guidance for deploying and using SigNoz with load testing.
  - Updated examples to include -var=enable_otel=true.

- Chores
- Introduced required providers to support Helm and Kubernetes
resources.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-10-10 21:53:04 -05:00