---
id: observability-otel
title: Observability
---

ToolJet supports OpenTelemetry (OTEL) for comprehensive observability, enabling you to monitor application performance, track query executions, and analyze system health through metrics.

**Categories of Metrics**

1. **App-Based Metrics** - Monitor the performance and reliability of individual ToolJet applications. These metrics include detailed labels such as `app_name`, `query_name`, `environment`, `query_text` (opt-in, disabled by default), and `query_mode` (SQL/GUI) for fine-grained analysis.
   - **Query Executions:** Track total query executions per application
   - **Query Duration:** Measure query execution times with histogram buckets
   - **Query Failures:** Monitor failed queries with error categorization
   - **Success Rates:** Application-level success rate percentages
   - **App Usage:** Track application access and interaction events <br/> <br/>
2. **Platform-Based Metrics** - Monitor the overall health and performance of your ToolJet instance:
   - **HTTP Server Metrics:** Request rates, response times, status codes
   - **API Performance:** Endpoint-specific latency and throughput
   - **Database Operations:** Query execution times and connection health
   - **Node.js Runtime:** Event loop delays, garbage collection, memory usage
   - **V8 Memory:** Heap usage and external memory tracking

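Success rates are derived from execution and failure counts. This page does not define the exact formula, so the sketch below assumes success rate = (executions - failures) / executions over the same time window. `success_rate` is a hypothetical helper for checking a dashboard figure by hand, not part of ToolJet.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ToolJet): success rate as a percentage,
# ASSUMING rate = (executions - failures) / executions over one time window.
success_rate() {
  awk -v executions="$1" -v failures="$2" 'BEGIN {
    # No executions in the window: the rate is undefined.
    if (executions <= 0) { print "n/a"; exit }
    printf "%.1f\n", (executions - failures) * 100 / executions
  }'
}

success_rate 200 3   # 197 of 200 executions succeeded, prints 98.5
```
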
## Configuration

Enable OpenTelemetry by setting the following environment variables in your ToolJet deployment:

#### Required Variables

```bash
# Enable OpenTelemetry metrics collection
ENABLE_OTEL=true
```

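Since ToolJet only collects telemetry when `ENABLE_OTEL=true` is set, a pre-flight check before starting the server can catch typos early. The sketch below is a hypothetical shell helper, not something ToolJet ships. It requires `ENABLE_OTEL=true` and, if the endpoint variables from the next section are set, checks that they are `http(s)` URLs ending in `/v1/traces` and `/v1/metrics`. That suffix rule is an assumption based on the pattern every example on this page follows.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of ToolJet) for the OTEL variables.

# check_otlp_url NAME VALUE SUFFIX
# An empty VALUE passes (the variable is optional); a set VALUE must be an
# http(s) URL ending in SUFFIX, mirroring the endpoint examples on this page.
check_otlp_url() {
  local name="$1" value="$2" suffix="$3"
  if [ -z "$value" ]; then
    return 0
  fi
  case "$value" in
    http://*"$suffix" | https://*"$suffix") return 0 ;;
  esac
  echo "$name should be an http(s) URL ending in $suffix, got: $value" >&2
  return 1
}

check_otel_env() {
  if [ "${ENABLE_OTEL:-}" != "true" ]; then
    echo "ENABLE_OTEL is not set to true; OpenTelemetry stays disabled" >&2
    return 1
  fi
  check_otlp_url OTEL_EXPORTER_OTLP_TRACES "${OTEL_EXPORTER_OTLP_TRACES:-}" /v1/traces &&
    check_otlp_url OTEL_EXPORTER_OTLP_METRICS "${OTEL_EXPORTER_OTLP_METRICS:-}" /v1/metrics
}
```

Run `check_otel_env` in the same shell or entrypoint that launches ToolJet, and only start the server if it succeeds.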
#### Optional Variables

```bash
# OTLP Endpoint Configuration
OTEL_EXPORTER_OTLP_TRACES=http://localhost:4318/v1/traces
OTEL_EXPORTER_OTLP_METRICS=http://localhost:4318/v1/metrics

# Service Identification
OTEL_SERVICE_NAME=tooljet

# Authentication (if required by your OTEL collector)
OTEL_EXPORTER_OTLP_HEADERS=api-key=your-api-key

# Advanced Configuration
OTEL_LOG_LEVEL=debug # Enable debug logging for OTEL
OTEL_ACTIVE_USER_WINDOW_MINUTES=5 # Activity window for concurrent user tracking (default: 5)
OTEL_MAX_TRACKED_USERS=10000 # Maximum tracked users/sessions (default: 10000)

# WARNING: High cardinality. Only enable for debugging; in production,
# use an OTEL Collector to drop the query_text label.
OTEL_INCLUDE_QUERY_TEXT=false # Include actual query text in metrics (default: false)
```

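`OTEL_EXPORTER_OTLP_HEADERS` can carry more than one header. The OpenTelemetry specification defines its format as comma-separated `key=value` pairs, so a collector that needs both an API key and a tenant header would receive something like `api-key=...,X-Scope-OrgID=...`. The hypothetical helper below builds that string and rejects malformed pairs. It assumes ToolJet passes the variable to the OpenTelemetry SDK unchanged, which this page does not state, and `X-Scope-OrgID` is only an illustrative header name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join "key=value" pairs into the comma-separated list
# format the OpenTelemetry spec defines for OTEL_EXPORTER_OTLP_HEADERS.
otlp_headers() {
  local out="" pair
  for pair in "$@"; do
    case "$pair" in
      *,*) echo "a comma would split this header in two: $pair" >&2; return 1 ;;
      ?*=*) ;;   # non-empty key followed by '='
      *) echo "not a key=value pair: $pair" >&2; return 1 ;;
    esac
    out="${out:+$out,}$pair"   # add a comma only between pairs
  done
  printf '%s\n' "$out"
}

# Example: an API key plus an illustrative tenant header.
OTEL_EXPORTER_OTLP_HEADERS="$(otlp_headers api-key=your-api-key X-Scope-OrgID=tenant-1)"
export OTEL_EXPORTER_OTLP_HEADERS
```
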
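For a complete list of OpenTelemetry environment variables, refer to the [OpenTelemetry documentation](https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/). Note that the specification names its per-signal endpoint variables `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` and `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT`, while ToolJet uses the `OTEL_EXPORTER_OTLP_TRACES` and `OTEL_EXPORTER_OTLP_METRICS` names shown above.

## Setup Examples

### Local OTEL Collector
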
Deploy an OpenTelemetry Collector alongside ToolJet to receive its metrics and traces and forward them. If ToolJet runs in the same Docker Compose project, point the endpoints at the collector's service name (for example, `http://otel-collector:4318/v1/metrics`) rather than `localhost`, which resolves to the ToolJet container itself.

```yaml
# docker-compose.yml excerpt
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4318:4318" # OTLP HTTP receiver
      - "8889:8889" # Prometheus exporter
```

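The Compose excerpt mounts `./otel-collector-config.yaml`, but the file itself is not shown. The sketch below writes a minimal version with a shell here-document: an OTLP/HTTP receiver on port 4318, a `prometheus` exporter on port 8889 for Prometheus to scrape, and a `debug` exporter so incoming traces are at least logged. It is a starting point under those assumptions, not ToolJet's reference configuration; replace the trace exporter with your tracing backend (for example, Jaeger). The receiver binds to `0.0.0.0` explicitly because recent Collector releases default to `localhost`, which other containers cannot reach.

```bash
#!/usr/bin/env bash
# Write a minimal collector config (a sketch, not ToolJet's reference setup).
# Components used: otlp receiver, batch processor, prometheus and debug exporters,
# all included in the otel/opentelemetry-collector-contrib image.
cat > otel-collector-config.yaml <<'EOF'
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318   # matches the "4318:4318" port mapping

processors:
  batch: {}

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889       # Prometheus scrapes otel-collector:8889
  debug: {}                      # logs received traces; replace for real storage

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
EOF
```

Point a Prometheus scrape job at `otel-collector:8889` (or `localhost:8889` from the host), then start the collector with `docker compose up -d otel-collector`.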
### Grafana Cloud

Configure ToolJet to send metrics and traces directly to Grafana Cloud. The gateway hostname depends on your stack's region (`prod-us-central-0` below); copy the exact OTLP endpoint from your stack's details in the Grafana Cloud portal.

```bash
ENABLE_OTEL=true
OTEL_EXPORTER_OTLP_TRACES=https://otlp-gateway-prod-us-central-0.grafana.net/otlp/v1/traces
OTEL_EXPORTER_OTLP_METRICS=https://otlp-gateway-prod-us-central-0.grafana.net/otlp/v1/metrics
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic <base64-encoded-credentials>
OTEL_SERVICE_NAME=tooljet-production
```

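The `<base64-encoded-credentials>` placeholder is an HTTP Basic auth value: Grafana Cloud's OTLP gateway expects your stack's instance ID as the username and an access policy token as the password, encoded as base64 of `instance-id:token`. A minimal sketch; the function name `grafana_otlp_basic_auth` and the variables `GRAFANA_INSTANCE_ID` and `GRAFANA_OTLP_TOKEN` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: base64("<instance-id>:<token>") for the Authorization header.
grafana_otlp_basic_auth() {
  local instance_id="$1" token="$2"
  # tr strips the line breaks some base64 implementations insert into long output.
  printf '%s:%s' "$instance_id" "$token" | base64 | tr -d '\n'
}

# Usage (GRAFANA_INSTANCE_ID and GRAFANA_OTLP_TOKEN are placeholder names).
# Quote the value: it contains a space between "Basic" and the credentials.
#   export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic $(grafana_otlp_basic_auth "$GRAFANA_INSTANCE_ID" "$GRAFANA_OTLP_TOKEN")"
```
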
### Datadog

Send metrics to Datadog using the OTLP endpoint:

```bash
ENABLE_OTEL=true
OTEL_EXPORTER_OTLP_TRACES=https://api.datadoghq.com/v1/traces
OTEL_EXPORTER_OTLP_METRICS=https://api.datadoghq.com/v1/metrics
OTEL_EXPORTER_OTLP_HEADERS=dd-api-key=<your-datadog-api-key>
OTEL_SERVICE_NAME=tooljet
```

### New Relic

Configure ToolJet to send data to New Relic's OTLP endpoint (accounts in the EU data center use `otlp.eu01.nr-data.net` instead):

```bash
ENABLE_OTEL=true
OTEL_EXPORTER_OTLP_TRACES=https://otlp.nr-data.net:4318/v1/traces
OTEL_EXPORTER_OTLP_METRICS=https://otlp.nr-data.net:4318/v1/metrics
OTEL_EXPORTER_OTLP_HEADERS=api-key=<your-newrelic-license-key>
OTEL_SERVICE_NAME=tooljet
```

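Every setup above uses the same shape: a base URL followed by `/v1/traces` and `/v1/metrics`. Deriving both variables from one base URL keeps them from drifting apart. A small sketch; `otlp_endpoints` is a hypothetical helper that prints `.env`-style lines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print both endpoint variables from one OTLP base URL.
otlp_endpoints() {
  local base="${1%/}"   # drop one trailing slash so we don't emit "//v1/..."
  printf 'OTEL_EXPORTER_OTLP_TRACES=%s/v1/traces\n' "$base"
  printf 'OTEL_EXPORTER_OTLP_METRICS=%s/v1/metrics\n' "$base"
}

otlp_endpoints https://otlp.nr-data.net:4318
```

For Grafana Cloud the base URL includes the `/otlp` path segment, for example `https://otlp-gateway-prod-us-central-0.grafana.net/otlp`. Append the output to an env file with `otlp_endpoints <base-url> >> .env`.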
## Grafana Dashboards

ToolJet provides two pre-built Grafana dashboards for visualizing metrics:

### Per-App Metrics Dashboard

Download the dashboard:

```bash
curl -O https://tooljet-deployments.s3.us-west-1.amazonaws.com/tooljet-app-dashboard.json
```

This dashboard focuses on application-specific metrics and includes:

- **App Overview:** Total query executions, success rate gauge, p95 latency, failure counts
- **Query Performance:** Execution rates by query, latency percentiles, data source breakdown
- **Top Queries:** Most executed queries, slowest queries (p95), most failed queries
- **Environment Filtering:** Filter by app name, environment (production/staging/development), and mode (view/edit)

The dashboard surfaces environment names and, when `OTEL_INCLUDE_QUERY_TEXT` is enabled, query text directly in its panels, so you can debug slow or failing queries without consulting logs.

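The p95 panels estimate percentiles from histogram buckets rather than from raw durations. Prometheus's `histogram_quantile()` does this by finding the bucket that contains the target rank and interpolating linearly inside it. The awk sketch below reimplements that idea for a single histogram so you can sanity-check a panel by hand. It is a simplified illustration (no counter resets, rates, or native histograms), and the bucket values in the example are made up.

```bash
#!/usr/bin/env bash
# Simplified re-implementation of the linear interpolation behind Prometheus's
# histogram_quantile(). Reads "upper_bound cumulative_count" lines on stdin,
# sorted by bound, ending with a "+Inf" line. $1 is the quantile (0 < q <= 1).
histogram_quantile() {
  awk -v q="$1" '
    { le[NR] = $1; cnt[NR] = $2 + 0 }
    END {
      total = cnt[NR]                       # the +Inf bucket counts every observation
      if (q <= 0 || q > 1 || total == 0) { print "NaN"; exit }
      rank = q * total
      prev_le = 0; prev_cnt = 0
      for (i = 1; i <= NR; i++) {
        if (cnt[i] >= rank) {
          # Target falls in the +Inf bucket: report the highest finite bound.
          if (le[i] == "+Inf") { print prev_le; exit }
          # Linear interpolation between this bucket'"'"'s lower and upper bound.
          printf "%g\n", prev_le + (le[i] - prev_le) * (rank - prev_cnt) / (cnt[i] - prev_cnt)
          exit
        }
        prev_le = le[i]; prev_cnt = cnt[i]
      }
    }'
}

# Made-up durations in seconds: 50 queries took <= 0.1s, 90 <= 0.5s, 98 <= 1s, 100 total.
printf '%s\n' '0.1 50' '0.5 90' '1 98' '+Inf 100' | histogram_quantile 0.95   # prints 0.8125
```

The estimate is only as precise as the bucket layout: here the 95th query is known to fall between 0.5s and 1s, and interpolation places it at 0.8125s.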
### Platform Metrics Dashboard

Download the dashboard:

```bash
curl -O https://tooljet-deployments.s3.us-west-1.amazonaws.com/tooljet-platform-dashboard.json
```

This dashboard provides comprehensive platform monitoring:

- **System Health:** P95 response time, request rate, error rate, total requests
- **API Analytics:** Traffic distribution, top endpoints by hits, slowest endpoints
- **Performance Trends:** Multi-percentile response time analysis (P50, P95, P99)
- **Status Codes:** Success/error distribution over time
- **Database Performance:** Query execution times, connection health
- **Runtime Metrics:** Node.js event loop, GC performance, V8 memory usage
- **Distributed Tracing:** Integration with Jaeger for trace viewing

### Importing Dashboards

To import the Grafana dashboards:

1. Download the dashboard JSON files:

   ```bash
   # Download App-Based Metrics Dashboard
   curl -O https://tooljet-deployments.s3.us-west-1.amazonaws.com/tooljet-app-dashboard.json

   # Download Platform-Based Metrics Dashboard
   curl -O https://tooljet-deployments.s3.us-west-1.amazonaws.com/tooljet-platform-dashboard.json
   ```

2. Open Grafana and navigate to **Dashboards** → **Import**
3. Click **Upload JSON file** and select the downloaded dashboard JSON file
4. Select the Prometheus data source that holds your ToolJet metrics (for example, the one scraping your collector's Prometheus exporter)
5. Click **Import**

The dashboards will be immediately available with real-time data from your ToolJet instance.

## Production Considerations

### High Cardinality Warning

The app-based metrics can optionally include a `query_text` label that contains the actual SQL or query content. **By default, this is disabled** to prevent high cardinality issues: every distinct query text becomes a new label value, and therefore a new set of time series.

#### Enabling Query Text (For Debugging Only)

To enable query text in metrics for debugging purposes:

```bash
OTEL_INCLUDE_QUERY_TEXT=true
```

:::warning
Enabling `query_text` creates **high cardinality time series** that can significantly impact Prometheus storage and query performance. Only enable this temporarily for debugging specific query issues.
:::

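To see the scale involved: Prometheus stores one time series per unique combination of label values, and a classic histogram exposes one `_bucket` series per bucket boundary plus `le="+Inf"`, along with `_sum` and `_count`. The sketch below multiplies these out. The helper name is hypothetical, and every number in the example (bucket count, apps, queries) is illustrative, not measured from ToolJet.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate Prometheus series for ONE histogram metric.
# $1 = number of explicit bucket boundaries; remaining args = distinct values per label.
estimate_histogram_series() {
  local boundaries="$1" combos=1 n
  shift
  for n in "$@"; do
    combos=$((combos * n))   # one label set per combination of values
  done
  # Per label set: (boundaries + 1) _bucket series (incl. le="+Inf"), plus _sum and _count.
  echo $((combos * (boundaries + 3)))
}

# Illustrative numbers: 10 boundaries; 50 apps x 20 queries x 3 environments x 2 modes.
estimate_histogram_series 10 50 20 3 2        # 78000 series
# Same, if each query also produced 100 distinct query_text values.
estimate_histogram_series 10 50 20 3 2 100    # 7800000 series
```

Because `query_text` multiplies every other label, even modest variation per query (for example, if query text changes with dynamic values) can turn thousands of series into millions.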
#### Production Best Practices

If you must enable `query_text` in production:

1. **Use an OTEL Collector** to drop the label before metrics reach Prometheus:

   ```yaml
   # otel-collector-config.yaml
   processors:
     attributes:
       actions:
         - key: query_text
           action: delete

   service:
     pipelines:
       metrics:
         receivers: [otlp]
         processors: [attributes]
         exporters: [prometheus]
   ```

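2. **Alternative: Hash the query text** to shorten label values. Hashing maps each distinct query text to a fixed-length digest, which reduces the memory each series needs but still produces one series per distinct query text. Add `transform` to the metrics pipeline's `processors` list:

   ```yaml
   processors:
     transform:
       metric_statements:
         - context: datapoint
           statements:
             # Skip datapoints that have no query_text attribute
             - set(attributes["query_text"], SHA256(attributes["query_text"])) where attributes["query_text"] != nil
   ```
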
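One property worth checking before relying on hashing: it is deterministic and, in practice, collision-free, so identical text yields identical digests and different text yields different digests. The number of distinct `query_text` values, and therefore series, stays the same. The demonstration below uses `sha256sum` from GNU coreutils (on macOS, `shasum -a 256` is the usual substitute) on three made-up queries.

```bash
#!/usr/bin/env bash
# Count distinct lines on stdin.
distinct_count() { sort -u | wc -l | tr -d ' '; }

# Replace each input line with its SHA-256 hex digest (64 characters).
hash_lines() {
  local line
  while IFS= read -r line; do
    printf '%s' "$line" | sha256sum | cut -d' ' -f1
  done
}

queries='SELECT * FROM users WHERE id = 1
SELECT * FROM users WHERE id = 2
SELECT * FROM users WHERE id = 1'

printf '%s\n' "$queries" | distinct_count               # 2 distinct texts
printf '%s\n' "$queries" | hash_lines | distinct_count  # still 2 distinct digests
```

What hashing does buy is shorter, fixed-length label values; to reduce the series count itself, drop the label as in option 1.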
### Performance Impact

OpenTelemetry metrics collection has minimal performance impact:

- Metric collection is asynchronous and non-blocking
- Histogram buckets are pre-configured for optimal performance
- Observable gauges (like success rates) are updated on a 15-minute interval

### Sampling and Filtering

For high-volume deployments, consider:

- **Filtering environments:** Only collect metrics from production environments
- **Sampling queries:** Use OTEL Collector sampling for high-frequency queries
- **Aggregation:** Pre-aggregate metrics at the collector level before storage

## Troubleshooting

### Metrics Not Appearing

1. Verify `ENABLE_OTEL=true` is set
2. Check that the OTEL collector endpoint is reachable:

   ```bash
   curl http://localhost:4318/v1/metrics
   curl http://localhost:4318/v1/traces
   ```

   OTLP/HTTP receivers only accept `POST`, so a `405 Method Not Allowed` response to these `GET` requests still confirms the endpoint is reachable; a connection error means it is not.
3. Review ToolJet server logs for OTEL connection errors
4. Verify OTEL collector configuration and Prometheus scrape targets

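A `GET` only shows that something is listening. To check that the endpoint accepts OTLP data, you can `POST` an empty JSON export request, which an OTLP/HTTP receiver should accept, and interpret the status code. The helpers below are a hypothetical sketch, and the status descriptions are rules of thumb rather than documented ToolJet or Collector error codes.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for probing an OTLP/HTTP endpoint from the command line.

# Map an HTTP status code to a rule-of-thumb diagnosis.
otlp_status_meaning() {
  case "$1" in
    200 | 204) echo "reachable: export request accepted" ;;
    405) echo "reachable: path exists but the method was not POST" ;;
    401 | 403) echo "reachable: check credentials in OTEL_EXPORTER_OTLP_HEADERS" ;;
    404) echo "reachable: wrong path, check the /v1/traces or /v1/metrics suffix" ;;
    000) echo "unreachable: no HTTP response (DNS, network, or collector down)" ;;
    *) echo "unexpected HTTP status $1" ;;
  esac
}

# POST an empty OTLP JSON request; curl reports 000 when no response arrives.
probe_otlp() {
  local url="$1" code
  code="$(curl -s -o /dev/null -w '%{http_code}' -X POST \
    -H 'Content-Type: application/json' -d '{}' "$url")" || true
  echo "$url: $code ($(otlp_status_meaning "$code"))"
}
```

Run `probe_otlp http://localhost:4318/v1/metrics` and `probe_otlp http://localhost:4318/v1/traces`. The probe does not send your configured headers, so hosted backends will typically answer `401` or `403`, which still shows the endpoint is reachable.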
### High Memory Usage

If you experience high memory usage:

1. Remove high-cardinality labels like `query_text` using OTEL Collector processors
2. Reduce histogram bucket counts if needed
3. Implement metric filtering at the collector level
4. Consider using remote write to offload storage

### Missing Labels or Metrics

Ensure you're using ToolJet version 3.16.0-LTS or higher, which includes the full OTEL implementation with both app-based and platform-based metrics.

## Additional Resources

- [OpenTelemetry Documentation](https://opentelemetry.io/docs/)
- [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/)
- [Grafana OTLP Integration](https://grafana.com/docs/grafana-cloud/monitor-infrastructure/otlp/)
- [Prometheus OTLP Receiver](https://prometheus.io/docs/prometheus/latest/feature_flags/#otlp-receiver)