docs(router): document telemetry (#7658)

Kamil Kisiela 2026-02-11 14:49:00 +01:00 committed by GitHub
parent 6d7cf29112
commit 39432a7aa0
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
7 changed files with 504 additions and 164 deletions

@ -30,7 +30,7 @@ that explains how to use that feature.
- [`supergraph`](./configuration/supergraph): Tell the router where to find your supergraph schema.
- [`traffic_shaping`](./configuration/traffic_shaping): Manage connection pooling and request
handling to subgraphs.
- [`telemetry`](./configuration/telemetry): Configure client identification, usage reporting and
tracing to Hive Console, and OpenTelemetry tracing.
- [`introspection`](./configuration/introspection): Enable and disable introspection queries for
added security.
- [`limits`](./configuration/limits): Set limits on operation cost, depth, and other factors to

@ -0,0 +1,238 @@
---
title: 'telemetry'
---
# telemetry
The `telemetry` configuration controls client identification, Hive reporting, and OpenTelemetry
tracing behavior in Hive Router.
## client_identification
Configure how Hive Router identifies calling clients in telemetry, based on request headers.
| Field            | Type     | Default                  | Notes                                                            |
| ---------------- | -------- | ------------------------ | ---------------------------------------------------------------- |
| `name_header`    | `string` | `graphql-client-name`    | HTTP header used to read the client name for usage reporting.    |
| `version_header` | `string` | `graphql-client-version` | HTTP header used to read the client version for usage reporting. |
```yaml filename="router.config.yaml"
telemetry:
  client_identification:
    name_header: graphql-client-name # default
    version_header: graphql-client-version # default
```
## resource
Attach OpenTelemetry resource attributes that describe this router instance, such as service name,
version, or environment.
| Field | Type | Default | Notes |
| ------------ | -------- | ------- | -------------------------------------------- |
| `attributes` | `object` | `{}` | Additional OpenTelemetry resource attributes |
```yaml filename="router.config.yaml"
telemetry:
  resource:
    attributes:
      service.name:
        expression: env("SERVICE_NAME")
      service.version: 1.0.0
```
## hive
Hive-specific telemetry options.
<details>
<summary>Show hive configuration</summary>
| Field | Type | Notes |
| -------- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| `token` | `StringOrExpression` | Your [Registry Access Token](/docs/management/targets#registry-access-tokens) with write permission.<br />You can also set `HIVE_ACCESS_TOKEN`. |
| `target` | `StringOrExpression` | Target ID as slug (`the-guild/graphql-hive/staging`) or UUID (`a0f4c605-6541-4350-8cfe-b31f21a4bf80`).<br />You can also set `HIVE_TARGET`. |
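As a sketch, both fields can also be set in the config file instead of through `HIVE_ACCESS_TOKEN`
and `HIVE_TARGET`. This assumes `StringOrExpression` fields accept the same `expression` form shown
for `resource.attributes` above; `MY_HIVE_TOKEN` is a hypothetical variable name.

```yaml filename="router.config.yaml"
telemetry:
  hive:
    # Read the token from an environment variable instead of hardcoding it
    # (MY_HIVE_TOKEN is a hypothetical name)
    token:
      expression: env("MY_HIVE_TOKEN")
    # Target slug; a target UUID works as well
    target: the-guild/graphql-hive/staging
```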
<div id="usage_reporting" style={{marginTop: 10}}>
<details>
<summary>usage_reporting</summary>
Controls how Hive Router sends [usage reporting](/docs/schema-registry/usage-reporting) data to Hive
Console.
> For additional information about the usage reporting process in Hive Router, see the
> [Usage Reporting page](/docs/router/observability/usage_reporting).
| Field                  | Type       | Default                              | Notes                                                         |
| ---------------------- | ---------- | ------------------------------------ | ------------------------------------------------------------- |
| `enabled`              | `boolean`  | `false`                              | Explicitly enable or disable usage reporting.                 |
| `endpoint`             | `string`   | `https://app.graphql-hive.com/usage` | Override for self-hosted Hive.                                |
| `sample_rate`          | `string`   | `100%`                               | Percentage of requests to report, between `0%` and `100%`.    |
| `exclude`              | `string[]` | `[]`                                 | Operation names to ignore (for example `IntrospectionQuery`). |
| `buffer_size`          | `integer`  | `1000`                               | Maximum number of operations buffered before a flush.         |
| `accept_invalid_certs` | `boolean`  | `false`                              | Accept invalid SSL certificates when sending usage reports.   |
| `connect_timeout`      | `string`   | `5s`                                 | Timeout for the connect phase only.                           |
| `request_timeout`      | `string`   | `15s`                                | Timeout for the full request.                                 |
| `flush_interval`       | `string`   | `5s`                                 | Interval at which the buffer is flushed.                      |
```yaml filename="router.config.yaml"
telemetry:
  hive:
    usage_reporting:
      enabled: true
      exclude: ['IntrospectionQuery']
```
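A fuller sketch for a self-hosted Hive instance that reports roughly a quarter of requests. The
endpoint URL is a placeholder, and the values are illustrative rather than recommended defaults.

```yaml filename="router.config.yaml"
telemetry:
  hive:
    usage_reporting:
      enabled: true
      # Placeholder URL for a self-hosted Hive instance
      endpoint: https://hive.example.com/usage
      # Report about 25% of requests
      sample_rate: 25%
      exclude: ['IntrospectionQuery']
      # Flush when 500 operations are buffered or when the 10s interval elapses
      buffer_size: 500
      flush_interval: 10s
```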
</details>
</div>
<div id="hive-tracing" style={{marginTop: 10}}>
<details>
<summary>tracing</summary>
This configuration object controls sending traces to Hive Console.
| Field | Type | Default | Notes |
| ----------------- | -------------------- | --------------------------------------------- | ------------------------------------------------------------- |
| `enabled` | `boolean` | `false` | If `true`, the Hive Router sends traces to Hive Console. |
| `endpoint` | `StringOrExpression` | `https://api.graphql-hive.com/otel/v1/traces` | Hive Console traces ingestion endpoint. |
| `batch_processor` | `object` | - | See [`batch_processor`](#hive-tracing-batch-processor) below. |
<div id="hive-tracing-batch-processor"></div>
Batching settings for traces sent to Hive Console:
| Field | Type | Default | Notes |
| ------------------------ | --------- | ------- | --------------------------------------------- |
| `max_traces_in_memory` | `integer` | `30000` | Maximum number of traces in memory. |
| `max_spans_per_trace` | `integer` | `1000` | Maximum spans buffered per trace. |
| `max_export_timeout` | `string` | `5s` | Maximum time to wait for batch export. |
| `max_queue_size` | `integer` | `20000` | Capacity of the internal queue before export. |
| `max_export_batch_size` | `integer` | `500` | Maximum traces per single export batch. |
| `scheduled_delay` | `string` | `5s` | Maximum delay before exporting ready traces. |
| `max_concurrent_exports` | `integer` | `1` | Maximum number of concurrent export tasks. |
```yaml filename="router.config.yaml"
telemetry:
  hive:
    tracing:
      enabled: true
```
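Batching for Hive Console traces is tuned under `batch_processor`. The values below are an
illustrative sketch for bursty traffic, not recommendations.

```yaml filename="router.config.yaml"
telemetry:
  hive:
    tracing:
      enabled: true
      batch_processor:
        # Let more traces wait for export during spikes (default 20000)
        max_queue_size: 40000
        # Export ready traces sooner (default 5s)
        scheduled_delay: 2s
        # Run two export tasks in parallel (default 1)
        max_concurrent_exports: 2
```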
</details>
</div>
</details>
## tracing
Top-level OpenTelemetry tracing configuration.
<details>
<summary>Show tracing configuration</summary>
`collect` - Collection and sampling limits for spans.
| Field | Type | Default | Notes |
| -------------------------- | --------- | ------- | -------------------------------------------- |
| `max_events_per_span` | `integer` | `128` | Maximum events to record per span. |
| `max_attributes_per_span` | `integer` | `128` | Maximum attributes to record per span. |
| `max_attributes_per_event` | `integer` | `16` | Maximum attributes to record per span event. |
| `max_attributes_per_link` | `integer` | `32` | Maximum attributes to record per span link. |
| `sampling` | `number` | `1.0` | Sampling ratio between `0.0` and `1.0`. |
| `parent_based_sampler` | `boolean` | `false` | Inherit sampling decisions from parent span. |
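For example, a sketch that samples about a quarter of traces while following the parent span's
decision when one is propagated (values are illustrative):

```yaml filename="router.config.yaml"
telemetry:
  tracing:
    collect:
      # Sample roughly 25% of traces
      sampling: 0.25
      # Follow the parent span's sampling decision when one is propagated
      parent_based_sampler: true
      # Record fewer attributes per span than the default of 128
      max_attributes_per_span: 64
```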
`propagation` - Incoming and outgoing trace context propagation formats.
These settings apply to both extracting trace context from incoming requests and injecting trace
context into outgoing requests.
| Field | Type | Default | Notes |
| --------------- | --------- | ------- | ------------------------------------- |
| `trace_context` | `boolean` | `true` | Enable W3C Trace Context propagation. |
| `baggage` | `boolean` | `false` | Enable W3C Baggage propagation. |
| `b3` | `boolean` | `false` | Enable B3 propagation. |
| `jaeger` | `boolean` | `false` | Enable Jaeger propagation. |
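For example, to forward W3C Baggage alongside the default Trace Context headers (a sketch; only
enable formats that your other services actually read):

```yaml filename="router.config.yaml"
telemetry:
  tracing:
    propagation:
      trace_context: true
      # Also carry W3C Baggage key-value pairs
      baggage: true
      b3: false
      jaeger: false
```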
`instrumentation` - Instrumentation behavior for spans.
| Field | Type | Default | Allowed values | Notes |
| ------------ | -------- | ---------------- | ----------------------------------------------------- | --------------------------------------------------------- |
| `spans.mode` | `string` | `spec_compliant` | `spec_compliant`, `deprecated`, `spec_and_deprecated` | Controls which semantic conventions are emitted on spans. |
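A sketch for a migration period, when existing dashboards still query deprecated attribute names:

```yaml filename="router.config.yaml"
telemetry:
  tracing:
    instrumentation:
      spans:
        # Emit both stable and deprecated attribute names until dashboards are migrated
        mode: spec_and_deprecated
```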
<div id="exporters" style={{marginTop: 10}}>
<details>
<summary>`exporters`</summary>
List of exporters used to send traces.
Each item in this array defines one exporter instance, so you can configure multiple OTLP
destinations if needed.
This reference documents the OTLP exporter configuration.
| Field | Type | Default | Notes |
| ----------------- | -------------------- | ------- | ---------------------------------------------------------------------------- |
| `kind` | `string` | - | Exporter kind. Supported value: `otlp`. |
| `enabled` | `boolean` | `true` | Enables or disables this exporter. |
| `endpoint` | `StringOrExpression` | - | OTLP endpoint. Must be set explicitly. |
| `batch_processor` | `object` | - | See [`batch_processor`](#telemetry-tracing-exporters-batch-processor) below. |
<div id="telemetry-tracing-exporters-batch-processor"></div>
`batch_processor` settings for this exporter:
| Field | Type | Default |
| ------------------------ | --------- | ------- |
| `max_concurrent_exports` | `integer` | `1` |
| `max_export_batch_size` | `integer` | `512` |
| `max_queue_size` | `integer` | `2048` |
| `max_export_timeout` | `string` | `5s` |
| `scheduled_delay` | `string` | `5s` |
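As a sketch of where these fields sit, the exporter below sends smaller batches more often to reduce
delivery latency. The endpoint is a placeholder and the values are illustrative.

```yaml filename="router.config.yaml"
telemetry:
  tracing:
    exporters:
      - kind: otlp
        enabled: true
        protocol: http
        endpoint: https://otel-collector.example.com/v1/traces
        batch_processor:
          # Export sooner than the 5s default
          scheduled_delay: 1s
          # Smaller batches than the default of 512
          max_export_batch_size: 256
```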
OTLP over HTTP:
| Field | Type | Value / Default | Notes |
| -------------- | -------- | --------------- | ------------------------------------------------------------- |
| `protocol` | `string` | `http` | OTLP transport protocol. |
| `http.headers` | `object` | `{}` | Map of header names to values (`string` or `{ expression }`). |
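```yaml filename="router.config.yaml"
telemetry:
  tracing:
    exporters:
      - kind: 'otlp'
        enabled: true
        protocol: http
        # Required: the OTLP endpoint must be set explicitly
        endpoint: https://otel-collector.example.com/v1/traces
        http:
          headers:
            x-otlp-header: value
```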
OTLP over gRPC:
| Field | Type | Value / Default | Notes |
| ---------------------- | -------- | --------------- | ---------------------------------------------------------------------------- |
| `protocol` | `string` | `grpc` | OTLP transport protocol. |
| `grpc.metadata` | `object` | `{}` | Map of metadata keys to values (`string` or `{ expression }`). |
| `grpc.tls.domain_name` | `string` | - | Domain name used to verify the server certificate. |
| `grpc.tls.key` | `string` | - | Path to the client private key file. |
| `grpc.tls.cert` | `string` | - | Path to the client certificate file (PEM). |
| `grpc.tls.ca` | `string` | - | Path to the CA certificate file (PEM) used to verify the server certificate. |
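```yaml filename="router.config.yaml"
telemetry:
  tracing:
    exporters:
      - kind: 'otlp'
        enabled: true
        protocol: grpc
        # Required: the OTLP endpoint must be set explicitly
        endpoint: https://otel-collector.example.com:4317
        grpc:
          metadata:
            x-api-key: key
```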
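Because `exporters` is a list, several OTLP destinations can be configured side by side. A sketch
with two placeholder backends, the second kept in the config but disabled:

```yaml filename="router.config.yaml"
telemetry:
  tracing:
    exporters:
      # Primary backend over gRPC
      - kind: otlp
        enabled: true
        protocol: grpc
        endpoint: https://otel-collector.example.com:4317
      # Secondary backend over HTTP, switched off without deleting its settings
      - kind: otlp
        enabled: false
        protocol: http
        endpoint: https://tracing.example.net/v1/traces
```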
</details>
</div>
</details>

@ -1,113 +0,0 @@
---
title: 'usage_reporting'
---
# usage_reporting
The `usage_reporting` configuration object allows you to control how the Hive Router does
[usage reporting](../../schema-registry/usage-reporting) to Hive Console.
> For additional information about the usage reporting process in Hive Router, see the
> [Usage Reporting page](../observability/usage_reporting).
## Options
### `access_token`
- **Type:** `string`
Your
[Registry Access Token](https://the-guild.dev/graphql/hive/docs/management/targets#registry-access-tokens)
with write permission.
Alternatively, you can set the `HIVE_ACCESS_TOKEN` environment variable to provide the token.
### `target_id`
- **Type:** `string`
A target ID. This can either be a slug following the format
`$organizationSlug/$projectSlug/$targetSlug` (e.g. `the-guild/graphql-hive/staging`) or a UUID (e.g.
`a0f4c605-6541-4350-8cfe-b31f21a4bf80`). Use it when the access token is an organization access
token.
Alternatively, you can set the `HIVE_TARGET` environment variable to provide the target ID.
### `endpoint`
- **Type:** `string`
- **Default:** `https://app.graphql-hive.com/usage`
For self-hosted Hive, you can override this with the `/usage` endpoint of your Hive instance.
### `sample_rate`
- **Type:** `string`
- **Default:** `100%`
A percentage value between `0%` and `100%` that indicates the percentage of requests to be reported.
For example, a value of `10%` means that approximately 10% of requests will be reported, while a
value of `100%` means that all requests will be reported.
### `exclude`
- **Type:** `string[]`
- **Default:** `[]`
A list of operations (by name) to be ignored by Hive. For example, if you want to exclude
introspection queries, you can add `IntrospectionQuery` to this list.
### `client_name_header`
- **Type:** `string`
- **Default:** `graphql-client-name`
The name of the HTTP header from which to read the client name for usage reporting. This is useful
if you want to identify different clients consuming your GraphQL API.
### `client_version_header`
- **Type:** `string`
- **Default:** `graphql-client-version`
The name of the HTTP header from which to read the client version for usage reporting. This is
useful if you want to identify different versions of clients consuming your GraphQL API.
### `buffer_size`
- **Type:** `integer`
- **Default:** `1000`
The maximum number of operations to hold in a buffer before sending them to Hive Console. When the
buffer reaches this size, it is flushed and sent to Hive Console.
### `accept_invalid_certs`
- **Type:** `boolean`
- **Default:** `false`
If set to `true`, the Hive Router will accept invalid SSL certificates when sending usage reports.
This can be useful for self-hosted Hive instances using self-signed certificates.
### `connect_timeout`
- **Type:** `string`
- **Default:** `5s`
A timeout for only the connect phase of a request to Hive Console, in duration format (e.g., `5s`
for 5 seconds).
### `request_timeout`
- **Type:** `string`
- **Default:** `15s`
A timeout for the entire request to Hive Console, in duration format (e.g., `15s` for 15 seconds).
### `flush_interval`
- **Type:** `string`
- **Default:** `5s`
The interval at which the usage report buffer is flushed and sent to Hive Console, in duration format
(e.g., `5s` for 5 seconds).

@ -22,6 +22,8 @@ production traffic without breaking a sweat. Here's what makes Hive Router a sol
it however you need, and avoid vendor lock-in.
- **Feature-complete.** Get security, traffic management, and modern federation features without
hunting for plugins or add-ons.
- **Built-in observability.** Send traces to Hive Console or OTLP-compatible backends with
OpenTelemetry support.
Want to learn more about why we built this? Check out our
[introductory blog post](https://the-guild.dev/graphql/hive/blog/welcome-hive-router).

@ -1,4 +1,5 @@
export default {
probes: 'Probes',
usage_reporting: 'Usage Reporting',
tracing: 'OpenTelemetry Tracing',
};

@ -0,0 +1,229 @@
---
title: 'OpenTelemetry Tracing'
---
import { Tabs } from '@theguild/components'
# OpenTelemetry Tracing
Hive Router supports distributed tracing so you can follow requests across the gateway and your
subgraphs.
This guide explains where to send traces, how to configure OTLP export, how to tune throughput, and
how to debug missing traces.
## Choose your tracing destination
Hive Router supports two common tracing paths. You can send traces directly to Hive Console through
`telemetry.hive.tracing`, or you can send them to an OTLP-compatible backend through
`telemetry.tracing.exporters`.
In practice, teams already running OpenTelemetry infrastructure (Jaeger, Tempo, Datadog, Honeycomb,
and others) usually prefer OTLP because it fits into existing telemetry pipelines and backend
routing rules.
## Send traces to Hive Console
If you are already using Hive, sending traces to Console is usually the smoothest starting point. It
keeps tracing data close to schema and usage insights, so it is easier to move from "this request is
slow" to "which operation and field caused it".
To make this work, Hive Router needs two pieces of information:
[an access token](/docs/schema-registry/management/access-tokens) with permission to send traces,
and [a target](/docs/schema-registry/management/targets) reference. The target can be either a
human-readable slug (`$organizationSlug/$projectSlug/$targetSlug`) or a target UUID
(`a0f4c605-6541-4350-8cfe-b31f21a4bf80`).
With those values available as environment variables (`HIVE_TARGET` and `HIVE_ACCESS_TOKEN`), enable
Hive tracing in the config file:
```yaml filename="router.config.yaml"
telemetry:
  hive:
    tracing:
      enabled: true
      # Optional: override for self-hosted Hive (Hive Console default shown)
      # endpoint: https://api.graphql-hive.com/otel/v1/traces
```
After enabling tracing, send a few GraphQL queries through your router and open that same target's
Traces view in Hive Console. You should start seeing new traces for recent requests.
If traces do not appear, it usually means one of four things: tracing is not enabled, the token does
not have the necessary permissions, the configured target reference points to a different target, or
the self-hosted endpoint is not reachable from the router runtime.
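When ruling out a target mismatch or an unreachable self-hosted endpoint, it can help to pin both
values in the config file. A sketch, assuming the token still comes from `HIVE_ACCESS_TOKEN`; the
target slug `your-org/your-project/staging` and the endpoint URL are placeholders:

```yaml filename="router.config.yaml"
telemetry:
  hive:
    # Pin the target explicitly (slug or UUID); placeholder slug
    target: your-org/your-project/staging
    tracing:
      enabled: true
      # Placeholder URL for a self-hosted Hive traces endpoint
      endpoint: https://hive.example.com/otel/v1/traces
```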
## Send traces to OTLP-compatible backends
If your observability platform already supports OTLP ingestion, Hive Router can push traces straight
to that OTLP endpoint. The destination can be an OpenTelemetry Collector or any system that natively
understands OTLP.
<Tabs items={["OTLP over HTTP", "OTLP over gRPC"]}>
<Tabs.Tab>
```yaml filename="router.config.yaml"
telemetry:
  tracing:
    exporters:
      - kind: otlp
        enabled: true
        protocol: http
        endpoint: https://otel-collector.example.com/v1/traces
        http:
          headers:
            authorization:
              expression: |
                "Bearer " + env("OTLP_TOKEN")
```
Once configured, send normal requests through the router and check your backend for fresh traces.
</Tabs.Tab>
<Tabs.Tab>
```yaml filename="router.config.yaml"
telemetry:
  tracing:
    exporters:
      - kind: otlp
        enabled: true
        protocol: grpc
        endpoint: https://otel-collector.example.com:4317
        grpc:
          metadata:
            x-api-key:
              expression: env("OTEL_API_KEY")
          tls:
            # Optional SNI/verification override
            domain_name: otel-collector.example.com
            # Optional custom CA bundle
            ca: /etc/certs/ca.pem
            # Optional client cert for mTLS
            cert: /etc/certs/client.pem
            # Optional client key for mTLS
            key: /etc/certs/client.key
```
If gRPC export fails, metadata credentials and TLS files are usually the first places to inspect.
</Tabs.Tab>
</Tabs>
## Production baseline
For production workloads, define a clear service identity, begin with conservative sampling rates,
and use a single primary propagation format.
```yaml filename="router.config.yaml"
telemetry:
  resource:
    attributes:
      service.name: hive-router
      service.namespace: your-platform
      deployment.environment:
        expression: env("ENVIRONMENT")
  tracing:
    collect:
      # Trace about 10% of requests
      sampling: 0.1
      # Respect upstream sampling decisions
      parent_based_sampler: true
    propagation:
      # Recommended default
      trace_context: true
      baggage: false
      b3: false
      jaeger: false
    exporters:
      - kind: otlp
        enabled: true
        protocol: grpc
        endpoint: https://otel-collector.example.com:4317
```
This configuration is designed to be a safe, predictable starting point. It gives each deployment a
clear identity in your telemetry backend, keeps trace volume under control, and sticks to a single
propagation format.
In practice, this means you'll see enough traces to understand real production behavior without
overwhelming storage or blowing up costs.
## Batching and throughput tuning
Batching settings control how traces move from the router to your OTLP endpoint. You're able to tune
these settings to control delivery latency of traces, resilience during traffic spikes and memory
pressure on the router.
| Field                    | You'd usually increase this when                                     | Tradeoff                         |
| ------------------------ | -------------------------------------------------------------------- | -------------------------------- |
| `max_queue_size`         | Traces are dropped during traffic spikes                             | Higher memory usage              |
| `max_export_batch_size`  | You want better export throughput per flush                          | Potentially higher burst latency |
| `scheduled_delay`        | You want fewer export calls (decrease it instead for lower latency) | Throughput vs latency            |
| `max_export_timeout`     | Your OTLP endpoint or network is occasionally slow                   | Longer waits on blocked exports  |
| `max_concurrent_exports` | Your OTLP endpoint can handle more parallel uploads                  | Higher downstream pressure       |
As a quick rule:
- if traces arrive late, lower `scheduled_delay`.
- if traces drop under burst load, increase `max_queue_size` first.
- if your OTLP collector has headroom, raise `max_concurrent_exports`.
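Applied to a single OTLP exporter, the three rules might look like the sketch below. The endpoint is
a placeholder and the numbers are illustrative starting points, not tested recommendations.

```yaml filename="router.config.yaml"
telemetry:
  tracing:
    exporters:
      - kind: otlp
        enabled: true
        protocol: grpc
        endpoint: https://otel-collector.example.com:4317
        batch_processor:
          # Traces arrive late: export sooner than the 5s default
          scheduled_delay: 2s
          # Traces drop under bursts: larger queue than the default of 2048
          max_queue_size: 8192
          # Collector has headroom: more parallel uploads than the default of 1
          max_concurrent_exports: 2
```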
## Propagation
Propagation settings control how trace context flows between clients, the router, and subgraphs. In
most modern OpenTelemetry setups, `trace_context` is the safest default.
You should only enable `b3` or `jaeger` when those formats are required by other components.
If clients send custom tracing headers, make sure your
[CORS configuration](/docs/router/security/cors) allows those headers through.
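For example, if some subgraphs use a tracer that only reads B3 headers (a hypothetical setup), a
sketch that keeps Trace Context as the primary format and adds B3:

```yaml filename="router.config.yaml"
telemetry:
  tracing:
    propagation:
      trace_context: true
      # Enabled only because some (hypothetical) subgraphs expect B3 headers
      b3: true
      baggage: false
      jaeger: false
```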
## Compliance with OpenTelemetry Semantic Conventions
OpenTelemetry has standardized attribute names used on spans. Those conventions ensure that
telemetry produced by different services, libraries, and vendors is consistent and understandable
across tools.
The behavior is controlled by `telemetry.tracing.instrumentation.spans.mode`, which selects which
attribute set is written to spans:
- `spec_compliant` (default) - emits only the stable attributes
- `deprecated` - emits only the deprecated attributes
- `spec_and_deprecated` - emits both stable and deprecated attributes
```yaml filename="router.config.yaml"
telemetry:
  tracing:
    instrumentation:
      spans:
        mode: spec_compliant
```
Most teams should stay on `spec_compliant`. The other modes are primarily useful when migrating
legacy dashboards that still expect deprecated attributes.
## Troubleshooting
When traces are missing or incomplete, think in layers:
- exporter setup
- sampling behavior
- propagation
- transport
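If no traces appear at all, verify that the exporter is enabled, the endpoint is reachable from the
router, and the credentials are valid.
If only a fraction of requests produce traces, check `collect.sampling` and whether
`parent_based_sampler` is inheriting a "not sampled" decision from upstream callers.
If spans show up but are not connected into a single trace, propagation formats are usually
misaligned between services.
If traces are delayed or dropped under high load, the batch processor is often the bottleneck. In
that case, [tune the batch processor settings](#batching-and-throughput-tuning) and observe the
effect.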
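While debugging, it can help to rule out sampling by temporarily recording every request. A sketch to
revert once traces flow; the endpoint is a placeholder:

```yaml filename="router.config.yaml"
telemetry:
  tracing:
    collect:
      # Record every request while debugging
      sampling: 1.0
      # Ignore upstream sampling decisions so callers cannot suppress traces
      parent_based_sampler: false
    exporters:
      - kind: otlp
        enabled: true
        protocol: http
        endpoint: https://otel-collector.example.com/v1/traces
```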
## Configuration reference
For all options and defaults, see
[telemetry configuration reference](/docs/router/configuration/telemetry).

@ -1,69 +1,52 @@
import { Tabs } from '@theguild/components'
# Usage Reporting
Hive Router can send usage reports to Hive Console to provide insights into the operations being
executed against your GraphQL API. This includes details such as operation names, client
information, and field-level usage statistics.
Hive Router can report usage metrics to the Hive schema registry, giving you
[insights for executed GraphQL operations](/docs/dashboard/insights), and
[field level usage information](/docs/dashboard/explorer), but also enabling
[conditional breaking changes](/docs/management/targets#conditional-breaking-changes).
## Getting Started
Before proceeding, make sure you have
[created a registry token with write permissions on the Hive dashboard](/docs/schema-registry/management/targets#registry-access-tokens).
You can provide the access token and target either through environment variables or in the
`router.config.yaml` file.
<Tabs items={["Environment Variables", "Configuration File"]}>
{/* Environment Variables */}
<Tabs.Tab>
- `HIVE_ACCESS_TOKEN`: Your [Registry Access Token](/docs/schema-registry/management/access-tokens)
  with write permission.
- `HIVE_TARGET`: Target reference, either:
  - slug: `$organizationSlug/$projectSlug/$targetSlug` (for example
    `the-guild/graphql-hive/staging`)
  - UUID: `a0f4c605-6541-4350-8cfe-b31f21a4bf80`
```sh filename="Run Hive Router with Usage Reporting enabled."
HIVE_ACCESS_TOKEN="<hive_usage_access_token>" \
HIVE_TARGET="<hive_usage_target>" \
hive-router
```
Environment variables only supply the credentials. To send usage reports, also set
`telemetry.hive.usage_reporting.enabled: true` in `router.config.yaml`.
</Tabs.Tab>
{/* Configuration File */}
<Tabs.Tab>
Alternatively, you can provide the usage reporting configuration via the `router.config.yaml` file.
Example configuration:
```yaml filename="router.config.yaml"
telemetry:
  hive:
    # Registry access token with write permission (alternatively HIVE_ACCESS_TOKEN)
    token: '<hive_usage_access_token>'
    # Target slug or UUID (alternatively HIVE_TARGET)
    target: '<hive_usage_target>'
    usage_reporting:
      enabled: true
      # Optional: override endpoint for self-hosted Hive
      # endpoint: "https://my-hive/usage"
```
</Tabs.Tab>
</Tabs>
## Client identification
To identify who's calling your GraphQL API and view traffic distribution and operation volume per
client in Hive Console, set up client identification in `router.config.yaml`.
```yaml filename="router.config.yaml"
telemetry:
  client_identification:
    name_header: 'graphql-client-name' # default value
    version_header: 'graphql-client-version' # default value
```
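If your clients already identify themselves through other headers, point the router at those
instead. A sketch with hypothetical header names:

```yaml filename="router.config.yaml"
telemetry:
  client_identification:
    # Hypothetical headers set by your own clients
    name_header: x-client-name
    version_header: x-client-version
```

With this config, a request carrying `x-client-name: web-app` and `x-client-version: 1.4.0` would
presumably be reported as client `web-app`, version `1.4.0`.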
## Configuration reference
See the [telemetry configuration reference](/docs/router/configuration/telemetry#hive) for all
options and defaults under `telemetry.hive`.