hyperdx/docker/clickhouse/local/init-db-e2e.sh
Vineet Ahirkar 941d045077
feat: support sample-weighted aggregations for sampled trace data (#1963)
## Problem

High-throughput services can produce millions of spans per second. Storing every span is expensive, so we run the OpenTelemetry Collector's tail-sampling processor to keep only 1-in-N spans. Each kept span carries a `SampleRate` attribute recording N.

Once data is sampled, naive aggregations are wrong: count() returns N-x fewer events than actually occurred, sum()/avg() are biased, and percentiles shift. Dashboards show misleadingly low request counts, throughput, and error rates, making capacity planning and alerting unreliable.

### Why Materialized Views Cannot Solve This Alone

A materialized view that pre-aggregates sampled spans is a useful performance optimization for known dashboard queries, but it  cannot replace a sampling-aware query engine.

**Fixed dimensions.** A materialized view pre-aggregates by a fixed set of GROUP BY keys (e.g. `ServiceName`, `SpanName`, `StatusCode`, `TimestampBucket`). Trace exploration requires slicing by arbitrary span attributes -- `http.target`, `k8s.pod.name`, custom business tags -- in combinations that cannot be predicted at view creation time. Grouping by a different dimension either requires going back to raw table or a separate materialized views for every possible dimension combination. If you try to work around the fixed-dimensions problem by adding high-cardinality span attributes to the GROUP BY, the materialized table approaches a 1:1 row ratio with the raw table. You end up doubling storage without meaningful compression.

**Fixed aggregation fields.** A typical MV only aggregates a single numeric column like `Duration`. Users want weighted aggregations over any numeric attribute: request body sizes, queue depths, retry counts, custom metrics attached to spans. Each new field requires adding more `AggregateFunction` columns and recreating the view.

**Industry precedent.** Platforms that rely solely on pre-aggregation (Datadog, Splunk, New Relic, Elastic) get accurate RED dashboards but cannot correct ad-hoc queries over sampled span data. Only query-engine weighting (Honeycomb) produces correct results for arbitrary ad-hoc queries, including weighted percentiles and heatmaps.

A better solution is making the query engine itself sampling-aware, so that all queries from dashboards, alerts, an ad-hoc searches, automatically weights by `SampleRate` regardless of which dimensions or fields the user picks. Materialized views remain a useful complement for accelerating known, fixed-dimension dashboard panels, but they are not a substitute for correct query-time weighting.

## Summary

TraceSourceSchema gets a new optional field `sampleRateExpression` - the ClickHouse expression that evaluates to the per-span sample rate (e.g. `SpanAttributes['SampleRate']`). When not configured, all queries are unchanged.
When set, the query builder rewrites SQL aggregations to weight each span by its sample rate:

  aggFn          | Before                 | After (sample-corrected)                            | Overhead
  -------------- | ---------------------- | --------------------------------------------------- | --------
  count          | count()                | sum(weight)                                         | ~1x
  count + cond   | countIf(cond)          | sumIf(weight, cond)                                 | ~1x
  avg            | avg(col)               | sum(col * weight) / sum(weight)                     | ~2x
  sum            | sum(col)               | sum(col * weight)                                   | ~1x
  quantile(p)    | quantile(p)(col)       | quantileTDigestWeighted(p)(col, toUInt32(weight))   | ~1.5x
  min/max        | unchanged              | unchanged                                           | 1x
  count_distinct | unchanged              | unchanged (cannot correct)                          | 1x

**Types**:
- Add sampleRateExpression to TraceSourceSchema + Mongoose model
- Add sampleWeightExpression to ChartConfig schema

**Query builder:**
- sampleWeightExpression is wrapped as greatest(toUInt64OrZero(toString(expr)), 1) so
spans without a SampleRate attribute default to weight 1 (unsampled
data produces identical results to the original queries).
- Rewrite aggFnExpr in renderChartConfig.ts when sampleWeightExpression
  is set, with safe default-to-1 wrapping

**Integration** (propagate sampleWeightExpression to all chart configs):
- ChartEditor/utils.ts, DBSearchPage, ServicesDashboardPage, sessions
- DBDashboardPage (raw SQL + builder branches)
- AlertPreviewChart
- SessionSubpanel
- ServiceDashboardEndpointPerformanceChart
- ServiceDashboardSlowestEventsTile (p95 query + events table)
- ServiceDashboardEndpointSidePanel (error rate + throughput)
- ServiceDashboardDbQuerySidePanel (total query time + throughput)
- External API v2 charts, AI controller, alerts (index + template)

**UI**:
- Add Sample Rate Expression field to trace source admin form

### Screenshots or video



| Before | After |
| :----- | :---- |
|        |       |

### How to test locally or on Vercel



1.
2.
3.

### References



- Linear Issue:
- Related PRs:
2026-03-30 19:52:18 +00:00

210 lines
12 KiB
Bash
Executable file

#!/bin/bash
set -e
# E2E-specific database initialization script
# Creates tables with e2e_ prefix to avoid collision with local dev data
# We don't have a JSON schema yet, so let's let the collector create the tables
if [ "$BETA_CH_OTEL_JSON_SCHEMA_ENABLED" = "true" ]; then
exit 0
fi
DATABASE=${HYPERDX_OTEL_EXPORTER_CLICKHOUSE_DATABASE:-default}
clickhouse client -n <<EOFSQL
CREATE DATABASE IF NOT EXISTS ${DATABASE};
CREATE TABLE IF NOT EXISTS ${DATABASE}.e2e_otel_logs
(
\`Timestamp\` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
\`TimestampTime\` DateTime DEFAULT toDateTime(Timestamp),
\`TraceId\` String CODEC(ZSTD(1)),
\`SpanId\` String CODEC(ZSTD(1)),
\`TraceFlags\` UInt8,
\`SeverityText\` LowCardinality(String) CODEC(ZSTD(1)),
\`SeverityNumber\` UInt8,
\`ServiceName\` LowCardinality(String) CODEC(ZSTD(1)),
\`Body\` String CODEC(ZSTD(1)),
\`ResourceSchemaUrl\` LowCardinality(String) CODEC(ZSTD(1)),
\`ResourceAttributes\` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
\`ScopeSchemaUrl\` LowCardinality(String) CODEC(ZSTD(1)),
\`ScopeName\` String CODEC(ZSTD(1)),
\`ScopeVersion\` LowCardinality(String) CODEC(ZSTD(1)),
\`ScopeAttributes\` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
\`LogAttributes\` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
\`__hdx_materialized_k8s.cluster.name\` LowCardinality(String) MATERIALIZED ResourceAttributes['k8s.cluster.name'] CODEC(ZSTD(1)),
\`__hdx_materialized_k8s.container.name\` LowCardinality(String) MATERIALIZED ResourceAttributes['k8s.container.name'] CODEC(ZSTD(1)),
\`__hdx_materialized_k8s.deployment.name\` LowCardinality(String) MATERIALIZED ResourceAttributes['k8s.deployment.name'] CODEC(ZSTD(1)),
\`__hdx_materialized_k8s.namespace.name\` LowCardinality(String) MATERIALIZED ResourceAttributes['k8s.namespace.name'] CODEC(ZSTD(1)),
\`__hdx_materialized_k8s.node.name\` LowCardinality(String) MATERIALIZED ResourceAttributes['k8s.node.name'] CODEC(ZSTD(1)),
\`__hdx_materialized_k8s.pod.name\` LowCardinality(String) MATERIALIZED ResourceAttributes['k8s.pod.name'] CODEC(ZSTD(1)),
\`__hdx_materialized_k8s.pod.uid\` LowCardinality(String) MATERIALIZED ResourceAttributes['k8s.pod.uid'] CODEC(ZSTD(1)),
\`__hdx_materialized_deployment.environment.name\` LowCardinality(String) MATERIALIZED ResourceAttributes['deployment.environment.name'] CODEC(ZSTD(1)),
INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1,
INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_scope_attr_key mapKeys(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_scope_attr_value mapValues(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_log_attr_key mapKeys(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_log_attr_value mapValues(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_lower_body lower(Body) TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8
)
ENGINE = MergeTree
PARTITION BY toDate(TimestampTime)
PRIMARY KEY (ServiceName, TimestampTime)
ORDER BY (ServiceName, TimestampTime, Timestamp)
TTL TimestampTime + toIntervalDay(30)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1;
CREATE TABLE IF NOT EXISTS ${DATABASE}.e2e_otel_traces
(
\`Timestamp\` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
\`TraceId\` String CODEC(ZSTD(1)),
\`SpanId\` String CODEC(ZSTD(1)),
\`ParentSpanId\` String CODEC(ZSTD(1)),
\`TraceState\` String CODEC(ZSTD(1)),
\`SpanName\` LowCardinality(String) CODEC(ZSTD(1)),
\`SpanKind\` LowCardinality(String) CODEC(ZSTD(1)),
\`ServiceName\` LowCardinality(String) CODEC(ZSTD(1)),
\`ResourceAttributes\` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
\`ScopeName\` String CODEC(ZSTD(1)),
\`ScopeVersion\` String CODEC(ZSTD(1)),
\`SpanAttributes\` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
\`Duration\` UInt64 CODEC(ZSTD(1)),
\`StatusCode\` LowCardinality(String) CODEC(ZSTD(1)),
\`StatusMessage\` String CODEC(ZSTD(1)),
\`Events.Timestamp\` Array(DateTime64(9)) CODEC(ZSTD(1)),
\`Events.Name\` Array(LowCardinality(String)) CODEC(ZSTD(1)),
\`Events.Attributes\` Array(Map(LowCardinality(String), String)) CODEC(ZSTD(1)),
\`Links.TraceId\` Array(String) CODEC(ZSTD(1)),
\`Links.SpanId\` Array(String) CODEC(ZSTD(1)),
\`Links.TraceState\` Array(String) CODEC(ZSTD(1)),
\`Links.Attributes\` Array(Map(LowCardinality(String), String)) CODEC(ZSTD(1)),
\`__hdx_materialized_rum.sessionId\` String MATERIALIZED ResourceAttributes['rum.sessionId'] CODEC(ZSTD(1)),
\`SampleRate\` UInt64 MATERIALIZED greatest(toUInt64OrZero(SpanAttributes['SampleRate']), 1) CODEC(T64, ZSTD(1)),
INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1,
INDEX idx_rum_session_id __hdx_materialized_rum.sessionId TYPE bloom_filter(0.001) GRANULARITY 1,
INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_span_attr_key mapKeys(SpanAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_span_attr_value mapValues(SpanAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_duration Duration TYPE minmax GRANULARITY 1,
INDEX idx_lower_span_name lower(SpanName) TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8
)
ENGINE = MergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, SpanName, toDateTime(Timestamp))
TTL toDate(Timestamp) + toIntervalDay(30)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1;
CREATE TABLE ${DATABASE}.e2e_hyperdx_sessions
(
\`Timestamp\` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
\`TimestampTime\` DateTime DEFAULT toDateTime(Timestamp),
\`TraceId\` String CODEC(ZSTD(1)),
\`SpanId\` String CODEC(ZSTD(1)),
\`TraceFlags\` UInt8,
\`SeverityText\` LowCardinality(String) CODEC(ZSTD(1)),
\`SeverityNumber\` UInt8,
\`ServiceName\` LowCardinality(String) CODEC(ZSTD(1)),
\`Body\` String CODEC(ZSTD(1)),
\`ResourceSchemaUrl\` LowCardinality(String) CODEC(ZSTD(1)),
\`ResourceAttributes\` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
\`ScopeSchemaUrl\` LowCardinality(String) CODEC(ZSTD(1)),
\`ScopeName\` String CODEC(ZSTD(1)),
\`ScopeVersion\` LowCardinality(String) CODEC(ZSTD(1)),
\`ScopeAttributes\` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
\`LogAttributes\` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
\`__hdx_materialized_rum.sessionId\` String MATERIALIZED ResourceAttributes['rum.sessionId'] CODEC(ZSTD(1)),
\`__hdx_materialized_type\` LowCardinality(String) MATERIALIZED toString(simpleJSONExtractInt(Body, 'type')) CODEC(ZSTD(1)),
INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1,
INDEX idx_rum_session_id __hdx_materialized_rum.sessionId TYPE bloom_filter(0.001) GRANULARITY 1,
INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_scope_attr_key mapKeys(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_scope_attr_value mapValues(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_log_attr_key mapKeys(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_log_attr_value mapValues(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8
)
ENGINE = MergeTree
PARTITION BY toDate(TimestampTime)
PRIMARY KEY (ServiceName, TimestampTime)
ORDER BY (ServiceName, TimestampTime, Timestamp)
TTL TimestampTime + toIntervalDay(30)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1;
CREATE TABLE IF NOT EXISTS ${DATABASE}.e2e_otel_metrics_gauge
(
\`ResourceAttributes\` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
\`ResourceSchemaUrl\` String CODEC(ZSTD(1)),
\`ScopeName\` String CODEC(ZSTD(1)),
\`ScopeVersion\` String CODEC(ZSTD(1)),
\`ScopeAttributes\` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
\`ScopeDroppedAttrCount\` UInt32 CODEC(ZSTD(1)),
\`ScopeSchemaUrl\` String CODEC(ZSTD(1)),
\`ServiceName\` LowCardinality(String) CODEC(ZSTD(1)),
\`MetricName\` String CODEC(ZSTD(1)),
\`MetricDescription\` String CODEC(ZSTD(1)),
\`MetricUnit\` String CODEC(ZSTD(1)),
\`Attributes\` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
\`StartTimeUnix\` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
\`TimeUnix\` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
\`Value\` Float64 CODEC(ZSTD(1)),
\`Flags\` UInt32 CODEC(ZSTD(1)),
\`Exemplars.FilteredAttributes\` Array(Map(LowCardinality(String), String)) CODEC(ZSTD(1)),
\`Exemplars.TimeUnix\` Array(DateTime64(9)) CODEC(ZSTD(1)),
\`Exemplars.Value\` Array(Float64) CODEC(ZSTD(1)),
\`Exemplars.SpanId\` Array(String) CODEC(ZSTD(1)),
\`Exemplars.TraceId\` Array(String) CODEC(ZSTD(1)),
INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_scope_attr_key mapKeys(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_scope_attr_value mapValues(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_attr_key mapKeys(Attributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_attr_value mapValues(Attributes) TYPE bloom_filter(0.01) GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toDate(TimeUnix)
ORDER BY (ServiceName, MetricName, Attributes, toUnixTimestamp64Nano(TimeUnix))
TTL toDate(TimeUnix) + toIntervalDay(30)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1;
CREATE TABLE IF NOT EXISTS ${DATABASE}.e2e_otel_metrics_sum
(
\`ResourceAttributes\` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
\`ResourceSchemaUrl\` String CODEC(ZSTD(1)),
\`ScopeName\` String CODEC(ZSTD(1)),
\`ScopeVersion\` String CODEC(ZSTD(1)),
\`ScopeAttributes\` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
\`ScopeDroppedAttrCount\` UInt32 CODEC(ZSTD(1)),
\`ScopeSchemaUrl\` String CODEC(ZSTD(1)),
\`ServiceName\` LowCardinality(String) CODEC(ZSTD(1)),
\`MetricName\` String CODEC(ZSTD(1)),
\`MetricDescription\` String CODEC(ZSTD(1)),
\`MetricUnit\` String CODEC(ZSTD(1)),
\`Attributes\` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
\`StartTimeUnix\` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
\`TimeUnix\` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
\`Value\` Float64 CODEC(ZSTD(1)),
\`Flags\` UInt32 CODEC(ZSTD(1)),
\`AggregationTemporality\` Int32 CODEC(ZSTD(1)),
\`IsMonotonic\` Bool CODEC(Delta(1), ZSTD(1)),
\`Exemplars.FilteredAttributes\` Array(Map(LowCardinality(String), String)) CODEC(ZSTD(1)),
\`Exemplars.TimeUnix\` Array(DateTime64(9)) CODEC(ZSTD(1)),
\`Exemplars.Value\` Array(Float64) CODEC(ZSTD(1)),
\`Exemplars.SpanId\` Array(String) CODEC(ZSTD(1)),
\`Exemplars.TraceId\` Array(String) CODEC(ZSTD(1)),
INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_scope_attr_key mapKeys(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_scope_attr_value mapValues(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_attr_key mapKeys(Attributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_attr_value mapValues(Attributes) TYPE bloom_filter(0.01) GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toDate(TimeUnix)
ORDER BY (ServiceName, MetricName, Attributes, toUnixTimestamp64Nano(TimeUnix))
TTL toDate(TimeUnix) + toIntervalDay(30)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
EOFSQL