hyperdx/.changeset/sampling-improvements.md
Alex Fedotyev 68ef3d6f97
feat: deterministic sampling with adaptive sample size (#1849)
## Summary
Closes #1827

Replaces non-deterministic `ORDER BY rand()` with deterministic `cityHash64(SpanId)` sampling and introduces sampling configuration constants.
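
To illustrate why hash ordering gives a stable sample, here is a minimal client-side sketch of the same idea. It is not code from this PR: `fnv1a` is a stand-in 32-bit hash (ClickHouse's `cityHash64` is a different function), and `stableSample` is a hypothetical helper name.

```typescript
// Stand-in 32-bit FNV-1a hash, used only for illustration.
// ClickHouse's cityHash64 is a different (64-bit) function.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts to stay within 32 bits.
    h = (h + ((h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24))) >>> 0;
  }
  return h;
}

interface Span {
  spanId: string;
}

// Hypothetical helper: order rows by hash(spanId) and keep the first n.
// The result depends only on the set of span IDs, not on input order.
function stableSample<T extends Span>(rows: T[], n: number): T[] {
  return rows
    .slice()
    .sort((a, b) => {
      const diff = fnv1a(a.spanId) - fnv1a(b.spanId);
      if (diff !== 0) return diff;
      // Tie-break on the ID itself so hash collisions stay deterministic.
      return a.spanId < b.spanId ? -1 : a.spanId > b.spanId ? 1 : 0;
    })
    .slice(0, n);
}
```

Sorting by `rand()` would assign a fresh key to every row on every query, so the first `n` rows change between runs. A hash of the row's own ID is a fixed key, so the same rows are selected each time.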

### What this PR does
- **Deterministic sampling**: `ORDER BY cityHash64(SpanId)` instead of `rand()` — same data always produces the same sample, so results are stable across re-renders
- **Named constants**: `SAMPLE_SIZE`, `STABLE_SAMPLE_EXPR` replace hardcoded `1000` and `'rand()'` in query configs
- **Adaptive sizing foundation**: `computeEffectiveSampleSize()` function with `MIN_SAMPLE_SIZE`/`MAX_SAMPLE_SIZE`/`SAMPLE_RATIO` constants, exported and tested (6 unit tests)
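
For readers who want a feel for the adaptive sizing, the sketch below shows one plausible shape for `computeEffectiveSampleSize()`: a ratio of the total row count, clamped between a minimum and a maximum. The function body and the values of `MIN_SAMPLE_SIZE`, `MAX_SAMPLE_SIZE`, and `SAMPLE_RATIO` are assumptions chosen for illustration, since this description does not state them. Only `SAMPLE_SIZE = 1000` comes from the text above.

```typescript
const SAMPLE_SIZE = 1000; // from the PR text: replaces the hardcoded 1000
const MIN_SAMPLE_SIZE = 500; // assumed value, for illustration only
const MAX_SAMPLE_SIZE = 5000; // assumed value, for illustration only
const SAMPLE_RATIO = 0.01; // assumed value, for illustration only

// Assumed behaviour: sample a fixed fraction of the rows, clamped to
// [MIN_SAMPLE_SIZE, MAX_SAMPLE_SIZE]. Falls back to SAMPLE_SIZE when the
// total count is unknown or not positive.
function computeEffectiveSampleSize(totalCount: number): number {
  if (!isFinite(totalCount) || totalCount <= 0) {
    return SAMPLE_SIZE;
  }
  const proportional = Math.ceil(totalCount * SAMPLE_RATIO);
  return Math.min(MAX_SAMPLE_SIZE, Math.max(MIN_SAMPLE_SIZE, proportional));
}
```

With these assumed constants, small result sets are sampled at the minimum, very large ones are capped at the maximum, and everything in between scales with the row count.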

### What this PR does NOT do (follow-up)
- **Count query for adaptive sizing**: Wiring `computeEffectiveSampleSize` into the actual queries requires adding a lightweight `count()` query. This is deferred to keep this PR focused on the deterministic sampling change.
- **Dynamic column detection**: `STABLE_SAMPLE_EXPR` uses `SpanId`, which is trace-specific. Event Deltas currently renders only on the traces search page, where `SpanId` is always present. If the feature expands to logs or metrics, the expression should be parameterized per source (this is noted in a code comment).
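
As a rough idea of what per-source parameterization could look like in a follow-up, the sketch below builds the sample expression from a column name instead of hardcoding `SpanId`. Both helpers, `stableSampleExprFor` and `buildSampleQuery`, are hypothetical and not part of this PR, and the table name in the usage note is a placeholder.

```typescript
// Hypothetical helper: build the deterministic ordering expression for a
// given ID column. Rejects anything that is not a plain identifier so the
// column name cannot inject SQL.
function stableSampleExprFor(idColumn: string): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(idColumn)) {
    throw new Error("invalid column name: " + idColumn);
  }
  return "cityHash64(" + idColumn + ")";
}

// Hypothetical helper: assemble a sampled query for one source.
function buildSampleQuery(table: string, idColumn: string, limit: number): string {
  if (!/^[A-Za-z_][A-Za-z0-9_.]*$/.test(table)) {
    throw new Error("invalid table name: " + table);
  }
  if (Math.floor(limit) !== limit || limit <= 0) {
    throw new Error("limit must be a positive integer");
  }
  const expr = stableSampleExprFor(idColumn);
  return "SELECT * FROM " + table + " ORDER BY " + expr + " LIMIT " + limit;
}
```

For example, `buildSampleQuery("traces_table", "SpanId", 1000)` yields `SELECT * FROM traces_table ORDER BY cityHash64(SpanId) LIMIT 1000`. A logs source would pass whichever column uniquely identifies its rows.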

## Test plan
- [ ] Same data + same hover always highlights the same heatmap cells (deterministic)
- [ ] Run `npx jest src/components/__tests__/deltaChartSampling.test.ts` — 6 tests pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-03-05 16:52:54 +00:00

103 B

---
"@hyperdx/app": patch
---

feat: deterministic sampling with adaptive sample size for Event Deltas