## Summary
- **Replace QEMU-emulated multi-platform builds with native ARM64 runners** for both `release.yml` and `release-nightly.yml`, significantly speeding up CI build times
- Each architecture (amd64/arm64) now builds in parallel on native hardware, then a manifest-merge job combines them into a multi-arch Docker tag using `docker buildx imagetools create`
- Migrate from raw Makefile `docker buildx build` commands to `docker/build-push-action@v6` for better GHA integration
## Changes
### `.github/workflows/release.yml`
- Removed QEMU setup entirely
- Replaced single `release` matrix job with per-image build+publish job pairs:
  - `build-otel-collector` / `publish-otel-collector` (runners: `ubuntu-latest` / `ubuntu-latest-arm64`)
  - `build-app` / `publish-app` (runners: `Large-Runner-x64-32` / `Large-Runner-ARM64-32`)
  - `build-local` / `publish-local` (runners: `Large-Runner-x64-32` / `Large-Runner-ARM64-32`)
  - `build-all-in-one` / `publish-all-in-one` (runners: `Large-Runner-x64-32` / `Large-Runner-ARM64-32`)
- Added `check_version` job to centralize skip-if-exists logic (replaces per-image `docker manifest inspect` in Makefile)
- Removed `check_release_app_pushed` artifact upload/download — `publish-app` now outputs `app_was_pushed` directly
- Scoped GHA build cache per image+arch (e.g. `scope=app-amd64`) to avoid collisions
- All 4 images build in parallel (8 build jobs total), then 4 manifest-merge jobs, then downstream notifications
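As a rough sketch of the skip-if-exists check that `check_version` centralizes, assuming it still relies on `docker manifest inspect` as the old Makefile targets did (the job body is not shown in this description). `should_build` and the `INSPECT_CMD` override are hypothetical names, added only so the logic can be exercised without a registry:

```shell
# Hypothetical helper: decide whether an image tag still needs to be built.
# INSPECT_CMD is an assumed override hook for testing; by default the real
# `docker manifest inspect` CLI is used.
should_build() {
  ref=$1
  if ${INSPECT_CMD:-docker manifest inspect} "$ref" >/dev/null 2>&1; then
    echo "skip"    # manifest found: the tag is already published
  else
    echo "build"   # inspect failed: treat the tag as missing
  fi
}
```

Note that `docker manifest inspect` also exits non-zero on auth or network errors, so a real job may want to tell those apart from a tag that is actually missing.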
### `.github/workflows/release-nightly.yml`
- Same native runner pattern (no skip logic since nightly always rebuilds)
- 8 build + 4 publish jobs running in parallel
- Slack failure notification and OTel trace export now depend on publish jobs
### `Makefile`
- Removed `release-*` and `release-*-nightly` targets (lines 203-361) — build logic moved into workflow YAML
- Local `build-*` targets preserved for developer use
## Architecture
Follows the same pattern as `release-ee.yml` in the EE repo:
```
         check_changesets → check_version
                                  │
              ┌───────────────────┼───────────────────┬───────────────────┐
              v                   v                   v                   v
        build-app(x2)      build-otel(x2)      build-local(x2)      build-aio(x2)
              │                   │                   │                   │
         publish-app        publish-otel        publish-local        publish-aio
              │                   │                   │                   │
              └─────────┬─────────┴───────────────────┴───────────────────┘
                        v
notify_helm_charts / notify_clickhouse_clickstack
                        │
                otel-cicd-action
```
## Notes
- `--squash` flag dropped — it's an experimental Docker feature incompatible with `build-push-action` in multi-platform mode. `sbom` and `provenance` are preserved via action params.
- Per-arch intermediate tags (e.g. `hyperdx/hyperdx:2.21.0-amd64`) remain visible on DockerHub — this is standard practice.
- Dual DockerHub namespace tagging (`hyperdx/*` + `clickhouse/clickstack-*`) preserved.
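For reference, each manifest-merge job boils down to one `docker buildx imagetools create` call per image. The sketch below only prints the command (a dry run), so nothing is pushed. `merge_cmd` is a hypothetical helper, and the `-arm64` suffix is assumed to mirror the `-amd64` intermediate tag mentioned above:

```shell
# Hypothetical dry-run helper: print the merge command for one image/version.
# Assumes per-arch tags follow the <version>-amd64 / <version>-arm64 pattern.
merge_cmd() {
  image=$1
  version=$2
  printf 'docker buildx imagetools create -t %s:%s %s:%s-amd64 %s:%s-arm64\n' \
    "$image" "$version" "$image" "$version" "$image" "$version"
}

merge_cmd hyperdx/hyperdx 2.21.0
# prints: docker buildx imagetools create -t hyperdx/hyperdx:2.21.0 hyperdx/hyperdx:2.21.0-amd64 hyperdx/hyperdx:2.21.0-arm64
```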
## Sample Run
https://github.com/hyperdxio/hyperdx/actions/runs/23362835749
## Summary
- Enable multiple agents/developers to run `make dev-int` simultaneously from different git worktrees without Docker port conflicts
- Compute a deterministic port offset (0-99) from the worktree directory name via `cksum`, giving each worktree its own isolated Docker Compose project and port range
- Switch `.env.test` files to use `${HDX_CI_*:-default}` variable expansion (powered by `dotenv-expand`) so test processes connect to the correct dynamic ports
## How it works
Each worktree gets a unique **slot** derived from its directory name. All service ports are offset by that slot:
| Service | Base port | Example (slot 68) |
|-----------------|-----------|-------------------|
| ClickHouse HTTP | 18123 | 18191 |
| MongoDB | 39999 | 40067 |
| API test server | 19000 | 19068 |
| OpAMP | 14320 | 14388 |
Docker Compose project names are also unique (`int-<slot>`), isolating containers and networks.
Backward compatible — when no `HDX_CI_*` env vars are set, all ports fall back to their original defaults.
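A minimal sketch of the slot arithmetic, assuming the slot is the `cksum` CRC of the worktree directory's basename modulo 100 (the exact Makefile expression is not reproduced here). `slot_for` and `port_for` are hypothetical helper names:

```shell
# Hypothetical helper: map a worktree path to a deterministic slot in 0-99.
slot_for() {
  name=$(basename "$1")
  # cksum prints "<crc> <byte-count>"; keep only the CRC field.
  crc=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
  echo $(( crc % 100 ))
}

# Hypothetical helper: offset a base port by the slot.
port_for() {
  echo $(( $1 + $2 ))
}

slot=$(slot_for "$PWD")
echo "slot=$slot project=int-$slot clickhouse_http=$(port_for 18123 "$slot")"
```

Two worktrees whose names hash to the same slot would still collide, so the scheme reduces conflicts rather than ruling them out.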
## Changes
- **Makefile**: Added `HDX_CI_SLOT` computation and dynamic project names/ports for all `dev-int` targets
- **docker-compose.ci.yml**: Ports use `${HDX_CI_*:-default}` env vars; removed unused OTel collector published port; removed hardcoded network name (auto-generated from project name)
- **packages/api/.env.test** / **packages/common-utils/.env.test**: Ports use `${HDX_CI_*:-default}` expansion syntax
- **packages/api/jest.config.js** / **packages/common-utils/jest.int.config.js**: Switched from `dotenv/config` to `dotenv-expand/config` to enable variable expansion
- **packages/api/package.json** / **packages/common-utils/package.json**: Added `dotenv-expand` devDependency
- **agent_docs/development.md**: Documented multi-agent worktree support
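The `${HDX_CI_*:-default}` syntax mirrors shell parameter expansion, which is what keeps the change backward compatible; `dotenv-expand` is expected to resolve the `.env.test` values the same way. A quick shell illustration, where `HDX_CI_EXAMPLE_PORT` is a made-up variable name and not necessarily one this PR defines:

```shell
# HDX_CI_EXAMPLE_PORT is a hypothetical name used only for illustration.
resolve_port() {
  # Falls back to 18123 when the variable is unset or empty.
  echo "${HDX_CI_EXAMPLE_PORT:-18123}"
}

unset HDX_CI_EXAMPLE_PORT
resolve_port                            # prints 18123 (default)
HDX_CI_EXAMPLE_PORT=18191 resolve_port  # prints 18191 (override)
```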
## Testing
Ran full Alert integration test suite (`make dev-int FILE=alerts`) — **6 test suites, 150 tests passed** on slot 68 with dynamic ports.
## Summary
- Silence noisy `console.debug` and `console.info` logs in test output across `api` and `common-utils` packages
- Add `DOTENV_CONFIG_OVERRIDE=true` to API integration test scripts so `.env.test` values take precedence
- Add shared jest setup for `common-utils` to suppress verbose console output during tests
CI stdout should now be much cleaner and more readable, especially for integration tests.
## Summary
- Simplified `CLAUDE.md` integration test instructions to use `make dev-int-build` and `make dev-int FILE=<TEST_FILE_NAME>` instead of manual docker compose steps
- Added `npx nx run-many -t ci:build` to `dev-int-build` Makefile target to ensure common-utils is built before running tests
You can now prompt:
```
run int tests for renderChartConfig
```
## Summary
- Fix tile alerts to support `groupBy` for Gauge/Sum metrics — each group-by value appears as its own column in the response
- Add missing `whereLanguage` to tile alert config so Lucene WHERE conditions are parsed correctly
- Replace stale fixture-based ClickHouse schema with otel-collector's canonical schema in integration tests
Ref: HDX-3576
## Summary
Addresses npm security vulnerabilities in transitive dependencies, preferring direct dependency upgrades over broad resolutions where possible.
## Changes
**Direct upgrade:**
- **`@slack/webhook`**: `^6.1.0` → `^7.0.0` — v7 natively uses axios v1, eliminating the axios@0.21.4 SSRF/redirect vulnerabilities. Only breaking change in v7 is dropping Node <18 (we're on Node 22).
**Resolutions for transitive deps with no direct upgrade path:**
- **`fast-xml-parser`**: `^4.4.0` — fixes prototype pollution (High)
- **`systeminformation`**: `^5.24.0` — fixes command injection (High)
## Removed/Not Done
- `axios` resolution removed — covered by the `@slack/webhook` upgrade instead
- `tar` resolution removed — was a v6→v7 major jump on build-only tools (`cacache`, `node-gyp`); not present in the production image
- `glob` resolution removed — was breaking test coverage tooling (`test-exclude@6` depends on glob@^7)
## Related
Follow-up to #1731 which addressed base image vulnerabilities (Node, Go, ClickHouse).
TL;DR: This PR changes the Playwright full-stack tests to run against a local ClickHouse instance (with seeded data) instead of the ClickHouse demo server, which can be unpredictable. This lets us fully control the data and makes the tests more predictable.
This PR:
* Adds a local ClickHouse instance to the e2e Dockerfile
* Adds a schema creation script
* Adds a data seeding script
* Updates the Playwright config
* Updates various tests to change hardcoded fields, metrics, or areas relying on play demo data
* Updates the GitHub workflow to use the Dockerfile instead of separate services
* Runs against a local ClickHouse instead of the demo server
Fixes: HDX-3193
- Upgrade OTel collector-contrib and opampsupervisor from 0.136.0 to 0.145.0 to resolve Go stdlib CVEs from outdated binaries
- Pin Alpine base to 3.21 with fresh digest replacing stale alpine:latest pin
- Add HEALTHCHECK to both dev and prod stages using the health_check extension on port 13133
- Fix Makefile otel-collector build targets to use repo-root context with -f flag, matching the repo-root relative COPY paths
Follow-up to #1697 and #1698
- Fix race condition when switching saved searches after changing sources
- Add e2e tests for the saved search bug
- Make it easier to run e2e tests with the UI
No changeset needed; this can bump off the last commit.
Fixes HDX-3127
Enables broader testing
Fixes: HDX-3069
To test:
- By default `make e2e` runs playwright tests with a docker compose for mongo
- To test the local-only mode, run `make e2e local=true`
- Since we manage play.hyperdx.io, I envision us running both commands on release
Closes HDX-2623
# Summary
This change improves the performance of `getKeyValues` when getting values of a JSON key.
Generally, columns that are not referenced outside of a CTE will be pruned by the query planner. For JSON however, if the outer select references one field in a JSON column, then the inner select will read (it seems) the entire JSON object.
This PR also adds integration tests for `getKeyValues` to ensure that the function generates queries that work as expected in ClickHouse.
## Performance impact (on single JSON Dashboard Filter)
- Original: 15.03s
<img width="584" height="71" alt="Screenshot 2025-10-21 at 3 28 07 PM" src="https://github.com/user-attachments/assets/184de198-cee1-4b1d-beed-ec4465d3e248" />
- Optimized: 0.443s
<img width="590" height="61" alt="Screenshot 2025-10-21 at 3 25 47 PM" src="https://github.com/user-attachments/assets/690d0ef0-15b8-47c5-9a7e-8b8f6b8f5e92" />
Ref: HDX-1976
1. Updated release-xxx commands to prevent image tag overrides
2. Updated release workflow so that notify-xxx steps won't be triggered if no new app image was pushed
1. Merge 'fullstack' and 'local' (auth + noauth) builds into a single Dockerfile
2. Introduce 'all-in-one-auth' and 'all-in-one-noauth' build stages
3. Lock `IS_LOCAL_APP_MODE` env var
4. Fix bug in ctrl-c exit with docker run
5. Enable alerts in local mode (no-auth)
6. Build `common-utils` on the fly (no longer needs to pull the package from npm)
Ref: HDX-1709
Ref: HDX-1713
Ref: HDX-1254
Ref: HDX-1729
To match v2 product definition, we are going to release three images:
- hyperdx/hyperdx (--target=prod): app only without any other deps (clickhouse, otelcol, mongodb), used in default compose + helm deployment
- hyperdx/hyperdx-all-in-one (--target=all-in-one-auth): all-in-one build + auth
- hyperdx/hyperdx-local (--target=all-in-one-noauth): all-in-one build + no-auth
Production impacts:
- hyperdx/hyperdx: none
- hyperdx/hyperdx-all-in-one: new
- hyperdx/hyperdx-local: add server components (alerts, saved searches, dashboards)
This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine, whenever you add more changesets to v2, this PR will be updated.
⚠️⚠️⚠️⚠️⚠️⚠️
`v2` is currently in **pre mode** so this branch has prereleases rather than normal releases. If you want to exit prereleases, run `changeset pre exit` on `v2`.
⚠️⚠️⚠️⚠️⚠️⚠️
# Releases
## @hyperdx/common-utils@0.2.0-beta.3
### Patch Changes
- 092a292: fix: autocomplete for key-values complete for v2 lucene
- 2f626e1: fix: metric name filtering for some metadata
- b16c8e1: feat: compute charts ratio
- 4865ce7: Fixes the histogram query to perform quantile calculation across all data points
## @hyperdx/api@2.0.0-beta.14
### Patch Changes
- e5dfefb: Added test cases for the webhook and source routes.
- f5e9a07: chore: bump node version to v22
- Updated dependencies [092a292]
- Updated dependencies [2f626e1]
- Updated dependencies [b16c8e1]
- Updated dependencies [4865ce7]
- @hyperdx/common-utils@0.2.0-beta.3
## @hyperdx/app@2.0.0-beta.14
### Patch Changes
- 56e39dc: 36c3edc fix: remove several source change forms throughout the log drawer
- 092a292: fix: autocomplete for key-values complete for v2 lucene
- 2f626e1: fix: metric name filtering for some metadata
- f5e9a07: chore: bump node version to v22
- b16c8e1: feat: compute charts ratio
- 08009ac: feat: add saved filters for searches
- db761ba: fix: remove originalWhere tag from view. not used anyways
- 8c95b9e: Add search history
- Updated dependencies [092a292]
- Updated dependencies [2f626e1]
- Updated dependencies [b16c8e1]
- Updated dependencies [4865ce7]
- @hyperdx/common-utils@0.2.0-beta.3
Co-authored-by: Warren <5959690+wrn14897@users.noreply.github.com>
# Releases
## @hyperdx/common-utils@0.2.0-beta.2
### Minor Changes
- a9dfa14: Added support to CTE rendering where you can now specify a CTE using a full chart config object instance. This CTE capability is then used to avoid the URI too long error for delta event queries.
- e002c2f: Support querying a sum metric as a value instead of a rate
### Patch Changes
- 50ce38f: Histogram metric query test cases
- 2e350e2: feat: implement logs > metrics correlation flow + introduce convertV1ChartConfigToV2
- a6fd5e3: feat: introduce k8s preset dashboard
- b9f7d32: Refactored renderWith to simplify logic and ship more tests with the changes.
- eaa6bfa: fix: transform partition_key to be the same format as others
- bd9dc18: perf: reuse existing queries promises to avoid duplicate requests
- 5db2767: Fixed CI linting and UI release task.
- 414ff92: feat: export 'Connection' type
- e884d85: fix: metrics > logs correlation flow
- e5a210a: feat: support search on multi implicit fields (BETA)
## @hyperdx/app@2.0.0-beta.13
### Minor Changes
- 9579251: Stores the collapse vs expand status of the side navigation in local storage so it's carried across browser windows/sessions.
### Patch Changes
- 3be7f4d: fix: input does not overlap with language select button anymore
- 2e350e2: feat: implement logs > metrics correlation flow + introduce convertV1ChartConfigToV2
- a6fd5e3: feat: introduce k8s preset dashboard
- a9dfa14: Added support to CTE rendering where you can now specify a CTE using a full chart config object instance. This CTE capability is then used to avoid the URI too long error for delta event queries.
- 5a10ae1: fix: delete huge z-value for tooltip
- 6864836: fix: don't show ellipses on search when query is in-flight
- b99236d: fix: autocomplete options for dashboard page
- 5db2767: Fixed CI linting and UI release task.
- 2580ddd: chore: bump next to v13.5.10
- 5044083: Session Replay tab for traces is disabled unless the source is configured with a sessionId
- 6dafb87: fix: View Events not shown for multiple series; grabs where clause when single series
- decd622: fix: k8s dashboard uptime metrics + warning k8s event body
- e884d85: fix: metrics > logs correlation flow
- e5a210a: feat: support search on multi implicit fields (BETA)
- Updated dependencies [50ce38f]
- Updated dependencies [2e350e2]
- Updated dependencies [a6fd5e3]
- Updated dependencies [a9dfa14]
- Updated dependencies [e002c2f]
- Updated dependencies [b9f7d32]
- Updated dependencies [eaa6bfa]
- Updated dependencies [bd9dc18]
- Updated dependencies [5db2767]
- Updated dependencies [414ff92]
- Updated dependencies [e884d85]
- Updated dependencies [e5a210a]
- @hyperdx/common-utils@0.2.0-beta.2
## @hyperdx/api@2.0.0-beta.13
### Patch Changes
- 50ce38f: Histogram metric query test cases
- 2e350e2: feat: implement logs > metrics correlation flow + introduce convertV1ChartConfigToV2
- b9f7d32: Refactored renderWith to simplify logic and ship more tests with the changes.
- 5db2767: Fixed CI linting and UI release task.
- d326610: feat: introduce RUN_SCHEDULED_TASKS_EXTERNALLY + enable in-app task
- 414ff92: perf + fix: single clickhouse proxy middleware instance
- Updated dependencies [50ce38f]
- Updated dependencies [2e350e2]
- Updated dependencies [a6fd5e3]
- Updated dependencies [a9dfa14]
- Updated dependencies [e002c2f]
- Updated dependencies [b9f7d32]
- Updated dependencies [eaa6bfa]
- Updated dependencies [bd9dc18]
- Updated dependencies [5db2767]
- Updated dependencies [414ff92]
- Updated dependencies [e884d85]
- Updated dependencies [e5a210a]
- @hyperdx/common-utils@0.2.0-beta.2
Co-authored-by: Warren <5959690+wrn14897@users.noreply.github.com>
1. Add new env var `RUN_SCHEDULED_TASKS_EXTERNALLY` to opt out of the in-app task process
2. Introduce new `prod-extended` build that includes the MongoDB process
3. GA k8s dashboard (only picking the connection)
4. Bake the check-alert task into the fullstack app build
# Releases
## @hyperdx/common-utils@0.0.10
### Patch Changes
- fc4548f: feat: add alert schema + types
Co-authored-by: Warren <5959690+wrn14897@users.noreply.github.com>
For a better self-hosting experience, users should be able to run
```
docker run -e MONGO_URI=xxx -p 8080:8080 hyperdx/hyperdx:2-beta
```
to spin up the project, including the server components.
The goal is to generate a page that shows which alerts have been created and their current status (i.e. alarmed or OK), with a small historical view of recent alert failures and successes.
- [x] Testing alertHistory changes
- [x] Testing alert API changes
- One of
  - [ ] Making the disable button work
  - [x] Omitting disable button for now
- [ ] ~Ensuring that alertHistory and alert queries are indexed~ - ticket filed in lieu
- [ ] ~converting dashboard/logview/alerthistory queries to single $in query for performance~ - ticket filed in lieu
- [x] Add explanatory text about how to create alerts (on another page!)
- [x] Comment out AppNav link for now
Co-authored-by: Warren <5959690+wrn14897@users.noreply.github.com>
Fixes #120
Intended to handle cases where a nonsensical rate would be returned (e.g. massively negative) or the metric's `_string_attributes` change mid-query, by either clamping to zero or filtering out the NaN values currently being returned. This should produce cleaner graphs for rate metrics, allowing the rate to interpolate over missing data instead of creating a gap in the graph.
1. Implement an additional API route to enable users to access data from the HyperDX API using a standard bearer token authentication method
2. Set up MongoDB migration tool