Closes HDX-3118
# Summary
This PR ensures that "previous period" queries are not run when `compareToPreviousPeriod` is undefined. Previously, these queries were running unnecessarily, increasing the burden on ClickHouse.
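A minimal sketch of the gating described above. Only `compareToPreviousPeriod` comes from this PR; `buildQueries`, `previousPeriod`, and the `ChartOptions`/`ChartQuery` shapes are hypothetical names used for illustration, not the actual implementation.

```typescript
// Hypothetical types, for illustration only.
interface DateRange {
  start: Date;
  end: Date;
}
interface ChartOptions {
  dateRange: DateRange;
  compareToPreviousPeriod?: boolean;
}
interface ChartQuery {
  label: "current" | "previous";
  dateRange: DateRange;
}

// Shift the range back by its own length to get the previous period.
function previousPeriod(range: DateRange): DateRange {
  const span = range.end.getTime() - range.start.getTime();
  return {
    start: new Date(range.start.getTime() - span),
    end: new Date(range.start.getTime()),
  };
}

function buildQueries(options: ChartOptions): ChartQuery[] {
  const queries: ChartQuery[] = [
    { label: "current", dateRange: options.dateRange },
  ];
  // Only schedule the extra query when the flag is explicitly true;
  // undefined and false both skip it.
  if (options.compareToPreviousPeriod === true) {
    queries.push({
      label: "previous",
      dateRange: previousPeriod(options.dateRange),
    });
  }
  return queries;
}
```

With the flag left undefined, `buildQueries` returns a single query, so no extra load reaches ClickHouse.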
Adds support for histogram `count` aggregations. This partially resolves https://github.com/hyperdxio/hyperdx/issues/1441, which should probably be split into a new ticket to only address `sum`.
As part of this, I also moved the translation functionality for histograms to a new file `histogram.ts` to avoid contributing even more bloat to `renderChartConfig`. Happy to revert this and move that stuff back into the file if that's preferred.
While doing this, I also noticed a SQL error in the test snapshots: the existing quantile test was missing a trailing `,` after the time bucket when no group was provided (https://github.com/hyperdxio/hyperdx/blob/main/packages/common-utils/src/__tests__/__snapshots__/renderChartConfig.test.ts.snap#L194). Centralizing the translation like this is probably desirable to keep things consistent.
I also personally use WebStorm, so I added its files to the gitignore.
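To illustrate why centralizing helps with this class of bug, here is a rough sketch that builds the SELECT list from an array instead of hand-writing separators. `buildSelectList` and its parameters are hypothetical and do not reflect the actual code in `histogram.ts`.

```typescript
// Hypothetical helper: joins SELECT expressions so that a comma is always
// placed between neighbours, whether or not group-by columns are present.
function buildSelectList(
  timeBucket: string,
  groupBy: readonly string[],
  aggregate: string,
): string {
  return [timeBucket, ...groupBy, aggregate]
    // Drop empty entries so they cannot produce a dangling comma.
    .filter((expr) => expr.trim().length > 0)
    .join(",\n  ");
}
```

Because `join` only ever places separators between elements, the "no group" case cannot lose the comma after the time bucket.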
- Sync with Upstream to avoid future conflicts
- Move WebhookSections to its own file
- Group Webhooks by Type
- Add Webhook Icons Support
- Ensure Link is used instead of Slack to represent Webhooks Generically
<img width="959" height="752" alt="Screenshot 2025-12-23 at 1 35 40 PM" src="https://github.com/user-attachments/assets/0df2d5a2-4396-415c-ba38-685d65d69836" />
Fixes HDX-2794
We removed the rotator script when we used the named pipe approach to the otel collector logging. There were some references left over that caused the docker build to fail.
This commit updates multiple components to streamline the usage of Link elements by removing the legacyBehavior and passHref props.
No functionality changes introduced.
Fixes HDX-3071
Fixes: HDX-3075
* Refactors to using Page model
* Extracts common interactions into components
* Re-writes tests to conform to new model
* Adds eslint plugin for playwright best practices
* Fixes bad lints
Note: The best practice is to not use `.waitForLoadState('networkidle')` however there are several instances where components are re-rendered completely due to underlying db queries. This causes flakiness in the tests. We will re-evaluate the best solution for this in a future ticket and remove the `networkidle` from the eslint ignore list.
Closes HDX-3082
# Summary
This PR back-ports support for materialized views from the EE repo. Note that this feature is in **Beta**, and is subject to significant changes.
This feature is intended to support:
1. Configuring AggregatingMergeTree (or SummingMergeTree) Materialized Views which are associated with a Source
2. Automatically selecting and querying an associated materialized view when a query supports it, in Chart Explorer, Custom Dashboards, the Services Dashboard, and the Search Page Histogram.
3. A UX for understanding what materialized views are available for a source, and whether (and why) it is or is not being used for a particular visualization.
## Note to Reviewer(s)
This is a large PR, but the code has largely already been reviewed.
- For net-new files, types, components, and utility functions, the code does not differ from the EE repo
- Changes to the various services dashboard pages do not differ from the EE repo
- Changes to `useOffsetPaginatedQuery`, `useChartConfig`, and `DBEditTimeChart` differ slightly due to unrelated (to MVs) drift between this repo and the EE repo, and due to the lack of feature toggles in this repo. **This is where slightly closer review would be most valuable.**
## Demo
<details>
<summary>Demo: MV Configuration</summary>
https://github.com/user-attachments/assets/fedf3bcf-892c-4b8d-a788-7e231e23bcc3
</details>
<details>
<summary>Demo: Chart Explorer</summary>
https://github.com/user-attachments/assets/fc8d1efa-7edc-42fc-98f0-75431cc056b8
</details>
<details>
<summary>Demo: Dashboards</summary>
https://github.com/user-attachments/assets/f3cb247e-711f-4d90-95b8-cf977e94f065
</details>
## Known Limitations
This feature is in Beta due to the following known limitations, which will be addressed in subsequent PRs:
1. Visualization start and end time, when not aligned with the granularity of MVs, will result in statistics based on the MV "time buckets" which fall inside the date range. This may not align exactly with the source table data which is in the selected date range.
2. Alerts do not make use of MVs, even if the associated visualization does. Due to (1), this means that alert values may not exactly match the values shown in the associated visualization.
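Limitation (1) can be made concrete with a small sketch. It assumes a bucket counts as "inside" the range when its start timestamp lies in `[startMs, endMs)`; that boundary rule and the function name `bucketStartsInRange` are assumptions for illustration, not the behavior of the actual query builder.

```typescript
// Returns the start times (ms since epoch) of MV buckets whose start lies
// in [startMs, endMs), for a fixed bucket granularity.
function bucketStartsInRange(
  startMs: number,
  endMs: number,
  granularityMs: number,
): number[] {
  if (granularityMs <= 0) {
    throw new Error("granularityMs must be positive");
  }
  // First bucket boundary at or after the requested start.
  const first = Math.ceil(startMs / granularityMs) * granularityMs;
  const starts: number[] = [];
  for (let t = first; t < endMs; t += granularityMs) {
    starts.push(t);
  }
  return starts;
}
```

For a one-minute MV and a range from 00:00:30 to 00:02:30, this selects the buckets starting at 00:01:00 and 00:02:00. Rows between 00:00:30 and 00:01:00 are left out, while the 00:02:00 bucket also carries rows up to 00:03:00, which is why results can differ from a query against the source table.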
## Differences in OSS vs EE Support
- In OSS, there is a beta label on the MV configurations section
- In EE, there are feature toggles to enable MV support. In OSS, the feature is enabled for all teams, but will only run for sources with MVs configured.
## Testing
To test, a couple of MVs can be created on the default `otel_traces` table, directly in ClickHouse:
<details>
<summary>Example MVs DDL</summary>
```sql
CREATE TABLE default.metrics_rollup_1m
(
`Timestamp` DateTime,
`ServiceName` LowCardinality(String),
`SpanKind` LowCardinality(String),
`StatusCode` LowCardinality(String),
`count` SimpleAggregateFunction(sum, UInt64),
`sum__Duration` SimpleAggregateFunction(sum, UInt64),
`avg__Duration` AggregateFunction(avg, UInt64),
`quantile__Duration` AggregateFunction(quantileTDigest(0.5), UInt64),
`min__Duration` SimpleAggregateFunction(min, UInt64),
`max__Duration` SimpleAggregateFunction(max, UInt64)
)
ENGINE = AggregatingMergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (Timestamp, StatusCode, SpanKind, ServiceName);
CREATE MATERIALIZED VIEW default.metrics_rollup_1m_mv TO default.metrics_rollup_1m
(
`Timestamp` DateTime,
`ServiceName` LowCardinality(String),
`SpanKind` LowCardinality(String),
`version` LowCardinality(String),
`StatusCode` LowCardinality(String),
`count` UInt64,
`sum__Duration` Int64,
`avg__Duration` AggregateFunction(avg, UInt64),
`quantile__Duration` AggregateFunction(quantileTDigest(0.5), UInt64),
`min__Duration` SimpleAggregateFunction(min, UInt64),
`max__Duration` SimpleAggregateFunction(max, UInt64)
)
AS SELECT
toStartOfMinute(Timestamp) AS Timestamp,
ServiceName,
SpanKind,
StatusCode,
count() AS count,
sum(Duration) AS sum__Duration,
avgState(Duration) AS avg__Duration,
quantileTDigestState(0.5)(Duration) AS quantile__Duration,
minSimpleState(Duration) AS min__Duration,
maxSimpleState(Duration) AS max__Duration
FROM default.otel_traces
GROUP BY
Timestamp,
ServiceName,
SpanKind,
StatusCode;
```
```sql
CREATE TABLE default.span_kind_rollup_1m
(
`Timestamp` DateTime,
`ServiceName` LowCardinality(String),
`SpanKind` LowCardinality(String),
`histogram__Duration` AggregateFunction(histogram(20), UInt64)
)
ENGINE = AggregatingMergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (Timestamp, ServiceName, SpanKind);
CREATE MATERIALIZED VIEW default.span_kind_rollup_1m_mv TO default.span_kind_rollup_1m
(
`Timestamp` DateTime,
`ServiceName` LowCardinality(String),
`SpanKind` LowCardinality(String),
`histogram__Duration` AggregateFunction(histogram(20), UInt64)
)
AS SELECT
toStartOfMinute(Timestamp) AS Timestamp,
ServiceName,
SpanKind,
histogramState(20)(Duration) AS histogram__Duration
FROM default.otel_traces
GROUP BY
Timestamp,
ServiceName,
SpanKind;
```
</details>
Then you'll need to configure the materialized views in your source settings:
<details>
<summary>Source Configuration (should auto-infer when MVs are selected)</summary>
<img width="949" height="1011" alt="Screenshot 2025-12-19 at 10 26 54 AM" src="https://github.com/user-attachments/assets/fc46a1b9-de8b-4b95-a8ef-ba5fee905685" />
</details>
Improves the Chart Explorer page to only run the sample query when the accordion is open and visible. Also changes the accordion's default state to closed, since it's below the fold.
@pulpdrew assigned to you as you originally observed this issue
Demo:
https://github.com/user-attachments/assets/6108323a-767f-4e9f-88cf-4b9e2de9def1
Fixes HDX-2895
Closes HDX-3015
# Summary
This PR adds custom filters to the services dashboard.
Notes:
- These filters are per-source, per-dashboard. Different sources have different schemas, so we must store them per-source to avoid invalid filters being available for some sources.
- These filters are stored in a new collection in MongoDB (PresetDashboardFilters) and accessed via a new set of CRUD APIs
- The UI is 99% re-used from the existing custom dashboard filters
## Demo
https://github.com/user-attachments/assets/82a4a55f-9b8b-46eb-be24-82254a86eed3
Enables broader testing
Fixes: HDX-3069
To test:
- By default `make e2e` runs playwright tests with a docker compose for mongo
- To test the local-only mode, run `make e2e local=true`
- Since we manage play.hyperdx.io, I envision us running both commands on release
Closes HDX-3033
# Summary
This PR fixes four bugs in the Services Dashboard:
1. When using CTEs in chart configs, as we do on the HTTP and Databases tabs, there were frequent console errors as we tried to `DESCRIBE` the CTE names, to support the materialized columns optimization. With this PR, we no longer try to DESCRIBE CTEs, by skipping the materialized column optimization for configs without a `from.databaseName`.
2. Previously, the Request Throughput chart would reload whenever switching the Request Error Rate chart from `Overall` to `By Endpoint`. This was because the `displayType` in the Request Throughput chart was based on the toggle state, despite being unrelated. Now, the displayType of the Request Throughput chart is constant, eliminating the extra refetch.
3. Previously, when switching to the Services dashboard with a non-Trace Source ID in the URL params, the Services dashboard would initially be empty, then after toggling to a Trace Source, queries would briefly be issued against the non-Trace source (they would fail and/or be cancelled a moment later). Now, non-Trace sources are filtered out so that a Trace source is chosen as the default, and non-Trace sources are not queried.
4. Previously, we were spreading the entirety of `...source` into each config, which resulted in `metricTables` being in the config under particular circumstances (HDX-3035), which in turn caused errors from renderChartConfig. This has been fixed by `pick`ing only the fields we need from source.
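Fix (4) can be sketched as follows. The `Source` shape, the field list, and `toBaseConfig` are hypothetical and heavily simplified; only the idea of `pick`ing fields rather than spreading `...source` comes from this PR.

```typescript
// Hypothetical, simplified source shape; real sources have more fields.
interface Source {
  id: string;
  connection: string;
  timestampValueExpression: string;
  from: { databaseName: string; tableName: string };
  metricTables?: Record<string, string>;
}

// Small typed pick: copies only the listed keys onto a new object.
function pick<T extends object, K extends keyof T>(
  obj: T,
  keys: readonly K[],
): Pick<T, K> {
  const out = {} as Pick<T, K>;
  for (const key of keys) {
    out[key] = obj[key];
  }
  return out;
}

// Build the base of a chart config from an explicit allow-list of fields,
// so unrelated fields such as metricTables can never leak in.
function toBaseConfig(source: Source) {
  return pick(source, ["connection", "timestampValueExpression", "from"]);
}
```

An allow-list also means that fields added to the source later do not silently change the chart configs.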
- ignore next-env.d.ts from eslint - it's throwing errors but shouldn't be modified by the user
- suppress the logging changes from the next upgrade; it currently logs all api calls, but this isn't necessary because pino logs on the backend
This PR removes bootstrap-icons entirely from the app. It also adds an eslint plugin that detects uses and throws an error, which will help in the immediate short term with PRs in flight and merging downstream.
Fixes HDX-3050
- Multiple workflow runs can now run in parallel for different commits
- The release job (Docker builds) won't be cancelled once it starts
- New commits will queue their release jobs to run after the current one finishes (due to the concurrency group per matrix item)
Ref: HDX-3008
There were two issues with the log rotation script:
1. Logs could be lost since copying and then truncating the file might not finish before logs arrive.
2. The otel collector application keeps the file handle and offset cached. After truncating, it writes starting at the last offset, leaving unallocated garbage at the beginning of the file. This garbage uses space.
This commit moves the file instead of copying it. That allows the collector to continue writing to the rolled file until a SIGHUP is sent. The SIGHUP causes a config refresh, which also opens a new log file. Afterwards, the rolled file and the new log file have the correct sizes.
--
**ADDITIONAL NOTES**:
Claude's code review is not accurate here.
* The alpine image is based on busybox and fuser is a command implemented by busybox. This can be verified by just running the collector and watching the log rotate behavior.
* The mv command updates the name of the file in the file system but doesn't change the inode number. A process only uses the file path the first time the file is opened, to resolve it into an inode number. Moving the file changes the name but not the inode number, so the process will continue to write to that file.
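The inode behavior can be checked with a small Node sketch. It assumes a POSIX filesystem and a rename within one filesystem; the file names are made up, and this is a demonstration rather than the rotation script itself.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Shows that an already-open file descriptor follows the inode, not the path.
function writeAcrossRename(): {
  sameInode: boolean;
  rolledContent: string;
  oldPathExists: boolean;
} {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "rotate-demo-"));
  const livePath = path.join(dir, "collector.log");
  const rolledPath = path.join(dir, "collector.log.1");

  // The path is resolved to an inode once, when the file is opened.
  const fd = fs.openSync(livePath, "a");
  fs.writeSync(fd, "before rename\n");
  const inodeBefore = fs.fstatSync(fd).ino;

  // Equivalent to `mv` within one filesystem: only the name changes.
  fs.renameSync(livePath, rolledPath);
  // This write still lands in the renamed file.
  fs.writeSync(fd, "after rename\n");

  const result = {
    sameInode: inodeBefore === fs.statSync(rolledPath).ino,
    rolledContent: fs.readFileSync(rolledPath, "utf8"),
    oldPathExists: fs.existsSync(livePath),
  };
  fs.closeSync(fd);
  fs.rmSync(dir, { recursive: true, force: true });
  return result;
}
```

Until the writer reopens the path (here, on SIGHUP), nothing is written to a new `collector.log`, which matches the behavior described above.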
We should be able to send most chart series queries all at once and view the results as the data comes in. This also ensures the data arrives in order.
This is only enabled for the histogram on DBSearchPage so far.
Closes HDX-3051
Some queries benefit from being windowed, but do not require being run in series. The histogram on top of the search page is the perfect example! This PR enables that query to be run in parallel. It does essentially the same thing as the code that runs it in series, but wraps in a `Promise.all`.
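As a rough sketch of the idea (not the code in this PR), the windows can be computed up front and then issued together. `splitIntoWindows`, `runWindowsInParallel`, and the `runQuery` callback are hypothetical names.

```typescript
interface TimeWindow {
  start: number;
  end: number;
}

// Split [start, end) into consecutive windows of at most windowMs each.
function splitIntoWindows(
  start: number,
  end: number,
  windowMs: number,
): TimeWindow[] {
  if (windowMs <= 0) {
    throw new Error("windowMs must be positive");
  }
  const windows: TimeWindow[] = [];
  for (let t = start; t < end; t += windowMs) {
    windows.push({ start: t, end: Math.min(t + windowMs, end) });
  }
  return windows;
}

// Issue every window's query at once. Promise.all resolves to the results
// in input order, regardless of which query finishes first.
async function runWindowsInParallel<T>(
  windows: readonly TimeWindow[],
  runQuery: (w: TimeWindow) => Promise<T>,
): Promise<T[]> {
  return Promise.all(windows.map((w) => runQuery(w)));
}
```

Note that `Promise.all` rejects as soon as any window fails, so a caller that wants partial results would need `Promise.allSettled` instead.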
View the video here and compare the speed of loading a histogram. Compare by testing play.hyperdx.io and then the published preview.
https://github.com/user-attachments/assets/d519f643-9d84-4ed0-a8b7-84570f80a58a
Closes HDX-3030
Closes HDX-3009
# Summary
This PR updates the Root Spans Only filter to support a materialized `isRootSpan` column, which is expected to be present on some trace table schemas. If that column is present, then the `Root Spans Only` filter will add a filter like `isRootSpan IN (TRUE)` to the query, instead of the default `ParentSpanId IN ('')`. The UI has also been updated to support displaying and pinning boolean filter values.
**Note:** We still only query string type filter values, so we won't show isRootSpan unless Root Spans Only has been toggled.
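The column-dependent choice described above can be sketched as follows. `rootSpanCondition` is a hypothetical helper that assumes the caller already knows the table's column names; the two SQL fragments are the ones named in this PR.

```typescript
// Hypothetical helper: picks the Root Spans Only condition based on whether
// the table exposes a materialized isRootSpan column.
function rootSpanCondition(columnNames: readonly string[]): string {
  return columnNames.includes("isRootSpan")
    ? "isRootSpan IN (TRUE)"
    : "ParentSpanId IN ('')";
}
```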
<details>
<summary>I confirmed that if `isRootSpan` is in the ordering key, then this new condition will utilize the key to prune granules:</summary>
<img width="1678" height="1065" alt="Screenshot 2025-12-11 at 11 54 35 AM" src="https://github.com/user-attachments/assets/e22ae689-25a9-4d6b-b0f6-cc8f8396c35b" />
</details>
## Demo
<details>
<summary>For a source with the isRootSpan filter</summary>
https://github.com/user-attachments/assets/ccc7a890-b16e-4de6-bbb9-295fb10aa214
</details>
<details>
<summary>For a source without the isRootSpan filter (no change)</summary>
https://github.com/user-attachments/assets/33d4dd0a-136a-4284-812c-ddd12e67246e
</details>
- Improves the common-utils build process so the server is ready immediately when started. Currently, common-utils hasn't finished building when the server starts, so the server starts, crashes, then restarts correctly after the build. Now it runs as expected on the first try.
- Adds support for `.env.local` so you can easily provide secret keys without always passing them in via the CLI
- These features already exist downstream, but they seem necessary for OSS as well.