Commit graph

23 commits

Author SHA1 Message Date
Dan Hable
3bc5abbfb1
fix: reject wrapped toStartOf expressions in parseToStartOfFunction (#1876)
## Summary

- Fixes a bug where `parseToStartOfFunction` incorrectly parsed `toStartOf` expressions nested inside wrapper functions (e.g. `-toInt64(toStartOfInterval(timestamp, toIntervalMinute(15)))`), causing invalid SQL with missing arguments in the time filter WHERE clause.
- Adds a prefix guard that returns `undefined` when `toStartOf` is not the outermost function, preventing the broken optimization from being applied.

Fixes [HDX-3662](https://linear.app/clickhouse/issue/HDX-3662/invalid-sql-generated-for-tables-with-wrapped-tostartof-in-order-by)

## Test plan

- [x] Added `parseToStartOfFunction` unit tests for wrapped expressions (`-toInt64(...)`, `negate(...)`)
- [x] Added `optimizeTimestampValueExpression` unit test for primary key with wrapped `toStartOf`
- [x] Added `timeFilterExpr` unit test verifying no broken compound filter is generated
- [x] Added `renderChartConfig` snapshot test against the full input schema from the bug report
- [x] All 637 unit tests pass


Made with [Cursor](https://cursor.com)
2026-03-10 19:00:08 +00:00
Drew Davis
fd9f290e2a
feat: Add query params, sorting, and placeholders to Raw-SQL tables (#1857)
Closes HDX-3580
Closes HDX-3034

## Summary

This PR enhances the support for SQL-driven tables

1. The selected date range can now be referenced in the query with query params. The available query params are outlined in a new section above the SQL input.
2. The table no longer OOMs on large result sets (it is now truncated to the first 10k results), or crashes when selecting columns that are Map or JSON type
3. The table can now be sorted client-side for sql-driven tables
4. There is now placeholder / example SQL in the input

### Screenshots or video

https://github.com/user-attachments/assets/4f39fd0a-d33e-4f8c-9e91-84143d23e293

### How to test locally or on Vercel

This feature can be tested locally or on the preview environment without any special toggles.

### References



- Linear Issue: HDX-3580
- Related PRs:
2026-03-06 16:05:34 +00:00
Drew Davis
32f1189a7d
feat: Add RawSqlChartConfig types for SQL-based Table (#1846)
## Summary



This PR is the first step towards raw SQL-driven charts. 
- It introduces updated ChartConfig types, which are now unions of `BuilderChartConfig` (which is unchanged from the current `ChartConfig` types` and `RawSqlChartConfig` types which represent sql-driven charts. 
- It adds _very basic_ support for SQL-driven tables in the Chart Explorer and Dashboard pages. This is currently behind a feature toggle and enabled only in preview environments and for local development.

The changes in most of the files in this PR are either type updates or the addition of type guards to handle the new ChartConfig union type. 

The DBEditTimeChartForm has been updated significantly to (a) add the Raw SQL option to the table chart editor and (b) handle conversion from internal form state (which can now include properties from either branch of the ChartConfig union) to valid SavedChartConfigs (which may only include properties from one branch).

Significant changes are in:
- packages/app/src/components/ChartEditor/types.ts
- packages/app/src/components/ChartEditor/RawSqlChartEditor.tsx
- packages/app/src/components/ChartEditor/utils.ts
- packages/app/src/components/DBEditTimeChartForm.tsx
- packages/app/src/components/DBTableChart.tsx
- packages/app/src/components/SQLEditor.tsx
- packages/app/src/hooks/useOffsetPaginatedQuery.tsx

Future PRs will add templating to the Raw SQL driven charts for date range and granularity injection; support for other chart types driven by SQL; improved placeholder, validation, and error states; and improved support in the external API and import/export.

### Screenshots or video

https://github.com/user-attachments/assets/008579cc-ef3c-496e-9899-88bbb21eaa5e

### How to test locally or on Vercel

The SQL-driven table can be tested in the preview environment or locally. 

### References



- Linear Issue: HDX-3580
- Related PRs:
2026-03-05 20:30:58 +00:00
Drew Davis
4a85617320
feat: Add hasAllTokens for text index support (#1637)
Closes HDX-3245

# Summary

This PR updates the Lucene to SQL compilation process to generate conditions using `hasAllTokens` when the target column has a text index defined.

`hasAllTokens` has a couple of limitations which are solved for:

1. The `needle` argument must be no more than 64 tokens, or `hasAllTokens` will error. To support search terms with more than 64 tokens, terms are first broken up into batches of 50 tokens, each batch is passed to a separate `hasAllTokens` call. When multiple `hasAllTokens` calls are used, we also use substring matching `lower(Body) LIKE '%term with many tokens...%'`.
2. `hasAllTokens` may only be used when `enable_full_text_index = 1`.  The existence of a text index does not guarantee that `enable_full_text_index = 1`, since the text index could have been created with a query that explicitly specified `SETTINGS enable_full_text_index = 1`. We cannot set this option in every query HyperDX makes, because the setting was not available prior to v25.12. To solve for this, we check the value of `enable_full_text_index` in `system.settings`, and only use `hasAllTokens` if the setting exists and is enabled.

## Testing Setup

### Enable Full Text Index

First, make sure you're running at least ClickHouse 25.12.

Then, update the ClickHouse `users.xml`'s default profile with the following (or otherwise update your user's profile):
```xml
<clickhouse>
    <profiles>
        <default>
            ...
            <enable_full_text_index>1</enable_full_text_index>
        </default>
    </profiles>
    ...
<clickhouse>
```

### Add a Full Text Index

```sql
ALTER TABLE otel_logs ADD INDEX text_idx(Body) 
	TYPE text(tokenizer=splitByNonAlpha, preprocessor=lower(Body))
	SETTINGS enable_full_text_index=1;

ALTER TABLE otel_logs MATERIALIZE INDEX text_idx;
```

## Limitations

1. We currently only support the `splitByNonAlpha` tokenizer. If the text index is created with a different tokenizer, `hasAllTokens` will not be used. If needed, this limitation can be removed in the future by implementing `tokenizeTerm`, `termContainsSeparators`, and token batching logic specific to the other tokenizers.
2. This requires the latest (Beta) version of the full text index and related setting, available in ClickHouse v25.12.
2026-01-22 19:50:08 +00:00
Karl Power
bc8c4eec9a
feat: allow applying session settings to queries (#1609)
Closes HDX-3154

This PR adds a feature that allows the user to add settings to a source. These settings are then added to the end of every query that is rendered through the `renderChartConfig` function, along with any other chart specific settings. 

See: https://clickhouse.com/docs/sql-reference/statements/select#settings-in-select-query

Most of the work was to pass the `source` or `source.querySettings` value through the code to the `renderChartConfig` calls and to update the related tests. There are also some UI changes in the `SourceForm` components.

`SQLParser.Parser` from the `node-sql-parser` throws an error when it encounters a SETTINGS clause in a sql string, so a function was added to remove that clause from any sql that is passed to the parser. It assumes that the SETTINGS clause will always be at the end of the sql string, it removes any part of the string including and after the SETTINGS clause.


https://github.com/user-attachments/assets/7ac3b852-2c86-4431-88bc-106f982343bb
2026-01-21 16:07:30 +00:00
Mike Shi
00854da8e2
Add support for full text search on bloom_filter(tokens(..)) (#1620)
Resolves HDX-1977

Co-authored-by: Drew Davis <6097246+pulpdrew@users.noreply.github.com>
2026-01-21 14:55:02 +00:00
Sam Garfinkel
ca693c0f56
feat: add support for visualizing histogram counts (#1477)
Adds support for histogram `count` aggregations. This partially resolves https://github.com/hyperdxio/hyperdx/issues/1441, which should probably be split into a new ticket to only address `sum`.

As part of this, I also moved the translation functionality for histograms to a new file `histogram.ts` to avoid contributing even more bloat to `renderChartConfig`. Happy to revert this and move that stuff back into the file if that's preferred.

I also noticed by doing this that there was actually a SQL error in the snapshots for the tests--the existing quantile test was missing a trailing `,` after the time bucket if no group was provided https://github.com/hyperdxio/hyperdx/blob/main/packages/common-utils/src/__tests__/__snapshots__/renderChartConfig.test.ts.snap#L194 so centralizing like this is probably desirable to keep things consistent.

I also personally use webstorm so I added that stuff to the gitignore.
2025-12-23 22:56:20 +00:00
Drew Davis
a5a04aa92c
feat: Add materialized view support (Beta) (#1507)
Closes HDX-3082

# Summary

This PR back-ports support for materialized views from the EE repo. Note that this feature is in **Beta**, and is subject to significant changes.

This feature is intended to support:

1. Configuring AggregatingMergeTree (or SummingMergeTree) Materialized Views which are associated with a Source
2. Automatically selecting and querying an associated materialized view when a query supports it, in Chart Explorer, Custom Dashboards, the Services Dashboard, and the Search Page Histogram.
3. A UX for understanding what materialized views are available for a source, and whether (and why) it is or is not being used for a particular visualization.

## Note to Reviewer(s)

This is a large PR, but the code has largely already been reviewed.

- For net-new files, types, components, and utility functions, the code does not differ from the EE repo
- Changes to the various services dashboard pages do not differ from the EE repo
- Changes to `useOffsetPaginatedQuery`, `useChartConfig`, and `DBEditTimeChart` differ slightly due to unrelated (to MVs) drift between this repo and the EE repo, and due to the lack of feature toggles in this repo. **This is where slightly closer review would be most valuable.**

## Demo

<details>
<summary>Demo: MV Configuration</summary>

https://github.com/user-attachments/assets/fedf3bcf-892c-4b8d-a788-7e231e23bcc3
</details>

<details>
<summary>Demo: Chart Explorer</summary>

https://github.com/user-attachments/assets/fc8d1efa-7edc-42fc-98f0-75431cc056b8
</details>

<details>
<summary>Demo: Dashboards</summary>

https://github.com/user-attachments/assets/f3cb247e-711f-4d90-95b8-cf977e94f065
</details>

## Known Limitations

This feature is in Beta due to the following known limitations, which will be addressed in subsequent PRs:

1. Visualization start and end time, when not aligned with the granularity of MVs, will result in statistics based on the MV "time buckets" which fall inside the date range. This may not align exactly with the source table data which is in the selected date range.
2. Alerts do not make use of MVs, even if the associated visualization does. Due to (1), this means that alert values may not exactly match the values shown in the associated visualization.

## Differences in OSS vs EE Support

 - In OSS, there is a beta label on the MV configurations section
 - In EE there are feature toggles to enable MV support, in OSS the feature is enabled for all teams, but will only run for sources with MVs configured.

## Testing

To test, a couple of MVs can be created on the default `otel_traces` table, directly in ClickHouse:

<details>

<summary>Example MVs DDL</summary>

```sql
CREATE TABLE default.metrics_rollup_1m
(
    `Timestamp` DateTime,
    `ServiceName` LowCardinality(String),
    `SpanKind` LowCardinality(String),
    `StatusCode` LowCardinality(String),
    `count` SimpleAggregateFunction(sum, UInt64),
    `sum__Duration` SimpleAggregateFunction(sum, UInt64),
    `avg__Duration` AggregateFunction(avg, UInt64),
    `quantile__Duration` AggregateFunction(quantileTDigest(0.5), UInt64),
    `min__Duration` SimpleAggregateFunction(min, UInt64),
    `max__Duration` SimpleAggregateFunction(max, UInt64)
)
ENGINE = AggregatingMergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (Timestamp, StatusCode, SpanKind, ServiceName);

CREATE MATERIALIZED VIEW default.metrics_rollup_1m_mv TO default.metrics_rollup_1m
(
    `Timestamp` DateTime,
    `ServiceName` LowCardinality(String),
    `SpanKind` LowCardinality(String),
    `version` LowCardinality(String),
    `StatusCode` LowCardinality(String),
    `count` UInt64,
    `sum__Duration` Int64,
    `avg__Duration` AggregateFunction(avg, UInt64),
    `quantile__Duration` AggregateFunction(quantileTDigest(0.5), UInt64),
    `min__Duration` SimpleAggregateFunction(min, UInt64),
    `max__Duration` SimpleAggregateFunction(max, UInt64)
)
AS SELECT
    toStartOfMinute(Timestamp) AS Timestamp,
    ServiceName,
    SpanKind,
    StatusCode,
    count() AS count,
    sum(Duration) AS sum__Duration,
    avgState(Duration) AS avg__Duration,
    quantileTDigestState(0.5)(Duration) AS quantile__Duration,
    minSimpleState(Duration) AS min__Duration,
    maxSimpleState(Duration) AS max__Duration
FROM default.otel_traces
GROUP BY
    Timestamp,
    ServiceName,
    SpanKind,
    StatusCode;
```

```sql
CREATE TABLE default.span_kind_rollup_1m
(
    `Timestamp` DateTime,
    `ServiceName` LowCardinality(String),
    `SpanKind` LowCardinality(String),
    `histogram__Duration` AggregateFunction(histogram(20), UInt64)
)
ENGINE = AggregatingMergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (Timestamp, ServiceName, SpanKind);

CREATE MATERIALIZED VIEW default.span_kind_rollup_1m_mv TO default.span_kind_rollup_1m
(
    `Timestamp` DateTime,
    `ServiceName` LowCardinality(String),
    `SpanKind` LowCardinality(String),
    `histogram__Duration` AggregateFunction(histogram(20), UInt64)
)
AS SELECT
    toStartOfMinute(Timestamp) AS Timestamp,
    ServiceName,
    SpanKind,
    histogramState(20)(Duration) AS histogram__Duration
FROM default.otel_traces
GROUP BY
    Timestamp,
    ServiceName,
    SpanKind;
```
</details>

Then you'll need to configure the materialized views in your source settings:

<details>
<summary>Source Configuration (should auto-infer when MVs are selected)</summary>

<img width="949" height="1011" alt="Screenshot 2025-12-19 at 10 26 54 AM" src="https://github.com/user-attachments/assets/fc46a1b9-de8b-4b95-a8ef-ba5fee905685" />

</details>
2025-12-19 16:17:23 +00:00
Brandon Pereira
8dee21c81a
Minor Heatmap Improvements (#1317)
Small improvements to heatmap logic:

1. Improve the logic around filtering the outliers. Previously it was hardcoded to Duration, now it will correctly use the `Value` from the user. If the value contains an aggregate function, it will also perform a CTE to properly calculate.
1. If the outliers query fails, we show the user the query error
2. We prioritize the outlier keys in the event deltas view over inliers (this was how it was before, now it only includes inliers if no outliers are found)
3. Ensure the autocomplete suggestions are displayed (there was a zindex issue)
2025-11-04 21:48:39 +00:00
Brandon Pereira
43dfb3aaff
chore to move critical path files (#1314)
moves them into a core folder, this allows us to easily track when core files are modified via path

no changeset because no version bump required

fixes HDX-2589
2025-10-30 15:16:33 +00:00
Drew Davis
2162a69039
feat: Optimize and fix filtering on toStartOfX primary key expressions (#1265)
Closes HDX-2576
Closes HDX-2491

# Summary

It is a common optimization to have a primary key like `toStartOfDay(Timestamp), ..., Timestamp`. This PR improves the experience when using such a primary key in the following ways:

1. HyperDX will now automatically filter on both `toStartOfDay(Timestamp)` and `Timestamp` in this case, instead of just `Timestamp`. This improves performance by better utilizing the primary index. Previously, this required a manual change to the source's Timestamp Column setting.
2. HyperDX now applies the same `toStartOfX` function to the right-hand-side of timestamp comparisons. So when filtering using an expression like `toStartOfDay(Timestamp)`, the generated SQL will have the condition `toStartOfDay(Timestamp) >= toStartOfDay(<selected start time>) AND toStartOfDay(Timestamp) <= toStartOfDay(<selected end time>)`. This resolves an issue where some data would be incorrectly filtered out when filtering on such timestamp expressions (such as time ranges less than 1 minute).

With this change, teams should no longer need to have multiple columns in their source timestamp column configuration. However, if they do, they will now have correct filtering.

## Testing

### Testing the fix

The part of this PR that fixes time filtering can be tested with the default logs table schema. Simply set the Timestamp Column source setting to `TimestampTime, toStartOfMinute(TimestampTime)`. Then, in the logs search, filter for a timespan < 1 minute.

<details>
<summary>Without the fix, you should see no logs, since they're incorrectly filtered out by the toStartOfMinute(TimestampTime) filter</summary>

https://github.com/user-attachments/assets/915d3922-55f8-4742-b686-5090cdecef60
</details>

<details>
<summary>With the fix, you should see logs in the selected time range</summary>

https://github.com/user-attachments/assets/f75648e4-3f48-47b0-949f-2409ce075a75
</details>

### Testing the optimization

The optimization part of this change is that when a table has a primary key like `toStartOfMinute(TimestampTime), ..., TimestampTime` and the Timestamp Column for the source is just `Timestamp`, the query will automatically filter by both  `toStartOfMinute(TimestampTime)` and `TimestampTime`.

To test this, you'll need to create a table with such a primary key, then create a source based on that table. Optionally, you could copy data from the default `otel_logs` table into the new table (`INSERT INTO default.otel_logs_toStartOfMinute_Key SELECT * FROM default.otel_logs`).

<details>
<summary>DDL for log table with optimized key</summary>

```sql
CREATE TABLE default.otel_logs_toStartOfMinute_Key
(
    `Timestamp` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
    `TimestampTime` DateTime DEFAULT toDateTime(Timestamp),
    `TraceId` String CODEC(ZSTD(1)),
    `SpanId` String CODEC(ZSTD(1)),
    `TraceFlags` UInt8,
    `SeverityText` LowCardinality(String) CODEC(ZSTD(1)),
    `SeverityNumber` UInt8,
    `ServiceName` LowCardinality(String) CODEC(ZSTD(1)),
    `Body` String CODEC(ZSTD(1)),
    `ResourceSchemaUrl` LowCardinality(String) CODEC(ZSTD(1)),
    `ResourceAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `ScopeSchemaUrl` LowCardinality(String) CODEC(ZSTD(1)),
    `ScopeName` String CODEC(ZSTD(1)),
    `ScopeVersion` LowCardinality(String) CODEC(ZSTD(1)),
    `ScopeAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `LogAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `__hdx_materialized_k8s.pod.name` String MATERIALIZED ResourceAttributes['k8s.pod.name'] CODEC(ZSTD(1)),
    INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1,
    INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_scope_attr_key mapKeys(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_scope_attr_value mapValues(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_log_attr_key mapKeys(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_log_attr_value mapValues(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8,
    INDEX idx_lower_body lower(Body) TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8
)
ENGINE = SharedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
PARTITION BY toDate(TimestampTime)
PRIMARY KEY (toStartOfMinute(TimestampTime), ServiceName, TimestampTime)
ORDER BY (toStartOfMinute(TimestampTime), ServiceName, TimestampTime, Timestamp)
TTL TimestampTime + toIntervalDay(90)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
```
</details>

Once you have that source, you can inspect the queries generated for that source. Whenever a date range filter is selected, the query should have a `WHERE` predicate that filters on both `TimestampTime` and `toStartOfMinute(TimestampTime)`, despite `toStartOfMinute(TimestampTime)` not being included in the Timestamp Column of the source's configuration.
2025-10-27 17:20:36 +00:00
Warren
43e32aafc7
feat: revisit Otel metrics semantic convention migration logics (#1267)
Since users can still switch to the new metric name using feature gate

Follow up https://github.com/hyperdxio/hyperdx/pull/1248
2025-10-14 22:06:31 +00:00
Warren
5efa2ffa0d
feat: handle k8s metrics semantic convention updates (#1248)
Handle OpenTelemetry semantic versions based on the ScopeVersion field (metrics)
Related to [changes](https://opentelemetry.io/blog/2025/kubeletstats-receiver-metrics-deprecation/)

Old (switched to v0.137.0)
<img width="818" height="317" alt="image" src="https://github.com/user-attachments/assets/ceea52c6-ad06-4295-afae-a44f21b2e962" />

New (be able to handle multiple versions)
<img width="568" height="329" alt="image" src="https://github.com/user-attachments/assets/d2e282b2-cfd7-490a-a64d-502881a360a2" />


Ref: HDX-2322, HDX-2562
2025-10-08 21:04:40 +00:00
Drew Davis
fa45875d38
feat: Add delta() function for gauge metrics (#1147) 2025-09-11 17:10:43 -04:00
Aaron Knudtson
fa7875c427
feat: add Summary and Exponential Histogram metrics to source form (#832) 2025-05-21 13:22:04 -04:00
Dan Hable
96b8c50898
fix(metrics): fix histogram metric query (#823)
Fix the query to address issues with the value calculation as well as allow for grouping.

Ref: HDX-1726
2025-05-19 14:47:24 +00:00
Dan Hable
b9f7d32efa
refactor: clean up the chart config CTE render logic (#686)
Some additional refactoring and testing around the more complex CTE rendering.

Ref: HDX-1511
2025-03-17 14:45:26 +00:00
Dan Hable
a9dfa14930
fix: use CTE instead of listing all index parts in query (#666)
## feat: allow CTE definitions to be nested chart configs

In order to easily use a CTE for fixing large index issues with delta
trace events, this commit updates the type and `renderWith` function to
render a nested chart config.

Ref: HDX-1343

---

## fix: use CTE instead of listing all index parts in query

Instead of sending 2 queries to the DB and enumerating all of parts
and offsets in the query, this change uses a CTE to select the parts.
This reduces the size of the HTTP request, which fixes the URI too
long response.

Ref: HDX-1343
2025-03-14 13:34:47 +00:00
Dan Hable
99b60d50b2
fix: update sum metric query based on v1 integration test (#650)
Fix the sum query to produce the correct results from the min/max test case from v1.

Ref: HDX-1421
2025-03-07 07:03:03 +00:00
Warren
cd0e4fd71c
fix: correct handling of gauge metrics in renderChartConfig (#654) 2025-03-06 00:06:57 +00:00
Dan Hable
e80630c107
feat: supporting quantile histogram metrics (#635)
Additional `renderChartConfig` support to transform a histogram select into the correct SQL syntax to generate a chart. For parity with v1, this query only handles quantile queries.

<img width="1939" alt="Screenshot 2025-02-26 at 12 58 55 PM" src="https://github.com/user-attachments/assets/1126ac6c-c431-4d89-92d7-9df1e49e25cf" />

<img width="1960" alt="Screenshot 2025-02-26 at 3 11 07 PM" src="https://github.com/user-attachments/assets/e4fa09bf-1e27-4a90-ad25-6c6cb2890877" />

Ref: HDX-1339
2025-02-27 16:55:36 +00:00
Tom Alexander
521793df2d
fix: Ensure group-by works with sum metrics (#636)
Adds all available columns into the query so that we can properly apply the group by clause.

Ref: HDX-1419
2025-02-27 15:51:23 +00:00
Warren
57a6bc399f
feat: BETA metrics support (sum + gauge) (#629)
<img width="1310" alt="Screenshot 2025-02-25 at 3 43 11 PM" src="https://github.com/user-attachments/assets/38c98bc2-2ff2-412c-b26d-4ed9952439f2" />


Co-authored-by: Mike Shi <2781687+MikeShi42@users.noreply.github.com>
Co-authored-by: Dan Hable <418679+dhable@users.noreply.github.com>
Co-authored-by: Tom Alexander <3245235+teeohhem@users.noreply.github.com>
2025-02-26 00:00:48 +00:00