feat: Optimize and fix filtering on toStartOfX primary key expressions (#1265)
Closes HDX-2576
Closes HDX-2491
# Summary
It is a common optimization to have a primary key like `toStartOfDay(Timestamp), ..., Timestamp`. This PR improves the experience when using such a primary key in the following ways:
1. HyperDX will now automatically filter on both `toStartOfDay(Timestamp)` and `Timestamp` in this case, instead of just `Timestamp`. This improves performance by better utilizing the primary index. Previously, this required a manual change to the source's Timestamp Column setting.
2. HyperDX now applies the same `toStartOfX` function to the right-hand side of timestamp comparisons. So when filtering using an expression like `toStartOfDay(Timestamp)`, the generated SQL will have the condition `toStartOfDay(Timestamp) >= toStartOfDay(<selected start time>) AND toStartOfDay(Timestamp) <= toStartOfDay(<selected end time>)`. This resolves an issue where some data was incorrectly filtered out when filtering on such timestamp expressions (for example, with time ranges of less than 1 minute).
With this change, teams should no longer need to have multiple columns in their source timestamp column configuration. However, if they do, they will now have correct filtering.
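The sub-minute bug fixed in (2) can be sketched with toy arithmetic (a hypothetical model over epoch seconds, not HyperDX code): the row's bucketed timestamp can sort before the raw range start even though the row itself is inside the range.

```typescript
// Toy model of ClickHouse's toStartOfMinute over epoch seconds (hypothetical).
const toStartOfMinute = (ts: number): number => ts - (ts % 60);

// A row logged at 12:00:10 falls into the 12:00:00 bucket.
const rowBucket = toStartOfMinute(12 * 3600 + 10);

// Selected range: 12:00:05 .. 12:00:40, a sub-minute window containing the row.
const start = 12 * 3600 + 5;
const end = 12 * 3600 + 40;

// Old behavior: bucketed column compared against raw bounds -> row is lost,
// because 12:00:00 < 12:00:05.
const oldMatch = rowBucket >= start && rowBucket <= end;

// Fixed behavior: the bucketing function is applied to both sides -> row is kept.
const newMatch =
  rowBucket >= toStartOfMinute(start) && rowBucket <= toStartOfMinute(end);

console.log(oldMatch, newMatch); // false true
```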
## Testing
### Testing the fix
The part of this PR that fixes time filtering can be tested with the default logs table schema. Simply set the Timestamp Column source setting to `TimestampTime, toStartOfMinute(TimestampTime)`. Then, in the logs search, filter for a timespan < 1 minute.
<details>
<summary>Without the fix, you should see no logs, since they're incorrectly filtered out by the toStartOfMinute(TimestampTime) filter</summary>
https://github.com/user-attachments/assets/915d3922-55f8-4742-b686-5090cdecef60
</details>
<details>
<summary>With the fix, you should see logs in the selected time range</summary>
https://github.com/user-attachments/assets/f75648e4-3f48-47b0-949f-2409ce075a75
</details>
### Testing the optimization
The optimization part of this change is that when a table has a primary key like `toStartOfMinute(TimestampTime), ..., TimestampTime` and the source's Timestamp Column is just `TimestampTime`, the query will automatically filter by both `toStartOfMinute(TimestampTime)` and `TimestampTime`.
To test this, you'll need to create a table with such a primary key, then create a source based on that table. Optionally, you could copy data from the default `otel_logs` table into the new table (`INSERT INTO default.otel_logs_toStartOfMinute_Key SELECT * FROM default.otel_logs`).
<details>
<summary>DDL for log table with optimized key</summary>
```sql
CREATE TABLE default.otel_logs_toStartOfMinute_Key
(
`Timestamp` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
`TimestampTime` DateTime DEFAULT toDateTime(Timestamp),
`TraceId` String CODEC(ZSTD(1)),
`SpanId` String CODEC(ZSTD(1)),
`TraceFlags` UInt8,
`SeverityText` LowCardinality(String) CODEC(ZSTD(1)),
`SeverityNumber` UInt8,
`ServiceName` LowCardinality(String) CODEC(ZSTD(1)),
`Body` String CODEC(ZSTD(1)),
`ResourceSchemaUrl` LowCardinality(String) CODEC(ZSTD(1)),
`ResourceAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
`ScopeSchemaUrl` LowCardinality(String) CODEC(ZSTD(1)),
`ScopeName` String CODEC(ZSTD(1)),
`ScopeVersion` LowCardinality(String) CODEC(ZSTD(1)),
`ScopeAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
`LogAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
`__hdx_materialized_k8s.pod.name` String MATERIALIZED ResourceAttributes['k8s.pod.name'] CODEC(ZSTD(1)),
INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1,
INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_scope_attr_key mapKeys(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_scope_attr_value mapValues(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_log_attr_key mapKeys(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_log_attr_value mapValues(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8,
INDEX idx_lower_body lower(Body) TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8
)
ENGINE = SharedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
PARTITION BY toDate(TimestampTime)
PRIMARY KEY (toStartOfMinute(TimestampTime), ServiceName, TimestampTime)
ORDER BY (toStartOfMinute(TimestampTime), ServiceName, TimestampTime, Timestamp)
TTL TimestampTime + toIntervalDay(90)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
```
</details>
Once you have that source, you can inspect the queries generated for that source. Whenever a date range filter is selected, the query should have a `WHERE` predicate that filters on both `TimestampTime` and `toStartOfMinute(TimestampTime)`, despite `toStartOfMinute(TimestampTime)` not being included in the Timestamp Column of the source's configuration.
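As a rough sketch (hypothetical helper and names, not the actual chSql-based implementation), the combined predicate for such a source could be assembled like this:

```typescript
// Hypothetical sketch of the dual time filter: the raw timestamp column plus a
// bucketed variant derived from the primary key, with the bucketing function
// applied to both sides of the bucketed comparisons.
const dualTimeFilter = (
  tsCol: string,
  bucketFn: string,
  start: string,
  end: string,
): string =>
  `${tsCol} >= ${start} AND ${tsCol} <= ${end} AND ` +
  `${bucketFn}(${tsCol}) >= ${bucketFn}(${start}) AND ` +
  `${bucketFn}(${tsCol}) <= ${bucketFn}(${end})`;

const where = dualTimeFilter(
  'TimestampTime',
  'toStartOfMinute',
  "toDateTime('2025-01-01 12:00:05')",
  "toDateTime('2025-01-01 12:00:40')",
);
console.log(where.includes('toStartOfMinute(TimestampTime)')); // true
```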
import { chSql, ColumnMeta, parameterizedQueryToSql } from '@/clickhouse';
import { Metadata } from '@/core/metadata';
import {
  ChartConfigWithOptDateRange,
  DisplayType,
  MetricsDataType,
} from '@/types';
import { renderChartConfig, timeFilterExpr } from '../core/renderChartConfig';

describe('renderChartConfig', () => {
  let mockMetadata: jest.Mocked<Metadata>;

  beforeEach(() => {
feat: Add materialized view support (Beta) (#1507)
Closes HDX-3082
# Summary
This PR back-ports support for materialized views from the EE repo. Note that this feature is in **Beta**, and is subject to significant changes.
This feature is intended to support:
1. Configuring AggregatingMergeTree (or SummingMergeTree) Materialized Views which are associated with a Source
2. Automatically selecting and querying an associated materialized view when a query supports it, in Chart Explorer, Custom Dashboards, the Services Dashboard, and the Search Page Histogram.
3. A UX for understanding what materialized views are available for a source, and whether (and why) a given view is or is not being used for a particular visualization.
## Note to Reviewer(s)
This is a large PR, but the code has largely already been reviewed.
- For net-new files, types, components, and utility functions, the code does not differ from the EE repo
- Changes to the various services dashboard pages do not differ from the EE repo
- Changes to `useOffsetPaginatedQuery`, `useChartConfig`, and `DBEditTimeChart` differ slightly due to unrelated (to MVs) drift between this repo and the EE repo, and due to the lack of feature toggles in this repo. **This is where slightly closer review would be most valuable.**
## Demo
<details>
<summary>Demo: MV Configuration</summary>
https://github.com/user-attachments/assets/fedf3bcf-892c-4b8d-a788-7e231e23bcc3
</details>
<details>
<summary>Demo: Chart Explorer</summary>
https://github.com/user-attachments/assets/fc8d1efa-7edc-42fc-98f0-75431cc056b8
</details>
<details>
<summary>Demo: Dashboards</summary>
https://github.com/user-attachments/assets/f3cb247e-711f-4d90-95b8-cf977e94f065
</details>
## Known Limitations
This feature is in Beta due to the following known limitations, which will be addressed in subsequent PRs:
1. When the visualization start and end times are not aligned with the MV granularity, statistics are based on the MV "time buckets" that fall inside the date range. This may not align exactly with the source table data in the selected date range.
2. Alerts do not make use of MVs, even if the associated visualization does. Due to (1), this means that alert values may not exactly match the values shown in the associated visualization.
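Limitation (1) can be illustrated with toy arithmetic over 1-minute buckets (an illustrative model, not the actual query logic): only whole buckets inside the range contribute, so a range that cuts across bucket boundaries is only partially covered by MV data.

```typescript
// 1-minute buckets over epoch seconds. A bucket "falls inside" the range when
// the whole [bucket, bucket + 60) window is within [start, end].
const bucketsInside = (start: number, end: number): number[] => {
  const out: number[] = [];
  const first = Math.ceil(start / 60) * 60;
  for (let b = first; b + 60 <= end; b += 60) out.push(b);
  return out;
};

// Range 12:00:30 .. 12:03:30 covers 180 s of raw data, but only two whole MV
// buckets (12:01 and 12:02), so MV-backed stats cover just 120 s of it.
const start = 12 * 3600 + 30;
const end = 12 * 3600 + 210;
console.log(bucketsInside(start, end).length); // 2
```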
## Differences in OSS vs EE Support
- In OSS, there is a Beta label on the MV configuration section
- In EE, feature toggles gate MV support; in OSS, the feature is enabled for all teams but only takes effect for sources with MVs configured.
## Testing
To test, a couple of MVs can be created on the default `otel_traces` table, directly in ClickHouse:
<details>
<summary>Example MVs DDL</summary>
```sql
CREATE TABLE default.metrics_rollup_1m
(
`Timestamp` DateTime,
`ServiceName` LowCardinality(String),
`SpanKind` LowCardinality(String),
`StatusCode` LowCardinality(String),
`count` SimpleAggregateFunction(sum, UInt64),
`sum__Duration` SimpleAggregateFunction(sum, UInt64),
`avg__Duration` AggregateFunction(avg, UInt64),
`quantile__Duration` AggregateFunction(quantileTDigest(0.5), UInt64),
`min__Duration` SimpleAggregateFunction(min, UInt64),
`max__Duration` SimpleAggregateFunction(max, UInt64)
)
ENGINE = AggregatingMergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (Timestamp, StatusCode, SpanKind, ServiceName);
CREATE MATERIALIZED VIEW default.metrics_rollup_1m_mv TO default.metrics_rollup_1m
(
`Timestamp` DateTime,
`ServiceName` LowCardinality(String),
`SpanKind` LowCardinality(String),
`StatusCode` LowCardinality(String),
`count` UInt64,
`sum__Duration` Int64,
`avg__Duration` AggregateFunction(avg, UInt64),
`quantile__Duration` AggregateFunction(quantileTDigest(0.5), UInt64),
`min__Duration` SimpleAggregateFunction(min, UInt64),
`max__Duration` SimpleAggregateFunction(max, UInt64)
)
AS SELECT
toStartOfMinute(Timestamp) AS Timestamp,
ServiceName,
SpanKind,
StatusCode,
count() AS count,
sum(Duration) AS sum__Duration,
avgState(Duration) AS avg__Duration,
quantileTDigestState(0.5)(Duration) AS quantile__Duration,
minSimpleState(Duration) AS min__Duration,
maxSimpleState(Duration) AS max__Duration
FROM default.otel_traces
GROUP BY
Timestamp,
ServiceName,
SpanKind,
StatusCode;
```
```sql
CREATE TABLE default.span_kind_rollup_1m
(
`Timestamp` DateTime,
`ServiceName` LowCardinality(String),
`SpanKind` LowCardinality(String),
`histogram__Duration` AggregateFunction(histogram(20), UInt64)
)
ENGINE = AggregatingMergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (Timestamp, ServiceName, SpanKind);
CREATE MATERIALIZED VIEW default.span_kind_rollup_1m_mv TO default.span_kind_rollup_1m
(
`Timestamp` DateTime,
`ServiceName` LowCardinality(String),
`SpanKind` LowCardinality(String),
`histogram__Duration` AggregateFunction(histogram(20), UInt64)
)
AS SELECT
toStartOfMinute(Timestamp) AS Timestamp,
ServiceName,
SpanKind,
histogramState(20)(Duration) AS histogram__Duration
FROM default.otel_traces
GROUP BY
Timestamp,
ServiceName,
SpanKind;
```
</details>
Then you'll need to configure the materialized views in your source settings:
<details>
<summary>Source Configuration (should auto-infer when MVs are selected)</summary>
<img width="949" height="1011" alt="Screenshot 2025-12-19 at 10 26 54 AM" src="https://github.com/user-attachments/assets/fc46a1b9-de8b-4b95-a8ef-ba5fee905685" />
</details>
    const columns = [
      { name: 'timestamp', type: 'DateTime' },
      { name: 'value', type: 'Float64' },
      { name: 'TraceId', type: 'String' },
      { name: 'ServiceName', type: 'String' },
    ];

    mockMetadata = {
      getColumns: jest.fn().mockResolvedValue([
        { name: 'timestamp', type: 'DateTime' },
        { name: 'value', type: 'Float64' },
      ]),
      getMaterializedColumnsLookupTable: jest.fn().mockResolvedValue(null),
      getColumn: jest
        .fn()
        .mockImplementation(async ({ column }) =>
          columns.find(col => col.name === column),
        ),
      getTableMetadata: jest
        .fn()
        .mockResolvedValue({ primary_key: 'timestamp' }),
    } as unknown as jest.Mocked<Metadata>;
  });
  const gaugeConfiguration: ChartConfigWithOptDateRange = {
    displayType: DisplayType.Line,
    connection: 'test-connection',
    // metricTables is added from the Source object via spread operator
    metricTables: {
      gauge: 'otel_metrics_gauge',
      histogram: 'otel_metrics_histogram',
      sum: 'otel_metrics_sum',
      summary: 'otel_metrics_summary',
      'exponential histogram': 'otel_metrics_exponential_histogram',
    },
    from: {
      databaseName: 'default',
      tableName: '',
    },
    select: [
      {
        aggFn: 'quantile',
        aggCondition: '',
        aggConditionLanguage: 'lucene',
        valueExpression: 'Value',
        level: 0.95,
        metricName: 'nodejs.event_loop.utilization',
        metricType: MetricsDataType.Gauge,
      },
    ],
    where: '',
    whereLanguage: 'lucene',
    timestampValueExpression: 'TimeUnix',
    dateRange: [new Date('2025-02-12'), new Date('2025-12-14')],
    granularity: '1 minute',
    limit: { limit: 10 },
  };
  it('should generate sql for a single gauge metric', async () => {
    const generatedSql = await renderChartConfig(
      gaugeConfiguration,
      mockMetadata,
    );
    const actual = parameterizedQueryToSql(generatedSql);
    expect(actual).toMatchSnapshot();
  });

  it('should generate sql for a single gauge metric with a delta() function applied', async () => {
    const generatedSql = await renderChartConfig(
      {
        ...gaugeConfiguration,
        select: [
          {
            aggFn: 'max',
            valueExpression: 'Value',
            metricName: 'nodejs.event_loop.utilization',
            metricType: MetricsDataType.Gauge,
            isDelta: true,
          },
        ],
      },
      mockMetadata,
    );

    const actual = parameterizedQueryToSql(generatedSql);
    expect(actual).toMatchSnapshot();
  });
  it('should generate sql for a single sum metric', async () => {
    const config: ChartConfigWithOptDateRange = {
      displayType: DisplayType.Line,
      connection: 'test-connection',
      // metricTables is added from the Source object via spread operator
      metricTables: {
        gauge: 'otel_metrics_gauge',
        histogram: 'otel_metrics_histogram',
        sum: 'otel_metrics_sum',
        summary: 'otel_metrics_summary',
        'exponential histogram': 'otel_metrics_exponential_histogram',
      },
      from: {
        databaseName: 'default',
        tableName: '',
      },
      select: [
        {
          aggFn: 'avg',
          aggCondition: '',
          aggConditionLanguage: 'lucene',
          valueExpression: 'Value',
          metricName: 'db.client.connections.usage',
          metricType: MetricsDataType.Sum,
        },
      ],
      where: '',
      whereLanguage: 'sql',
      timestampValueExpression: 'TimeUnix',
      dateRange: [new Date('2025-02-12'), new Date('2025-12-14')],
      granularity: '5 minute',
      limit: { limit: 10 },
    };
    const generatedSql = await renderChartConfig(config, mockMetadata);
    const actual = parameterizedQueryToSql(generatedSql);

    expect(actual).toMatchSnapshot();
  });
  it('should throw error for string select on sum metric', async () => {
    const config: ChartConfigWithOptDateRange = {
      displayType: DisplayType.Line,
      connection: 'test-connection',
      metricTables: {
        gauge: 'otel_metrics_gauge',
        histogram: 'otel_metrics_histogram',
        sum: 'otel_metrics_sum',
        summary: 'otel_metrics_summary',
        'exponential histogram': 'otel_metrics_exponential_histogram',
      },
      from: {
        databaseName: 'default',
        tableName: '',
      },
      select: 'Value',
      where: '',
      whereLanguage: 'sql',
      timestampValueExpression: 'TimeUnix',
      dateRange: [new Date('2025-02-12'), new Date('2025-12-14')],
      granularity: '5 minute',
      limit: { limit: 10 },
    };
    await expect(renderChartConfig(config, mockMetadata)).rejects.toThrow(
      'multi select or string select on metrics not supported',
    );
  });
describe ( 'histogram metric queries' , ( ) = > {
it ( 'should generate a query without grouping or time bucketing' , async ( ) = > {
const config : ChartConfigWithOptDateRange = {
displayType : DisplayType.Line ,
connection : 'test-connection' ,
metricTables : {
gauge : 'otel_metrics_gauge' ,
histogram : 'otel_metrics_histogram' ,
sum : 'otel_metrics_sum' ,
2025-05-21 17:22:04 +00:00
summary : 'otel_metrics_summary' ,
'exponential histogram' : 'otel_metrics_exponential_histogram' ,
2025-02-27 16:55:36 +00:00
} ,
2025-05-19 14:47:24 +00:00
from : {
databaseName : 'default' ,
tableName : '' ,
} ,
select : [
{
aggFn : 'quantile' ,
level : 0.5 ,
valueExpression : 'Value' ,
metricName : 'http.server.duration' ,
metricType : MetricsDataType.Histogram ,
} ,
] ,
where : '' ,
whereLanguage : 'sql' ,
timestampValueExpression : 'TimeUnix' ,
dateRange : [ new Date ( '2025-02-12' ) , new Date ( '2025-12-14' ) ] ,
limit : { limit : 10 } ,
} ;
const generatedSql = await renderChartConfig ( config , mockMetadata ) ;
const actual = parameterizedQueryToSql ( generatedSql ) ;
expect ( actual ) . toMatchSnapshot ( ) ;
} ) ;
it ( 'should generate a query without grouping but time bucketing' , async ( ) => {
const config : ChartConfigWithOptDateRange = {
displayType : DisplayType.Line ,
connection : 'test-connection' ,
metricTables : {
gauge : 'otel_metrics_gauge' ,
histogram : 'otel_metrics_histogram' ,
sum : 'otel_metrics_sum' ,
summary : 'otel_metrics_summary' ,
'exponential histogram' : 'otel_metrics_exponential_histogram' ,
} ,
from : {
databaseName : 'default' ,
tableName : '' ,
} ,
select : [
{
aggFn : 'quantile' ,
level : 0.5 ,
valueExpression : 'Value' ,
metricName : 'http.server.duration' ,
metricType : MetricsDataType.Histogram ,
} ,
] ,
where : '' ,
whereLanguage : 'sql' ,
timestampValueExpression : 'TimeUnix' ,
dateRange : [ new Date ( '2025-02-12' ) , new Date ( '2025-12-14' ) ] ,
granularity : '2 minute' ,
limit : { limit : 10 } ,
} ;
const generatedSql = await renderChartConfig ( config , mockMetadata ) ;
const actual = parameterizedQueryToSql ( generatedSql ) ;
expect ( actual ) . toMatchSnapshot ( ) ;
} ) ;
it ( 'should generate a query with grouping and time bucketing' , async ( ) => {
const config : ChartConfigWithOptDateRange = {
displayType : DisplayType.Line ,
connection : 'test-connection' ,
metricTables : {
gauge : 'otel_metrics_gauge' ,
histogram : 'otel_metrics_histogram' ,
sum : 'otel_metrics_sum' ,
summary : 'otel_metrics_summary' ,
'exponential histogram' : 'otel_metrics_exponential_histogram' ,
} ,
from : {
databaseName : 'default' ,
tableName : '' ,
} ,
select : [
{
aggFn : 'quantile' ,
level : 0.5 ,
valueExpression : 'Value' ,
metricName : 'http.server.duration' ,
metricType : MetricsDataType.Histogram ,
} ,
] ,
where : '' ,
whereLanguage : 'sql' ,
timestampValueExpression : 'TimeUnix' ,
dateRange : [ new Date ( '2025-02-12' ) , new Date ( '2025-12-14' ) ] ,
granularity : '2 minute' ,
groupBy : `ResourceAttributes['host']` ,
limit : { limit : 10 } ,
} ;
const generatedSql = await renderChartConfig ( config , mockMetadata ) ;
const actual = parameterizedQueryToSql ( generatedSql ) ;
expect ( actual ) . toMatchSnapshot ( ) ;
} ) ;
} ) ;
describe ( 'containing CTE clauses' , ( ) => {
it ( 'should render a ChSql CTE configuration correctly' , async ( ) => {
const config : ChartConfigWithOptDateRange = {
connection : 'test-connection' ,
from : {
databaseName : '' ,
tableName : 'TestCte' ,
} ,
with : [
{ name : 'TestCte' , sql : chSql `SELECT TimeUnix, Line FROM otel_logs` } ,
] ,
select : [ { valueExpression : 'Line' } ] ,
where : '' ,
whereLanguage : 'sql' ,
} ;
const generatedSql = await renderChartConfig ( config , mockMetadata ) ;
const actual = parameterizedQueryToSql ( generatedSql ) ;
expect ( actual ) . toMatchSnapshot ( ) ;
} ) ;
it ( 'should render a chart config CTE configuration correctly' , async ( ) => {
const config : ChartConfigWithOptDateRange = {
connection : 'test-connection' ,
with : [
{
name : 'Parts' ,
chartConfig : {
connection : 'test-connection' ,
timestampValueExpression : '' ,
select : '_part, _part_offset' ,
from : { databaseName : 'default' , tableName : 'some_table' } ,
where : '' ,
whereLanguage : 'sql' ,
filters : [
{
type : 'sql' ,
condition : `FieldA = 'test'` ,
} ,
] ,
orderBy : [ { ordering : 'DESC' , valueExpression : 'rand()' } ] ,
limit : { limit : 1000 } ,
} ,
} ,
] ,
select : '*' ,
filters : [
{
type : 'sql' ,
condition : `FieldA = 'test'` ,
} ,
{
type : 'sql' ,
condition : `indexHint((_part, _part_offset) IN (SELECT tuple(_part, _part_offset) FROM Parts))` ,
} ,
] ,
from : {
databaseName : '' ,
tableName : 'Parts' ,
} ,
where : '' ,
whereLanguage : 'sql' ,
orderBy : [ { ordering : 'DESC' , valueExpression : 'rand()' } ] ,
limit : { limit : 1000 } ,
} ;
const generatedSql = await renderChartConfig ( config , mockMetadata ) ;
const actual = parameterizedQueryToSql ( generatedSql ) ;
expect ( actual ) . toMatchSnapshot ( ) ;
} ) ;
it ( 'should throw if the CTE is missing both sql and chartConfig' , async ( ) => {
const config : ChartConfigWithOptDateRange = {
connection : 'test-connection' ,
with : [
{
name : 'InvalidCTE' ,
// Intentionally omitting both sql and chartConfig properties
} ,
] ,
select : [ { valueExpression : 'Line' } ] ,
from : {
databaseName : 'default' ,
tableName : 'some_table' ,
} ,
where : '' ,
whereLanguage : 'sql' ,
} ;
await expect ( renderChartConfig ( config , mockMetadata ) ) . rejects . toThrow (
"must specify either 'sql' or 'chartConfig' in with clause" ,
) ;
} ) ;
it ( 'should throw if the CTE sql param is invalid' , async ( ) => {
const config : ChartConfigWithOptDateRange = {
connection : 'test-connection' ,
with : [
{
name : 'InvalidCTE' ,
sql : 'SELECT * FROM some_table' as any , // Intentionally not a ChSql object
} ,
] ,
select : [ { valueExpression : 'Line' } ] ,
from : {
databaseName : 'default' ,
tableName : 'some_table' ,
} ,
where : '' ,
whereLanguage : 'sql' ,
} ;
await expect ( renderChartConfig ( config , mockMetadata ) ) . rejects . toThrow (
'non-conforming sql object in CTE' ,
) ;
} ) ;
it ( 'should throw if the CTE chartConfig param is invalid' , async ( ) => {
const config : ChartConfigWithOptDateRange = {
connection : 'test-connection' ,
with : [
{
name : 'InvalidCTE' ,
chartConfig : {
// Missing required properties like select, from, etc.
connection : 'test-connection' ,
} as any , // Intentionally invalid chartConfig
} ,
] ,
select : [ { valueExpression : 'Line' } ] ,
from : {
databaseName : 'default' ,
tableName : 'some_table' ,
} ,
where : '' ,
whereLanguage : 'sql' ,
} ;
await expect ( renderChartConfig ( config , mockMetadata ) ) . rejects . toThrow (
'non-conforming chartConfig object in CTE' ,
) ;
} ) ;
} ) ;
describe ( 'k8s semantic convention migrations' , ( ) => {
it ( 'should generate SQL with metricNameSql for k8s.pod.cpu.utilization gauge metric' , async ( ) => {
const config : ChartConfigWithOptDateRange = {
displayType : DisplayType.Line ,
connection : 'test-connection' ,
metricTables : {
gauge : 'otel_metrics_gauge' ,
histogram : 'otel_metrics_histogram' ,
sum : 'otel_metrics_sum' ,
summary : 'otel_metrics_summary' ,
'exponential histogram' : 'otel_metrics_exponential_histogram' ,
} ,
from : {
databaseName : 'default' ,
tableName : '' ,
} ,
select : [
{
aggFn : 'avg' ,
aggCondition : '' ,
aggConditionLanguage : 'lucene' ,
valueExpression : 'Value' ,
metricName : 'k8s.pod.cpu.utilization' ,
metricNameSql :
"MetricName IN ('k8s.pod.cpu.utilization', 'k8s.pod.cpu.usage')" ,
metricType : MetricsDataType.Gauge ,
} ,
] ,
where : '' ,
whereLanguage : 'lucene' ,
timestampValueExpression : 'TimeUnix' ,
dateRange : [ new Date ( '2025-02-12' ) , new Date ( '2025-12-14' ) ] ,
granularity : '1 minute' ,
limit : { limit : 10 } ,
} ;
const generatedSql = await renderChartConfig ( config , mockMetadata ) ;
const actual = parameterizedQueryToSql ( generatedSql ) ;
// Verify the SQL contains the IN-based metric name condition
expect ( actual ) . toContain ( 'k8s.pod.cpu.utilization' ) ;
expect ( actual ) . toContain ( 'k8s.pod.cpu.usage' ) ;
expect ( actual ) . toMatch ( /MetricName IN / ) ;
expect ( actual ) . toMatchSnapshot ( ) ;
} ) ;
it ( 'should generate SQL with metricNameSql for k8s.node.cpu.utilization sum metric' , async ( ) => {
const config : ChartConfigWithOptDateRange = {
displayType : DisplayType.Line ,
connection : 'test-connection' ,
metricTables : {
gauge : 'otel_metrics_gauge' ,
histogram : 'otel_metrics_histogram' ,
sum : 'otel_metrics_sum' ,
summary : 'otel_metrics_summary' ,
'exponential histogram' : 'otel_metrics_exponential_histogram' ,
} ,
from : {
databaseName : 'default' ,
tableName : '' ,
} ,
select : [
{
aggFn : 'max' ,
aggCondition : '' ,
aggConditionLanguage : 'lucene' ,
valueExpression : 'Value' ,
metricName : 'k8s.node.cpu.utilization' ,
metricNameSql :
"MetricName IN ('k8s.node.cpu.utilization', 'k8s.node.cpu.usage')" ,
metricType : MetricsDataType.Sum ,
} ,
] ,
where : '' ,
whereLanguage : 'sql' ,
timestampValueExpression : 'TimeUnix' ,
dateRange : [ new Date ( '2025-02-12' ) , new Date ( '2025-12-14' ) ] ,
granularity : '5 minute' ,
limit : { limit : 10 } ,
} ;
const generatedSql = await renderChartConfig ( config , mockMetadata ) ;
const actual = parameterizedQueryToSql ( generatedSql ) ;
expect ( actual ) . toContain ( 'k8s.node.cpu.utilization' ) ;
expect ( actual ) . toContain ( 'k8s.node.cpu.usage' ) ;
expect ( actual ) . toMatch ( /MetricName IN / ) ;
expect ( actual ) . toMatchSnapshot ( ) ;
} ) ;
it ( 'should generate SQL with metricNameSql for container.cpu.utilization histogram metric' , async ( ) => {
const config : ChartConfigWithOptDateRange = {
displayType : DisplayType.Line ,
connection : 'test-connection' ,
metricTables : {
gauge : 'otel_metrics_gauge' ,
histogram : 'otel_metrics_histogram' ,
sum : 'otel_metrics_sum' ,
summary : 'otel_metrics_summary' ,
'exponential histogram' : 'otel_metrics_exponential_histogram' ,
} ,
from : {
databaseName : 'default' ,
tableName : '' ,
} ,
select : [
{
aggFn : 'quantile' ,
level : 0.95 ,
valueExpression : 'Value' ,
metricName : 'container.cpu.utilization' ,
metricNameSql :
"MetricName IN ('container.cpu.utilization', 'container.cpu.usage')" ,
metricType : MetricsDataType.Histogram ,
} ,
] ,
where : '' ,
whereLanguage : 'sql' ,
timestampValueExpression : 'TimeUnix' ,
dateRange : [ new Date ( '2025-02-12' ) , new Date ( '2025-12-14' ) ] ,
granularity : '2 minute' ,
limit : { limit : 10 } ,
} ;
const generatedSql = await renderChartConfig ( config , mockMetadata ) ;
const actual = parameterizedQueryToSql ( generatedSql ) ;
expect ( actual ) . toContain ( 'container.cpu.utilization' ) ;
expect ( actual ) . toContain ( 'container.cpu.usage' ) ;
expect ( actual ) . toMatch ( /MetricName IN / ) ;
expect ( actual ) . toMatchSnapshot ( ) ;
} ) ;
it ( 'should generate SQL with metricNameSql for histogram metric with groupBy' , async ( ) => {
const config : ChartConfigWithOptDateRange = {
displayType : DisplayType.Line ,
connection : 'test-connection' ,
metricTables : {
gauge : 'otel_metrics_gauge' ,
histogram : 'otel_metrics_histogram' ,
sum : 'otel_metrics_sum' ,
summary : 'otel_metrics_summary' ,
'exponential histogram' : 'otel_metrics_exponential_histogram' ,
} ,
from : {
databaseName : 'default' ,
tableName : '' ,
} ,
select : [
{
aggFn : 'quantile' ,
level : 0.99 ,
valueExpression : 'Value' ,
metricName : 'k8s.pod.cpu.utilization' ,
metricNameSql :
"MetricName IN ('k8s.pod.cpu.utilization', 'k8s.pod.cpu.usage')" ,
metricType : MetricsDataType.Histogram ,
} ,
] ,
where : '' ,
whereLanguage : 'sql' ,
timestampValueExpression : 'TimeUnix' ,
dateRange : [ new Date ( '2025-02-12' ) , new Date ( '2025-12-14' ) ] ,
granularity : '1 minute' ,
groupBy : `ResourceAttributes['k8s.pod.name']` ,
limit : { limit : 10 } ,
} ;
const generatedSql = await renderChartConfig ( config , mockMetadata ) ;
const actual = parameterizedQueryToSql ( generatedSql ) ;
expect ( actual ) . toContain ( 'k8s.pod.cpu.utilization' ) ;
expect ( actual ) . toContain ( 'k8s.pod.cpu.usage' ) ;
expect ( actual ) . toMatch ( /MetricName IN / ) ;
expect ( actual ) . toMatchSnapshot ( ) ;
} ) ;
it ( 'should handle metrics without metricNameSql (backward compatibility)' , async ( ) => {
const config : ChartConfigWithOptDateRange = {
displayType : DisplayType.Line ,
connection : 'test-connection' ,
metricTables : {
gauge : 'otel_metrics_gauge' ,
histogram : 'otel_metrics_histogram' ,
sum : 'otel_metrics_sum' ,
summary : 'otel_metrics_summary' ,
'exponential histogram' : 'otel_metrics_exponential_histogram' ,
} ,
from : {
databaseName : 'default' ,
tableName : '' ,
} ,
select : [
{
aggFn : 'avg' ,
aggCondition : '' ,
aggConditionLanguage : 'lucene' ,
valueExpression : 'Value' ,
metricName : 'some.regular.metric' ,
// No metricNameSql provided
metricType : MetricsDataType.Gauge ,
} ,
] ,
where : '' ,
whereLanguage : 'lucene' ,
timestampValueExpression : 'TimeUnix' ,
dateRange : [ new Date ( '2025-02-12' ) , new Date ( '2025-12-14' ) ] ,
granularity : '1 minute' ,
limit : { limit : 10 } ,
} ;
const generatedSql = await renderChartConfig ( config , mockMetadata ) ;
const actual = parameterizedQueryToSql ( generatedSql ) ;
// Should use the simple string comparison for regular metrics (not IN-based)
expect ( actual ) . toContain ( "MetricName = 'some.regular.metric'" ) ;
expect ( actual ) . not . toMatch ( /MetricName IN / ) ;
expect ( actual ) . toMatchSnapshot ( ) ;
} ) ;
} ) ;
The optimization part of this change is that when a table has a primary key like `toStartOfMinute(TimestampTime), ..., TimestampTime` and the Timestamp Column for the source is just `TimestampTime`, the query will automatically filter by both `toStartOfMinute(TimestampTime)` and `TimestampTime`.
To test this, you'll need to create a table with such a primary key, then create a source based on that table. Optionally, you could copy data from the default `otel_logs` table into the new table (`INSERT INTO default.otel_logs_toStartOfMinute_Key SELECT * FROM default.otel_logs`).
<details>
<summary>DDL for log table with optimized key</summary>
```sql
CREATE TABLE default.otel_logs_toStartOfMinute_Key
(
`Timestamp` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
`TimestampTime` DateTime DEFAULT toDateTime(Timestamp),
`TraceId` String CODEC(ZSTD(1)),
`SpanId` String CODEC(ZSTD(1)),
`TraceFlags` UInt8,
`SeverityText` LowCardinality(String) CODEC(ZSTD(1)),
`SeverityNumber` UInt8,
`ServiceName` LowCardinality(String) CODEC(ZSTD(1)),
`Body` String CODEC(ZSTD(1)),
`ResourceSchemaUrl` LowCardinality(String) CODEC(ZSTD(1)),
`ResourceAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
`ScopeSchemaUrl` LowCardinality(String) CODEC(ZSTD(1)),
`ScopeName` String CODEC(ZSTD(1)),
`ScopeVersion` LowCardinality(String) CODEC(ZSTD(1)),
`ScopeAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
`LogAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
`__hdx_materialized_k8s.pod.name` String MATERIALIZED ResourceAttributes['k8s.pod.name'] CODEC(ZSTD(1)),
INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1,
INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_scope_attr_key mapKeys(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_scope_attr_value mapValues(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_log_attr_key mapKeys(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_log_attr_value mapValues(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8,
INDEX idx_lower_body lower(Body) TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8
)
ENGINE = SharedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
PARTITION BY toDate(TimestampTime)
PRIMARY KEY (toStartOfMinute(TimestampTime), ServiceName, TimestampTime)
ORDER BY (toStartOfMinute(TimestampTime), ServiceName, TimestampTime, Timestamp)
TTL TimestampTime + toIntervalDay(90)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
```
</details>
Once you have that source, you can inspect the queries generated for that source. Whenever a date range filter is selected, the query should have a `WHERE` predicate that filters on both `TimestampTime` and `toStartOfMinute(TimestampTime)`, despite `toStartOfMinute(TimestampTime)` not being included in the Timestamp Column of the source's configuration.
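For reference, the predicate you should expect looks roughly like the following. This is an illustrative sketch only: the actual parameter values, column selection, and whitespace depend on the chart config and the selected time range, and the millisecond timestamps below are placeholders.

```sql
-- Illustrative shape of the generated time filter: the query should constrain
-- both the raw timestamp column and its toStartOfMinute() primary key expression,
-- with toStartOfMinute() also applied to the right-hand side of each comparison.
SELECT Body
FROM default.otel_logs_toStartOfMinute_Key
WHERE (TimestampTime >= fromUnixTimestamp64Milli(1739318400000)
       AND TimestampTime <= fromUnixTimestamp64Milli(1739404800000))
  AND (toStartOfMinute(TimestampTime) >= toStartOfMinute(fromUnixTimestamp64Milli(1739318400000))
       AND toStartOfMinute(TimestampTime) <= toStartOfMinute(fromUnixTimestamp64Milli(1739404800000)))
```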
describe ( 'HAVING clause' , ( ) => {
it ( 'should render HAVING clause with SQL language' , async ( ) => {
const config : ChartConfigWithOptDateRange = {
displayType : DisplayType.Table ,
connection : 'test-connection' ,
from : {
databaseName : 'default' ,
tableName : 'logs' ,
} ,
select : [
{
aggFn : 'count' ,
valueExpression : '*' ,
aggCondition : '' ,
} ,
] ,
where : '' ,
whereLanguage : 'sql' ,
groupBy : 'severity' ,
having : 'count(*) > 100' ,
havingLanguage : 'sql' ,
timestampValueExpression : 'timestamp' ,
dateRange : [ new Date ( '2025-02-12' ) , new Date ( '2025-02-14' ) ] ,
} ;
const generatedSql = await renderChartConfig ( config , mockMetadata ) ;
const actual = parameterizedQueryToSql ( generatedSql ) ;
expect ( actual ) . toContain ( 'HAVING' ) ;
expect ( actual ) . toContain ( 'count(*) > 100' ) ;
expect ( actual ) . toMatchSnapshot ( ) ;
} ) ;
it ( 'should render HAVING clause with multiple conditions' , async ( ) => {
const config : ChartConfigWithOptDateRange = {
displayType : DisplayType.Table ,
connection : 'test-connection' ,
from : {
databaseName : 'default' ,
tableName : 'metrics' ,
} ,
select : [
{
aggFn : 'avg' ,
valueExpression : 'response_time' ,
aggCondition : '' ,
} ,
{
aggFn : 'count' ,
valueExpression : '*' ,
aggCondition : '' ,
} ,
] ,
where : '' ,
whereLanguage : 'sql' ,
groupBy : 'endpoint' ,
having : 'avg(response_time) > 500 AND count(*) > 10' ,
havingLanguage : 'sql' ,
timestampValueExpression : 'timestamp' ,
dateRange : [ new Date ( '2025-02-12' ) , new Date ( '2025-02-14' ) ] ,
} ;
const generatedSql = await renderChartConfig ( config , mockMetadata ) ;
const actual = parameterizedQueryToSql ( generatedSql ) ;
expect ( actual ) . toContain ( 'HAVING' ) ;
expect ( actual ) . toContain ( 'avg(response_time) > 500 AND count(*) > 10' ) ;
expect ( actual ) . toMatchSnapshot ( ) ;
} ) ;
it ( 'should not render HAVING clause when not provided' , async ( ) => {
const config : ChartConfigWithOptDateRange = {
displayType : DisplayType.Table ,
connection : 'test-connection' ,
from : {
databaseName : 'default' ,
tableName : 'logs' ,
} ,
select : [
{
aggFn : 'count' ,
valueExpression : '*' ,
aggCondition : '' ,
} ,
] ,
where : '' ,
whereLanguage : 'sql' ,
groupBy : 'severity' ,
timestampValueExpression : 'timestamp' ,
dateRange : [ new Date ( '2025-02-12' ) , new Date ( '2025-02-14' ) ] ,
} ;
const generatedSql = await renderChartConfig ( config , mockMetadata ) ;
const actual = parameterizedQueryToSql ( generatedSql ) ;
expect ( actual ) . not . toContain ( 'HAVING' ) ;
expect ( actual ) . toMatchSnapshot ( ) ;
} ) ;
it ( 'should render HAVING clause with granularity and groupBy' , async ( ) => {
const config : ChartConfigWithOptDateRange = {
displayType : DisplayType.Line ,
connection : 'test-connection' ,
from : {
databaseName : 'default' ,
tableName : 'events' ,
} ,
select : [
{
aggFn : 'count' ,
valueExpression : '*' ,
aggCondition : '' ,
} ,
] ,
where : '' ,
whereLanguage : 'sql' ,
groupBy : 'event_type' ,
having : 'count(*) > 50' ,
havingLanguage : 'sql' ,
timestampValueExpression : 'timestamp' ,
dateRange : [ new Date ( '2025-02-12' ) , new Date ( '2025-02-14' ) ] ,
granularity : '5 minute' ,
} ;
const generatedSql = await renderChartConfig ( config , mockMetadata ) ;
const actual = parameterizedQueryToSql ( generatedSql ) ;
expect ( actual ) . toContain ( 'HAVING' ) ;
expect ( actual ) . toContain ( 'count(*) > 50' ) ;
expect ( actual ) . toContain ( 'GROUP BY' ) ;
expect ( actual ) . toMatchSnapshot ( ) ;
} ) ;
it ( 'should not render HAVING clause when having is empty string' , async ( ) => {
const config : ChartConfigWithOptDateRange = {
displayType : DisplayType.Table ,
connection : 'test-connection' ,
from : {
databaseName : 'default' ,
tableName : 'logs' ,
} ,
select : [
{
aggFn : 'count' ,
valueExpression : '*' ,
aggCondition : '' ,
} ,
] ,
where : '' ,
whereLanguage : 'sql' ,
groupBy : 'severity' ,
having : '' ,
havingLanguage : 'sql' ,
timestampValueExpression : 'timestamp' ,
dateRange : [ new Date ( '2025-02-12' ) , new Date ( '2025-02-14' ) ] ,
} ;
const generatedSql = await renderChartConfig ( config , mockMetadata ) ;
const actual = parameterizedQueryToSql ( generatedSql ) ;
expect ( actual ) . not . toContain ( 'HAVING' ) ;
expect ( actual ) . toMatchSnapshot ( ) ;
} ) ;
} ) ;
describe ( 'timeFilterExpr' , ( ) => {
type TimeFilterExprTestCase = {
timestampValueExpression : string ;
dateRangeStartInclusive? : boolean ;
dateRangeEndInclusive? : boolean ;
dateRange : [ Date , Date ] ;
includedDataInterval? : string ;
expected : string ;
description : string ;
tableName? : string ;
databaseName? : string ;
primaryKey? : string ;
} ;
const testCases : TimeFilterExprTestCase [ ] = [
{
description : 'with basic timestampValueExpression' ,
timestampValueExpression : 'timestamp' ,
dateRange : [
new Date ( '2025-02-12 00:12:34Z' ) ,
new Date ( '2025-02-14 00:12:34Z' ) ,
] ,
expected : ` (timestamp >= fromUnixTimestamp64Milli( ${ new Date ( '2025-02-12 00:12:34Z' ) . getTime ( ) } ) AND timestamp <= fromUnixTimestamp64Milli( ${ new Date ( '2025-02-14 00:12:34Z' ) . getTime ( ) } )) ` ,
} ,
{
description : 'with dateRangeEndInclusive=false' ,
timestampValueExpression : 'timestamp' ,
dateRange : [
new Date ( '2025-02-12 00:12:34Z' ) ,
new Date ( '2025-02-14 00:12:34Z' ) ,
] ,
dateRangeEndInclusive : false ,
expected : ` (timestamp >= fromUnixTimestamp64Milli( ${ new Date ( '2025-02-12 00:12:34Z' ) . getTime ( ) } ) AND timestamp < fromUnixTimestamp64Milli( ${ new Date ( '2025-02-14 00:12:34Z' ) . getTime ( ) } )) ` ,
} ,
{
description : 'with dateRangeStartInclusive=false' ,
timestampValueExpression : 'timestamp' ,
dateRange : [
new Date ( '2025-02-12 00:12:34Z' ) ,
new Date ( '2025-02-14 00:12:34Z' ) ,
] ,
dateRangeStartInclusive : false ,
expected : ` (timestamp > fromUnixTimestamp64Milli( ${ new Date ( '2025-02-12 00:12:34Z' ) . getTime ( ) } ) AND timestamp <= fromUnixTimestamp64Milli( ${ new Date ( '2025-02-14 00:12:34Z' ) . getTime ( ) } )) ` ,
} ,
{
description : 'with includedDataInterval' ,
timestampValueExpression : 'timestamp' ,
dateRange : [
new Date ( '2025-02-12 00:12:34Z' ) ,
new Date ( '2025-02-14 00:12:34Z' ) ,
] ,
includedDataInterval : '1 WEEK' ,
expected : ` (timestamp >= toStartOfInterval(fromUnixTimestamp64Milli( ${ new Date ( '2025-02-12 00:12:34Z' ) . getTime ( ) } ), INTERVAL 1 WEEK) - INTERVAL 1 WEEK AND timestamp <= toStartOfInterval(fromUnixTimestamp64Milli( ${ new Date ( '2025-02-14 00:12:34Z' ) . getTime ( ) } ), INTERVAL 1 WEEK) + INTERVAL 1 WEEK) ` ,
} ,
{
description : 'with date type timestampValueExpression' ,
timestampValueExpression : 'date' ,
dateRange : [
new Date ( '2025-02-12 00:12:34Z' ) ,
new Date ( '2025-02-14 00:12:34Z' ) ,
] ,
expected : ` (date >= toDate(fromUnixTimestamp64Milli( ${ new Date ( '2025-02-12 00:12:34Z' ) . getTime ( ) } )) AND date <= toDate(fromUnixTimestamp64Milli( ${ new Date ( '2025-02-14 00:12:34Z' ) . getTime ( ) } ))) ` ,
} ,
{
description : 'with multiple timestampValueExpression parts' ,
timestampValueExpression : 'timestamp, date' ,
dateRange : [
new Date ( '2025-02-12 00:12:34Z' ) ,
new Date ( '2025-02-14 00:12:34Z' ) ,
] ,
expected : ` (timestamp >= fromUnixTimestamp64Milli( ${ new Date ( '2025-02-12 00:12:34Z' ) . getTime ( ) } ) AND timestamp <= fromUnixTimestamp64Milli( ${ new Date ( '2025-02-14 00:12:34Z' ) . getTime ( ) } ))AND(date >= toDate(fromUnixTimestamp64Milli( ${ new Date ( '2025-02-12 00:12:34Z' ) . getTime ( ) } )) AND date <= toDate(fromUnixTimestamp64Milli( ${ new Date ( '2025-02-14 00:12:34Z' ) . getTime ( ) } ))) ` ,
} ,
  {
    description: 'with toStartOfDay() in timestampExpr',
    timestampValueExpression: 'toStartOfDay(timestamp)',
    dateRange: [
      new Date('2025-02-12 00:12:34Z'),
      new Date('2025-02-14 00:12:34Z'),
    ],
    expected: `(toStartOfDay(timestamp) >= toStartOfDay(fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()})) AND toStartOfDay(timestamp) <= toStartOfDay(fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()})))`,
  },
  {
    description: 'with toStartOfDay () in timestampExpr',
    timestampValueExpression: 'toStartOfDay (timestamp)',
    dateRange: [
      new Date('2025-02-12 00:12:34Z'),
      new Date('2025-02-14 00:12:34Z'),
    ],
    expected: `(toStartOfDay (timestamp) >= toStartOfDay(fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()})) AND toStartOfDay (timestamp) <= toStartOfDay(fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()})))`,
  },
  {
    description: 'with toStartOfInterval() in timestampExpr',
    timestampValueExpression: 'toStartOfInterval(timestamp, INTERVAL 12 MINUTE)',
    dateRange: [
      new Date('2025-02-12 00:12:34Z'),
      new Date('2025-02-14 00:12:34Z'),
    ],
    expected: `(toStartOfInterval(timestamp, INTERVAL 12 MINUTE) >= toStartOfInterval(fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()}), INTERVAL 12 MINUTE) AND toStartOfInterval(timestamp, INTERVAL 12 MINUTE) <= toStartOfInterval(fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()}), INTERVAL 12 MINUTE))`,
  },
  {
    description:
      'with toStartOfInterval() with lowercase interval in timestampExpr',
    timestampValueExpression: 'toStartOfInterval(timestamp, interval 1 minute)',
    dateRange: [
      new Date('2025-02-12 00:12:34Z'),
      new Date('2025-02-14 00:12:34Z'),
    ],
    expected: `(toStartOfInterval(timestamp, interval 1 minute) >= toStartOfInterval(fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()}), interval 1 minute) AND toStartOfInterval(timestamp, interval 1 minute) <= toStartOfInterval(fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()}), interval 1 minute))`,
  },
  {
    description: 'with toStartOfInterval() with timezone and offset',
    timestampValueExpression: `toStartOfInterval(timestamp, INTERVAL 1 MINUTE, toDateTime('2023-01-01 14:35:30'), 'America/New_York')`,
    dateRange: [
      new Date('2025-02-12 00:12:34Z'),
      new Date('2025-02-14 00:12:34Z'),
    ],
    expected: `(toStartOfInterval(timestamp, INTERVAL 1 MINUTE, toDateTime('2023-01-01 14:35:30'), 'America/New_York') >= toStartOfInterval(fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()}), INTERVAL 1 MINUTE, toDateTime('2023-01-01 14:35:30'), 'America/New_York') AND toStartOfInterval(timestamp, INTERVAL 1 MINUTE, toDateTime('2023-01-01 14:35:30'), 'America/New_York') <= toStartOfInterval(fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()}), INTERVAL 1 MINUTE, toDateTime('2023-01-01 14:35:30'), 'America/New_York'))`,
  },
  {
    description: 'with nonstandard spacing',
    timestampValueExpression: `toStartOfInterval ( timestamp , INTERVAL 1 MINUTE , toDateTime ( '2023-01-01 14:35:30' ), 'America/New_York' )`,
    dateRange: [
      new Date('2025-02-12 00:12:34Z'),
      new Date('2025-02-14 00:12:34Z'),
    ],
    expected: `(toStartOfInterval ( timestamp , INTERVAL 1 MINUTE , toDateTime ( '2023-01-01 14:35:30' ), 'America/New_York' ) >= toStartOfInterval(fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()}), INTERVAL 1 MINUTE, toDateTime ( '2023-01-01 14:35:30' ), 'America/New_York') AND toStartOfInterval ( timestamp , INTERVAL 1 MINUTE , toDateTime ( '2023-01-01 14:35:30' ), 'America/New_York' ) <= toStartOfInterval(fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()}), INTERVAL 1 MINUTE, toDateTime ( '2023-01-01 14:35:30' ), 'America/New_York'))`,
  },
  {
    description: 'with optimizable timestampValueExpression',
    timestampValueExpression: `timestamp`,
    primaryKey:
      "toStartOfMinute(timestamp), ServiceName, ResourceAttributes['timestamp'], timestamp",
    dateRange: [
      new Date('2025-02-12 00:12:34Z'),
      new Date('2025-02-14 00:12:34Z'),
    ],
    expected: `(timestamp >= fromUnixTimestamp64Milli(1739319154000) AND timestamp <= fromUnixTimestamp64Milli(1739491954000))AND(toStartOfMinute(timestamp) >= toStartOfMinute(fromUnixTimestamp64Milli(1739319154000)) AND toStartOfMinute(timestamp) <= toStartOfMinute(fromUnixTimestamp64Milli(1739491954000)))`,
  },
  {
    description: 'with synthetic timestamp value expression for CTE',
    timestampValueExpression: `__hdx_time_bucket`,
    dateRange: [
      new Date('2025-02-12 00:12:34Z'),
      new Date('2025-02-14 00:12:34Z'),
    ],
    databaseName: '',
    tableName: 'Bucketed',
    primaryKey:
      "toStartOfMinute(timestamp), ServiceName, ResourceAttributes['timestamp'], timestamp",
    expected: `(__hdx_time_bucket >= fromUnixTimestamp64Milli(1739319154000) AND __hdx_time_bucket <= fromUnixTimestamp64Milli(1739491954000))`,
  },
  {
    description: 'with toStartOfMinute in timestampValueExpression',
    timestampValueExpression: `toStartOfMinute(timestamp)`,
    dateRange: [
      new Date('2025-02-12 00:12:34Z'),
      new Date('2025-02-14 00:12:34Z'),
    ],
    primaryKey:
      "toStartOfMinute(timestamp), ServiceName, ResourceAttributes['timestamp'], timestamp",
    expected: `(toStartOfMinute(timestamp) >= toStartOfMinute(fromUnixTimestamp64Milli(1739319154000)) AND toStartOfMinute(timestamp) <= toStartOfMinute(fromUnixTimestamp64Milli(1739491954000)))`,
  },
];
beforeEach(() => {
  mockMetadata.getColumn.mockImplementation(async ({ column }) =>
    column === 'date'
      ? ({ type: 'Date' } as ColumnMeta)
      : ({ type: 'DateTime' } as ColumnMeta),
  );
});
it.each(testCases)(
  'should generate a time filter expression $description',
  async ({
    timestampValueExpression,
    dateRangeEndInclusive = true,
    dateRangeStartInclusive = true,
    dateRange,
    expected,
    includedDataInterval,
    tableName = 'target_table',
    databaseName = 'default',
    primaryKey,
  }) => {
    if (primaryKey) {
      mockMetadata.getTableMetadata.mockResolvedValue({
        primary_key: primaryKey,
      } as any);
    }
    const actual = await timeFilterExpr({
      timestampValueExpression,
      dateRangeEndInclusive,
      dateRangeStartInclusive,
      dateRange,
      connectionId: 'test-connection',
      databaseName,
      tableName,
      metadata: mockMetadata,
      includedDataInterval,
    });
    const actualSql = parameterizedQueryToSql(actual);
    expect(actualSql).toBe(expected);
  },
);
});
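The toStartOfX test cases above all follow one pattern: the bucketing function found in the timestamp expression is mirrored onto both sides of the range comparison. A minimal sketch of that rewrite, assuming a hypothetical `wrapRangeCondition` helper and a regex-based match (the names and the parsing approach are illustrative, not the repo's actual implementation):

```typescript
// Match a leading toStartOfX call, capturing the inner expression and any
// trailing arguments (e.g. an INTERVAL clause or timezone).
const TO_START_OF_RE = /^\s*(toStartOf\w+)\s*\(\s*(.+?)\s*(?:,\s*(.+?)\s*)?\)\s*$/i;

function wrapRangeCondition(
  timestampExpr: string,
  startSql: string,
  endSql: string,
): string {
  const m = timestampExpr.match(TO_START_OF_RE);
  if (!m) {
    // Plain column: compare directly against the bounds.
    return `(${timestampExpr} >= ${startSql} AND ${timestampExpr} <= ${endSql})`;
  }
  const fn = m[1];
  const extraArgs = m[3];
  // Re-apply the same function (and its extra args) to the right-hand side,
  // so bucketed values are compared against bucketed bounds.
  const wrap = (sql: string) =>
    extraArgs ? `${fn}(${sql}, ${extraArgs})` : `${fn}(${sql})`;
  return `(${timestampExpr} >= ${wrap(startSql)} AND ${timestampExpr} <= ${wrap(endSql)})`;
}
```

Without this mirroring, `toStartOfDay(timestamp) >= <raw start>` excludes every row in a day whose bucket start precedes the selected start, which is the sub-minute filtering bug described above.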
feat: Add materialized view support (Beta) (#1507)
Closes HDX-3082
# Summary
This PR back-ports support for materialized views from the EE repo. Note that this feature is in **Beta**, and is subject to significant changes.
This feature is intended to support:
1. Configuring AggregatingMergeTree (or SummingMergeTree) Materialized Views which are associated with a Source
2. Automatically selecting and querying an associated materialized view when a query supports it, in Chart Explorer, Custom Dashboards, the Services Dashboard, and the Search Page Histogram.
3. A UX for understanding which materialized views are available for a source, and whether (and why) one is or is not being used for a particular visualization.
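Point (2) amounts to a compatibility check between a chart query and a configured rollup. The sketch below is hypothetical (the `canUseMaterializedView` name and the shapes of the two interfaces are illustrative, not the shipped API); it borrows the `<aggFn>__<column>` column-naming convention used in the testing DDL later in this description:

```typescript
interface MaterializedViewMeta {
  // Pre-aggregated columns available in the rollup, keyed by name,
  // e.g. { avg__Duration: 'avgMerge' }.
  aggregates: Record<string, string>;
  // Raw dimension columns the rollup is grouped by.
  groupByColumns: string[];
  // Time-bucket granularity of the rollup, in seconds.
  granularitySeconds: number;
}

interface ChartQuery {
  aggFn: string; // e.g. 'avg'
  valueExpression: string; // e.g. 'Duration'
  groupBy: string[];
  bucketSeconds: number; // chart time bucket
}

function canUseMaterializedView(
  q: ChartQuery,
  mv: MaterializedViewMeta,
): boolean {
  // The chart bucket must be a whole multiple of the MV granularity.
  if (q.bucketSeconds % mv.granularitySeconds !== 0) return false;
  // Every group-by must be a raw column stored in the rollup.
  if (!q.groupBy.every(col => mv.groupByColumns.includes(col))) return false;
  // The aggregate must have a matching pre-aggregated column.
  return `${q.aggFn}__${q.valueExpression}` in mv.aggregates;
}
```

When the check fails, the query falls back to the source table; surfacing the failing condition is what the "why is this MV not being used" UX in (3) is about.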
## Note to Reviewer(s)
This is a large PR, but the code has largely already been reviewed.
- For net-new files, types, components, and utility functions, the code does not differ from the EE repo
- Changes to the various services dashboard pages do not differ from the EE repo
- Changes to `useOffsetPaginatedQuery`, `useChartConfig`, and `DBEditTimeChart` differ slightly due to unrelated (to MVs) drift between this repo and the EE repo, and due to the lack of feature toggles in this repo. **This is where slightly closer review would be most valuable.**
## Demo
<details>
<summary>Demo: MV Configuration</summary>
https://github.com/user-attachments/assets/fedf3bcf-892c-4b8d-a788-7e231e23bcc3
</details>
<details>
<summary>Demo: Chart Explorer</summary>
https://github.com/user-attachments/assets/fc8d1efa-7edc-42fc-98f0-75431cc056b8
</details>
<details>
<summary>Demo: Dashboards</summary>
https://github.com/user-attachments/assets/f3cb247e-711f-4d90-95b8-cf977e94f065
</details>
## Known Limitations
This feature is in Beta due to the following known limitations, which will be addressed in subsequent PRs:
1. When a visualization's start and end times are not aligned with the granularity of the MVs, statistics are computed from the MV "time buckets" that fall inside the date range. This may not align exactly with the source table data in the selected date range.
2. Alerts do not make use of MVs, even if the associated visualization does. Due to (1), this means that alert values may not exactly match the values shown in the associated visualization.
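Limitation (1) comes down to simple bucket arithmetic. The snippet below mimics ClickHouse's `toStartOfMinute` on epoch milliseconds to show how an unaligned range edge lands mid-bucket (illustrative only):

```typescript
// Mimic ClickHouse's toStartOfMinute on epoch milliseconds.
const MINUTE_MS = 60_000;
const toStartOfMinuteMs = (t: number) => Math.floor(t / MINUTE_MS) * MINUTE_MS;

const start = Date.parse('2025-02-12T00:12:34Z');
// The bucket containing `start` begins 34 seconds earlier, at 00:12:00.
// A rollup can only answer in whole buckets, so the first bucket either
// contributes rows from before the selected start or is dropped entirely,
// and neither choice matches the raw-table result exactly.
const bucketStart = toStartOfMinuteMs(start);
```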
## Differences in OSS vs EE Support
- In OSS, there is a beta label on the MV configurations section
- In EE, feature toggles control whether MV support is enabled; in OSS, the feature is enabled for all teams but only takes effect for sources with MVs configured.
## Testing
To test, a couple of MVs can be created on the default `otel_traces` table, directly in ClickHouse:
<details>
<summary>Example MVs DDL</summary>
```sql
CREATE TABLE default.metrics_rollup_1m
(
`Timestamp` DateTime,
`ServiceName` LowCardinality(String),
`SpanKind` LowCardinality(String),
`StatusCode` LowCardinality(String),
`count` SimpleAggregateFunction(sum, UInt64),
`sum__Duration` SimpleAggregateFunction(sum, UInt64),
`avg__Duration` AggregateFunction(avg, UInt64),
`quantile__Duration` AggregateFunction(quantileTDigest(0.5), UInt64),
`min__Duration` SimpleAggregateFunction(min, UInt64),
`max__Duration` SimpleAggregateFunction(max, UInt64)
)
ENGINE = AggregatingMergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (Timestamp, StatusCode, SpanKind, ServiceName);
CREATE MATERIALIZED VIEW default.metrics_rollup_1m_mv TO default.metrics_rollup_1m
(
`Timestamp` DateTime,
`ServiceName` LowCardinality(String),
`SpanKind` LowCardinality(String),
`StatusCode` LowCardinality(String),
`count` UInt64,
`sum__Duration` Int64,
`avg__Duration` AggregateFunction(avg, UInt64),
`quantile__Duration` AggregateFunction(quantileTDigest(0.5), UInt64),
`min__Duration` SimpleAggregateFunction(min, UInt64),
`max__Duration` SimpleAggregateFunction(max, UInt64)
)
AS SELECT
toStartOfMinute(Timestamp) AS Timestamp,
ServiceName,
SpanKind,
StatusCode,
count() AS count,
sum(Duration) AS sum__Duration,
avgState(Duration) AS avg__Duration,
quantileTDigestState(0.5)(Duration) AS quantile__Duration,
minSimpleState(Duration) AS min__Duration,
maxSimpleState(Duration) AS max__Duration
FROM default.otel_traces
GROUP BY
Timestamp,
ServiceName,
SpanKind,
StatusCode;
```
```sql
CREATE TABLE default.span_kind_rollup_1m
(
`Timestamp` DateTime,
`ServiceName` LowCardinality(String),
`SpanKind` LowCardinality(String),
`histogram__Duration` AggregateFunction(histogram(20), UInt64)
)
ENGINE = AggregatingMergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (Timestamp, ServiceName, SpanKind);
CREATE MATERIALIZED VIEW default.span_kind_rollup_1m_mv TO default.span_kind_rollup_1m
(
`Timestamp` DateTime,
`ServiceName` LowCardinality(String),
`SpanKind` LowCardinality(String),
`histogram__Duration` AggregateFunction(histogram(20), UInt64)
)
AS SELECT
toStartOfMinute(Timestamp) AS Timestamp,
ServiceName,
SpanKind,
histogramState(20)(Duration) AS histogram__Duration
FROM default.otel_traces
GROUP BY
Timestamp,
ServiceName,
SpanKind;
```
</details>
Then you'll need to configure the materialized views in your source settings:
<details>
<summary>Source Configuration (should auto-infer when MVs are selected)</summary>
<img width="949" height="1011" alt="Screenshot 2025-12-19 at 10 26 54 AM" src="https://github.com/user-attachments/assets/fc46a1b9-de8b-4b95-a8ef-ba5fee905685" />
</details>
2025-12-19 16:17:23 +00:00
describe('Aggregate Merge Functions', () => {
  it('should generate SQL for an aggregate merge function', async () => {
    const config: ChartConfigWithOptDateRange = {
      displayType: DisplayType.Table,
      connection: 'test-connection',
      from: {
        databaseName: 'default',
        tableName: 'logs',
      },
      select: [
        {
          aggFn: 'avgMerge',
          valueExpression: 'Duration',
        },
      ],
      where: '',
      whereLanguage: 'sql',
      groupBy: 'severity',
      timestampValueExpression: 'timestamp',
      dateRange: [new Date('2025-02-12'), new Date('2025-02-14')],
    };
    const generatedSql = await renderChartConfig(config, mockMetadata);
    const actual = parameterizedQueryToSql(generatedSql);
    expect(actual).toContain('avgMerge(Duration)');
    expect(actual).toMatchSnapshot();
  });
  it('should generate SQL for an aggregate merge function with a condition', async () => {
    const config: ChartConfigWithOptDateRange = {
      displayType: DisplayType.Table,
      connection: 'test-connection',
      from: {
        databaseName: 'default',
        tableName: 'logs',
      },
      select: [
        {
          aggFn: 'avgMerge',
          valueExpression: 'Duration',
          aggCondition: 'severity:"ERROR"',
          aggConditionLanguage: 'lucene',
        },
      ],
      where: '',
      whereLanguage: 'sql',
      groupBy: 'severity',
    };
    const generatedSql = await renderChartConfig(config, mockMetadata);
    const actual = parameterizedQueryToSql(generatedSql);
    expect(actual).toContain(
      "avgMergeIf(Duration, ((severity = 'ERROR')) AND toFloat64OrDefault(toString(Duration)) IS NOT NULL)",
    );
    expect(actual).toMatchSnapshot();
  });
  it('should generate SQL for a quantile merge function with a condition', async () => {
    const config: ChartConfigWithOptDateRange = {
      displayType: DisplayType.Table,
      connection: 'test-connection',
      from: {
        databaseName: 'default',
        tableName: 'logs',
      },
      select: [
        {
          aggFn: 'quantileMerge',
          aggCondition: 'severity:"ERROR"',
          aggConditionLanguage: 'lucene',
          valueExpression: 'Duration',
          level: 0.95,
        },
      ],
      where: '',
      whereLanguage: 'sql',
      groupBy: 'severity',
      timestampValueExpression: 'timestamp',
      dateRange: [new Date('2025-02-12'), new Date('2025-02-14')],
    };
    const generatedSql = await renderChartConfig(config, mockMetadata);
    const actual = parameterizedQueryToSql(generatedSql);
    expect(actual).toContain(
      "quantileMergeIf(0.95)(Duration, ((severity = 'ERROR')) AND toFloat64OrDefault(toString(Duration)) IS NOT NULL)",
    );
    expect(actual).toMatchSnapshot();
  });
  it('should generate SQL for a histogram merge function', async () => {
    const config: ChartConfigWithOptDateRange = {
      displayType: DisplayType.Table,
      connection: 'test-connection',
      from: {
        databaseName: 'default',
        tableName: 'logs',
      },
      select: [
        {
          aggFn: 'histogramMerge',
          valueExpression: 'Duration',
          level: 20,
        },
      ],
      where: '',
      whereLanguage: 'sql',
      groupBy: 'severity',
      timestampValueExpression: 'timestamp',
      dateRange: [new Date('2025-02-12'), new Date('2025-02-14')],
    };
    const generatedSql = await renderChartConfig(config, mockMetadata);
    const actual = parameterizedQueryToSql(generatedSql);
    expect(actual).toContain('histogramMerge(20)(Duration)');
    expect(actual).toMatchSnapshot();
  });
});
2025-02-26 00:00:48 +00:00
});