moves them into a core folder, this allows us to easily track when core files are modified via path
no changeset because no version bump required
fixes HDX-2589
Closes HDX-2576
Closes HDX-2491
# Summary
It is a common optimization to have a primary key like `toStartOfDay(Timestamp), ..., Timestamp`. This PR improves the experience when using such a primary key in the following ways:
1. HyperDX will now automatically filter on both `toStartOfDay(Timestamp)` and `Timestamp` in this case, instead of just `Timestamp`. This improves performance by better utilizing the primary index. Previously, this required a manual change to the source's Timestamp Column setting.
2. HyperDX now applies the same `toStartOfX` function to the right-hand-side of timestamp comparisons. So when filtering using an expression like `toStartOfDay(Timestamp)`, the generated SQL will have the condition `toStartOfDay(Timestamp) >= toStartOfDay(<selected start time>) AND toStartOfDay(Timestamp) <= toStartOfDay(<selected end time>)`. This resolves an issue where some data would be incorrectly filtered out when filtering on such timestamp expressions (such as time ranges less than 1 minute).
With this change, teams should no longer need to have multiple columns in their source timestamp column configuration. However, if they do, they will now have correct filtering.
## Testing
### Testing the fix
The part of this PR that fixes time filtering can be tested with the default logs table schema. Simply set the Timestamp Column source setting to `TimestampTime, toStartOfMinute(TimestampTime)`. Then, in the logs search, filter for a timespan < 1 minute.
<details>
<summary>Without the fix, you should see no logs, since they're incorrectly filtered out by the toStartOfMinute(TimestampTime) filter</summary>
https://github.com/user-attachments/assets/915d3922-55f8-4742-b686-5090cdecef60
</details>
<details>
<summary>With the fix, you should see logs in the selected time range</summary>
https://github.com/user-attachments/assets/f75648e4-3f48-47b0-949f-2409ce075a75
</details>
### Testing the optimization
The optimization part of this change is that when a table has a primary key like `toStartOfMinute(TimestampTime), ..., TimestampTime` and the Timestamp Column for the source is just `Timestamp`, the query will automatically filter by both `toStartOfMinute(TimestampTime)` and `TimestampTime`.
To test this, you'll need to create a table with such a primary key, then create a source based on that table. Optionally, you could copy data from the default `otel_logs` table into the new table (`INSERT INTO default.otel_logs_toStartOfMinute_Key SELECT * FROM default.otel_logs`).
<details>
<summary>DDL for log table with optimized key</summary>
```sql
CREATE TABLE default.otel_logs_toStartOfMinute_Key
(
`Timestamp` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
`TimestampTime` DateTime DEFAULT toDateTime(Timestamp),
`TraceId` String CODEC(ZSTD(1)),
`SpanId` String CODEC(ZSTD(1)),
`TraceFlags` UInt8,
`SeverityText` LowCardinality(String) CODEC(ZSTD(1)),
`SeverityNumber` UInt8,
`ServiceName` LowCardinality(String) CODEC(ZSTD(1)),
`Body` String CODEC(ZSTD(1)),
`ResourceSchemaUrl` LowCardinality(String) CODEC(ZSTD(1)),
`ResourceAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
`ScopeSchemaUrl` LowCardinality(String) CODEC(ZSTD(1)),
`ScopeName` String CODEC(ZSTD(1)),
`ScopeVersion` LowCardinality(String) CODEC(ZSTD(1)),
`ScopeAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
`LogAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
`__hdx_materialized_k8s.pod.name` String MATERIALIZED ResourceAttributes['k8s.pod.name'] CODEC(ZSTD(1)),
INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1,
INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_scope_attr_key mapKeys(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_scope_attr_value mapValues(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_log_attr_key mapKeys(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_log_attr_value mapValues(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8,
INDEX idx_lower_body lower(Body) TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8
)
ENGINE = SharedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
PARTITION BY toDate(TimestampTime)
PRIMARY KEY (toStartOfMinute(TimestampTime), ServiceName, TimestampTime)
ORDER BY (toStartOfMinute(TimestampTime), ServiceName, TimestampTime, Timestamp)
TTL TimestampTime + toIntervalDay(90)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
```
</details>
Once you have that source, you can inspect the queries generated for that source. Whenever a date range filter is selected, the query should have a `WHERE` predicate that filters on both `TimestampTime` and `toStartOfMinute(TimestampTime)`, despite `toStartOfMinute(TimestampTime)` not being included in the Timestamp Column of the source's configuration.
## feat: allow CTE definitions to be nested chart configs
In order to easily use a CTE for fixing large index issues with delta
trace events, this commit updates the type and `renderWith` function to
render a nested chart config.
Ref: HDX-1343
---
## fix: use CTE instead of listing all index parts in query
Instead of sending 2 queries to the DB and enumerating all of parts
and offsets in the query, this change uses a CTE to select the parts.
This reduces the size of the HTTP request, which fixes the URI too
long response.
Ref: HDX-1343
<img width="1310" alt="Screenshot 2025-02-25 at 3 43 11 PM" src="https://github.com/user-attachments/assets/38c98bc2-2ff2-412c-b26d-4ed9952439f2" />
Co-authored-by: Mike Shi <2781687+MikeShi42@users.noreply.github.com>
Co-authored-by: Dan Hable <418679+dhable@users.noreply.github.com>
Co-authored-by: Tom Alexander <3245235+teeohhem@users.noreply.github.com>