Commit graph

43 commits

Author SHA1 Message Date
Brandon Pereira
43dfb3aaff
chore to move critical path files (#1314)
Moves them into a core folder, which lets us easily track by path when core files are modified.

no changeset because no version bump required

fixes HDX-2589
2025-10-30 15:16:33 +00:00
Drew Davis
2162a69039
feat: Optimize and fix filtering on toStartOfX primary key expressions (#1265)
Closes HDX-2576
Closes HDX-2491

# Summary

It is a common optimization to have a primary key like `toStartOfDay(Timestamp), ..., Timestamp`. This PR improves the experience when using such a primary key in the following ways:

1. HyperDX will now automatically filter on both `toStartOfDay(Timestamp)` and `Timestamp` in this case, instead of just `Timestamp`. This improves performance by better utilizing the primary index. Previously, this required a manual change to the source's Timestamp Column setting.
2. HyperDX now applies the same `toStartOfX` function to the right-hand-side of timestamp comparisons. So when filtering using an expression like `toStartOfDay(Timestamp)`, the generated SQL will have the condition `toStartOfDay(Timestamp) >= toStartOfDay(<selected start time>) AND toStartOfDay(Timestamp) <= toStartOfDay(<selected end time>)`. This resolves an issue where some data would be incorrectly filtered out when filtering on such timestamp expressions (such as time ranges less than 1 minute).

With this change, teams should no longer need to have multiple columns in their source timestamp column configuration. However, if they do, they will now have correct filtering.
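
The two behaviours above can be sketched as a small SQL-fragment builder. This is an illustrative sketch, not the code from the PR: `timeFilterCondition` and `renderTimeFilter` are hypothetical names, `fromUnixTimestamp64Milli` is used here only as one plausible way to render the selected bounds, and the sketch only recognises single-argument `toStartOfX(...)` expressions.

```typescript
// Matches single-argument expressions such as toStartOfDay(Timestamp).
// Multi-argument forms (for example toStartOfInterval) are out of scope here.
const TO_START_OF = /^(toStartOf[A-Za-z]+)\(([^,()]+)\)$/;

// Build the range condition for one timestamp expression. When the expression
// is wrapped in toStartOfX, the same function is applied to both bounds.
function timeFilterCondition(expr: string, startMs: number, endMs: number): string {
  const start = `fromUnixTimestamp64Milli(${startMs})`;
  const end = `fromUnixTimestamp64Milli(${endMs})`;
  const match = TO_START_OF.exec(expr);
  if (match === null) {
    return `${expr} >= ${start} AND ${expr} <= ${end}`;
  }
  const fn = match[1];
  return `${expr} >= ${fn}(${start}) AND ${expr} <= ${fn}(${end})`;
}

// Combine the conditions for every timestamp expression of a source.
function renderTimeFilter(exprs: string[], startMs: number, endMs: number): string {
  return exprs
    .map((e) => `(${timeFilterCondition(e.trim(), startMs, endMs)})`)
    .join(" AND ");
}

console.log(renderTimeFilter(["toStartOfDay(Timestamp)", "Timestamp"], 1000, 2000));
```

Wrapping the bounds matters for short ranges: without it, a 30-second range compared against `toStartOfMinute(...)` can exclude every row, because the truncated column value falls before the untruncated start bound.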

## Testing

### Testing the fix

The part of this PR that fixes time filtering can be tested with the default logs table schema. Simply set the Timestamp Column source setting to `TimestampTime, toStartOfMinute(TimestampTime)`. Then, in the logs search, filter for a timespan < 1 minute.

<details>
<summary>Without the fix, you should see no logs, since they're incorrectly filtered out by the toStartOfMinute(TimestampTime) filter</summary>

https://github.com/user-attachments/assets/915d3922-55f8-4742-b686-5090cdecef60
</details>

<details>
<summary>With the fix, you should see logs in the selected time range</summary>

https://github.com/user-attachments/assets/f75648e4-3f48-47b0-949f-2409ce075a75
</details>

### Testing the optimization

The optimization part of this change is that when a table has a primary key like `toStartOfMinute(TimestampTime), ..., TimestampTime` and the Timestamp Column for the source is just `TimestampTime`, the query will automatically filter by both `toStartOfMinute(TimestampTime)` and `TimestampTime`.

To test this, you'll need to create a table with such a primary key, then create a source based on that table. Optionally, you could copy data from the default `otel_logs` table into the new table (`INSERT INTO default.otel_logs_toStartOfMinute_Key SELECT * FROM default.otel_logs`).

<details>
<summary>DDL for log table with optimized key</summary>

```sql
CREATE TABLE default.otel_logs_toStartOfMinute_Key
(
    `Timestamp` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
    `TimestampTime` DateTime DEFAULT toDateTime(Timestamp),
    `TraceId` String CODEC(ZSTD(1)),
    `SpanId` String CODEC(ZSTD(1)),
    `TraceFlags` UInt8,
    `SeverityText` LowCardinality(String) CODEC(ZSTD(1)),
    `SeverityNumber` UInt8,
    `ServiceName` LowCardinality(String) CODEC(ZSTD(1)),
    `Body` String CODEC(ZSTD(1)),
    `ResourceSchemaUrl` LowCardinality(String) CODEC(ZSTD(1)),
    `ResourceAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `ScopeSchemaUrl` LowCardinality(String) CODEC(ZSTD(1)),
    `ScopeName` String CODEC(ZSTD(1)),
    `ScopeVersion` LowCardinality(String) CODEC(ZSTD(1)),
    `ScopeAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `LogAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `__hdx_materialized_k8s.pod.name` String MATERIALIZED ResourceAttributes['k8s.pod.name'] CODEC(ZSTD(1)),
    INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1,
    INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_scope_attr_key mapKeys(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_scope_attr_value mapValues(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_log_attr_key mapKeys(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_log_attr_value mapValues(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8,
    INDEX idx_lower_body lower(Body) TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8
)
ENGINE = SharedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
PARTITION BY toDate(TimestampTime)
PRIMARY KEY (toStartOfMinute(TimestampTime), ServiceName, TimestampTime)
ORDER BY (toStartOfMinute(TimestampTime), ServiceName, TimestampTime, Timestamp)
TTL TimestampTime + toIntervalDay(90)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
```
</details>

Once you have that source, you can inspect the queries generated for that source. Whenever a date range filter is selected, the query should have a `WHERE` predicate that filters on both `TimestampTime` and `toStartOfMinute(TimestampTime)`, despite `toStartOfMinute(TimestampTime)` not being included in the Timestamp Column of the source's configuration.
2025-10-27 17:20:36 +00:00
Drew Davis
ff86d40006
perf: Implement query chunking for charts (#1233)
# Summary

Closes HDX-2310
Closes HDX-2616

This PR implements chunking of chart queries to improve performance of charts on large data sets and long time ranges. Recent data is loaded first, then older data is loaded one-chunk-at-a-time until the full chart date range has been queried.

https://github.com/user-attachments/assets/83333041-9e41-438a-9763-d6f6c32a0576

## Performance Impacts

### Expectations

This change is intended to improve performance in a few ways:

1. Queries over long time ranges are now much less likely to time out, since the range is chunked into several smaller queries
2. Average memory usage should decrease, since the total result size and number of rows being read are smaller
3. _Perceived_ latency of queries over long date ranges is likely to decrease, because users will start seeing charts render (more recent) data as soon as the first chunk is queried, instead of after the entire date range has been queried. **However**, _total_ latency to display results for the entire date range is likely to increase, due to additional round-trip network latency being added for each additional chunk.

### Measured Results

Overall, the results match the expectations outlined above.

- Total latency changed by between roughly -4% and +25%
- Average memory usage decreased by between 18% and 80%

<details>
<summary>Scenarios and data</summary>

In each of the following tests:

1. Queries were run 5 times before starting to measure, to ensure data is filesystem cached.
2. Queries were then run 3 times. The results shown are the median result from the 3 runs.

#### Scenario: Log Search Histogram in Staging V2, 2 Day Range, No Filter

|   |  Total Latency | Memory Usage (Avg) | Memory Usage (Max)  |  Chunk Count |
|---|---|---|---|---|
|  Original |  5.36 |  409.23 MiB |  409.23 MiB | 1  |
|  Chunked |  5.14 | 83.06 MiB  | 232.69 MiB  |  4 |

#### Scenario: Log Search Histogram in Staging V2, 14 Day Range, No Filter

|   |  Total Latency | Memory Usage (Avg) | Memory Usage (Max)  |  Chunk Count |
|---|---|---|---|---|
|  Original |  26.56 |  383.63 MiB |  383.63 MiB | 1  |
|  Chunked |  33.08 | 130.00 MiB  | 241.21 MiB  |  16 |

#### Scenario: Chart Explorer Line Chart with p90 and p99 trace durations, Staging V2 Traces, Filtering for "GET" spans, 7 Day range

|   |  Total Latency | Memory Usage (Avg) | Memory Usage (Max)  |  Chunk Count |
|---|---|---|---|---|
|  Original |  2.79 |  346.12 MiB |  346.12 MiB | 1  |
|  Chunked |  3.26 | 283.00 MiB  | 401.38 MiB  |  9 |

</details>
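
For reference, the summary percentages can be recomputed from the scenario tables. The sketch below only restates the table values; the `Measurement` shape and the `percentChange` helper are illustrative names, not part of the PR.

```typescript
interface Measurement {
  scenario: string;
  original: number;
  chunked: number;
}

// Relative change from the original run to the chunked run, in percent.
function percentChange(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

// Total latency, as listed in the scenario tables.
const totalLatency: Measurement[] = [
  { scenario: "Log histogram, 2 days", original: 5.36, chunked: 5.14 },
  { scenario: "Log histogram, 14 days", original: 26.56, chunked: 33.08 },
  { scenario: "Trace p90/p99, 7 days", original: 2.79, chunked: 3.26 },
];

// Average memory usage in MiB, as listed in the scenario tables.
const avgMemoryMiB: Measurement[] = [
  { scenario: "Log histogram, 2 days", original: 409.23, chunked: 83.06 },
  { scenario: "Log histogram, 14 days", original: 383.63, chunked: 130.0 },
  { scenario: "Trace p90/p99, 7 days", original: 346.12, chunked: 283.0 },
];

for (const m of totalLatency) {
  console.log(`${m.scenario}: latency ${percentChange(m.original, m.chunked).toFixed(1)}%`);
}
for (const m of avgMemoryMiB) {
  console.log(`${m.scenario}: avg memory ${percentChange(m.original, m.chunked).toFixed(1)}%`);
}
```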

## Implementation Notes

<details>
<summary>When is chunking used?</summary>
Chunking is used when all of the following are true:

1. `granularity` and `timestampValueExpression` are defined in the config. This ensures that the query is already being bucketed. Without bucketing, chunking would break aggregation queries, since groups can span multiple chunks.
2. `dateRange` is defined in the config. Without a date range, we'd need an unbounded set of chunks or the start and end chunks would have to be unbounded at their start and end, respectively.
3. The config is not a metrics query. Metrics queries have complex logic which we want to avoid breaking with the initial delivery of this feature.
4. The consumer of `useQueriedChartConfig` does not pass the `disableQueryChunking: true` option. This option is provided to disable chunking when necessary.
</details>
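
The four conditions above can be read as a single predicate. The following is a minimal sketch under stated assumptions: `ChartConfigLike`, `isMetricsQuery`, and `shouldUseChunking` are hypothetical names introduced for illustration, and only `granularity`, `timestampValueExpression`, `dateRange`, and `disableQueryChunking` come from the description.

```typescript
// Simplified stand-in for the real chart config type (assumption).
interface ChartConfigLike {
  granularity?: string;
  timestampValueExpression?: string;
  dateRange?: [Date, Date];
  // Hypothetical flag; how a metrics query is detected is not described here.
  isMetricsQuery?: boolean;
}

interface QueryOptions {
  disableQueryChunking?: boolean;
}

function shouldUseChunking(config: ChartConfigLike, options: QueryOptions = {}): boolean {
  // 1. The query must already be bucketed by time.
  const isBucketed = Boolean(config.granularity) && Boolean(config.timestampValueExpression);
  // 2. A bounded date range is needed to derive a finite set of chunks.
  const hasDateRange = config.dateRange !== undefined;
  // 3. Metrics queries are excluded.
  const isMetrics = config.isMetricsQuery === true;
  // 4. The consumer may opt out explicitly.
  const optedOut = options.disableQueryChunking === true;
  return isBucketed && hasDateRange && !isMetrics && !optedOut;
}

const exampleConfig: ChartConfigLike = {
  granularity: "1 hour",
  timestampValueExpression: "TimestampTime",
  dateRange: [new Date(0), new Date(3_600_000)],
};
console.log(shouldUseChunking(exampleConfig));
console.log(shouldUseChunking(exampleConfig, { disableQueryChunking: true }));
```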

<details>
<summary>How are time windows chosen?</summary>

1. First, generate the windows as they are generated for the existing search chunking feature (eg. 6 hours back, 6 hours back, 12 hours back, 24 hours back...)
2. Then, the start and end of each window are aligned to the start of a time bucket that depends on the "granularity" of the chart.
3. The first and last windows are shortened or extended so that the combined date range of all of the windows matches the start and end of the original config.
</details>
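
The window selection can be sketched roughly as follows. This is not the PR's implementation: `chunkWindows` is a hypothetical name, and the assumption that the window size stays at 24 hours after the first four windows is inferred from the "6, 6, 12, 24..." example (it happens to reproduce the chunk counts of 4, 16, and 9 reported for the 2, 14, and 7 day scenarios).

```typescript
const HOUR_MS = 60 * 60 * 1000;
// Window sizes in hours, newest first. Assumption: the last value repeats.
const WINDOW_HOURS = [6, 6, 12, 24];

interface TimeWindow {
  start: number; // epoch milliseconds, inclusive
  end: number; // epoch milliseconds, exclusive
}

// Split [startMs, endMs) into windows, newest first, with inner boundaries
// aligned to the start of a granularity bucket.
function chunkWindows(startMs: number, endMs: number, granularityMs: number): TimeWindow[] {
  if (granularityMs <= 0) {
    throw new Error("granularityMs must be positive");
  }
  const windows: TimeWindow[] = [];
  let windowEnd = endMs;
  for (let i = 0; windowEnd > startMs; i++) {
    const hours = WINDOW_HOURS[Math.min(i, WINDOW_HOURS.length - 1)];
    const rawStart = windowEnd - hours * HOUR_MS;
    // Align the boundary down to the start of its bucket.
    const aligned = Math.floor(rawStart / granularityMs) * granularityMs;
    // The oldest window is clamped so the union matches the original range.
    const windowStart = aligned <= startMs ? startMs : aligned;
    windows.push({ start: windowStart, end: windowEnd });
    windowEnd = windowStart;
  }
  return windows;
}

const twoDays = chunkWindows(0, 48 * HOUR_MS, HOUR_MS);
console.log(twoDays.length, twoDays.map((w) => (w.end - w.start) / HOUR_MS));
```

Because only the inner boundaries are aligned, the newest and oldest windows absorb any offset between the requested range and the bucket grid, and adjacent windows never split a bucket.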

<details>
<summary>Which order are the chunks queried in?</summary>

Chunks are queried sequentially, most-recent first, due to the expectation that more recent data is typically more important to the user. Unlike with `useOffsetPaginatedSearch`, we are not paginating the data beyond the chunks, and all data is typically displayed together, so there is no need to support "ascending" order.
</details>
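
A minimal sketch of this sequential, most-recent-first loading is shown below. It is illustrative only: `queryChunksNewestFirst` and `FetchChunk` are hypothetical names, the fetcher is injected rather than tied to any real client, and rows within a chunk are assumed to arrive in ascending time order.

```typescript
interface TimeWindow {
  start: number; // epoch milliseconds
  end: number; // epoch milliseconds
}

type Row = Record<string, unknown>;
type FetchChunk = (window: TimeWindow) => Promise<Row[]>;

// Query one chunk at a time, newest first, and yield the combined rows after
// each chunk so a chart can render partial results early.
async function* queryChunksNewestFirst(
  windows: TimeWindow[],
  fetchChunk: FetchChunk,
): AsyncGenerator<Row[]> {
  const ordered = [...windows].sort((a, b) => b.end - a.end);
  let accumulated: Row[] = [];
  for (const window of ordered) {
    // Awaiting inside the loop keeps the requests strictly sequential.
    const rows = await fetchChunk(window);
    // Older rows are prepended so the combined result stays chronological.
    accumulated = [...rows, ...accumulated];
    yield accumulated;
  }
}
```

A consumer can iterate with `for await (const rows of queryChunksNewestFirst(windows, fetchChunk))` and re-render on every yielded snapshot.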

<details>
<summary>Does this improve client-side caching behavior?</summary>

One theoretical way in which query chunking could improve performance is by enabling client-side caching of individual chunks, which could then be re-used if the same query is run over a longer time range.

Unfortunately, using streamedQuery, react-query stores the entire time range as one item in the cache, so it does not re-use individual chunks or "pages" from another query.

We could accomplish this improvement by using useQueries instead of streamedQuery or useInfiniteQuery. In that case, we'd treat each chunk as its own query. This would require a number of changes:

1. Our query key would have to include the chunk's window duration
2. We'd need some hacky way of making the useQueries requests fire in sequence. This can be done using `enabled` but requires some additional state to figure out whether the previous query is done.
3. We'd need to emulate the return value of a useQuery using the useQueries result, or update consumers.
</details>
2025-10-27 14:02:59 +00:00
Warren
43e32aafc7
feat: revisit Otel metrics semantic convention migration logics (#1267)
Since users can still switch to the new metric name using feature gate

Follow up https://github.com/hyperdxio/hyperdx/pull/1248
2025-10-14 22:06:31 +00:00
Warren
5efa2ffa0d
feat: handle k8s metrics semantic convention updates (#1248)
Handle OpenTelemetry semantic versions based on the ScopeVersion field (metrics)
Related to [changes](https://opentelemetry.io/blog/2025/kubeletstats-receiver-metrics-deprecation/)

Old (switched to v0.137.0)
<img width="818" height="317" alt="image" src="https://github.com/user-attachments/assets/ceea52c6-ad06-4295-afae-a44f21b2e962" />

New (able to handle multiple versions)
<img width="568" height="329" alt="image" src="https://github.com/user-attachments/assets/d2e282b2-cfd7-490a-a64d-502881a360a2" />


Ref: HDX-2322, HDX-2562
2025-10-08 21:04:40 +00:00
Mike Shi
b8efb4924c
chart ai assistant (#1243) 2025-10-07 14:47:10 -04:00
Mike Shi
5a44953e49
feat: Add new none aggregation function to allow fully user defined aggregations in SQL (#1174)
<img width="1956" height="851" alt="image" src="https://github.com/user-attachments/assets/3ca89db9-484b-4e74-88a5-4c31b6a96aef" />
2025-09-19 21:17:40 +00:00
Drew Davis
e7b590cc59
fix: Fix invalid valueExpression (#1161) 2025-09-12 11:43:37 -04:00
Drew Davis
fa45875d38
feat: Add delta() function for gauge metrics (#1147) 2025-09-11 17:10:43 -04:00
Mike Shi
61c79a16a4
fix: Ensure percentile aggregations on histograms dont create invalid SQL queries due to improperly escaped aliases. (#1021)
Closes #1020
Closes HDX-2063

<img width="1855" height="897" alt="image" src="https://github.com/user-attachments/assets/5f7f0505-934c-4da0-8e46-f07aa5035455" />
2025-07-25 17:15:37 +00:00
Mike Shi
33fc071dfa
feat: Allow users to define custom column aliases for charts (#996)
<img width="1337" height="988" alt="image" src="https://github.com/user-attachments/assets/80d83541-3fa9-4ebb-b54c-3caccbd86e90" />

Resolves HDX-1719
2025-07-15 14:08:29 +00:00
Mike Shi
973b9e8d0a
feat: Add any aggFn support, fix select field input not showing up (#991)
Closes HDX-2011

Co-authored-by: Tom Alexander <3245235+teeohhem@users.noreply.github.com>
2025-07-11 14:20:09 +00:00
Dan Hable
2f4bc07d38
fix: remove noisy log message (#921)
Now that the app has some complex queries that leverage CTEs, metrics for example, it's common for the logic in this optimization to throw an exception. When that happens, the query rendering logic continues without a problem but generates a noisy line in the console log. We can just remove this log message to clean up the debugging experience.

Ref: HDX-1763
2025-06-10 21:15:42 +00:00
Tom Alexander
cb4045bddb
feat: Add charts API (#811)
* Utilizes renderChartConfig and CH client to query for chart data
* Implements API input schema
* Adds lots of tests

Testing Notes:
* To use swagger, go to localhost:8000/api/v2/docs
* Authorize using your access key found in localhost:8000/me
* Under the charts route, click "Try it out"
* Use example payload:
```
{
  "startTime": <insert valid timestamp ms>,
  "endTime": <insert valid timestamp ms>,
  "granularity": "1h",
  "series": [
    {
      "sourceId": "<insert valid sourceid>",
      "aggFn": "count",
      "where": "SeverityText:error",
      "groupBy": []
    }
  ]
}
```
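
The same payload can be built programmatically. The sketch below mirrors only the fields shown in the example; the `ChartRequest` and `ChartSeries` types, the `buildErrorCountRequest` helper, and the start/end validation are assumptions for illustration and are not taken from the API's actual input schema.

```typescript
// Shapes inferred from the example payload only (assumption).
interface ChartSeries {
  sourceId: string;
  aggFn: string;
  where: string;
  groupBy: string[];
}

interface ChartRequest {
  startTime: number; // epoch milliseconds
  endTime: number; // epoch milliseconds
  granularity: string;
  series: ChartSeries[];
}

// Build a request that counts error logs per hour for one source.
function buildErrorCountRequest(sourceId: string, start: Date, end: Date): ChartRequest {
  if (end.getTime() <= start.getTime()) {
    // Hypothetical client-side check, added for this sketch.
    throw new Error("endTime must be after startTime");
  }
  return {
    startTime: start.getTime(),
    endTime: end.getTime(),
    granularity: "1h",
    series: [{ sourceId, aggFn: "count", where: "SeverityText:error", groupBy: [] }],
  };
}

const request = buildErrorCountRequest(
  "<insert valid sourceid>",
  new Date("2025-06-01T00:00:00Z"),
  new Date("2025-06-02T00:00:00Z"),
);
console.log(JSON.stringify(request, null, 2));
```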


It was easiest for me to go to the UI, create a new chart and grab the sourceid and start/end timestamps from the URL, plug it in and profit.

Note: It was apparent to me that we will need to provide at least GET support for sources, otherwise that ID is not easily obtained.

Ref: HDX-1651
2025-06-09 19:50:38 +00:00
Dan Hable
96b8c50898
fix(metrics): fix histogram metric query (#823)
Fix the query to address issues with the value calculation as well as allow for grouping.

Ref: HDX-1726
2025-05-19 14:47:24 +00:00
Warren
321e24f968
fix: alerting time range filtering bug (#814)
Ref: HDX-1701

1. fix alerting time range filtering
2. add time range info to the alert body

<img width="620" alt="image" src="https://github.com/user-attachments/assets/205d6537-e177-4be9-888f-a9328c8a2b8a" />
2025-05-16 17:40:32 +00:00
Mike Shi
931d7387d9
fix: bugs with showing non otel spans (ex. clickhouse opentelemetry span logs) (#789)
<img width="1645" alt="image" src="https://github.com/user-attachments/assets/2f7eb93f-9648-4c98-8bfd-a2d0f65be9d5" />

fixing a few bugs that prevented us from properly rendering trace view for `system.opentelemetry_span_log`

fix HDX-1676

Co-authored-by: Warren <5959690+wrn14897@users.noreply.github.com>
2025-05-07 16:08:21 +00:00
Dan Hable
79fe30f503
fix: aggFn use default instead of null (#782)
Switched to the `-OrDefault` version of the float conversion when combined with the aggregate functions to prevent emitting null.
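
As a rough illustration of the idea, the sketch below wraps the value in ClickHouse's `toFloat64OrDefault` before aggregating. The helper name `floatAggregate` and the `toString(...)` wrapping are assumptions made for this example; the exact expression generated by the query renderer is not shown in this commit message.

```typescript
// Render an aggregate over a value coerced to Float64. The -OrDefault variant
// yields the type's default (0) for unparseable input, whereas the -OrNull
// variant would yield NULL.
function floatAggregate(aggFn: string, valueExpr: string): string {
  const asFloat = `toFloat64OrDefault(toString(${valueExpr}))`;
  return `${aggFn}(${asFloat})`;
}

console.log(floatAggregate("avg", "Duration"));
```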

Ref: HDX-1379
2025-04-29 18:29:37 +00:00
Mike Huang
4acb8bcd9d
fix incorrect split for select query (#754)
- add new split function for brackets and quotes selection 
- should only affect SELECT section in search component
- add test
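
A bracket- and quote-aware split can be sketched as below. This is an independent illustration, not the function added in the PR; `splitSelectList` is a hypothetical name, and backslash escaping inside quotes is an assumption of this sketch.

```typescript
// Split a SELECT list on top-level commas, ignoring commas that appear inside
// brackets, parentheses, braces, or quoted strings.
function splitSelectList(select: string): string[] {
  const parts: string[] = [];
  let current = "";
  let depth = 0;
  let quote: string | null = null;

  for (let i = 0; i < select.length; i++) {
    const ch = select.charAt(i);
    if (quote !== null) {
      current += ch;
      if (ch === "\\" && i + 1 < select.length) {
        // Keep the escaped character without interpreting it.
        i++;
        current += select.charAt(i);
      } else if (ch === quote) {
        quote = null;
      }
      continue;
    }
    if (ch === "'" || ch === '"' || ch === "`") {
      quote = ch;
      current += ch;
    } else if (ch === "(" || ch === "[" || ch === "{") {
      depth++;
      current += ch;
    } else if (ch === ")" || ch === "]" || ch === "}") {
      depth = Math.max(0, depth - 1);
      current += ch;
    } else if (ch === "," && depth === 0) {
      if (current.trim() !== "") parts.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  if (current.trim() !== "") parts.push(current.trim());
  return parts;
}

console.log(splitSelectList("Timestamp, ResourceAttributes['k8s.pod.name'], concat(a, ',', b) AS c"));
```

A plain `split(",")` would break the third expression into three pieces, which is the kind of incorrect split this commit addresses.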

<img width="1236" alt="image" src="https://github.com/user-attachments/assets/569d569c-c52e-4f17-84be-0ec52175df0e" />


Ref: hdx-1587
2025-04-22 02:33:12 +00:00
Warren
7f0b397969
feat: queryChartConfig method + events chart ratio (#759)
Ref: HDX-1631

1. use temp centralized `queryChartConfig` to handle multi-series chart (metrics specifically)
2. move ratio computation logics (events chart) to the renderChartConfig
3. fix missing `seriesReturnType` prop in chartConfig in the checkAlert file
2025-04-21 22:52:55 +00:00
Dan Hable
4865ce7a62
fix: fix histogram metric query (#737)
Fix for the histogram query based on the late night session at KubeCon.

Ref: HDX-1572
2025-04-14 22:04:13 +00:00
Dan Hable
e002c2f9c6
feat: query sum metric without rate logic (#717)
Add the ability to query a sum metric and obtain the underlying values instead of the rate of change between those points.

Ref: HDX-1543
2025-03-27 19:09:58 +00:00
Warren
e884d85354
fix: metrics > logs correlation flow (#711)
Ref: HDX-1537

<img width="907" alt="Screenshot 2025-03-25 at 1 52 45 PM" src="https://github.com/user-attachments/assets/f2cc7f1c-0516-4c04-a339-ec80e4cc188d" />

If no log source is associated with metric source, the app notifies users

<img width="753" alt="image" src="https://github.com/user-attachments/assets/453ea3f7-f721-4189-b035-623602483c6a" />
2025-03-25 21:28:43 +00:00
Dan Hable
50ce38f1a9
test: histogram metric query integration tests (#692)
Pulls a set of test cases from the v1 code base that checks histogram metric queries against different quantile and queries for edge bounds as well.

Ref: HDX-1425
2025-03-25 18:59:15 +00:00
Warren
e5a210a1bd
feat: support search on multi implicit fields (#696)
Currently, users (or HyperDX) still need to create an index (e.g. tokenbf) on each of the fields to speed up queries if performance is a concern.
ref: HDX-1522


<img width="715" alt="image" src="https://github.com/user-attachments/assets/d8ddbe3e-eb75-4780-b2cf-03dcf2f309ec" />

<img width="1056" alt="image" src="https://github.com/user-attachments/assets/e2071c55-9958-4772-a156-e1e1b568d67e" />
2025-03-20 22:41:26 +00:00
Dan Hable
8d5c120490
fix: update delta chart with clause (#694)
Prior refactoring broke out the `sql` and `chartConfig` field names to allow each to be more strict in their types. The delta chart config call was missed in that refactoring.

After refactoring the schema:
<img width="1302" alt="Screenshot 2025-03-19 at 10 13 47 AM" src="https://github.com/user-attachments/assets/d4c5433c-751e-4561-9f52-72ccca64d301" />

After fixing the delta chart component:
<img width="896" alt="Screenshot 2025-03-19 at 10 14 23 AM" src="https://github.com/user-attachments/assets/3a183e1a-af68-4453-b4fc-8515a8e6734e" />

In the UI:
<img width="1417" alt="Screenshot 2025-03-18 at 5 22 25 PM" src="https://github.com/user-attachments/assets/4b720af6-ed11-444d-8602-c019f38facad" />

Ref: HDX-1517
2025-03-19 20:37:19 +00:00
Dan Hable
b9f7d32efa
refactor: clean up the chart config CTE render logic (#686)
Some additional refactoring and testing around the more complex CTE rendering.

Ref: HDX-1511
2025-03-17 14:45:26 +00:00
Dan Hable
a9dfa14930
fix: use CTE instead of listing all index parts in query (#666)
## feat: allow CTE definitions to be nested chart configs

In order to easily use a CTE for fixing large index issues with delta
trace events, this commit updates the type and `renderWith` function to
render a nested chart config.

Ref: HDX-1343

---

## fix: use CTE instead of listing all index parts in query

Instead of sending 2 queries to the DB and enumerating all of the parts
and offsets in the query, this change uses a CTE to select the parts.
This reduces the size of the HTTP request, which fixes the URI too
long response.

Ref: HDX-1343
2025-03-14 13:34:47 +00:00
Mike Shi
521facae92
use quote for aliases for sql compatibility (#680) 2025-03-13 06:19:26 +00:00
Tom Alexander
c6916f08a0
fix: Remove fill from chart configs as it breaks heatmap and more (#674)
WITH FILL messes with the heatmap bucketing logic, confusing the charting library. This change removes WITH FILL from query generation.

Ref: HDX-1456
2025-03-13 01:30:50 +00:00
Mike Shi
8f4e01035b
add support for aliases in search, add WITH clause to chartconfig (#659)
<img width="1645" alt="image" src="https://github.com/user-attachments/assets/430df67f-c415-4191-b796-ea078b8a1232" />

still not super smooth, but gets us most of the way there
2025-03-13 01:20:52 +00:00
Dan Hable
8acc7257d2
fix: few of histogram query fixes/tweaks (#669)
1. Eliminates a subquery select by pulling the handful of subquery fields up a level.

2. Removed `intDivOrZero` usage, as integer division dropped the fractional part of the result, over- or understating the value.

3. Formatting of query now matches other queries.
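
The effect described in item 2 can be illustrated with a small emulation. `intDivOrZero` below is a TypeScript stand-in that mimics the ClickHouse function's integer-division behaviour for ordinary inputs; it is written for this example and is not code from the query itself.

```typescript
// Emulates integer division that returns 0 when the divisor is 0.
function intDivOrZero(a: number, b: number): number {
  return b === 0 ? 0 : Math.trunc(a / b);
}

// Plain division keeps the fractional part.
function divide(a: number, b: number): number {
  return a / b;
}

console.log(intDivOrZero(7, 2), divide(7, 2));
console.log(intDivOrZero(-7, 2), divide(-7, 2));
```

With integer division, positive fractional results are understated and negative ones are overstated, which distorts values computed from ratios.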

Ref: HDX-1467
2025-03-12 14:12:21 +00:00
Warren
4492daa5b9
fix: gauge metric attribute conflicts issue (#670)
Ref: HDX-1468
2025-03-12 02:16:15 +00:00
Warren
9c5c2396fa
fix: handle 'filters' config (metrics) (#663)
Ref: HDX-1466
2025-03-11 07:25:00 +00:00
Warren
29e8f37d00
fix: aggCondition issue in sum/gauge/histogram metrics (#662)
Ref: HDX-1455
2025-03-10 23:38:55 +00:00
Dan Hable
99b60d50b2
fix: update sum metric query based on v1 integration test (#650)
Fix the sum query to produce the correct results from the min/max test case from v1.

Ref: HDX-1421
2025-03-07 07:03:03 +00:00
Warren
cd0e4fd71c
fix: correct handling of gauge metrics in renderChartConfig (#654) 2025-03-06 00:06:57 +00:00
Dan Hable
e80630c107
feat: supporting quantile histogram metrics (#635)
Additional `renderChartConfig` support to transform a histogram select into the correct SQL syntax to generate a chart. For parity with v1, this query only handles quantile queries.

<img width="1939" alt="Screenshot 2025-02-26 at 12 58 55 PM" src="https://github.com/user-attachments/assets/1126ac6c-c431-4d89-92d7-9df1e49e25cf" />

<img width="1960" alt="Screenshot 2025-02-26 at 3 11 07 PM" src="https://github.com/user-attachments/assets/e4fa09bf-1e27-4a90-ad25-6c6cb2890877" />

Ref: HDX-1339
2025-02-27 16:55:36 +00:00
Tom Alexander
521793df2d
fix: Ensure group-by works with sum metrics (#636)
Adds all available columns into the query so that we can properly apply the group by clause.

Ref: HDX-1419
2025-02-27 15:51:23 +00:00
Warren
57a6bc399f
feat: BETA metrics support (sum + gauge) (#629)
<img width="1310" alt="Screenshot 2025-02-25 at 3 43 11 PM" src="https://github.com/user-attachments/assets/38c98bc2-2ff2-412c-b26d-4ed9952439f2" />


Co-authored-by: Mike Shi <2781687+MikeShi42@users.noreply.github.com>
Co-authored-by: Dan Hable <418679+dhable@users.noreply.github.com>
Co-authored-by: Tom Alexander <3245235+teeohhem@users.noreply.github.com>
2025-02-26 00:00:48 +00:00
Warren
a483780ef6
style: move types from renderChartConfig + add exceptions types (#568) 2025-01-24 01:52:54 +00:00
Warren
a70080e533
style: use common utils package (api and app) (#555) 2025-01-21 18:44:14 +00:00
Warren
6ee29abe02
feat: introduce common-utils package (#554)
- copy and paste the utils to a separate dir
- setup building + CD
2025-01-16 18:15:22 +00:00