OpenMetadata/bootstrap/sql
Sriharsha Chintalapani 5620121e50
SearchIndex: tunable index settings + per-stage latency metrics (#27865)
* SearchIndex: configurable index settings + per-stage latency metrics

Adds two diagnostic and operational improvements to the distributed search
indexing pipeline so operators can both tune cluster behavior per
installation and pinpoint where reindex latency is being spent.

Configurable index settings (per-installation, no code changes needed)
- New SearchIndexing app config fields: liveIndexSettings (post-promote),
  bulkIndexSettings (during reindex), and per-entity overrides.
- DefaultRecreateHandler applies bulk overrides on staged-index creation
  (e.g. refresh=-1, replicas=0, async translog) and reverts to live values
  before alias swap. Optional force-merge before swap.
- Safety revert ensures the promoted index never inherits a disabled
  refresh interval, even if the admin only configured bulk overrides.
- Live UX is preserved: refresh defaults to 1s so users and agents that
  read-after-write see near-real-time results.
- New IndexManagementClient methods (updateIndexSettings, forceMerge)
  with implementations for OpenSearch and Elasticsearch.
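
The bulk-then-revert flow above can be sketched roughly as follows. This is a minimal Python illustration, not the project's Java code. The setting keys are the standard Elasticsearch/OpenSearch index setting names; `LIVE_DEFAULTS`, `staged_settings` and `promote_settings` are hypothetical names introduced here, and the only default taken from the commit message is the 1s refresh interval.

```python
from typing import Dict, Optional

# Hypothetical live defaults; the commit message only states refresh defaults to 1s.
LIVE_DEFAULTS: Dict[str, str] = {"index.refresh_interval": "1s"}


def staged_settings(bulk: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Settings applied when the staged index is created (bulk overrides only)."""
    return dict(bulk or {})


def promote_settings(
    bulk: Optional[Dict[str, str]], live: Optional[Dict[str, str]]
) -> Dict[str, Optional[str]]:
    """Settings to apply just before the alias swap.

    Every key touched by a bulk override is reverted: to the configured live
    value if there is one, else to a known default, else to None (a null value
    in the update-settings API resets the key to its default).
    """
    bulk = bulk or {}
    live = live or {}
    reverted: Dict[str, Optional[str]] = {}
    for key in bulk:
        reverted[key] = live.get(key, LIVE_DEFAULTS.get(key))
    # Explicit live settings win, including keys the bulk phase never touched.
    reverted.update(live)
    # Safety revert: never promote an index with refresh missing or disabled.
    if reverted.get("index.refresh_interval") in (None, "-1"):
        reverted["index.refresh_interval"] = LIVE_DEFAULTS["index.refresh_interval"]
    return reverted
```

With only bulk overrides configured (`refresh=-1`, `replicas=0`), `promote_settings` still yields a 1s refresh interval and resets the replica count, which is the behavior the safety revert is meant to guarantee.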

Per-stage latency metrics (consumer-vs-producer attribution)
- StageStatsTracker accumulates per-stage wall-clock time alongside
  existing counters; added timing-only addStageTime() so per-record
  callbacks and per-batch wall-clock don't double-count.
- DB migration 1.13.0 adds readerTimeMs / processTimeMs / sinkTimeMs /
  vectorTimeMs columns to search_index_server_stats. Existing rows get
  DEFAULT 0; aggregation queries SUM the new columns.
- Reader timing wraps PartitionWorker.readEntitiesKeyset (DB latency).
  Process timing wraps the doc-build join in OpenSearch and Elasticsearch
  bulk sinks (CPU/serialization). Sink timing wraps client.indices().bulk
  (pure search-cluster latency), attributed per participating tracker.
- DistributedJobStatsAggregator surfaces totalTimeMs on each StepStats so
  the UI can compute avg latency = totalTimeMs / successRecords and
  throughput = successRecords / (totalTimeMs / 1000) on every WebSocket
  push without server-side derivation.
- New per-server aggregation query (getStatsByServer) for distributed
  visibility, fed into SearchIndexJob.ServerStats with timing fields.
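
To make the counter/timing split and the two derived figures concrete, here is a small Python sketch. It is an assumption-laden illustration rather than the actual StageStatsTracker: the class name `StageStats` and its fields are hypothetical, and only the formulas (avg latency = totalTimeMs / successRecords, throughput = successRecords / (totalTimeMs / 1000)) come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageStats:
    """Hypothetical per-stage accumulator (reader, process, sink or vector)."""

    success_records: int = 0
    failed_records: int = 0
    total_time_ms: int = 0

    def record(self, success: int, failed: int) -> None:
        """Per-record callback path: updates counters only, never timing."""
        self.success_records += success
        self.failed_records += failed

    def add_stage_time(self, elapsed_ms: int) -> None:
        """Per-batch path: adds wall-clock time only, so counts are not doubled."""
        self.total_time_ms += elapsed_ms

    def avg_latency_ms(self) -> Optional[float]:
        """Average milliseconds per successful record, or None without data."""
        if self.success_records == 0:
            return None
        return self.total_time_ms / self.success_records

    def throughput_rps(self) -> Optional[float]:
        """Successful records per second, or None when no time was recorded."""
        if self.total_time_ms == 0:
            return None
        return self.success_records / (self.total_time_ms / 1000)
```

For example, two batches of 100 successful records followed by one `add_stage_time(500)` call give an average latency of 2.5 ms per record and a throughput of 400 records per second. Because both figures derive from `total_time_ms` and `success_records`, a client can recompute them on every push.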

UI: each of the four stage cards (Reader / Process / Sink / Vector) shows
"Latency: X ms · Y r/s" when timing is available; per-entity table gains
Sink avg + Sink throughput columns. Docs panel updated. New SearchIndexing
config section added with sane defaults that preserve current behavior.

Tests: 6 new StageStatsTracker timing tests, new aggregator test that
asserts StepStats.totalTimeMs is populated at job and per-entity level.
All existing tests updated for new arg shapes; 60 unit tests pass.

The pattern operators see: a climbing Reader average points to a
database-side issue (cache or autovacuum); a climbing Sink average points to
a search-cluster issue (segments or back-pressure); a climb confined to one
entity's row identifies the offending entity.
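
The attribution rule above could be applied mechanically by comparing two snapshots of average latency. The sketch below is purely illustrative: the function names, the `HINTS` table and the 1.5x growth threshold are assumptions, not part of the change.

```python
from typing import Dict, List

# Hypothetical hints that paraphrase the commit message.
HINTS: Dict[str, str] = {
    "reader": "database side (cache, autovacuum)",
    "sink": "search-cluster side (segments, back-pressure)",
}


def climbing(
    before: Dict[str, float], after: Dict[str, float], factor: float = 1.5
) -> List[str]:
    """Return keys whose average latency grew by at least `factor`.

    Keys can be stage names or entity names; the 1.5x threshold is arbitrary.
    """
    return sorted(
        key
        for key, old in before.items()
        if old > 0 and after.get(key, 0.0) >= old * factor
    )


def diagnose(
    before: Dict[str, float], after: Dict[str, float], factor: float = 1.5
) -> List[str]:
    """Attach a hint to every stage whose average latency is climbing."""
    return [
        f"{stage}: {HINTS.get(stage, 'no hint recorded')}"
        for stage in climbing(before, after, factor)
    ]
```

Running `climbing` over per-entity Sink averages instead of per-stage averages singles out an entity whose row is the only one growing.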
2026-05-02 20:11:06 -07:00
migrations SearchIndex: tunable index settings + per-stage latency metrics (#27865) 2026-05-02 20:11:06 -07:00
schema Fix Metrics collection; reduce no.of metrics; improve slow request lo… (#25751) 2026-03-13 13:38:31 -07:00