Commit graph

3 commits

Author SHA1 Message Date
Mohit Yadav
f69d753b79
Set Indexing related executor threads priority to LOW (#27153)
* Improve memory usage for reindex process and lower priority for

* review comments

* Fix Long Jobs not marked

* Remove not required changes
2026-04-15 11:28:47 -07:00
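
Note on f69d753b79: the commit subject says indexing executor threads are set to low priority. The diff is not shown here, so the following is only a minimal sketch of how that is commonly done in Java, using a custom ThreadFactory. The class name LowPriorityThreadFactory, the "reindex" prefix, and the pool size are hypothetical and not taken from the commit.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not from the commit): names worker threads and gives
// them the lowest scheduling priority the JVM offers.
final class LowPriorityThreadFactory implements ThreadFactory {
  private final String prefix;
  private final AtomicInteger sequence = new AtomicInteger();

  LowPriorityThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable task) {
    Thread thread = new Thread(task, prefix + "-" + sequence.incrementAndGet());
    // Priority is only a hint; how strongly it is honored depends on the OS scheduler.
    thread.setPriority(Thread.MIN_PRIORITY);
    // Daemon threads do not keep the JVM alive during shutdown.
    thread.setDaemon(true);
    return thread;
  }
}

class LowPriorityDemo {
  public static void main(String[] args) throws Exception {
    // Assumed pool size of 2, chosen only for illustration.
    ExecutorService pool =
        Executors.newFixedThreadPool(2, new LowPriorityThreadFactory("reindex"));
    pool.submit(() -> {
      Thread current = Thread.currentThread();
      System.out.println(current.getName() + " priority=" + current.getPriority());
    }).get();
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
  }
}
```

Setting the priority in the factory keeps the choice in one place, so every executor built from it inherits the same setting. Platform thread priorities are advisory, so this may reduce contention with request-serving threads but does not guarantee it.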
Mohit Yadav
7b6360a9ed
Reindex Work - Perf, Metrics, Benchmarking and More (#26231)
* Update Perf

* Add multi asset scale count

* Update perf and Usage

* Fix recommendation

* Add Benchmarking script and doc

* Fix Perf

* Add --no break to benchmark

* add more metrics and validation for indexes miss

* Update generated TypeScript types

* Bound Doc Virtual Threads

* Remove Additional Properties from the UI

* Update doc

* Fix Job Getting Marked Stopped

* Server killed logs fixes

* Add Server stat to Quartz Progress

* Fix CPU spiking

* Make Auto Tune Consider JVM configs

* Fix Partition Calculator and Recovery Job Stats

* Update Auto Tune to show up in logs and stored in config

* Fix Auto Tune Config not stored in app run record

* Fix OnDemand Job type

* Fix Indexing Failures not being flushed

* Fix Stat counting at job level with process job failures

* Add Reindex Job Identifier

* Add Thread Identifiers

* Wait for sink

* Wait for sink

* Fix Stopping to let partitions finish the job

* CPU Budgeting

* More Conservative settings

* Address Review Comment

* fix Open Search Index Manager

* Reapply OpenSearch BulkSink

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-03-10 08:10:46 +05:30
Mohit Yadav
b59aa7fc44
Improve indexing (#26154)
* Add Prometheus metrics for reindexing pipeline via Micrometer

  Bridge the existing reindexing atomic counters to Prometheus so operators
  can alert on failures, latency spikes, and backpressure without relying
  solely on database-flushed stats.

  - Add ReindexingMetrics singleton (initialize/getInstance pattern matching
    CacheMetrics) with job lifecycle counters, stage success/failed/warnings
    counters, bulk request timers with SLA buckets, payload size distribution,
    backpressure and promotion counters, and active/pending gauges
  - Register in MicrometerBundle after StreamableLogsMetrics
  - Instrument ReindexingOrchestrator.run() with job started/completed/failed/stopped
  - Bridge StageStatsTracker.flush() deltas to Prometheus per stage and entity type
  - Add bulk request latency timer and payload size recording in OpenSearchBulkSink
  - Record backpressure events in SearchIndexExecutor.handleBackpressure()
  - Record promotion success/failure in DefaultRecreateHandler
  - Add ReindexingMetricsTest with 24 tests covering all metric types
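
  Note: the bullets above mention bridging StageStatsTracker.flush() deltas
  to Prometheus. The actual implementation is not shown here. As a rough
  sketch of the delta technique only, the following forwards the growth of a
  cumulative atomic counter since the previous flush. DeltaBridge and its
  LongConsumer sink are hypothetical stand-ins; in Micrometer the sink would
  presumably be a Counter incremented by the delta.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Hypothetical sketch (not from the commit): forwards only the growth of a
// cumulative counter since the previous flush, so the downstream monotonic
// metric is not double counted.
final class DeltaBridge {
  private final AtomicLong source;       // cumulative in-process counter
  private final LongConsumer metricSink; // stand-in for a metrics counter
  private long lastFlushed;              // source value seen at the previous flush

  DeltaBridge(AtomicLong source, LongConsumer metricSink) {
    this.source = source;
    this.metricSink = metricSink;
  }

  // Synchronized so two concurrent flushes cannot report the same delta twice.
  synchronized long flush() {
    long current = source.get();
    long delta = current - lastFlushed;
    if (delta <= 0) {
      return 0; // nothing new since the last flush
    }
    metricSink.accept(delta);
    lastFlushed = current;
    return delta;
  }
}

class DeltaBridgeDemo {
  public static void main(String[] args) {
    AtomicLong processed = new AtomicLong();
    AtomicLong exported = new AtomicLong();
    DeltaBridge bridge = new DeltaBridge(processed, exported::addAndGet);

    processed.addAndGet(5);
    bridge.flush(); // forwards 5
    bridge.flush(); // forwards nothing
    processed.addAndGet(3);
    bridge.flush(); // forwards 3
    System.out.println("exported=" + exported.get());
  }
}
```

  Reporting deltas rather than absolute values lets the in-process counters
  stay the source of truth while the exported metric remains monotonic.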

* Add Improvements

* Auto Gene

* Use Auto Config in distributed

* Fix Partition Claim Spread

* Make partition use config

* Correct total count

* Fix Wait time to 5 mins

* Revert om yaml

* Fix Sink sync

* Add Failure Handling at different stages

* Update script to create entities

* Move to scripts

* Add usage and fix script

* Fix Script

* Update generated TypeScript types

* Fix Staging miss

* Fix Stats reconciliation issue

* Revert workflow handler

* Fix Partition worker early sync

* Update Logs

* Update logs EntityRepository

* Error failure test

* Review Comments fix

* Fix Non Distributed live feed

* Fix Non Distributed stats feed

* Fix Review comments

* Fix Time Series cutoff

* Update generated TypeScript types

* Md

* Benchmark addition

* Fix date time warning

* Update load test to do benchmark analysis

* Diagnostic and update perf test

* Move load test to bin

* Fix Review Comments

* Add numeric values

* Move to localhost by default

* Fix Perf test issues

* Review Comments

* Add Preflight Fixes

* Add Preflight fixes for stale entry

* Remove stale entry on ApplicationHandler

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-03-03 16:39:27 +05:30