OpenMetadata/scripts
Sriharsha Chintalapani b5374f9fec
Reindex robustness: selective fields, cache fail-fast, stop actually stops (#27876)

Three independent fixes that all surfaced from the same incident: a 580k-
container reindex that froze for hours, then refused to actually stop when
the user clicked Stop.

Selective fields in the distributed reader path. PartitionWorker was
hardcoding List.of("*"), triggering every fieldFetcher in setFieldsInBulk —
including fetchAndSetOwns on Team/User where every owned entity becomes a
getEntityReferenceById round-trip. PR #27723 fixed this for EntityReader
(single-server) but the distributed pipeline never picked it up. Lifted the
field-resolution into ReindexingUtil so both paths share one source of
truth.

Cache layer no longer flaps on a single Redis hiccup. RedisCacheProvider
used to mark the whole provider unavailable on the first 300 ms timeout and
mark it available again on the next PING success. Combined with a 1 s
health check, that made the indexer pay one timeout per cycle indefinitely.
Replaced with a sliding-window failure detector (5 failures in 30 s to trip,
3 consecutive successes to recover), following the BulkCircuitBreaker pattern.
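
For illustration, the trip/recover rule described above can be sketched roughly as follows. This is a minimal sketch, not the code in RedisCacheProvider: the class name SlidingWindowFailureDetector and its methods are hypothetical, and the caller is assumed to pass in timestamps in milliseconds.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch; names are illustrative and not taken from OpenMetadata. */
final class SlidingWindowFailureDetector {
  private final int failuresToTrip;
  private final long windowMillis;
  private final int successesToRecover;
  // Timestamps of failures that are still inside the sliding window.
  private final Deque<Long> recentFailures = new ArrayDeque<>();
  private int consecutiveSuccesses;
  private boolean available = true;

  SlidingWindowFailureDetector(int failuresToTrip, long windowMillis, int successesToRecover) {
    this.failuresToTrip = failuresToTrip;
    this.windowMillis = windowMillis;
    this.successesToRecover = successesToRecover;
  }

  /** Records a failure; trips only when enough failures fall inside the window. */
  synchronized void recordFailure(long nowMillis) {
    consecutiveSuccesses = 0; // any failure restarts the recovery count
    recentFailures.addLast(nowMillis);
    // Drop failures that have aged out of the window.
    while (!recentFailures.isEmpty() && nowMillis - recentFailures.peekFirst() > windowMillis) {
      recentFailures.removeFirst();
    }
    if (recentFailures.size() >= failuresToTrip) {
      available = false;
    }
  }

  /** Records a success; recovers only after enough consecutive successes. */
  synchronized void recordSuccess() {
    if (available) {
      return; // nothing to recover from
    }
    if (++consecutiveSuccesses >= successesToRecover) {
      available = true;
      consecutiveSuccesses = 0;
      recentFailures.clear();
    }
  }

  synchronized boolean isAvailable() {
    return available;
  }
}
```

With the thresholds from this commit the detector would be constructed as new SlidingWindowFailureDetector(5, 30_000L, 3), so one isolated timeout leaves the provider available.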

CacheWarmupApp parsed user config as EventPublisherJob (the SearchIndex
schema), which broke the Configuration page once cacheWarmupAppConfig.json
gained a type discriminator. Switched to CacheWarmupAppConfig in all four
parse sites and decoupled runtime status/stats from the parsed config.
Removed the readAppConfigFlags() workaround that read warmBundles /
enableDistributedClaim out of a raw map. Bails with ACTIVE_ERROR (not
COMPLETED) when an entity type is only partially warmed; retries on
transient cache unavailability instead of giving up on the first miss.

Stop actually stops. Three pieces:
- DistributedJobStatsAggregator skips the WebSocket status broadcast while
  the job is STOPPING so it doesn't overwrite the AppRunRecord.STOPPED that
  AppScheduler.updateAndBroadcastStoppedStatus pushed. Self-stops after a
  30 s grace if the executor never gets to call stop() on it.
- DistributedSearchIndexExecutor.stop() now calls workerExecutor.shutdownNow()
  after flagging workers, so threads parked inside the bulk-sink semaphore,
  initializeKeysetCursor, or waitForSinkOperations (5-min deadline) get
  interrupted instead of grinding for minutes.
- OpenSearchBulkSink replaces concurrentRequestSemaphore.acquire() with a
  60-second tryAcquire, recording permanent failure on timeout. A leaked
  bulk future (callback never fires) can no longer permanently freeze every
  subsequent flush at a fixed record count.
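
The bounded-wait idea in the last bullet can be sketched as below. This is a simplified illustration, not the OpenSearchBulkSink code: BulkPermitGate and its methods are hypothetical names, and how the real sink records the failure is not shown here. The only library calls assumed are java.util.concurrent.Semaphore.tryAcquire(long, TimeUnit) and release().

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a permit gate with a bounded wait; not OpenMetadata code. */
final class BulkPermitGate {
  private final Semaphore permits;
  private final long timeoutMillis;
  private final AtomicInteger timedOut = new AtomicInteger();

  BulkPermitGate(int maxConcurrentRequests, long timeoutMillis) {
    this.permits = new Semaphore(maxConcurrentRequests);
    this.timeoutMillis = timeoutMillis;
  }

  /**
   * Waits up to the timeout for a permit. Returns false on timeout (counted as a
   * failure) or on interruption, so a leaked permit cannot block callers forever.
   */
  boolean acquire() {
    try {
      if (permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
        return true;
      }
      timedOut.incrementAndGet();
      return false;
    } catch (InterruptedException e) {
      // Restore the flag so a shutdownNow() on the owning executor is still visible.
      Thread.currentThread().interrupt();
      return false;
    }
  }

  /** Returns a permit; in an async sink this would run in the bulk callback. */
  void release() {
    permits.release();
  }

  int timedOutCount() {
    return timedOut.get();
  }
}
```

A caller would skip the flush and mark its records as failed when acquire() returns false. Because tryAcquire is interruptible, a thread waiting here also wakes up when the executor that owns it is shut down with shutdownNow().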
2026-05-04 13:22:15 -07:00
check_prerequisites.sh Add Unit Tests coverage (#26360) 2026-03-23 16:17:15 +01:00
datamodel_generation.py Revert "Feature #18173: Version API Improvements" (#26307) (#27837) 2026-04-30 11:23:42 +00:00
deploy-pipelines.py MINOR - deploy pipelines fixes (#24575) 2025-11-27 12:31:07 +01:00
format-code.sh Minor: Migrate to latest google code style library to support Java 17 and beyond (#14429) 2023-12-18 12:56:17 -08:00
generate-rdf-models.sh RDF Ontology, Json LD, DCAT vocabulary support by mapping OM Schemas to RDF (#22852) 2025-08-17 18:36:26 -07:00
html_to_pdf.py Generate PDF for Snyk security report (#10086) 2023-02-02 17:10:35 +01:00
ingest_100k_containers.py Reindex robustness: selective fields, cache fail-fast, stop actually stops (#27876) 2026-05-04 13:22:15 -07:00
ingest_100k_tables.py Distributed Search Indexing with Push Notifications (#24939) 2026-01-23 06:12:05 +05:30
jacoco_diff_coverage.py Add Unit Tests coverage (#26360) 2026-03-23 16:17:15 +01:00
reindex-perf-bootstrap.sh Reindex robustness: selective fields, cache fail-fast, stop actually stops (#27876) 2026-05-04 13:22:15 -07:00
scaffold_connector.py Add skills to build connectors (#26309) 2026-03-08 21:45:10 -07:00
slack-link-monitor.py CI - Slack link monitor w/ playwright (#25641) 2026-01-30 10:23:52 +01:00
test_connection.py Distributed Search Indexing with Push Notifications (#24939) 2026-01-23 06:12:05 +05:30
update_version.py MINOR: Add OpenAPI version update functionality in Makefile and script (#24604) 2026-01-14 14:11:56 +05:30
validate_change.sh ISSUE #2681 - Add Missing test parameters in PSQL (#25323) 2026-01-16 12:09:15 +01:00
validate_json_yaml.sh FIX #24374 - Data Contract at Data Product level (#25314) 2026-01-23 07:01:53 +01:00
validate_sample_data.py CI - Fix operator build test (#19938) 2025-02-24 12:17:00 +01:00
validate_yaml.py CI - YAML formatting issue (#23047) 2025-08-21 18:05:15 +02:00
worktree_dev.sh fix: Handle special characters in passwords for TableDiff URL parsing (#25038) 2026-01-06 08:18:08 +01:00