* Reindex robustness: selective fields, cache fail-fast, stop actually stops
Three independent fixes that all surfaced from the same incident: a 580k-
container reindex that froze for hours, then refused to actually stop when
the user clicked Stop.
Selective fields in the distributed reader path. PartitionWorker was
hardcoding List.of("*"), triggering every fieldFetcher in setFieldsInBulk —
including fetchAndSetOwns on Team/User where every owned entity becomes a
getEntityReferenceById round-trip. PR #27723 fixed this for EntityReader
(single-server) but the distributed pipeline never picked it up. Lifted the
field-resolution into ReindexingUtil so both paths share one source of
truth.
Cache layer no longer flaps on a single Redis hiccup. RedisCacheProvider
used to mark the whole provider unavailable on the first 300 ms timeout and
mark it available again on the next PING success. Combined with a 1 s
health check, that made the indexer pay one timeout per cycle indefinitely.
Replaced with a sliding-window failure detector (5 failures in 30 s to trip,
3 consecutive successes to recover), modeled on the BulkCircuitBreaker pattern.
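A minimal Python sketch of the trip/recover rule described above, for illustration only. The real detector lives in the Java RedisCacheProvider; the class name, method names, and injectable clock here are assumptions, and only the thresholds (5 failures in 30 s, 3 consecutive successes) come from this commit.

```python
import time
from collections import deque


class SlidingWindowFailureDetector:
    """Hypothetical detector: trips on N failures inside a time window,
    recovers after M consecutive successes."""

    def __init__(self, failure_threshold=5, window_seconds=30.0,
                 recovery_successes=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.window_seconds = window_seconds
        self.recovery_successes = recovery_successes
        self._clock = clock              # injectable so tests can fake time
        self._failures = deque()         # timestamps of recent failures
        self._consecutive_successes = 0
        self.available = True

    def record_failure(self):
        now = self._clock()
        self._consecutive_successes = 0  # any failure breaks the success streak
        self._failures.append(now)
        # Drop failures that have aged out of the window.
        while self._failures and now - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        if len(self._failures) >= self.failure_threshold:
            self.available = False

    def record_success(self):
        self._consecutive_successes += 1
        if (not self.available
                and self._consecutive_successes >= self.recovery_successes):
            self.available = True
            self._failures.clear()       # start from a clean window
```

With this shape, one isolated timeout leaves `available` unchanged, so a single slow PING no longer toggles the provider.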
CacheWarmupApp parsed user config as EventPublisherJob (the SearchIndex
schema), which broke the Configuration page once cacheWarmupAppConfig.json
gained a type discriminator. Switched to CacheWarmupAppConfig in all four
parse sites and decoupled runtime status/stats from the parsed config.
Removed the readAppConfigFlags() workaround that read warmBundles /
enableDistributedClaim out of a raw map. Bails with ACTIVE_ERROR (not
COMPLETED) when an entity type is only partially warmed; retries on
transient cache unavailability instead of giving up on the first miss.
Stop actually stops. Three pieces:
- DistributedJobStatsAggregator skips the WebSocket status broadcast while
the job is STOPPING so it doesn't overwrite the AppRunRecord.STOPPED that
AppScheduler.updateAndBroadcastStoppedStatus pushed. Self-stops after a
30 s grace if the executor never gets to call stop() on it.
- DistributedSearchIndexExecutor.stop() now calls workerExecutor.shutdownNow()
after flagging workers, so threads parked inside the bulk-sink semaphore,
initializeKeysetCursor, or waitForSinkOperations (5-min deadline) get
interrupted instead of grinding for minutes.
- OpenSearchBulkSink replaces concurrentRequestSemaphore.acquire() with a
60-second tryAcquire, recording permanent failure on timeout. A leaked
bulk future (callback never fires) can no longer permanently freeze every
subsequent flush at a fixed record count.
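The bounded-acquire idea in the last bullet can be sketched in Python as follows. This is not the OpenSearchBulkSink code (which is Java); `flush_with_bounded_wait` and the `send_bulk` callback contract are hypothetical, and only the 60-second deadline is taken from this commit.

```python
import threading


def flush_with_bounded_wait(semaphore, send_bulk, timeout_seconds=60.0):
    """Acquire a permit with a deadline instead of blocking forever.

    `send_bulk` is a hypothetical callable that starts a bulk request and
    must invoke the callback it receives once the request completes.
    Returns False on timeout so the caller can record a permanent failure.
    """
    # threading.Semaphore.acquire(timeout=...) returns False if no permit
    # became free in time, rather than waiting indefinitely.
    if not semaphore.acquire(timeout=timeout_seconds):
        return False
    try:
        send_bulk(semaphore.release)
    except Exception:
        # Assumes send_bulk did not already call the callback before raising.
        semaphore.release()
        raise
    return True
```

If a request leaks its permit because the callback never fires, later flushes time out and report failure instead of hanging.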
* Revert "Feature #18173: Version API Improvements, Last x versions order by desc, versions from specific timeline, versions for specific metadata changes, sdk support and UI integration (#26307)"
This reverts commit e4d3e423e1.
* fix: apply ruff formatting after conflict resolution in Python files
* Feature #18173: Improve Version API through pagination, get x latest versions, specific time, specific metadata changes
* Feature #18173: Version API Improvements, Last x versions order by desc, versions from specific timeline, versions for specific metadata changes, sdk support and UI integration
* Update generated TypeScript types
* address comments
* fix py check
* Address comments
* Address comments
* Fix tests
* Fix tests
* Fix tests
* Better way to lookup versions
* Fix pytests
* Fix tests
* Address comments
* chore(migrations): move version API schema additions from 1.13.0 to 1.12.7
Moves the PR's new entity_extension columns (versionNum, changedFieldKeys),
indexes, and backfill scripts from the 1.13.0 migration directory into a
new 1.12.7 directory. Keeps 1.13.0 identical to upstream main; only this
PR's additions land in 1.12.7.
Also updates MigrationSqlStatementHashTest to exercise the relocated files.
* fix(versions): address CI failures and review feedback
- testAPI.test.ts: update getTestCaseVersionList mock expectation to include
the new params argument (APIClient.get is called with { params } since the
function now supports limit/offset/fieldChanged).
- PaginatedVersionHistory.spec.ts: replace banned networkidle waits and
waitForSelector with web-first assertion on version-button visibility
(satisfies playwright/no-networkidle and playwright/no-wait-for-selector).
- EntityVersionTimeLine.tsx: implement infinite scroll via IntersectionObserver
on a sentinel element at the bottom of the version list. Hooks up the
onLoadMore/hasMore/isLoadingMore props that were in the interface but
previously unused.
- EntityVersionPage.component.tsx: fix stale-closure bugs in fetchMoreVersions
(gitar-bot review). Use versionListRef for currentOffset and
isLoadingMoreRef to gate concurrent invocations so IntersectionObserver
double-firing does not cause duplicate appends.
- EntityResource.java: accept offset > 0 with default limit when no
fieldChanged is provided, so pagination params are no longer silently
ignored (Copilot review).
- datamodel_generation.py: raise explicit errors if generated files or
expected replacement targets are missing, instead of silently succeeding
when the generator output drifts (Copilot review).
* fix(checkstyle): format Java, ESLint/Prettier on UI, relax datamodel_generation strict check
- Java: spotless:apply on EntityResource.java (line-break formatting).
- Python: relax datamodel_generation.py DIRECT_IMPORT_FIXES check — replacement
targets are alternative forms the generator may or may not emit. Only
require the final marker ('from .paging import Paging') is present after
replacements; the prior strict per-target check broke 'make generate'.
- UI lint: organize-imports, ESLint --fix, Prettier on all version-related
files touched by the PR (resolves lint-src + lint-playwright CI checks).
- EntityVersionTimeLine: guard IntersectionObserver effect with isLoadingMore
so the observer is torn down while a fetch is in flight (Copilot review).
- EntityVersionTimeline.test.tsx: add unit tests covering sentinel rendering
conditions (hasMore, onLoadMore) and the isLoadingMore observer-guard
(Copilot review).
* fix(ui-checkstyle): prettier+eslint on EntityVersionTimeline.test.tsx
Collapse import line and reorder JSX props (callbacks last) per repo
lint rules. Rerunning ui-checkstyle-changed caught these in the new test
file from the previous commit.
* test(playwright): address @aniketkatkar97 review on PaginatedVersionHistory spec
- Add waitUntil: 'domcontentloaded' to every page.goto() call.
- Wait for loaders (waitForAllLoadersToDisappear) before asserting the
version-button to avoid racing the initial entity render.
- Replace the manual { timeout: 15_000 } on versionSelectors.nth(1) with
an explicit waitForResponse on the second paginated /versions call
(offset > 0). This deterministically synchronises on the infinite-scroll
fetch instead of a wall-clock timeout.
* fix: address Copilot review — one-shot observer + local SQL splitter
1. EntityVersionTimeLine.tsx: call observer.unobserve(entry.target) as
soon as the sentinel first intersects so onLoadMore fires only once
per attached observer. The effect reattaches a fresh observer after
isLoadingMore flips back to false, so subsequent pages still load
— we just no longer rely on the parent's in-flight ref as the sole
stopgap against repeated fires for the same page.
2. MigrationSqlStatementHashTest.java: replace Flyway's non-public
org.flywaydb.core.internal.* parser classes with a small, local SQL
statement splitter. Handles line (--) and block comments, single-,
double-, and backtick-quoted strings, backslash escapes, and doubled-
quote escapes. Removes a brittle dependency on Flyway internals that
could break on upgrades.
Tested:
- mvn test -pl openmetadata-service -Dtest=MigrationSqlStatementHashTest
→ 2 tests pass.
- yarn test EntityVersionTimeline.test.tsx → 8/8 tests pass.
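For reference, a statement splitter with the features listed in item 2 can be sketched as below. This is an illustrative Python version, not the Java code in MigrationSqlStatementHashTest; the function name is hypothetical, and replacing a block comment with a single space is an assumption made here to keep adjacent tokens apart.

```python
def split_sql_statements(script):
    """Split a SQL script on ';' while ignoring comments and quoted text."""
    statements, current = [], []
    i, n = 0, len(script)
    quote = None  # active quote character, or None when outside a string
    while i < n:
        ch = script[i]
        nxt = script[i + 1] if i + 1 < n else ""
        if quote:
            current.append(ch)
            if ch == "\\" and nxt:
                current.append(nxt)      # backslash escape: keep next char
                i += 2
                continue
            if ch == quote:
                if nxt == quote:         # doubled quote stays in the string
                    current.append(nxt)
                    i += 2
                    continue
                quote = None             # closing quote
            i += 1
        elif ch in ("'", '"', "`"):
            quote = ch
            current.append(ch)
            i += 1
        elif ch == "-" and nxt == "-":   # line comment: skip to end of line
            end = script.find("\n", i)
            i = n if end == -1 else end
        elif ch == "/" and nxt == "*":   # block comment: skip to closing */
            end = script.find("*/", i + 2)
            i = n if end == -1 else end + 2
            current.append(" ")
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements
```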
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: sonika-shah <sonika-shah@users.noreply.github.com>
Co-authored-by: mohitdeuex <mohit.y@deuexsolutions.com>
Co-authored-by: sonika-shah <sonikashah94@gmail.com>
Co-authored-by: Mohit Yadav <105265192+mohityadav766@users.noreply.github.com>
* FIX#24374 - Data Contract at Data Product level
* Update generated TypeScript types
* FIX#24374 - Data Contract at Data Product level
* fix DP page
* fix: preserve termsOfUse object format in filtered contract
The termsOfUse field was being converted to a string during filtering,
but the form components expect it to be an object with {content: string}.
This was causing test failures where form elements were not visible.
- Keep termsOfUse as object format when not inherited
- Convert old string format to new object format for consistency
- Fixes 21 test failures in DataContracts.spec.ts and DataContractInheritance.spec.ts
* fix: address code review findings - state sync and immutability
Frontend changes:
- Add useEffect to sync formValues with filteredContract changes
- Ensures edit form updates when contract prop changes
Backend changes:
- Create deep copy at start of mergeContracts() to avoid mutating input
- Prevents side effects if contract object is reused elsewhere
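The copy-before-merge pattern can be illustrated with a short Python sketch. The actual mergeContracts() is Java and its merge rules are not described here; the "child value overrides parent unless None" rule and the dict-based contract shape below are assumptions for illustration.

```python
import copy


def merge_contracts(parent, child):
    """Hypothetical merge: the result starts as a deep copy of `parent`,
    so neither input is mutated by the merge or by later edits."""
    merged = copy.deepcopy(parent)
    for key, value in child.items():
        if value is not None:
            merged[key] = copy.deepcopy(value)
    return merged
```

Because nested values are copied too, changing a nested field on the merged result leaves the original parent and child contracts untouched.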
Co-authored-by: pmbrull <pmbrull@users.noreply.github.com>
* Addressing feedback
Co-authored-by: pmbrull <pmbrull@users.noreply.github.com>
* fix tests
* fix inherited contract delete and status
* fix inherited contract delete and status
* fix inherited contract execution in app
* fix test
* fix: resolve playwright postgresql ci test failure
Co-authored-by: pmbrull <pmbrull@users.noreply.github.com>
* ci: fix yaml validation and checkstyle failures
Co-authored-by: pmbrull <pmbrull@users.noreply.github.com>
* fix: correct JSON/YAML validation errors
Co-authored-by: pmbrull <pmbrull@users.noreply.github.com>
* fix: resolve maven-collate and ui-coverage test failures
Co-authored-by: pmbrull <pmbrull@users.noreply.github.com>
* gitar feedback
* fix ci
* fix ci
* fix ci
* fix ci
* include .claude
* validate
* fix playwright
* playwright
* fix playwright
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Gitar <gitar@collate.io>
Co-authored-by: Gitar <noreply@gitar.ai>
Co-authored-by: pmbrull <pmbrull@users.noreply.github.com>
Co-authored-by: Karan Hotchandani <33024356+karanh37@users.noreply.github.com>
Co-authored-by: karanh37 <karanh37@gmail.com>
* fix(dq): psql migration for row insert test parameters
* fix(dq): use name and add trailing newline
* Fix description formatting in postDataMigrationSQLScript.sql
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix(ingestion): include database in MySQL data_diff URL (#24641)
The data_diff library requires MySQL URLs to specify a database in the
path (e.g., mysql://user:pass@host:port/database). Without this, the
table diff test fails with "MySQL URL must specify a database" error.
This fix adds MySQL and MariaDB to the list of dialects that need the
schema (which is the database in MySQL's terminology) included in the
URL path.
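A rough sketch of the rule, assuming a helper that receives the connection URL and the schema name. The function name and the dialect set are hypothetical (only MySQL and MariaDB are named in this commit), and the standard-library URL handling is used purely for illustration, not as the ingestion code's approach.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical set; this commit only states that MySQL and MariaDB were added.
DIALECTS_NEEDING_DATABASE_IN_PATH = {"mysql", "mariadb"}


def with_database_in_path(url, schema_name):
    """Append the schema as the URL path for dialects that require it."""
    parts = urlsplit(url)
    # Strip a driver suffix such as "+pymysql" to get the bare dialect.
    dialect = parts.scheme.split("+", 1)[0]
    if dialect in DIALECTS_NEEDING_DATABASE_IN_PATH and parts.path in ("", "/"):
        parts = parts._replace(path="/" + schema_name)
    return urlunsplit(parts)
```

For example, `with_database_in_path("mysql://user:pass@host:3306", "sales")` yields a URL ending in `/sales`, which is the form data_diff expects.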
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: conflicts with recent changes
* chore: translated missing Arabic entries
* fix: conditional logic issue
* chore: fix failing tests
* style: ran java linting
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* fix: Handle special characters in passwords for TableDiff URL parsing
Fixes #24164
Replace urlparse with SQLAlchemy's make_url to properly handle special
characters (like ']', '[', '@', '#', '!') in database credentials when
building connection URLs for the Data Quality TableDiff test.
Python's urllib.parse.urlparse() incorrectly interprets ']' as the end
of an IPv6 literal, causing "Invalid IPv6 URL" errors when passwords
contain such characters.
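The failure can be reproduced with the standard library alone, as in the sketch below. The helper names are hypothetical. The percent-encoding shown is a stand-in that makes the URL parseable; it is not the fix in this commit, which switches to SQLAlchemy's make_url.

```python
from urllib.parse import quote, unquote, urlsplit


def parse_fails(url):
    """Return True if the standard-library parser rejects the URL."""
    try:
        urlsplit(url)  # urlparse() delegates to the same splitting logic
    except ValueError:
        return True
    return False


def build_encoded_url(scheme, user, password, host, port, database):
    """Percent-encode credentials so characters like ']' survive parsing."""
    return "{}://{}:{}@{}:{}/{}".format(
        scheme, quote(user, safe=""), quote(password, safe=""),
        host, port, database)


RAW_URL = "mysql://user:pa]ss@host:3306/db"             # unbalanced ']' in netloc
SAFE_URL = build_encoded_url("mysql", "user", "pa]ss", "host", 3306, "db")
```

`parse_fails(RAW_URL)` is true because the lone `]` is read as a malformed IPv6 literal, while `SAFE_URL` parses and `unquote(urlsplit(SAFE_URL).password)` recovers the original password.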
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: logic in implementation causing tests to fail
* chore: devex scripts
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Small refactor to the Make update_all scripts
* Extract regex update method
* decouple release from workflows
* Fix issue with docker-openmetadata-db workflow referencing a nonexistent action
* docs(check-prerequisites): prerequisites recipe
- added scripts/check_prerequisites.sh.
- added usage in docs.
- added prerequisites to Makefile.
* docs(check-prerequisites): fix for docker version
* docs(check-prerequisites): fix for docker version
* docs(check-prerequisites): fix for docker version
* docs(check-prerequisites): revert docker-compose.yml
* Minor: add spotless and use simplecontext
* Remove context from rule evaluation
* Fix EventSubscription tests
* Minor: Migrate to latest google code style library to support Java 17 and beyond
* Minor: Ignore code style migration from git blame
* Moved more recipes into ingestion/Makefile
* Moved some recipes into ingestion/Makefile and added import statement
* Modified file paths so that 'make generate' works from the ingestion directory
* Modified checks for current directory
* Fixed function names to be in snake case
* Reverted function names back to camel case
* Reverted changes to js_antlr and py_antlr and moved generate command back into root directory Makefile
* Updated run_ometa_integration_tests recipe in ingestion/Makefile
---------
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>