OpenMetadata

mirror of https://github.com/open-metadata/OpenMetadata synced 2026-05-24 09:39:11 +00:00

Author	SHA1	Message	Date
Sriharsha Chintalapani	d3bbbefe37	fix(rdf): dedupe lineage edges, surface Fuseki failures, port distributed-mode improvements (#27999 ) * fix(rdf): dedupe lineage edges and broaden PROV-O coverage The RDF Knowledge Graph endpoint was emitting two edges per lineage relationship — once as `om:UPSTREAM` (forward) and once as `prov:wasDerivedFrom` (reverse) — because the parser preserved each predicate's native subject/object orientation instead of canonicalizing both into a single `(upstream, downstream)` edge. Also extend PROV-O coverage so external SPARQL clients can use the W3C Provenance vocabulary directly: - `prov:Entity` / `prov:Activity` / `prov:Agent` class typing on datasets / pipelines / users - `prov:wasAttributedTo` mirror of `om:owners` - `prov:generated` (inverse of existing `wasGeneratedBy`) and `prov:used` on lineageDetails so the Entity → Activity → Entity chain is complete - `prov:hadPlan` + `prov:Plan` for SQL transformation recipes - `prov:startedAtTime` / `prov:endedAtTime` on Activity instances - `prov:wasAssociatedWith` Activity → Agent linking - `prov:invalidatedAtTime` on soft-deleted entities Other RDF cleanups in the same area: - LineageDetails URIs are now deterministic (driven by from/to ids instead of a timestamp), so re-indexing collapses duplicate Activity resources via the existing DELETE+INSERT idempotency - Skip emitting the redundant `om:owners` JSON-string literal — the mapped path already produces clean `om:hasOwner <agent>` triples - Skip empty `[]` array literals in the unmapped path - Propagate failures from `RdfRepository.{addRelationship, addLineageWithDetails, bulkAddRelationships, bulkAddGlossaryTermRelations}` instead of silently swallowing them, so downstream callers can surface the failure Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(rdf-index-app): surface Fuseki failures in app run record Per-entity and per-batch failures from the RDF index app used to be logged via SLF4J only — they never made it 
into the AppRunRecord, so the UI/run history showed "completed" even when every entity had silently failed to write to Fuseki. - `RdfBatchProcessor.processEntities` now captures the last error per entity, returns it in `BatchProcessingResult.lastError`, and accumulates relationship-processing failures into the same result. - Relationship and lineage processing methods (`processBatchRelationships`, `processLineageRelationship`, `processGlossaryTermRelations`) return structured results with failure counts and last-error messages instead of `void`, so failures are visible to the partition worker. - `RdfIndexApp` records the failure on `jobData` for both the distributed and non-distributed code paths, so users see a real error message in the run history (e.g. "Failed to write entity X to Fuseki: ConnectException"). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * perf(rdf-index-app): port distributed-mode improvements from SearchIndex The RDF distributed-indexing fork was lagging behind several SearchIndex improvements that addressed concrete reliability and throughput issues. Port them across: Core perf / reliability - Precomputed partition start cursors: coordinator walks each entity once via keyset pagination at job init and caches the boundary cursor per (jobId, entityType, rangeStart). Workers consult the cache before falling back to the OFFSET-based path. Eliminates the previous O(N²) per-partition cursor lookup. - `cancelInFlightPartitions` + `requestStop` + `checkAndUpdateJobCompletion` on the coordinator. Stop now cancels both PENDING and PROCESSING partitions in a single SQL update and immediately drives the job status from STOPPING → STOPPED, so the UI status no longer hangs while workers drain. - Selective field hydration: `RdfPartitionWorker.readEntitiesKeyset` uses `ReindexingUtil.getSearchIndexFields(entityType)` instead of `List.of("")`, avoiding expensive fetchers (e.g. fetchAndSetOwns) per batch. 
- Partition heartbeat thread: virtual thread refreshes `lastUpdateAt` every 30s for partitions actively being processed by this server, so the stale reclaimer no longer interrupts active work. - `MAX_IN_FLIGHT_PARTITIONS_PER_SERVER = 5` backpressure: claim path rejects when the server already holds 5 PROCESSING partitions, giving fair distribution across pods. Verified the existing claim DAO uses `FOR UPDATE SKIP LOCKED` for both MySQL and Postgres. - Gate WebSocket stat broadcasts during the STOPPING phase so the Quartz-scheduler-driven STOPPED status push isn't overwritten. Multi-server scaffolding (single-pod is unaffected) - `RdfPollingJobNotifier`: DB-polling discovery for other server pods to find an in-flight RDF reindex they can join. - `RdfEntityCompletionTracker`: per-entity-type partition tracking with callback firing once all partitions for an entity complete, foundation for early per-entity index promotion. Tests: precomputed-cursor cache lookup, in-flight backpressure, cancelInFlight delegation, completion tracker callback semantics, notifier start/stop. DAO additions on `rdf_index_partition`: - `cancelInFlightPartitions(jobId, now)` — covers both PENDING and PROCESSING in one statement - `countInFlightPartitionsForServer(jobId, serverId)` — backpressure - `countPartitionsByStatus(jobId, status)` — used by completion check Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> fix(ui-apps): hide misleading data on synthetic 'CurrentConfig' row When an app has no run history, AppRunsHistory fabricated a synthetic placeholder row that looked like a real run — `runType: "CurrentConfig"`, a fake `Run At` timestamp pulled from `appData.updatedAt`, an ever-growing `Duration` (`now − updatedAt`), and an active `Stop` button that targeted nothing. Render `--` for `Run At`, `Run Type`, and `Duration` on synthetic rows, and hide the `Stop` button so users no longer see "Run now → 19-minute Running with Stop button" when the actual job never registered. 
Real app runs are unaffected — they still display `runType` from the backend (OnDemandJob, Hourly, Daily, Custom, etc.). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(rdf): address PR review findings Four issues raised in PR #27999 review: - Cursor format consistency in walkAndRecord (bug): The defensive branch produced cursors via a custom `{name, id}` map while the regular path used `repo.getCursorValue()`. For entities with quoted names these encodings diverge — a quoted-name entity could land in the cache with a cursor incompatible with what the worker fetches via keyset pagination. Track the last seen entity reference and run it through `repo.getCursorValue()` in both paths. `encodeBoundaryCursor` is removed. - Adaptive scheduling in RdfPollingJobNotifier (perf): The previous implementation woke the scheduler thread every 1s and short-circuited inside the poll method when idle. Reschedule the task at the appropriate interval (1s active / 30s idle) when `setParticipating` flips, so the thread genuinely sleeps when idle. - Cursor cache cleanup on startup recovery (edge case): `partitionStartCursors` was only evicted by `refreshAggregatedJob` / `checkAndUpdateJobCompletion`. If a coordinator crashed mid-job and never reached either, the cache entry leaked until process restart. Add `evictStaleCursorCacheEntries()` invoked by `performStartupRecovery` that drops entries for jobs that no longer exist in the DB or are already terminal. - Consolidate describeError helpers (quality): `describeError`, `describeBulkError`, and `describeLineageError` in `RdfBatchProcessor` all walked the cause chain and formatted a prefixed message with the same logic. Reduced to a single `describeError(prefix, error)` plus a thin `describeEntityError` adapter for the per-entity call site. 
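The helper consolidation described above can be sketched as follows. This is a minimal illustration in Python, not the Java code in `RdfBatchProcessor`: the function names mirror the ones named in the message, but the message format and the choice to report the root cause are assumptions.

```python
def describe_error(prefix: str, error: BaseException) -> str:
    """Walk the cause chain and format the root cause behind a prefix.

    Assumption: the deepest cause is the one worth reporting.
    """
    root = error
    seen = {id(root)}
    # Follow explicit causes, guarding against a cyclic chain.
    while root.__cause__ is not None and id(root.__cause__) not in seen:
        root = root.__cause__
        seen.add(id(root))
    name = type(root).__name__
    detail = f"{name}: {root}" if str(root) else name
    return f"{prefix}: {detail}"


def describe_entity_error(entity_type: str, entity_id: str, error: BaseException) -> str:
    """Thin adapter for the per-entity call site (hypothetical signature)."""
    return describe_error(
        f"Failed to write entity {entity_type}/{entity_id} to Fuseki", error
    )
```

With a single formatter, every call site produces the same message shape, and a change to how causes are walked only has to be made once.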
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(rdf-index-app): avoid double workerExecutor.shutdownNow() in stop() stop() called workerExecutor.shutdownNow() inline AND through cleanupLocalExecution -> shutdownWorkerExecutor, which broke the DistributedRdfIndexExecutorTest.stopAndCoordinatorCleanupOnlyTearDownLocalExecutionOnce verify(workerExecutor, times(1)).shutdownNow() expectation. Drop the inline call — cleanupLocalExecution is the single owner of the shutdown path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ci: drop redundant DB matrix from openmetadata-service unit tests The {mysql, postgresql} strategy matrix on openmetadata-service unit tests doubled CI cost without adding signal: both jobs ran the same surefire suite. The `-Pmysql` / `-Ppostgresql` profiles are defined only in `openmetadata-sdk/pom.xml` (lines 190-206), set a single `test.database` property, and that property is consumed exclusively by the failsafe plugin (integration tests `IT.java` / `IntegrationTest.java`), which only runs under `-Pintegration-tests` — not enabled here. `openmetadata-service` itself has zero tests that read `test.database` or use `MySQLContainer`/`PostgreSQLContainer` (verified by grep). The only testcontainer-based DB code in the repo lives in `openmetadata-integration-tests`, a different module that this workflow doesn't build. Run the unit suite once. The `openmetadata-service-unit-tests-status` required-check aggregator is unaffected (it depends on the renamed job which still has the same name). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(rdf): address Copilot PR review findings Six correctness issues raised on PR #27999: - Lineage-details DELETE was too broad (RdfRepository): the cleanup step deleted all `<fromUri> om:hasLineageDetails ?d` triples, so reindexing one (fromId, toId) edge wiped lineage-details links for every other downstream of the same source entity. 
Pin the delete to the specific `<fromUri> om:hasLineageDetails <detailsUri>` triple. Same with prov:generated cleanup — anchor it to the specific detailsUri instead of any details resource. - Predicate not flipped during canonicalization (RdfRepository): `parseEntityGraphEdgesFromResults` swapped subject/object for reverse-direction predicates (`prov:wasDerivedFrom`, `prov:wasInfluencedBy`) but kept the original predicate URI on the resulting EdgeInfo. Exported graphs could carry semantically invalid triples like `<upstream> prov:wasDerivedFrom <downstream>`. Add `forwardEquivalentPredicate` to substitute the OM-native forward predicate when the direction flips. - `dct:modified` was an invalid xsd:dateTime (RdfPropertyMapper): `entity.getUpdatedAt().toString()` returns the epoch-millis Long as a string, but the literal was tagged `xsd:dateTime`. Convert via `Instant.ofEpochMilli(...).toString()` so the lexical form matches the type — same fix already in place for prov:invalidatedAtTime. - Unmapped EntityReference arrays were dropped entirely (RdfPropertyMapper): the previous fix to skip noisy JSON-string literals also dropped fields like `domains`, `reviewers`, `voters` for entity contexts that don't have a JSON-LD mapping for them — the unmapped path was the only path emitting them, so nothing landed in RDF. Expand each array element through `addEntityReference` so the data still produces proper `om:<fieldName> <ref>` triples; mapped-path duplicates are collapsed by Jena's Model dedupe. - Partition failure detection missed reader errors (DistributedRdfIndexExecutor): the EntityCompletionTracker was fed `result.errorMessage() != null`, but `RdfPartitionWorker` can increment `failedCount` from `readerErrors` without ever setting `lastError`. Use `result.failedCount() > 0` so partitions whose failures came from `ResultList.getErrors()` are also marked as failed when promoting an entity. 
- `COMPLETED_WITH_ERRORS` was hidden when failedRecords == 0 (RdfIndexApp): the coordinator marks a job COMPLETED_WITH_ERRORS whenever any partition is FAILED or CANCELLED, including for user-initiated stops where no record-level failures accrued. The monitor's `completedWithErrors` gate required `failedRecords > 0`, so those terminal states never hit `jobData.setFailure(...)` and the run record showed success. Drop the failedRecords precondition and tailor the fallback message based on whether there are record-level failures or partition-level only. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(rdf): separate relationship failures + type lineage as prov:Activity Two more PR review findings on #27999: - Relationship failures inflated failedRecords stat: `processEntities` was folding relationship/lineage edge failures into `failedCount`, which becomes `failedRecords` in the index stats. Records there mean entities, computed from entity counts in `totalRecords`. Counting per-edge relationship failures could push `failedRecords` above `processedRecords`/`totalRecords` and produce nonsensical per-entity stats. Track them separately: add `relationshipFailureCount` to `BatchProcessingResult` and `PartitionResult`. `failedCount` now stays entity-level. The completion tracker is fed the broader `result.hasAnyFailure()` so partitions where relationship triples failed don't get prematurely promoted as success even though their entity writes succeeded. - `detailsResource` wasn't typed as prov:Activity: the resource carries Activity-shaped predicates (prov:startedAtTime, prov:endedAtTime, prov:used, prov:hadPlan, prov:wasGeneratedBy, prov:wasAssociatedWith) but only the OM-specific `om:LineageDetails` rdf:type. Add an explicit `rdf:type prov:Activity` so PROV-O reasoners and federated SPARQL clients recognize it as an Activity without having to learn the OM type. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(rdf): label lineage edges relative to focal node The Knowledge Graph view was labeling every edge with relation type "upstream" as "Upstream" regardless of direction relative to the focal node. For a focal node F, the raw stored relation `(F, X, upstream)` means "F is upstream of X" — i.e. X is downstream of F. The previous output labeled both `F → X` and `X → F` edges as "Upstream", which made bidirectional lineage look like a duplicated relation. Re-orient the label in `convertEdgesToGraphData` based on whether the focal is the edge's source or target: - focal → X → "Downstream" - X → focal → "Upstream" - non-focal-touching edges keep the raw relation label. Reported on a sample-data table with a circular lineage cycle (`dim_customer ↔ fact_orders`) where both directions showed "Upstream". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(rdf): close remaining Copilot review gaps Three findings from PR #27999's third review pass — all about failure signals being silently dropped between layers: - `RdfIndexApp.processTask` ignored relationship failures: only `result.failedCount() > 0` was treated as a failure, so partitions whose Fuseki relationship/lineage writes failed (incrementing `relationshipFailureCount` but not `failedCount`) never wrote `jobData.failure`. Switch to `result.hasAnyFailure()` and report the combined count. - `checkAndUpdateJobCompletion` ignored partition `lastError`: a partition can finish COMPLETED with `lastError` set when a relationship bulk write was caught and recorded but didn't bump `failedRecords` or flip the partition to FAILED. The job would then go to COMPLETED even though there were real failures. Treat the presence of any `rdf_index_partition.lastError` as an error signal — promote to COMPLETED_WITH_ERRORS and aggregate sample errors into the job's errorMessage if it was blank. 
- `forwardEquivalentPredicate` mapped to a non-existent `om:DOWNSTREAM` URI: OpenMetadata only stores lineage with `om:UPSTREAM` (forward) and `prov:wasDerivedFrom` (reverse PROV-O pair); there is no `om:DOWNSTREAM` predicate written anywhere — the downstream view is derived by reading the same UPSTREAM edge from the other side. Map both `prov:wasDerivedFrom` and `prov:wasInfluencedBy` to `om:UPSTREAM` (both are reverse-direction causation predicates: in `B wasDerivedFrom A` / `B wasInfluencedBy A` the source is A and effect is B, so the canonical forward predicate is the same). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Fix RDF tag mapper * Fix all the comments Cherry-picked from #27562 (without bin/ autogenerated noise). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Align RdfPropertyMapper tests with refactor and isolate ontology export IT RdfPropertyMapperTest still referenced the removed addVotes helper and expected addStructuredProperty to dispatch votes — both gone after votes was added to IGNORED_PROPERTIES. Update the assertions accordingly. GlossaryOntologyExportIT timed out on the full suite because it flips a global RDF singleton in @BeforeAll and each test blocks a server thread on synchronous Fuseki writes. SAME_THREAD only serialized methods within the class — concurrent classes still raced for server threads. Adding @Isolated matches the pattern already used by RdfResourceIT for the same reason. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rdf): align addCertification typing + relationType after predicate flip Two findings on PR #27999 from the post-cherry-pick review pass: - `addCertification` mis-typed glossary-source certifications and skipped skos:Concept: it always emitted `om:Tag` regardless of source, even though `resolveTagResource` returns a glossaryTerm URI when the certification points at a glossary term. 
It also didn't add `skos:Concept` (or the `createTypeResource("tag")` `skos:Concept` for classification tags), so SPARQL queries filtering certification targets by `a skos:Concept` missed them while `addTagLabel`-emitted tags were findable. Mirror `addTagLabel`: branch on source (`Glossary` vs `Classification`), emit the right primary type plus `skos:Concept` (glossary) or `om:Tag` (classification), and include `om:tagSource`. - `relationType` left stale after predicate flip: when `parseEntityGraphEdgesFromResults` flipped subject/object for a reverse-direction predicate and rewrote `canonicalPredicate` to `om:UPSTREAM`, it kept the original `relationType` derived from the reverse predicate. So `prov:wasInfluencedBy` produced an EdgeInfo with `relationType=downstream` + `predicate=om:UPSTREAM` — internally inconsistent, and the mismatched `edgeKey` prevented dedup against an existing UPSTREAM edge with the same endpoints. Re-derive `relationType` from the canonical predicate after the flip. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(rdf): close 2 review findings + add parser-helper unit tests Two outstanding Copilot findings on PR #27999 plus targeted unit coverage for the helpers that drive lineage canonicalization. Findings: - `colLineageUri` collision risk (RdfRepository): the deterministic key replaced non-alphanumerics in `toColumn` with `_`, so distinct column names (e.g. `a-b` vs `a_b`) collapsed onto the same URI, which would lose / overwrite column-lineage resources during reindex. Append the loop index as a tiebreaker so distinct columns keep distinct URIs. - `createTypeResource` missing dprod prefix (RdfPropertyMapper): the `getNamespace` switch didn't recognize `dprod`, so `RdfUtils.getRdfType("dataProduct")` (returns `dprod:DataProduct`) produced an invalid `dprod:DataProduct` URI on the wire. Added the `DPROD_NS = https://ekgf.github.io/dprod/` constant and a `dprod` case in the switch. 
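The `colLineageUri` collision and its tiebreaker fix can be illustrated with a short Python sketch. The URI layout (`/columnLineage/...`) and the function name are hypothetical; only the sanitize-then-append-index idea comes from the message above.

```python
import re


def col_lineage_uris(details_uri: str, to_columns: list[str]) -> list[str]:
    """Build one deterministic URI per target column.

    Sanitizing alone maps "a-b" and "a_b" to the same key, so the loop
    index is appended as a tiebreaker. The URI layout is an assumption.
    """
    uris = []
    for index, column in enumerate(to_columns):
        safe = re.sub(r"[^A-Za-z0-9]", "_", column)
        uris.append(f"{details_uri}/columnLineage/{safe}_{index}")
    return uris
```

The result stays deterministic across reindex runs as long as the column order is stable, which is what lets the existing DELETE+INSERT idempotency keep working.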
Coverage: - New `RdfParserHelpersTest` exercises the canonicalization helpers via reflection: `isReverseDirectionPredicate` (recognizes PROV-O causation predicates, ignores forward predicates), `forwardEquivalentPredicate` (both `wasDerivedFrom` and `wasInfluencedBy` collapse to `om:UPSTREAM` so dedup works), `relativeRelationLabel` (focal-relative Upstream/Downstream flipping with all the boundary cases — non-focal edges, non-lineage relations, null focal). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(rdf): merge array contexts before per-field resolution The third (low-confidence "suppressed") finding on review 4256830399 turned out to be a real duplication: when a field is mapped in one context map of an array context but absent from another, the previous processArrayContext ran processContextMappings once per map. The pass where the field IS mapped emits the proper `om:hasOwner <ref>` triples (plus `prov:wasAttributedTo`); the pass where the field is absent falls through to processUnmappedField and emits an additional `om:owners <ref>` triple. Net: two predicates for the same logical relationship. Verified on the live Fuseki: 113 `om:hasOwner` triples vs 112 `om:owners` triples — one set per pass. Fix: flatten all context maps in the array into a single merged map once, then iterate entity fields exactly once against that combined view (later contexts win on key conflicts, matching JSON-LD context merge semantics). Each field is resolved against the union of mappings, so the unmapped fallback only fires for fields truly absent from every context. Net effect: `prov:wasAttributedTo` count is unchanged, `om:hasOwner` is unchanged, and the redundant `om:owners` triples disappear. 
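The merge-then-resolve fix above can be sketched in Python. This is a simplified model, not the actual `processArrayContext`: context maps are reduced to plain `field -> predicate` dictionaries, and the `om:<fieldName>` fallback stands in for the unmapped path.

```python
def merge_contexts(contexts: list) -> dict:
    """Flatten an array context into one map; later contexts win on conflicts."""
    merged = {}
    for ctx in contexts:
        if isinstance(ctx, dict):  # skip non-map entries such as URL strings
            merged.update(ctx)
    return merged


def resolve_fields(entity: dict, contexts: list) -> list[tuple[str, object]]:
    """Resolve each entity field exactly once against the merged view."""
    merged = merge_contexts(contexts)
    triples = []
    for field, value in entity.items():
        # The unmapped fallback fires only when no context maps the field.
        predicate = merged.get(field, f"om:{field}")
        triples.append((predicate, value))
    return triples
```

Because each field is visited once, a field mapped in any context can no longer also fall through to the unmapped path, which is where the duplicate `om:owners` triples came from.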
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(rdf): close 2 review findings on coordinator finalization race Two findings from PR #27999 review 4259628860: - `checkAndUpdateJobCompletion` early-returned before lastError check could promote: `refreshAggregatedJob` already marks the job COMPLETED when partitions all finish without `failedRecords`/`failedPartitions`, so `checkAndUpdateJobCompletion`'s subsequent `if (job.isTerminal())` short-circuit silently dropped the lastError signal. Move the partition-lastError check INTO `refreshAggregatedJob` so both code paths produce consistent terminal status — a partition that finished COMPLETED but carries a non-null lastError now correctly promotes the job to COMPLETED_WITH_ERRORS regardless of which finalizer wins the race. - `completePartition` / `failPartition` overwrote CANCELLED state: the unconditional partition row update lost a concurrent Stop's CANCELLED status if a worker finished its batch after the Stop request landed but before noticing it. Add a status-guarded `updateIfProcessing` DAO method (UPDATE ... WHERE id = :id AND status = 'PROCESSING') and have both completion paths use it; if 0 rows update, log and skip the side effects (no server-stat increment, no refreshAggregatedJob call) so the authoritative CANCELLED status stays. Mirrors the pattern SearchIndex's coordinator uses for the same race. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>	2026-05-11 06:14:50 -07:00
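The status-guarded update from the last fix in the commit above (`UPDATE ... WHERE id = :id AND status = 'PROCESSING'`) can be demonstrated with a small Python and SQLite sketch. The table name comes from the commit message; the two-column schema and the function signatures are assumptions made for illustration.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for rdf_index_partition (schema is hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rdf_index_partition (id TEXT PRIMARY KEY, status TEXT)")
    return conn


def update_if_processing(conn: sqlite3.Connection, partition_id: str, new_status: str) -> bool:
    """Change status only if the row is still PROCESSING; report whether it changed."""
    cur = conn.execute(
        "UPDATE rdf_index_partition SET status = ? "
        "WHERE id = ? AND status = 'PROCESSING'",
        (new_status, partition_id),
    )
    conn.commit()
    return cur.rowcount == 1


def complete_partition(conn: sqlite3.Connection, partition_id: str) -> bool:
    """Mark a partition COMPLETED unless a concurrent Stop already cancelled it."""
    if not update_if_processing(conn, partition_id, "COMPLETED"):
        # Zero rows updated: skip side effects so CANCELLED stays authoritative.
        return False
    # Server-stat increments and job refresh would run here.
    return True
```

The guard moves the race resolution into the database: whichever statement lands first wins, and the late worker learns the outcome from the affected-row count instead of overwriting it.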
Mohit Tilala	3d6fd71de3	Fixes #27950: [Datalake] JSON columns incorrectly typed as STRING for empty dict values (#27951) * fix: datalake JSON columns incorrectly typed as STRING for empty dict values * fix: wrap df_row_val with str() for strptime and parse calls to satisfy type checker * fix: address static check type errors and review comments in datalake utils * Restore debug logging, fix dead-code fallback, strengthen tests * Replace lexicographic max() with explicit type precedence in fetch_col_types	2026-05-11 18:02:06 +05:30
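The last bullet of the commit above replaces a lexicographic `max()` with an explicit precedence. A minimal Python sketch of why that matters follows; the type names and their ordering are illustrative assumptions, not the actual table used by `fetch_col_types`.

```python
# Hypothetical precedence: a higher rank wins when a column mixes types.
TYPE_PRECEDENCE = {"INT": 0, "FLOAT": 1, "STRING": 2, "JSON": 3}


def lexicographic_type(observed: list[str]) -> str:
    """Old behaviour: max() on strings compares them alphabetically."""
    return max(observed)


def widest_type(observed: list[str]) -> str:
    """New behaviour: pick the type with the highest explicit rank.

    Requires a non-empty list of names present in TYPE_PRECEDENCE.
    """
    return max(observed, key=lambda name: TYPE_PRECEDENCE[name])
```

Under alphabetical comparison `"STRING"` beats `"JSON"` only because `S` sorts after `J`, so the outcome depends on how the types are spelled; with an explicit rank the winner is a deliberate choice.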
Shailesh Parmar	a00a8dcdb4	test: enhance FailedTestCaseSampleData tests with mock Table component (#28028)	2026-05-11 12:04:53 +00:00
Ryad-Lotfi MAHTAL	97e3ae52db	Fixes #22916: Add chart-level lineage for Metabase connector (#26778) * fix: add chart-level lineage for Metabase connector * refactor: extract _get_chart_entity helper and move lookups outside source_tables loop Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: update test_yield_lineage to assert chart-level lineage Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: add type guards for chart-level lineage to satisfy basedpyright Guard chart lineage yields with isinstance(from_entity, Table) and None-check on chart_entity to produce type-safe generator yields, eliminating reportArgumentType and reportReturnType errors from the static-checks CI step. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: guard optional metabase lineage lookups * fix: normalize metabase lineage search results * test: cover metabase lineage fallback cases * build: use canonical Maven Central URL --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>	2026-05-11 16:40:49 +05:30
sonika-shah	6c30d82f4c	fix(security): pin libthrift, provided jsonschema2pojo, bump azure-kv/sjm/reactor-netty, exclude netty-epoll (#28010 ) * fix(security): pin libthrift 0.23.0 and exclude Jackson 3.x from jsonschema2pojo-core - Pin org.apache.thrift:libthrift to 0.23.0 in dependencyManagement. apache-jena-libs:4.10.0 transitively pulls libthrift:0.19.0 which is vulnerable to CVE-2026-43869 (fixed in 0.23.0). - Exclude tools.jackson.core:jackson-core and jackson-databind from jsonschema2pojo-core in common/pom.xml. jsonschema2pojo-core 1.3.x switched its internal Jackson to 3.x; the existing exclusion only covered the legacy com.fasterxml.jackson.core groupId, so 3.0.2 jars were leaking into the runtime classpath despite our annotator code using Jackson 2.x exclusively. Removes exposure to: - GHSA-2m67-wjpj-xhg9 - CVE-2026-29062 - GHSA-72hv-8253-57qq (3.x line) * chore(security): bump azure-security-keyvault-secrets and simple-java-mail to fix transitive CVEs - com.azure:azure-security-keyvault-secrets 4.10.0 → 4.10.7 4.10.7 declares azure-core-http-netty 1.16.4, which uses reactor-netty-http 1.2.16. Replaces the second source path of reactor-netty-http 1.0.48 in the OM standalone dist. Fixes CVE-2025-22227 (the azure-kv path). - org.simplejavamail:simple-java-mail 8.12.2 → 8.12.6 Hygiene bump (4 patch versions). Note: simple-java-mail 8.12.6's master pom still pins angus-mail to 2.0.3, so the actual angus-mail fix for CVE-2025-7962 still relies on OM's existing <angus-mail.version>2.0.4</angus-mail.version> dep-management entry, which already wins for OM standalone (verified: openmetadata-1.12.7 dist already ships angus-mail-2.0.4.jar). * fix(security): switch libthrift fix from version-pin to exclusion; expand reasoning comments libthrift (CVE-2026-43869): Replace the dependencyManagement pin to 0.23.0 with an explicit <exclusion> on apache-jena-libs. 
OM's source tree has zero org.apache.thrift imports and no references to RDF Thrift binary serialization (RDF_THRIFT, ThriftConvert, RDFFormat.THRIFT) — the only consumer of libthrift in our dep tree is Jena's optional RDF Thrift I/O code path, which OM never exercises. libthrift 0.23.0 was published 2026-05-08 and no Jena release yet ships it (Jena 6.0.0 and 5.6.0 still ship libthrift 0.22.0, also vulnerable). Pinning would force a Jena-uncertified libthrift onto code Jena tests with 0.22.0; excluding the unused JAR is cleaner and self-cleaning when Jena bumps. Lucene/Solr (also in this dep tree) already excludes libthrift for the same reason — confirmed via lucene-solr-grandparent pom. Jackson 3.x exclusion: expanded the comment in common/pom.xml to record the upstream state (jsonschema2pojo-core 1.3.3 still pins jackson3.version=3.0.2) and the verification that build succeeds with the exclusion. fix(security): mark jsonschema2pojo-core as <optional> instead of maintaining per-dep exclusion list Per Copilot review on PR #28010 (line 66 of common/pom.xml): jsonschema2pojo-core is build-time only — the annotator classes that reference it (PasswordAnnotator, MaskedAnnotator, etc.) are invoked exclusively by the jsonschema2pojo-maven-plugin at code-gen time, never on the runtime classpath of any deployed service. Switch from a growing list of <exclusion> entries (which only caught the deps known at the time each entry was added) to <optional>true</optional>. This stops jsonschema2pojo-core AND every transitive dep it pulls — current and future — from propagating to downstream consumers' runtime classpath. Effect on the GHSA-2m67-wjpj-xhg9 / CVE-2026-29062 / GHSA-72hv-8253-57qq fix: the jackson-core-3.0.2 / jackson-databind-3.0.2 jars (groupId tools.jackson.core) no longer leak into the dist via this path. 
Verified: $ mvn -pl openmetadata-service dependency:tree -Dincludes='tools.jackson.core:,org.jsonschema2pojo:' (empty) $ mvn -pl openmetadata-spec -am install -DskipTests BUILD SUCCESS (annotator code-gen still works — jsonschema2pojo-maven-plugin pulls jsonschema2pojo-core via its own <dependencies> block, and adds common.jar there too via openmetadata-spec/pom.xml) * fix(security): revert libthrift exclusion → pin to 0.23.0; Jena statically references TException The exclusion broke RDF tests: RdfInferenceConfigurationTest, RdfPropertyMapperTest, SparqlBuilderNestedFieldsTest, SqlToSparqlTranslatorTest fail with `Could not initialize class org.apache.jena.rdf.model.ModelFactory` and `org/apache/thrift/TException` (NoClassDefFoundError). Even though OM never calls RDF Thrift I/O directly, several Jena classes (ModelFactory, PrefixMappingImpl, etc.) statically reference org.apache.thrift.TException at class-init time. Removing libthrift fails class loading on the very first use of any Jena Model. The grep for `org.apache.thrift` in OM source missed this because the references are in Jena's own bytecode, not OM's source. Reverting the exclusion. Pinning libthrift to 0.23.0 in dependencyManagement remains the only available fix: - No Jena release ships the fix (latest 6.0.0 still uses 0.22.0; libthrift 0.23.0 was published 2026-05-08). - Exclusion breaks the build (above). - Pinning forces the fixed version onto Jena's classpath; libthrift maintains backwards-compatible binary protocol semantics, so Jena's runtime usage continues to work. CI will validate. In-pom comment expanded to record this discovery so the trade-off doesn't get re-litigated next round. * chore: shorten security comments in poms * fix(security): exclude netty-transport-native-epoll from azure-core-http-netty GHSA-rwm7-x88c-3g2p / CVE-2026-42577 (AWS Inspector reports HIGH). The bug is in netty 4.2.x epoll; we ship 4.1.x. 
The advisory's machine-readable vulnerable_version_range is < 4.2.13.Final (overly broad), which causes scanners to flag 4.1.x even though the buggy code path was never in 4.1. Bumping our netty to 4.2.13.Final is blocked by Azure SDK / gRPC / AWS SDK / reactor-netty all targeting 4.1.x. Instead, exclude the Linux native binding JAR (the only thing in our tree that is named io.netty:netty-transport-native-epoll) so the flagged artifact stops shipping in the dist. Netty's standard pattern is to call Epoll.isAvailable() and fall back to NioEventLoopGroup when the native binding is absent — the exact same code path already used on macOS/Windows deployments. netty-transport-classes-epoll (the Java classes, required by reactor-netty/lettuce/AWS-netty-nio-client bytecode references) stays. Verified: mvn -pl openmetadata-service -am dependency:tree \ -Dincludes='io.netty:netty-transport-native-epoll' -> empty (was: 4.1.133.Final-linux-x86_64) * fix(security): align reactor-netty-http dep-mgmt pin to 1.2.16 Per Copilot review on PR #28010 (line 19): the bump of azure-kv to 4.10.7 was described as bringing reactor-netty-http 1.2.16, but the existing dep-mgmt pin to 1.2.14 was overriding the transitive (mvn dependency:tree confirmed 1.2.14 was the actual resolved version). Bump the pin 1.2.14 → 1.2.16 to match what azure-core-http-netty 1.16.4 ships transitively. Both are above the CVE-2025-22227 fix line (≥ 1.2.8), so this is a pin-alignment cleanup, not a security delta. * fix(security): switch jsonschema2pojo-core from <optional> to <scope>provided</scope> Semantically more correct for a build-time-only dep. The annotator classes (PasswordAnnotator, MaskedAnnotator, etc.) are invoked only by jsonschema2pojo-maven-plugin at code-gen time in its own classloader; the runtime classpath of any deployed service never needs jsonschema2pojo-core. 
<scope>provided</scope> says exactly that: - on compile + test classpath (so annotators compile) - excluded from runtime / dist packaging by default - not propagated to downstream consumers Same scanner outcome as <optional>true</optional> — Jackson 3.x JARs still don't ship in the dist — but cleaner expression of intent. CVE coverage unchanged: GHSA-2m67-wjpj-xhg9, CVE-2026-29062, GHSA-72hv-8253-57qq. Verified: mvn -pl openmetadata-spec -am install -DskipTests → BUILD SUCCESS mvn -pl openmetadata-service dependency:tree -Dincludes='tools.jackson.core:,org.jsonschema2pojo:' → empty * fix(security): switch netty-epoll exclusion from dep-mgmt to per-direct-dep Per Copilot review on PR #28010: the previous parent-pom dep-management entry for azure-core-http-netty with <exclusion> on netty-transport-native-epoll did work (verified via mvn dependency:tree — exclusion DOES propagate to transitive resolution in dep-mgmt), but Copilot raised a concern that pinning azure-core-http-netty to 1.16.4 would block future Azure SDK bumps if a newer SDK requires a higher azure-core-http-netty. Same refactor as already applied to ai-platform PR #669. Remove the parent dep-mgmt entry; apply per-direct-dep <exclusions> on the 3 azure-* deps that transitively bring azure-core-http-netty in openmetadata-service: - azure-security-keyvault-secrets - azure-identity - azure-storage-blob Exclusion now travels with whatever azure-core-http-netty version each SDK chooses; SDK bumps are no longer blocked by a hardcoded version. Verified: mvn -pl openmetadata-service dependency:tree -Dincludes='io.netty:netty-transport-native-epoll' returns empty. 
* fix(security): extend netty-epoll exclusion to azure-identity-extensions Per gitar-bot review on PR #28010: add the netty-transport-native-epoll <exclusion> to azure-identity-extensions for consistency with the 3 other azure-* direct deps in openmetadata-service/pom.xml that already have it (azure-security-keyvault-secrets, azure-identity, azure-storage-blob). Defensive: today's resolution is already clean because Maven's nearest-definition rule picks the directly-declared azure-identity:1.15.2 (with our exclusion) over the transitive azure-identity:1.7.1 brought by azure-identity-extensions:1.0.0. Adding the exclusion here protects against a future refactor that removes the direct azure-identity declaration. Verified: mvn -pl openmetadata-service dependency:tree -Dincludes='io.netty:netty-transport-native-epoll' still returns empty. --------- Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>	2026-05-11 14:08:26 +05:30
Harsh Vador	7c844d77d2	Fix fast-uri Dependabot vulnerabilities in UI core components (#28020 )	2026-05-11 08:30:58 +00:00
Pere Miquel Brull	f4cb7d0f14	feat(ingestion): add QuestDB database connector (#27604 ) * feat(ingestion): add QuestDB database connector QuestDB speaks the PostgreSQL wire protocol but implements a minimal pg_catalog, so the default PG dialect queries fail on the CHAR->DOUBLE cast in pg_class.relkind. This connector routes SQLAlchemy inspection through information_schema and short-circuits constraint/index lookups (QuestDB has no PK/FK/unique/indexes), letting CommonDbSourceService handle the rest of the topology unchanged. - Fixed /qdb target in the psycopg2 URL regardless of databaseName (which remains the OpenMetadata display name) - get_database_names defaults to 'qdb' instead of 'default' - 12 unit tests + live-verified against QuestDB 9.3.5 on localhost:8812 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(questdb): address review feedback — rename to QuestDB, wire UI Code review fixes for PR #27604: Blockers resolved: - Rename Questdb -> QuestDB across schema, enum, Python classes, and all generated TS files. Matches peer connectors (PinotDB, DynamoDB) and the product's actual brand. Changing post-merge would be a breaking migration. - Remove sslConfig from schema. QuestDB's sslConfig was declared but never wired — ssl_manager.check_ssl_and_init is @singledispatch and has no QuestDBConnection registration, so enabling SSL in the UI was a silent no-op. Can be added in a follow-up with an explicit psycopg2 wiring. Warnings resolved: - authType now in schema's required array — was failing with opaque 401. - Delete dead queries.py (QUESTDB_TEST_GET_TABLES was defined but never imported). - Add bytea -> LargeBinary to the type map (verified via live information_schema probe against QuestDB 9.3.5 — all other native types normalize to standard PG names that were already mapped). - Complete type annotations on utils._get_table_names, _get_columns, _information_schema_type. 
- Dialect patch test now uses a real PGDialect_psycopg2 instance instead of a MagicMock dialect, so it catches signature drift against the real SQLAlchemy Inspector contract. Added a separate test that verifies get_table_names emits a query against information_schema.tables (not pg_catalog). - Add ingestion_logger() to utils.py with a debug log on dialect patching. - _empty_view_definition now returns None instead of "" to match how other dialects signal the absence of a DDL. Also fixes UI visibility (QuestDB was missing from the service picker): - Regenerate 15 TS enum files via json2ts.sh -> quicktype so the new DatabaseServiceType.QuestDB value flows through the UI. - Register service-icon-questdb.png in ServiceIconUtils.ts. - Add locales/en-US/Database/QuestDB.md connector form docs. - Add quicktype as a devDependency — json2ts.sh needs it and it wasn't installed. Docs: update skills/connector-building and skills/standards/registration to reflect reality — i18n locale files are not needed, icon + locale MD registration steps are, and Services.constant.ts is deprecated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * skill * fix(questdb): restore databaseSchema field for test connection test_connection_db_schema_sources reads service_connection.databaseSchema directly with no hasattr guard. Removing it from the schema in the prior review fix broke GetTables and GetViews steps: 'QuestDBConnection' object has no attribute 'databaseSchema' Restored as an optional string with a clearer description (defaults to public when unset). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix owners * add yaml * Update generated TypeScript types * Sync package.json and yarn.lock with main * Fix: ingestion files , Added Lineage for questdb tests and UI changes, Refactored code * FIX: python_checkstyle * Fix: test and unused param * Fix: yield_table enforcing tabletype to partition, Refactored lineage * Fix: Failing test and remove print statement * FIX: python_checkstyle and added error handling * FIX: Resolved comments * FIX: failing tests and schema cleaning * Minor change * Fix: Failing unit tests * Fix: Unit test unrelated changes ignored * FIX: tests * Fix: Failing test due to extra parameter in yaml --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Akash Verma <akashverma@Akashs-MacBook-Pro-2.local> Co-authored-by: Akash Verma <138790903+akashverma0786@users.noreply.github.com>	2026-05-11 13:02:32 +05:30
Eugenio	6ac135dc7e	Fixes 21329: exclude temporal table period columns from autoClassification sampling (#27960 ) * fix(azuresql): exclude temporal table period columns from sampling Query sys.columns for generated_always_type to detect SYSTEM_TIME period columns (ValidFrom/ValidTo) and skip them in both schema reflection (mssql/utils.py) and sample data fetching (AzureSQLSampler). Also moves the catalog round-trip inside the `if columns` guard to avoid the query when column filtering is not in use. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(azuresql): add unit tests for temporal column exclusion Adds sampler unit tests covering period-column filtering and NOT_COMPUTE_PYODBC exclusion. Adds a PII processor test case for temporal tables using single first-names to avoid non-deterministic NER matches. Corrects customers_sensitive expected tags to include address→PII.NonSensitive, which the classifier now correctly detects. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(azuresql): add full workflow integration test for temporal tables Replaces the isolated sampler unit test with an end-to-end integration test that registers the AzureSQL service, creates a system-versioned table, runs MetadataWorkflow then AutoClassificationWorkflow, and asserts that sample data excludes ValidFrom/ValidTo. Includes SQL permission prerequisites and troubleshooting guide in the module docstring. Teardown controlled by AZURE_SQL_CLEANUP env var (default: true). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix `spacy<3.8` for `ingestion/[dev]` --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-11 11:45:40 +05:30
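The period-column exclusion described in the commit above can be sketched in Python as follows. This is a minimal illustration, not the connector's code: the function names, the query text, and the row shape `(name, generated_always_type)` are assumptions. The only catalog fact relied on is that SQL Server's `sys.columns.generated_always_type` is 1 for a row-start column and 2 for a row-end column of a SYSTEM_TIME period.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical catalog query; the real query used by the connector may differ.
PERIOD_COLUMNS_QUERY = (
    "SELECT name, generated_always_type "
    "FROM sys.columns WHERE object_id = OBJECT_ID(:table_name)"
)

# generated_always_type values that mark SYSTEM_TIME period columns:
# 1 = AS_ROW_START, 2 = AS_ROW_END (0 means "not applicable").
PERIOD_TYPES: Set[int] = {1, 2}


def period_column_names(catalog_rows: Iterable[Tuple[str, int]]) -> Set[str]:
    """Return the names of columns flagged as temporal period columns."""
    return {name for name, gen_type in catalog_rows if gen_type in PERIOD_TYPES}


def filter_sample_columns(
    columns: List[str], catalog_rows: Iterable[Tuple[str, int]]
) -> List[str]:
    """Drop period columns from the list of columns to sample.

    Skips the catalog lookup entirely when no column filter is in use,
    echoing the `if columns` guard mentioned in the commit message.
    """
    if not columns:
        return columns
    excluded = period_column_names(catalog_rows)
    return [col for col in columns if col not in excluded]
```

With catalog rows such as `("ValidFrom", 1)` and `("ValidTo", 2)`, only the remaining business columns would be passed on to sampling.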
Harsh Vador	6068a10dbe	(feat)ui: migrate form builder in connection form (#27812 ) * (feat)ui: migrate form builder in connection form * add core field support * fix failing test & fix checkstyle * fix failing test * improvise fields visibility * fix failing test * improve spacing * add password hint support * fix ui checkstyle * address gitar * fix oneOfField stale issues * address gitar along with test * fix failing sonar * array field type to ui-core and advanced config to accordion usage * address gitar * remove form bg, handle breadcrumb navigation * fix mocks * handle layout, spacing , bg color * handle bg colors * use core-components * fix checkstyle * radio buttons bg color and spacing * remove hideBgGrey prop * nit * add dedicated EmbeddedAddServicePage for askcollate route & fix checkstyle * add unit tests	2026-05-11 11:07:17 +05:30
Harsh Vador	86e1d88386	security: Include branch name in security scan Slack alerts and fail only on high vulnerabilities (#27977 ) * Add branch context to security scan Slack alerts and upload CSV findings summary * change failing severity from medium to high & address gitar * fix csv formatting * revert flattening changes	2026-05-11 10:41:48 +05:30
Pere Miquel Brull	7e0ee80c28	feat(search): add Google Gemini embedding provider (#27974 ) * Add design: Google Gemini embedding client Adds a fourth embedding provider (google) alongside openai/bedrock/djl, using the Generative Language API with a single API key. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Add implementation plan: Google Gemini embedding client 7 tasks covering schema change + regen, client implementation, validation tests, error path tests, request shape tests, switch wiring, and final verification. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(spec): add google embedding provider config block Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(search): add GoogleEmbeddingClient with happy-path test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(search): extract MODELS_PREFIX constant in GoogleEmbeddingClient The string "models/" appeared in both DEFAULT_BASE_URL and the buildRequestBody method. Extract it as a named constant per project standards. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(search): add constructor validation tests for GoogleEmbeddingClient Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(search): add blank model id test and clarify null-modelId workaround Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(search): add HTTP error and malformed response tests for GoogleEmbeddingClient * test(search): tighten empty values array assertion to check message Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(search): verify Google embedding request URL, headers, and body shape Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(search): extract endpoint constant and harden extractBody helper Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(search): wire google embedding provider into SearchRepository switch * test(search): cover null dimension and custom endpoint, drop redundant comment Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Update generated TypeScript types * Remove internal planning docs from PR These were workflow scaffolding (design spec + implementation plan) generated by the superpowers brainstorming/planning flow; they belong in the local development trail, not the PR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Address PR review comments - GoogleEmbeddingClient.buildRequest: handle endpoint with existing query string by switching the key separator from '?' 
to '&' as needed; document why the API key travels in the URL (Google Generative Language API requirement, not Bearer-header). - GoogleEmbeddingClient.extractErrorMessage: replace empty catch block with a trace-level log to comply with the 'no empty catch' standard. - elasticSearchConfiguration.json: clarify google.endpoint description so operators know it must be the full ':embedContent' URL, not a base URL. - GoogleEmbeddingClientTest.extractBody: await onComplete via CompletableFuture.get(5s) instead of relying on synchronous publisher delivery; surface onError properly. - New test: testEndpointWithExistingQueryStringUsesAmpersand verifies the '?' / '&' separator logic. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Update generated TypeScript types * Wire google embedding provider into openmetadata.yaml defaults - Add `google:` block under naturalLanguageSearch with env-var fallbacks (GOOGLE_API_KEY, GOOGLE_EMBEDDING_MODEL_ID, GOOGLE_EMBEDDING_DIMENSION, GOOGLE_API_ENDPOINT). - Update embeddingProvider option list comment to include "google". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Use gemini-embedding-001 default and pass outputDimensionality The previous default (text-embedding-004) is rejected on some Google projects with `404: not found for API version v1beta, or is not supported for embedContent`. Switch to gemini-embedding-001 — the current GA model, available at v1beta and broadly accessible. - GoogleEmbeddingClient.buildRequestBody: include outputDimensionality from the configured embeddingDimension. Required for gemini-embedding-001 (defaults to 3072 dims otherwise) and supported as a truncation hint by text-embedding-004. - elasticSearchConfiguration.json + openmetadata.yaml: change default embeddingModelId to gemini-embedding-001 and document the outputDimensionality semantics on the embeddingDimension field. 
- GoogleEmbeddingClientTest.testRequestBodyShape: assert outputDimensionality=768 in the captured body and use gemini-embedding-001 as the test fixture model. - SystemRepository.getEmbeddingConfigurationMessage: add a `google` case so /api/v1/system/status surfaces the configured model/endpoint instead of "Unknown provider 'google'". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Update generated TypeScript types * Guard against missing google config in SystemRepository diagnostic If `embeddingProvider=google` but the `google` config block is absent, calling `nlpConfig.getGoogle().getEndpoint()` would NPE and produce a misleading "Unable to determine embedding configuration" message. Add an explicit null check that yields a clear diagnostic instead. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Validate google.endpoint contains :embedContent at construction A custom endpoint missing the `:embedContent` action used to silently produce 404s at runtime. Fail fast at startup with a clear message showing the expected URL form, so misconfiguration surfaces in logs instead of in vector-search failures. - Update testCustomEndpointConstruction to use a valid full URL. - Add testCustomEndpointWithoutEmbedContentThrows. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(spec): add modelId chat field to google block Adds a `modelId` property to the natural-language-search `google` block, parallel to how the `openai` block exposes both `modelId` (chat) and `embeddingModelId` (embedding). This enables Gemini-based NLQ filter extraction (chat completions via :generateContent) on top of the existing embedding support. Default: gemini-2.5-flash. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Update generated TypeScript types * Update generated TypeScript types * trigger --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2026-05-10 16:37:53 +02:00
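The URL and body handling described in the commit above (fail-fast check for `:embedContent`, the `?` / `&` separator switch, and `outputDimensionality`) can be sketched in Python. This is an illustration under stated assumptions, not the Java `GoogleEmbeddingClient`: the function names are hypothetical, the `key` query parameter and the `content.parts[].text` body shape are assumed from the public Generative Language API, and the example host is a placeholder.

```python
from typing import Any, Dict, Optional
from urllib.parse import quote

EMBED_ACTION = ":embedContent"  # action suffix the commit says must be present
MODELS_PREFIX = "models/"       # prefix the commit extracted as a constant


def build_request_url(endpoint: str, api_key: str) -> str:
    """Append the API key to a full embedContent URL.

    Raises ValueError when the endpoint lacks the ':embedContent' action,
    so a misconfiguration fails at construction rather than as a 404 later.
    """
    if EMBED_ACTION not in endpoint:
        raise ValueError(
            "endpoint must be the full embedContent URL, e.g. "
            "https://<host>/v1beta/models/<model>:embedContent"
        )
    # Use '&' when the endpoint already carries a query string.
    separator = "&" if "?" in endpoint else "?"
    return f"{endpoint}{separator}key={quote(api_key, safe='')}"


def build_request_body(
    model_id: str, text: str, dimension: Optional[int]
) -> Dict[str, Any]:
    """Build the JSON body; outputDimensionality is sent only when configured."""
    body: Dict[str, Any] = {
        "model": MODELS_PREFIX + model_id,
        "content": {"parts": [{"text": text}]},
    }
    if dimension is not None:
        body["outputDimensionality"] = dimension
    return body
```

Percent-encoding the key keeps an unusual key value from altering the query string; whether the Java client does the same is not stated in the commit message.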
Shailesh Parmar	2967a7f0a8	refactor: replace RouterUtils with ObservabilityRouterClassBase for navigation paths (#27956 ) * refactor: replace RouterUtils with ObservabilityRouterClassBase for navigation paths * feat: migrate navigation to observabilityRouterClassBase in DataQuality and IncidentManager components * refactor: format navigation calls and imports for consistency across components * test: mark 'Pipeline Alert' and permission tests as slow	2026-05-10 16:50:00 +05:30
dependabot[bot]	41cfcf995e	chore(deps): bump fast-uri in /openmetadata-ui/src/main/resources/ui (#28004 ) Bumps [fast-uri](https://github.com/fastify/fast-uri) from 3.1.0 to 3.1.2. - [Release notes](https://github.com/fastify/fast-uri/releases) - [Commits](https://github.com/fastify/fast-uri/compare/v3.1.0...v3.1.2) --- updated-dependencies: - dependency-name: fast-uri dependency-version: 3.1.2 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-09 17:22:17 +00:00
Laura	882ef3f8c5	add nlq to OpenMetadataApplicationConfig (#27988 ) * add nlq to OpenMetadataApplicationConfig * move config under naturalLanguageSearch * openai client * Update generated TypeScript types --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>	2026-05-09 18:15:00 +02:00
Harshit Shah	0ff09b4915	Migrate FailedTestCaseSampleData table to core-ui Table component (#27985 ) * refactor(FailedTestCaseSampleData): migrate table to core-ui Table component - Replace Ant Design Table with core-ui Table (react-aria-components) - Add border wrapper tw:border tw:border-border-secondary tw:rounded-[10px] - Add 210px min-width on data cells with horizontal scroll - Add 8px padding on header and data cells - Center diff-type column content vertically and horizontally - Move all styles from .less file to tw: classes using theme tokens - Delete failed-test-case-sample-data.less Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix checkstyle --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 09:38:14 +00:00
Harshit Shah	8570d36830	Migrate IncidentManager table to core-ui Table component (#27972 ) * refactor(IncidentManager): migrate table to core-ui Table component - Replace Ant Design Table with core-ui Table (react-aria-components) - Use plain renderRow function (matching DataQualityTab pattern) with static Table.Cell children and Table.Body dependencies to fix status/ severity/assignee columns stuck at loading skeleton - Fix popover max-height distortion by adding popoverClassName prop to IncidentStatusPopoverShell and applying tw:!max-h-none via react-aria className override - Update unit test mock for @openmetadata/ui-core-components to include Table component - Update e2e selector from Ant Design .ant-table-tbody to data-testid based tbody tr selector Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix checkstyle * address gitar-bot comments * address comments --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 12:04:32 +05:30
Harsh Vador	f3ef11cf50	fix: use useClipboard hook in CodeBlockComponent to fix clipboard on non-secure contexts (#28003 )	2026-05-09 09:31:21 +05:30
Sriharsha Chintalapani	22a6c10072	Context center (#27558 ) * Add Context Center: Migrate Knowledge Center , Images/ PDFs document support * Add Context Center: Migrate Knowledge Center , Images/ PDFs document support * Address PR #27558 review comments - KnowledgePageRepository: null-safe pageType in getHierarchyWithSearch and getHierarchyWithSearchForActivePage so the /search/hierarchy endpoint no longer NPEs when the pageType query param is omitted. The ES/OS client helpers already skip the pageType term when the value is null or empty, so this is a pure null-guard. - ContextFileResource.uploadFile: when a failure happens after the ContextFileContent row is created (e.g. 
inside extractionService.submit), the cleanup path now hard-deletes that content row so the DB is not left with an orphaned record. - ContextFileResource: replace the raw Content-Disposition string with a buildContentDisposition helper that emits both the legacy quoted filename= and the RFC 5987 filename=UTF-8'' parameter with percent-encoded bytes, so international filenames round-trip while staying header-injection safe. sanitizeFileName also falls back to "download" on null/blank input. - ContextFileResourceTest: new cases for sanitizeFileName null/blank fallbacks and for buildContentDisposition ASCII/unicode/space/injection behaviour (18 tests, all passing). Address copilot review comments on PR #27558 - AssetRepository.getByFqnPrefix: swap arguments so (assetType, fqnPrefix) matches the DAO signature — previous ordering always missed the index. - FolderResource / ContextFileResource getEntitySpecificOperations: return List.of() instead of null so callers iterating the returned list cannot NPE. - SearchUtils.getPageHierarchy: replace UUID.fromString with a parseUuid helper that returns null for missing/malformed values and logs a warning instead of failing the whole hierarchy response. - DaoListFilter: qualify the pageType column with the caller-provided tableName, rename getArticleCondition to getPageTypeCondition (legacy no-arg method kept as @Deprecated wrapper for compatibility). - Elastic/OpenSearch client processPageHierarchyHits: replace the per-hit getChildrenCountForPage search (N+1) with a single pass over the batch that derives childrenCount from pages whose parent is in the same result set. Also drops the now-unused helper and its throws clause. - openmetadata-sdk/pom.xml: mark JWT, JAX-RS client, Apache HttpClient, jakarta.json, parsson, and JUnit Jupiter as <optional>true</optional> so they don't leak into SDK consumers that only use the core client. 
- InMemoryAssetService: use the shared AsyncService executor for upload /read/delete instead of the JVM common ForkJoinPool. - sample-pricing.xlsx: replace the plain-text placeholder with a real minimal XLSX workbook so detection-based and extraction-based code paths see a valid Microsoft Excel 2007+ file. * Use one filters aggregation for page hierarchy childrenCount Follow-up to `b8458e2868`. The previous fix derived childrenCount from pages whose parent appeared in the same batch — that worked for listPageHierarchyForActivePage (which fetches all depths) but always returned 0 on the plain listPageHierarchy path (which only fetches one depth), so top-level listings lost the count semantically. Replace with a single `filters` aggregation keyed by page id: each named bucket matches descendants via a fullyQualifiedName prefix query against the page's FQN. That gives accurate direct-descendant counts for every returned page in one aggregation round-trip, still O(1) additional search requests regardless of batch size. * Add allowedFields entries for contextFile, folder, page Fixes SearchSettingsHandlerTest.testEveryAssetTypeHasCorrespondingAllowedFields. searchSettings.json already had assetTypeConfigurations for contextFile, folder, and page but no matching allowedFields entries, so the test that asserts every assetType has a corresponding allowedFields block failed with 'Asset type contextFile has no corresponding allowedFields entry'. Adds the three missing blocks with the fields that each index actually exposes — name / displayName (with .keyword and .ngram variants), description, fqn, fqnParts, tags/tier/domains/dataProducts, plus entity-specific fields (fileType/contentType/extractedText for contextFile, parent.displayName for folder/page, pageType for page). 
* Fix ui checkstyle * Fix Java checkstyle * Address PR #27558 copilot review round 2 - ES/OS populateChildrenCounts: add fqnDepth == parentDepth + 1 to the per-page filter so childrenCount is direct children only, matching the field name and the UI's isLeaf check semantics. Previously matched all descendants. - ES/OS buildPageNestedSearchHierarchy: filter out hits with a null id before Collectors.toMap, which would otherwise NPE when SearchUtils drops a malformed UUID. - SearchUtils.getPageHierarchy: wrap PageType.fromValue in a parsePageType helper that logs and returns null on unknown values, so a single bad hit can no longer break the whole hierarchy response. - TestSuiteBootstrap.setupMinIO: pin minio/minio to RELEASE.2024-01-16T16-07-38Z instead of :latest so a newly-published image cannot break integration tests without a code change. - createContextFile.json: rewrite the assetId description to be provider agnostic (S3 / Azure Blob / in-memory / no-op) and flag it as the legacy path, preferring headContentId / ContextFileContent. * Update generated TypeScript types * Address PR #27558 copilot review round 3 - bootstrap/sql/migrations/native/2.0.0/mysql/schemaChanges.sql: - asset_entity: add PRIMARY KEY (id); mark all generated columns STORED for consistency with the other drive/knowledge tables in the same migration; compute deleted as a real boolean via IFNULL(JSON_EXTRACT(json, '$.deleted'), FALSE) so the boolean index behaves correctly. - knowledge_center: mark name, updatedAt, updatedBy, pageType as STORED and apply the same deleted expression so the existing indexes on name and (fqnHash, deleted) are reliable on fresh installs. - drive_folder / context_file / context_file_content: update the deleted generated column to use the same boolean-safe expression. 
- ElasticSearch/OpenSearch hierarchy search: add an explicit sort on fullyQualifiedName ASC with _id ASC as tiebreaker so from/size pagination is deterministic and cannot skip/duplicate pages between requests. * Fix UI checkstyle * Address PR #27558 copilot review round 4 - createPage.json: rewrite the field descriptions for name, displayName, owners, reviewers, and entityStatus. They were copy/pasted from other schemas ('query', 'tag') and were misleading in generated docs and clients. - NoOpAssetService.generateDownloadUrlWithExpiry: return asset.getUrl() instead of a synthetic 'https://cdn.example.com/...' URL. The old behaviour let clients attempt downloads that would never resolve when object storage was disabled; returning the asset's own (empty) URL surfaces the misconfiguration cleanly. - AzureAssetService: normalize the prefix path the same way S3 does. Previously a null/blank prefix produced the literal 'null/' prefix, writing blobs under the wrong key. New formatPrefix returns "" for null/blank and ensures exactly one trailing '/' for a real prefix. - AssetRepository.getByFQN: treat null or empty list as 'not found', matching getByFqnPrefix. Callers previously received an empty list silently when the DAO returned [] instead of a 404. * Update generated TypeScript types * Fix UI checkstyle * Address PR #27558 copilot review round 5 - AssetDAO.update / AssetRepository.update: switch the UPDATE target from fqnHash to id. Two assets can share the same fullyQualifiedName (e.g. successive revisions of the same context file), so the old SQL could silently update sibling rows. - ContextFileExtractionService: run the extraction pipeline on a dedicated fixed thread pool instead of AsyncService.getExecutorService. process() blocks on assetService.read(...).join(), and S3/Azure reads are themselves scheduled on AsyncService — sharing the same bounded pool risks starving those reads (and deadlocking) once every thread is busy running extractions. 
- postgres/schemaChanges.sql: wrap the generated deleted column in COALESCE((json ->> 'deleted')::boolean, false) (and the asset_entity CAST variant) so an absent 'deleted' key is stored as FALSE, not NULL. Otherwise "non-deleted" filters based on the boolean index drop rows silently. Matches the MySQL IFNULL(..., FALSE) side of the migration. - ContextFileUploadSupport.sanitizeEntityName: treat null/blank input as 'file' instead of NPE-ing on replaceAll. Multipart uploads can arrive without filename metadata; the upload should still succeed with a stable generated name. * Remove macOS-only @rollup/rollup-darwin-arm64 dev dep I pinned this during local troubleshooting to get a Vite dev server running on macOS (rollup's optional native binary was missing). CI runs on Linux, where yarn install --frozen-lockfile refuses the package ('The platform \"linux\" is incompatible with this module'), which broke license-header, lint-src, lint-playwright, i18n-sync, app-docs, and ui-coverage-tests for PR #27558. rollup re-resolves its native binary per platform — there's no reason to pin the darwin one. Remove it from package.json and drop the matching '@rollup/rollup-darwin-arm64@^4.60.2' block from yarn.lock. * Re-declare optional SDK test deps on integration-tests classpath KnowledgeCenterIT failed in CI with 'java.lang.NoClassDefFoundError: org/glassfish/jersey/apache/connector/ApacheConnectorProvider' after I marked the JAX-RS client stack in openmetadata-sdk as <optional>true</optional> during review round 2. That change stops the deps from leaking to every SDK consumer, but integration-tests actually uses org.openmetadata.sdk.test.util.RestClient, so the optional deps must be re-declared on its own classpath. Adds jakarta.ws.rs-api, jersey-client, jersey-apache-connector, httpclient, jakarta.json-api, and parsson to openmetadata-integration-tests/pom.xml as <scope>test</scope>. * Fix IT failures from CI integration-tests-mysql-elasticsearch 1. 
MySQL deleted column: revert the IFNULL wrapper to plain (json -> '$.deleted'). My earlier IFNULL(JSON_EXTRACT(json, '$.deleted'), FALSE) hit 'Incorrect integer value: false for column deleted' on fresh installs because MySQL cannot coerce the resulting JSON scalar into TINYINT(1) when the column is STORED. The bare '(json -> '$.deleted')' form is what other OM tables already use, and MySQL converts JSON true/false to 1/0 directly for the BOOLEAN column. STORED + PRIMARY KEY stay in place. 2. DriveFileUploadIT: raise the four short atMost(5s) awaits to 20s with explicit pollDelay(ZERO) + pollInterval(200ms). K8sOMJobOperatorIT sets a global Awaitility pollInterval of 5s at class setup; any subsequent test with atMost <= 5s hits 'Timeout must be greater than the poll delay'. Overriding the per-call poll settings insulates these asserts from the global leak. * Document SDK test-utility optional deps In review round 2 we marked jersey-client, jersey-apache-connector, jakarta.ws.rs-api, httpclient, jakarta.json-api, parsson, java-jwt, and junit-jupiter-api as <optional>true</optional> on openmetadata-sdk so that core SDK consumers don't inherit a heavy JAX-RS + JUnit stack. openmetadata-integration-tests hit this immediately with NoClassDefFoundError from RestClient; its own pom now re-declares the deps. Add a "Test utilities" section to the SDK README that lists the optional deps downstream test-utility consumers must re-declare (with the concrete <scope>test</scope> XML snippet) and explains the error they'd otherwise see. * NoOpAssetService: never return null from generateDownloadUrlWithExpiry In review round 4 I changed this method to return asset.getUrl() when the asset is non-null. But Asset.url is optional in the schema, so asset.getUrl() itself can be null — which breaks the implied "never returns null" contract downstream callers rely on (AttachmentResource only null-checks defensively). 
Normalize null and blank URLs to an empty string so the method's non-null, non-blank contract holds even when storage is disabled and the asset was never populated with a URL. * AssetServiceFactory: swap to NoOp when re-initialized with storage off init(...) previously only assigned NoOpAssetService when instance was null. On a re-init with object storage toggled off (config reload, test teardown, etc.), the previously wired S3/Azure/InMemory provider stayed live and kept serving real IO against a backend the operator thought was disabled. Replace the instance with a fresh NoOp when storage is disabled unless the instance is already a NoOp (idempotent on repeated disabled inits). * Type create-request domains arrays as fullyQualifiedEntityName The three new KC/Drive create schemas (createFolder, createContextFile, createPage) had domains as an array of unconstrained strings. The rest of the OM API models domain references as FQNs, and the shared basic.json#/definitions/fullyQualifiedEntityName is the convention for this. Point all three items refs at fullyQualifiedEntityName so generated clients see a consistent FQN type and requests get validated for non-empty length/format rather than any string. * Update generated TypeScript types * Address PR #27558 copilot review 4144965142 - ContextFileExtractionService: switch the default thread pool to a static final DEFAULT_EXECUTOR, so every production instance of the service reuses the same pool instead of leaking a fresh fixed pool per construction (tests especially create multiple instances). Threads remain daemons, so the pool never blocks JVM shutdown. - ObjectDeleteQueueService: when queueCapacity is 0, use a SynchronousQueue so "reject-if-all-workers-busy, no buffering" holds. Previous Math.max(1, queueCapacity) silently allocated a 1-slot ArrayBlockingQueue, contradicting the caller's stated capacity and potentially buffering one task past the semaphore's accounting. 
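The ObjectDeleteQueueService queue choice described above can be sketched as follows. This is a minimal illustration, not the actual OpenMetadata code: the class name `DeleteQueueSketch` and the helper `newWorkQueue` are hypothetical, and only the JDK queue classes are real.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;

public class DeleteQueueSketch {

  // Hypothetical helper: picks a work queue that honours the caller's stated capacity.
  // Capacity 0 means "hand off directly or reject", so no task is ever buffered.
  static BlockingQueue<Runnable> newWorkQueue(int queueCapacity) {
    if (queueCapacity < 0) {
      throw new IllegalArgumentException("queueCapacity must be >= 0");
    }
    if (queueCapacity == 0) {
      // SynchronousQueue holds no elements; offer() succeeds only if a consumer is waiting.
      return new SynchronousQueue<>();
    }
    // ArrayBlockingQueue requires capacity >= 1, which is why Math.max(1, capacity)
    // silently turned "no buffering" into a one-slot buffer.
    return new ArrayBlockingQueue<>(queueCapacity);
  }

  public static void main(String[] args) {
    BlockingQueue<Runnable> handoff = newWorkQueue(0);
    BlockingQueue<Runnable> buffered = newWorkQueue(2);

    // No consumer thread is waiting, so the zero-capacity queue rejects the task.
    System.out.println("handoff accepted: " + handoff.offer(() -> {}));
    // The bounded queue has free slots, so it buffers the task.
    System.out.println("buffered accepted: " + buffered.offer(() -> {}));
  }
}
```

With a `SynchronousQueue`, a submission either reaches an idle worker immediately or is rejected, which matches the "reject-if-all-workers-busy, no buffering" contract the commit message states.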
Not fixing: - SearchUtils @Slf4j 'LOG' vs 'log'. OM's openmetadata-service/lombok.config sets 'lombok.log.fieldName = LOG', so @Slf4j correctly generates 'LOG' for every class in this module. The reviewer's concern only applies to projects without that directive. Verified clean compile. * Address PR #27558 copilot review 4144917449 - knowledgeCenterTags.json: change mutuallyExclusive from the string "false" to the JSON boolean false. The Classification schema declares this as `"type": "boolean"`; jackson's lenient string->boolean coercion masked it until now, but strict validators would reject and the other OM bootstrap tag files that use the correct boolean (piiTagsWithRecognizers.json) model what this should look like. - ContextFileExtractionService.process: guard the updateContent updater with the same head-content check already used in updateFile. Previously, if headContentId flipped between the initial check and the status writes, updateFile would no-op while updateContent still marked the now-stale content "Analyzing", leaving it stuck once the later early-return fires. - AzureAssetService.upload: stream the InputStream straight to the blob using the known asset.getSize() instead of reading the whole payload into a byte[] via IOUtils.toByteArray. Matches the S3 streaming behaviour and avoids full-file heap pressure / OOM risk on larger files. Buffered fallback retained when size is unknown. - Size fields modeled as integer: flip fileSize / size on createContextFile.json, contextFile.json, asset.json, createAsset.json, and contextFileContent.json from "type": "number" to "type": "integer" with "format": "int64" and "minimum": 0. Byte counts are inherently whole numbers; floating point loses precision above 2^53 and makes validation murky. Update the (double) call sites in ContextFileResource, ContextFileUploadSupport, and AttachmentResource to match. 
Not fixing: - ContextEntityPromptService "unused Authorizer import" — false positive, the class uses it in the constructor. - NoOpAssetService.generateDownloadUrlWithExpiry null return — already fixed earlier in commit `a4a2dcc91d` (returns "" when url is null/blank). * AssetService.read: run inline instead of hopping through AsyncService Every caller of AssetService.read(...) immediately .join()s on the returned future: - ContextFileExtractionService.process reads + extracts - ContextFileResource.downloadFile reads + streams back - AttachmentResource.serveAsset reads + streams back - QueuedDeleteAssetService just delegates None of them exploit the async nature, but the S3/Azure/InMemory implementations all wrapped the blocking fetch in AsyncService.executeAsync or CompletableFuture.supplyAsync on a bounded pool. That created a starvation path when any caller thread was already running on AsyncService (or could monopolize it under load) — join() would block the caller while the submitted read task fought for a free worker. Switch S3, Azure, and InMemory read() to execute on the caller's thread and return CompletableFuture.completedFuture(...). Interface is unchanged so existing .join() callers keep working; the extra thread hop and the potential for AsyncService starvation are both gone. Combined with the dedicated context-file-extraction pool, the extraction pipeline no longer touches AsyncService for any asset-read step. * Address PR #27558 copilot review 4151211562 - FolderIndex / ContextFileIndex: stop re-setting entityType, deleted, owners, totalVotes inside buildSearchIndexDocInternal. Those common fields are populated by populateCommonFields in the SearchIndex template method (Phase 1) before Phase 3 calls the entity-specific internal builder, so the explicit puts were redundant and silently overrode the template output. Aligns with PageIndex convention and updates the unit tests to assert the internal builder sets only entity-specific fields. 
- ContextFileTextExtractor: bound the Tika BodyContentHandler at MAX_CANONICAL_TEXT_LENGTH instead of passing -1 (unbounded) so a pathological image cannot drive OCR to accumulate arbitrary output on the heap. - ContextFileExtractionService: replace the unbounded Executors.newFixedThreadPool backing queue with a ThreadPoolExecutor using an ArrayBlockingQueue + AbortPolicy. Without a bounded queue the RejectedExecutionException handling in submit(...) was dead code; with it, an overloaded server surfaces a "retry later" failure status instead of silently accumulating work. - S3AssetService / AssetService / AssetServiceFactory / QueuedDeleteAssetService: make AssetService extend AutoCloseable with a default no-op, override close() in S3AssetService to release the S3Client and S3Presigner connection pools, and register a shutdown hook in AssetServiceFactory that closes the current provider on JVM exit (and on re-init when the provider changes). - bootstrap 2.0.0 MySQL schemaChanges: change the deleted generated column from (json -> '$.deleted') to (JSON_EXTRACT(json, '$.deleted') IS TRUE) so rows where the JSON key is absent resolve to FALSE instead of NULL. Avoids filter misses on the composite (fqnHash, deleted) index. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix Java checkstyle * Fix integration test compile + S3 generateDownloadURL ContextFileIT / DriveFileUploadIT compile failures came from the fileSize schema switch to integer/int64 — the generated setter/getter is now Integer. Replace the double literals with ints and the assertEquals(double, ...) sites with intValue() so the (int, int) overload resolves unambiguously. Also override S3AssetService.generateDownloadURL to return a short-lived presigned URL (mirroring AzureAssetService) instead of inheriting the default, which would return the raw S3 key from asset.url. Addresses review 4151282021. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Revert MySQL deleted column back to bare json -> expression The JSON_EXTRACT(...) IS TRUE form broke integration tests — GET after create started returning 404, consistent with MySQL evaluating the IS TRUE predicate against the JSON scalar in a way that stored 1 instead of 0 for freshly-created rows (deleted=false). Restoring the bare (json -> '$.deleted') expression used pre-review. Rows with the key missing will store NULL on the generated column, which is a theoretical concern the review flagged but does not affect current code paths (all inserts write json.deleted explicitly). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix Transi18next import path in KnowledgeCenter components Two KnowledgeCenter files imported Transi18next from 'utils/CommonUtils', which is where Collate's UI re-exports it from. OpenMetadata core exports Transi18next from 'utils/i18next/LocalUtil' (same path every other core file uses). The Collate-style import broke the production Vite/Rollup build. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Harden ContextFileIT.testFileAppearsInSearch against async indexing The test used a fixed Thread.sleep(2000) then a single assertEquals on the status code. That was flaky two ways: ES indexing is async and the 2s window is not always enough, and on a fresh cluster the context_file_search_index itself may not exist yet at first query (yielding 500). Replace with an await() loop that polls every 200ms for up to 30s and asserts both status==200 AND that the newly-created file's UUID appears in the response. Matches the assertSearchContainsFile helper in DriveFileUploadIT. Also URL-encode the namespaced query string so the uniqueName does not break the query parsing. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Make playwright editor shortcuts platform-aware The SHORTCUTS constant in playwright/constant/KnowledgeCenter.constant.ts hard-coded "Meta+b" / "Meta+z" / etc. On macOS Meta is Cmd and those shortcuts trigger bold / undo / copy as expected, but on the Linux CI runners Meta is the Super (Windows) key — so every ProseMirror formatting and history test just pressed Super+b, which does nothing, and the test then fails waiting for the <strong>…</strong> element (or for the undone text to disappear). Detect the runner platform and use Meta on macOS, Control everywhere else — matching the same pattern in src/constants/KnowledgeCenter.constant.ts. Unblocks the 6 KnowledgeCenterTextEditor failures across Admin / Data Consumer / Data Steward roles (Text Formatting + Undo/Redo). Slash commands keep passing because they don't depend on modifier keys. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Run prettier on DateTimeUtils.ts CI's lint-src job fails because ESLint+Prettier --fix produces a non-empty diff against the committed tree. Local prettier pass trimmed the indentation and added a trailing comma in the imports block. No behavioral change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix Knowledge Page entity-link + DAO filter regressions from the port Downloaded the failing playwright traces from the PR's postgres e2e run and walked each one. Three distinct bugs, all present because the Collate-side overrides (overrides/EntityUtilClassCollate.ts and the DaoExtension.KnowledgeExtensionDAO custom SQL) were not carried over into OpenMetadata core when KnowledgeCenter was merged up. 
1) CollectionDAO.KnowledgePageDAO: override listCount / listBefore / listAfter (plus helper SQL queries) so that `GET /v1/knowledgeCenter?entityId=X&entityType=topic` actually INNER JOINs entity_relationship and returns only pages whose relatedEntities contains the target entity. Without this the base EntityDAO ignored entityId/entityType entirely and returned every page, which is why the "Knowledge Articles" widget on a data asset page showed the 15 fixture articles instead of the one just attached — and why updateDataAsset timed out waiting for the linked article. Uses OWNS relation for user/team filters (same semantics Collate uses) and HAS for every other entity type. 2) EntityUtilClassBase + EntityUtils.getEntityLinkFromType: add EntityType.KNOWLEDGE_PAGE cases that route to getKnowledgePagePath. Before this, mention notifications for Knowledge Pages fell through to the default `/table/<fqn>` branch (confirmed in the captured page-snapshot: the mention link pointed at `/table/Article_eEqrWeeU`), which 404'd on the Table API and rendered an error page — so the entity-header-display-name textarea never appeared and the User Mentions test timed out. Search results on Explore had the same problem, rendering every Knowledge Page result card with href="/". 3) EntityUtilClassBase.getEntityByFqn / ENTITY_PATCH_API_MAP / getResourceEntityFromEntityType: handle KNOWLEDGE_PAGE end-to-end so the detail-page fetch, patches, and policy lookups all route through the knowledgeCenter REST API rather than falling back to the generic entity utilities (which don't know about the 'page' entity type). Verified against the real trace artifacts from CI run 24790718035: - shard 3 Knowledge Center page test — widget shows 10 unrelated "Article_" fixture items instead of the created one → root cause is the missing DAO JOIN (#1). - shard 3 User Mentions test — notification link is /table/, not /knowledge-center/ (#2). 
- shard 3 Reviewer Workflow — data consumer's knowledge-center goto renders "No data available" because getEntityByFqn fell back to a table fetch for a page FQN (#3). - shard 5 ExplorePageRightPanel_KnowledgeCenter (22 failures) — search result card links are "/explore/" (empty), same root cause as (#2) inside getEntityLinkFromType default branch. Compiles: mvn -pl openmetadata-service -q -DskipTests compile passes; tsc --noEmit reports no new errors in the touched files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Address remaining PR #27558 review feedback Seven actionable fixes drawn from the still-open review threads; the rest of the open threads in copilot's bot reviews are either already addressed in earlier commits or stale against the current code and are being resolved on the review UI alongside this commit. - AssetRepository.getByFQN: the LOG.error message said "asset with id" but was printing the FQN. Relabel to "asset with FQN" for accurate troubleshooting (thread #42). - KnowledgePageMapper.createToEntity: stop mutating the inbound CreatePage by calling create.withRelatedEntities(...). Build the effective list as a local variable and pass it to copy(...). Prevents the Organization fallback from leaking into the caller's request object, which is surprising when the request is re-used or logged (thread #43). - FolderIndex: default childrenCount to 0 when the entity hasn't yet had its children recomputed (e.g. a freshly created folder). Prevents the numeric field from being indexed as missing, which broke range and sort queries that assume it is always present (thread #46). - NoOpAssetService and InMemoryAssetService: override generateDownloadURL to delegate to generateDownloadUrlWithExpiry, matching S3/Azure. Without this, callers using the non-expiry API got asset.getUrl() (often empty for these providers), yielding broken download links (threads #39, #45). 
- ObjectDeleteQueueService: register a JVM shutdown hook in the singleton's initializer that calls stop(). The service already implements Dropwizard Managed, but nothing currently wires it into the application lifecycle, so non-daemon delete-worker threads were at risk of keeping the JVM alive after ungraceful termination. The hook is a belt-and-suspenders fallback to the Managed path (threads #52, #53). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add java-checkstyle skill for Claude + Codex agents CI keeps surfacing "Java checkstyle failed — please run mvn spotless:apply" comments on PRs (including this branch). CLAUDE.md and AGENTS.md already mentioned the command, but a one-line prose note in the middle of each file wasn't enough to make it a reliable habit. This commit: - Adds a dedicated invocable skill at .claude/skills/java-checkstyle/SKILL.md (for the Claude Code harness) and a mirror at .agents/skills/java-checkstyle/SKILL.md (for Codex-style agents). Both describe the same procedure: when / why to run spotless, the `-pl <module>` scoping option, the verify-only `spotless:check` form, the expected diff shape, and the rule to never hand-edit formatting around a plugin error. - Promotes the existing one-liners in CLAUDE.md and AGENTS.md to explicit "run before finishing any Java task" instructions, pointing at the skill so agents have a reusable procedure to invoke rather than improvising. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Harden AttachmentResource upload/download against three regressions Carried over from the latest AttachmentResource review. Three issues: 1. Content-Disposition header injection (security) — downloadAsset() built the header by direct string interpolation of asset.getFileName(). A filename containing double-quotes or CRLF could inject arbitrary HTTP headers. 
ContextFileResource already has a sanitize + RFC-5987 encode helper; rather than duplicate it, promote ContextFileUploadSupport.sanitizeFileName / buildContentDisposition to public, delete the duplicates from ContextFileResource (now delegators), and reuse the shared helpers from AttachmentResource. 2. Unbounded upload buffering (performance / DoS) — createAssetFromUpload read the full multipart body into a byte[] via IOUtils.toByteArray before checking against MAX_FILE_SIZE. An attacker could send an arbitrarily large body and exhaust heap before the validation ran. Replace with ContextFileUploadSupport.bufferUpload(), which streams to a bounded temp file and throws MaxFileSizeExceededException the moment the configured limit is passed; translate that into the same AttachmentException size-validation error the previous code raised. Promoted BufferedUpload and MaxFileSizeExceededException to public so the attachments package can consume them. 3. Startup NPE when objectStorage is null (bug) — initialize() called config.getObjectStorage().getMaxFileSize() without a null guard, so a deployment that doesn't configure object storage would NPE on server start. Added the same guard ContextFileResource.initialize() already uses, gave MAX_FILE_SIZE a safe 5 MiB default, and also null-guarded the S3-configuration branch of the CDN URL lookup so a pure-Azure or pure-NoOp setup doesn't fall off the end of the ternary. Ran mvn spotless:apply — picks up formatting-only changes in CollectionDAO.java and FolderIndex.java as a side effect of the shared helper additions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add ui-checkstyle skill + fix residual import-order drift CI's UI Checkstyle workflow has three per-area jobs (lint-src, lint-playwright, lint-core-components) that reformat the files changed in the PR and fail if the reformat produces a diff. 
CLAUDE.md and AGENTS.md didn't previously document this flow, so re-running the fix was a guessing game — the two lint-core-components and lint-playwright failures on this branch came from stale import order left over from the main→context_center merge. This commit: - Adds a dedicated invocable skill at .claude/skills/ui-checkstyle/SKILL.md (Claude Code harness) and a mirror at .agents/skills/ui-checkstyle/SKILL.md (Codex-style agents). Both describe the exact three-command sequence CI runs — organize-imports-cli → eslint --fix → prettier --write — the per-area file scoping, the `--check` dry-run mode, and the rule that organize-imports must run BEFORE prettier (otherwise the indentation / trailing-comma round-trip leaves a dirty diff). - Promotes the existing one-liner in CLAUDE.md and AGENTS.md to an explicit "run before finishing any UI task" instruction that points at the skill. - Fixes two residual import-order drifts (KnowledgePagesHierarchy.tsx, EntityUtilClassBase.ts) surfaced by running the skill's sequence locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix UI checkstyle on EntityUtilClassBase.ts ESLint --fix inserted a blank line between the KNOWLEDGE_PAGE guard and the fallback return in getEntityByFqn. Committing the formatted version. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix ContextFileIT.testFileAppearsInSearch flaky 500 from query_string parsing The previous polling search used the namespaced unique name as a free-text q= argument. The namespace prefix contains '-' which the ES 9.x query_string parser treats as a NOT operator, producing a deterministic 500 across the full 30s polling window even when the document was indexed. Switch to the direct get-by-id endpoint (/v1/search/get/{index}/doc/{id}), which performs a real-time ES GET with no query_string parsing and no analyzer involvement — the most reliable signal that the document was indexed. 
Bump the timeout to 60s and capture the response body on any non-200 so future regressions surface the real ES error. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Fix knowledge center icon * update knowledge center to context center Co-authored-by: Copilot <copilot@github.com> * Revert "update knowledge center to context center" This reverts commit `f0cca5fd65`. * Fix UI checkstyle: sort tag-related imports in SearchClassBase Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Fix Jest coverage failures in KnowledgeCenter Layout and right panel KnowledgeCenterLayout was importing i18n directly from LocalUtil, but the global setupTests mock for that module only exposes t/on. Switch to the useTranslation() hook so it picks up the react-i18next mock that already provides i18n.dir(), matching how LeftSidebar and RichTextEditor use the direction. EntityRightPanelClassBase.getKnowLedgeArticlesWidget now returns the KnowledgePages component instead of null. Update the corresponding test case to assert the new return value. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix playwright tests and bugs Co-authored-by: Copilot <copilot@github.com> * Fix checkstyle * Fix /knowledgeCenter/search/hierarchy 500 by removing _id sort ES 9.x and OpenSearch 3.x reject sorts on the _id field by default (indices.id_field_data.enabled is false), causing every call to listPageHierarchy{,ForActivePage} to fail the search_phase_execution_exception "all shards failed" we see in the screenshot. The _id sort was added in `4a75852a7e` as a tiebreaker for from/size pagination, but fullyQualifiedName is already a keyword field with doc_values and is unique per page (name is unique within a parent's children) — so no tiebreaker is needed. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Cascade hard-delete to descendant pages in search index KnowledgeCenter pages are nested via FQN (parent.fqn -> parent.fqn.child), not via a parent.id field on the child doc. The default deleteOrUpdateChildren case for entity type "page" uses page.id field matching, which doesn't exist on child page docs — so a recursive hard-delete on the parent removed the parent from search but left every descendant orphaned in the index. Stale docs only disappeared on a full reindex. This logic was overridden in the collate fork's SearchRepositoryExt; it was lost during the migration when the override class was removed. Fold the override into the base SearchRepository as a Page-specific case that calls deleteEntityByFQNPrefix, which deletes by fullyQualifiedName.keyword prefix match — covering every descendant. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Add page/folder/contextFile/securityService to SearchIndexingApp picker The Search Indexing Application's "Entities" picker shows "No data" when typing "Page" because the enum in src/utils/ApplicationSchemas/SearchIndexingApplication.json does not include the Knowledge Center / Drive entity types added on this branch. The collate fork carried these in SearchIndexingApplication-collate.json (included page); folder, contextFile and securityService are new on this branch and never made it into the picker enum during the migration. Without them in the enum, users cannot select these entity types for targeted reindex, even though every other reindex code path supports them. src/jsons/applicationSchemas/* is generated by parseSchemas.js from src/utils/ApplicationSchemas/* at build time and is gitignored, so only the source schema is updated here. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Restore live index settings on per-entity distributed-promote path DefaultRecreateHandler exposes two finalization paths: - finalizeReindex(...) 
— centralized end-of-job promotion. Calls applyLiveServingSettings + maybeForceMerge before the alias swap, reverting the bulk overrides (refresh_interval=-1, replicas=0, async translog) back to live values (refresh=1s, replicas=1, durable translog). - promoteEntityIndex(ctx, ok) — per-entity promotion. Used by the distributed search-indexer's "promote as soon as all partitions for an entity complete" callback (DistributedSearchIndexExecutor.promoteEntityIndex). Swaps the alias and cleans up old indices — but never restored live settings. When an entity finishes its partitions before the final reconciliation (typically the smallest entities — e.g. knowledge `page` with ~11 rows), its index is promoted via the per-entity path, the alias swap succeeds, and the bulk-build overrides become the new live settings. refresh_interval stays at -1 in production, so live writes after the reindex are buffered in the translog and never reach searchable segments until a manual _refresh. Externally this surfaces as "create an article, hierarchy is empty until I re-trigger reindex" — exactly the user-reported bug. Mirror the finalizeReindex sequence by calling applyLiveServingSettings (and maybeForceMerge for parity) at the top of the promote block in promoteEntityIndex, before the alias swap. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Wire jobData into per-entity reindex promotion handler DefaultRecreateHandler.applyLiveServingSettings reads from the handler's jobData field (live + bulk index-settings overrides on the EventPublisherJob). The per-entity distributed-promotion path in DistributedSearchIndexExecutor created its own DefaultRecreateHandler instance and never called withJobData(jobData) on it. With jobData=null, buildRevertJson returns null and applyLiveServingSettings silently no-ops — meaning the previous fix (`b272de85f9`) never actually re-applied live settings on the per-entity promote path, even though the call was reached. 
currentJob.getJobConfiguration() is the EventPublisherJob the strategy created. Wire it into the new handler at construction time, mirroring the withJobData call DistributedIndexingStrategy already makes on the strategy's own handler instance. With this change, the per-entity promote path now logs "Applying live serving settings to staged index '...' for entity 'page': {\"number_of_replicas\":1,\"refresh_interval\":\"1s\", ...}" before the alias swap, and post-promotion `_settings` show refresh_interval=1s instead of the stuck -1. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Fix delete failure * Fix java checkstyle * Fix article deletion issue * refactor(test): streamline Knowledge Center List setup and teardown processes * Fix GlossaryTags * Add missing pieces in knowledge articles * Fix checkstyle * Remove reviewer workflow spec * remove unused util * Fix the localization changes * Fix unit tests * deleted unused svg * added missing svg * improved ux of save button & autofocus on title * lint fixes * Update page index * Make calculateFqnDepth static * fixed the kc imports * import fixes --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com> Co-authored-by: Copilot <copilot@github.com> Co-authored-by: mohitdeuex <mohit.y@deuexsolutions.com> Co-authored-by: Rohit0301 <rj03012002@gmail.com> Co-authored-by: Rohit Jain <60229265+Rohit0301@users.noreply.github.com> Co-authored-by: Harsh Vador <harsh.vador@somaiya.edu>	2026-05-08 10:56:04 -07:00
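One fix in the commit above (review round 4) normalizes the Azure blob prefix so that a null or blank value no longer produces the literal 'null/' key prefix. The sketch below illustrates that rule under stated assumptions: the class name `PrefixSketch` is hypothetical, the body is not the actual `AzureAssetService.formatPrefix` implementation, and the handling of a slashes-only prefix is an assumption the commit message does not cover.

```java
public class PrefixSketch {

  // Illustrative version of the rule from the commit message:
  // return "" for null/blank input, otherwise exactly one trailing '/'.
  static String formatPrefix(String prefix) {
    if (prefix == null || prefix.trim().isEmpty()) {
      return "";
    }
    String trimmed = prefix.trim();
    int end = trimmed.length();
    // Strip any run of trailing slashes so exactly one can be appended.
    while (end > 0 && trimmed.charAt(end - 1) == '/') {
      end--;
    }
    if (end == 0) {
      // Assumption: a prefix made only of slashes is treated like a blank prefix.
      return "";
    }
    return trimmed.substring(0, end) + "/";
  }

  public static void main(String[] args) {
    System.out.println("[" + formatPrefix(null) + "]");
    System.out.println("[" + formatPrefix("   ") + "]");
    System.out.println("[" + formatPrefix("assets") + "]");
    System.out.println("[" + formatPrefix("assets//") + "]");
  }
}
```

Building the blob key as `formatPrefix(prefix) + fileName` then yields a bare file name when no prefix is configured, instead of a key under 'null/'.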
Sriharsha Chintalapani	9956592b00	chore(security): bump deps to address reported CVEs (#27994 ) * chore(security): bump deps to address reported CVEs - log4j 2.25.3 -> 2.25.4 (CVE-2026-34477/34478/34480) - jsonschema2pojo 1.2.2 -> 1.3.0 (CVE-2025-3588) - netty-bom 4.1.132 -> 4.1.133 (netty-codec/transport GHSAs) - azure-identity 1.14.0 -> 1.15.2 in openmetadata-service to align with parent dependencyManagement * fix: bump jsonschema2pojo to 1.3.1 to fix maven-plugin classpath 1.3.0 dropped its declared dep on plexus-utils, breaking the maven-plugin at runtime with NoClassDefFoundError on org/codehaus/plexus/util/DirectoryScanner. 1.3.1 restores it. 1.3.3 has a separate regression (IndexOutOfBoundsException in ValidRule), so 1.3.1 is the right pin.	2026-05-08 22:33:03 +05:30
Eugenio	483461a003	Add migrations to ensure PII are really enabled (#27921 ) This is especially needed for instances that had already upgraded to 1.12.0 onwards; those instances skipped the migration cherry-picked in 1.12.6	2026-05-08 15:39:29 +00:00
Akash Verma	459dfa30a5	Add missing Customsearch.md (#27968 ) Co-authored-by: Akash Verma <akashverma@Akashs-MacBook-Pro-2.local>	2026-05-08 15:02:05 +00:00
Anujkumar Yadav	d4387aa644	feat: Add request access button for data product (#27973 ) * feat: Add request access button for data product * Fix lint checks * fix lint issue and addressed comments * fix test	2026-05-08 12:36:40 +00:00
Harshit Shah	19ca2b96c0	fix: migrate and polish TestSuite pipeline tab (#27914 ) * fix(ui): migrate and polish TestSuite pipeline tab Migrate the TestSuite Pipeline tab to core-ui table primitives and align behavior with the ingestion table experience, including row actions, count rendering, header tooltip styling, and placeholder spacing. * fix checkstyle * fix failing tests * address comments * fix playwright checkstyle * remove unnecessary changes	2026-05-08 08:33:46 +00:00
Pere Miquel Brull	10f26581b8	chore(mcp): add server.json for MCP Registry publishing (#27982 ) * chore(mcp): add server.json for MCP Registry publishing Adds metadata for publishing openmetadata-mcp to the official MCP Registry (registry.modelcontextprotocol.io). Aggregators like PulseMCP scrape the official registry, so this single entry surfaces the server across the ecosystem. The server is self-hosted per deployment, so the streamable-http URL uses an {openmetadata_host} template variable that clients resolve to their own OpenMetadata hostname. * chore(mcp): align server.json description with #27975 messaging Reframes the registry description to match the "trusted context and business semantics for AI" positioning from the README rebrand in #27975. Also tightens the description to satisfy the schema's 100-char cap on the field (the prior 506-char copy would have failed validation at publish time) and adds websiteUrl pointing to the MCP docs page. * chore(mcp): mark server.json description as the official MCP The registry namespace (io.github.open-metadata/) is invisible to users browsing aggregators like PulseMCP — they see only title and description. Calling out "Official OpenMetadata MCP" differentiates this canonical entry from any community wrappers people might publish under other namespaces. chore(mcp): clarify host variable supports custom ports Many self-hosted OpenMetadata deployments run on the default :8585 without a reverse proxy. Spell that out in the openmetadata_host variable description so users know they can include a port. * fix Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-05-08 10:14:31 +02:00
harshsoni2024	44ed018064	MINOR: add more logs in pbi for lineage (#27970 ) * add more logs on lineage * minor fix * string literal fix * RUF010 exception * space fix --------- Co-authored-by: Satender K <satendra.kumar@getcollate.io>	2026-05-08 10:13:54 +02:00
Harshit Shah	bb6c43768f	Migrate ColumnProfileTable from antd to core components Table (#27965 ) * feat(profiler): migrate ColumnProfileTable from antd to core-ui Table Replaces the antd-based Table wrapper in ColumnProfileTable with the @openmetadata/ui-core-components Table primitive (react-aria-components foundation). Removes antd ColumnsType column definitions in favour of explicit Table.Row/Table.Cell render, adds client-side sort via SortDescriptor state, manual expand/collapse for nested columns via FlatRow flattening, and preserves data-row-key/expand-icon attributes for e2e selector compatibility. Ref: https://github.com/open-metadata/openmetadata-collate/issues/3837 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix checkstyle * refactor(profiler): replace inline style width constants with Tailwind classes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-08 10:57:54 +05:30
Harshit Shah	7c93c1c54a	[UI] Migrate Observability Alerts table to core-ui components (#27906 ) * feat(ui): migrate observability alerts table to core-ui components Move Observability Alerts page table rendering to openmetadata core-ui Table components and align column layout, loading behavior, and pagination divider handling. Update unit and existing pagination e2e coverage to validate action controls and table structure, and close issue #3837. * address gitar-bot comments * fix ui checkstyle * fix failing tests * fix playwright checkstyle * fix failing test * fix failing pagination tests	2026-05-08 10:12:27 +05:30
Chirag Madlani	a90e7729a6	refactor: streamline SchemaTable component and optimize related metrics form (#27959 ) * refactor: streamline SchemaTable component and optimize related metrics form * fix row expansion issue on update --------- Co-authored-by: Harsh Vador <harsh.vador@somaiya.edu>	2026-05-07 22:58:07 +05:30
Mayur Singal	4e24ba1d8b	fix(cli-e2e): use profileSampleConfig in profiler test builders (#27947 ) PR #27184 (commit `47c88d49ce`, "Dynamic Sampling Config") moved profileSample/profileSampleType out of DatabaseServiceProfilerPipeline and TableProfilerConfig into a nested profileSampleConfig object, but the CLI E2E test config builders weren't updated. Both pydantic models now use extra='forbid', so the old format raises "Extra parameter 'profileSample'" and the scheduled py-cli-e2e-tests workflow has been red on every run since 2026-04-17 (postgres, mysql, mssql, oracle, redshift, snowflake, redash, metabase, quicksight, tableau, bigquery_multiple_project, dbt_redshift). Update the ProfilerConfigBuilder to emit the new schema and update the BigQuery TableProfilerConfig usage to match. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-07 22:35:31 +05:30
Teddy	219c5683fa	ISSUE #3032 (#27912 ) * feat: move flat sampling to sampling config + dynamic sampling option * feat: move flat sampling on the backend to sample profile config object * feat: fix circular import * feat: align UI with new profiler config * feat: fix json schema * feat: align python imports with new schema path * feat: update migration to look at extension * feat: remove enable * feat: remove enable * feat: added titles to sample config * feat: generated ts classes * feat: addressed comments * feat: change sample config instantiation to match new structure * feat: removed backward compatible fields * feat: ran java linting * feat: updated imports to point to generated files * feat: added dynamic sampler resolution logic * feat: ran python linting * feat: remove duplicate migration * chore: merge upstream and clean conflicts * feat: update logic to support dynamic and static sampling * feat: adjust sample config call * feat: test for static, dynamic, row count and tier methods * feat: more sample config unit tests * feat: added tests for metric and sampling * feat: added tests to validate fallback is not called in metric computers * feat: strengthen profiler validation tests * feat: fix sampling config * feat: fix sampling config * feat: fix sampling config * feat: generated typescript models * feat: fixed missing dq pipeline migration * feat: fixed static check * feat: fixed ci failures * feat: fixed ci failures * feat: fixed unit tests failure and linting * feat: fixed integration tests failures * chore: fix burstiq refactor * chore: fix trino ci failures * chore: revert baseline.json file * chore: fix sampler available burst iq changes * feat: added smart sampling radio button * feat: ignore static checks errors * feat: ran ts linting * feat: burstiq infinite recursion issue with dynamic as default * feat: translate i18n keys * feat: fix failing tests	2026-05-07 09:01:18 -07:00
Rohit Jain	b42c9ad3ba	Fixed the translations issues in AdvancedSearch description option (#27961 ) * Fixed the translations issues in AdvancedSearch description option * nit	2026-05-07 15:19:54 +00:00
Laura	4c07b28c82	Add alias marketplace (#27943 ) * Add alias marketplace * wire fingerprint and embeddings in domain_index_mapping	2026-05-07 16:50:49 +02:00
Pere Miquel Brull	54ae549fc6	Fixes #27852 : propagate tolerations from CronOMJob to scheduled OMJob (#27955 ) CronOMJobReconciler.deepCopyPodSpec was copying nodeSelector but silently dropping tolerations when generating an OMJob from a CronOMJob template. Manual runs worked because they go straight through K8sPipelineClient.buildOMJob, but scheduled runs went through this deep-copy and lost the field, leaving pods Pending on tainted nodes. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-07 14:38:49 +02:00
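The class of bug fixed above (a hand-written deep copy that forgets a field) can be sketched as follows. The actual reconciler is Java and works on Kubernetes model objects; this Python version uses plain dicts, and the function name and `COPIED_FIELDS` tuple are assumptions chosen for illustration.

```python
import copy

# Assumption: the scheduling-related pod-spec fields a scheduled job
# should inherit from its cron template. The commit describes
# "tolerations" as the field that was previously dropped.
COPIED_FIELDS = ("nodeSelector", "tolerations")


def deep_copy_pod_spec(template: dict) -> dict:
    # Copy each listed field independently so the generated job never
    # shares mutable state with the template it was created from.
    return {
        field: copy.deepcopy(template[field])
        for field in COPIED_FIELDS
        if field in template
    }
```

Listing the copied fields in one place makes an omission visible in review, which a field-by-field copy does not.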
Anujkumar Yadav	80375a7dc6	Add data access request support (#27879 ) * Add DAR tasks * Removed UI related changes of DAR * nit * Update generated TypeScript types * fix linting issue * Removed all languages changes * nit * removed white space * add request data access button with owner/status conditions * fix lint issue * fix minor validation for data access button * fix lint issue * fix data access button visible condition * fix java lint checks and fix test cases * nit * fix test * fix(tasks): model CreateTask.about as entityLink, validate target entity Replace `about` (FQN string) + `aboutType` (string) with a single `about` field of type entityLink (`<#E::{entityType}::{fqn}>`). The resource layer parses the link and resolves it via `Entity.getEntityReferenceByName(type, fqn, NON_DELETED)`, which guarantees the target asset exists and is not soft-deleted. Why: long-FQN data assets were rejected with `[query param name size must be between 1 and 256]` because the modal was constructing a Task `name` from the FQN. The `about` was modelled as a free string with no schema validation that the target was a real, non-deleted entity. The Threads API already uses entityLink for this exact purpose; tasks now align with that pattern. The link is supplied as a hidden field by the UI — users never see it. Also fixes the missing `@ExtendWith(TestNamespaceExtension.class)` on `DataAccessRequestIT` that caused four test failures in CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix unit test failure * fix(test): await workflow stage transition in DataAccessRequestIT The workflow advances the task from pending-workflow-start to review asynchronously. Asserting on the object returned by create() was a race condition. Use Awaitility to poll until the stage is review, matching the pattern in IncidentTaskIntegrationIT.
--------- Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Ram Narayan Balaji <ramnarayanb3005@gmail.com> Co-authored-by: Ram Narayan Balaji <81347100+yan-3005@users.noreply.github.com>	2026-05-07 17:56:44 +05:30
Laura	b12506fc6d	Add container entity type (#27957 ) * Add alias marketplace * wire fingerprint and embeddings in domain_index_mapping * add container entity to dataAssetEmbeddings * add container to VECTOR_INDEXABLE_ENTITIES * Move changes for marketplace to another PR	2026-05-07 14:21:29 +02:00
sonika-shah	e91c90c144	fix: validate custom property name charset (#27808 ) * fix: validate custom property name charset Tighten custom property name validation to block characters that break downstream parsers, with verified empirical reproduction: - `"` causes HTTP 500 on PUT /metadata/types/{id} - `:` breaks CSV import — exporter writes `key:value;key:value`, importer splits at first colon, treats prefix as the field name - `^` breaks OpenSearch query when the name is in searchSettings.searchFields — Lucene reads `^` as the boost separator in `field^boost` - `$` breaks CSV import via java.util.regex.Matcher.replaceAll which interprets `$<letter>` as a backreference Adds a `customPropertyName` definition in basic.json and switches customProperty.json to reference it. Adds a defensive regex check in TypeRepository.validateProperty so the API returns 400 with a clear error message even if schema validation is bypassed. Tests cover allowed-charset acceptance, the four blocked characters, leading-character validation, max-length enforcement, and unbalanced brackets. * Update generated TypeScript types * test: add schema-vs-Java consistency test for custom property name Guards against drift between basic.json#customPropertyName and the TypeRepository regex/length constants. If either side is updated without the other, CI fails with a message pointing to both files. The Java validator is kept (better error message + covers internal callers that bypass the HTTP layer); the consistency test guarantees the two definitions cannot drift. * fix: extend custom property name charset after gap-coverage matrix Re-ran the matrix on previously-untested chars (+ ? * ~ ` \) across all 17 property types × create/patch/CSV/search: - + ? 
* ~ ` all pass cleanly on every operation × every property type — add to allow list - \ fails CSV roundtrip for entityReference and entityReferenceList types (escape inconsistency in CSV serialization) — add to block list Updates the regex, schema description, Java validator error message, and adds the new chars to the allow/block integration tests. Consistency unit tests in TypeRepositoryTest continue to pass. Final allow set: alphanumeric _ - . / & % # @ ! , ; = \| ' + ? * ~ ` space ( ) < > [ ] { } Final block set: " : ^ $ \ * Update generated TypeScript types * updated the custom property name validation * added name suffix in custom property name * lint fixes * include backslash in invalid char Co-authored-by: Copilot <copilot@github.com> * fixed the playwright issue Co-authored-by: Copilot <copilot@github.com> * lint fix * fix check style * Drop redundant Java validator for custom property name; tighten IT assertions Schema is the single source of truth: jsonschema2pojo emits @Pattern + @Size on CustomProperty.name from basic.json#/definitions/customPropertyName, and @Valid on TypeResource.addOrUpdateProperty enforces them at the HTTP boundary. The hand-written Pattern constant, validateCustomPropertyName, and the schema-vs-Java sync test were duplicating that rule and could never reach the HTTP user (Bean Validation always fires first via @Valid). Tighten the new TypeResourceIT cases from assertThrows(Exception.class) to assertThrows(InvalidRequestException.class) so a regression to a different exception type or status code fails loudly. * restrict few more special characters from Cp name * minor fix * Disallow & < > in custom property names; align IT cases Schema-side counterpart to the UI changes in the previous two commits: basic.json#/definitions/customPropertyName now blocks &, <, > alongside the existing " : ^ $ \\. 
The DOMPurify pass on the UI sanitizes &, <, > into HTML entities, which produced inconsistent persisted values; rejecting them at the schema layer prevents that drift across all write paths. IT updates: - Drop &, <, > from the allowed-charset cases (and the "withMatched(pair)And<more>" composite) - Add &, <, > to the disallowed-charset cases - Drop "<" leading-character case (now covered as a disallowed character) - Drop "<" / ">" unbalanced-bracket cases * Update generated TypeScript types * Close PATCH bypass for custom property name validation on Type Bean Validation runs for the dedicated PUT /types/{id} (addOrUpdateProperty) because the resource declares @Valid CustomProperty, and the createOrUpdate path can't carry customProperties at all (CreateType schema doesn't include the field). PATCH /types/{id} accepts an opaque JsonPatch, so @Valid never reaches into the resulting customProperties[] — a JSON Patch like [{"op":"add","path":"/customProperties/-","value":{"name":"bad:colon",...}}] persisted bad-named properties (verified live: HTTP 200 before this fix). Run Hibernate Validator programmatically inside TypeRepository.prepare() so every write path enforces the schema-derived @Pattern / @Size / @NotNull on each CustomProperty. The rule still lives only in basic.json — picked up via the generated @Pattern annotation, executed via ValidatorUtil.validate. Tests in TypeResourceIT: - test_patchCannotAddCustomPropertyWithDisallowedName — seeds a valid property to ensure /customProperties exists, then PATCHes appending a name with ':', asserts InvalidRequestException and verifies the bad name is not persisted - test_patchCanAddCustomPropertyWithValidName — guards against the fix rejecting valid PATCH-driven additions * Block * in custom property names — breaks ES field-path lookup When the property name appears in extension.<propertyName>^boost entries of searchSettings.searchFields, OpenSearch treats * as a field-path wildcard. 
The literal * field never matches its own wildcard pattern, so the field gets silently skipped from the query and Explore search returns no hit for the value. Bisected against the running server: of 12 candidate Lucene-special chars, only * actually breaks the mainline UI search flow. ? ~ ( ) { } [ ] / ! and space all returned hits via the searchFields path because OS looks up the field literally and only treats * as a wildcard at that layer. Updates the regex + description in basic.json/customProperty.json, the UI regex in regex.constants.ts, the validation message across 19 locales, the generated TS docstrings, the Playwright invalid-name fixtures and spec, and the IT TypeResourceIT case (withasterisk moves from allowed to disallowed). Validate only newly-added custom properties; isolate PATCH IT to fresh types prepare() previously validated the entire customProperties[] on every Type write. An upgraded instance with a legacy property whose name contained a now-banned char would then reject any subsequent PUT/PATCH on that type, even when the write only adds a different valid property. Move the name validation into TypeUpdater.updateCustomProperties() and scope it to the `added` list computed by recordListChange against the original entity. New properties are still validated; pre-existing names are left alone. Replace the IT PATCH cases' shared `topic` Type with a fresh, namespaced entity-category Type per test (createEntityTypeForTest). The shared `topic` was mutated concurrently by other tests in the class — combined with PATCH's lack of per-type locking, that produced lost-update races and flaky asserts. The fresh per-test type has customProperties: [] from creation, so the patch sets the array directly without a seed property. * chore: prettier formatting on the new asterisk-rejection test * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * docs: add + ? 
~ ` to JSDoc allow-list to match the regex * fix(it): request customProperties field on read-back in PATCH IT Type.customProperties is a lazy field — TypeRepository.setFields only populates it when the request URL includes ?fields=customProperties. The default getTypeById helper omits the param, so the read-back always saw customProperties == null. That made test_patchCanAdd... fail (the just-persisted property wasn't visible) and made test_patchCannotAdd... pass for the wrong reason (would have stayed green even if the bad name had slipped through validation). Add a fields-aware getTypeById overload and use it in both PATCH cases. Empirically verified against the live server: good name returns 200 + appears in customProperties, bad name returns 400 + does not. * minor fix * playwright test fix * removed unnecessary test * blocked ~ and / from custom property name * lint-fix * Block / and ~ in custom property names (JSON Pointer reservations) Forward slash and tilde are reserved by JSON Pointer (RFC 6901): / is the path separator and ~ is the escape lead-in (~0 = ~, ~1 = /). Allowing them in a property name shifts the burden onto every caller that builds a JSON Patch by string interpolation; a raw `/extension/${propertyName}` either splits into the wrong number of segments or contains an invalid escape sequence, and the server applies the patch to the wrong key (or 400s outright). This surfaced as a reproducible failure in the table-cp Playwright suite: the preceding test ended with `path: \`/extension/${propertyName}\`` where propertyName ended in `/`. The server addressed extension[name-without-/][""] instead of extension[name-with-/], returned 400, and TableClass.patch overwrote entityResponseData with the error body — stripping id and FQN. The next test fell into the search-based navigation path with an empty search term and timed out at 180s. 
Tighten the schema regex in openmetadata-spec/.../basic.json#customPropertyName to drop / and ~ from the allowed set; update the human-readable description in basic.json and customProperty.json to call out the RFC 6901 reservation. Move the with/slash and with~tilde cases from the allowed-charset IT to the disallowed-charset IT in TypeResourceIT. * Update generated TypeScript types * Use fresh per-test Type in custom-property name validation IT The five charset/length/lead-char tests added in this PR previously mutated the shared built-in TABLE_ENTITY_TYPE under @Execution(CONCURRENT). The PUT path acquires TYPE_PROPERTY_LOCKS so concurrent writes serialize, but relying on that lock for test isolation is fragile — the PATCH-driven IT in the same class already uses a per-test fresh Type via createEntityTypeForTest(client, ns, ...) for exactly this reason (see `1864b0a6ac`). Switch the five PUT tests to the same pattern so no test mutates a shared Type, eliminating cross-test coupling regardless of whether the server-side lock is in place. Tests affected: - test_customPropertyNameAllowedCharacters_succeeds - test_customPropertyNameDisallowedCharacters_fails - test_customPropertyNameMustStartWithAlphanumeric_fails - test_customPropertyNameTooLong_fails - test_customPropertyNameUnbalancedBrackets_succeeds * Align UI artifacts with the tightened custom-property-name regex Three small follow-ups flagged by reviewers: - regex.constants.ts: JSDoc above CUSTOM_PROPERTY_NAME_REGEX still listed / and ~ as allowed even though the pattern below was tightened to drop them. Update the comment to match the actual regex and call out the RFC 6901 reason so future edits don't reintroduce them. - CustomProperties.spec.ts: the "should accept a valid name with allowed special characters" test fed a hardcoded string containing ~ and /, which the new regex rejects — the assertion would fail. Drop those two characters so the input stays in the allowed set. 
- zh-cn.json: the Simplified Chinese translation of custom-property-name-validation was double-escaped (\\\" and \\\\), which would render to users as literal \" and \\ rather than " and \. Match the escaping pattern used by the other 18 locales. * addressed gitar comment --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Rohit0301 <rj03012002@gmail.com> Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Rohit Jain <60229265+Rohit0301@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-05-07 15:35:43 +05:30
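The RFC 6901 reservation that motivated blocking `/` and `~` in the commit above can be illustrated with a minimal sketch. The helper names are hypothetical and do not come from the OpenMetadata codebase; only the `~0` and `~1` escape rules are taken from the RFC.

```python
from typing import List


def escape_pointer_segment(segment: str) -> str:
    """Escape one JSON Pointer reference token per RFC 6901.

    '~' is replaced first; otherwise the '~' introduced by '~1'
    would itself be escaped a second time.
    """
    return segment.replace("~", "~0").replace("/", "~1")


def extension_path(property_name: str) -> str:
    # Hypothetical helper: the path a client would place in a JSON Patch op.
    return "/extension/" + escape_pointer_segment(property_name)


def split_pointer(pointer: str) -> List[str]:
    # Decode a pointer into its tokens: split on '/', then unescape
    # '~1' before '~0', the order the RFC prescribes for decoding.
    return [
        token.replace("~1", "/").replace("~0", "~")
        for token in pointer.split("/")[1:]
    ]
```

A raw interpolation such as `"/extension/" + "name/"` decodes to three tokens (`extension`, `name`, and an empty string), matching the wrong-key addressing described in the commit, whereas `extension_path("name/")` decodes back to two. Rejecting the characters at the schema layer removes the need for every caller to remember the escape step.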
Satender K	ea65ce78bf	Fixes 27433: task page is being updated when user is already on same page and clicks on the new assigned task from bell icon. (#27903 ) * fixed issue 27433 * updated code as per GitAR comments * updated code for UI check fails * added E2E test case for issue 27433 * fix(e2e): make task notification refresh test self-contained Replace hardcoded 'raw_order' search with a test-owned TableClass entity, remove fragile URL-splitting FQN extraction, and clean up the created task in afterAll to prevent residual data across test runs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * style(e2e): apply prettier and import organization to TaskNavigation spec Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * worked up comments by Rohit * updated test case as per review comment * added afterAll back as removing it will leave resources leaked in DB, as per GitAR --------- Co-authored-by: Satender <sommy@Satenders-MacBook-Pro.local> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-07 15:20:02 +05:30
Ram Narayan Balaji	339b3dfb18	fix(security): upgrade Java dependencies to resolve CRITICAL and HIGH CVEs (#27940 ) * fix(security): upgrade Java dependencies to resolve CRITICAL and HIGH CVEs - jetty-http: 12.1.6 → 12.1.7 (HTTP Request Smuggling, CRITICAL) - bcpkix/bcprov/bcutil-jdk18on: 1.80 → 1.84 (Crypto Signature Bypass + Timing Attack) - postgresql: 42.7.7 → 42.7.11 (SCRAM-SHA-256 DoS) - httpcore5-h2: pinned to 5.3.5 (HTTP/2 stream reset DoS) - commons-compress: pinned to 1.26.0 (Infinite Loop DoS) - jackson-core: 2.18.6 → 2.19.0 (async parser resource exhaustion) - maven-shade-plugin: 3.5.1 → 3.6.0 (supports Java 22 MR-JAR in jackson-core 2.19.0) - openapi-generator template override: jackson-version 2.17.1 → 2.19.0 in generated swagger pom * fix(security): upgrade spring-web 6.2.11 → 6.2.18 * fix(security): align jackson-dataformat-yaml, feign, gson, logback versions - jackson-dataformat-yaml: 2.17.2 → ${jackson.version} (2.19.0) - feign-core: 13.2.1 → 13.5 (in openapi-gen template) - gson: 2.10.1 → 2.11.0 (in openapi-gen template) - logback-classic: 1.3.13 → 1.5.25 (in openapi-gen template) * fix(security): use jackson 2.18.7 — highest clean 2.x with full ecosystem 2.19.0-2.21.0 all carry a HIGH (CVSS 8.7) vulnerability per Sonatype. 2.18.7 is the latest clean patch where all Jackson modules are released. * fix(security): remove hardcoded jackson 2.17.2 override in k8s-operator, inherit 2.18.7 from root * fix(security): upgrade gson 2.11.0 → 2.13.1 (Medium CVE) * fix(security): replace 436-line pom.mustache with minimal stub The openapi-generator-maven-plugin writes target/generated-sources/swagger/pom.xml at build time with hardcoded jackson 2.17.1. Snyk --all-projects picks up every pom.xml on disk and flags it as HIGH. The generated pom.xml is never packaged into any JAR or Docker image — it is a generator artefact. The actual runtime jackson version comes from the module pom inheriting jackson.version=2.18.7 from the root. 
Replace the 436-line verbatim upstream template (maintained just to change 2 version lines) with a 10-line coordinate-only stub. The generated pom.xml will have no <dependencies> block, so Snyk finds nothing to flag.	2026-05-07 09:19:10 +00:00
Sid	c24e5098ce	fix(playwright): unblock glossary bulk import after modal close (#27952 ) * fix(playwright): unblock glossary bulk import delete loop after modal close The trailing `getByRole('dialog').getByRole('img').click()` at line 267 fired after the bulk-import modal had already been closed and asserted not visible. It would either miss or grab a residual Ant modal element, leaving a `.ant-modal-mask` attached over the page. The mask was invisible to the accessibility tree but intercepted pointer events, so the next `settingClick` in the delete-properties loop hung waiting for the `customProperties.glossaryTerm` card to become actionable until the 180s test timeout. Replace the bogus click with a `.ant-modal-mask` count assertion so the next step only proceeds once the overlay has detached, and gate each loop iteration on a `waitForURL` to the glossary-term detail page. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(playwright): use toBeAttached over .ant-modal-mask for modal close gate Drop the `.ant-modal-mask` count assertion — it leaks a UI-library internal class into the test and would match unrelated modals. The bulk-import modal is mounted via `{isModalOpen && (...)}` (BulkImportVersionSummary.component.tsx), so the entire overlay subtree unmounts atomically when closed. Asserting the existing `bulk-import-details-modal` testid is `not.toBeAttached()` waits for that unmount and guarantees no backdrop is left intercepting clicks — stronger than `not.toBeVisible()`, which would pass mid-animation while the overlay wrapper is still in the DOM. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Siddhant <siddhant@MacBook-Pro-678.local> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 08:23:53 +00:00
Rohit Jain	e48f3d7ead	Upvote/Downvote Icon Loses Primary Color on Blur After Liking Entity Page (#27898 ) * Upvote/Downvote Icon Loses Primary Color on Blur After Liking Entity Page * fixed bg color * minor fix	2026-05-07 12:32:07 +05:30
Sriharsha Chintalapani	b837ade95a	docs(github): require issue link, design, tests, UI recording in PR template (#27891 ) Expands `.github/pull_request_template.md` to require a linked issue, a high-level design (for large PRs), a structured Tests section (use cases, unit + coverage %, backend/ingestion integration tests, Playwright, manual steps), and a UI screen recording for any UI change. Adds a `/pr-checklist` skill that walks the template, gathers evidence, and drafts the PR body before opening via `gh pr create`. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 08:05:56 +02:00
Sriharsha Chintalapani	3beb1e020b	Improve cache warmup configuration and availability (#27948 ) * Fix cache warmup app config rendering * Add optional relationship cache warmup * Restore relationship repository in warmup test * Update generated TypeScript types * Disable cache warmup when cache is unavailable * Address cache warmup review comments * Address Copilot cache warmup comments * Memoize app detail tabs --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2026-05-06 21:56:07 -07:00
Nikhil Chennam	a136325cf0	fix: API response for TableColumnCountToBeBetween (#27900 ) * Fix typo in tablecolumnCount Changes: - Remove redundant None checks for min/max bounds (already have defaults in base class) - Add type: ignore for Optional[float] comparison to satisfy type checker - Update test to assert exact result message instead of substring matching - Test now verifies full message format including min/max bound values	2026-05-07 10:15:39 +05:30
Eugenio	adcdd345be	Fix recognizer inclusion based on language (#27919)	2026-05-07 00:16:20 +00:00
Sriharsha Chintalapani	f9d3c85d20	fix(search): restore live settings on per-entity promote path (#27920 ) * Restore live index settings on per-entity distributed-promote path DefaultRecreateHandler exposes two finalization paths: - finalizeReindex(...) — centralized end-of-job promotion. Calls applyLiveServingSettings + maybeForceMerge before the alias swap, reverting the bulk overrides (refresh_interval=-1, replicas=0, async translog) back to live values (refresh=1s, replicas=1, durable translog). - promoteEntityIndex(ctx, ok) — per-entity promotion. Used by the distributed search-indexer's "promote as soon as all partitions for an entity complete" callback (DistributedSearchIndexExecutor.promoteEntityIndex). Swaps the alias and cleans up old indices — but never restored live settings. When an entity finishes its partitions before the final reconciliation (typically the smallest entities — e.g. knowledge `page` with ~11 rows), its index is promoted via the per-entity path, the alias swap succeeds, and the bulk-build overrides become the new live settings. refresh_interval stays at -1 in production, so live writes after the reindex are buffered in the translog and never reach searchable segments until a manual _refresh. Externally this surfaces as "create an article, hierarchy is empty until I re-trigger reindex" — exactly the user-reported bug. Mirror the finalizeReindex sequence by calling applyLiveServingSettings (and maybeForceMerge for parity) at the top of the promote block in promoteEntityIndex, before the alias swap. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Wire jobData into per-entity reindex promotion handler DefaultRecreateHandler.applyLiveServingSettings reads from the handler's jobData field (live + bulk index-settings overrides on the EventPublisherJob). The per-entity distributed-promotion path in DistributedSearchIndexExecutor created its own DefaultRecreateHandler instance and never called withJobData(jobData) on it. 
With jobData=null, buildRevertJson returns null and applyLiveServingSettings silently no-ops — meaning the previous fix (`b272de85f9`) never actually re-applied live settings on the per-entity promote path, even though the call was reached. currentJob.getJobConfiguration() is the EventPublisherJob the strategy created. Wire it into the new handler at construction time, mirroring the withJobData call DistributedIndexingStrategy already makes on the strategy's own handler instance. With this change, the per-entity promote path now logs "Applying live serving settings to staged index '...' for entity 'page': {\"number_of_replicas\":1,\"refresh_interval\":\"1s\", ...}" before the alias swap, and post-promotion `_settings` show refresh_interval=1s instead of the stuck -1. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Add regression test for live serving settings on per-entity promote Asserts that DefaultRecreateHandler.promoteEntityIndex calls searchClient.updateIndexSettings with the live-revert JSON (refresh_interval=1s, replicas=1, translog.durability=request) before swapping the alias, given a handler with bulk overrides wired through withJobData. Without the two preceding fixes the assertion fails with "Wanted but not invoked" — applyLiveServingSettings was never reached on the per-entity promotion path, so the staged index inherited refresh_interval=-1 and post-promotion live writes never became searchable until a manual _refresh. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Expand unit coverage around the per-entity promotion contract DefaultRecreateHandlerTest.PromoteEntityIndexTests: - testPromoteEntityIndexAppliesSettingsBeforeAliasSwap: InOrder verification that updateIndexSettings runs BEFORE swapAliases. Catches a swap-then-revert misordering (which would briefly serve live reads against refresh=-1 settings). 
- testPromoteEntityIndexForceMergesWhenConfigured: forceMerge(staged, 1) is invoked when bulkIndexSettings.forceMergeOnPromote=true. Catches a regression where the force-merge call gets dropped without anyone noticing. - testPromoteEntityIndexSkipsSettingsWithoutJobData: locks in the safe no-op behavior when a handler is constructed without withJobData. Documents that no-jobData → no settings call (vs. crash or silent revert to defaults). DistributedSearchIndexExecutorTest: - initializeEntityTrackerWiresJobDataIntoDefaultRecreateHandler: triggers the private initializeEntityTracker with currentJob holding a populated jobConfiguration and verifies recreateHandler.withJobData(jobData) is called on the per-entity handler. This catches the second half of the original regression: even if applyLiveServingSettings is reached on promoteEntityIndex, jobData=null makes it a silent no-op. Future edits that drop the wiring or move handler construction elsewhere will fail here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add integration test for live settings restoration after alias promotion Triggers SearchIndexingApplication with bulkIndexSettings configured (refresh_interval=-1, number_of_replicas=0, translog.durability=async), waits for the run to terminate, then queries _settings on the promoted table_search_index alias against the real OpenSearch/Elasticsearch container (via TestSuiteBootstrap.createSearchClient()). Asserts that each concrete index resolved by the alias has the live values applied (refresh=1s, replicas=1, translog.durability=request) and not the bulk overrides. This is the end-to-end counterpart to the unit-level regression test in DefaultRecreateHandlerTest. Catches the same class of bug at the layer where it actually surfaced in production: an alias swap that completed successfully according to logs but left the new live index unsearchable because refresh was disabled and writes were buffered indefinitely. 
Modeled on SearchIndexingFieldsParityIT for run-trigger / poll structure; adds the post-completion _settings verification step that no other IT performs today. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Address PR review: harden settings revert + lock InOrder + drop redundant test DefaultRecreateHandler: move applyLiveServingSettings + maybeForceMerge inside the try/finally that unregisters the staged index. Without this, a transient OS/ES failure during _settings update or _forcemerge propagated out before the finally ran, leaving searchRepository.unregisterStagedIndex permanently registered — so live writes kept routing to a staged index nothing reads from. Same fix applied to finalizeReindex for consistency (its window is shorter since it runs at end-of-job, but the leak shape is identical). Per gitar-bot review. DefaultRecreateHandlerTest: - testPromoteEntityIndexAppliesLiveServingSettingsBeforeSwap: replace independent verify() calls with InOrder so the test actually locks the "settings before alias swap" ordering its name and the PR description promise. A swap-then-revert refactor would have passed before this. Per Copilot review. - Drop testPromoteEntityIndexAppliesSettingsBeforeAliasSwap (the standalone InOrder test added in the previous commit) — folded back into the test above, which now covers both ordering and JSON content in one place. - Add testPromoteEntityIndexUnregistersStagedIndexOnSettingsFailure — regression test for the gitar-bot fix above. Verified to fail with "IllegalState connection reset" when the calls are moved back outside the try block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Drop verbose explanatory comments from promote-path edits The why-it's-in-a-try and why-jobData-is-wired blocks read like commit messages, not code annotations. Tests and commit history carry the rationale; the code itself reads fine without them. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Close Rest5Client in IT _settings helper readIndexSettings opened a Rest5Client and never closed it, leaking HTTP connections on test re-runs. Wrap in try-with-resources. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Tighten SearchIndexAliasPromotionIT against false-positive runs Two reasons the IT could pass without the regression: - waitForLatestRunSuccess accepted activeError, which maps to COMPLETED_WITH_ERRORS. In that path EntityCompletionTracker can invoke promoteEntityIndex(..., false), the staged index is deleted, and the alias stays on the pre-existing live index. The pre-existing index already has live settings, so the _settings assertions pass against it without exercising the promotion path. - readIndexSettings on the alias would resolve to that pre-existing concrete index even after a no-op promotion, so the assertions were never actually checking the staged index. Reject anything other than success/completed, and assert the alias resolves to a _rebuild_ index — proving the swap moved the alias to a freshly staged index. Per Copilot review on PR #27920. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Harden alias promotion: defer canonical delete, hard-fail on empty aliases, surface in job status Four coupled changes that fix latent silent-failure paths in the per-entity and end-of-job promotion code (predates the recent regression but lands together since they touch the same blocks). 1. Empty aliases is a hard failure `getAliasesFromMapping` returning an empty set used to fall through to a skip-swap WARN, then log "Promoted staged index..." and record `promoteSuccess`. Canonical was already pre-deleted at that point, so the alias resolved to nothing and operators got no error signal. Now: log structured ERROR, call `recordPromotionFailure`, return without deleting canonical or claiming success. 
Same fix in both `promoteEntityIndex` and `finalizeReindex`. 2. Defer canonical-index deletion until swap success Old order: delete canonical → swapAliases (with canonical-name in the alias set). If swap fails after delete, the canonical name has nothing to resolve to → live data unavailable. New order: - swapAliases of all non-canonical-name aliases (atomic move from canonical to staged). If this fails, canonical still serves with all original aliases — no data loss. - Delete canonical (only if its name is needed as alias). If this fails, parent aliases work; canonical-name lookups still hit the old index until retry — degraded, not lost. - addAliases for the canonical name on staged. If this fails (data loss path: canonical was deleted but alias-add failed), mark dataLossPromotions; operator alarm. 3. Promotion outcomes affect job status `DefaultRecreateHandler` tracks `failedPromotions` and `dataLossPromotions` sets. `RecreateIndexHandler` interface exposes them. `SearchIndexExecutor.determineStatus` and `DistributedIndexingStrategy.determineStatus` now consult both: - any data-loss promotion → ExecutionResult.Status.FAILED - any failed promotion (no data loss) → COMPLETED_WITH_ERRORS Distributed path checks both the strategy's handler and the per-entity executor's handler (different instances, both can record failures). 4. Structured failure log markers Replace single-line ERROR with `[ALIAS_PROMOTE_FAILED phase=... entity=... stagedIndex=... canonicalIndex=... aliases=...]` markers at every promotion-fail exit (empty-aliases / swap1 / delete-canonical / swap2 / exception). Each line states whether canonical was deleted and what the blast radius is, so operators can grep and triage without reading code. 
Tests: - testPromoteEntityIndexEmptyAliasesIsHardFailure - testPromoteEntityIndexCanonicalNotDeletedWhenStep1Fails - testPromoteEntityIndexThreeStepSwapOrder (InOrder swap1 → delete → swap2) - testPromoteEntityIndexFlagsDataLossOnAddAliasFailure - SearchIndexExecutor: determineStatusFlagsPromotionFailuresAndDataLoss - DistributedIndexingStrategy: determineStatusFlagsPromotionFailuresFromEitherHandler All 146 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Consolidate finalizeReindex and promoteEntityIndex into one core path Previously two separate methods with ~80 lines each that did the same thing. PR #25594 forked the code (per-entity vs end-of-job), PR #27865 added a critical settings-revert step to one half but missed the other — that was the original regression. The duplication itself is the regression source, so collapse it: both methods are now thin wrappers around a single private `promote(EntityReindexContext, boolean)` core. Future features land in one place by construction; can't drift. Behavior consolidation: - Single source of truth for aliasesToAttach: the EntityReindexContext fields (existingAliases ∪ canonicalAliases ∪ parentAliases). These are populated by recreateIndexFromMapping at stage-create time and read aliases from the live index — strictly a superset of what the old getAliasesFromMapping derived from IndexMapping.json (preserves operator-added aliases). - getAliasesFromMapping deleted. promoteEntityIndex no longer fetches IndexMapping at promote time; it reads everything from the context the caller built. - shouldPromote / staged-delete-on-failure / settings revert / empty- aliases hard-fail / three-step swap / cleanup / unregister-staged: all in one method body now. 
Caller-side semantic change: any code path that called promoteEntityIndex with a context that had populated existingAliases/canonicalAliases/ parentAliases (which is all production callers — DistributedSearchIndex Executor and SearchIndexExecutor both already populate them) is unaffected. A caller that built a bare context with only entity/canonical/staged set would previously have re-derived aliases from IndexMapping; now it hits the empty-aliases hard fail. This is strictly safer (fail loud beats silent skip-with-success) and we've audited callers. Tests: - All 17 existing PromoteEntityIndexTests updated to populate the context with the alias fields they previously depended on getAliasesFromMapping to produce. One test ("Should handle null indexMapping gracefully") rewritten to "Empty aliases on context is handled" — same behavior, new wording for the new model. - Old GetAliasesFromMappingTests nested class deleted — exercised the removed method. - New EntryPointParityTests nested class with 3 tests that explicitly run the same EntityReindexContext through both finalizeReindex and promoteEntityIndex and assert byte-for-byte identical alias state, deleted-indices set, and failure-tracking fields. These pin the two entry points together against future drift. Integration test: - Added perEntityPromotionIsIdempotentAcrossRepeatedRuns to SearchIndexAliasPromotionIT. Triggers the app twice, asserts the second run produces a different _rebuild_ concrete index and that live settings still apply — exercises the full pre-existing-canonical → three-step-swap path which is what production actually does. Total: 160 unit tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Address PR review: post-state checks, FAILED listener, hermetic IT, InOrder Five additional behavioral fixes from Copilot review on PR #27920. 1. delete-canonical and addAliases: detect failure via post-state, not via try/catch. 
ElasticSearchIndexManager#deleteIndexWithBackoff and #addAliasesInternal both swallow transport exceptions and return void (same shape on the OS side), so the existing try/catch could never observe a failed delete or alias-add. After deleteIndexWithBackoff, verify the canonical no longer exists; after addAliases, verify the canonical-name alias is actually attached. If either post-condition fails, log [ALIAS_PROMOTE_FAILED reason=delete-not-acknowledged] / [reason=alias-not-attached] and mark the promotion failed (data loss for the alias-not-attached path). 2. SearchIndexExecutor.executeSingleServer: wire onJobFailed for the new FAILED status. Previously the listener chain only had callbacks for COMPLETED / COMPLETED_WITH_ERRORS / STOPPED, so promotion-driven FAILED ended without populating jobData.failure or notifying observers. Pass in an IllegalStateException naming the data-loss entities so the app run record carries the right failure context. 3. SearchIndexAliasPromotionIT trigger payload: explicitly set liveIndexSettings (1s/1/request), liveIndexSettingsByEntity (empty), and useDistributedIndexing=true. /v1/apps/trigger/X merges the payload into the persisted config rather than replacing it, so without these the test could be affected by previous local config or silently exercise the single-server path. The hard-coded post-promotion assertions are now anchored to values the test itself supplies. 4. testPromoteEntityIndexForceMergesWhenConfigured: replace standalone verify() with InOrder(forceMerge → swapAliases) so a refactor that swaps aliases first and merges afterward fails the test instead of passing. All 148 unit tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Wrap post-state checks: indexExists / getAliases throws no longer escape Post-state verification I added in `9a7fa49494` (indexExists after delete, getAliases.contains(canonical) after add) called the search client directly. 
If those calls themselves threw — network timeout, transport error — the exception escaped promoteWithDeferredCanonicalDelete and was caught by the outer phase=exception handler with markPromotionFailed( dataLoss=false). For the getAliases case the canonical index has already been deleted at that point, so dataLoss=false misclassifies a real data- unavailability state. Three small helpers: - safeIndexExists(client, index, entityType): gate-time check; returns false on throw (conservative — skip delete attempt; if canonical actually exists, step-3 addAliases will fail with name collision and the alias-attached post-check will record the right blast radius). - checkIndexExists(client, index): tristate Boolean for post-delete check; null on throw means "couldn't determine state". - checkAliasAttached(client, staged, alias): tristate Boolean for post-add check; null on throw means "couldn't determine state". Caller logic: - delete-canonical post-check returning null → markPromotionFailed( dataLoss=false). Conservative: we don't know if delete actually took. - add-aliases post-check returning null → markPromotionFailed( dataLoss=true). Canonical IS deleted; alias state unknown is the worst case. Tests: - testPromoteEntityIndexHandlesIndexExistsPostCheckThrow: gate returns true, post-delete check throws → failed but NOT data loss. - testPromoteEntityIndexHandlesGetAliasesPostCheckThrow: post-add check throws → failed AND data loss (canonical already gone). Per gitar-bot review on PR #27920. All 40 unit tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Address Copilot review 4232747647: positive-evidence dataLoss, hermetic IT Six findings from copilot-pull-request-reviewer. 1+4. Track canonicalDeleted via positive evidence only (DefaultRecreateHandler). ES/OS indexExists/getAliases/listIndicesByPrefix all swallow transport errors and return false/empty. A "negative" probe result cannot be distinguished from "probe failed". 
The previous shape blindly trusted probe values and could misclassify a transient failure as data loss when canonical was actually still serving (gate-says-no-but-actually-yes case) or, conversely, as not-data-loss when canonical was actually gone. Now: a `canonicalDeleted` boolean defaults to false and only flips to true after both the delete call and a positive post-state check confirm the index is gone. dataLoss classification uses this flag — never claim data loss without positive evidence canonical was deleted. Added regression test `testPromoteEntityIndexAmbiguousGateProbeIsNotDataLoss` for the gate- ambiguity case. 2. Wire onJobFailed in DistributedIndexingStrategy. Previously only SearchIndexExecutor emitted the listener callback for FAILED. The distributed strategy returned the status but never invoked listeners.onJobFailed, so jobData.failure stayed empty and the AppRunRecord/WebSocket update had no failure context. Now mirrors the single-server behavior with an IllegalStateException naming the data-loss entities. 3. IT: assertEquals("request", durability) instead of assertNotEquals ("async"). The non-equals assertion would pass if a silent translog- revert drop left durability at any non-async cluster default, missing the regression. Pin the exact configured value. 5. IT: assert exactly one concrete index resolves the alias, not at-least-one. A broken swap that leaves the alias attached to BOTH the pre-existing live index AND the new _rebuild_ index would satisfy "any rebuild present" but duplicate search results in production. Use assertEquals(1, settingsByIndex.size()). 6. IT hermeticity: snapshot/restore SearchIndexingApplication's appConfiguration. /v1/apps/trigger/{name} merges payload into the persisted config, so without restore later tests in the suite inherit this test's bulkIndexSettings / liveIndexSettings / useDistributedIndexing values — making suite ordering change what they exercise. 
Both IT methods now do a try/finally with snapshotAppConfig() + restoreAppConfig(). All 151 unit tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Wait for restore-triggered run to settle in SearchIndexAliasPromotionIT restoreAppConfig POSTs to /v1/apps/trigger/{name}, which both merges the body into the persisted config AND starts a new run. Returning without waiting left SearchIndexingApplication running into the next test class (AppsResourceIT.test_triggerApp_200), which then timed out for 2 minutes on "already running". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix AppsResourceIT.waitForAppJobCompletion case mismatch and timeout The terminal-state check compared status against uppercase "SUCCESS", "FAILED", "COMPLETED", but appRunRecord.json defines the status enum in lowercase ("success", "failed", "completed", ...). The check never matched and the 30s wait silently fell through to the catch block, making it a no-op. test_triggerApp_200 then relied on its 2-minute "already running" trigger retry, which timed out whenever a longer reindex (e.g. SearchIndexingFieldsParityIT's "all entities" reindex) was still in flight. Switch the terminal check to "not running and not started" case-insensitively, and raise the ceiling to 5 minutes so the wait actually covers a long in-flight reindex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Run SearchIndexAliasPromotionIT in the sequential bucket The test triggers SearchIndexingApplication and waits for it to complete, but during the parallel-tests slot other classes can also trigger the same app concurrently. The resulting in-flight run then leaks into AppsResourceIT.test_triggerApp_200 (which runs in the sequential slot) and exhausts its 2-minute "already running" trigger Awaitility. AppsResourceIT is already in the sequential bucket for the same reason. 
Mirror it for SearchIndexAliasPromotionIT across all seven failsafe profiles (mysql/postgres × elasticsearch/opensearch and the RDF profile) — include in sequential-tests, exclude from parallel-tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Address Copilot PR review 4233452655 DefaultRecreateHandler.promote: when canonicalIndex is null but stagedIndex is non-null, the early-return previously skipped the finally clause that unregisters the staged index. Live writes could stay routed to the staged index after we bail. Release the registration explicitly before returning. Extend the existing testPromoteWithNullCanonicalIndex unit test to assert the unregister call. SearchIndexAliasPromotionIT.snapshotAppConfig: distinguish "snapshot failed" from "config absent". The previous Map.of() return on exception caused restoreAppConfig to POST an empty body to /v1/apps/trigger/{name}, which is a no-op merge that silently leaks this test's bulk/live setting overrides into downstream tests and starts a spurious app run. Return null on failure so the caller short-circuits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Remove SearchIndexAliasPromotionIT in favor of unit test coverage The IT triggered the bundled SearchIndexingApplication, which made it expensive (full reindex per run, ~1-2 minutes) and a source of bleed-through into AppsResourceIT.test_triggerApp_200 even after moving to the sequential failsafe bucket. The same surface is already covered by unit tests: - DefaultRecreateHandlerTest "Should restore live serving settings on staged index before alias swap" (and the per-step swap-order, data-loss-flagging, post-state-check tests) - DistributedSearchIndexExecutorTest verifies the withJobData(jobConfig) wiring on the per-entity promotion handler Drop the IT and revert its pom.xml include/exclude entries. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Address Copilot PR review 4236718653 DefaultRecreateHandler.promote: validate aliasesToAttach BEFORE applyLiveServingSettings/maybeForceMerge so the empty-aliases error path skips wasted I/O and segment churn on a staged index that will never be swapped in. Extend testPromoteWithEmptyContextAliases to verify updateIndexSettings and forceMerge are not invoked. Adjust testPromoteEntityIndexUnregistersStagedIndexOnSettingsFailure to populate canonicalAliases so it still reaches applyLiveServingSettings. Refresh checkAliasAttached Javadoc: with positive-evidence canonicalDeleted tracking, a null result is classified as data loss only when canonicalDeleted=true; otherwise it is degraded (retryable). The previous wording claimed every null was data loss, which no longer matches the call site. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix per-entity promote when canonical is an alias, not a concrete index On every reindex after the first, the canonical name (e.g. openmetadata_table_search_index) is an alias on the previous staged index — not a concrete index. The three-step swap then attempted deleteIndexWithBackoff(canonicalAlias), which OS/ES rejects with illegal_argument_exception ("matches an alias, specify the corresponding concrete indices instead"). The 6-attempt exponential backoff burned ~31s per entity. With 30+ default entity types the SearchIndexingApplication ran past CI's 300s setup budget, blocking every Playwright shard and python-integration setup at "Setup OpenMetadata Test Environment". Two-pronged fix in DefaultRecreateHandler.promote: 1. Detect canonicalIsAlias via getIndicesByAlias(canonicalIndex). When true, take the new promoteByAtomicAliasSwap path: a single swapAliases call atomically moves every alias (parents + canonical) from old → new staged. No name collision, no per-entity delete-by-alias-name, no degraded/data-loss windows. 2. 
listIndicesByPrefix returns the canonical alias name itself among its results (alongside concrete _rebuild_ indices). Filter that out of oldIndicesToCleanup when canonicalIsAlias, so the cleanup loop's deleteIndexWithBackoff doesn't replay the same 31s backoff. Keep canonical in cleanup when it is concrete — that's where the first-reindex flow drops the original concrete after the three-step swap moves aliases off it. Local repro: SearchIndexingApplication now completes in ~7s instead of hanging past 300s. New unit test testPromoteEntityIndexAtomicSwapWhenCanonicalIsAlias locks the new shape (single swap, no delete-by-alias, old concrete cleaned up). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add ALIAS_PROMOTE_BEGIN diagnostic log per entity Local repro on the CI-equivalent stack (mysql + elasticsearch + sample data + ingestion) completes in ~30s with all 60 entities going through the atomic-swap path. CI on the same commit still hangs past 300s. Add a structured log line at the top of every promote() so the next CI run shows which entities reach promotion and what shape (atomic vs three-step) was selected — pinpoints whether a specific entity gets stuck or if reindex never reaches promote(). No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Drop heavy alias-promotion refactor; rely on PR #27930 fix already in main The original "live settings not restored on staged after promote" regression is fixed by PR #27930 (commit `e56abb80d5`), which is already on main: applyLiveServingSettings + maybeForceMerge run before alias swap, and DistributedSearchIndexExecutor wires job configuration into the per-entity DefaultRecreateHandler. That minimal fix is sufficient for the original symptom (newly created entities not searchable until manual _refresh). 
This commit reverts every file we refactored further on top of that minimal fix back to origin/main: - DefaultRecreateHandler.java — drop deferred-canonical-delete three-step swap, post-state checks, dataLoss tracking, safeIndexExists/checkIndexExists/ checkAliasAttached helpers, promoteByAtomicAliasSwap path, ALIAS_PROMOTE_BEGIN diagnostic - RecreateIndexHandler.java — drop getFailedPromotions / getDataLossPromotions defaults - DistributedIndexingStrategy.java — drop FAILED-status emit and dataLoss aggregation - SearchIndexExecutor.java — drop FAILED-status emit - DistributedSearchIndexExecutor.java — drop @Getter on recreateIndexHandler - The matching unit tests in DefaultRecreateHandlerTest / DistributedIndexingStrategyTest / SearchIndexExecutorControlFlowTest / DistributedSearchIndexExecutorTest all revert to origin/main. The branch's only remaining contribution is the AppsResourceIT case-mismatch fix in waitForAppJobCompletion (a pre-existing bug discovered while diagnosing). Reason: the further refactor has consistently failed CI on this branch since the first commit. Local repro on the CI-equivalent stack (./docker/run_local_docker.sh -d mysql -m no-ui -s true -i true) showed the canonical-is-alias fix working end-to-end (~30s, 60 entities), but CI still timed out at 300s. Without server-side logs from CI we can't target the remaining gap. PR #27930 already addresses the user-visible regression, so the safest move is to drop the further refactor and ship just the case-mismatch fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Skip delete-by-alias-name when canonical is currently an alias After the first reindex, the canonical name (e.g. table_search_index) is an alias on the previous staged, not a concrete index. 
OpenSearch's listIndicesByPrefix returns the alias name as one of its result keys, which then drives a deleteIndexWithBackoff(canonicalIndex) attempt that fails with "[illegal_argument_exception] matches an alias, specify the corresponding concrete indices instead". The 6-attempt exponential backoff burns ~31s per entity (1+2+4+8+16s); on a 60-entity reindex that wastes ~30 minutes in cleanup with search degraded throughout. Drop the alias name from oldIndicesToDelete when getIndicesByAlias proves canonical is an alias right now. The atomic swapAliases call moves the canonical alias from the old concrete to the new staged in one step; the underlying old concrete is already in oldIndicesToDelete and gets cleaned up normally by the post-swap loop. No three-step swap or deferred-canonical-delete restructure needed. Adjusts testFinalizeReindexPromotesPartialData to use a realistic canonical-concrete setup (no self-alias on its own name — that state cannot exist in real OS/ES) so the new guard does not misfire on the existing test fixture. New unit test testFinalizeReindexSkipsDeleteWhenCanonicalIsAlias locks the new behavior: when canonical is an alias, deleteIndexWithBackoff is never called with the canonical name, and the old concrete rebuild is cleaned up via the swap path. Verified locally on the CI-equivalent stack (./docker/run_local_docker.sh -d mysql -m no-ui -s true -i true): both first reindex (canonical-concrete) and second reindex (canonical-alias) complete in ~8s with no "matches an alias" errors and clean cleanup logs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: Mohit Yadav <105265192+mohityadav766@users.noreply.github.com>	2026-05-06 15:16:57 -07:00
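The backoff cost quoted in the last commit message of #27920 can be checked with a small model. The following is a minimal Python sketch, not the Java code in `DefaultRecreateHandler`: `backoff_delays` and `indices_to_delete` are hypothetical helper names, the index name `table_search_index_rebuild_1` is made up for illustration, and the 1s starting delay with doubling is an assumption inferred from the "1+2+4+8+16s" breakdown.

```python
from typing import Iterable, List


def backoff_delays(attempts: int, base_seconds: int = 1) -> List[int]:
    """Sleeps between attempts, doubling each time: base, 2*base, 4*base, ...

    Assumption: the delay starts at 1s and doubles, as implied by the
    "1+2+4+8+16s" breakdown in the commit message.
    """
    return [base_seconds * 2**i for i in range(attempts - 1)]


def indices_to_delete(prefix_matches: Iterable[str], canonical: str,
                      canonical_is_alias: bool) -> List[str]:
    """Hypothetical model of the guard: skip the canonical name when it is an alias."""
    if canonical_is_alias:
        # Deleting by alias name is rejected by the search engine, so leave it out.
        return [name for name in prefix_matches if name != canonical]
    return list(prefix_matches)


# Six attempts means five sleeps: 1 + 2 + 4 + 8 + 16 = 31 seconds per entity.
wasted_per_entity = sum(backoff_delays(6))
# 60 entities at 31 seconds each, converted to minutes: 31.0.
wasted_total_minutes = wasted_per_entity * 60 / 60

# The prefix listing returns the alias name alongside the concrete rebuild index.
matches = ["table_search_index", "table_search_index_rebuild_1"]
print(indices_to_delete(matches, "table_search_index", canonical_is_alias=True))
```

Under these assumptions the model gives 31 seconds per entity and 31 minutes for 60 entities, in line with the "~31s" and "~30 minutes" figures in the message. When the canonical name is a concrete index (`canonical_is_alias=False`), the list is returned unchanged, which corresponds to the first-reindex case where the original concrete index is still cleaned up.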
Rohit Jain	88b932a90d	Added Missing operators(Contains, Not Contains) for description (#27913 ) * Added Missing operators(Contains, Not Contains) for description * addressed PR comment and fixed unit test * addressed gitar comment * fix playwright * lint fix * removed dead code	2026-05-07 00:05:02 +05:30
Sriharsha Chintalapani	3487bfbcaa	fix(reindex): Stop button + O(N²) cursor init in distributed mode (#27927 ) * Distributed reindex: fix Stop button + O(N²) cursor init Two independent bugs in the distributed search-index pipeline that surface together as "stop does nothing, distributed mode hangs even on a single server" in production. Reproduced both with new tests and fixed. Bug 1 — requestStop only cancels PENDING partitions ==================================================== DistributedSearchIndexCoordinator.requestStop() called partitionDAO.cancelPendingPartitions(jobId), whose SQL is: UPDATE search_index_partition SET status='CANCELLED' WHERE jobId = :jobId AND status = 'PENDING' PROCESSING rows are untouched. workerExecutor.shutdownNow() (added in PR #27876) interrupts the worker threads, but the partition rows the threads were holding stay PROCESSING in the DB. checkAndUpdateJobCompletion needs pending.isEmpty() && processing.isEmpty() to flip STOPPING → STOPPED; PROCESSING never empties because nothing updates the rows. Symptom: strategy's monitorDistributedJob loops forever waiting for a terminal status, the AppRunRecord never finalizes, the UI keeps showing "Running" with a ticking timer based on now() - startTime. Fix: - New SQL `cancelInFlightPartitions` covering PENDING + PROCESSING. - requestStop calls cancelInFlightPartitions and immediately invokes checkAndUpdateJobCompletion to drive STOPPING → STOPPED in-call rather than waiting for the next monitor tick. Test: testRequestStop_ProcessingPartitionsTransitionToStopped reproduces the production scenario (RUNNING job + PROCESSING partitions, user clicks Stop), verifies STOPPED is written before requestStop returns. Verified to fail with the old SQL. Bug 2 — Distributed reader is O(N²) due to getCursorAtOffset ============================================================ PartitionWorker.initializeKeysetCursor calls EntityRepository.getCursorAtOffset(filter, partitionStart) per partition. 
Underneath that's `dao.listAfter(filter, 1, partitionStart)` — SQL `LIMIT 1 OFFSET partitionStart`, which is O(partitionStart) every call. With partitionSize=10000 and 581k records, partitions start at offsets 0, 10k, 20k, ..., 570k. Total cursor-init scan cost is 0+10k+20k+...+570k ≈ 16.5M rows scanned just to find each partition's starting cursor. Single-server (KeysetBatchReader) does ~581k. ~28× more DB work per job + multi-worker parallelism amplifies into 140× wall-clock slowdown observed in production (1.4k r/s single-server reader vs 10 r/s distributed reader on the same dataset). Fix: - DistributedSearchIndexCoordinator.precomputePartitionStartCursors: one keyset walk per entity type at job initialization time, batching in 10k chunks and recording the cursor at every partition's rangeStart. O(N) total reads per entity type. - DistributedSearchIndexCoordinator.getPartitionStartCursor: O(1) lookup from the precomputed map. - PartitionWorker.initializeKeysetCursor consults the cache first; falls back to the existing OFFSET path on a miss (recovery scenarios where another server initialized the partitions and this server picks one up). Test: initializeKeysetCursorHitsPrecomputedCacheAndSkipsOffsetFallback verifies the cache hit short-circuits the OFFSET fallback (using verify(repository, never()).getCursorAtOffset(...)).	2026-05-06 10:55:56 -07:00
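The arithmetic behind Bug 2 in the commit above, and the shape of the single-pass fix, can be sketched as follows. This is an illustration under stated assumptions, not the coordinator's implementation: `offset_scan_cost` and `precompute_start_cursors` are hypothetical names, the partition offsets are the ones listed in the message (0, 10k, ..., 570k), and the cursor is assumed to be the key of the last row before a partition's first row.

```python
from typing import Dict, Iterable, List, Optional


def offset_scan_cost(partition_starts: Iterable[int]) -> int:
    """Rows skipped when every partition locates its start with LIMIT 1 OFFSET start."""
    return sum(partition_starts)


def precompute_start_cursors(
    keys_in_order: List[str], partition_size: int
) -> Dict[int, Optional[str]]:
    """Walk the keys once in keyset order and record a cursor per partition start.

    Assumption: the cursor for a partition is the key of the row just before
    its first row (None for the first partition), so reading "after cursor"
    yields the partition's rows.
    """
    cursors: Dict[int, Optional[str]] = {}
    previous: Optional[str] = None
    for position, key in enumerate(keys_in_order):
        if position % partition_size == 0:
            cursors[position] = previous
        previous = key
    return cursors


# Offsets as listed in the commit message: 0, 10k, 20k, ..., 570k.
starts = range(0, 580_000, 10_000)
offset_cost = offset_scan_cost(starts)     # rows skipped by OFFSET lookups
single_pass_cost = 581_000                 # one keyset walk over all records
ratio = offset_cost / single_pass_cost

# Tiny example of the one-pass precomputation.
cursors = precompute_start_cursors(["a", "b", "c", "d", "e"], partition_size=2)
```

With these inputs the OFFSET path skips 16,530,000 rows against 581,000 for one walk, a ratio of roughly 28, which matches the figures in the message. Lookups into `cursors` are then constant-time per partition.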
Ram Narayan Balaji	8dd98fa765	fix(it): Stabilize Flaky integration tests (#27546 ) * fix(it): stabilize three flaky integration tests - TagResourceIT.test_searchTagByClassificationDisplayName: raise Awaitility timeout from 30s to 90s — under full-suite concurrent load the tag search index can lag well past 30s before the tag is discoverable by classification display name - GlossaryOntologyExportIT.testExportGlossaryAsRdfXml: replace legacy model.write("RDF/XML") with RDFDataMgr.write(RDFXML_PLAIN) — the legacy Jena API attempts external DTD/entity resolution from w3.org, hanging ~104s in network-isolated CI before the client times out at 60s; RIOT writes purely in-memory with no network I/O - SearchResourceIT.testExportWithFromAndSizeForPagination: add _id as a final tiebreaker sort on export requests in both ElasticSearch and OpenSearch managers; from/size pagination without a unique tiebreaker produces duplicate rows across pages when concurrent CI tests mutate the same index between requests; also deduplicate the redundant name.keyword secondary sort when the caller already sorts by name.keyword * fix(search): use id.keyword instead of _id for export sort tiebreaker _id is an Elasticsearch meta-field that requires fielddata to sort on, disabled by default. Use the indexed id.keyword sub-field instead, which is a proper keyword field with doc values and is sortable without any cluster setting changes. * fix(it): retry pagination assertion in Awaitility to tolerate transient index shifts from/size pagination on a shared search index can return duplicate rows across two consecutive requests when concurrent tests mutate the same index in between. Wrapping both page fetches and the assertion in untilAsserted lets the check retry until the index stabilises rather than failing on the first transient collision. 
* revert(search): drop id.keyword tiebreaker; rely on test-side Awaitility retry * fix(search): strengthen pagination test assertions and restore id.keyword tiebreaker sort * fix(it): revert RdfRepository prod change; increase GlossaryOntologyExportIT timeout to 150s for Jena DTD stall in CI * fix(it): restore tag search index aliases in IndexTemplateIT after index deletion testDocUpdateOnDeletedIndexUsesTemplateNotAutoInference deletes the physical openmetadata_tag_search_index and previously restored it with a bare PUT — leaving all aliases (openmetadata_tag, openmetadata_classification, openmetadata_all) missing for the remainder of the run. This caused TagResourceIT.checkCreatedEntity to time out (searches on tag_search_index hit an empty bare index) and delete_by_query cleanup ops to fail with index_not_found_exception on openmetadata_tag. Fix: replace the bare PUT with Entity.getSearchRepository().createIndex() which recreates the physical index with proper OpenMetadata mappings and restores all aliases. * fix(it): isolate IndexTemplateIT tag test to avoid wiping production search index testDocUpdateOnDeletedIndexUsesTemplateNotAutoInference was deleting the production openmetadata_tag_search_index backing index, racing with TagResourceIT.test_searchTagByClassificationDisplayName which polls that index for 90s. Use a test-scoped index name matching the template pattern instead, consistent with the other tests in this class. * fix(it): make testClaimPendingIncludesRetryStatuses race-tolerant The production SearchIndexRetryWorker (4 daemon threads, 5s poll) races the test by calling the same global claimPending SQL. Replace the brittle size-based assertion with an Awaitility loop that checks claimedAt != null for each inserted record — proving claimPending's SQL filter accepted the record's status regardless of which thread won the race. 
* fix(it): avoid stale entityStatus in patch_addDeleteReviewers The GlossaryTermApprovalWorkflow fires asynchronously when reviewers are added, setting entityStatus=IN_REVIEW. The final patch sent the stale entityStatus=APPROVED from the previous response, causing a spurious IN_REVIEW→APPROVED transition in the diff which requires the caller to be a reviewer — admin is not. Re-fetch the entity before the reviewer removal so the diff contains only the reviewer change. * fix(it): handle claimedAt reset in testClaimPendingIncludesRetryStatuses updateFailureAndRetryCount sets claimedAt=NULL after the worker processes a record. Add retryCount > 0 as a secondary proof-of-claim signal so records that were claimed, processed, and had claimedAt reset are still counted — covers the FAILED exhaustion path and intermediate PENDING_RETRY_* states where claimedAt is temporarily null. * fix(it): avoid governance workflow race in testApplyFeedback_withRecognizerMetadata repository.create() publishes a ChangeEvent that triggers ApplyRecognizerFeedbackImpl asynchronously. That workflow call races with the direct applyFeedback below: by the time the workflow runs, the GENERATED tag is already removed by the direct call, so getRecognizerIdFromTagLabel returns null and the workflow falls back to ALL recognizers, contaminating recognizer2. Fix: insert directly to DAO (bypassing publishChangeEvent) so the governance workflow is never triggered for this unit-level test. * fix(it): handle worker deleteByEntity path in testClaimPendingIncludesRetryStatuses The worker's processRecord takes the delete path (removeStaleEntityById + deleteByEntity) when resolveEntityReference returns null but entityId is non-empty — which applies to our fake UUID test records. If the worker wins and deletes a record, findByStatus finds nothing and the assertion fails. Fix: track which IDs are still visible in any status. 
An ID absent from all statuses was deleted by the worker after a successful claim — deleteByEntity is only reached after claimPending accepted the record, so absence is equally valid proof that claimPending's SQL filter worked. * fix(it): re-fetch before reviewer patches in test_glossaryTermReviewersMultipleUpdates Same root cause as patch_addDeleteReviewers: GlossaryTermApprovalWorkflow fires asynchronously after reviewers are added, setting entityStatus=IN_REVIEW. Subsequent patches using the stale APPROVED status from the previous response trigger a spurious IN_REVIEW→APPROVED transition, rejected because admin is not a reviewer. Re-fetch before each subsequent patch to avoid the stale status. --------- Co-authored-by: Mohit Yadav <105265192+mohityadav766@users.noreply.github.com>	2026-05-06 17:50:13 +00:00
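The export pagination issue in the commit above (from/size paging without a unique tiebreaker) can be illustrated with a small sketch. It is not the ElasticSearch or OpenSearch manager code: `fetch_page` is a hypothetical helper, and the plain `name` and `id` keys stand in for the `name.keyword` and `id.keyword` fields named in the message. A different input order per request models the index shifting between two page fetches.

```python
from typing import Dict, List, Sequence

Doc = Dict[str, str]


def fetch_page(docs: Sequence[Doc], sort_keys: Sequence[str], start: int, size: int) -> List[str]:
    """Sort by the given keys, then slice like from/size; ties keep input order."""
    ordered = sorted(docs, key=lambda d: tuple(d[k] for k in sort_keys))
    return [d["id"] for d in ordered[start:start + size]]


docs = [
    {"id": "1", "name": "a"},
    {"id": "2", "name": "a"},   # same name as id 1, so name alone is not unique
    {"id": "3", "name": "b"},
]
shifted = list(reversed(docs))  # same documents, different internal order

# Sorting by name only: the tied rows can swap between requests.
page1_name_only = fetch_page(docs, ["name"], start=0, size=1)
page2_name_only = fetch_page(shifted, ["name"], start=1, size=1)

# Adding id as the final tiebreaker makes the order total and repeatable.
page1_with_id = fetch_page(docs, ["name", "id"], start=0, size=1)
page2_with_id = fetch_page(shifted, ["name", "id"], start=1, size=1)
```

In this sketch the name-only sort returns the same row on both pages, while the tiebreaker gives distinct rows. It does not cover documents being added or removed between requests, which is the case the Awaitility retry in the test is meant to absorb.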
Chirag Madlani	83b9e55122	fix(test): flaky container and activity task specs (#27942 )	2026-05-06 15:25:17 +00:00
Mohit Yadav	e56abb80d5	Fix Entity Promotion issue (#27930 )	2026-05-06 15:20:49 +02:00

16392 commits