Commit graph

16392 commits

Author SHA1 Message Date
Sriharsha Chintalapani
d3bbbefe37
fix(rdf): dedupe lineage edges, surface Fuseki failures, port distributed-mode improvements (#27999)
* fix(rdf): dedupe lineage edges and broaden PROV-O coverage

The RDF Knowledge Graph endpoint was emitting two edges per lineage
relationship — once as `om:UPSTREAM` (forward) and once as
`prov:wasDerivedFrom` (reverse) — because the parser preserved each
predicate's native subject/object orientation instead of canonicalizing
both into a single `(upstream, downstream)` edge.
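The canonicalization can be sketched in Python, assuming edges arrive as `(subject, predicate, object)` triples written with prefixed names instead of full URIs. The reverse-predicate set and the `om:UPSTREAM` target come from this PR. `canonicalize` and `dedupe_lineage` are illustrative names, not the Java `parseEntityGraphEdgesFromResults`.

```python
# Predicates whose subject is the *downstream* entity (B wasDerivedFrom A).
REVERSE_PREDICATES = {"prov:wasDerivedFrom", "prov:wasInfluencedBy"}
# Stored forward as (upstream, om:UPSTREAM, downstream).
FORWARD_PREDICATE = "om:UPSTREAM"


def canonicalize(subject, predicate, obj):
    """Return a single (upstream, predicate, downstream) edge for any lineage triple."""
    if predicate in REVERSE_PREDICATES:
        # Flip the endpoints AND the predicate so the edge stays semantically valid.
        return (obj, FORWARD_PREDICATE, subject)
    return (subject, predicate, obj)


def dedupe_lineage(triples):
    """Collapse forward/reverse duplicates into one canonical edge each, keeping order."""
    seen = set()
    edges = []
    for s, p, o in triples:
        edge = canonicalize(s, p, o)
        if edge not in seen:
            seen.add(edge)
            edges.append(edge)
    return edges
```

With this, `A om:UPSTREAM B` and `B prov:wasDerivedFrom A` collapse into the single edge `(A, om:UPSTREAM, B)`.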

Also extend PROV-O coverage so external SPARQL clients can use the W3C
Provenance vocabulary directly:
- `prov:Entity` / `prov:Activity` / `prov:Agent` class typing on
  datasets / pipelines / users
- `prov:wasAttributedTo` mirror of `om:owners`
- `prov:generated` (inverse of existing `wasGeneratedBy`) and `prov:used`
  on lineageDetails so the Entity → Activity → Entity chain is complete
- `prov:hadPlan` + `prov:Plan` for SQL transformation recipes
- `prov:startedAtTime` / `prov:endedAtTime` on Activity instances
- `prov:wasAssociatedWith` Activity → Agent linking
- `prov:invalidatedAtTime` on soft-deleted entities

Other RDF cleanups in the same area:
- LineageDetails URIs are now deterministic (driven by from/to ids
  instead of a timestamp), so re-indexing collapses duplicate Activity
  resources via the existing DELETE+INSERT idempotency
- Skip emitting the redundant `om:owners` JSON-string literal — the
  mapped path already produces clean `om:hasOwner <agent>` triples
- Skip empty `[]` array literals in the unmapped path
- Propagate failures from `RdfRepository.{addRelationship,
  addLineageWithDetails, bulkAddRelationships,
  bulkAddGlossaryTermRelations}` instead of silently swallowing them,
  so downstream callers can surface the failure
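Deterministic URIs are what let DELETE+INSERT stay idempotent: the same (from, to) pair always yields the same LineageDetails resource. The following is a minimal sketch under two assumptions. `BASE` is a hypothetical namespace, since the real URI scheme is not shown here. `uuid5` is simply one deterministic derivation, not necessarily the one OpenMetadata uses.

```python
import uuid

# Hypothetical namespace; the actual OpenMetadata URI scheme may differ.
BASE = "https://example.org/om/lineageDetails/"


def lineage_details_uri(from_id: str, to_id: str) -> str:
    """Derive the URI from the edge endpoints only, never from a timestamp."""
    # uuid5 is a name-based (SHA-1) UUID, so identical inputs give identical output.
    key = uuid.uuid5(uuid.NAMESPACE_URL, f"{from_id}->{to_id}")
    return BASE + str(key)
```

Re-indexing the same edge therefore overwrites one Activity resource instead of minting a new one per run.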

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf-index-app): surface Fuseki failures in app run record

Per-entity and per-batch failures from the RDF index app used to be
logged via SLF4J only — they never made it into the AppRunRecord, so
the UI/run history showed "completed" even when every entity had
silently failed to write to Fuseki.

- `RdfBatchProcessor.processEntities` now captures the last error per
  entity, returns it in `BatchProcessingResult.lastError`, and
  accumulates relationship-processing failures into the same result.
- Relationship and lineage processing methods (`processBatchRelationships`,
  `processLineageRelationship`, `processGlossaryTermRelations`) return
  structured results with failure counts and last-error messages instead
  of `void`, so failures are visible to the partition worker.
- `RdfIndexApp` records the failure on `jobData` for both the
  distributed and non-distributed code paths, so users see a real
  error message in the run history (e.g.
  "Failed to write entity X to Fuseki: ConnectException").

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* perf(rdf-index-app): port distributed-mode improvements from SearchIndex

The RDF distributed-indexing fork was lagging behind several SearchIndex
improvements that addressed concrete reliability and throughput issues.
Port them across:

Core perf / reliability
- Precomputed partition start cursors: coordinator walks each entity
  once via keyset pagination at job init and caches the boundary cursor
  per (jobId, entityType, rangeStart). Workers consult the cache before
  falling back to the OFFSET-based path. Eliminates the previous O(N²)
  per-partition cursor lookup.
- `cancelInFlightPartitions` + `requestStop` + `checkAndUpdateJobCompletion`
  on the coordinator. Stop now cancels both PENDING and PROCESSING
  partitions in a single SQL update and immediately drives the job
  status from STOPPING → STOPPED, so the UI status no longer hangs
  while workers drain.
- Selective field hydration: `RdfPartitionWorker.readEntitiesKeyset`
  uses `ReindexingUtil.getSearchIndexFields(entityType)` instead of
  `List.of("*")`, avoiding expensive fetchers (e.g. fetchAndSetOwns)
  per batch.
- Partition heartbeat thread: virtual thread refreshes
  `lastUpdateAt` every 30s for partitions actively being processed by
  this server, so the stale reclaimer no longer interrupts active work.
- `MAX_IN_FLIGHT_PARTITIONS_PER_SERVER = 5` backpressure: claim path
  rejects when the server already holds 5 PROCESSING partitions, giving
  fair distribution across pods. Verified the existing claim DAO uses
  `FOR UPDATE SKIP LOCKED` for both MySQL and Postgres.
- Gate WebSocket stat broadcasts during the STOPPING phase so the
  Quartz-scheduler-driven STOPPED status push isn't overwritten.
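The precomputed start cursors can be sketched as one keyset walk that records the cursor at each partition boundary. This sketch assumes a `(name, id)` sort key and emulates the SQL keyset query over an in-memory sorted list. `fetch_page` and `precompute_start_cursors` are hypothetical names, not the coordinator's code. Each page fetch is a range scan that starts after the previous cursor, so the walk is linear overall. Resolving every partition start with OFFSET re-reads all preceding rows, which is presumably where the quadratic cost came from.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # assumed (name, id) keyset ordering


def fetch_page(sorted_keys: List[Key], after: Optional[Key], limit: int) -> List[Key]:
    """Emulate `SELECT ... WHERE (name, id) > :after ORDER BY name, id LIMIT :limit`."""
    start = 0 if after is None else bisect.bisect_right(sorted_keys, after)
    return sorted_keys[start:start + limit]


def precompute_start_cursors(
    job_id: str, entity_type: str, sorted_keys: List[Key], partition_size: int
) -> Dict[Tuple[str, str, int], Optional[Key]]:
    """Walk the entity once and cache the start cursor of every partition."""
    cache: Dict[Tuple[str, str, int], Optional[Key]] = {}
    cursor: Optional[Key] = None  # None means "start from the first row"
    range_start = 0
    while True:
        page = fetch_page(sorted_keys, cursor, partition_size)
        if not page:
            break
        # A worker for this range resumes strictly after `cursor`, with no OFFSET scan.
        cache[(job_id, entity_type, range_start)] = cursor
        cursor = page[-1]
        range_start += len(page)
    return cache
```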

Multi-server scaffolding (single-pod is unaffected)
- `RdfPollingJobNotifier`: DB-polling discovery for other server pods
  to find an in-flight RDF reindex they can join.
- `RdfEntityCompletionTracker`: per-entity-type partition tracking with
  callback firing once all partitions for an entity complete, foundation
  for early per-entity index promotion.
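The tracker's callback semantics can be sketched as a per-entity countdown that fires exactly once, with a flag saying whether any partition failed. This is a single-threaded illustration. The class name and signature are hypothetical, and a real coordinator would need synchronization around the counters.

```python
from typing import Callable, Dict


class EntityCompletionTracker:
    """Fire a callback once every partition of an entity type has finished."""

    def __init__(self, on_entity_complete: Callable[[str, bool], None]):
        self._on_complete = on_entity_complete
        self._remaining: Dict[str, int] = {}
        self._failed: Dict[str, bool] = {}

    def register(self, entity_type: str, partitions: int) -> None:
        self._remaining[entity_type] = partitions
        self._failed[entity_type] = False

    def record(self, entity_type: str, had_failure: bool) -> None:
        self._failed[entity_type] |= had_failure
        self._remaining[entity_type] -= 1
        if self._remaining[entity_type] == 0:
            # Fires exactly once; success only if no partition reported a failure.
            self._on_complete(entity_type, not self._failed[entity_type])
```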

Tests: precomputed-cursor cache lookup, in-flight backpressure,
cancelInFlight delegation, completion tracker callback semantics,
notifier start/stop.

DAO additions on `rdf_index_partition`:
- `cancelInFlightPartitions(jobId, now)` — covers both PENDING and
  PROCESSING in one statement
- `countInFlightPartitionsForServer(jobId, serverId)` — backpressure
- `countPartitionsByStatus(jobId, status)` — used by completion check

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ui-apps): hide misleading data on synthetic 'CurrentConfig' row

When an app has no run history, AppRunsHistory fabricated a synthetic
placeholder row that looked like a real run — `runType: "CurrentConfig"`,
a fake `Run At` timestamp pulled from `appData.updatedAt`, an
ever-growing `Duration` (`now − updatedAt`), and an active `Stop` button
that targeted nothing.

Render `--` for `Run At`, `Run Type`, and `Duration` on synthetic rows,
and hide the `Stop` button so users no longer see "Run now → 19-minute
Running with Stop button" when the actual job never registered. Real
app runs are unaffected — they still display `runType` from the
backend (OnDemandJob, Hourly, Daily, Custom, etc.).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf): address PR review findings

Four issues raised in PR #27999 review:

- **Cursor format consistency in walkAndRecord** (bug):
  The defensive branch produced cursors via a custom `{name, id}` map
  while the regular path used `repo.getCursorValue()`. For entities
  with quoted names these encodings diverge — a quoted-name entity
  could land in the cache with a cursor incompatible with what the
  worker fetches via keyset pagination. Track the last seen entity
  reference and run it through `repo.getCursorValue()` in both paths.
  `encodeBoundaryCursor` is removed.

- **Adaptive scheduling in RdfPollingJobNotifier** (perf):
  The previous implementation woke the scheduler thread every 1s and
  short-circuited inside the poll method when idle. Reschedule the
  task at the appropriate interval (1s active / 30s idle) when
  `setParticipating` flips, so the thread genuinely sleeps when idle.

- **Cursor cache cleanup on startup recovery** (edge case):
  `partitionStartCursors` was only evicted by `refreshAggregatedJob`
  / `checkAndUpdateJobCompletion`. If a coordinator crashed mid-job
  and never reached either, the cache entry leaked until process
  restart. Add `evictStaleCursorCacheEntries()` invoked by
  `performStartupRecovery` that drops entries for jobs that no longer
  exist in the DB or are already terminal.

- **Consolidate describeError helpers** (quality):
  `describeError`, `describeBulkError`, and `describeLineageError` in
  `RdfBatchProcessor` all walked the cause chain and formatted a
  prefixed message with the same logic. Reduced to a single
  `describeError(prefix, error)` plus a thin `describeEntityError`
  adapter for the per-entity call site.
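The consolidated helper amounts to "walk to the root cause, then format once". In this Python sketch, `__cause__`/`__context__` stand in for Java's `getCause()`, and the output format is an assumption modeled on the message shown earlier ("Failed to write entity X to Fuseki: ConnectException").

```python
def describe_error(prefix: str, error: BaseException) -> str:
    """Format '<prefix>: <RootType>: <message>' using the innermost cause."""
    root = error
    seen = {id(root)}
    # Follow explicit (__cause__) then implicit (__context__) chaining, guarding cycles.
    while True:
        nxt = root.__cause__ or root.__context__
        if nxt is None or id(nxt) in seen:
            break
        seen.add(id(nxt))
        root = nxt
    message = str(root)
    name = type(root).__name__
    return f"{prefix}: {name}: {message}" if message else f"{prefix}: {name}"


def describe_entity_error(entity_id: str, error: BaseException) -> str:
    """Thin adapter for the per-entity call site."""
    return describe_error(f"Failed to write entity {entity_id} to Fuseki", error)
```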

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf-index-app): avoid double workerExecutor.shutdownNow() in stop()

stop() called workerExecutor.shutdownNow() inline AND through
cleanupLocalExecution -> shutdownWorkerExecutor, which broke the
DistributedRdfIndexExecutorTest.stopAndCoordinatorCleanupOnlyTearDownLocalExecutionOnce
verify(workerExecutor, times(1)).shutdownNow() expectation. Drop the
inline call — cleanupLocalExecution is the single owner of the
shutdown path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci: drop redundant DB matrix from openmetadata-service unit tests

The {mysql, postgresql} strategy matrix on openmetadata-service unit
tests doubled CI cost without adding signal: both jobs ran the same
surefire suite. The `-Pmysql` / `-Ppostgresql` profiles are defined
only in `openmetadata-sdk/pom.xml` (lines 190-206), set a single
`test.database` property, and that property is consumed exclusively by
the failsafe plugin (integration tests `*IT.java` / `*IntegrationTest.java`),
which only runs under `-Pintegration-tests` — not enabled here.

`openmetadata-service` itself has zero tests that read `test.database`
or use `MySQLContainer`/`PostgreSQLContainer` (verified by grep). The
only testcontainer-based DB code in the repo lives in
`openmetadata-integration-tests`, a different module that this workflow
doesn't build.

Run the unit suite once. The `openmetadata-service-unit-tests-status`
required-check aggregator is unaffected: the job it depends on keeps its
name, and only the `(mysql)` / `(postgresql)` matrix suffix goes away.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf): address Copilot PR review findings

Six correctness issues raised on PR #27999:

- **Lineage-details DELETE was too broad** (RdfRepository): the cleanup
  step deleted *all* `<fromUri> om:hasLineageDetails ?d` triples,
  so reindexing one (fromId, toId) edge wiped lineage-details links
  for every other downstream of the same source entity. Pin the
  delete to the specific `<fromUri> om:hasLineageDetails <detailsUri>`
  triple. Same with prov:generated cleanup — anchor it to the
  specific detailsUri instead of any details resource.

- **Predicate not flipped during canonicalization** (RdfRepository):
  `parseEntityGraphEdgesFromResults` swapped subject/object for
  reverse-direction predicates (`prov:wasDerivedFrom`,
  `prov:wasInfluencedBy`) but kept the original predicate URI on the
  resulting EdgeInfo. Exported graphs could carry semantically
  invalid triples like `<upstream> prov:wasDerivedFrom <downstream>`.
  Add `forwardEquivalentPredicate` to substitute the OM-native
  forward predicate when the direction flips.

- **`dct:modified` was an invalid xsd:dateTime** (RdfPropertyMapper):
  `entity.getUpdatedAt().toString()` returns the epoch-millis Long as
  a string, but the literal was tagged `xsd:dateTime`. Convert via
  `Instant.ofEpochMilli(...).toString()` so the lexical form matches
  the type — same fix already in place for prov:invalidatedAtTime.

- **Unmapped EntityReference arrays were dropped entirely**
  (RdfPropertyMapper): the previous fix to skip noisy JSON-string
  literals also dropped fields like `domains`, `reviewers`, `voters`
  for entity contexts that don't have a JSON-LD mapping for them —
  the unmapped path was the only path emitting them, so nothing
  landed in RDF. Expand each array element through
  `addEntityReference` so the data still produces proper
  `om:<fieldName> <ref>` triples; mapped-path duplicates are
  collapsed by Jena's Model dedupe.

- **Partition failure detection missed reader errors**
  (DistributedRdfIndexExecutor): the EntityCompletionTracker was fed
  `result.errorMessage() != null`, but `RdfPartitionWorker` can
  increment `failedCount` from `readerErrors` without ever setting
  `lastError`. Use `result.failedCount() > 0` so partitions whose
  failures came from `ResultList.getErrors()` are also marked as
  failed when promoting an entity.

- **`COMPLETED_WITH_ERRORS` was hidden when failedRecords == 0**
  (RdfIndexApp): the coordinator marks a job COMPLETED_WITH_ERRORS
  whenever any partition is FAILED or CANCELLED, including for
  user-initiated stops where no record-level failures accrued. The
  monitor's `completedWithErrors` gate required `failedRecords > 0`,
  so those terminal states never hit `jobData.setFailure(...)` and
  the run record showed success. Drop the failedRecords precondition
  and tailor the fallback message based on whether there are
  record-level failures or partition-level only.
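The `dct:modified` fix is a lexical-form conversion: `xsd:dateTime` needs an ISO-8601 timestamp, not a raw epoch-millis number. The following Python sketch mirrors what `Instant.ofEpochMilli(...).toString()` produces at millisecond precision, where the fraction is omitted when it is zero. The function name is hypothetical.

```python
from datetime import datetime, timezone


def epoch_millis_to_xsd_datetime(millis: int) -> str:
    """Render epoch millis as a UTC ISO-8601 string valid as an xsd:dateTime literal."""
    seconds, ms = divmod(millis, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    base = dt.strftime("%Y-%m-%dT%H:%M:%S")
    # Like java.time.Instant.toString(): no fractional part when it is zero.
    return f"{base}.{ms:03d}Z" if ms else f"{base}Z"
```

For example, `epoch_millis_to_xsd_datetime(1500)` gives `1970-01-01T00:00:01.500Z`.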

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf): separate relationship failures + type lineage as prov:Activity

Two more PR review findings on #27999:

- **Relationship failures inflated the failedRecords stat**: `processEntities`
  was folding relationship/lineage edge failures into `failedCount`,
  which becomes `failedRecords` in the index stats. Those stats count
  entities (`totalRecords` is computed from entity counts), so counting
  per-edge relationship failures could push `failedRecords` above
  `processedRecords`/`totalRecords` and produce nonsensical
  per-entity stats.

  Track them separately: add `relationshipFailureCount` to
  `BatchProcessingResult` and `PartitionResult`. `failedCount` now stays
  entity-level. The completion tracker is fed the broader
  `result.hasAnyFailure()` so partitions where relationship triples
  failed don't get prematurely promoted as success even though their
  entity writes succeeded.

- **`detailsResource` wasn't typed as prov:Activity**: the resource
  carries Activity-shaped predicates (prov:startedAtTime,
  prov:endedAtTime, prov:used, prov:hadPlan, prov:wasGeneratedBy,
  prov:wasAssociatedWith) but only the OM-specific
  `om:LineageDetails` rdf:type. Add an explicit
  `rdf:type prov:Activity` so PROV-O reasoners and federated SPARQL
  clients recognize it as an Activity without having to learn the
  OM type.
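The split between entity-level and relationship-level failures can be sketched as a result type with two counters and a combined predicate. The dataclass below is an illustrative stand-in for `BatchProcessingResult`/`PartitionResult`. Only `failedCount`, `relationshipFailureCount` and `hasAnyFailure()` are named in this PR; `success_count` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchResult:
    """Illustrative stand-in for BatchProcessingResult / PartitionResult."""

    success_count: int               # hypothetical field
    failed_count: int                # entity-level only; feeds failedRecords
    relationship_failure_count: int  # per-edge failures, tracked separately

    def has_any_failure(self) -> bool:
        # Drives completion tracking: a partition whose entities all wrote but
        # whose relationship triples failed is still not treated as clean.
        return self.failed_count > 0 or self.relationship_failure_count > 0
```

Keeping `failed_count` entity-level means `failedRecords` can never exceed the entity-based `totalRecords`, while `has_any_failure()` still catches edge-only failures.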

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf): label lineage edges relative to focal node

The Knowledge Graph view was labeling every edge with relation
type "upstream" as "Upstream" regardless of direction relative to the
focal node. For a focal node F, the raw stored relation `(F, X, upstream)`
means "F is upstream of X" — i.e. X is *downstream* of F. The previous
output labeled both `F → X` and `X → F` edges as "Upstream", which made
bidirectional lineage look like a duplicated relation.

Re-orient the label in `convertEdgesToGraphData` based on whether the
focal is the edge's source or target:
- focal → X → "Downstream"
- X → focal → "Upstream"
- non-focal-touching edges keep the raw relation label.

Reported on a sample-data table with a circular lineage cycle
(`dim_customer ↔ fact_orders`) where both directions showed "Upstream".
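The re-orientation rule can be sketched as a small function that covers the boundary cases listed for `relativeRelationLabel`: non-focal edges, non-lineage relations and a null focal. The Python name and signature are illustrative, not the actual helper.

```python
from typing import Optional


def relative_relation_label(source: str, target: str, relation: str,
                            focal: Optional[str]) -> str:
    """Label an edge relative to the focal node.

    A stored (source, target, "upstream") edge means source is upstream of
    target, so from the focal node's point of view:
      focal -> X : X is downstream -> "Downstream"
      X -> focal : X is upstream   -> "Upstream"
    Anything else keeps the raw relation label.
    """
    raw = relation.capitalize()
    if focal is None or relation != "upstream":
        return raw
    if source == focal:
        return "Downstream"
    if target == focal:
        return "Upstream"
    return raw
```

In the `dim_customer ↔ fact_orders` cycle, the two directions now get different labels when either table is the focal node.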

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf): close remaining Copilot review gaps

Three findings from PR #27999's third review pass — all about failure
signals being silently dropped between layers:

- **`RdfIndexApp.processTask` ignored relationship failures**: only
  `result.failedCount() > 0` was treated as a failure, so partitions
  whose Fuseki relationship/lineage writes failed (incrementing
  `relationshipFailureCount` but not `failedCount`) never wrote
  `jobData.failure`. Switch to `result.hasAnyFailure()` and report the
  combined count.

- **`checkAndUpdateJobCompletion` ignored partition `lastError`**: a
  partition can finish COMPLETED with `lastError` set when a relationship
  bulk write was caught and recorded but didn't bump `failedRecords` or
  flip the partition to FAILED. The job would then go to COMPLETED even
  though there were real failures. Treat the presence of any
  `rdf_index_partition.lastError` as an error signal — promote to
  COMPLETED_WITH_ERRORS and aggregate sample errors into the job's
  errorMessage if it was blank.

- **`forwardEquivalentPredicate` mapped to a non-existent
  `om:DOWNSTREAM` URI**: OpenMetadata only stores lineage with
  `om:UPSTREAM` (forward) and `prov:wasDerivedFrom` (reverse PROV-O
  pair); there is no `om:DOWNSTREAM` predicate written anywhere — the
  downstream view is derived by reading the same UPSTREAM edge from the
  other side. Map both `prov:wasDerivedFrom` and `prov:wasInfluencedBy`
  to `om:UPSTREAM` (both are reverse-direction causation predicates: in
  `B wasDerivedFrom A` / `B wasInfluencedBy A` the source is A and
  effect is B, so the canonical forward predicate is the same).
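The completion rule above can be sketched as a pure function over partition rows. It treats a COMPLETED partition that still carries `lastError` as an error signal, alongside FAILED/CANCELLED partitions and record-level failures. The `Partition` fields mirror names used in this log. The function itself is hypothetical and ignores the explicit Stop path (STOPPING → STOPPED).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Partition:
    status: str  # PENDING, PROCESSING, COMPLETED, FAILED or CANCELLED
    failed_records: int = 0
    last_error: Optional[str] = None


def terminal_job_status(partitions: List[Partition]) -> Optional[str]:
    """Return the job's terminal status, or None while work is still in flight."""
    if any(p.status in ("PENDING", "PROCESSING") for p in partitions):
        return None
    has_errors = any(
        p.status in ("FAILED", "CANCELLED")
        or p.failed_records > 0
        or p.last_error is not None  # COMPLETED-with-lastError still counts
        for p in partitions
    )
    return "COMPLETED_WITH_ERRORS" if has_errors else "COMPLETED"
```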

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Fix RDF tag mapper

* Fix all the comments

Cherry-picked from #27562 (without bin/ autogenerated noise).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Align RdfPropertyMapper tests with refactor and isolate ontology export IT

RdfPropertyMapperTest still referenced the removed addVotes helper and
expected addStructuredProperty to dispatch votes — both gone after votes
was added to IGNORED_PROPERTIES. Update the assertions accordingly.

GlossaryOntologyExportIT timed out on the full suite because it flips a
global RDF singleton in @BeforeAll and each test blocks a server thread on
synchronous Fuseki writes. SAME_THREAD only serialized methods within the
class — concurrent classes still raced for server threads. Adding @Isolated
matches the pattern already used by RdfResourceIT for the same reason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(rdf): align addCertification typing + relationType after predicate flip

Two findings on PR #27999 from the post-cherry-pick review pass:

- **`addCertification` mis-typed glossary-source certifications and
  skipped skos:Concept**: it always emitted `om:Tag` regardless of
  source, even though `resolveTagResource` returns a glossaryTerm URI
  when the certification points at a glossary term. It also didn't add
  `skos:Concept` (or the `createTypeResource("tag")` `skos:Concept` for
  classification tags), so SPARQL queries filtering certification
  targets by `a skos:Concept` missed them while `addTagLabel`-emitted
  tags were findable. Mirror `addTagLabel`: branch on source
  (`Glossary` vs `Classification`), emit the right primary type plus
  `skos:Concept` (glossary) or `om:Tag` (classification), and include
  `om:tagSource`.

- **`relationType` left stale after predicate flip**: when
  `parseEntityGraphEdgesFromResults` flipped subject/object for a
  reverse-direction predicate and rewrote `canonicalPredicate` to
  `om:UPSTREAM`, it kept the original `relationType` derived from the
  reverse predicate. So `prov:wasInfluencedBy` produced an EdgeInfo
  with `relationType=downstream` + `predicate=om:UPSTREAM` —
  internally inconsistent, and the mismatched `edgeKey` prevented
  dedup against an existing UPSTREAM edge with the same endpoints.
  Re-derive `relationType` from the canonical predicate after the
  flip.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf): close 2 review findings + add parser-helper unit tests

Two outstanding Copilot findings on PR #27999 plus targeted unit
coverage for the helpers that drive lineage canonicalization.

Findings:

- **`colLineageUri` collision risk** (RdfRepository): the deterministic
  key replaced non-alphanumerics in `toColumn` with `_`, so distinct
  column names (e.g. `a-b` vs `a_b`) collapsed onto the same URI, which
  would lose / overwrite column-lineage resources during reindex.
  Append the loop index as a tiebreaker so distinct columns keep
  distinct URIs.

- **`createTypeResource` missing dprod prefix** (RdfPropertyMapper):
  the `getNamespace` switch didn't recognize `dprod`, so
  `RdfUtils.getRdfType("dataProduct")` (returns `dprod:DataProduct`)
  produced an invalid `dprod:DataProduct` URI on the wire. Added the
  `DPROD_NS = https://ekgf.github.io/dprod/` constant and a `dprod`
  case in the switch.
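The `colLineageUri` tiebreaker can be sketched as follows, assuming a hypothetical `BASE` prefix and key layout, since the real URI format is not shown here. One consequence is worth noting: the index keeps URIs distinct within a single edge, but they only stay stable across reindexes if the column order does.

```python
import re
from typing import List

# Hypothetical base; the real column-lineage URI prefix is not shown in this log.
BASE = "https://example.org/om/colLineage/"


def col_lineage_uris(from_id: str, to_id: str, to_columns: List[str]) -> List[str]:
    """Build one deterministic URI per target column."""
    uris = []
    for index, column in enumerate(to_columns):
        safe = re.sub(r"[^A-Za-z0-9]", "_", column)  # "a-b" and "a_b" both become "a_b"
        # The loop index breaks ties between columns that sanitize identically.
        uris.append(f"{BASE}{from_id}_{to_id}_{safe}_{index}")
    return uris
```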

Coverage:

- New `RdfParserHelpersTest` exercises the canonicalization helpers
  via reflection: `isReverseDirectionPredicate` (recognizes
  PROV-O causation predicates, ignores forward predicates),
  `forwardEquivalentPredicate` (both `wasDerivedFrom` and
  `wasInfluencedBy` collapse to `om:UPSTREAM` so dedup works),
  `relativeRelationLabel` (focal-relative Upstream/Downstream
  flipping with all the boundary cases — non-focal edges,
  non-lineage relations, null focal).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf): merge array contexts before per-field resolution

The third (low-confidence "suppressed") finding on review 4256830399
turned out to be a real duplication: when a field is mapped in one
context map of an array context but absent from another, the previous
processArrayContext ran processContextMappings once per map. The pass
where the field IS mapped emits the proper `om:hasOwner <ref>` triples
(plus `prov:wasAttributedTo`); the pass where the field is absent
falls through to processUnmappedField and emits an additional
`om:owners <ref>` triple. Net: two predicates for the same logical
relationship.

Verified on the live Fuseki: 113 `om:hasOwner` triples vs 112
`om:owners` triples — one set per pass.

Fix: flatten all context maps in the array into a single merged map
once, then iterate entity fields exactly once against that combined
view (later contexts win on key conflicts, matching JSON-LD context
merge semantics). Each field is resolved against the union of
mappings, so the unmapped fallback only fires for fields truly absent
from every context. Net effect: `prov:wasAttributedTo` count is
unchanged, `om:hasOwner` is unchanged, and the redundant `om:owners`
triples disappear.
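The merge-then-resolve approach can be sketched with JSON-LD contexts simplified to `field -> predicate` string maps. That simplification is an assumption, because real context entries can also be objects. The `om:<field>` fallback stands in for `processUnmappedField`, and both function names are illustrative.

```python
from typing import Dict, List, Tuple


def merge_contexts(contexts: List[Dict[str, str]]) -> Dict[str, str]:
    """Flatten an array of context maps; later maps win on key conflicts."""
    merged: Dict[str, str] = {}
    for ctx in contexts:
        merged.update(ctx)
    return merged


def resolve_fields(entity: Dict[str, object],
                   contexts: List[Dict[str, str]]) -> List[Tuple[str, object]]:
    """Resolve each entity field exactly once against the merged view."""
    merged = merge_contexts(contexts)
    # Only fields absent from *every* context fall back to the unmapped predicate.
    return [(merged.get(field, f"om:{field}"), value) for field, value in entity.items()]
```

With `owners` mapped in the first context only, the field resolves to `om:hasOwner` and no extra `om:owners` triple is produced.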

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf): close 2 review findings on coordinator finalization race

Two findings from PR #27999 review 4259628860:

- **`checkAndUpdateJobCompletion` early-returned before lastError check
  could promote**: `refreshAggregatedJob` already marks the job COMPLETED
  when partitions all finish without `failedRecords`/`failedPartitions`,
  so `checkAndUpdateJobCompletion`'s subsequent `if (job.isTerminal())`
  short-circuit silently dropped the lastError signal. Move the
  partition-lastError check INTO `refreshAggregatedJob` so both code
  paths produce consistent terminal status — a partition that finished
  COMPLETED but carries a non-null lastError now correctly promotes the
  job to COMPLETED_WITH_ERRORS regardless of which finalizer wins the
  race.

- **`completePartition` / `failPartition` overwrote CANCELLED state**:
  the unconditional partition row update lost a concurrent Stop's
  CANCELLED status if a worker finished its batch after the Stop
  request landed but before noticing it. Add a status-guarded
  `updateIfProcessing` DAO method (UPDATE ... WHERE id = :id AND
  status = 'PROCESSING') and have both completion paths use it; if 0
  rows update, log and skip the side effects (no server-stat increment,
  no refreshAggregatedJob call) so the authoritative CANCELLED status
  stays. Mirrors the pattern SearchIndex's coordinator uses for the
  same race.
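The status-guarded update can be sketched with `sqlite3`, keeping only an `id` and a `status` column on `rdf_index_partition`. The column layout is an assumption, and `complete_partition` is a hypothetical name. The caller checks the return value and skips the server-stat increment and `refreshAggregatedJob` when it is `False`.

```python
import sqlite3


def complete_partition(conn: sqlite3.Connection, partition_id: str) -> bool:
    """Mark a partition COMPLETED only if it is still PROCESSING.

    Returns False when a concurrent Stop already set it to CANCELLED.
    """
    cur = conn.execute(
        "UPDATE rdf_index_partition SET status = 'COMPLETED' "
        "WHERE id = ? AND status = 'PROCESSING'",
        (partition_id,),
    )
    conn.commit()
    return cur.rowcount == 1
```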

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2026-05-11 06:14:50 -07:00
Mohit Tilala
3d6fd71de3
Fixes #27950: [Datalake] JSON columns incorrectly typed as STRING for empty dict values (#27951)
* fix: datalake JSON columns incorrectly typed as STRING for empty dict values

* fix: wrap df_row_val with str() for strptime and parse calls to satisfy type checker

* fix: address static check type errors and review comments in datalake utils

* Restore debug logging, fix dead-code fallback, strengthen tests

* Replace lexicographic max() with explicit type precedence in fetch_col_types
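Lexicographic `max()` goes wrong with type names: `max({"JSON", "STRING"})` is `"STRING"` because "S" sorts after "J". The following sketch shows explicit precedence instead. The type names, the precedence order and the inference rules are all hypothetical, because the real `fetch_col_types` table is not shown here. The one grounded rule is that an empty dict is still JSON.

```python
from typing import Iterable, Optional

# Hypothetical precedence (higher wins); the real order is not shown in this log.
TYPE_PRECEDENCE = {"INT": 1, "FLOAT": 2, "STRING": 3, "JSON": 4}


def infer_value_type(value: object) -> str:
    if isinstance(value, dict):  # includes {}: an empty dict is still JSON
        return "JSON"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, int):
        return "INT"
    return "STRING"


def fetch_col_type(values: Iterable[object]) -> Optional[str]:
    types = {infer_value_type(v) for v in values if v is not None}
    if not types:
        return None
    # Compare by precedence, not by the type name's spelling.
    return max(types, key=TYPE_PRECEDENCE.__getitem__)
```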
2026-05-11 18:02:06 +05:30
Shailesh Parmar
a00a8dcdb4
test: enhance FailedTestCaseSampleData tests with mock Table component (#28028) 2026-05-11 12:04:53 +00:00
Ryad-Lotfi MAHTAL
97e3ae52db
Fixes #22916: Add chart-level lineage for Metabase connector (#26778)
* fix: add chart-level lineage for Metabase connector

* refactor: extract _get_chart_entity helper and move lookups outside source_tables loop

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: update test_yield_lineage to assert chart-level lineage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: add type guards for chart-level lineage to satisfy basedpyright

Guard chart lineage yields with isinstance(from_entity, Table) and
None-check on chart_entity to produce type-safe generator yields,
eliminating reportArgumentType and reportReturnType errors from the
static-checks CI step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: guard optional metabase lineage lookups

* fix: normalize metabase lineage search results

* test: cover metabase lineage fallback cases

* build: use canonical Maven Central URL

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2026-05-11 16:40:49 +05:30
sonika-shah
6c30d82f4c
fix(security): pin libthrift, provided jsonschema2pojo, bump azure-kv/sjm/reactor-netty, exclude netty-epoll (#28010)
* fix(security): pin libthrift 0.23.0 and exclude Jackson 3.x from jsonschema2pojo-core

- Pin org.apache.thrift:libthrift to 0.23.0 in dependencyManagement.
  apache-jena-libs:4.10.0 transitively pulls libthrift:0.19.0 which is
  vulnerable to CVE-2026-43869 (fixed in 0.23.0).

- Exclude tools.jackson.core:jackson-core and jackson-databind from
  jsonschema2pojo-core in common/pom.xml. jsonschema2pojo-core 1.3.x
  switched its internal Jackson to 3.x; the existing exclusion only
  covered the legacy com.fasterxml.jackson.core groupId, so 3.0.2 jars
  were leaking into the runtime classpath despite our annotator code
  using Jackson 2.x exclusively. Removes exposure to:
    - GHSA-2m67-wjpj-xhg9
    - CVE-2026-29062
    - GHSA-72hv-8253-57qq (3.x line)

* chore(security): bump azure-security-keyvault-secrets and simple-java-mail to fix transitive CVEs

- com.azure:azure-security-keyvault-secrets 4.10.0 → 4.10.7
  4.10.7 declares azure-core-http-netty 1.16.4, which uses
  reactor-netty-http 1.2.16. This removes the second path that pulled
  reactor-netty-http 1.0.48 into the OM standalone dist.
  Fixes CVE-2025-22227 (the azure-kv path).

- org.simplejavamail:simple-java-mail 8.12.2 → 8.12.6
  Hygiene bump (4 patch versions). Note: simple-java-mail 8.12.6's
  master pom still pins angus-mail to 2.0.3, so the actual angus-mail
  fix for CVE-2025-7962 still relies on OM's existing
  <angus-mail.version>2.0.4</angus-mail.version> dep-management entry,
  which already wins for OM standalone (verified: openmetadata-1.12.7
  dist already ships angus-mail-2.0.4.jar).

* fix(security): switch libthrift fix from version-pin to exclusion; expand reasoning comments

libthrift (CVE-2026-43869):
  Replace the dependencyManagement pin to 0.23.0 with an explicit <exclusion>
  on apache-jena-libs. OM's source tree has zero org.apache.thrift imports and
  no references to RDF Thrift binary serialization (RDF_THRIFT, ThriftConvert,
  RDFFormat.*THRIFT) — the only consumer of libthrift in our dep tree is Jena's
  optional RDF Thrift I/O code path, which OM never exercises.

  libthrift 0.23.0 was published 2026-05-08 and no Jena release yet ships it
  (Jena 6.0.0 and 5.6.0 still ship libthrift 0.22.0, also vulnerable). Pinning
  would force a Jena-uncertified libthrift onto code Jena tests with 0.22.0;
  excluding the unused JAR is cleaner and self-cleaning when Jena bumps.

  Lucene/Solr (also in this dep tree) already excludes libthrift for the same
  reason — confirmed via lucene-solr-grandparent pom.

Jackson 3.x exclusion: expanded the comment in common/pom.xml to record the
upstream state (jsonschema2pojo-core 1.3.3 still pins jackson3.version=3.0.2)
and the verification that build succeeds with the exclusion.

* fix(security): mark jsonschema2pojo-core as <optional> instead of maintaining per-dep exclusion list

Per Copilot review on PR #28010 (line 66 of common/pom.xml): jsonschema2pojo-core
is build-time only — the annotator classes that reference it (PasswordAnnotator,
MaskedAnnotator, etc.) are invoked exclusively by the jsonschema2pojo-maven-plugin
at code-gen time, never on the runtime classpath of any deployed service.

Switch from a growing list of <exclusion> entries (which only caught the deps
known at the time each entry was added) to <optional>true</optional>. This stops
jsonschema2pojo-core AND every transitive dep it pulls — current and future —
from propagating to downstream consumers' runtime classpath.

Effect on the GHSA-2m67-wjpj-xhg9 / CVE-2026-29062 / GHSA-72hv-8253-57qq fix:
the jackson-core-3.0.2 / jackson-databind-3.0.2 jars (groupId tools.jackson.core)
no longer leak into the dist via this path. Verified:

  $ mvn -pl openmetadata-service dependency:tree -Dincludes='tools.jackson.core:*,org.jsonschema2pojo:*'
  (empty)

  $ mvn -pl openmetadata-spec -am install -DskipTests
  BUILD SUCCESS  (annotator code-gen still works — jsonschema2pojo-maven-plugin
                  pulls jsonschema2pojo-core via its own <dependencies> block,
                  and adds common.jar there too via openmetadata-spec/pom.xml)

* fix(security): revert libthrift exclusion → pin to 0.23.0; Jena statically references TException

The exclusion broke RDF tests:
  RdfInferenceConfigurationTest, RdfPropertyMapperTest,
  SparqlBuilderNestedFieldsTest, SqlToSparqlTranslatorTest
fail with `Could not initialize class org.apache.jena.rdf.model.ModelFactory`
and `org/apache/thrift/TException` (NoClassDefFoundError).

Even though OM never calls RDF Thrift I/O directly, several Jena classes
(ModelFactory, PrefixMappingImpl, etc.) statically reference
org.apache.thrift.TException at class-init time. Removing libthrift fails
class loading on the very first use of any Jena Model.

The grep for `org.apache.thrift` in OM source missed this because the
references are in Jena's own bytecode, not OM's source.

Reverting the exclusion. Pinning libthrift to 0.23.0 in dependencyManagement
remains the only available fix:
  - No Jena release ships the fix (latest 6.0.0 still uses 0.22.0;
    libthrift 0.23.0 was published 2026-05-08).
  - Exclusion breaks the build (above).
  - Pinning forces the fixed version onto Jena's classpath; libthrift
    maintains backwards-compatible binary protocol semantics, so Jena's
    runtime usage continues to work. CI will validate.

In-pom comment expanded to record this discovery so the trade-off doesn't
get re-litigated next round.

* chore: shorten security comments in poms

* fix(security): exclude netty-transport-native-epoll from azure-core-http-netty

GHSA-rwm7-x88c-3g2p / CVE-2026-42577 (AWS Inspector reports HIGH). The bug is
in netty 4.2.x epoll; we ship 4.1.x. The advisory's machine-readable
vulnerable_version_range is < 4.2.13.Final (overly broad), which causes
scanners to flag 4.1.x even though the buggy code path was never in 4.1.

Bumping our netty to 4.2.13.Final is blocked because the Azure SDK, gRPC, AWS SDK,
and reactor-netty all target 4.1.x. Instead, exclude the Linux native binding
JAR (io.netty:netty-transport-native-epoll, the only artifact in our tree with
those coordinates) so the flagged artifact stops shipping in the dist. Netty's standard pattern is
to call Epoll.isAvailable() and fall back to NioEventLoopGroup when the native
binding is absent — the exact same code path already used on macOS/Windows
deployments. netty-transport-classes-epoll (the Java classes, required by
reactor-netty/lettuce/AWS-netty-nio-client bytecode references) stays.
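A minimal sketch of that fallback decision, assuming Netty may or may not be on the classpath. `TransportChooser.chooseTransport` is a hypothetical name. `io.netty.channel.epoll.Epoll.isAvailable()` is Netty's real probe, but it is called through reflection here only so the sketch compiles without Netty. When the classes JAR is present and the native JAR is excluded, `isAvailable()` returns false and the NIO path is taken.

```java
import java.lang.reflect.Method;

/** Hypothetical helper: picks "epoll" only when Netty reports the native binding is usable. */
final class TransportChooser {
  private TransportChooser() {}

  static String chooseTransport() {
    try {
      // Present when netty-transport-classes-epoll is on the classpath.
      Class<?> epoll = Class.forName("io.netty.channel.epoll.Epoll");
      Method isAvailable = epoll.getMethod("isAvailable");
      // False when the native .so (netty-transport-native-epoll) is absent.
      if (Boolean.TRUE.equals(isAvailable.invoke(null))) {
        return "epoll";
      }
    } catch (ReflectiveOperationException | LinkageError e) {
      // Class missing or failed to initialize: treat epoll as unavailable.
    }
    // Same code path macOS/Windows deployments already use (NioEventLoopGroup).
    return "nio";
  }
}
```

Real Netty code would branch directly on `Epoll.isAvailable()` and construct either an `EpollEventLoopGroup` or a `NioEventLoopGroup`.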

Verified:
  mvn -pl openmetadata-service -am dependency:tree \
      -Dincludes='io.netty:netty-transport-native-epoll'
  -> empty (was: 4.1.133.Final-linux-x86_64)

* fix(security): align reactor-netty-http dep-mgmt pin to 1.2.16

Per Copilot review on PR #28010 (line 19): the bump of azure-kv to 4.10.7 was
described as bringing reactor-netty-http 1.2.16, but the existing dep-mgmt pin
to 1.2.14 was overriding the transitive (mvn dependency:tree confirmed 1.2.14
was the actual resolved version).

Bump the pin 1.2.14 → 1.2.16 to match what azure-core-http-netty 1.16.4 ships
transitively. Both are above the CVE-2025-22227 fix line (≥ 1.2.8), so this is
a pin-alignment cleanup, not a security delta.

* fix(security): switch jsonschema2pojo-core from <optional> to <scope>provided</scope>

Semantically more correct for a build-time-only dep. The annotator classes
(PasswordAnnotator, MaskedAnnotator, etc.) are invoked only by
jsonschema2pojo-maven-plugin at code-gen time in its own classloader; the
runtime classpath of any deployed service never needs jsonschema2pojo-core.

<scope>provided</scope> says exactly that:
  - on compile + test classpath (so annotators compile)
  - excluded from runtime / dist packaging by default
  - not propagated to downstream consumers

Same scanner outcome as <optional>true</optional> — Jackson 3.x JARs still
don't ship in the dist — but cleaner expression of intent. CVE coverage
unchanged: GHSA-2m67-wjpj-xhg9, CVE-2026-29062, GHSA-72hv-8253-57qq.

Verified:
  mvn -pl openmetadata-spec -am install -DskipTests → BUILD SUCCESS
  mvn -pl openmetadata-service dependency:tree -Dincludes='tools.jackson.core:*,org.jsonschema2pojo:*' → empty

* fix(security): switch netty-epoll exclusion from dep-mgmt to per-direct-dep

Per Copilot review on PR #28010: the previous parent-pom dep-management entry
for azure-core-http-netty with <exclusion> on netty-transport-native-epoll
did work (verified via mvn dependency:tree — exclusion DOES propagate to
transitive resolution in dep-mgmt), but Copilot raised a concern that pinning
azure-core-http-netty to 1.16.4 would block future Azure SDK bumps if a newer
SDK requires a higher azure-core-http-netty.

Same refactor as already applied to ai-platform PR #669. Remove the parent
dep-mgmt entry; apply per-direct-dep <exclusions> on the 3 azure-* deps that
transitively bring azure-core-http-netty in openmetadata-service:
  - azure-security-keyvault-secrets
  - azure-identity
  - azure-storage-blob

Exclusion now travels with whatever azure-core-http-netty version each SDK
chooses; SDK bumps are no longer blocked by a hardcoded version.

Verified: mvn -pl openmetadata-service dependency:tree -Dincludes='io.netty:netty-transport-native-epoll'
returns empty.

* fix(security): extend netty-epoll exclusion to azure-identity-extensions

Per gitar-bot review on PR #28010: add the netty-transport-native-epoll
<exclusion> to azure-identity-extensions for consistency with the 3 other
azure-* direct deps in openmetadata-service/pom.xml that already have it
(azure-security-keyvault-secrets, azure-identity, azure-storage-blob).

Defensive: today's resolution is already clean because Maven's
nearest-definition rule picks the directly-declared azure-identity:1.15.2
(with our exclusion) over the transitive azure-identity:1.7.1 brought by
azure-identity-extensions:1.0.0. Adding the exclusion here protects against
a future refactor that removes the direct azure-identity declaration.
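The nearest-definition rule can be modeled in a few lines. This is a toy sketch, not Maven's resolver: `NearestWins` and `Occurrence` are hypothetical, and dependencyManagement and exclusions, which Maven also applies, are ignored. Depth 1 means declared directly in the pom. At equal depth, the first declaration wins.

```java
import java.util.Comparator;
import java.util.List;

/** Toy model of Maven's "nearest definition" version mediation (hypothetical names). */
final class NearestWins {
  private NearestWins() {}

  /** One occurrence of an artifact in the tree; order = position in declaration order. */
  record Occurrence(String version, int depth, int order) {}

  /** Shallowest occurrence wins; ties are broken by whichever was declared first. */
  static Occurrence resolve(List<Occurrence> occurrences) {
    return occurrences.stream()
        .min(Comparator.comparingInt(Occurrence::depth).thenComparingInt(Occurrence::order))
        .orElseThrow(() -> new IllegalArgumentException("no occurrences"));
  }
}
```

With the direct `azure-identity:1.15.2` at depth 1 and the `azure-identity:1.7.1` from `azure-identity-extensions` at depth 2, `resolve` picks 1.15.2. That is why today's tree is clean even without the extra exclusion.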

Verified: mvn -pl openmetadata-service dependency:tree -Dincludes='io.netty:netty-transport-native-epoll'
still returns empty.

---------

Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
2026-05-11 14:08:26 +05:30
Harsh Vador
7c844d77d2
Fix fast-uri Dependabot vulnerabilities in UI core components (#28020) 2026-05-11 08:30:58 +00:00
Pere Miquel Brull
f4cb7d0f14
feat(ingestion): add QuestDB database connector (#27604)
* feat(ingestion): add QuestDB database connector

QuestDB speaks the PostgreSQL wire protocol but implements a minimal
pg_catalog, so the default PG dialect queries fail on the CHAR->DOUBLE
cast in pg_class.relkind. This connector routes SQLAlchemy inspection
through information_schema and short-circuits constraint/index lookups
(QuestDB has no PK/FK/unique/indexes), letting CommonDbSourceService
handle the rest of the topology unchanged.

- Fixed /qdb target in the psycopg2 URL regardless of databaseName
  (which remains the OpenMetadata display name)
- get_database_names defaults to 'qdb' instead of 'default'
- 12 unit tests + live-verified against QuestDB 9.3.5 on localhost:8812

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(questdb): address review feedback — rename to QuestDB, wire UI

Code review fixes for PR #27604:

Blockers resolved:
- Rename Questdb -> QuestDB across schema, enum, Python classes, and all
  generated TS files. Matches peer connectors (PinotDB, DynamoDB) and the
  product's actual brand. Changing post-merge would be a breaking migration.
- Remove sslConfig from schema. QuestDB's sslConfig was declared but never
  wired — ssl_manager.check_ssl_and_init is @singledispatch and has no
  QuestDBConnection registration, so enabling SSL in the UI was a silent
  no-op. Can be added in a follow-up with an explicit psycopg2 wiring.

Warnings resolved:
- authType is now in the schema's required array; omitting it previously failed with an opaque 401.
- Delete dead queries.py (QUESTDB_TEST_GET_TABLES was defined but never
  imported).
- Add bytea -> LargeBinary to the type map (verified via live information_schema
  probe against QuestDB 9.3.5 — all other native types normalize to standard
  PG names that were already mapped).
- Complete type annotations on utils._get_table_names, _get_columns,
  _information_schema_type.
- Dialect patch test now uses a real PGDialect_psycopg2 instance instead of
  a MagicMock dialect, so it catches signature drift against the real
  SQLAlchemy Inspector contract. Added a separate test that verifies
  get_table_names emits a query against information_schema.tables (not
  pg_catalog).
- Add ingestion_logger() to utils.py with a debug log on dialect patching.
- _empty_view_definition now returns None instead of "" to match how other
  dialects signal the absence of a DDL.

Also fixes UI visibility (QuestDB was missing from the service picker):
- Regenerate 15 TS enum files via json2ts.sh -> quicktype so the new
  DatabaseServiceType.QuestDB value flows through the UI.
- Register service-icon-questdb.png in ServiceIconUtils.ts.
- Add locales/en-US/Database/QuestDB.md connector form docs.
- Add quicktype as a devDependency — json2ts.sh needs it and it wasn't
  installed.

Docs: update skills/connector-building and skills/standards/registration
to reflect reality — i18n locale files are not needed, icon + locale MD
registration steps are, and Services.constant.ts is deprecated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* skill

* fix(questdb): restore databaseSchema field for test connection

test_connection_db_schema_sources reads service_connection.databaseSchema
directly with no hasattr guard. Removing it from the schema in the prior
review fix broke GetTables and GetViews steps:

  'QuestDBConnection' object has no attribute 'databaseSchema'

Restored as an optional string with a clearer description (defaults to
public when unset).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix owners

* add yaml

* Update generated TypeScript types

* Sync package.json and yarn.lock with main

* Fix: ingestion files, Added Lineage for questdb tests and UI changes, Refactored code

* FIX: python_checkstyle

* Fix: test and unused param

* Fix: yield_table enforcing tabletype to partition, Refactored lineage

* Fix: Failing test and remove print statement

* FIX: python_checkstyle and added error handling

* FIX: Resolved comments

* FIX: failing tests and schema cleaning

* Minor change

* Fix: Failing unit tests

* Fix: Unit test unrelated changes ignored

* FIX: tests

* Fix: Failing test due to extra parameter in yaml

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Akash Verma <akashverma@Akashs-MacBook-Pro-2.local>
Co-authored-by: Akash Verma <138790903+akashverma0786@users.noreply.github.com>
2026-05-11 13:02:32 +05:30
Eugenio
6ac135dc7e
Fixes 21329: exclude temporal table period columns from autoClassification sampling (#27960)
* fix(azuresql): exclude temporal table period columns from sampling

Query sys.columns for generated_always_type to detect SYSTEM_TIME period
columns (ValidFrom/ValidTo) and skip them in both schema reflection
(mssql/utils.py) and sample data fetching (AzureSQLSampler). Also moves
the catalog round-trip inside the `if columns` guard to avoid the query
when column filtering is not in use.
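The period-column detection amounts to a filter on `sys.columns.generated_always_type`. In SQL Server, 1 is AS_ROW_START and 2 is AS_ROW_END. This is a minimal sketch under that assumption. The class, record, and query constant are hypothetical, and the actual change lives in Python (`mssql/utils.py`, `AzureSQLSampler`).

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch: drop SYSTEM_TIME period columns (e.g. ValidFrom/ValidTo) before sampling. */
final class TemporalColumnFilter {
  private TemporalColumnFilter() {}

  /** Illustrative catalog query; the ? is bound to the (schema-qualified) table name. */
  static final String PERIOD_COLUMNS_QUERY =
      "SELECT c.name, c.generated_always_type FROM sys.columns c "
          + "WHERE c.object_id = OBJECT_ID(?)";

  // generated_always_type: 1 = AS_ROW_START, 2 = AS_ROW_END (the SYSTEM_TIME period pair).
  private static final Set<Integer> PERIOD_TYPES = Set.of(1, 2);

  record CatalogColumn(String name, int generatedAlwaysType) {}

  /** Keeps only columns that are not GENERATED ALWAYS AS ROW START/END. */
  static List<String> samplableColumns(List<CatalogColumn> columns) {
    return columns.stream()
        .filter(c -> !PERIOD_TYPES.contains(c.generatedAlwaysType()))
        .map(CatalogColumn::name)
        .collect(Collectors.toList());
  }
}
```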

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(azuresql): add unit tests for temporal column exclusion

Adds sampler unit tests covering period-column filtering and NOT_COMPUTE_PYODBC
exclusion. Adds a PII processor test case for temporal tables using single
first-names to avoid non-deterministic NER matches. Corrects customers_sensitive
expected tags to include address→PII.NonSensitive, which the classifier now
correctly detects.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(azuresql): add full workflow integration test for temporal tables

Replaces the isolated sampler unit test with an end-to-end integration test
that registers the AzureSQL service, creates a system-versioned table, runs
MetadataWorkflow then AutoClassificationWorkflow, and asserts that sample
data excludes ValidFrom/ValidTo. Includes SQL permission prerequisites and
troubleshooting guide in the module docstring. Teardown controlled by
AZURE_SQL_CLEANUP env var (default: true).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix `spacy<3.8` for `ingestion/[dev]`

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 11:45:40 +05:30
Harsh Vador
6068a10dbe
(feat)ui: migrate form builder in connection form (#27812)
* (feat)ui: migrate form builder in connection form

* add core field support

* fix failing test & fix checkstyle

* fix failing test

* improve fields visibility

* fix failing test

* improve spacing

* add password hint support

* fix ui checkstyle

* address gitar

* fix oneOfField stale issues

* address gitar along with test

* fix failing sonar

* array field type to ui-core and advanced config to accordion usage

* address gitar

* remove form bg, handle breadcrumb navigation

* fix mocks

* handle layout, spacing, bg color

* handle bg colors

* use core-components

* fix checkstyle

* radio buttons bg color and spacing

* remove hideBgGrey prop

* nit

* add dedicated EmbeddedAddServicePage for askcollate route & fix checkstyle

* add unit tests
2026-05-11 11:07:17 +05:30
Harsh Vador
86e1d88386
security: Include branch name in security scan Slack alerts and fail only on high vulnerabilities (#27977)
* Add branch context to security scan Slack alerts and upload CSV findings summary

* change failing severity from medium to high & address gitar

* fix csv formatting

* revert flattening changes
2026-05-11 10:41:48 +05:30
Pere Miquel Brull
7e0ee80c28
feat(search): add Google Gemini embedding provider (#27974)
* Add design: Google Gemini embedding client

Adds a fourth embedding provider (google) alongside openai/bedrock/djl,
using the Generative Language API with a single API key.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Add implementation plan: Google Gemini embedding client

7 tasks covering schema change + regen, client implementation,
validation tests, error path tests, request shape tests, switch
wiring, and final verification.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(spec): add google embedding provider config block

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(search): add GoogleEmbeddingClient with happy-path test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(search): extract MODELS_PREFIX constant in GoogleEmbeddingClient

The string "models/" appeared in both DEFAULT_BASE_URL and the buildRequestBody
method. Extract it as a named constant per project standards.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(search): add constructor validation tests for GoogleEmbeddingClient

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(search): add blank model id test and clarify null-modelId workaround

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(search): add HTTP error and malformed response tests for GoogleEmbeddingClient

* test(search): tighten empty values array assertion to check message

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(search): verify Google embedding request URL, headers, and body shape

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(search): extract endpoint constant and harden extractBody helper

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(search): wire google embedding provider into SearchRepository switch

* test(search): cover null dimension and custom endpoint, drop redundant comment

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Update generated TypeScript types

* Remove internal planning docs from PR

These were workflow scaffolding (design spec + implementation plan)
generated by the superpowers brainstorming/planning flow; they belong
in the local development trail, not the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Address PR review comments

- GoogleEmbeddingClient.buildRequest: handle endpoint with existing query
  string by switching the key separator from '?' to '&' as needed; document
  why the API key travels in the URL (Google Generative Language API
  requirement, not Bearer-header).
- GoogleEmbeddingClient.extractErrorMessage: replace empty catch block with
  a trace-level log to comply with the 'no empty catch' standard.
- elasticSearchConfiguration.json: clarify google.endpoint description so
  operators know it must be the full ':embedContent' URL, not a base URL.
- GoogleEmbeddingClientTest.extractBody: await onComplete via
  CompletableFuture.get(5s) instead of relying on synchronous publisher
  delivery; surface onError properly.
- New test: testEndpointWithExistingQueryStringUsesAmpersand verifies the
  '?' / '&' separator logic.
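The separator rule can be sketched as below. `ApiKeyUrl.withApiKey` is a hypothetical helper, not the client's actual method. Treating an endpoint that already ends in `?` or `&` as "query already open" is an added assumption. The `key` query parameter is how the Generative Language API accepts an API key in the URL.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: append the API key with the right query-string separator. */
final class ApiKeyUrl {
  private ApiKeyUrl() {}

  static String withApiKey(String endpoint, String apiKey) {
    String separator;
    if (endpoint.endsWith("?") || endpoint.endsWith("&")) {
      separator = ""; // query string is already open
    } else if (endpoint.contains("?")) {
      separator = "&"; // extend the existing query string
    } else {
      separator = "?"; // start a new query string
    }
    return endpoint + separator + "key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
  }
}
```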

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Update generated TypeScript types

* Wire google embedding provider into openmetadata.yaml defaults

- Add `google:` block under naturalLanguageSearch with env-var fallbacks
  (GOOGLE_API_KEY, GOOGLE_EMBEDDING_MODEL_ID, GOOGLE_EMBEDDING_DIMENSION,
  GOOGLE_API_ENDPOINT).
- Update embeddingProvider option list comment to include "google".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Use gemini-embedding-001 default and pass outputDimensionality

The previous default (text-embedding-004) is rejected on some Google
projects with `404: not found for API version v1beta, or is not
supported for embedContent`. Switch to gemini-embedding-001 — the
current GA model, available at v1beta and broadly accessible.

- GoogleEmbeddingClient.buildRequestBody: include outputDimensionality
  from the configured embeddingDimension. Required for gemini-embedding-001
  (defaults to 3072 dims otherwise) and supported as a truncation hint
  by text-embedding-004.
- elasticSearchConfiguration.json + openmetadata.yaml: change default
  embeddingModelId to gemini-embedding-001 and document the
  outputDimensionality semantics on the embeddingDimension field.
- GoogleEmbeddingClientTest.testRequestBodyShape: assert
  outputDimensionality=768 in the captured body and use
  gemini-embedding-001 as the test fixture model.
- SystemRepository.getEmbeddingConfigurationMessage: add a `google` case
  so /api/v1/system/status surfaces the configured model/endpoint
  instead of "Unknown provider 'google'".
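For reference, the `:embedContent` request body with `outputDimensionality` has roughly the shape built below. This is a hand-rolled sketch for illustration. `EmbedRequestBody` is hypothetical, and a real client would use a JSON library rather than manual escaping. Omitting the field when the dimension is null mirrors the "null dimension" test case mentioned earlier, but that behavior is an assumption here.

```java
/** Hypothetical sketch of the embedContent JSON body; not the real client's builder. */
final class EmbedRequestBody {
  private EmbedRequestBody() {}

  static final String MODELS_PREFIX = "models/";

  /** outputDimensionality is emitted only when a dimension is configured. */
  static String build(String modelId, String text, Integer dimension) {
    StringBuilder sb = new StringBuilder();
    sb.append("{\"model\":\"").append(escape(MODELS_PREFIX + modelId)).append('"');
    sb.append(",\"content\":{\"parts\":[{\"text\":\"").append(escape(text)).append("\"}]}");
    if (dimension != null) {
      sb.append(",\"outputDimensionality\":").append(dimension.intValue());
    }
    return sb.append('}').toString();
  }

  /** Minimal JSON string escaping: quotes, backslashes, and control characters. */
  static String escape(String s) {
    StringBuilder out = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '"' || c == '\\') {
        out.append('\\').append(c);
      } else if (c < 0x20) {
        out.append(String.format("\\u%04x", (int) c));
      } else {
        out.append(c);
      }
    }
    return out.toString();
  }
}
```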

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Update generated TypeScript types

* Guard against missing google config in SystemRepository diagnostic

If `embeddingProvider=google` but the `google` config block is absent,
calling `nlpConfig.getGoogle().getEndpoint()` would NPE and produce
a misleading "Unable to determine embedding configuration" message.
Add an explicit null check that yields a clear diagnostic instead.
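The guard can be sketched as follows. `EmbeddingDiagnostics`, the `GoogleConfig` record (a stand-in for the generated config class), and the message wording are all hypothetical. Only the `google` branch is modeled.

```java
/** Hypothetical sketch of the google branch of the embedding-status diagnostic. */
final class EmbeddingDiagnostics {
  private EmbeddingDiagnostics() {}

  /** Stand-in for the generated google config block. */
  record GoogleConfig(String embeddingModelId, String endpoint) {}

  static String describe(String provider, GoogleConfig google) {
    if (!"google".equals(provider)) {
      return "Unknown provider '" + provider + "'";
    }
    // Check the block before dereferencing it, instead of letting an NPE surface as a vague error.
    if (google == null) {
      return "embeddingProvider is 'google' but no google configuration block is present";
    }
    return "google model=" + google.embeddingModelId() + ", endpoint=" + google.endpoint();
  }
}
```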

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Validate google.endpoint contains :embedContent at construction

A custom endpoint missing the `:embedContent` action used to silently
produce 404s at runtime. Fail fast at startup with a clear message
showing the expected URL form, so misconfiguration surfaces in logs
instead of in vector-search failures.

- Update testCustomEndpointConstruction to use a valid full URL.
- Add testCustomEndpointWithoutEmbedContentThrows.
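A minimal sketch of the fail-fast check, assuming it runs in the client constructor. `EndpointValidation.requireEmbedContent` is a hypothetical name, and the message text is illustrative.

```java
/** Hypothetical fail-fast validation for a custom google.endpoint. */
final class EndpointValidation {
  private EndpointValidation() {}

  static final String EMBED_ACTION = ":embedContent";

  /** Throws at startup instead of letting a bad endpoint turn into runtime 404s. */
  static String requireEmbedContent(String endpoint) {
    if (endpoint == null || !endpoint.contains(EMBED_ACTION)) {
      throw new IllegalArgumentException(
          "google.endpoint must be a full embedContent URL, e.g. "
              + "https://generativelanguage.googleapis.com/v1beta/models/<model>"
              + EMBED_ACTION
              + " (got: " + endpoint + ")");
    }
    return endpoint;
  }
}
```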

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(spec): add modelId chat field to google block

Adds a `modelId` property to the natural-language-search `google` block,
parallel to how the `openai` block exposes both `modelId` (chat) and
`embeddingModelId` (embedding). This enables Gemini-based NLQ filter
extraction (chat completions via :generateContent) on top of the existing
embedding support.

Default: gemini-2.5-flash.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Update generated TypeScript types

* Update generated TypeScript types

* trigger

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-05-10 16:37:53 +02:00
Shailesh Parmar
2967a7f0a8
refactor: replace RouterUtils with ObservabilityRouterClassBase for navigation paths (#27956)
* refactor: replace RouterUtils with ObservabilityRouterClassBase for navigation paths

* feat: migrate navigation to observabilityRouterClassBase in DataQuality and IncidentManager components

* refactor: format navigation calls and imports for consistency across components

* test: mark 'Pipeline Alert' and permission tests as slow
2026-05-10 16:50:00 +05:30
dependabot[bot]
41cfcf995e
chore(deps): bump fast-uri in /openmetadata-ui/src/main/resources/ui (#28004)
Bumps [fast-uri](https://github.com/fastify/fast-uri) from 3.1.0 to 3.1.2.
- [Release notes](https://github.com/fastify/fast-uri/releases)
- [Commits](https://github.com/fastify/fast-uri/compare/v3.1.0...v3.1.2)

---
updated-dependencies:
- dependency-name: fast-uri
  dependency-version: 3.1.2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-09 17:22:17 +00:00
Laura
882ef3f8c5
add nlq to OpenMetadataApplicationConfig (#27988)
* add nlq to OpenMetadataApplicationConfig

* move config under naturalLanguageSearch

* openai client

* Update generated TypeScript types

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2026-05-09 18:15:00 +02:00
Harshit Shah
0ff09b4915
Migrate FailedTestCaseSampleData table to core-ui Table component (#27985)
* refactor(FailedTestCaseSampleData): migrate table to core-ui Table component

- Replace Ant Design Table with core-ui Table (react-aria-components)
- Add border wrapper tw:border tw:border-border-secondary tw:rounded-[10px]
- Add 210px min-width on data cells with horizontal scroll
- Add 8px padding on header and data cells
- Center diff-type column content vertically and horizontally
- Move all styles from .less file to tw: classes using theme tokens
- Delete failed-test-case-sample-data.less

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix checkstyle

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 09:38:14 +00:00
Harshit Shah
8570d36830
Migrate IncidentManager table to core-ui Table component (#27972)
* refactor(IncidentManager): migrate table to core-ui Table component

- Replace Ant Design Table with core-ui Table (react-aria-components)
- Use plain renderRow function (matching DataQualityTab pattern) with
  static Table.Cell children and Table.Body dependencies to fix status/
  severity/assignee columns stuck at loading skeleton
- Fix popover max-height distortion by adding popoverClassName prop to
  IncidentStatusPopoverShell and applying tw:!max-h-none via react-aria
  className override
- Update unit test mock for @openmetadata/ui-core-components to include
  Table component
- Update e2e selector from Ant Design .ant-table-tbody to
  data-testid based tbody tr selector

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix checkstyle

* address gitar-bot comments

* address comments

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 12:04:32 +05:30
Harsh Vador
f3ef11cf50
fix: use useClipboard hook in CodeBlockComponent to fix clipboard on non-secure contexts (#28003) 2026-05-09 09:31:21 +05:30
Sriharsha Chintalapani
22a6c10072
Context center (#27558)
* Add Context Center: Migrate Knowledge Center, Images/PDFs document support

* Add Context Center: Migrate Knowledge Center, Images/PDFs document support

* Address PR #27558 review comments

- KnowledgePageRepository: null-safe pageType in getHierarchyWithSearch
  and getHierarchyWithSearchForActivePage so the /search/hierarchy
  endpoint no longer NPEs when the pageType query param is omitted. The
  ES/OS client helpers already skip the pageType term when the value is
  null or empty, so this is a pure null-guard.
- ContextFileResource.uploadFile: when a failure happens after the
  ContextFileContent row is created (e.g. inside extractionService.submit),
  the cleanup path now hard-deletes that content row so the DB is not
  left with an orphaned record.
- ContextFileResource: replace the raw Content-Disposition string with a
  buildContentDisposition helper that emits both the legacy quoted
  filename= and the RFC 5987 filename*=UTF-8'' parameter with
  percent-encoded bytes, so international filenames round-trip while
  staying header-injection safe. sanitizeFileName also falls back to
  "download" on null/blank input.
- ContextFileResourceTest: new cases for sanitizeFileName null/blank
  fallbacks and for buildContentDisposition ASCII/unicode/space/injection
  behaviour (18 tests, all passing).
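Here is a minimal sketch of a dual-parameter Content-Disposition header. `ContentDispositionSketch` is a hypothetical stand-in for the `sanitizeFileName` / `buildContentDisposition` helpers described above. Its specific rules are assumptions: it drops CR/LF and replaces non-ASCII, quote, and backslash characters with `_` in the legacy parameter. The `filename*` value percent-encodes every UTF-8 byte outside the RFC 5987 attr-char set.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a header-injection-safe, UTF-8-aware Content-Disposition builder. */
final class ContentDispositionSketch {
  private ContentDispositionSketch() {}

  /** Falls back to "download" for null/blank names and strips CR/LF. */
  static String sanitizeFileName(String name) {
    if (name == null || name.isBlank()) {
      return "download";
    }
    String cleaned = name.replaceAll("[\\r\\n]", "");
    return cleaned.isBlank() ? "download" : cleaned;
  }

  /** Quoted ASCII filename= fallback plus an RFC 5987 filename*=UTF-8'' parameter. */
  static String build(String rawName) {
    String name = sanitizeFileName(rawName);
    StringBuilder ascii = new StringBuilder();
    for (char c : name.toCharArray()) {
      // Legacy parameter: printable ASCII only; quote and backslash are replaced too.
      ascii.append(c >= 0x20 && c < 0x7f && c != '"' && c != '\\' ? c : '_');
    }
    return "attachment; filename=\"" + ascii + "\"; filename*=UTF-8''" + encodeRfc5987(name);
  }

  /** Percent-encodes every UTF-8 byte that is not an RFC 5987 attr-char. */
  static String encodeRfc5987(String value) {
    StringBuilder out = new StringBuilder();
    for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
      int c = b & 0xff;
      boolean attrChar = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
          || (c >= '0' && c <= '9') || "!#$&+-.^_`|~".indexOf(c) >= 0;
      if (attrChar) {
        out.append((char) c);
      } else {
        out.append('%').append(String.format("%02X", c));
      }
    }
    return out.toString();
  }
}
```

Clients that understand `filename*` use the exact UTF-8 name, and older clients fall back to the ASCII approximation.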

* Address copilot review comments on PR #27558

- AssetRepository.getByFqnPrefix: swap arguments so (assetType, fqnPrefix)
  matches the DAO signature — previous ordering always missed the index.
- FolderResource / ContextFileResource getEntitySpecificOperations: return
  List.of() instead of null so callers iterating the returned list cannot
  NPE.
- SearchUtils.getPageHierarchy: replace UUID.fromString with a parseUuid
  helper that returns null for missing/malformed values and logs a warning
  instead of failing the whole hierarchy response.
- DaoListFilter: qualify the pageType column with the caller-provided
  tableName, rename getArticleCondition to getPageTypeCondition (legacy
  no-arg method kept as @Deprecated wrapper for compatibility).
- Elastic/OpenSearch client processPageHierarchyHits: replace the per-hit
  getChildrenCountForPage search (N+1) with a single pass over the batch
  that derives childrenCount from pages whose parent is in the same
  result set. Also drops the now-unused helper and its throws clause.
- openmetadata-sdk/pom.xml: mark JWT, JAX-RS client, Apache HttpClient,
  jakarta.json, parsson, and JUnit Jupiter as <optional>true</optional>
  so they don't leak into SDK consumers that only use the core client.
- InMemoryAssetService: use the shared AsyncService executor for
  upload/read/delete instead of the JVM common ForkJoinPool.
- sample-pricing.xlsx: replace the plain-text placeholder with a real
  minimal XLSX workbook so detection-based and extraction-based code
  paths see a valid Microsoft Excel 2007+ file.
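The `parseUuid` idea, where one bad hit is logged and skipped instead of failing the whole hierarchy response, can be sketched as follows. `SafeParsing` is hypothetical, and `java.util.logging` stands in for the service's real logger so the sketch needs only the JDK.

```java
import java.util.UUID;
import java.util.logging.Logger;

/** Hypothetical sketch of a lenient UUID parser for search hits. */
final class SafeParsing {
  private static final Logger LOG = Logger.getLogger(SafeParsing.class.getName());

  private SafeParsing() {}

  /** Returns null for missing or malformed ids, logging a warning instead of throwing. */
  static UUID parseUuid(Object raw) {
    if (raw == null) {
      return null;
    }
    try {
      return UUID.fromString(raw.toString());
    } catch (IllegalArgumentException e) {
      LOG.warning("Skipping hierarchy hit with malformed id: " + raw);
      return null;
    }
  }
}
```

Callers then filter out entries whose parsed id is null before building maps keyed by id.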

* Use one filters aggregation for page hierarchy childrenCount

Follow-up to b8458e2868. The previous fix derived childrenCount from
pages whose parent appeared in the same batch — that worked for
listPageHierarchyForActivePage (which fetches all depths) but always
returned 0 on the plain listPageHierarchy path (which only fetches one
depth), so top-level listings lost the count semantically.

Replace with a single `filters` aggregation keyed by page id: each
named bucket matches descendants via a fullyQualifiedName prefix query
against the page's FQN. That gives accurate direct-descendant counts
for every returned page in one aggregation round-trip, still O(1)
additional search requests regardless of batch size.

* Add allowedFields entries for contextFile, folder, page

Fixes SearchSettingsHandlerTest.testEveryAssetTypeHasCorrespondingAllowedFields.

searchSettings.json already had assetTypeConfigurations for contextFile,
folder, and page but no matching allowedFields entries, so the test that
asserts every assetType has a corresponding allowedFields block failed
with 'Asset type contextFile has no corresponding allowedFields entry'.

Adds the three missing blocks with the fields that each index actually
exposes — name / displayName (with .keyword and .ngram variants),
description, fqn, fqnParts, tags/tier/domains/dataProducts, plus
entity-specific fields (fileType/contentType/extractedText for
contextFile, parent.displayName for folder/page, pageType for page).

* Fix ui checkstyle

* Fix Java checkstyle

* Address PR #27558 copilot review round 2

- ES/OS populateChildrenCounts: add fqnDepth == parentDepth + 1 to the
  per-page filter so childrenCount is direct children only, matching the
  field name and the UI's isLeaf check semantics. Previously matched all
  descendants.
- ES/OS buildPageNestedSearchHierarchy: filter out hits with a null id
  before Collectors.toMap, which would otherwise NPE when SearchUtils
  drops a malformed UUID.
- SearchUtils.getPageHierarchy: wrap PageType.fromValue in a parsePageType
  helper that logs and returns null on unknown values, so a single bad
  hit can no longer break the whole hierarchy response.
- TestSuiteBootstrap.setupMinIO: pin minio/minio to
  RELEASE.2024-01-16T16-07-38Z instead of :latest so a newly-published
  image cannot break integration tests without a code change.
- createContextFile.json: rewrite the assetId description to be provider
  agnostic (S3 / Azure Blob / in-memory / no-op) and flag it as the legacy
  path, preferring headContentId / ContextFileContent.
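The per-page filter's two conditions can be mirrored in memory: an FQN prefix match plus `fqnDepth == parentDepth + 1`. This is only a model of the semantics, not the Elasticsearch/OpenSearch `filters` aggregation itself. `ChildrenCounts` is hypothetical, and it assumes FQN parts contain no quoted dots.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the direct-children count per page. */
final class ChildrenCounts {
  private ChildrenCounts() {}

  /** Depth = number of dot-separated FQN parts (assumes no quoted dots in names). */
  static int fqnDepth(String fqn) {
    return fqn.split("\\.").length;
  }

  /** For each parent: candidates under "parent." that sit exactly one level deeper. */
  static Map<String, Long> directChildrenCounts(List<String> parents, List<String> candidates) {
    Map<String, Long> counts = new LinkedHashMap<>();
    for (String parent : parents) {
      int childDepth = fqnDepth(parent) + 1;
      long n = candidates.stream()
          .filter(c -> c.startsWith(parent + ".") && fqnDepth(c) == childDepth)
          .count();
      counts.put(parent, n);
    }
    return counts;
  }
}
```

Dropping the depth condition would count grandchildren too, which is the descendant-count behavior this round fixed.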

* Update generated TypeScript types

* Address PR #27558 copilot review round 3

- bootstrap/sql/migrations/native/2.0.0/mysql/schemaChanges.sql:
  - asset_entity: add PRIMARY KEY (id); mark all generated columns STORED
    for consistency with the other drive/knowledge tables in the same
    migration; compute deleted as a real boolean via
    IFNULL(JSON_EXTRACT(json, '$.deleted'), FALSE) so the boolean index
    behaves correctly.
  - knowledge_center: mark name, updatedAt, updatedBy, pageType as STORED
    and apply the same deleted expression so the existing indexes on
    name and (fqnHash, deleted) are reliable on fresh installs.
  - drive_folder / context_file / context_file_content: update the
    deleted generated column to use the same boolean-safe expression.
- ElasticSearch/OpenSearch hierarchy search: add an explicit sort on
  fullyQualifiedName ASC with _id ASC as tiebreaker so from/size
  pagination is deterministic and cannot skip/duplicate pages between
  requests.

* Fix UI checkstyle

* Address PR #27558 copilot review round 4

- createPage.json: rewrite the field descriptions for name, displayName,
  owners, reviewers, and entityStatus. They were copy/pasted from other
  schemas ('query', 'tag') and were misleading in generated docs and
  clients.
- NoOpAssetService.generateDownloadUrlWithExpiry: return asset.getUrl()
  instead of a synthetic 'https://cdn.example.com/...' URL. The old
  behaviour let clients attempt downloads that would never resolve when
  object storage was disabled; returning the asset's own (empty) URL
  surfaces the misconfiguration cleanly.
- AzureAssetService: normalize the prefix path the same way S3 does.
  Previously a null/blank prefix produced the literal 'null/' prefix,
  writing blobs under the wrong key. New formatPrefix returns "" for
  null/blank and ensures exactly one trailing '/' for a real prefix.
- AssetRepository.getByFQN: treat null *or* empty list as 'not found',
  matching getByFqnPrefix. Previously, when the DAO returned [], callers
  silently received an empty list instead of a 404.
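A minimal sketch of the prefix normalization described for AzureAssetService. `PrefixSketch`, `formatPrefix` as written here and `naivePrefix` are hypothetical; the naive version only illustrates one plausible way a literal 'null/' prefix arises (Java string concatenation renders a null reference as "null"). Treating an all-slash prefix as empty is an additional assumption.

```java
class PrefixSketch {
  // "" for a null or blank prefix; otherwise the prefix with any trailing
  // slashes collapsed to exactly one.
  static String formatPrefix(String prefix) {
    if (prefix == null || prefix.isBlank()) {
      return "";
    }
    int end = prefix.length();
    while (end > 0 && prefix.charAt(end - 1) == '/') {
      end--;
    }
    if (end == 0) {
      return ""; // assumption: a prefix made only of slashes means "no prefix"
    }
    return prefix.substring(0, end) + "/";
  }

  // Illustrates the bug: concatenating a null String yields "null/".
  static String naivePrefix(String prefix) {
    return prefix + "/";
  }
}
```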

* Update generated TypeScript types

* Fix UI checkstyle

* Address PR #27558 copilot review round 5

- AssetDAO.update / AssetRepository.update: switch the UPDATE target from
  fqnHash to id. Two assets can share the same fullyQualifiedName (e.g.
  successive revisions of the same context file), so the old SQL could
  silently update sibling rows.
- ContextFileExtractionService: run the extraction pipeline on a
  dedicated fixed thread pool instead of AsyncService.getExecutorService.
  process() blocks on assetService.read(...).join(), and S3/Azure reads
  are themselves scheduled on AsyncService — sharing the same bounded
  pool risks starving those reads (and deadlocking) once every thread is
  busy running extractions.
- postgres/schemaChanges.sql: wrap the generated deleted column in
  COALESCE((json ->> 'deleted')::boolean, false) (and the asset_entity
  CAST variant) so an absent 'deleted' key is stored as FALSE, not NULL.
  Otherwise "non-deleted" filters based on the boolean index drop rows
  silently. Matches the MySQL IFNULL(..., FALSE) side of the migration.
- ContextFileUploadSupport.sanitizeEntityName: treat null/blank input as
  'file' instead of NPE-ing on replaceAll. Multipart uploads can arrive
  without filename metadata; the upload should still succeed with a
  stable generated name.
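The null/blank fallback for sanitizeEntityName might look like the following. `UploadNameSketch` is hypothetical, and the allowed character set in the regex is an assumption for illustration; the real helper may permit a different set. The only behaviour taken from the commit is that missing filename metadata yields the stable name 'file'.

```java
class UploadNameSketch {
  // Stable fallback name used when the multipart part carries no filename.
  static final String DEFAULT_NAME = "file";

  static String sanitizeEntityName(String raw) {
    if (raw == null || raw.isBlank()) {
      return DEFAULT_NAME; // avoids the NPE on raw.replaceAll(...)
    }
    // Hypothetical policy: replace anything outside letters, digits, '.', '_' and '-'.
    return raw.replaceAll("[^A-Za-z0-9._-]", "_");
  }
}
```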

* Remove macOS-only @rollup/rollup-darwin-arm64 dev dep

I pinned this during local troubleshooting to get a Vite dev server
running on macOS (rollup's optional native binary was missing). CI runs
on Linux, where yarn install --frozen-lockfile refuses the package
('The platform "linux" is incompatible with this module'), which
broke license-header, lint-src, lint-playwright, i18n-sync, app-docs,
and ui-coverage-tests for PR #27558.

rollup re-resolves its native binary per platform — there's no reason
to pin the darwin one. Remove it from package.json and drop the
matching '@rollup/rollup-darwin-arm64@^4.60.2' block from yarn.lock.

* Re-declare optional SDK test deps on integration-tests classpath

KnowledgeCenterIT failed in CI with
'java.lang.NoClassDefFoundError: org/glassfish/jersey/apache/connector/ApacheConnectorProvider'
after I marked the JAX-RS client stack in openmetadata-sdk as
<optional>true</optional> during review round 2. That change stops the
deps from leaking to every SDK consumer, but integration-tests actually
uses org.openmetadata.sdk.test.util.RestClient, so the optional deps
must be re-declared on its own classpath.

Adds jakarta.ws.rs-api, jersey-client, jersey-apache-connector,
httpclient, jakarta.json-api, and parsson to
openmetadata-integration-tests/pom.xml as <scope>test</scope>.

* Fix IT failures from CI integration-tests-mysql-elasticsearch

1. MySQL deleted column: revert the IFNULL wrapper to plain
   (json -> '$.deleted'). My earlier
   IFNULL(JSON_EXTRACT(json, '$.deleted'), FALSE) hit
   'Incorrect integer value: false for column deleted' on fresh installs
   because MySQL cannot coerce the resulting JSON scalar into TINYINT(1)
   when the column is STORED. The bare '(json -> '$.deleted')' form is
   what other OM tables already use, and MySQL converts JSON true/false
   to 1/0 directly for the BOOLEAN column. STORED + PRIMARY KEY stay
   in place.
2. DriveFileUploadIT: raise the four short atMost(5s) awaits to 20s
   with explicit pollDelay(ZERO) + pollInterval(200ms).
   K8sOMJobOperatorIT sets a global Awaitility pollInterval of 5s at
   class setup; any subsequent test with atMost <= 5s hits
   'Timeout must be greater than the poll delay'. Overriding the
   per-call poll settings insulates these asserts from the global
   leak.

* Document SDK test-utility optional deps

In review round 2 we marked jersey-client, jersey-apache-connector,
jakarta.ws.rs-api, httpclient, jakarta.json-api, parsson, java-jwt, and
junit-jupiter-api as <optional>true</optional> on openmetadata-sdk so
that core SDK consumers don't inherit a heavy JAX-RS + JUnit stack.
openmetadata-integration-tests hit this immediately with
NoClassDefFoundError from RestClient; its own pom now re-declares the
deps.

Add a "Test utilities" section to the SDK README that lists the
optional deps downstream test-utility consumers must re-declare (with
the concrete <scope>test</scope> XML snippet) and explains the error
they'd otherwise see.

* NoOpAssetService: never return null from generateDownloadUrlWithExpiry

In review round 4 I changed this method to return asset.getUrl() when
the asset is non-null. But Asset.url is optional in the schema, so
asset.getUrl() itself can be null — which breaks the implied "never
returns null" contract downstream callers rely on (AttachmentResource
only null-checks defensively).

Normalize null and blank URLs to an empty string so the method's
non-null contract holds even when storage is disabled and the asset
was never populated with a URL.

* AssetServiceFactory: swap to NoOp when re-initialized with storage off

init(...) previously only assigned NoOpAssetService when instance was
null. On a re-init with object storage toggled off (config reload, test
teardown, etc.), the previously wired S3/Azure/InMemory provider stayed
live and kept serving real IO against a backend the operator thought
was disabled.

Replace the instance with a fresh NoOp when storage is disabled unless
the instance is already a NoOp (idempotent on repeated disabled
inits).
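A minimal sketch of the re-init rule, assuming a simplified provider hierarchy. `FactorySketch` and its nested types are hypothetical, and the enabled branch is hard-wired to an S3 stand-in; the real factory chooses S3, Azure or InMemory from configuration.

```java
class FactorySketch {
  interface AssetService {}

  static final class NoOpAssetService implements AssetService {}

  static final class S3AssetService implements AssetService {}

  private static AssetService instance;

  static synchronized void init(boolean storageEnabled) {
    if (!storageEnabled) {
      // Replace any live provider, but keep an existing NoOp (idempotent).
      if (!(instance instanceof NoOpAssetService)) {
        instance = new NoOpAssetService();
      }
      return;
    }
    instance = new S3AssetService(); // stand-in for config-driven provider selection
  }

  static synchronized AssetService getInstance() {
    return instance;
  }
}
```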

* Type create-request domains arrays as fullyQualifiedEntityName

The three new KC/Drive create schemas (createFolder, createContextFile,
createPage) had domains as an array of unconstrained strings. The rest
of the OM API models domain references as FQNs, and the shared
basic.json#/definitions/fullyQualifiedEntityName is the convention for
this.

Point the items ref in all three schemas at fullyQualifiedEntityName so
generated clients see a consistent FQN type and requests are validated as
non-empty, correctly formatted FQNs rather than arbitrary strings.

* Update generated TypeScript types

* Address PR #27558 copilot review 4144965142

- ContextFileExtractionService: switch the default thread pool to
  a static final DEFAULT_EXECUTOR, so every production instance of the
  service reuses the same pool instead of leaking a fresh fixed pool
  per construction (tests especially create multiple instances).
  Threads remain daemons, so the pool never blocks JVM shutdown.
- ObjectDeleteQueueService: when queueCapacity is 0, use a
  SynchronousQueue so "reject-if-all-workers-busy, no buffering" holds.
  Previous Math.max(1, queueCapacity) silently allocated a 1-slot
  ArrayBlockingQueue, contradicting the caller's stated capacity and
  potentially buffering one task past the semaphore's accounting.
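The capacity rule can be expressed as a small queue factory. `DeleteQueueSketch.workQueueFor` is a hypothetical name; the point is that a `SynchronousQueue` has no storage, so `offer` only succeeds when a worker is already waiting, whereas `Math.max(1, 0)` would have produced a one-slot `ArrayBlockingQueue`.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;

class DeleteQueueSketch {
  static BlockingQueue<Runnable> workQueueFor(int queueCapacity) {
    if (queueCapacity == 0) {
      // Pure hand-off: nothing is ever buffered.
      return new SynchronousQueue<>();
    }
    // Bounded buffer of exactly the requested size.
    return new ArrayBlockingQueue<>(queueCapacity);
  }
}
```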

Not fixing:
- SearchUtils @Slf4j 'LOG' vs 'log'. OM's openmetadata-service/lombok.config
  sets 'lombok.log.fieldName = LOG', so @Slf4j correctly generates
  'LOG' for every class in this module. The reviewer's concern only
  applies to projects without that directive. Verified clean compile.

* Address PR #27558 copilot review 4144917449

- knowledgeCenterTags.json: change mutuallyExclusive from the string
  "false" to the JSON boolean false. The Classification schema declares
  this as `"type": "boolean"`; Jackson's lenient string->boolean
  coercion masked the mismatch until now, but strict validators would
  reject it. The other OM bootstrap tag files that use the correct boolean
  (piiTagsWithRecognizers.json) show what this should look like.

- ContextFileExtractionService.process: guard the updateContent
  updater with the same head-content check already used in
  updateFile. Previously, if headContentId flipped between the
  initial check and the status writes, updateFile would no-op while
  updateContent still marked the now-stale content "Analyzing",
  leaving it stuck once the later early-return fires.

- AzureAssetService.upload: stream the InputStream straight to the
  blob using the known asset.getSize() instead of reading the whole
  payload into a byte[] via IOUtils.toByteArray. Matches the S3
  streaming behaviour and avoids full-file heap pressure / OOM risk
  on larger files. Buffered fallback retained when size is unknown.

- Size fields modeled as integer: flip fileSize / size on
  createContextFile.json, contextFile.json, asset.json,
  createAsset.json, and contextFileContent.json from
  "type": "number" to "type": "integer" with "format": "int64" and
  "minimum": 0. Byte counts are inherently whole numbers; floating
  point loses precision above 2^53 and makes validation murky.
  Update the (double) call sites in ContextFileResource,
  ContextFileUploadSupport, and AttachmentResource to match.
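The precision argument can be checked directly, assuming the size is read back as an IEEE-754 double (as a `"type": "number"` field commonly is). `SizePrecisionSketch` is a hypothetical name; 2^53 + 1 is the smallest positive integer a double cannot represent exactly, so it comes back as 2^53.

```java
class SizePrecisionSketch {
  // Simulates a byte count stored as a JSON number and parsed as a double.
  static long roundTripThroughDouble(long bytes) {
    double asNumber = bytes; // implicit widening may round
    return (long) asNumber;
  }
}
```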

Not fixing:
- ContextEntityPromptService "unused Authorizer import" — false
  positive, the class uses it in the constructor.
- NoOpAssetService.generateDownloadUrlWithExpiry null return — already
  fixed earlier in commit a4a2dcc91d (returns "" when url is
  null/blank).

* AssetService.read: run inline instead of hopping through AsyncService

Every caller of AssetService.read(...) immediately .join()s on the
returned future:

- ContextFileExtractionService.process reads + extracts
- ContextFileResource.downloadFile reads + streams back
- AttachmentResource.serveAsset reads + streams back
- QueuedDeleteAssetService just delegates

None of them exploit the async nature, but the S3/Azure/InMemory
implementations all wrapped the blocking fetch in
AsyncService.executeAsync or CompletableFuture.supplyAsync on a
bounded pool. That created a starvation path when any caller thread
was already running on AsyncService (or could monopolize it under
load) — join() would block the caller while the submitted read
task fought for a free worker.

Switch S3, Azure, and InMemory read() to execute on the caller's
thread and return CompletableFuture.completedFuture(...). Interface
is unchanged so existing .join() callers keep working; the extra
thread hop and the potential for AsyncService starvation are both
gone. Combined with the dedicated context-file-extraction pool, the
extraction pipeline no longer touches AsyncService for any
asset-read step.
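A minimal sketch of the inline read pattern, assuming a simplified `AssetService` with a single `read(String)` method and an in-memory provider (both hypothetical stand-ins, not the real interface). The fetch runs on the caller's thread and the result is wrapped in an already-completed future, so existing `.join()` callers keep working without a pool hop. Reporting a missing key through `failedFuture` (Java 9+) is an assumption about error handling.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class InlineReadSketch {
  interface AssetService {
    CompletableFuture<byte[]> read(String key);
  }

  static final class InMemoryAssetService implements AssetService {
    private final Map<String, byte[]> store = new ConcurrentHashMap<>();

    void put(String key, byte[] data) {
      store.put(key, data);
    }

    @Override
    public CompletableFuture<byte[]> read(String key) {
      // Blocking fetch on the caller's thread; no executor is involved.
      byte[] data = store.get(key);
      if (data == null) {
        return CompletableFuture.failedFuture(new IllegalStateException("No asset for " + key));
      }
      // Already complete, so join() returns immediately.
      return CompletableFuture.completedFuture(data);
    }
  }
}
```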

* Address PR #27558 copilot review 4151211562

- FolderIndex / ContextFileIndex: stop re-setting entityType, deleted,
  owners, totalVotes inside buildSearchIndexDocInternal. Those common
  fields are populated by populateCommonFields in the SearchIndex
  template method (Phase 1) before Phase 3 calls the entity-specific
  internal builder, so the explicit puts were redundant and silently
  overrode the template output. Aligns with PageIndex convention and
  updates the unit tests to assert the internal builder sets only
  entity-specific fields.

- ContextFileTextExtractor: bound the Tika BodyContentHandler at
  MAX_CANONICAL_TEXT_LENGTH instead of passing -1 (unbounded) so a
  pathological image cannot drive OCR to accumulate arbitrary output
  on the heap.

- ContextFileExtractionService: replace the unbounded
  Executors.newFixedThreadPool backing queue with a ThreadPoolExecutor
  using an ArrayBlockingQueue + AbortPolicy. Without a bounded queue
  the RejectedExecutionException handling in submit(...) was dead
  code; with it, an overloaded server surfaces a "retry later"
  failure status instead of silently accumulating work.

- S3AssetService / AssetService / AssetServiceFactory /
  QueuedDeleteAssetService: make AssetService extend AutoCloseable
  with a default no-op, override close() in S3AssetService to release
  the S3Client and S3Presigner connection pools, and register a
  shutdown hook in AssetServiceFactory that closes the current
  provider on JVM exit (and on re-init when the provider changes).

- bootstrap 2.0.0 MySQL schemaChanges: change the deleted generated
  column from (json -> '$.deleted') to
  (JSON_EXTRACT(json, '$.deleted') IS TRUE) so rows where the JSON
  key is absent resolve to FALSE instead of NULL. Avoids filter misses
  on the composite (fqnHash, deleted) index.
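The bounded extraction pool described above (with the daemon threads from review 4144965142) might be assembled like this with `java.util.concurrent`. Pool and queue sizes, the thread-name prefix and the `trySubmit` helper are hypothetical; in the described design the pool would be held in a static final `DEFAULT_EXECUTOR`.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ExtractionPoolSketch {
  static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
    AtomicInteger threadCount = new AtomicInteger();
    return new ThreadPoolExecutor(
        threads,
        threads,
        0L,
        TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(queueCapacity), // bounded, unlike newFixedThreadPool
        runnable -> {
          Thread t = new Thread(runnable, "extraction-sketch-" + threadCount.incrementAndGet());
          t.setDaemon(true); // never blocks JVM shutdown
          return t;
        },
        new ThreadPoolExecutor.AbortPolicy()); // throw when threads and queue are full
  }

  // Returns false on rejection so the caller can record a "retry later" status.
  static boolean trySubmit(ExecutorService pool, Runnable task) {
    try {
      pool.execute(task);
      return true;
    } catch (RejectedExecutionException e) {
      return false;
    }
  }
}
```

With one thread and a one-slot queue, a third concurrent submission is rejected immediately instead of accumulating, which is what makes the rejection path reachable.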

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix Java checkstyle

* Fix integration test compile + S3 generateDownloadURL

ContextFileIT / DriveFileUploadIT compile failures came from the
fileSize schema switch to integer/int64 — the generated setter/getter
is now Integer. Replace the double literals with ints and the
assertEquals(double, ...) sites with intValue() so the (int, int)
overload resolves unambiguously.

Also override S3AssetService.generateDownloadURL to return a
short-lived presigned URL (mirroring AzureAssetService) instead of
inheriting the default, which would return the raw S3 key from
asset.url. Addresses review 4151282021.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Revert MySQL deleted column back to bare json -> expression

The JSON_EXTRACT(...) IS TRUE form broke integration tests — GET after
create started returning 404, consistent with MySQL evaluating the
IS TRUE predicate against the JSON scalar in a way that stored 1
instead of 0 for freshly-created rows (deleted=false).

Restoring the bare (json -> '$.deleted') expression used pre-review.
Rows with the key missing will store NULL on the generated column,
which is a theoretical concern the review flagged but does not affect
current code paths (all inserts write json.deleted explicitly).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix Transi18next import path in KnowledgeCenter components

Two KnowledgeCenter files imported Transi18next from
'utils/CommonUtils', which is where Collate's UI re-exports it from.
OpenMetadata core exports Transi18next from 'utils/i18next/LocalUtil'
(same path every other core file uses). The Collate-style import
broke the production Vite/Rollup build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Harden ContextFileIT.testFileAppearsInSearch against async indexing

The test used a fixed Thread.sleep(2000) then a single assertEquals
on the status code. That was flaky in two ways: ES indexing is async
and the 2s window is not always enough, and on a fresh cluster the
context_file_search_index itself may not exist yet at first query
(yielding 500).

Replace with an await() loop that polls every 200ms for up to 30s
and asserts both status==200 AND that the newly-created file's UUID
appears in the response. Matches the assertSearchContainsFile
helper in DriveFileUploadIT.

Also URL-encode the namespaced query string so the uniqueName
does not break the query parsing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Make playwright editor shortcuts platform-aware

The SHORTCUTS constant in playwright/constant/KnowledgeCenter.constant.ts
hard-coded "Meta+b" / "Meta+z" / etc. On macOS Meta is Cmd and those
shortcuts trigger bold / undo / copy as expected, but on the Linux CI
runners Meta is the Super (Windows) key — so every ProseMirror
formatting and history test just pressed Super+b, which does nothing,
and the test then fails waiting for the <strong>…</strong> element
(or for the undone text to disappear).

Detect the runner platform and use Meta on macOS, Control everywhere
else — matching the same pattern in src/constants/KnowledgeCenter.constant.ts.

Unblocks the 6 KnowledgeCenterTextEditor failures across Admin / Data
Consumer / Data Steward roles (Text Formatting + Undo/Redo). Slash
commands keep passing because they don't depend on modifier keys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Run prettier on DateTimeUtils.ts

CI's lint-src job fails because ESLint+Prettier --fix produces a
non-empty diff against the committed tree. Local prettier pass
trimmed the indentation and added a trailing comma in the imports
block. No behavioral change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix Knowledge Page entity-link + DAO filter regressions from the port

Downloaded the failing playwright traces from the PR's postgres e2e run
and walked through each one. Three distinct bugs, all present because the
Collate-side overrides (overrides/EntityUtilClassCollate.ts and the
DaoExtension.KnowledgeExtensionDAO custom SQL) were not carried over
into OpenMetadata core when KnowledgeCenter was merged up.

1) CollectionDAO.KnowledgePageDAO: override listCount / listBefore /
   listAfter (plus helper SQL queries) so that
   `GET /v1/knowledgeCenter?entityId=X&entityType=topic` actually INNER
   JOINs entity_relationship and returns only pages whose
   relatedEntities contains the target entity. Without this the base
   EntityDAO ignored entityId/entityType entirely and returned every
   page, which is why the "Knowledge Articles" widget on a data asset
   page showed the 15 fixture articles instead of the one just attached
   — and why updateDataAsset timed out waiting for the linked article.
   Uses OWNS relation for user/team filters (same semantics Collate
   uses) and HAS for every other entity type.

2) EntityUtilClassBase + EntityUtils.getEntityLinkFromType: add
   EntityType.KNOWLEDGE_PAGE cases that route to getKnowledgePagePath.
   Before this, mention notifications for Knowledge Pages fell through
   to the default `/table/<fqn>` branch (confirmed in the captured
   page-snapshot: the mention link pointed at `/table/Article_eEqrWeeU`),
   which 404'd on the Table API and rendered an error page — so the
   entity-header-display-name textarea never appeared and the User
   Mentions test timed out. Search results on Explore had the same
   problem, rendering every Knowledge Page result card with href="/".

3) EntityUtilClassBase.getEntityByFqn / ENTITY_PATCH_API_MAP /
   getResourceEntityFromEntityType: handle KNOWLEDGE_PAGE end-to-end so
   the detail-page fetch, patches, and policy lookups all route through
   the knowledgeCenter REST API rather than falling back to the generic
   entity utilities (which don't know about the 'page' entity type).

Verified against the real trace artifacts from CI run 24790718035:
- shard 3 Knowledge Center page test — widget shows 10 unrelated
  "Article_*" fixture items instead of the created one → root cause
  is the missing DAO JOIN (#1).
- shard 3 User Mentions test — notification link is /table/, not
  /knowledge-center/ (#2).
- shard 3 Reviewer Workflow — data consumer's knowledge-center goto
  renders "No data available" because getEntityByFqn fell back to a
  table fetch for a page FQN (#3).
- shard 5 ExplorePageRightPanel_KnowledgeCenter (22 failures) —
  search result card links are "/explore/" (empty), same root cause
  as (#2) inside getEntityLinkFromType default branch.

Compiles: mvn -pl openmetadata-service -q -DskipTests compile passes;
tsc --noEmit reports no new errors in the touched files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address remaining PR #27558 review feedback

Five fixes covering seven of the still-open review threads; the rest of
the open threads in Copilot's bot reviews are either already addressed in
earlier commits or stale against the current code, and are being resolved
on the review UI alongside this commit.

- AssetRepository.getByFQN: the LOG.error message said "asset with id"
  but was printing the FQN. Relabel to "asset with FQN" for accurate
  troubleshooting (thread #42).

- KnowledgePageMapper.createToEntity: stop mutating the inbound
  CreatePage by calling create.withRelatedEntities(...). Build the
  effective list as a local variable and pass it to copy(...). Prevents
  the Organization fallback from leaking into the caller's request
  object, which is surprising when the request is re-used or logged
  (thread #43).

- FolderIndex: default childrenCount to 0 when the entity hasn't yet
  had its children recomputed (e.g. a freshly created folder). Prevents
  the numeric field from being indexed as missing, which broke range
  and sort queries that assume it is always present (thread #46).

- NoOpAssetService and InMemoryAssetService: override
  generateDownloadURL to delegate to generateDownloadUrlWithExpiry,
  matching S3/Azure. Without this, callers using the non-expiry API
  got asset.getUrl() (often empty for these providers), yielding broken
  download links (threads #39, #45).

- ObjectDeleteQueueService: register a JVM shutdown hook in the
  singleton's initializer that calls stop(). The service already
  implements Dropwizard Managed, but nothing currently wires it into
  the application lifecycle, so non-daemon delete-worker threads were
  at risk of keeping the JVM alive after ungraceful termination. The
  hook is a belt-and-suspenders fallback to the Managed path
  (threads #52, #53).
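A minimal sketch of the shutdown-hook fallback, with a hypothetical `DeleteQueueLifecycleSketch` standing in for ObjectDeleteQueueService (the real class registers the hook in its singleton initializer rather than a constructor). Because both the hook and a Managed stop() may run, stop() is made idempotent here; that guard is an assumption, not something the commit states. JVM shutdown hooks run on System.exit or SIGTERM/SIGINT; they do not fire while live non-daemon threads keep the JVM up after main returns, and never on SIGKILL.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class DeleteQueueLifecycleSketch {
  private final ExecutorService workers = Executors.newFixedThreadPool(2); // non-daemon threads
  private final AtomicBoolean stopped = new AtomicBoolean();

  DeleteQueueLifecycleSketch() {
    // Fallback in case no lifecycle manager ever calls stop().
    Runtime.getRuntime().addShutdownHook(new Thread(this::stop, "delete-queue-shutdown"));
  }

  // Safe to call from both the hook and a Managed stop().
  void stop() {
    if (!stopped.compareAndSet(false, true)) {
      return;
    }
    workers.shutdown();
    try {
      if (!workers.awaitTermination(5, TimeUnit.SECONDS)) {
        workers.shutdownNow();
      }
    } catch (InterruptedException e) {
      workers.shutdownNow();
      Thread.currentThread().interrupt();
    }
  }

  boolean isStopped() {
    return workers.isShutdown();
  }
}
```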

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add java-checkstyle skill for Claude + Codex agents

CI keeps surfacing "Java checkstyle failed — please run mvn spotless:apply"
comments on PRs (including this branch). CLAUDE.md and AGENTS.md already
mentioned the command, but a one-line prose note in the middle of each file
wasn't enough to make it a reliable habit.

This commit:

- Adds a dedicated invocable skill at .claude/skills/java-checkstyle/SKILL.md
  (for the Claude Code harness) and a mirror at
  .agents/skills/java-checkstyle/SKILL.md (for Codex-style agents). Both
  describe the same procedure: when / why to run spotless, the `-pl <module>`
  scoping option, the verify-only `spotless:check` form, the expected
  diff shape, and the rule to never hand-edit formatting around a plugin
  error.

- Promotes the existing one-liners in CLAUDE.md and AGENTS.md to explicit
  "run before finishing any Java task" instructions, pointing at the skill so
  agents have a reusable procedure to invoke rather than improvising.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Harden AttachmentResource upload/download against three regressions

Carried over from the latest AttachmentResource review. Three issues:

1. Content-Disposition header injection (security) — downloadAsset() built
   the header by direct string interpolation of asset.getFileName(). A
   filename containing double-quotes or CRLF could inject arbitrary HTTP
   headers. ContextFileResource already has a sanitize + RFC-5987 encode
   helper; rather than duplicate it, promote
   ContextFileUploadSupport.sanitizeFileName / buildContentDisposition to
   public, delete the duplicates from ContextFileResource (now delegators),
   and reuse the shared helpers from AttachmentResource.

2. Unbounded upload buffering (performance / DoS) — createAssetFromUpload
   read the full multipart body into a byte[] via IOUtils.toByteArray
   before checking against MAX_FILE_SIZE. An attacker could send an
   arbitrarily large body and exhaust heap before the validation ran.
   Replace with ContextFileUploadSupport.bufferUpload(), which streams to
   a bounded temp file and throws MaxFileSizeExceededException the moment
   the configured limit is passed; translate that into the same
   AttachmentException size-validation error the previous code raised.
   Promoted BufferedUpload and MaxFileSizeExceededException to public so
   the attachments package can consume them.

3. Startup NPE when objectStorage is null (bug) — initialize() called
   config.getObjectStorage().getMaxFileSize() without a null guard, so a
   deployment that doesn't configure object storage would NPE on server
   start. Added the same guard ContextFileResource.initialize() already
   uses, gave MAX_FILE_SIZE a safe 5 MiB default, and also null-guarded
   the S3-configuration branch of the CDN URL lookup so a pure-Azure or
   pure-NoOp setup doesn't fall off the end of the ternary.
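The header-injection fix in item 1 can be sketched as a sanitize-then-encode helper. The class name, the 'download' fallback and the ASCII `filename=` fallback are hypothetical; the `filename*` parameter follows the RFC 5987 ext-value syntax (UTF-8, percent-encoding every byte outside attr-char), which RFC 6266 uses for Content-Disposition.

```java
import java.nio.charset.StandardCharsets;

class ContentDispositionSketch {
  // Drop control characters (including CR/LF), quotes and backslashes so the
  // name can neither end the header line nor close the quoted-string early.
  static String sanitizeFileName(String name) {
    StringBuilder sb = new StringBuilder();
    if (name != null) {
      for (char c : name.toCharArray()) {
        if (c < 0x20 || c == 0x7f || c == '"' || c == '\\') {
          continue;
        }
        sb.append(c);
      }
    }
    return sb.toString().isBlank() ? "download" : sb.toString();
  }

  // RFC 5987: keep attr-chars, percent-encode every other UTF-8 byte.
  static String encodeRfc5987(String value) {
    StringBuilder sb = new StringBuilder();
    for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
      int c = b & 0xff;
      boolean attrChar = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
          || (c >= '0' && c <= '9') || "!#$&+-.^_`|~".indexOf(c) >= 0;
      if (attrChar) {
        sb.append((char) c);
      } else {
        sb.append('%').append(String.format("%02X", c));
      }
    }
    return sb.toString();
  }

  static String buildContentDisposition(String fileName) {
    String safe = sanitizeFileName(fileName);
    String ascii = safe.replaceAll("[^\\x20-\\x7E]", "_"); // fallback for older clients
    return "attachment; filename=\"" + ascii + "\"; filename*=UTF-8''" + encodeRfc5987(safe);
  }
}
```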

Ran mvn spotless:apply — picks up formatting-only changes in
CollectionDAO.java and FolderIndex.java as a side effect of the shared
helper additions.
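The bounded buffering from item 2 above can be sketched with java.nio: stream to a temp file in fixed-size chunks and abort as soon as the running total passes the limit, so at most one chunk is held in memory. `BoundedUploadSketch` and its nested exception are hypothetical stand-ins for ContextFileUploadSupport.bufferUpload and MaxFileSizeExceededException; the 8 KiB chunk size is arbitrary.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class BoundedUploadSketch {
  static final class MaxFileSizeExceededException extends IOException {
    MaxFileSizeExceededException(long limit) {
      super("Upload exceeds limit of " + limit + " bytes");
    }
  }

  // The caller owns `in` and remains responsible for closing it.
  static Path bufferUpload(InputStream in, long maxBytes) throws IOException {
    Path tmp = Files.createTempFile("upload-", ".bin");
    boolean ok = false;
    try (OutputStream out = Files.newOutputStream(tmp)) {
      byte[] buf = new byte[8192];
      long total = 0;
      int n;
      while ((n = in.read(buf)) != -1) {
        total += n;
        if (total > maxBytes) {
          throw new MaxFileSizeExceededException(maxBytes); // fail before writing more
        }
        out.write(buf, 0, n);
      }
      ok = true;
      return tmp;
    } finally {
      if (!ok) {
        Files.deleteIfExists(tmp); // the stream is already closed at this point
      }
    }
  }
}
```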

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add ui-checkstyle skill + fix residual import-order drift

CI's UI Checkstyle workflow has three per-area jobs (lint-src,
lint-playwright, lint-core-components) that reformat the files changed
in the PR and fail if the reformat produces a diff. CLAUDE.md and
AGENTS.md didn't previously document this flow, so re-running the fix
was a guessing game — the two lint-core-components and lint-playwright
failures on this branch came from stale import order left over from the
main→context_center merge.

This commit:

- Adds a dedicated invocable skill at .claude/skills/ui-checkstyle/SKILL.md
  (Claude Code harness) and a mirror at .agents/skills/ui-checkstyle/SKILL.md
  (Codex-style agents). Both describe the exact three-command sequence CI
  runs — organize-imports-cli → eslint --fix → prettier --write — the
  per-area file scoping, the `--check` dry-run mode, and the rule that
  organize-imports must run BEFORE prettier (otherwise the indentation /
  trailing-comma round-trip leaves a dirty diff).

- Promotes the existing one-liner in CLAUDE.md and AGENTS.md to an explicit
  "run before finishing any UI task" instruction that points at the skill.

- Fixes two residual import-order drifts (KnowledgePagesHierarchy.tsx,
  EntityUtilClassBase.ts) surfaced by running the skill's sequence locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix UI checkstyle on EntityUtilClassBase.ts

ESLint --fix inserted a blank line between the KNOWLEDGE_PAGE guard and the
fallback return in getEntityByFqn. Committing the formatted version.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix ContextFileIT.testFileAppearsInSearch flaky 500 from query_string parsing

The previous polling search used the namespaced unique name as a free-text
q= argument. The namespace prefix contains '-', which the ES 9.x query_string
parser treats as a NOT operator, producing a deterministic 500 across the
full 30s polling window even when the document was indexed.

Switch to the direct get-by-id endpoint (/v1/search/get/{index}/doc/{id}),
which performs a real-time ES GET with no query_string parsing and no
analyzer involvement — the most reliable signal that the document was
indexed. Bump the timeout to 60s and capture the response body on any
non-200 so future regressions surface the real ES error.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Fix knowledge center icon

* update knowledge center to context center

Co-authored-by: Copilot <copilot@github.com>

* Revert "update knowledge center to context center"

This reverts commit f0cca5fd65.

* Fix UI checkstyle: sort tag*-related imports in SearchClassBase

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix Jest coverage failures in KnowledgeCenter Layout and right panel

KnowledgeCenterLayout was importing i18n directly from LocalUtil, but the
global setupTests mock for that module only exposes t/on. Switch to the
useTranslation() hook so it picks up the react-i18next mock that already
provides i18n.dir(), matching how LeftSidebar and RichTextEditor use the
direction.

EntityRightPanelClassBase.getKnowLedgeArticlesWidget now returns the
KnowledgePages component instead of null. Update the corresponding test
case to assert the new return value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix playwright tests and bugs

Co-authored-by: Copilot <copilot@github.com>

* Fix checkstyle

* Fix /knowledgeCenter/search/hierarchy 500 by removing _id sort

ES 9.x and OpenSearch 3.x reject sorts on the _id field by default
(indices.id_field_data.enabled is false), causing every call to
listPageHierarchy{,ForActivePage} to fail with the search_phase_execution_exception
"all shards failed" we see in the screenshot. The _id sort was added
in 4a75852a7e as a tiebreaker for from/size pagination, but
fullyQualifiedName is already a keyword field with doc_values and is
unique per page (name is unique within a parent's children) — so no
tiebreaker is needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Cascade hard-delete to descendant pages in search index

KnowledgeCenter pages are nested via FQN (parent.fqn -> parent.fqn.child),
not via a parent.id field on the child doc. The default deleteOrUpdateChildren
case for entity type "page" uses page.id field matching, which doesn't exist
on child page docs — so a recursive hard-delete on the parent removed the
parent from search but left every descendant orphaned in the index. Stale
docs only disappeared on a full reindex.

This logic lived in an override in the Collate fork's SearchRepositoryExt and was
lost during the migration when the override class was removed. Fold the
override into the base SearchRepository as a Page-specific case that calls
deleteEntityByFQNPrefix, which deletes by fullyQualifiedName.keyword prefix
match — covering every descendant.
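A minimal in-memory sketch of the FQN-prefix cascade, assuming plain dot-separated FQNs; `FqnPrefixDeleteSketch` is hypothetical and stands in for a delete-by-query on fullyQualifiedName.keyword. The commit does not say whether the real prefix includes the trailing '.'; the sketch adds it so that a sibling such as kb.ab survives deleting kb.a.

```java
import java.util.Map;
import java.util.TreeMap;

class FqnPrefixDeleteSketch {
  // Hypothetical index: fullyQualifiedName -> document id.
  private final Map<String, String> docsByFqn = new TreeMap<>();

  void index(String fqn, String id) {
    docsByFqn.put(fqn, id);
  }

  // Remove the page itself and every descendant; returns how many docs were removed.
  int deleteByFqnPrefix(String fqn) {
    String childPrefix = fqn + ".";
    int before = docsByFqn.size();
    docsByFqn.keySet().removeIf(k -> k.equals(fqn) || k.startsWith(childPrefix));
    return before - docsByFqn.size();
  }

  boolean contains(String fqn) {
    return docsByFqn.containsKey(fqn);
  }
}
```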

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Add page/folder/contextFile/securityService to SearchIndexingApp picker

The Search Indexing Application's "Entities" picker shows "No data" when
typing "Page" because the enum in src/utils/ApplicationSchemas/SearchIndexingApplication.json
does not include the Knowledge Center / Drive entity types added on this
branch. The Collate fork's SearchIndexingApplication-collate.json already
included page; folder, contextFile and securityService are new on this
branch, and none of the four made it into the core picker enum during the migration.

Without them in the enum, users cannot select these entity types for
targeted reindex, even though every other reindex code path supports them.

src/jsons/applicationSchemas/* is generated by parseSchemas.js from
src/utils/ApplicationSchemas/* at build time and is gitignored, so only
the source schema is updated here.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Restore live index settings on per-entity distributed-promote path

DefaultRecreateHandler exposes two finalization paths:

  - finalizeReindex(...)        — centralized end-of-job promotion. Calls
                                  applyLiveServingSettings + maybeForceMerge
                                  before the alias swap, reverting the bulk
                                  overrides (refresh_interval=-1, replicas=0,
                                  async translog) back to live values
                                  (refresh=1s, replicas=1, durable translog).

  - promoteEntityIndex(ctx, ok) — per-entity promotion. Used by the distributed
                                  search-indexer's "promote as soon as all
                                  partitions for an entity complete" callback
                                  (DistributedSearchIndexExecutor.promoteEntityIndex).
                                  Swaps the alias and cleans up old indices —
                                  but never restored live settings.

When an entity finishes its partitions before the final reconciliation
(typically the smallest entities — e.g. knowledge `page` with ~11 rows),
its index is promoted via the per-entity path, the alias swap succeeds,
and the bulk-build overrides become the new live settings. refresh_interval
stays at -1 in production, so live writes after the reindex are buffered in
the translog and never reach searchable segments until a manual _refresh.
Externally this surfaces as "create an article, hierarchy is empty until I
re-trigger reindex" — exactly the user-reported bug.

Mirror the finalizeReindex sequence by calling applyLiveServingSettings
(and maybeForceMerge for parity) at the top of the promote block in
promoteEntityIndex, before the alias swap.
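The ordering that matters here can be sketched with a call-recording stand-in client. Every name below (`RecordingIndexClient`, `promote_entity_index`, the index names) is hypothetical; only the settings keys are standard Elasticsearch/OpenSearch index settings:

```python
class RecordingIndexClient:
    """Hypothetical stand-in for a search client; records calls in order."""

    def __init__(self):
        self.calls = []

    def put_settings(self, index, settings):
        self.calls.append(("put_settings", index, settings))

    def force_merge(self, index):
        self.calls.append(("force_merge", index))

    def swap_alias(self, alias, new_index, old_indices):
        self.calls.append(("swap_alias", alias, new_index, tuple(old_indices)))

    def delete_index(self, index):
        self.calls.append(("delete_index", index))


def promote_entity_index(client, alias, staged_index, old_indices,
                         live_settings, ok=True, force_merge=True):
    if not ok:
        return  # failure handling is outside the scope of this sketch
    if live_settings:
        # 1. Revert bulk-build overrides before the index starts serving.
        client.put_settings(staged_index, live_settings)
    if force_merge:
        # 2. Optional force-merge, for parity with the end-of-job path.
        client.force_merge(staged_index)
    # 3. Only now point the alias at the staged index...
    client.swap_alias(alias, staged_index, old_indices)
    # 4. ...and drop the indices it replaced.
    for old in old_indices:
        client.delete_index(old)


LIVE = {"index": {"refresh_interval": "1s", "number_of_replicas": 1,
                  "translog": {"durability": "request"}}}
```

Putting the settings update first means the alias never points at an index that still has `refresh_interval=-1`.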

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Wire jobData into per-entity reindex promotion handler

DefaultRecreateHandler.applyLiveServingSettings reads from the handler's
jobData field (live + bulk index-settings overrides on the EventPublisherJob).
The per-entity distributed-promotion path in DistributedSearchIndexExecutor
created its own DefaultRecreateHandler instance and never called
withJobData(jobData) on it. With jobData=null, buildRevertJson returns null
and applyLiveServingSettings silently no-ops — meaning the previous fix
(b272de85f9) never actually re-applied live settings on the per-entity
promote path, even though the call was reached.

currentJob.getJobConfiguration() is the EventPublisherJob the strategy
created. Wire it into the new handler at construction time, mirroring the
withJobData call DistributedIndexingStrategy already makes on the strategy's
own handler instance.

With this change, the per-entity promote path now logs

  "Applying live serving settings to staged index '...' for entity 'page':
   {\"number_of_replicas\":1,\"refresh_interval\":\"1s\", ...}"

before the alias swap, and post-promotion `_settings` show
refresh_interval=1s instead of the stuck -1.
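Why the earlier fix was a silent no-op can be sketched as follows. This is an assumption-laden analogue, not the Java code: `build_revert_json` mirrors the described behavior of returning null without job data, and the `liveSettings` key is hypothetical:

```python
from typing import Callable, Optional


def build_revert_json(job_data: Optional[dict]) -> Optional[dict]:
    if job_data is None:
        return None  # handler never received job data: nothing to revert to
    return job_data.get("liveSettings")  # hypothetical key name


def apply_live_serving_settings(job_data: Optional[dict],
                                apply: Callable[[dict], None]) -> bool:
    revert = build_revert_json(job_data)
    if revert is None:
        # Silent no-op: the call is reached, but nothing is applied.
        return False
    apply(revert)
    return True
```

Wiring the job configuration into the handler at construction time is what turns the first branch into the second.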

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Fix delete failure

* Fix java checkstyle

* Fix article deletion issue

* refactor(test): streamline Knowledge Center List setup and teardown processes

* Fix GlossaryTags

* Add missing pieces in knowledge articles

* Fix checkstyle

* Remove reviewer workflow spec

* remove unused util

* Fix the localization changes

* Fix unit tests

* deleted unused svg

* added missing svg

* improved ux of save button & autofocus on title

* lint fixes

* Update page index

* Make calculateFqnDepth static

* fixed the kc imports

* import fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: mohitdeuex <mohit.y@deuexsolutions.com>
Co-authored-by: Rohit0301 <rj03012002@gmail.com>
Co-authored-by: Rohit Jain <60229265+Rohit0301@users.noreply.github.com>
Co-authored-by: Harsh Vador <harsh.vador@somaiya.edu>
2026-05-08 10:56:04 -07:00
Sriharsha Chintalapani
9956592b00
chore(security): bump deps to address reported CVEs (#27994)
* chore(security): bump deps to address reported CVEs

- log4j 2.25.3 -> 2.25.4 (CVE-2026-34477/34478/34480)
- jsonschema2pojo 1.2.2 -> 1.3.0 (CVE-2025-3588)
- netty-bom 4.1.132 -> 4.1.133 (netty-codec/transport GHSAs)
- azure-identity 1.14.0 -> 1.15.2 in openmetadata-service to align
  with parent dependencyManagement

* fix: bump jsonschema2pojo to 1.3.1 to fix maven-plugin classpath

1.3.0 dropped its declared dep on plexus-utils, breaking the
maven-plugin at runtime with NoClassDefFoundError on
org/codehaus/plexus/util/DirectoryScanner. 1.3.1 restores it.
1.3.3 has a separate regression (IndexOutOfBoundsException in
ValidRule), so 1.3.1 is the right pin.
2026-05-08 22:33:03 +05:30
Eugenio
483461a003
Add migrations to ensure PII are really enabled (#27921)
This is especially needed for instances that had already upgraded to 1.12.0 or later; those instances skipped the migration cherry-picked into 1.12.6.
2026-05-08 15:39:29 +00:00
Akash Verma
459dfa30a5
Add missing Customsearch.md (#27968)
Co-authored-by: Akash Verma <akashverma@Akashs-MacBook-Pro-2.local>
2026-05-08 15:02:05 +00:00
Anujkumar Yadav
d4387aa644
feat: Add request access button for data product (#27973)
* feat: Add request access button for data product

* Fix lint checks

* fix lint issue and addressed comments

* fix test
2026-05-08 12:36:40 +00:00
Harshit Shah
19ca2b96c0
fix: migrate and polish TestSuite pipeline tab (#27914)
* fix(ui): migrate and polish TestSuite pipeline tab

Migrate the TestSuite Pipeline tab to core-ui table primitives and align behavior with the ingestion table experience, including row actions, count rendering, header tooltip styling, and placeholder spacing.

* fix checkstyle

* fix failing tests

* address comments

* fix playwright checkstyle

* remove unnecessary changes
2026-05-08 08:33:46 +00:00
Pere Miquel Brull
10f26581b8
chore(mcp): add server.json for MCP Registry publishing (#27982)
* chore(mcp): add server.json for MCP Registry publishing

Adds metadata for publishing openmetadata-mcp to the official MCP
Registry (registry.modelcontextprotocol.io). Aggregators like PulseMCP
scrape the official registry, so this single entry surfaces the server
across the ecosystem.

The server is self-hosted per deployment, so the streamable-http URL
uses an {openmetadata_host} template variable that clients resolve to
their own OpenMetadata hostname.

* chore(mcp): align server.json description with #27975 messaging

Reframes the registry description to match the "trusted context and
business semantics for AI" positioning from the README rebrand in #27975.

Also tightens the description to satisfy the schema's 100-char cap on
the field (the prior 506-char copy would have failed validation at
publish time) and adds websiteUrl pointing to the MCP docs page.

* chore(mcp): mark server.json description as the official MCP

The registry namespace (io.github.open-metadata/*) is invisible to users
browsing aggregators like PulseMCP — they see only title and description.
Calling out "Official OpenMetadata MCP" differentiates this canonical
entry from any community wrappers people might publish under other
namespaces.

* chore(mcp): clarify host variable supports custom ports

Many self-hosted OpenMetadata deployments run on the default :8585
without a reverse proxy. Spell that out in the openmetadata_host
variable description so users know they can include a port.
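How a client might resolve the templated URL can be sketched as plain placeholder substitution. `resolve_remote_url` is hypothetical, and the `/mcp` path in the example is a placeholder rather than the URL actually published in server.json:

```python
def resolve_remote_url(template: str, variables: dict) -> str:
    """Fill {name} placeholders in a remote URL template (hypothetical helper)."""
    url = template
    for name, value in variables.items():
        url = url.replace("{" + name + "}", value)
    if "{" in url or "}" in url:
        raise ValueError(f"unresolved template variable in {url!r}")
    return url


# A host value may carry a port, e.g. a deployment on the default :8585.
url = resolve_remote_url("http://{openmetadata_host}/mcp",
                         {"openmetadata_host": "localhost:8585"})
```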

* fix

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-08 10:14:31 +02:00
harshsoni2024
44ed018064
MINOR: add more logs in pbi for lineage (#27970)
* add more logs on lineage

* minor fix

* string literal fix

* RUF010 exception

* space fix

---------

Co-authored-by: Satender K <satendra.kumar@getcollate.io>
2026-05-08 10:13:54 +02:00
Harshit Shah
bb6c43768f
Migrate ColumnProfileTable from antd to core components Table (#27965)
* feat(profiler): migrate ColumnProfileTable from antd to core-ui Table

Replaces the antd-based Table wrapper in ColumnProfileTable with the
@openmetadata/ui-core-components Table primitive (react-aria-components
foundation). Removes antd ColumnsType column definitions in favour of
explicit Table.Row/Table.Cell render, adds client-side sort via
SortDescriptor state, manual expand/collapse for nested columns via
FlatRow flattening, and preserves data-row-key/expand-icon attributes
for e2e selector compatibility.
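The flattening idea can be sketched language-neutrally in Python (the real component is TSX). The `FlatRow` fields, the dotted key format, and sorting each nesting level independently are assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass
class FlatRow:
    key: str
    name: str
    depth: int
    has_children: bool


def flatten_columns(columns, expanded, sort_key=None, descending=False,
                    depth=0, parent=""):
    """Turn a column tree into table rows, descending only into expanded rows."""
    ordered = (sorted(columns, key=sort_key, reverse=descending)
               if sort_key else list(columns))
    rows = []
    for col in ordered:
        key = f"{parent}.{col['name']}" if parent else col["name"]
        children = col.get("children") or []
        rows.append(FlatRow(key, col["name"], depth, bool(children)))
        if children and key in expanded:
            # Children are sorted with the same descriptor, one level deeper.
            rows.extend(flatten_columns(children, expanded, sort_key,
                                        descending, depth + 1, key))
    return rows
```

A flat list keeps the table a single level of `Table.Row` elements while still expressing nesting through `depth`.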

Ref: https://github.com/open-metadata/openmetadata-collate/issues/3837

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix checkstyle

* refactor(profiler): replace inline style width constants with Tailwind classes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 10:57:54 +05:30
Harshit Shah
7c93c1c54a
[UI] Migrate Observability Alerts table to core-ui components (#27906)
* feat(ui): migrate observability alerts table to core-ui components

Move Observability Alerts page table rendering to openmetadata core-ui Table components and align column layout, loading behavior, and pagination divider handling. Update unit and existing pagination e2e coverage to validate action controls and table structure, and close issue #3837.

* address gitar-bot comments

* fix ui checkstyle

* fix failing tests

* fix playwright checkstyle

* fix failing test

* fix failing pagination tests
2026-05-08 10:12:27 +05:30
Chirag Madlani
a90e7729a6
refactor: streamline SchemaTable component and optimize related metrics form (#27959)
* refactor: streamline SchemaTable component and optimize related metrics form

* fix row expansion issue on update

---------

Co-authored-by: Harsh Vador <harsh.vador@somaiya.edu>
2026-05-07 22:58:07 +05:30
Mayur Singal
4e24ba1d8b
fix(cli-e2e): use profileSampleConfig in profiler test builders (#27947)
PR #27184 (commit 47c88d49ce, "Dynamic Sampling Config") moved
profileSample/profileSampleType out of DatabaseServiceProfilerPipeline
and TableProfilerConfig into a nested profileSampleConfig object, but
the CLI E2E test config builders weren't updated. Both pydantic models
now use extra='forbid', so the old format raises "Extra parameter
'profileSample'" and the scheduled py-cli-e2e-tests workflow has been
red on every run since 2026-04-17 (postgres, mysql, mssql, oracle,
redshift, snowflake, redash, metabase, quicksight, tableau,
bigquery_multiple_project, dbt_redshift).

Update the ProfilerConfigBuilder to emit the new schema and update the
BigQuery TableProfilerConfig usage to match.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 22:35:31 +05:30
Teddy
219c5683fa
ISSUE #3032 (#27912)
* feat: move flat sampling to sampling config + dynamic sampling option

* feat: move flat sampling on the backend to sample profile config object

* feat: fix circular import

* feat: align UI with new profiler config

* feat: fix json schema

* feat: align python imports with new schema path

* feat: update migration to look at extension

* feat: remove enable

* feat: remove enable

* feat: added titles to sample config

* feat: generated ts classes

* feat: addressed comments

* feat: change sample config instantiation to match new structure

* feat: removed backward compatible fields

* feat: ran java linting

* feat: updated imports to point to generated files

* feat: added dynamic sampler resolution logic

* feat: ran python linting

* feat: remove duplicate migration

* chore: merge upstream and clean conflicts

* feat: update logic to support dynamic and static sampling

* feat: adjust sample config call

* feat: test for static, dynamic, row count and tier methods

* feat: more sample config unit tests

* feat: added tests for metric and sampling

* feat: added tests to validate fallback is not called in metric computers

* feat: strengthen profiler validation tests

* feat: fix sampling config

* feat: fix sampling config

* feat: fix sampling config

* feat: generated typescript models

* feat: fixed missing dq pipeline migration

* feat: fixed static check

* feat: fixed ci failures

* feat: fixed ci failures

* feat: fixed unit test failures and linting

* feat: fixed integration tests failures

* chore: fix burstiq refactor

* chore: fix trino ci failures

* chore: revert baseline.json file

* chore: fix sampler available burst iq changes

* feat: added smart sampling radio button

* feat: ignore static checks errors

* feat: ran ts linting

* feat: fix burstiq infinite recursion issue with dynamic as default

* feat: translate i18n keys

* feat: fix failing tests
2026-05-07 09:01:18 -07:00
Rohit Jain
b42c9ad3ba
Fixed the translations issues in AdvancedSearch description option (#27961)
* Fixed the translations issues in AdvancedSearch description option

* nit
2026-05-07 15:19:54 +00:00
Laura
4c07b28c82
Add alias marketplace (#27943)
* Add alias marketplace

* wire fingerprint and embeddings in domain_index_mapping
2026-05-07 16:50:49 +02:00
Pere Miquel Brull
54ae549fc6
Fixes #27852: propagate tolerations from CronOMJob to scheduled OMJob (#27955)
CronOMJobReconciler.deepCopyPodSpec was copying nodeSelector but
silently dropping tolerations when generating an OMJob from a
CronOMJob template. Manual runs worked because they go straight
through K8sPipelineClient.buildOMJob, but scheduled runs went
through this deep-copy and lost the field, leaving pods Pending
on tainted nodes.
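The failure mode can be sketched with plain dicts standing in for Kubernetes pod specs. Both functions are hypothetical; whether the real fix adds the one missing field or copies the spec wholesale is not shown here, and this sketch takes the wholesale approach as an assumption:

```python
import copy


def deep_copy_pod_spec_buggy(spec: dict) -> dict:
    # Mirrors the reported bug: fields are cherry-picked and tolerations are missed.
    return {"nodeSelector": copy.deepcopy(spec.get("nodeSelector"))}


def deep_copy_pod_spec(spec: dict) -> dict:
    # Copy everything, so no scheduling field can be silently dropped,
    # and mutations on the copy never leak back into the template.
    return copy.deepcopy(spec)


template = {
    "nodeSelector": {"pool": "ingestion"},
    "tolerations": [{"key": "dedicated", "operator": "Equal",
                     "value": "ingestion", "effect": "NoSchedule"}],
}
```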

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 14:38:49 +02:00
Anujkumar Yadav
80375a7dc6
Add data access request support (#27879)
* Add DAR tasks

* Removed UI related changes of DAR

* nit

* Update generated TypeScript types

* fix linting issue

* Removed all languages changes

* nit

* removed white space

* add request data access button with owner/status conditions

* fix lint issue

* fix minor validation for data access button

* fix lint issue

* fix data access button visible condition

* fix java lint checks and fix test cases

* nit

* fix test

* fix(tasks): model CreateTask.about as entityLink, validate target entity

Replace `about` (FQN string) + `aboutType` (string) with a single
`about` field of type entityLink (`<#E::{entityType}::{fqn}>`). The
resource layer parses the link and resolves it via
`Entity.getEntityReferenceByName(type, fqn, NON_DELETED)`, which
guarantees the target asset exists and is not soft-deleted.
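Parsing the two-part `<#E::{entityType}::{fqn}>` form can be sketched with a regular expression. `parse_entity_link` and `build_entity_link` are hypothetical helpers, and links carrying additional `::`-separated segments are not handled by this sketch:

```python
import re

# Two-part entity link only: <#E::{entityType}::{fqn}>
ENTITY_LINK = re.compile(r"^<#E::(?P<type>[^:<>]+)::(?P<fqn>[^<>]+)>$")


def build_entity_link(entity_type: str, fqn: str) -> str:
    return f"<#E::{entity_type}::{fqn}>"


def parse_entity_link(link: str) -> tuple:
    m = ENTITY_LINK.match(link)
    if m is None:
        raise ValueError(f"not an entity link: {link!r}")
    return m.group("type"), m.group("fqn")
```

The parsed `(type, fqn)` pair is what the server-side lookup then resolves and checks for existence and deletion status.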

Why: long-FQN data assets were rejected with `[query param name size
must be between 1 and 256]` because the modal was constructing a Task
`name` from the FQN. The `about` was modelled as a free string with
no schema validation that the target was a real, non-deleted entity.
The Threads API already uses entityLink for this exact purpose; tasks
now align with that pattern. The link is supplied as a hidden field
by the UI — users never see it.

Also fixes the missing `@ExtendWith(TestNamespaceExtension.class)` on
`DataAccessRequestIT` that caused four test failures in CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix unit test failure

* fix(test): await workflow stage transition in DataAccessRequestIT

The workflow advances the task from pending-workflow-start to review
asynchronously. Asserting on the object returned by create() was a
race condition. Use Awaitility to poll until the stage is review,
matching the pattern in IncidentTaskIntegrationIT.
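The poll-until-ready pattern can be sketched as a small helper in the spirit of Awaitility's `await().atMost(...).until(...)`. `await_until` is hypothetical, and the stage names in the usage come from the text above:

```python
import time


def await_until(condition, timeout_s=10.0, poll_interval_s=0.1):
    """Re-evaluate condition() until it is truthy or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before timeout")
        time.sleep(poll_interval_s)


# Simulated task fetches: the workflow reaches "review" only on the third read.
stages = iter(["pending-workflow-start", "pending-workflow-start", "review"])
reached = await_until(lambda: next(stages) == "review",
                      timeout_s=1.0, poll_interval_s=0.01)
```

Asserting on the first response would have seen `pending-workflow-start`; polling tolerates the asynchronous transition without a fixed sleep.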

---------

Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Ram Narayan Balaji <ramnarayanb3005@gmail.com>
Co-authored-by: Ram Narayan Balaji <81347100+yan-3005@users.noreply.github.com>
2026-05-07 17:56:44 +05:30
Laura
b12506fc6d
Add container entity type (#27957)
* Add alias marketplace

* wire fingerprint and embeddings in domain_index_mapping

* add container entity to dataAssetEmbeddings

* add container to VECTOR_INDEXABLE_ENTITIES

* Move changes for marketplace to another PR
2026-05-07 14:21:29 +02:00
sonika-shah
e91c90c144
fix: validate custom property name charset (#27808)
* fix: validate custom property name charset

Tighten custom property name validation to block characters that break
downstream parsers, with verified empirical reproduction:

- `"` causes HTTP 500 on PUT /metadata/types/{id}
- `:` breaks CSV import — exporter writes `key:value;key:value`, importer
  splits at first colon, treats prefix as the field name
- `^` breaks OpenSearch query when the name is in
  searchSettings.searchFields — Lucene reads `^` as the boost separator
  in `field^boost`
- `$` breaks CSV import via java.util.regex.Matcher.replaceAll which
  interprets `$<letter>` as a backreference
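The colon case can be sketched in isolation. `format_extension` and `parse_extension` are hypothetical stand-ins that follow only the `key:value;key:value` convention and the split-at-first-colon behavior described above:

```python
def format_extension(ext: dict) -> str:
    return ";".join(f"{k}:{v}" for k, v in ext.items())


def parse_extension(cell: str) -> dict:
    out = {}
    for pair in cell.split(";"):
        key, value = pair.split(":", 1)  # split at the FIRST colon only
        out[key] = value
    return out


cell = format_extension({"owner:team": "data"})
parsed = parse_extension(cell)  # the name's own colon is taken as the separator
```

A name without a colon round-trips cleanly; a name containing one comes back as a different key with the remainder folded into the value.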

Adds a `customPropertyName` definition in basic.json and switches
customProperty.json to reference it. Adds a defensive regex check in
TypeRepository.validateProperty so the API returns 400 with a clear
error message even if schema validation is bypassed.

Tests cover allowed-charset acceptance, the four blocked characters,
leading-character validation, max-length enforcement, and unbalanced
brackets.

* Update generated TypeScript types

* test: add schema-vs-Java consistency test for custom property name

Guards against drift between basic.json#customPropertyName and the
TypeRepository regex/length constants. If either side is updated
without the other, CI fails with a message pointing to both files.

The Java validator is kept (better error message + covers internal
callers that bypass the HTTP layer); the consistency test guarantees
the two definitions cannot drift.

* fix: extend custom property name charset after gap-coverage matrix

Re-ran the matrix on previously-untested chars (+ ? * ~ ` \) across all
17 property types × create/patch/CSV/search:

- + ? * ~ ` all pass cleanly on every operation × every property type — add to allow list
- \ fails CSV roundtrip for entityReference and entityReferenceList types
  (escape inconsistency in CSV serialization) — add to block list

Updates the regex, schema description, Java validator error message, and
adds the new chars to the allow/block integration tests. Consistency
unit tests in TypeRepositoryTest continue to pass.

Final allow set: alphanumeric _ - . / & % # @ ! , ; = | ' + ? * ~ `
                 space ( ) < > [ ] { }
Final block set: " : ^ $ \

* Update generated TypeScript types

* updated the custom property name validation

* added name suffix in custom property name

* lint fixes

* include backslash in invalid char

Co-authored-by: Copilot <copilot@github.com>

* fixed the playwright issue

Co-authored-by: Copilot <copilot@github.com>

* lint fix

* fix check style

* Drop redundant Java validator for custom property name; tighten IT assertions

Schema is the single source of truth: jsonschema2pojo emits @Pattern + @Size
on CustomProperty.name from basic.json#/definitions/customPropertyName, and
@Valid on TypeResource.addOrUpdateProperty enforces them at the HTTP boundary.
The hand-written Pattern constant, validateCustomPropertyName, and the
schema-vs-Java sync test were duplicating that rule and could never reach the
HTTP user (Bean Validation always fires first via @Valid).

Tighten the new TypeResourceIT cases from assertThrows(Exception.class) to
assertThrows(InvalidRequestException.class) so a regression to a different
exception type or status code fails loudly.

* restrict few more special characters from Cp name

* minor fix

* Disallow & < > in custom property names; align IT cases

Schema-side counterpart to the UI changes in the previous two commits:
basic.json#/definitions/customPropertyName now blocks &, <, > alongside the
existing " : ^ $ \\. The DOMPurify pass on the UI sanitizes &, <, > into HTML
entities, which produced inconsistent persisted values; rejecting them at the
schema layer prevents that drift across all write paths.

IT updates:
- Drop &, <, > from the allowed-charset cases (and the "withMatched(pair)And<more>" composite)
- Add &, <, > to the disallowed-charset cases
- Drop "<" leading-character case (now covered as a disallowed character)
- Drop "<" / ">" unbalanced-bracket cases

* Update generated TypeScript types

* Close PATCH bypass for custom property name validation on Type

Bean Validation runs for the dedicated PUT /types/{id} (addOrUpdateProperty)
because the resource declares @Valid CustomProperty, and the createOrUpdate
path can't carry customProperties at all (CreateType schema doesn't include
the field). PATCH /types/{id} accepts an opaque JsonPatch, so @Valid never
reaches into the resulting customProperties[] — a JSON Patch like
[{"op":"add","path":"/customProperties/-","value":{"name":"bad:colon",...}}]
persisted bad-named properties (verified live: HTTP 200 before this fix).

Run Hibernate Validator programmatically inside TypeRepository.prepare() so
every write path enforces the schema-derived @Pattern / @Size / @NotNull on
each CustomProperty. The rule still lives only in basic.json — picked up via
the generated @Pattern annotation, executed via ValidatorUtil.validate.

Tests in TypeResourceIT:
- test_patchCannotAddCustomPropertyWithDisallowedName — seeds a valid property
  to ensure /customProperties exists, then PATCHes appending a name with ':',
  asserts InvalidRequestException and verifies the bad name is not persisted
- test_patchCanAddCustomPropertyWithValidName — guards against the fix
  rejecting valid PATCH-driven additions

* Block * in custom property names — breaks ES field-path lookup

When the property name appears in extension.<propertyName>^boost entries of
searchSettings.searchFields, OpenSearch treats * as a field-path wildcard.
The literal * field never matches its own wildcard pattern, so the field
gets silently skipped from the query and Explore search returns no hit for
the value. Bisected against the running server: of 12 candidate Lucene-special
chars, only * actually breaks the mainline UI search flow. ? ~ ( ) { } [ ] /
! and space all returned hits via the searchFields path because OS looks up
the field literally and only treats * as a wildcard at that layer.

Updates the regex + description in basic.json/customProperty.json, the UI
regex in regex.constants.ts, the validation message across 19 locales, the
generated TS docstrings, the Playwright invalid-name fixtures and spec, and
the IT TypeResourceIT case (with*asterisk moves from allowed to disallowed).

* Validate only newly-added custom properties; isolate PATCH IT to fresh types

prepare() previously validated the entire customProperties[] on every Type
write. An upgraded instance with a legacy property whose name contained a
now-banned char would then reject any subsequent PUT/PATCH on that type,
even when the write only adds a different valid property. Move the name
validation into TypeUpdater.updateCustomProperties() and scope it to the
`added` list computed by recordListChange against the original entity. New
properties are still validated; pre-existing names are left alone.
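Scoping validation to newly added properties can be sketched as a name-based diff. `validate_new_properties` is hypothetical; it compares by name, and the real `recordListChange` comparison may differ:

```python
def validate_new_properties(original, updated, is_valid_name):
    """Validate only properties whose names did not exist before this write."""
    existing = {p["name"] for p in original}
    added = [p for p in updated if p["name"] not in existing]
    bad = [p["name"] for p in added if not is_valid_name(p["name"])]
    if bad:
        raise ValueError(f"invalid custom property name(s): {bad}")
    return added


# Simplified rule for the example: reject any name containing ':'.
no_colon = lambda name: ":" not in name
legacy = [{"name": "legacy:name"}]  # stored before the rule was tightened
```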

Replace the IT PATCH cases' shared `topic` Type with a fresh, namespaced
entity-category Type per test (createEntityTypeForTest). The shared `topic`
was mutated concurrently by other tests in the class — combined with
PATCH's lack of per-type locking, that produced lost-update races and
flaky asserts. The fresh per-test type has customProperties: [] from
creation, so the patch sets the array directly without a seed property.

* chore: prettier formatting on the new asterisk-rejection test

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* docs: add + ? ~ ` to JSDoc allow-list to match the regex

* fix(it): request customProperties field on read-back in PATCH IT

Type.customProperties is a lazy field — TypeRepository.setFields only
populates it when the request URL includes ?fields=customProperties. The
default getTypeById helper omits the param, so the read-back always saw
customProperties == null. That made test_patchCanAdd... fail (the just-
persisted property wasn't visible) and made test_patchCannotAdd... pass
for the wrong reason (would have stayed green even if the bad name had
slipped through validation).

Add a fields-aware getTypeById overload and use it in both PATCH cases.
Empirically verified against the live server: good name returns 200 +
appears in customProperties, bad name returns 400 + does not.

* minor fix

* playwright test fix

* removed unnecessary test

* blocked ~ and / from custom property name

* lint-fix

* Block / and ~ in custom property names (JSON Pointer reservations)

Forward slash and tilde are reserved by JSON Pointer (RFC 6901): / is the
path separator and ~ is the escape lead-in (~0 = ~, ~1 = /). Allowing
them in a property name shifts the burden onto every caller that builds
a JSON Patch by string interpolation; a raw `/extension/${propertyName}`
either splits into the wrong number of segments or contains an invalid
escape sequence, and the server applies the patch to the wrong key (or
400s outright).
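What a client would otherwise have to do is RFC 6901 token escaping. A minimal sketch, with hypothetical helper names; the escaping order follows the RFC:

```python
def escape_pointer_token(token: str) -> str:
    # "~" -> "~0" must happen first, so the "~" introduced by "~1" is not re-escaped.
    return token.replace("~", "~0").replace("/", "~1")


def unescape_pointer_token(token: str) -> str:
    # Reverse order: "~1" -> "/" first, then "~0" -> "~".
    return token.replace("~1", "/").replace("~0", "~")


name = "size/unit"
naive_path = f"/extension/{name}"                         # three segments
safe_path = "/extension/" + escape_pointer_token(name)    # two segments
```

Banning `/` and `~` in names removes the need for every patch-building caller to remember this step.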

This surfaced as a reproducible failure in the table-cp Playwright suite:
the preceding test ended with `path: \`/extension/${propertyName}\`` where
propertyName ended in `/`. The server addressed extension[name-without-/][""]
instead of extension[name-with-/], returned 400, and TableClass.patch
overwrote entityResponseData with the error body — stripping id and FQN.
The next test fell into the search-based navigation path with an empty
search term and timed out at 180s.

Tighten the schema regex in openmetadata-spec/.../basic.json#customPropertyName
to drop / and ~ from the allowed set; update the human-readable description
in basic.json and customProperty.json to call out the RFC 6901 reservation.
Move the with/slash and with~tilde cases from the allowed-charset IT to
the disallowed-charset IT in TypeResourceIT.
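The net effect of the charset changes in this PR can be sketched as a single pattern. This is an illustration assembled from the allow/block lists described above plus the must-start-with-alphanumeric rule, not the schema's actual regex; the maximum length is omitted because it is not stated here:

```python
import re

# Leading alphanumeric, then alphanumerics plus: _ - . % # @ ! , ; = | ' + ? ` space ( ) [ ] { }
CUSTOM_PROPERTY_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_\-.%#@!,;=|'+?` ()\[\]{}]*")

BLOCKED = '":^$\\&<>*/~'


def is_valid_custom_property_name(name: str) -> bool:
    return CUSTOM_PROPERTY_NAME.fullmatch(name) is not None
```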

* Update generated TypeScript types

* Use fresh per-test Type in custom-property name validation IT

The five charset/length/lead-char tests added in this PR previously mutated
the shared built-in TABLE_ENTITY_TYPE under @Execution(CONCURRENT). The
PUT path acquires TYPE_PROPERTY_LOCKS so concurrent writes serialize, but
relying on that lock for test isolation is fragile — the PATCH-driven IT
in the same class already uses a per-test fresh Type via
createEntityTypeForTest(client, ns, ...) for exactly this reason
(see 1864b0a6ac). Switch the five PUT tests to the same pattern so no
test mutates a shared Type, eliminating cross-test coupling regardless of
whether the server-side lock is in place.

Tests affected:
- test_customPropertyNameAllowedCharacters_succeeds
- test_customPropertyNameDisallowedCharacters_fails
- test_customPropertyNameMustStartWithAlphanumeric_fails
- test_customPropertyNameTooLong_fails
- test_customPropertyNameUnbalancedBrackets_succeeds

* Align UI artifacts with the tightened custom-property-name regex

Three small follow-ups flagged by reviewers:

- regex.constants.ts: JSDoc above CUSTOM_PROPERTY_NAME_REGEX still listed
  / and ~ as allowed even though the pattern below was tightened to drop
  them. Update the comment to match the actual regex and call out the
  RFC 6901 reason so future edits don't reintroduce them.

- CustomProperties.spec.ts: the "should accept a valid name with allowed
  special characters" test fed a hardcoded string containing ~ and /,
  which the new regex rejects — the assertion would fail. Drop those two
  characters so the input stays in the allowed set.

- zh-cn.json: the Simplified Chinese translation of
  custom-property-name-validation was double-escaped (\\\" and \\\\),
  which would render to users as literal \" and \\ rather than " and \.
  Match the escaping pattern used by the other 18 locales.
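The rendering difference can be checked by decoding both escapings as JSON. The message text is a hypothetical placeholder, not the actual translation:

```python
import json

# Correct locale escaping: \" and \\ decode to " and \.
correct = json.loads(r'"must not contain \" or \\"')
# Double-escaped form: \\\" and \\\\ decode to \" and \\, which users would see verbatim.
double = json.loads(r'"must not contain \\\" or \\\\"')
```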

* addressed gitar comment

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Rohit0301 <rj03012002@gmail.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Rohit Jain <60229265+Rohit0301@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-07 15:35:43 +05:30
Satender K
ea65ce78bf
Fixes 27433: task page is being updated when user is already on same page and clicks on the new assigned task from bell icon. (#27903)
* fixed issue 27433

* updated code as per GitAR comments

* updated code for UI check fails

* added E2E test case for issue 27433

* fix(e2e): make task notification refresh test self-contained

Replace hardcoded 'raw_order' search with a test-owned TableClass entity,
remove fragile URL-splitting FQN extraction, and clean up the created task
in afterAll to prevent residual data across test runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style(e2e): apply prettier and import organization to TaskNavigation spec

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* addressed comments from Rohit

* updated test case as per review comment

* added afterAll back, since removing it would leak resources in the DB (per GitAR)

---------

Co-authored-by: Satender <sommy@Satenders-MacBook-Pro.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 15:20:02 +05:30
Ram Narayan Balaji
339b3dfb18
fix(security): upgrade Java dependencies to resolve CRITICAL and HIGH CVEs (#27940)
* fix(security): upgrade Java dependencies to resolve CRITICAL and HIGH CVEs

- jetty-http: 12.1.6 → 12.1.7 (HTTP Request Smuggling, CRITICAL)
- bcpkix/bcprov/bcutil-jdk18on: 1.80 → 1.84 (Crypto Signature Bypass + Timing Attack)
- postgresql: 42.7.7 → 42.7.11 (SCRAM-SHA-256 DoS)
- httpcore5-h2: pinned to 5.3.5 (HTTP/2 stream reset DoS)
- commons-compress: pinned to 1.26.0 (Infinite Loop DoS)
- jackson-core: 2.18.6 → 2.19.0 (async parser resource exhaustion)
- maven-shade-plugin: 3.5.1 → 3.6.0 (supports Java 22 MR-JAR in jackson-core 2.19.0)
- openapi-generator template override: jackson-version 2.17.1 → 2.19.0 in generated swagger pom

* fix(security): upgrade spring-web 6.2.11 → 6.2.18

* fix(security): align jackson-dataformat-yaml, feign, gson, logback versions

- jackson-dataformat-yaml: 2.17.2 → ${jackson.version} (2.19.0)
- feign-core: 13.2.1 → 13.5 (in openapi-gen template)
- gson: 2.10.1 → 2.11.0 (in openapi-gen template)
- logback-classic: 1.3.13 → 1.5.25 (in openapi-gen template)

* fix(security): use jackson 2.18.7 — highest clean 2.x with full ecosystem

2.19.0-2.21.0 all carry a HIGH (CVSS 8.7) vulnerability per Sonatype.
2.18.7 is the latest clean patch where all Jackson modules are released.

* fix(security): remove hardcoded jackson 2.17.2 override in k8s-operator, inherit 2.18.7 from root

* fix(security): upgrade gson 2.11.0 → 2.13.1 (Medium CVE)

* fix(security): replace 436-line pom.mustache with minimal stub

The openapi-generator-maven-plugin writes target/generated-sources/swagger/pom.xml
at build time with hardcoded jackson 2.17.1. Snyk --all-projects picks up every
pom.xml on disk and flags it as HIGH.

The generated pom.xml is never packaged into any JAR or Docker image — it is a
generator artefact. The actual runtime jackson version comes from the module pom
inheriting jackson.version=2.18.7 from the root. Replace the 436-line verbatim
upstream template (maintained just to change 2 version lines) with a 10-line
coordinate-only stub. The generated pom.xml will have no <dependencies> block,
so Snyk finds nothing to flag.
2026-05-07 09:19:10 +00:00
Sid
c24e5098ce
fix(playwright): unblock glossary bulk import after modal close (#27952)
* fix(playwright): unblock glossary bulk import delete loop after modal close

The trailing `getByRole('dialog').getByRole('img').click()` at line 267
fired after the bulk-import modal had already been closed and asserted
not visible. It would either miss or grab a residual Ant modal element,
leaving a `.ant-modal-mask` attached over the page. The mask was invisible
to the accessibility tree but intercepted pointer events, so the next
`settingClick` in the delete-properties loop hung waiting for the
`customProperties.glossaryTerm` card to become actionable until the 180s
test timeout.

Replace the bogus click with a `.ant-modal-mask` count assertion so the
next step only proceeds once the overlay has detached, and gate each
loop iteration on a `waitForURL` to the glossary-term detail page.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(playwright): use toBeAttached over .ant-modal-mask for modal close gate

Drop the `.ant-modal-mask` count assertion — it leaks a UI-library
internal class into the test and would match unrelated modals.

The bulk-import modal is mounted via `{isModalOpen && (...)}`
(BulkImportVersionSummary.component.tsx), so the entire overlay subtree
unmounts atomically when closed. Asserting the existing
`bulk-import-details-modal` testid is `not.toBeAttached()` waits for
that unmount and guarantees no backdrop is left intercepting clicks
— stronger than `not.toBeVisible()`, which would pass mid-animation
while the overlay wrapper is still in the DOM.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Siddhant <siddhant@MacBook-Pro-678.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 08:23:53 +00:00
Rohit Jain
e48f3d7ead
Upvote/Downvote Icon Loses Primary Color on Blur After Liking Entity Page (#27898)
* Upvote/Downvote Icon Loses Primary Color on Blur After Liking Entity Page

* fixed bg color

* minor fix
2026-05-07 12:32:07 +05:30
Sriharsha Chintalapani
b837ade95a
docs(github): require issue link, design, tests, UI recording in PR template (#27891)
Expands `.github/pull_request_template.md` to require a linked issue, a
high-level design (for large PRs), a structured Tests section (use cases,
unit + coverage %, backend/ingestion integration tests, Playwright, manual
steps), and a UI screen recording for any UI change. Adds a `/pr-checklist`
skill that walks the template, gathers evidence, and drafts the PR body
before opening via `gh pr create`.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 08:05:56 +02:00
Sriharsha Chintalapani
3beb1e020b
Improve cache warmup configuration and availability (#27948)
* Fix cache warmup app config rendering

* Add optional relationship cache warmup

* Restore relationship repository in warmup test

* Update generated TypeScript types

* Disable cache warmup when cache is unavailable

* Address cache warmup review comments

* Address Copilot cache warmup comments

* Memoize app detail tabs

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-05-06 21:56:07 -07:00
Nikhil Chennam
a136325cf0
fix: API response for TableColumnCountToBeBetween (#27900)
* Fix typo in tablecolumnCount

Changes:
- Remove redundant None checks for min/max bounds (already have defaults in base class)
- Add type: ignore for Optional[float] comparison to satisfy type checker
- Update test to assert exact result message instead of substring matching
- Test now verifies full message format including min/max bound values
2026-05-07 10:15:39 +05:30
Eugenio
adcdd345be
Fix recognizer inclusion based on language (#27919) 2026-05-07 00:16:20 +00:00
Sriharsha Chintalapani
f9d3c85d20
fix(search): restore live settings on per-entity promote path (#27920)
* Restore live index settings on per-entity distributed-promote path

DefaultRecreateHandler exposes two finalization paths:

  - finalizeReindex(...)        — centralized end-of-job promotion. Calls
                                  applyLiveServingSettings + maybeForceMerge
                                  before the alias swap, reverting the bulk
                                  overrides (refresh_interval=-1, replicas=0,
                                  async translog) back to live values
                                  (refresh=1s, replicas=1, durable translog).

  - promoteEntityIndex(ctx, ok) — per-entity promotion. Used by the distributed
                                  search-indexer's "promote as soon as all
                                  partitions for an entity complete" callback
                                  (DistributedSearchIndexExecutor.promoteEntityIndex).
                                  Swaps the alias and cleans up old indices —
                                  but never restored live settings.

When an entity finishes its partitions before the final reconciliation
(typically the smallest entities — e.g. knowledge `page` with ~11 rows),
its index is promoted via the per-entity path, the alias swap succeeds,
and the bulk-build overrides become the new live settings. refresh_interval
stays at -1 in production, so live writes after the reindex are buffered in
the translog and never reach searchable segments until a manual _refresh.
Externally this surfaces as "create an article, hierarchy is empty until I
re-trigger reindex" — exactly the user-reported bug.

Mirror the finalizeReindex sequence by calling applyLiveServingSettings
(and maybeForceMerge for parity) at the top of the promote block in
promoteEntityIndex, before the alias swap.
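The ordering the fix restores can be pictured with a small sketch. `RecordingSearchClient` and `PromoteOrderSketch` are hypothetical stand-ins, not the OpenMetadata or OpenSearch client APIs. They only record calls, so a test can check that the staged index gets its live settings back before the alias moves. The JSON literal assumes the live values quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the search client: records calls instead of sending them.
class RecordingSearchClient {
  final List<String> calls = new ArrayList<>();

  void updateIndexSettings(String index, String settingsJson) {
    calls.add("updateIndexSettings:" + index);
  }

  void forceMerge(String index, int maxSegments) {
    calls.add("forceMerge:" + index + ":" + maxSegments);
  }

  void swapAliases(String fromIndex, String toIndex) {
    calls.add("swapAliases:" + fromIndex + "->" + toIndex);
  }
}

class PromoteOrderSketch {
  // Assumed live values, taken from the commit message (refresh=1s, replicas=1, durable translog).
  static final String LIVE_SETTINGS_JSON =
      "{\"refresh_interval\":\"1s\",\"number_of_replicas\":1,"
          + "\"translog\":{\"durability\":\"request\"}}";

  // Per-entity promotion mirroring finalizeReindex: revert the bulk overrides
  // (and optionally force-merge) on the staged index BEFORE the alias swap.
  static void promoteEntityIndex(
      RecordingSearchClient client, String canonical, String staged, boolean forceMerge) {
    client.updateIndexSettings(staged, LIVE_SETTINGS_JSON);
    if (forceMerge) {
      client.forceMerge(staged, 1);
    }
    client.swapAliases(canonical, staged);
  }
}
```

Swapping first would briefly serve live reads from an index still at refresh_interval=-1; skipping the settings call entirely, as the per-entity path did, leaves it there permanently.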

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Wire jobData into per-entity reindex promotion handler

DefaultRecreateHandler.applyLiveServingSettings reads from the handler's
jobData field (live + bulk index-settings overrides on the EventPublisherJob).
The per-entity distributed-promotion path in DistributedSearchIndexExecutor
created its own DefaultRecreateHandler instance and never called
withJobData(jobData) on it. With jobData=null, buildRevertJson returns null
and applyLiveServingSettings silently no-ops — meaning the previous fix
(b272de85f9) never actually re-applied live settings on the per-entity
promote path, even though the call was reached.

currentJob.getJobConfiguration() is the EventPublisherJob the strategy
created. Wire it into the new handler at construction time, mirroring the
withJobData call DistributedIndexingStrategy already makes on the strategy's
own handler instance.

With this change, the per-entity promote path now logs

  "Applying live serving settings to staged index '...' for entity 'page':
   {\"number_of_replicas\":1,\"refresh_interval\":\"1s\", ...}"

before the alias swap, and post-promotion `_settings` show
refresh_interval=1s instead of the stuck -1.
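The silent no-op is easy to reproduce in miniature. In this hypothetical sketch, `RecreateHandlerSketch` only mimics the withJobData / applyLiveServingSettings shape described above; the map of live settings stands in for the EventPublisherJob fields, and the real handler builds JSON and calls the search client rather than returning a flag.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the job configuration; only the live settings are modelled.
class JobDataSketch {
  final Map<String, Object> liveIndexSettings;

  JobDataSketch(Map<String, Object> liveIndexSettings) {
    this.liveIndexSettings = liveIndexSettings;
  }
}

class RecreateHandlerSketch {
  private JobDataSketch jobData; // stays null unless withJobData(...) is called

  RecreateHandlerSketch withJobData(JobDataSketch jobData) {
    this.jobData = jobData;
    return this;
  }

  // Returns null when no job data is wired: the root of the silent no-op.
  Map<String, Object> buildRevertSettings() {
    if (jobData == null || jobData.liveIndexSettings.isEmpty()) {
      return null;
    }
    return new TreeMap<>(jobData.liveIndexSettings); // sorted for stable output
  }

  // Returns whether anything would have been sent to the staged index.
  boolean applyLiveServingSettings() {
    Map<String, Object> revert = buildRevertSettings();
    if (revert == null) {
      return false; // no-op: the staged index keeps its bulk overrides
    }
    // A real handler would call updateIndexSettings(stagedIndex, json) here.
    return true;
  }
}
```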

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Add regression test for live serving settings on per-entity promote

Asserts that DefaultRecreateHandler.promoteEntityIndex calls
searchClient.updateIndexSettings with the live-revert JSON
(refresh_interval=1s, replicas=1, translog.durability=request) before
swapping the alias, given a handler with bulk overrides wired through
withJobData. Without the two preceding fixes the assertion fails with
"Wanted but not invoked" — applyLiveServingSettings was never reached
on the per-entity promotion path, so the staged index inherited
refresh_interval=-1 and post-promotion live writes never became
searchable until a manual _refresh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Expand unit coverage around the per-entity promotion contract

DefaultRecreateHandlerTest.PromoteEntityIndexTests:
  - testPromoteEntityIndexAppliesSettingsBeforeAliasSwap: InOrder
    verification that updateIndexSettings runs BEFORE swapAliases. Catches a
    swap-then-revert misordering (which would briefly serve live reads against
    refresh=-1 settings).
  - testPromoteEntityIndexForceMergesWhenConfigured: forceMerge(staged, 1) is
    invoked when bulkIndexSettings.forceMergeOnPromote=true. Catches a
    regression where the force-merge call gets dropped without anyone
    noticing.
  - testPromoteEntityIndexSkipsSettingsWithoutJobData: locks in the safe no-op
    behavior when a handler is constructed without withJobData. Documents that
    no-jobData → no settings call (vs. crash or silent revert to defaults).

DistributedSearchIndexExecutorTest:
  - initializeEntityTrackerWiresJobDataIntoDefaultRecreateHandler: triggers
    the private initializeEntityTracker with currentJob holding a populated
    jobConfiguration and verifies recreateHandler.withJobData(jobData) is
    called on the per-entity handler. This catches the second half of the
    original regression: even if applyLiveServingSettings is reached on
    promoteEntityIndex, jobData=null makes it a silent no-op. Future edits
    that drop the wiring or move handler construction elsewhere will fail
    here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add integration test for live settings restoration after alias promotion

Triggers SearchIndexingApplication with bulkIndexSettings configured
(refresh_interval=-1, number_of_replicas=0, translog.durability=async),
waits for the run to terminate, then queries _settings on the promoted
table_search_index alias against the real OpenSearch/Elasticsearch
container (via TestSuiteBootstrap.createSearchClient()). Asserts that
each concrete index resolved by the alias has the live values applied
(refresh=1s, replicas=1, translog.durability=request) and not the bulk
overrides.

This is the end-to-end counterpart to the unit-level regression test in
DefaultRecreateHandlerTest. Catches the same class of bug at the layer
where it actually surfaced in production: an alias swap that completed
successfully according to logs but left the new live index unsearchable
because refresh was disabled and writes were buffered indefinitely.

Modeled on SearchIndexingFieldsParityIT for run-trigger / poll structure;
adds the post-completion _settings verification step that no other IT
performs today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address PR review: harden settings revert + lock InOrder + drop redundant test

DefaultRecreateHandler: move applyLiveServingSettings + maybeForceMerge
inside the try/finally that unregisters the staged index. Without this, a
transient OS/ES failure during _settings update or _forcemerge propagated out
before the finally ran, leaving searchRepository.unregisterStagedIndex
permanently registered — so live writes kept routing to a staged index
nothing reads from. Same fix applied to finalizeReindex for consistency
(its window is shorter since it runs at end-of-job, but the leak shape is
identical). Per gitar-bot review.
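A minimal sketch of why the settings calls have to live inside the try. `StagedRegistry` is a hypothetical stand-in for the searchRepository registration, and the two `Runnable` steps stand in for the settings revert and the alias swap.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for searchRepository's staged-index registration.
class StagedRegistry {
  final Set<String> registered = new HashSet<>();

  void registerStagedIndex(String index) {
    registered.add(index);
  }

  void unregisterStagedIndex(String index) {
    registered.remove(index);
  }
}

class PromoteWithFinallySketch {
  // Because applySettings runs INSIDE the try, a failure there still reaches the
  // finally block and releases the registration before the exception propagates.
  static void promote(
      StagedRegistry registry, String staged, Runnable applySettings, Runnable swapAliases) {
    try {
      applySettings.run();
      swapAliases.run();
    } finally {
      registry.unregisterStagedIndex(staged);
    }
  }
}
```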

DefaultRecreateHandlerTest:
  - testPromoteEntityIndexAppliesLiveServingSettingsBeforeSwap: replace
    independent verify() calls with InOrder so the test actually locks the
    "settings before alias swap" ordering its name and the PR description
    promise. A swap-then-revert refactor would have passed before this.
    Per Copilot review.
  - Drop testPromoteEntityIndexAppliesSettingsBeforeAliasSwap (the standalone
    InOrder test added in the previous commit) — folded back into the test
    above, which now covers both ordering and JSON content in one place.
  - Add testPromoteEntityIndexUnregistersStagedIndexOnSettingsFailure —
    regression test for the gitar-bot fix above. Verified to fail with
    "IllegalState connection reset" when the calls are moved back outside
    the try block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Drop verbose explanatory comments from promote-path edits

The why-it's-in-a-try and why-jobData-is-wired blocks read like commit
messages, not code annotations. Tests and commit history carry the
rationale; the code itself reads fine without them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Close Rest5Client in IT _settings helper

readIndexSettings opened a Rest5Client and never closed it, leaking
HTTP connections on test re-runs. Wrap in try-with-resources.
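The leak fix follows the standard try-with-resources pattern. Rest5Client itself is not modelled here; `FakeRestClient` is a hypothetical AutoCloseable that counts open instances, so the sketch can show the client is closed on return.

```java
// Hypothetical AutoCloseable stand-in for the low-level REST client (not Rest5Client itself).
class FakeRestClient implements AutoCloseable {
  static int openClients = 0;

  FakeRestClient() {
    openClients++;
  }

  String get(String path) {
    return "{\"path\":\"" + path + "\"}";
  }

  @Override
  public void close() { // narrows AutoCloseable.close(), so no checked exception is declared
    openClients--;
  }
}

class ReadIndexSettingsSketch {
  // try-with-resources closes the client on every exit path, including exceptions from get().
  static String readIndexSettings(String alias) {
    try (FakeRestClient client = new FakeRestClient()) {
      return client.get("/" + alias + "/_settings");
    }
  }
}
```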

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Tighten SearchIndexAliasPromotionIT against false-positive runs

Two ways the IT could pass even with the regression present:

  - waitForLatestRunSuccess accepted activeError, which maps to
    COMPLETED_WITH_ERRORS. In that path EntityCompletionTracker can
    invoke promoteEntityIndex(..., false), the staged index is deleted,
    and the alias stays on the pre-existing live index. The
    pre-existing index already has live settings, so the _settings
    assertions pass against it without exercising the promotion path.

  - readIndexSettings on the alias would resolve to that pre-existing
    concrete index even after a no-op promotion, so the assertions
    were never actually checking the staged index.

Reject anything other than success/completed, and assert the alias
resolves to a *_rebuild_* index — proving the swap moved the alias to
a freshly staged index.

Per Copilot review on PR #27920.
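Both checks reduce to small predicates. This is a sketch with hypothetical helper names; the real IT reads the run status from the app run record and resolves the alias via _settings, neither of which is modelled here.

```java
import java.util.Locale;
import java.util.Set;

class PromotionItChecksSketch {
  // Only a clean run counts; activeError (COMPLETED_WITH_ERRORS) may have skipped promotion.
  static boolean isCleanSuccess(String status) {
    String s = status.toLowerCase(Locale.ROOT);
    return s.equals("success") || s.equals("completed");
  }

  // Every concrete index behind the alias must be a freshly staged *_rebuild_* index.
  static boolean aliasResolvesToRebuild(Set<String> concreteIndices) {
    return !concreteIndices.isEmpty()
        && concreteIndices.stream().allMatch(index -> index.contains("_rebuild_"));
  }
}
```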

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Harden alias promotion: defer canonical delete, hard-fail on empty aliases, surface in job status

Four coupled changes that fix latent silent-failure paths in the per-entity
and end-of-job promotion code. These predate the recent regression but land
together since they touch the same blocks.

1. Empty aliases is a hard failure
   `getAliasesFromMapping` returning an empty set used to fall through to a
   skip-swap WARN, then log "Promoted staged index..." and record
   `promoteSuccess`. Canonical was already pre-deleted at that point, so the
   alias resolved to nothing and operators got no error signal. Now: log
   structured ERROR, call `recordPromotionFailure`, return without deleting
   canonical or claiming success. Same fix in both `promoteEntityIndex` and
   `finalizeReindex`.

2. Defer canonical-index deletion until swap success
   Old order: delete canonical → swapAliases (with canonical-name in the
   alias set). If swap fails after delete, the canonical name has nothing to
   resolve to → live data unavailable.

   New order:
     - swapAliases of all non-canonical-name aliases (atomic move from
       canonical to staged). If this fails, canonical still serves with all
       original aliases — no data loss.
     - Delete canonical (only if its name is needed as alias). If this
       fails, parent aliases work; canonical-name lookups still hit the old
       index until retry — degraded, not lost.
     - addAliases for the canonical name on staged. If this fails (data
       loss path: canonical was deleted but alias-add failed), mark
       dataLossPromotions; operator alarm.

3. Promotion outcomes affect job status
   `DefaultRecreateHandler` tracks `failedPromotions` and `dataLossPromotions`
   sets. `RecreateIndexHandler` interface exposes them.
   `SearchIndexExecutor.determineStatus` and
   `DistributedIndexingStrategy.determineStatus` now consult both:
     - any data-loss promotion → ExecutionResult.Status.FAILED
     - any failed promotion (no data loss) → COMPLETED_WITH_ERRORS
   Distributed path checks both the strategy's handler and the per-entity
   executor's handler (different instances, both can record failures).

4. Structured failure log markers
   Replace single-line ERROR with `[ALIAS_PROMOTE_FAILED phase=... entity=...
   stagedIndex=... canonicalIndex=... aliases=...]` markers at every
   promotion-fail exit (empty-aliases / swap1 / delete-canonical / swap2 /
   exception). Each line states whether canonical was deleted and what the
   blast radius is, so operators can grep and triage without reading code.
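The deferred-delete ordering and the status mapping from items 2 and 3 can be sketched against an in-memory model. `FakeCluster`, `ThreeStepSwapSketch` and the enums are hypothetical; a real swap goes through the ES/OS aliases API, step-2 delete failures are not modelled, and the real determineStatus also accounts for indexing errors, which this sketch ignores.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory cluster: concrete indices plus alias -> concrete-index mappings.
class FakeCluster {
  final Set<String> indices = new HashSet<>();
  final Map<String, String> aliases = new HashMap<>();
  boolean failSwap = false; // simulate a step-1 failure
  boolean failAddCanonicalAlias = false; // simulate a step-3 failure
}

enum PromoteOutcome { PROMOTED, FAILED, DATA_LOSS }

enum JobStatus { COMPLETED, COMPLETED_WITH_ERRORS, FAILED }

class ThreeStepSwapSketch {
  static PromoteOutcome promote(
      FakeCluster cluster, String canonical, String staged, Set<String> aliasesToAttach) {
    if (aliasesToAttach.isEmpty()) {
      return PromoteOutcome.FAILED; // hard failure; canonical is left untouched
    }
    // Step 1: move every non-canonical-name alias to staged. On failure canonical still serves.
    if (cluster.failSwap) {
      return PromoteOutcome.FAILED;
    }
    for (String alias : aliasesToAttach) {
      if (!alias.equals(canonical)) {
        cluster.aliases.put(alias, staged);
      }
    }
    if (aliasesToAttach.contains(canonical)) {
      // Step 2: only now delete canonical, because its name must become an alias.
      cluster.indices.remove(canonical);
      // Step 3: attach the canonical name to staged. Failing here is the data-loss path.
      if (cluster.failAddCanonicalAlias) {
        return PromoteOutcome.DATA_LOSS;
      }
      cluster.aliases.put(canonical, staged);
    }
    return PromoteOutcome.PROMOTED;
  }

  // Data loss outranks ordinary promotion failures.
  static JobStatus determineStatus(Set<String> failedPromotions, Set<String> dataLossPromotions) {
    if (!dataLossPromotions.isEmpty()) {
      return JobStatus.FAILED;
    }
    if (!failedPromotions.isEmpty()) {
      return JobStatus.COMPLETED_WITH_ERRORS;
    }
    return JobStatus.COMPLETED;
  }
}
```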

Tests:
  - testPromoteEntityIndexEmptyAliasesIsHardFailure
  - testPromoteEntityIndexCanonicalNotDeletedWhenStep1Fails
  - testPromoteEntityIndexThreeStepSwapOrder (InOrder swap1 → delete → swap2)
  - testPromoteEntityIndexFlagsDataLossOnAddAliasFailure
  - SearchIndexExecutor: determineStatusFlagsPromotionFailuresAndDataLoss
  - DistributedIndexingStrategy: determineStatusFlagsPromotionFailuresFromEitherHandler

All 146 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Consolidate finalizeReindex and promoteEntityIndex into one core path

Previously two separate methods with ~80 lines each that did the same thing.
PR #25594 forked the code (per-entity vs end-of-job), PR #27865 added a
critical settings-revert step to one half but missed the other — that was
the original regression. The duplication itself is the regression source,
so collapse it: both methods are now thin wrappers around a single private
`promote(EntityReindexContext, boolean)` core. Future features land in one
place by construction; can't drift.

Behavior consolidation:
  - Single source of truth for aliasesToAttach: the EntityReindexContext
    fields (existingAliases ∪ canonicalAliases ∪ parentAliases). These are
    populated by recreateIndexFromMapping at stage-create time and read
    aliases from the live index — strictly a superset of what the old
    getAliasesFromMapping derived from IndexMapping.json (preserves
    operator-added aliases).
  - getAliasesFromMapping deleted. promoteEntityIndex no longer fetches
    IndexMapping at promote time; it reads everything from the context the
    caller built.
  - shouldPromote / staged-delete-on-failure / settings revert / empty-
    aliases hard-fail / three-step swap / cleanup / unregister-staged: all
    in one method body now.
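The consolidation amounts to two thin entry points over one private core. In this hypothetical sketch `ReindexContextSketch` keeps only the fields named above, and the core just records what it would do; it is not the real DefaultRecreateHandler.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, trimmed-down EntityReindexContext with the alias fields named above.
class ReindexContextSketch {
  final String canonicalIndex;
  final String stagedIndex;
  final Set<String> existingAliases;
  final Set<String> canonicalAliases;
  final Set<String> parentAliases;

  ReindexContextSketch(String canonicalIndex, String stagedIndex, Set<String> existingAliases,
      Set<String> canonicalAliases, Set<String> parentAliases) {
    this.canonicalIndex = canonicalIndex;
    this.stagedIndex = stagedIndex;
    this.existingAliases = existingAliases;
    this.canonicalAliases = canonicalAliases;
    this.parentAliases = parentAliases;
  }
}

class ConsolidatedPromoteSketch {
  final List<String> actions = new ArrayList<>();

  // Both public entry points are thin wrappers, so they cannot drift apart.
  void finalizeReindex(ReindexContextSketch ctx, boolean success) {
    promote(ctx, success);
  }

  void promoteEntityIndex(ReindexContextSketch ctx, boolean success) {
    promote(ctx, success);
  }

  // Single source of truth: the union of existing, canonical and parent aliases.
  static Set<String> aliasesToAttach(ReindexContextSketch ctx) {
    Set<String> all = new LinkedHashSet<>(ctx.existingAliases);
    all.addAll(ctx.canonicalAliases);
    all.addAll(ctx.parentAliases);
    return all;
  }

  private void promote(ReindexContextSketch ctx, boolean success) {
    if (!success) {
      actions.add("delete-staged:" + ctx.stagedIndex);
      return;
    }
    Set<String> aliases = aliasesToAttach(ctx);
    if (aliases.isEmpty()) {
      actions.add("ALIAS_PROMOTE_FAILED phase=empty-aliases");
      return;
    }
    actions.add("swap:" + ctx.canonicalIndex + "->" + ctx.stagedIndex + " " + aliases);
  }
}
```

The parity test is then just "feed the same context to both entry points and compare".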

Caller-side semantic change: any code path that called promoteEntityIndex
with a context that had populated existingAliases/canonicalAliases/
parentAliases (which is all production callers: DistributedSearchIndexExecutor
and SearchIndexExecutor both already populate them) is unaffected.
A caller that built a bare context with only entity/canonical/staged set
would previously have re-derived aliases from IndexMapping; now it hits
the empty-aliases hard fail. This is strictly safer (fail loud beats
silent skip-with-success) and we've audited callers.

Tests:
  - All 17 existing PromoteEntityIndexTests updated to populate the context
    with the alias fields they previously depended on getAliasesFromMapping
    to produce. One test ("Should handle null indexMapping gracefully")
    rewritten to "Empty aliases on context is handled" — same behavior,
    new wording for the new model.
  - Old GetAliasesFromMappingTests nested class deleted — exercised the
    removed method.
  - New EntryPointParityTests nested class with 3 tests that explicitly
    run the same EntityReindexContext through both finalizeReindex and
    promoteEntityIndex and assert byte-for-byte identical alias state,
    deleted-indices set, and failure-tracking fields. These pin the two
    entry points together against future drift.

Integration test:
  - Added perEntityPromotionIsIdempotentAcrossRepeatedRuns to
    SearchIndexAliasPromotionIT. Triggers the app twice, asserts the
    second run produces a different *_rebuild_* concrete index and that
    live settings still apply — exercises the full
    pre-existing-canonical → three-step-swap path which is what production
    actually does.

Total: 160 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address PR review: post-state checks, FAILED listener, hermetic IT, InOrder

Five additional behavioral fixes from Copilot review on PR #27920.

1. delete-canonical and addAliases: detect failure via post-state, not via
   try/catch. ElasticSearchIndexManager#deleteIndexWithBackoff and
   #addAliasesInternal both swallow transport exceptions and return void
   (same shape on the OS side), so the existing try/catch could never
   observe a failed delete or alias-add. After deleteIndexWithBackoff,
   verify the canonical no longer exists; after addAliases, verify the
   canonical-name alias is actually attached. If either post-condition
   fails, log [ALIAS_PROMOTE_FAILED reason=delete-not-acknowledged] /
   [reason=alias-not-attached] and mark the promotion failed (data loss
   for the alias-not-attached path).

2. SearchIndexExecutor.executeSingleServer: wire onJobFailed for the new
   FAILED status. Previously the listener chain only had callbacks for
   COMPLETED / COMPLETED_WITH_ERRORS / STOPPED, so promotion-driven FAILED
   ended without populating jobData.failure or notifying observers. Pass
   in an IllegalStateException naming the data-loss entities so the app
   run record carries the right failure context.

3. SearchIndexAliasPromotionIT trigger payload: explicitly set
   liveIndexSettings (1s/1/request), liveIndexSettingsByEntity (empty),
   and useDistributedIndexing=true. /v1/apps/trigger/X merges the payload
   into the persisted config rather than replacing it, so without these
   the test could be affected by previous local config or silently exercise
   the single-server path. The hard-coded post-promotion assertions are
   now anchored to values the test itself supplies.

4. testPromoteEntityIndexForceMergesWhenConfigured: replace standalone
   verify() with InOrder(forceMerge → swapAliases) so a refactor that
   swaps aliases first and merges afterward fails the test instead of
   passing.
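Why a try/catch cannot see these failures, and what the post-state check buys instead, in a hypothetical sketch: `SwallowingIndexManager` imitates the swallow-and-return-void behaviour described in item 1, and `deleteAndVerify` checks the post-condition rather than waiting for an exception that never arrives.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical index manager whose delete swallows transport errors and returns void.
class SwallowingIndexManager {
  final Set<String> indices = new HashSet<>();
  boolean transportBroken = false;

  void deleteIndexWithBackoff(String index) {
    try {
      if (transportBroken) {
        throw new RuntimeException("transport error");
      }
      indices.remove(index);
    } catch (RuntimeException e) {
      // swallowed: a caller's try/catch can never observe this failure
    }
  }

  boolean indexExists(String index) {
    return indices.contains(index);
  }
}

class PostStateCheckSketch {
  // Returns true only when the post-condition (index gone) actually holds.
  static boolean deleteAndVerify(SwallowingIndexManager manager, String index) {
    manager.deleteIndexWithBackoff(index);
    return !manager.indexExists(index);
  }
}
```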

All 148 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Wrap post-state checks: indexExists / getAliases throws no longer escape

Post-state verification I added in 9a7fa49494 (indexExists after delete,
getAliases.contains(canonical) after add) called the search client
directly. If those calls themselves threw — network timeout, transport
error — the exception escaped promoteWithDeferredCanonicalDelete and was
caught by the outer phase=exception handler with markPromotionFailed(
dataLoss=false). For the getAliases case the canonical index has already
been deleted at that point, so dataLoss=false misclassifies a real data-
unavailability state.

Three small helpers:
  - safeIndexExists(client, index, entityType): gate-time check; returns
    false on throw (conservative — skip delete attempt; if canonical
    actually exists, step-3 addAliases will fail with name collision and
    the alias-attached post-check will record the right blast radius).
  - checkIndexExists(client, index): tristate Boolean for post-delete
    check; null on throw means "couldn't determine state".
  - checkAliasAttached(client, staged, alias): tristate Boolean for
    post-add check; null on throw means "couldn't determine state".

Caller logic:
  - delete-canonical post-check returning null → markPromotionFailed(
    dataLoss=false). Conservative: we don't know if delete actually took.
  - add-aliases post-check returning null → markPromotionFailed(
    dataLoss=true). Canonical IS deleted; alias state unknown is the
    worst case.
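The tristate helpers and the caller rules above can be sketched as follows. The names `probe`, `afterDelete` and `afterAddAliases` are hypothetical, and the classification follows the rules as stated in this commit (a later commit in the same PR refines the null case using canonicalDeleted).

```java
import java.util.function.Supplier;

enum PostCheckResult { OK, FAILED, FAILED_DATA_LOSS }

class TristateProbeSketch {
  // Wraps a probe call: TRUE/FALSE when it answers, null when the probe itself throws.
  static Boolean probe(Supplier<Boolean> call) {
    try {
      return call.get();
    } catch (RuntimeException e) {
      return null; // "couldn't determine state"
    }
  }

  // Post-delete check: still present or unknown -> failed, but not data loss.
  static PostCheckResult afterDelete(Boolean canonicalStillExists) {
    if (canonicalStillExists == null || canonicalStillExists) {
      return PostCheckResult.FAILED;
    }
    return PostCheckResult.OK;
  }

  // Post-add check: canonical is already gone, so missing or unknown -> data loss.
  static PostCheckResult afterAddAliases(Boolean aliasAttached) {
    if (aliasAttached == null || !aliasAttached) {
      return PostCheckResult.FAILED_DATA_LOSS;
    }
    return PostCheckResult.OK;
  }
}
```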

Tests:
  - testPromoteEntityIndexHandlesIndexExistsPostCheckThrow: gate returns
    true, post-delete check throws → failed but NOT data loss.
  - testPromoteEntityIndexHandlesGetAliasesPostCheckThrow: post-add check
    throws → failed AND data loss (canonical already gone).

Per gitar-bot review on PR #27920. All 40 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address Copilot review 4232747647: positive-evidence dataLoss, hermetic IT

Six findings from copilot-pull-request-reviewer.

1+4. Track canonicalDeleted via positive evidence only (DefaultRecreateHandler).
   ES/OS indexExists/getAliases/listIndicesByPrefix all swallow transport
   errors and return false/empty. A "negative" probe result cannot be
   distinguished from "probe failed". The previous shape blindly trusted
   probe values and could misclassify a transient failure as data loss
   when canonical was actually still serving (gate-says-no-but-actually-yes
   case) or, conversely, as not-data-loss when canonical was actually
   gone. Now: a `canonicalDeleted` boolean defaults to false and only
   flips to true after both the delete call and a positive post-state
   check confirm the index is gone. dataLoss classification uses this
   flag — never claim data loss without positive evidence canonical was
   deleted. Added regression test
   `testPromoteEntityIndexAmbiguousGateProbeIsNotDataLoss` for the gate-
   ambiguity case.

2. Wire onJobFailed in DistributedIndexingStrategy.
   Previously only SearchIndexExecutor emitted the listener callback for
   FAILED. The distributed strategy returned the status but never invoked
   listeners.onJobFailed, so jobData.failure stayed empty and the
   AppRunRecord/WebSocket update had no failure context. Now mirrors the
   single-server behavior with an IllegalStateException naming the
   data-loss entities.

3. IT: assertEquals("request", durability) instead of assertNotEquals
   ("async"). The non-equals assertion would pass if a silent translog-
   revert drop left durability at any non-async cluster default, missing
   the regression. Pin the exact configured value.

5. IT: assert exactly one concrete index resolves the alias, not
   at-least-one. A broken swap that leaves the alias attached to BOTH
   the pre-existing live index AND the new *_rebuild_* index would
   satisfy "any rebuild present" but duplicate search results in
   production. Use assertEquals(1, settingsByIndex.size()).

6. IT hermeticity: snapshot/restore SearchIndexingApplication's
   appConfiguration. /v1/apps/trigger/{name} merges payload into the
   persisted config, so without restore later tests in the suite
   inherit this test's bulkIndexSettings / liveIndexSettings /
   useDistributedIndexing values — making suite ordering change what
   they exercise. Both IT methods now do a try/finally with
   snapshotAppConfig() + restoreAppConfig().
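The positive-evidence rule from item 1+4 is essentially a one-way flag. This is a hypothetical sketch, not the handler's code; in particular, how "confirmed gone" is obtained is left outside the sketch and passed in as a boolean.

```java
class PositiveEvidenceSketch {
  // Defaults to false; only flips on positive evidence that canonical is gone.
  private boolean canonicalDeleted = false;

  // deleteIssued: the delete call returned.
  // confirmedGone: a post-state check positively confirmed the index no longer exists.
  void recordDeleteAttempt(boolean deleteIssued, boolean confirmedGone) {
    if (deleteIssued && confirmedGone) {
      canonicalDeleted = true;
    }
  }

  // An alias-attach failure counts as data loss only with that evidence;
  // otherwise it is degraded (canonical may still be serving) and retryable.
  boolean isDataLoss(boolean aliasAttachFailed) {
    return aliasAttachFailed && canonicalDeleted;
  }
}
```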

All 151 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Wait for restore-triggered run to settle in SearchIndexAliasPromotionIT

restoreAppConfig POSTs to /v1/apps/trigger/{name}, which both merges the
body into the persisted config AND starts a new run. Returning without
waiting left SearchIndexingApplication running into the next test class
(AppsResourceIT.test_triggerApp_200), which then timed out for 2 minutes
on "already running".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix AppsResourceIT.waitForAppJobCompletion case mismatch and timeout

The terminal-state check compared status against uppercase "SUCCESS",
"FAILED", "COMPLETED", but appRunRecord.json defines the status enum
in lowercase ("success", "failed", "completed", ...). The check never
matched and the 30s wait silently fell through to the catch block,
making it a no-op. test_triggerApp_200 then relied on its 2-minute
"already running" trigger retry, which timed out whenever a longer
reindex (e.g. SearchIndexingFieldsParityIT's "all entities" reindex)
was still in flight.

Switch the terminal check to "not running and not started"
case-insensitively, and raise the ceiling to 5 minutes so the wait
actually covers a long in-flight reindex.
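A sketch of the corrected terminal check plus a plain polling loop. `AppRunStatusSketch` is hypothetical and the loop is hand-rolled for illustration; the actual IT helper may poll differently. Treating a null status as non-terminal is an assumption.

```java
import java.time.Duration;
import java.util.Locale;
import java.util.function.Supplier;

class AppRunStatusSketch {
  // Terminal = neither "running" nor "started", compared case-insensitively, so the
  // lowercase enum values from appRunRecord.json ("success", "failed", ...) match.
  // Assumption: a null status is treated as not yet terminal.
  static boolean isTerminal(String status) {
    if (status == null) {
      return false;
    }
    String s = status.toLowerCase(Locale.ROOT);
    return !s.equals("running") && !s.equals("started");
  }

  // Polls until the status is terminal or the timeout (5 minutes in the fix) elapses.
  static boolean waitForTerminal(Supplier<String> status, Duration timeout, Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (System.nanoTime() - deadline < 0) { // overflow-safe nanoTime comparison
      if (isTerminal(status.get())) {
        return true;
      }
      try {
        Thread.sleep(pollInterval.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }
}
```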

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Run SearchIndexAliasPromotionIT in the sequential bucket

The test triggers SearchIndexingApplication and waits for it to
complete, but during the parallel-tests slot other classes can also
trigger the same app concurrently. The resulting in-flight run
then leaks into AppsResourceIT.test_triggerApp_200 (which runs in
the sequential slot) and exhausts its 2-minute "already running"
trigger Awaitility.

AppsResourceIT is already in the sequential bucket for the same
reason. Mirror it for SearchIndexAliasPromotionIT across all seven
failsafe profiles (mysql/postgres × elasticsearch/opensearch and the
RDF profile) — include in sequential-tests, exclude from
parallel-tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address Copilot PR review 4233452655

DefaultRecreateHandler.promote: when canonicalIndex is null but
stagedIndex is non-null, the early-return previously skipped the
finally clause that unregisters the staged index. Live writes could
stay routed to the staged index after we bail. Release the
registration explicitly before returning. Extend the existing
testPromoteWithNullCanonicalIndex unit test to assert the unregister
call.

SearchIndexAliasPromotionIT.snapshotAppConfig: distinguish "snapshot
failed" from "config absent". The previous Map.of() return on
exception caused restoreAppConfig to POST an empty body to
/v1/apps/trigger/{name}, which is a no-op merge that silently leaks
this test's bulk/live setting overrides into downstream tests and
starts a spurious app run. Return null on failure so the caller
short-circuits.
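The null-versus-empty distinction can be sketched with two hypothetical helpers; the `Supplier` and `Consumer` stand in for the GET of the app config and the POST to /v1/apps/trigger/{name}.

```java
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

class AppConfigSnapshotSketch {
  // null  -> the snapshot itself failed (state unknown, so do not restore)
  // empty -> the config is genuinely absent
  static Map<String, Object> snapshotAppConfig(Supplier<Map<String, Object>> fetchConfig) {
    try {
      Map<String, Object> config = fetchConfig.get();
      return config == null ? Map.of() : config;
    } catch (RuntimeException e) {
      return null;
    }
  }

  // Returns whether a restore was actually posted.
  static boolean restoreAppConfig(
      Map<String, Object> snapshot, Consumer<Map<String, Object>> postTrigger) {
    if (snapshot == null) {
      // Posting {} would be a no-op merge that still starts a spurious app run.
      return false;
    }
    postTrigger.accept(snapshot);
    return true;
  }
}
```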

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Remove SearchIndexAliasPromotionIT in favor of unit test coverage

The IT triggered the bundled SearchIndexingApplication, which made
it expensive (full reindex per run, ~1-2 minutes) and a source of
bleed-through into AppsResourceIT.test_triggerApp_200 even after
moving to the sequential failsafe bucket.

The same surface is already covered by unit tests:
- DefaultRecreateHandlerTest "Should restore live serving settings on
  staged index before alias swap" (and the per-step swap-order,
  data-loss-flagging, post-state-check tests)
- DistributedSearchIndexExecutorTest verifies the
  withJobData(jobConfig) wiring on the per-entity promotion handler

Drop the IT and revert its pom.xml include/exclude entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address Copilot PR review 4236718653

DefaultRecreateHandler.promote: validate aliasesToAttach BEFORE
applyLiveServingSettings/maybeForceMerge so the empty-aliases error
path skips wasted I/O and segment churn on a staged index that will
never be swapped in. Extend testPromoteWithEmptyContextAliases to
verify updateIndexSettings and forceMerge are not invoked. Adjust
testPromoteEntityIndexUnregistersStagedIndexOnSettingsFailure to
populate canonicalAliases so it still reaches applyLiveServingSettings.
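A minimal sketch of the validate-first ordering. `ValidateFirstSketch` is hypothetical and records the calls it would make instead of issuing them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ValidateFirstSketch {
  final List<String> calls = new ArrayList<>();

  boolean promote(String staged, Set<String> aliasesToAttach) {
    // Validate first: an empty alias set means the staged index will never be
    // swapped in, so skip the settings update and force-merge entirely.
    if (aliasesToAttach.isEmpty()) {
      calls.add("ALIAS_PROMOTE_FAILED phase=empty-aliases");
      return false;
    }
    calls.add("updateIndexSettings:" + staged);
    calls.add("forceMerge:" + staged);
    calls.add("swapAliases:" + staged);
    return true;
  }
}
```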

Refresh checkAliasAttached Javadoc: with positive-evidence
canonicalDeleted tracking, a null result is classified as data loss
only when canonicalDeleted=true; otherwise it is degraded (retryable).
The previous wording claimed every null was data loss, which no longer
matches the call site.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix per-entity promote when canonical is an alias, not a concrete index

On every reindex after the first, the canonical name (e.g.
openmetadata_table_search_index) is an alias on the previous staged
index — not a concrete index. The three-step swap then attempted
deleteIndexWithBackoff(canonicalAlias), which OS/ES rejects with
illegal_argument_exception ("matches an alias, specify the
corresponding concrete indices instead"). The 6-attempt exponential
backoff burned ~31s per entity. With 30+ default entity types the
SearchIndexingApplication ran past CI's 300s setup budget, blocking
every Playwright shard and python-integration setup at "Setup
OpenMetadata Test Environment".

Two-pronged fix in DefaultRecreateHandler.promote:

1. Detect canonicalIsAlias via getIndicesByAlias(canonicalIndex).
   When true, take the new promoteByAtomicAliasSwap path: a single
   swapAliases call atomically moves every alias (parents + canonical)
   from old → new staged. No name collision, no per-entity
   delete-by-alias-name, no degraded/data-loss windows.

2. listIndicesByPrefix returns the canonical alias name itself among
   its results (alongside concrete *_rebuild_* indices). Filter that
   out of oldIndicesToCleanup when canonicalIsAlias, so the cleanup
   loop's deleteIndexWithBackoff doesn't replay the same 31s backoff.
   Keep canonical in cleanup when it is concrete — that's where the
   first-reindex flow drops the original concrete after the three-step
   swap moves aliases off it.

Local repro: SearchIndexingApplication now completes in ~7s instead
of hanging past 300s. New unit test
testPromoteEntityIndexAtomicSwapWhenCanonicalIsAlias locks the new
shape (single swap, no delete-by-alias, old concrete cleaned up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add ALIAS_PROMOTE_BEGIN diagnostic log per entity

Local repro on the CI-equivalent stack (mysql + elasticsearch + sample
data + ingestion) completes in ~30s with all 60 entities going through
the atomic-swap path. CI on the same commit still hangs past 300s.
Add a structured log line at the top of every promote() so the next
CI run shows which entities reach promotion and which shape (atomic
vs three-step) was selected. That pinpoints whether a specific entity
gets stuck or whether reindex never reaches promote().

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Drop heavy alias-promotion refactor; rely on PR #27930 fix already in main

The original "live settings not restored on staged after promote"
regression is fixed by PR #27930 (commit e56abb80d5), which is already
on main: applyLiveServingSettings + maybeForceMerge run before alias
swap, and DistributedSearchIndexExecutor wires job configuration into
the per-entity DefaultRecreateHandler. That minimal fix is sufficient
for the original symptom (newly created entities not searchable until
manual _refresh).

This commit reverts every file we refactored further on top of that
minimal fix back to origin/main:

  - DefaultRecreateHandler.java         — drop deferred-canonical-delete
                                           three-step swap, post-state
                                           checks, dataLoss tracking,
                                           safeIndexExists/checkIndexExists/
                                           checkAliasAttached helpers,
                                           promoteByAtomicAliasSwap path,
                                           ALIAS_PROMOTE_BEGIN diagnostic
  - RecreateIndexHandler.java           — drop getFailedPromotions /
                                           getDataLossPromotions defaults
  - DistributedIndexingStrategy.java    — drop FAILED-status emit and
                                           dataLoss aggregation
  - SearchIndexExecutor.java            — drop FAILED-status emit
  - DistributedSearchIndexExecutor.java — drop @Getter on
                                           recreateIndexHandler
  - The matching unit tests in DefaultRecreateHandlerTest /
    DistributedIndexingStrategyTest / SearchIndexExecutorControlFlowTest /
    DistributedSearchIndexExecutorTest all revert to origin/main.

The branch's only remaining contribution is the AppsResourceIT
case-mismatch fix in waitForAppJobCompletion (a pre-existing bug
discovered while diagnosing).

Reason: the further refactor has consistently failed CI on this branch
since the first commit. Local repro on the CI-equivalent stack
(./docker/run_local_docker.sh -d mysql -m no-ui -s true -i true) showed
the canonical-is-alias fix working end-to-end (~30s, 60 entities), but
CI still timed out at 300s. Without server-side logs from CI we can't
target the remaining gap. PR #27930 already addresses the user-visible
regression, so the safest move is to drop the further refactor and ship
just the case-mismatch fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Skip delete-by-alias-name when canonical is currently an alias

After the first reindex, the canonical name (e.g. table_search_index)
is an alias on the previous staged, not a concrete index. OpenSearch's
listIndicesByPrefix returns the alias name as one of its result keys,
which then drives a deleteIndexWithBackoff(canonicalIndex) attempt
that fails with "[illegal_argument_exception] matches an alias,
specify the corresponding concrete indices instead". The 6-attempt
exponential backoff burns ~31s per entity (1+2+4+8+16s); on a 60-entity
reindex that wastes ~30 minutes in cleanup with search degraded
throughout.
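The backoff arithmetic above can be checked with a short calculation. This sketch assumes the retry delay starts at 1s, doubles each time, and is not applied after the sixth (final) attempt, which matches the 1+2+4+8+16s breakdown. `BackoffMath` is a hypothetical helper, not code from the repository.

```java
/** Hypothetical helper: back-of-the-envelope math for a doubling retry backoff. */
final class BackoffMath {
  private BackoffMath() {}

  /** Total sleep across {@code attempts} tries with delays 1s, 2s, 4s, ... between them. */
  static long totalBackoffSeconds(int attempts, long initialDelaySeconds) {
    long total = 0;
    long delay = initialDelaySeconds;
    for (int i = 1; i < attempts; i++) { // no sleep after the final attempt
      total += delay;
      delay *= 2;
    }
    return total;
  }

  public static void main(String[] args) {
    long perEntity = totalBackoffSeconds(6, 1); // 1+2+4+8+16 = 31
    long sixtyEntities = 60 * perEntity; // 1860 s
    System.out.println(
        perEntity + " s per entity, " + sixtyEntities / 60 + " min for 60 entities");
  }
}
```

60 × 31s is 1860s, so a 60-entity reindex spends about half an hour in failed deletes alone.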

Drop the alias name from oldIndicesToDelete when getIndicesByAlias
proves canonical is an alias right now. The atomic swapAliases call
moves the canonical alias from the old concrete to the new staged in
one step; the underlying old concrete is already in oldIndicesToDelete
and gets cleaned up normally by the post-swap loop. No three-step swap
or deferred-canonical-delete restructure needed.
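A minimal sketch of the guard, using a toy in-memory map from concrete index name to its attached aliases in place of a real cluster. `ReindexCleanupSketch`, `indicesByAlias`, and `oldIndicesToDelete` are hypothetical names. The assumption that the prefix listing also contains the staged index (which must never be deleted) is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: filter the canonical alias name out of the post-swap delete list. */
final class ReindexCleanupSketch {
  private ReindexCleanupSketch() {}

  /** Concrete indices that carry {@code alias}; empty when the name is not currently an alias. */
  static Set<String> indicesByAlias(Map<String, Set<String>> cluster, String alias) {
    Set<String> result = new TreeSet<>();
    for (Map.Entry<String, Set<String>> entry : cluster.entrySet()) {
      if (entry.getValue().contains(alias)) {
        result.add(entry.getKey());
      }
    }
    return result;
  }

  /**
   * Builds the delete list from prefix-listing results. The canonical name is dropped while it
   * is an alias: OS/ES rejects deleting by alias name, and the concrete index behind it is
   * already in the listing.
   */
  static List<String> oldIndicesToDelete(
      Map<String, Set<String>> cluster,
      List<String> prefixMatches,
      String canonical,
      String staged) {
    boolean canonicalIsAlias = !indicesByAlias(cluster, canonical).isEmpty();
    List<String> toDelete = new ArrayList<>();
    for (String name : prefixMatches) {
      if (name.equals(staged)) {
        continue; // never delete the index being promoted
      }
      if (canonicalIsAlias && name.equals(canonical)) {
        continue; // skip the alias name instead of burning the retry backoff on it
      }
      toDelete.add(name);
    }
    return toDelete;
  }
}
```

When `table_search_index` is an alias on `table_search_index_rebuild_1`, only that rebuild index is returned. When `table_search_index` is still concrete (first reindex), it stays in the list.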

Adjusts testFinalizeReindexPromotesPartialData to use a realistic
canonical-concrete setup (no self-alias on its own name — that state
cannot exist in real OS/ES) so the new guard does not misfire on the
existing test fixture.

New unit test testFinalizeReindexSkipsDeleteWhenCanonicalIsAlias locks
the new behavior: when canonical is an alias, deleteIndexWithBackoff
is never called with the canonical name, and the old concrete rebuild
is cleaned up via the swap path.

Verified locally on the CI-equivalent stack
(./docker/run_local_docker.sh -d mysql -m no-ui -s true -i true): both
first reindex (canonical-concrete) and second reindex (canonical-alias)
complete in ~8s with no "matches an alias" errors and clean cleanup
logs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Mohit Yadav <105265192+mohityadav766@users.noreply.github.com>
2026-05-06 15:16:57 -07:00
Rohit Jain
88b932a90d
Added missing operators (Contains, Not Contains) for description (#27913)
* Added missing operators (Contains, Not Contains) for description

* addressed PR comment and fixed unit test

* addressed gitar comment

* fix playwright

* lint fix

* removed dead code
2026-05-07 00:05:02 +05:30
Sriharsha Chintalapani
3487bfbcaa
fix(reindex): Stop button + O(N²) cursor init in distributed mode (#27927)
* Distributed reindex: fix Stop button + O(N²) cursor init

Two independent bugs in the distributed search-index pipeline that surface
together as "stop does nothing, distributed mode hangs even on a single
server" in production. Reproduced both with new tests and fixed.

Bug 1 — requestStop only cancels PENDING partitions
====================================================

DistributedSearchIndexCoordinator.requestStop() called
partitionDAO.cancelPendingPartitions(jobId), whose SQL is:

  UPDATE search_index_partition SET status='CANCELLED'
   WHERE jobId = :jobId AND status = 'PENDING'

PROCESSING rows are untouched. workerExecutor.shutdownNow() (added in
PR #27876) interrupts the worker threads, but the partition rows the
threads were holding stay PROCESSING in the DB. checkAndUpdateJobCompletion
needs pending.isEmpty() && processing.isEmpty() to flip STOPPING → STOPPED;
PROCESSING never empties because nothing updates the rows.

Symptom: the strategy's monitorDistributedJob loops forever waiting for
a terminal status, the AppRunRecord never finalizes, and the UI keeps
showing "Running" with a ticking timer based on now() - startTime.

Fix:
- New SQL `cancelInFlightPartitions` covering PENDING + PROCESSING.
- requestStop calls cancelInFlightPartitions and immediately invokes
  checkAndUpdateJobCompletion to drive STOPPING → STOPPED in-call rather
  than waiting for the next monitor tick.
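The Stop fix can be modelled in memory to show why PROCESSING rows block the STOPPING → STOPPED transition. This is a hypothetical sketch: the enums and `StopSketch` stand in for the partition table and job row. The `IN ('PENDING', 'PROCESSING')` filter is an assumed reading of what `cancelInFlightPartitions` covers, not its actual SQL.

```java
import java.util.ArrayList;
import java.util.List;

enum PartitionStatus { PENDING, PROCESSING, COMPLETED, FAILED, CANCELLED }

enum JobStatus { RUNNING, STOPPING, STOPPED }

/** Hypothetical in-memory stand-in for search_index_partition rows plus the job row. */
final class StopSketch {
  final List<PartitionStatus> partitions = new ArrayList<>();
  JobStatus job = JobStatus.RUNNING;

  /** Old behaviour: mirrors "... AND status = 'PENDING'". */
  void cancelPendingPartitions() {
    partitions.replaceAll(s -> s == PartitionStatus.PENDING ? PartitionStatus.CANCELLED : s);
  }

  /** Fixed behaviour (assumed): cancels both PENDING and PROCESSING rows. */
  void cancelInFlightPartitions() {
    partitions.replaceAll(
        s -> (s == PartitionStatus.PENDING || s == PartitionStatus.PROCESSING)
            ? PartitionStatus.CANCELLED
            : s);
  }

  /** STOPPING flips to STOPPED only once nothing is pending or processing. */
  void checkAndUpdateJobCompletion() {
    boolean inFlight =
        partitions.stream()
            .anyMatch(s -> s == PartitionStatus.PENDING || s == PartitionStatus.PROCESSING);
    if (job == JobStatus.STOPPING && !inFlight) {
      job = JobStatus.STOPPED;
    }
  }

  /** Stop request: cancel rows, then re-check completion in the same call. */
  void requestStop(boolean useFixedSql) {
    job = JobStatus.STOPPING;
    if (useFixedSql) {
      cancelInFlightPartitions();
    } else {
      cancelPendingPartitions();
    }
    checkAndUpdateJobCompletion();
  }
}
```

With one PROCESSING partition, `requestStop(false)` leaves the job in STOPPING (the old behaviour). `requestStop(true)` reaches STOPPED before returning.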

Test: testRequestStop_ProcessingPartitionsTransitionToStopped reproduces
the production scenario (RUNNING job + PROCESSING partitions, user clicks
Stop), verifies STOPPED is written before requestStop returns. Verified
to fail with the old SQL.

Bug 2 — Distributed reader is O(N²) due to getCursorAtOffset
============================================================

PartitionWorker.initializeKeysetCursor calls
EntityRepository.getCursorAtOffset(filter, partitionStart) per partition.
Underneath that's `dao.listAfter(filter, 1, partitionStart)` — SQL
`LIMIT 1 OFFSET partitionStart`, which is O(partitionStart) every call.

With partitionSize=10000 and 581k records, partitions start at
offsets 0, 10k, 20k, ..., 570k. Total cursor-init scan cost is
0+10k+20k+...+570k ≈ 16.5M rows scanned just to find each partition's
starting cursor. The single-server reader (KeysetBatchReader) scans
~581k rows, so the distributed path does ~28× more DB work per job;
multi-worker parallelism amplifies that into the 140× wall-clock slowdown
observed in production (1.4k r/s single-server reader vs 10 r/s
distributed reader on the same dataset).
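The quadratic cost follows from summing the OFFSET values. With k partitions of size s, cursor init scans s·(0 + 1 + ... + (k-1)) = s·k(k-1)/2 rows, which grows as roughly N²/(2s) for N records. The sketch below reproduces the figure above using the 58 starting offsets the message lists (0 through 570k). It assumes each `LIMIT 1 OFFSET p` query costs about p rows. `CursorInitCost` is a hypothetical helper.

```java
/** Hypothetical helper: rows scanned by OFFSET-based cursor init vs a single keyset walk. */
final class CursorInitCost {
  private CursorInitCost() {}

  /** Sum of OFFSET values 0, s, 2s, ..., (k-1)s, i.e. s * k(k-1)/2. */
  static long offsetScanRows(long partitions, long partitionSize) {
    return partitionSize * partitions * (partitions - 1) / 2;
  }

  public static void main(String[] args) {
    long offsetRows = offsetScanRows(58, 10_000); // offsets 0, 10k, ..., 570k
    long keysetRows = 581_000; // one pass over the table
    System.out.printf(
        "%d rows vs %d rows (%.1fx)%n", offsetRows, keysetRows, (double) offsetRows / keysetRows);
  }
}
```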

Fix:
- DistributedSearchIndexCoordinator.precomputePartitionStartCursors:
  one keyset walk per entity type at job initialization time, batching
  in 10k chunks and recording the cursor at every partition's rangeStart.
  O(N) total reads per entity type.
- DistributedSearchIndexCoordinator.getPartitionStartCursor: O(1) lookup
  from the precomputed map.
- PartitionWorker.initializeKeysetCursor consults the cache first; falls
  back to the existing OFFSET path on a miss (recovery scenarios where
  another server initialized the partitions and this server picks one up).
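A simplified sketch of the precompute step follows. It does one keyset walk over a sorted key set in fixed-size batches and records the last key before each partition boundary, so a worker can resume with a `key > cursor` query. The in-memory `NavigableSet` stands in for the entity table. `PartitionCursorSketch`, `fetchAfter`, and `precomputePartitionStartCursors` are hypothetical names; this commit does not show the real method's signature or cursor format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.SortedSet;

/** Hypothetical sketch: precompute partition start cursors with a single keyset walk. */
final class PartitionCursorSketch {
  private PartitionCursorSketch() {}

  /** Keyset page: up to {@code limit} keys strictly after {@code after} (null = from start). */
  static List<String> fetchAfter(NavigableSet<String> table, String after, int limit) {
    SortedSet<String> tail = after == null ? table : table.tailSet(after, false);
    List<String> page = new ArrayList<>(limit);
    for (String key : tail) {
      if (page.size() == limit) {
        break;
      }
      page.add(key);
    }
    return page;
  }

  /**
   * Walks the table once in batches and records, for every partition range start, the last key
   * before it. Total reads are O(N) instead of one OFFSET scan per partition.
   */
  static Map<Long, String> precomputePartitionStartCursors(
      NavigableSet<String> table, long partitionSize, int batchSize) {
    Map<Long, String> cursors = new HashMap<>();
    cursors.put(0L, null); // first partition starts from the beginning (no cursor)
    String cursor = null;
    long offset = 0;
    while (true) {
      List<String> page = fetchAfter(table, cursor, batchSize);
      if (page.isEmpty()) {
        break;
      }
      for (String key : page) {
        offset++;
        if (offset % partitionSize == 0 && offset < table.size()) {
          cursors.put(offset, key); // partition at `offset` resumes after this key
        }
      }
      cursor = page.get(page.size() - 1);
    }
    return cursors;
  }
}
```

Lookup is then a single map access on the partition's range start. `containsKey` distinguishes a cache miss (where the OFFSET fallback would apply) from the first partition's null cursor.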

Test: initializeKeysetCursorHitsPrecomputedCacheAndSkipsOffsetFallback
verifies the cache hit short-circuits the OFFSET fallback (using
verify(repository, never()).getCursorAtOffset(...)).
2026-05-06 10:55:56 -07:00
Ram Narayan Balaji
8dd98fa765
fix(it): Stabilize Flaky integration tests (#27546)
* fix(it): stabilize three flaky integration tests

- TagResourceIT.test_searchTagByClassificationDisplayName: raise Awaitility
  timeout from 30s to 90s — under full-suite concurrent load the tag search
  index can lag well past 30s before the tag is discoverable by classification
  display name
- GlossaryOntologyExportIT.testExportGlossaryAsRdfXml: replace legacy
  model.write("RDF/XML") with RDFDataMgr.write(RDFXML_PLAIN) — the legacy
  Jena API attempts external DTD/entity resolution from w3.org, hanging ~104s
  in network-isolated CI before the client times out at 60s; RIOT writes
  purely in-memory with no network I/O
- SearchResourceIT.testExportWithFromAndSizeForPagination: add _id as a
final tiebreaker sort on export requests in both Elasticsearch and
  OpenSearch managers; from/size pagination without a unique tiebreaker
  produces duplicate rows across pages when concurrent CI tests mutate the
  same index between requests; also deduplicate the redundant name.keyword
  secondary sort when the caller already sorts by name.keyword

* fix(search): use id.keyword instead of _id for export sort tiebreaker

_id is an Elasticsearch meta-field; sorting on it requires fielddata,
which is disabled by default. Use the indexed id.keyword sub-field
instead: it is a proper keyword field with doc values and is sortable
without any cluster setting changes.
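The sort order these commits converge on can be sketched as follows. The caller's field comes first, then `name.keyword` as a secondary sort (skipped when it duplicates the caller's field), then `id.keyword` as the unique final tiebreaker. Without a unique last key, documents with equal sort values have no guaranteed order across requests, so consecutive from/size pages can overlap. `ExportSortSketch` and its method are hypothetical, not the actual Elasticsearch/OpenSearch manager code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: ordered sort fields for an export request. */
final class ExportSortSketch {
  private ExportSortSketch() {}

  static List<String> exportSortFields(String callerSortField) {
    // LinkedHashSet keeps insertion order and drops a repeated name.keyword
    Set<String> fields = new LinkedHashSet<>();
    if (callerSortField != null) {
      fields.add(callerSortField);
    }
    fields.add("name.keyword"); // secondary sort
    fields.add("id.keyword"); // unique final tiebreaker with doc values, used instead of _id
    return new ArrayList<>(fields);
  }
}
```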

* fix(it): retry pagination assertion in Awaitility to tolerate transient index shifts

from/size pagination on a shared search index can return duplicate rows
across two consecutive requests when concurrent tests mutate the same
index in between. Wrapping both page fetches and the assertion in
untilAsserted lets the check retry until the index stabilises rather
than failing on the first transient collision.
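The commit relies on Awaitility's `untilAsserted`. The same retry-until-pass idea can be sketched in plain Java so the control flow is visible. `RetryUntilAsserted` is a hypothetical helper: it re-runs the whole check (both page fetches plus the assertion) until the check passes, and rethrows the last `AssertionError` once the timeout elapses.

```java
import java.time.Duration;

/** Hypothetical plain-Java version of the retry-until-asserted pattern. */
final class RetryUntilAsserted {
  private RetryUntilAsserted() {}

  /** Re-runs {@code check} until it passes or {@code timeout} elapses. */
  static void untilAsserted(Runnable check, Duration timeout, Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      try {
        check.run();
        return; // the check passed
      } catch (AssertionError failure) {
        if (System.nanoTime() >= deadline) {
          throw failure; // out of time: surface the most recent failure
        }
        try {
          Thread.sleep(pollInterval.toMillis());
        } catch (InterruptedException interrupted) {
          Thread.currentThread().interrupt(); // preserve interrupt status
          throw failure;
        }
      }
    }
  }
}
```

Putting both fetches inside the retried block matters. Retrying only the assertion would keep comparing the same two stale pages.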

* revert(search): drop id.keyword tiebreaker; rely on test-side Awaitility retry

* fix(search): strengthen pagination test assertions and restore id.keyword tiebreaker sort

* fix(it): revert RdfRepository prod change; increase GlossaryOntologyExportIT timeout to 150s for Jena DTD stall in CI

* fix(it): restore tag search index aliases in IndexTemplateIT after index deletion

testDocUpdateOnDeletedIndexUsesTemplateNotAutoInference deletes the physical
openmetadata_tag_search_index and previously restored it with a bare PUT — leaving
all aliases (openmetadata_tag, openmetadata_classification, openmetadata_all)
missing for the remainder of the run. This caused TagResourceIT.checkCreatedEntity
to time out (searches on tag_search_index hit an empty bare index) and delete_by_query
cleanup ops to fail with index_not_found_exception on openmetadata_tag.

Fix: replace the bare PUT with Entity.getSearchRepository().createIndex() which
recreates the physical index with proper OpenMetadata mappings and restores all aliases.

* fix(it): isolate IndexTemplateIT tag test to avoid wiping production search index

testDocUpdateOnDeletedIndexUsesTemplateNotAutoInference was deleting the
production openmetadata_tag_search_index backing index, racing with
TagResourceIT.test_searchTagByClassificationDisplayName which polls that
index for 90s. Use a test-scoped index name matching the template pattern
instead, consistent with the other tests in this class.

* fix(it): make testClaimPendingIncludesRetryStatuses race-tolerant

The production SearchIndexRetryWorker (4 daemon threads, 5s poll) races
the test by calling the same global claimPending SQL. Replace the brittle
size-based assertion with an Awaitility loop that checks claimedAt != null
for each inserted record — proving claimPending's SQL filter accepted the
record's status regardless of which thread won the race.

* fix(it): avoid stale entityStatus in patch_addDeleteReviewers

The GlossaryTermApprovalWorkflow fires asynchronously when reviewers are
added, setting entityStatus=IN_REVIEW. The final patch sent the stale
entityStatus=APPROVED from the previous response, causing a spurious
IN_REVIEW→APPROVED transition in the diff which requires the caller to
be a reviewer — admin is not. Re-fetch the entity before the reviewer
removal so the diff contains only the reviewer change.

* fix(it): handle claimedAt reset in testClaimPendingIncludesRetryStatuses

updateFailureAndRetryCount sets claimedAt=NULL after the worker processes
a record. Add retryCount > 0 as a secondary proof-of-claim signal so
records that were claimed, processed, and had claimedAt reset are still
counted — covers the FAILED exhaustion path and intermediate PENDING_RETRY_*
states where claimedAt is temporarily null.

* fix(it): avoid governance workflow race in testApplyFeedback_withRecognizerMetadata

repository.create() publishes a ChangeEvent that triggers
ApplyRecognizerFeedbackImpl asynchronously. That workflow call races
with the direct applyFeedback below: by the time the workflow runs, the
GENERATED tag is already removed by the direct call, so
getRecognizerIdFromTagLabel returns null and the workflow falls back to
ALL recognizers, contaminating recognizer2.

Fix: insert directly to DAO (bypassing publishChangeEvent) so the
governance workflow is never triggered for this unit-level test.

* fix(it): handle worker deleteByEntity path in testClaimPendingIncludesRetryStatuses

The worker's processRecord takes the delete path (removeStaleEntityById +
deleteByEntity) when resolveEntityReference returns null but entityId is
non-empty — which applies to our fake UUID test records. If the worker
wins and deletes a record, findByStatus finds nothing and the assertion
fails.

Fix: track which IDs are still visible in any status. An ID absent from
all statuses was deleted by the worker after a successful claim —
deleteByEntity is only reached after claimPending accepted the record,
so absence is equally valid proof that claimPending's SQL filter worked.
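Taken together, the three testClaimPendingIncludesRetryStatuses commits above accept any of three signals as proof that claimPending picked a record up. The following is a minimal sketch of that predicate. `RetryRow` is a hypothetical snapshot of a row as the test sees it (null when the row no longer exists in any status), and `PENDING_RETRY_1` is an illustrative status name.

```java
/** Hypothetical snapshot of a retry-queue row; the test sees null when the row is gone. */
record RetryRow(String status, Long claimedAt, int retryCount) {}

final class ClaimProof {
  private ClaimProof() {}

  /**
   * True if any signal shows the record was claimed: it was deleted (deleteByEntity runs only
   * after a claim), claimedAt is still set, or retryCount was bumped after processing.
   */
  static boolean provesClaimed(RetryRow row) {
    if (row == null) {
      return true; // absent from every status: deleted after a successful claim
    }
    if (row.claimedAt() != null) {
      return true; // claim still visible
    }
    return row.retryCount() > 0; // claimed, processed, then claimedAt reset to NULL
  }
}
```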

* fix(it): re-fetch before reviewer patches in test_glossaryTermReviewersMultipleUpdates

Same root cause as patch_addDeleteReviewers: GlossaryTermApprovalWorkflow
fires asynchronously after reviewers are added, setting entityStatus=IN_REVIEW.
Subsequent patches using the stale APPROVED status from the previous response
trigger a spurious IN_REVIEW→APPROVED transition, rejected because admin is
not a reviewer. Re-fetch before each subsequent patch to avoid the stale status.

---------

Co-authored-by: Mohit Yadav <105265192+mohityadav766@users.noreply.github.com>
2026-05-06 17:50:13 +00:00
Chirag Madlani
83b9e55122
fix(test): flaky container and activity task specs (#27942)
2026-05-06 15:25:17 +00:00
Mohit Yadav
e56abb80d5
Fix Entity Promotion issue (#27930)
2026-05-06 15:20:49 +02:00