Mirror of https://github.com/open-metadata/OpenMetadata, synced 2026-05-24 09:39:11 +00:00
* fix(rdf): dedupe lineage edges and broaden PROV-O coverage
The RDF Knowledge Graph endpoint was emitting two edges per lineage
relationship — once as `om:UPSTREAM` (forward) and once as
`prov:wasDerivedFrom` (reverse) — because the parser preserved each
predicate's native subject/object orientation instead of canonicalizing
both into a single `(upstream, downstream)` edge.
Also extend PROV-O coverage so external SPARQL clients can use the W3C
Provenance vocabulary directly:
- `prov:Entity` / `prov:Activity` / `prov:Agent` class typing on
datasets / pipelines / users
- `prov:wasAttributedTo` mirror of `om:owners`
- `prov:generated` (inverse of existing `wasGeneratedBy`) and `prov:used`
on lineageDetails so the Entity → Activity → Entity chain is complete
- `prov:hadPlan` + `prov:Plan` for SQL transformation recipes
- `prov:startedAtTime` / `prov:endedAtTime` on Activity instances
- `prov:wasAssociatedWith` Activity → Agent linking
- `prov:invalidatedAtTime` on soft-deleted entities
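The sketch below shows roughly which triples make up the Entity → Activity → Entity chain for a single lineage edge, written as N-Triples strings using only the JDK. The class name, the IRIs in the usage note, and the exact triple set are illustrative assumptions, not the real `RdfRepository` output. Timestamps, `prov:hadPlan`, and agents are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: PROV-O triples for one lineage edge, not OpenMetadata's actual emitter. */
final class ProvChainSketch {
    static final String PROV = "http://www.w3.org/ns/prov#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    /** Formats one N-Triples statement whose subject, predicate and object are all IRIs. */
    static String triple(String s, String p, String o) {
        return "<" + s + "> <" + p + "> <" + o + "> .";
    }

    /** Links upstream Entity -> Activity -> downstream Entity, plus class typing. */
    static List<String> lineageTriples(String upstream, String downstream, String activity) {
        List<String> out = new ArrayList<>();
        out.add(triple(upstream, RDF_TYPE, PROV + "Entity"));
        out.add(triple(downstream, RDF_TYPE, PROV + "Entity"));
        out.add(triple(activity, RDF_TYPE, PROV + "Activity"));
        out.add(triple(activity, PROV + "used", upstream));             // Activity consumed the upstream Entity
        out.add(triple(activity, PROV + "generated", downstream));      // Activity produced the downstream Entity
        out.add(triple(downstream, PROV + "wasGeneratedBy", activity)); // inverse of prov:generated
        out.add(triple(downstream, PROV + "wasDerivedFrom", upstream)); // Entity-level shortcut
        return out;
    }
}
```

For example, calling `lineageTriples` with IRIs for two tables and one lineage-details resource returns seven statements. A PROV-O client can then follow `prov:used` and `prov:generated` from the Activity to reach both tables.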
Other RDF cleanups in the same area:
- LineageDetails URIs are now deterministic (driven by from/to ids
instead of a timestamp), so re-indexing collapses duplicate Activity
resources via the existing DELETE+INSERT idempotency
- Skip emitting the redundant `om:owners` JSON-string literal — the
mapped path already produces clean `om:hasOwner <agent>` triples
- Skip empty `[]` array literals in the unmapped path
- Propagate failures from `RdfRepository.{addRelationship,
addLineageWithDetails, bulkAddRelationships,
bulkAddGlossaryTermRelations}` instead of silently swallowing them,
so downstream callers can surface the failure
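Deterministic URIs matter for idempotency. When the details URI is a pure function of the (fromId, toId) pair, re-indexing the same edge replaces the existing resource instead of adding a second one. This is a minimal sketch. The URI layout and the map standing in for Fuseki's DELETE+INSERT are assumptions, not the actual scheme.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch: deterministic lineage-details URIs make re-indexing idempotent. */
final class LineageDetailsUriSketch {
    static final String BASE = "https://example.org/om/lineageDetails/"; // assumed base IRI

    /** Depends only on the edge endpoints, so re-indexing the same edge yields the same URI. */
    static String detailsUri(UUID fromId, UUID toId) {
        return BASE + fromId + "_" + toId;
    }

    /** Stand-in for the triple store; upsert mimics the DELETE+INSERT pattern. */
    static final class Store {
        final Map<String, String> resources = new LinkedHashMap<>();

        void upsert(String uri, String payload) {
            resources.remove(uri);       // DELETE whatever was stored under this URI
            resources.put(uri, payload); // INSERT the fresh version
        }
    }
}
```

If the URI embedded a timestamp instead, two re-indexes of the same edge would leave two Activity resources behind.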
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(rdf-index-app): surface Fuseki failures in app run record
Per-entity and per-batch failures from the RDF index app used to be
logged via SLF4J only — they never made it into the AppRunRecord, so
the UI/run history showed "completed" even when every entity had
silently failed to write to Fuseki.
- `RdfBatchProcessor.processEntities` now captures the last error per
entity, returns it in `BatchProcessingResult.lastError`, and
accumulates relationship-processing failures into the same result.
- Relationship and lineage processing methods (`processBatchRelationships`,
`processLineageRelationship`, `processGlossaryTermRelations`) return
structured results with failure counts and last-error messages instead
of `void`, so failures are visible to the partition worker.
- `RdfIndexApp` records the failure on `jobData` for both the
distributed and non-distributed code paths, so users see a real
error message in the run history (e.g.
"Failed to write entity X to Fuseki: ConnectException").
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* perf(rdf-index-app): port distributed-mode improvements from SearchIndex
The RDF distributed-indexing fork was lagging behind several SearchIndex
improvements that addressed concrete reliability and throughput issues.
Port them across:
Core perf / reliability
- Precomputed partition start cursors: coordinator walks each entity
once via keyset pagination at job init and caches the boundary cursor
per (jobId, entityType, rangeStart). Workers consult the cache before
falling back to the OFFSET-based path. Eliminates the previous O(N²)
per-partition cursor lookup.
- `cancelInFlightPartitions` + `requestStop` + `checkAndUpdateJobCompletion`
on the coordinator. Stop now cancels both PENDING and PROCESSING
partitions in a single SQL update and immediately drives the job
status from STOPPING → STOPPED, so the UI status no longer hangs
while workers drain.
- Selective field hydration: `RdfPartitionWorker.readEntitiesKeyset`
uses `ReindexingUtil.getSearchIndexFields(entityType)` instead of
`List.of("*")`, avoiding expensive fetchers (e.g. fetchAndSetOwns)
per batch.
- Partition heartbeat thread: a virtual thread refreshes
`lastUpdateAt` every 30s for partitions actively being processed by
this server, so the stale reclaimer no longer interrupts active work.
- `MAX_IN_FLIGHT_PARTITIONS_PER_SERVER = 5` backpressure: claim path
rejects when the server already holds 5 PROCESSING partitions, giving
fair distribution across pods. Verified the existing claim DAO uses
`FOR UPDATE SKIP LOCKED` for both MySQL and Postgres.
- Gate WebSocket stat broadcasts during the STOPPING phase so the
Quartz-scheduler-driven STOPPED status push isn't overwritten.
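Here is a minimal sketch of the precomputed-cursor idea. It assumes entities are ordered by one sortable key and that partitions are fixed-size ranges. The coordinator walks the ordered keys once and records the key just before each partition start, so a worker can resume with `key > cursor` instead of `OFFSET rangeStart`. The cache-key record, the method shapes, and the in-memory list that emulates SQL pagination are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: keyset cursors cached per (jobId, entityType, rangeStart). */
final class PartitionCursorSketch {
    record CacheKey(String jobId, String entityType, int rangeStart) {}

    final Map<CacheKey, String> partitionStartCursors = new HashMap<>();

    /** Coordinator: a single ordered walk records the key just before each partition boundary. */
    void walkAndRecord(String jobId, String entityType, List<String> sortedKeys, int partitionSize) {
        for (int i = partitionSize; i < sortedKeys.size(); i += partitionSize) {
            // Exclusive cursor: the worker reads keys strictly greater than this one.
            partitionStartCursors.put(new CacheKey(jobId, entityType, i), sortedKeys.get(i - 1));
        }
    }

    /** Worker: use the cached cursor when present, otherwise fall back to OFFSET positioning. */
    List<String> readPartition(String jobId, String entityType, List<String> sortedKeys, int rangeStart, int size) {
        String cursor = partitionStartCursors.get(new CacheKey(jobId, entityType, rangeStart));
        List<String> out = new ArrayList<>();
        if (rangeStart == 0 || cursor != null) {
            // Emulates: WHERE key > :cursor ORDER BY key LIMIT :size (no cursor for the first partition)
            for (String key : sortedKeys) {
                if (out.size() == size) break;
                if (cursor == null || key.compareTo(cursor) > 0) out.add(key);
            }
        } else {
            // Emulates: ORDER BY key LIMIT :size OFFSET :rangeStart, which scans rangeStart rows in SQL
            int from = Math.min(rangeStart, sortedKeys.size());
            int to = Math.min(rangeStart + size, sortedKeys.size());
            out.addAll(sortedKeys.subList(from, to));
        }
        return out;
    }
}
```

In a database, the keyset read is an index seek. `OFFSET rangeStart`, by contrast, scans and discards `rangeStart` rows for every partition, and summed across all partitions that cost grows quadratically.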
Multi-server scaffolding (single-pod deployments are unaffected)
- `RdfPollingJobNotifier`: DB-polling discovery for other server pods
to find an in-flight RDF reindex they can join.
- `RdfEntityCompletionTracker`: per-entity-type partition tracking with
callback firing once all partitions for an entity complete, foundation
for early per-entity index promotion.
Tests: precomputed-cursor cache lookup, in-flight backpressure,
cancelInFlight delegation, completion tracker callback semantics,
notifier start/stop.
DAO additions on `rdf_index_partition`:
- `cancelInFlightPartitions(jobId, now)` — covers both PENDING and
PROCESSING in one statement
- `countInFlightPartitionsForServer(jobId, serverId)` — backpressure
- `countPartitionsByStatus(jobId, status)` — used by completion check
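Backpressure comes down to a count check before claiming. The sketch below takes a lookup shaped like `countInFlightPartitionsForServer(jobId, serverId)`, passed in as a function. `mayClaim` is a hypothetical name. Concurrency safety of the claim itself still comes from `FOR UPDATE SKIP LOCKED` in the real DAO, which this sketch does not model.

```java
import java.util.function.BiFunction;

/** Hypothetical sketch of the per-server in-flight cap applied before claiming a partition. */
final class ClaimBackpressureSketch {
    static final int MAX_IN_FLIGHT_PARTITIONS_PER_SERVER = 5;

    /** Returns true when this server may claim another partition for the job. */
    static boolean mayClaim(String jobId, String serverId, BiFunction<String, String, Integer> countInFlight) {
        int inFlight = countInFlight.apply(jobId, serverId);
        // Reject once this server already holds the cap, leaving remaining work for other pods.
        return inFlight < MAX_IN_FLIGHT_PARTITIONS_PER_SERVER;
    }
}
```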
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(ui-apps): hide misleading data on synthetic 'CurrentConfig' row
When an app has no run history, AppRunsHistory fabricated a synthetic
placeholder row that looked like a real run — `runType: "CurrentConfig"`,
a fake `Run At` timestamp pulled from `appData.updatedAt`, an
ever-growing `Duration` (`now − updatedAt`), and an active `Stop` button
that targeted nothing.
Render `--` for `Run At`, `Run Type`, and `Duration` on synthetic rows,
and hide the `Stop` button so users no longer see "Run now → 19-minute
Running with Stop button" when the actual job never registered. Real
app runs are unaffected — they still display `runType` from the
backend (OnDemandJob, Hourly, Daily, Custom, etc.).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(rdf): address PR review findings
Four issues raised in PR #27999 review:
- **Cursor format consistency in walkAndRecord** (bug):
The defensive branch produced cursors via a custom `{name, id}` map
while the regular path used `repo.getCursorValue()`. For entities
with quoted names these encodings diverge — a quoted-name entity
could land in the cache with a cursor incompatible with what the
worker fetches via keyset pagination. Track the last seen entity
reference and run it through `repo.getCursorValue()` in both paths.
`encodeBoundaryCursor` is removed.
- **Adaptive scheduling in RdfPollingJobNotifier** (perf):
The previous implementation woke the scheduler thread every 1s and
short-circuited inside the poll method when idle. Reschedule the
task at the appropriate interval (1s active / 30s idle) when
`setParticipating` flips, so the thread genuinely sleeps when idle.
- **Cursor cache cleanup on startup recovery** (edge case):
`partitionStartCursors` was only evicted by `refreshAggregatedJob`
/ `checkAndUpdateJobCompletion`. If a coordinator crashed mid-job
and never reached either, the cache entry leaked until process
restart. Add `evictStaleCursorCacheEntries()` invoked by
`performStartupRecovery` that drops entries for jobs that no longer
exist in the DB or are already terminal.
- **Consolidate describeError helpers** (quality):
`describeError`, `describeBulkError`, and `describeLineageError` in
`RdfBatchProcessor` all walked the cause chain and formatted a
prefixed message with the same logic. Reduced to a single
`describeError(prefix, error)` plus a thin `describeEntityError`
adapter for the per-entity call site.
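Of the four fixes, the adaptive-scheduling change is the easiest to show in isolation. Whenever participation flips, the periodic task is cancelled and registered again at the new interval. This sketch uses `ScheduledExecutorService` with the 1s / 30s intervals quoted above. Apart from `setParticipating`, the names are hypothetical, and the poll body is a placeholder.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a poller that changes its schedule instead of busy-waking when idle. */
final class AdaptivePollerSketch {
    static final long ACTIVE_MS = 1_000;
    static final long IDLE_MS = 30_000;

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "rdf-job-poller");
            t.setDaemon(true); // do not keep the JVM alive for this background poller
            return t;
        });
    final AtomicInteger polls = new AtomicInteger();
    private ScheduledFuture<?> task;
    private boolean participating;
    private long currentIntervalMs;

    synchronized void start() {
        reschedule(IDLE_MS);
    }

    /** Flipping participation swaps the interval; between runs the thread simply sleeps. */
    synchronized void setParticipating(boolean value) {
        if (value == participating) return;
        participating = value;
        reschedule(value ? ACTIVE_MS : IDLE_MS);
    }

    // Only called while holding this object's monitor.
    private void reschedule(long intervalMs) {
        if (task != null) task.cancel(false); // let a poll that is already running finish
        currentIntervalMs = intervalMs;
        task = scheduler.scheduleWithFixedDelay(this::poll, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    private void poll() {
        polls.incrementAndGet(); // real code would query the DB for an in-flight reindex job here
    }

    synchronized long currentIntervalMs() {
        return currentIntervalMs;
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```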
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(rdf-index-app): avoid double workerExecutor.shutdownNow() in stop()
`stop()` called `workerExecutor.shutdownNow()` inline AND through
`cleanupLocalExecution` -> `shutdownWorkerExecutor`, which broke the
`verify(workerExecutor, times(1)).shutdownNow()` expectation in
`DistributedRdfIndexExecutorTest.stopAndCoordinatorCleanupOnlyTearDownLocalExecutionOnce`.
Drop the inline call: `cleanupLocalExecution` is the single owner of the
shutdown path.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* ci: drop redundant DB matrix from openmetadata-service unit tests
The {mysql, postgresql} strategy matrix on openmetadata-service unit
tests doubled CI cost without adding signal: both jobs ran the same
surefire suite. The `-Pmysql` / `-Ppostgresql` profiles are defined
only in `openmetadata-sdk/pom.xml` (lines 190-206), set a single
`test.database` property, and that property is consumed exclusively by
the failsafe plugin (integration tests `*IT.java` / `*IntegrationTest.java`),
which only runs under `-Pintegration-tests` — not enabled here.
`openmetadata-service` itself has zero tests that read `test.database`
or use `MySQLContainer`/`PostgreSQLContainer` (verified by grep). The
only testcontainer-based DB code in the repo lives in
`openmetadata-integration-tests`, a different module that this workflow
doesn't build.
Run the unit suite once. The `openmetadata-service-unit-tests-status`
required-check aggregator is unaffected (it depends on the renamed job
which still has the same name).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(rdf): address Copilot PR review findings
Six correctness issues raised on PR #27999:
- **Lineage-details DELETE was too broad** (RdfRepository): the cleanup
step deleted *all* `<fromUri> om:hasLineageDetails ?d` triples,
so reindexing one (fromId, toId) edge wiped lineage-details links
for every other downstream of the same source entity. Pin the
delete to the specific `<fromUri> om:hasLineageDetails <detailsUri>`
triple. Same with prov:generated cleanup — anchor it to the
specific detailsUri instead of any details resource.
- **Predicate not flipped during canonicalization** (RdfRepository):
`parseEntityGraphEdgesFromResults` swapped subject/object for
reverse-direction predicates (`prov:wasDerivedFrom`,
`prov:wasInfluencedBy`) but kept the original predicate URI on the
resulting EdgeInfo. Exported graphs could carry semantically
invalid triples like `<upstream> prov:wasDerivedFrom <downstream>`.
Add `forwardEquivalentPredicate` to substitute the OM-native
forward predicate when the direction flips.
- **`dct:modified` was an invalid xsd:dateTime** (RdfPropertyMapper):
`entity.getUpdatedAt().toString()` returns the epoch-millis Long as
a string, but the literal was tagged `xsd:dateTime`. Convert via
`Instant.ofEpochMilli(...).toString()` so the lexical form matches
the type — same fix already in place for prov:invalidatedAtTime.
- **Unmapped EntityReference arrays were dropped entirely**
(RdfPropertyMapper): the previous fix to skip noisy JSON-string
literals also dropped fields like `domains`, `reviewers`, `voters`
for entity contexts that don't have a JSON-LD mapping for them —
the unmapped path was the only path emitting them, so nothing
landed in RDF. Expand each array element through
`addEntityReference` so the data still produces proper
`om:<fieldName> <ref>` triples; mapped-path duplicates are
collapsed by Jena's Model dedupe.
- **Partition failure detection missed reader errors**
(DistributedRdfIndexExecutor): the EntityCompletionTracker was fed
`result.errorMessage() != null`, but `RdfPartitionWorker` can
increment `failedCount` from `readerErrors` without ever setting
`lastError`. Use `result.failedCount() > 0` so partitions whose
failures came from `ResultList.getErrors()` are also marked as
failed when promoting an entity.
- **`COMPLETED_WITH_ERRORS` was hidden when failedRecords == 0**
(RdfIndexApp): the coordinator marks a job COMPLETED_WITH_ERRORS
whenever any partition is FAILED or CANCELLED, including for
user-initiated stops where no record-level failures accrued. The
monitor's `completedWithErrors` gate required `failedRecords > 0`,
so those terminal states never hit `jobData.setFailure(...)` and
the run record showed success. Drop the failedRecords precondition
and tailor the fallback message based on whether there are
record-level failures or partition-level only.
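The `dct:modified` fix is easy to check in isolation. `Long#toString` produces bare digits, while `Instant.ofEpochMilli(...).toString()` produces an ISO-8601 UTC timestamp whose lexical form is valid for `xsd:dateTime`. The class and the N-Triples literal helper are illustrative only.

```java
import java.time.Instant;

/** Sketch: render an epoch-millis timestamp as a typed xsd:dateTime literal. */
final class DateTimeLiteralSketch {
    static final String XSD_DATE_TIME = "http://www.w3.org/2001/XMLSchema#dateTime";

    static String lexicalForm(long epochMillis) {
        // Instant#toString is ISO-8601 in UTC; fractional seconds appear only when non-zero.
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    static String typedLiteral(long epochMillis) {
        return "\"" + lexicalForm(epochMillis) + "\"^^<" + XSD_DATE_TIME + ">";
    }
}
```

For example, `lexicalForm(1700000000000L)` returns `2023-11-14T22:13:20Z`. The old code would have emitted `1700000000000` under the same `xsd:dateTime` tag.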
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(rdf): separate relationship failures + type lineage as prov:Activity
Two more PR review findings on #27999:
- **Relationship failures inflated failedRecords stat**: `processEntities`
was folding relationship/lineage edge failures into `failedCount`,
which becomes `failedRecords` in the index stats. Records there are
entities, and `totalRecords` is computed from entity counts. Counting
per-edge relationship failures could push `failedRecords` above
`processedRecords`/`totalRecords` and produce nonsensical
per-entity stats.
Track them separately: add `relationshipFailureCount` to
`BatchProcessingResult` and `PartitionResult`. `failedCount` now stays
entity-level. The completion tracker is fed the broader
`result.hasAnyFailure()` so partitions where relationship triples
failed don't get prematurely promoted as success even though their
entity writes succeeded.
- **`detailsResource` wasn't typed as prov:Activity**: the resource
carries Activity-shaped predicates (prov:startedAtTime,
prov:endedAtTime, prov:used, prov:hadPlan, prov:wasGeneratedBy,
prov:wasAssociatedWith) but only the OM-specific
`om:LineageDetails` rdf:type. Add an explicit
`rdf:type prov:Activity` so PROV-O reasoners and federated SPARQL
clients recognize it as an Activity without having to learn the
OM type.
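This sketch keeps entity-level and edge-level failures in separate counters. `failedCount`, `relationshipFailureCount`, and `hasAnyFailure()` are named in the message above. The record itself, `processedCount`, and `failedRecords()` are hypothetical.

```java
/** Hypothetical sketch: entity failures and relationship-edge failures kept apart. */
record PartitionResultSketch(int processedCount, int failedCount, int relationshipFailureCount) {

    /** Entity-level stat only, so edge failures never push it above processed/total records. */
    int failedRecords() {
        return failedCount;
    }

    /** Broader signal used for completion tracking and run-record reporting. */
    boolean hasAnyFailure() {
        return failedCount > 0 || relationshipFailureCount > 0;
    }
}
```

Take a partition that wrote 100 entities but failed 3 lineage edges. `failedRecords()` stays 0, so per-entity stats remain consistent, while `hasAnyFailure()` still stops the partition from being promoted as a clean success.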
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(rdf): label lineage edges relative to focal node
The Knowledge Graph view was labeling every edge with relation
type "upstream" as "Upstream" regardless of direction relative to the
focal node. For a focal node F, the raw stored relation `(F, X, upstream)`
means "F is upstream of X" — i.e. X is *downstream* of F. The previous
output labeled both `F → X` and `X → F` edges as "Upstream", which made
bidirectional lineage look like a duplicated relation.
Re-orient the label in `convertEdgesToGraphData` based on whether the
focal is the edge's source or target:
- focal → X → "Downstream"
- X → focal → "Upstream"
- non-focal-touching edges keep the raw relation label.
Reported on a sample-data table with a lineage cycle
(`dim_customer ↔ fact_orders`) where both directions showed "Upstream".
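The focal-relative relabeling fits in a small pure function. The name `relativeRelationLabel` comes from the later unit-test commit. The parameter list, the case-insensitive match, and the exact label strings here are assumptions.

```java
import java.util.Objects;

/** Hypothetical sketch of focal-relative lineage labels for the Knowledge Graph view. */
final class RelationLabelSketch {
    /**
     * A stored (source, target, "upstream") relation means source is upstream of target.
     * From the focal node's point of view that edge is Downstream when focal is the source
     * and Upstream when focal is the target.
     */
    static String relativeRelationLabel(String focalId, String sourceId, String targetId, String relation) {
        if (focalId == null || !"upstream".equalsIgnoreCase(relation)) {
            return relation; // non-lineage relations and a missing focal keep the raw label
        }
        if (Objects.equals(focalId, sourceId)) return "Downstream";
        if (Objects.equals(focalId, targetId)) return "Upstream";
        return relation; // edge does not touch the focal node
    }
}
```

Using the reported cycle with `dim_customer` as focal, the edge `dim_customer → fact_orders` becomes "Downstream" and `fact_orders → dim_customer` becomes "Upstream".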
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(rdf): close remaining Copilot review gaps
Three findings from PR #27999's third review pass — all about failure
signals being silently dropped between layers:
- **`RdfIndexApp.processTask` ignored relationship failures**: only
`result.failedCount() > 0` was treated as a failure, so partitions
whose Fuseki relationship/lineage writes failed (incrementing
`relationshipFailureCount` but not `failedCount`) never wrote
`jobData.failure`. Switch to `result.hasAnyFailure()` and report the
combined count.
- **`checkAndUpdateJobCompletion` ignored partition `lastError`**: a
partition can finish COMPLETED with `lastError` set when a relationship
bulk write was caught and recorded but didn't bump `failedRecords` or
flip the partition to FAILED. The job would then go to COMPLETED even
though there were real failures. Treat the presence of any
`rdf_index_partition.lastError` as an error signal — promote to
COMPLETED_WITH_ERRORS and aggregate sample errors into the job's
errorMessage if it was blank.
- **`forwardEquivalentPredicate` mapped to a non-existent
`om:DOWNSTREAM` URI**: OpenMetadata only stores lineage with
`om:UPSTREAM` (forward) and `prov:wasDerivedFrom` (reverse PROV-O
pair); there is no `om:DOWNSTREAM` predicate written anywhere — the
downstream view is derived by reading the same UPSTREAM edge from the
other side. Map both `prov:wasDerivedFrom` and `prov:wasInfluencedBy`
to `om:UPSTREAM` (both are reverse-direction causation predicates: in
`B wasDerivedFrom A` / `B wasInfluencedBy A` the source is A and
effect is B, so the canonical forward predicate is the same).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Fix RDF tag mapper
* Fix all the comments
Cherry-picked from #27562 (without bin/ autogenerated noise).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Align RdfPropertyMapper tests with refactor and isolate ontology export IT
`RdfPropertyMapperTest` still referenced the removed `addVotes` helper and
expected `addStructuredProperty` to dispatch votes; both went away when
`votes` was added to `IGNORED_PROPERTIES`. Update the assertions accordingly.
GlossaryOntologyExportIT timed out on the full suite because it flips a
global RDF singleton in @BeforeAll and each test blocks a server thread on
synchronous Fuseki writes. SAME_THREAD only serialized methods within the
class — concurrent classes still raced for server threads. Adding @Isolated
matches the pattern already used by RdfResourceIT for the same reason.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(rdf): align addCertification typing + relationType after predicate flip
Two findings on PR #27999 from the post-cherry-pick review pass:
- **`addCertification` mis-typed glossary-source certifications and
skipped skos:Concept**: it always emitted `om:Tag` regardless of
source, even though `resolveTagResource` returns a glossaryTerm URI
when the certification points at a glossary term. It also didn't add
`skos:Concept` (or the `createTypeResource("tag")` `skos:Concept` for
classification tags), so SPARQL queries filtering certification
targets by `a skos:Concept` missed them while `addTagLabel`-emitted
tags were findable. Mirror `addTagLabel`: branch on source
(`Glossary` vs `Classification`), emit the right primary type plus
`skos:Concept` (glossary) or `om:Tag` (classification), and include
`om:tagSource`.
- **`relationType` left stale after predicate flip**: when
`parseEntityGraphEdgesFromResults` flipped subject/object for a
reverse-direction predicate and rewrote `canonicalPredicate` to
`om:UPSTREAM`, it kept the original `relationType` derived from the
reverse predicate. So `prov:wasInfluencedBy` produced an EdgeInfo
with `relationType=downstream` + `predicate=om:UPSTREAM` —
internally inconsistent, and the mismatched `edgeKey` prevented
dedup against an existing UPSTREAM edge with the same endpoints.
Re-derive `relationType` from the canonical predicate after the
flip.
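Here is a simplified sketch that combines the lineage canonicalization pieces: the direction flip, `forwardEquivalentPredicate`, re-deriving `relationType`, and dedupe by edge key. Predicates appear as prefixed names rather than full URIs. `EdgeInfo`, `edgeKey`, `relationTypeFor`, and the row format are simplified assumptions, not the actual `RdfRepository` types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of canonicalizing lineage edges to (upstream, downstream, om:UPSTREAM). */
final class LineageCanonicalizerSketch {
    record EdgeInfo(String from, String to, String predicate, String relationType) {
        String edgeKey() {
            return from + "|" + to + "|" + relationType;
        }
    }

    static final Set<String> REVERSE = Set.of("prov:wasDerivedFrom", "prov:wasInfluencedBy");

    static boolean isReverseDirectionPredicate(String predicate) {
        return REVERSE.contains(predicate);
    }

    /** Both reverse causation predicates collapse onto the single forward predicate OM stores. */
    static String forwardEquivalentPredicate(String predicate) {
        return isReverseDirectionPredicate(predicate) ? "om:UPSTREAM" : predicate;
    }

    /** Simplification: other predicates just reuse their own name as the relation type. */
    static String relationTypeFor(String predicate) {
        return "om:UPSTREAM".equals(predicate) ? "upstream" : predicate;
    }

    /** Each row is {subject, predicate, object} as returned by a SPARQL query. */
    static List<EdgeInfo> canonicalize(List<String[]> rows) {
        Map<String, EdgeInfo> byKey = new LinkedHashMap<>();
        for (String[] row : rows) {
            String s = row[0], p = row[1], o = row[2];
            boolean flip = isReverseDirectionPredicate(p);
            String from = flip ? o : s; // "B wasDerivedFrom A" means A is upstream of B
            String to = flip ? s : o;
            String predicate = forwardEquivalentPredicate(p);
            // Derive relationType from the canonical predicate so it stays consistent after a flip.
            EdgeInfo edge = new EdgeInfo(from, to, predicate, relationTypeFor(predicate));
            byKey.putIfAbsent(edge.edgeKey(), edge);
        }
        return new ArrayList<>(byKey.values());
    }
}
```

The rows `A om:UPSTREAM B`, `B prov:wasDerivedFrom A`, and `B prov:wasInfluencedBy A` all describe one relationship, so only one edge survives. If `relationType` were taken from the raw predicate, the `wasInfluencedBy` row would get a different key and escape dedup.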
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(rdf): close 2 review findings + add parser-helper unit tests
Two outstanding Copilot findings on PR #27999 plus targeted unit
coverage for the helpers that drive lineage canonicalization.
Findings:
- **`colLineageUri` collision risk** (RdfRepository): the deterministic
key replaced non-alphanumerics in `toColumn` with `_`, so distinct
column names (e.g. `a-b` vs `a_b`) collapsed onto the same URI, which
would lose / overwrite column-lineage resources during reindex.
Append the loop index as a tiebreaker so distinct columns keep
distinct URIs.
- **`createTypeResource` missing dprod prefix** (RdfPropertyMapper):
the `getNamespace` switch didn't recognize `dprod`, so
`RdfUtils.getRdfType("dataProduct")` (returns `dprod:DataProduct`)
produced an invalid `dprod:DataProduct` URI on the wire. Added the
`DPROD_NS = https://ekgf.github.io/dprod/` constant and a `dprod`
case in the switch.
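This sketch shows the `colLineageUri` collision and the index tiebreaker. Only two details come from the description above: sanitizing non-alphanumerics to `_` and appending the loop index. The URI layout and method signature are assumptions.

```java
/** Hypothetical sketch of a deterministic column-lineage URI with an index tiebreaker. */
final class ColLineageUriSketch {
    static String sanitize(String column) {
        return column.replaceAll("[^A-Za-z0-9]", "_"); // "a-b" and "a_b" both become "a_b"
    }

    static String colLineageUri(String detailsUri, String toColumn, int index) {
        // The loop index keeps URIs distinct even when sanitized column names collide.
        return detailsUri + "/col/" + sanitize(toColumn) + "_" + index;
    }
}
```

An index-based tiebreaker keeps URIs deterministic only while the column-lineage list is iterated in a stable order between reindexes.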
Coverage:
- New `RdfParserHelpersTest` exercises the canonicalization helpers
via reflection: `isReverseDirectionPredicate` (recognizes
PROV-O causation predicates, ignores forward predicates),
`forwardEquivalentPredicate` (both `wasDerivedFrom` and
`wasInfluencedBy` collapse to `om:UPSTREAM` so dedup works),
`relativeRelationLabel` (focal-relative Upstream/Downstream
flipping with all the boundary cases — non-focal edges,
non-lineage relations, null focal).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(rdf): merge array contexts before per-field resolution
The third (low-confidence "suppressed") finding on review 4256830399
turned out to be a real duplication: when a field is mapped in one
context map of an array context but absent from another, the previous
processArrayContext ran processContextMappings once per map. The pass
where the field IS mapped emits the proper `om:hasOwner <ref>` triples
(plus `prov:wasAttributedTo`); the pass where the field is absent
falls through to processUnmappedField and emits an additional
`om:owners <ref>` triple. Net: two predicates for the same logical
relationship.
Verified on the live Fuseki: 113 `om:hasOwner` triples vs 112
`om:owners` triples — one set per pass.
Fix: flatten all context maps in the array into a single merged map
once, then iterate entity fields exactly once against that combined
view (later contexts win on key conflicts, matching JSON-LD context
merge semantics). Each field is resolved against the union of
mappings, so the unmapped fallback only fires for fields truly absent
from every context. Net effect: `prov:wasAttributedTo` count is
unchanged, `om:hasOwner` is unchanged, and the redundant `om:owners`
triples disappear.
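A `LinkedHashMap` is enough to sketch the merge step. Later maps overwrite earlier keys, and each entity field is then resolved exactly once against the union. The types and predicate strings are simplified assumptions. The real code resolves JSON-LD term definitions, not plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: resolve fields once against the merged array context. */
final class ArrayContextMergeSketch {
    /** Later contexts win on key conflicts, as in JSON-LD context merging. */
    static Map<String, String> mergeContexts(List<Map<String, String>> contexts) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> ctx : contexts) {
            merged.putAll(ctx);
        }
        return merged;
    }

    /** One predicate per field: the mapped term, or an om:<field> fallback only if unmapped everywhere. */
    static List<String> resolveFields(List<String> fields, List<Map<String, String>> contexts) {
        Map<String, String> merged = mergeContexts(contexts);
        List<String> predicates = new ArrayList<>();
        for (String field : fields) {
            predicates.add(merged.getOrDefault(field, "om:" + field));
        }
        return predicates;
    }
}
```

With the old per-map passes, `owners` resolved to `om:hasOwner` in the pass for the first map and then fell through to `om:owners` in the pass for the second.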
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(rdf): close 2 review findings on coordinator finalization race
Two findings from PR #27999 review 4259628860:
- **`checkAndUpdateJobCompletion` returned early before the lastError check
could promote the job**: `refreshAggregatedJob` already marks the job COMPLETED
when partitions all finish without `failedRecords`/`failedPartitions`,
so `checkAndUpdateJobCompletion`'s subsequent `if (job.isTerminal())`
short-circuit silently dropped the lastError signal. Move the
partition-lastError check INTO `refreshAggregatedJob` so both code
paths produce consistent terminal status — a partition that finished
COMPLETED but carries a non-null lastError now correctly promotes the
job to COMPLETED_WITH_ERRORS regardless of which finalizer wins the
race.
- **`completePartition` / `failPartition` overwrote CANCELLED state**:
the unconditional partition row update lost a concurrent Stop's
CANCELLED status if a worker finished its batch after the Stop
request landed but before noticing it. Add a status-guarded
`updateIfProcessing` DAO method (UPDATE ... WHERE id = :id AND
status = 'PROCESSING') and have both completion paths use it; if 0
rows update, log and skip the side effects (no server-stat increment,
no refreshAggregatedJob call) so the authoritative CANCELLED status
stays. Mirrors the pattern SearchIndex's coordinator uses for the
same race.
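The status guard works as a compare-and-set on the partition row. The completion write applies only if the row is still `PROCESSING`, and a zero-row result means a concurrent Stop got there first. The in-memory sketch below uses `ConcurrentHashMap.replace(key, expected, newValue)` as an analogy for `UPDATE ... WHERE id = :id AND status = 'PROCESSING'`. The enum, class, and counter are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a status-guarded partition completion. */
final class GuardedCompletionSketch {
    enum Status { PENDING, PROCESSING, COMPLETED, FAILED, CANCELLED }

    final ConcurrentHashMap<String, Status> partitions = new ConcurrentHashMap<>();
    final AtomicInteger refreshCalls = new AtomicInteger();

    /** Analogy for updateIfProcessing: returns the number of "rows" updated (0 or 1). */
    int updateIfProcessing(String id, Status target) {
        return partitions.replace(id, Status.PROCESSING, target) ? 1 : 0;
    }

    void completePartition(String id) {
        if (updateIfProcessing(id, Status.COMPLETED) == 0) {
            return; // a concurrent Stop already set CANCELLED; skip all side effects
        }
        refreshCalls.incrementAndGet(); // stands in for the stats increment + refreshAggregatedJob
    }
}
```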
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>