mirror of
https://github.com/open-metadata/OpenMetadata
synced 2026-05-24 09:39:11 +00:00
RdfIndexApp ran daily and never reconciled removed relationships, so the
triple count grew without bound across runs. When Fuseki crash-looped on the
resulting disk pressure, every entity-write hook blocked synchronously on the
unreachable server (no HTTP connect timeout, 3-retry loop on ConnectException),
saturating the bounded AsyncService pool and pushing login latency to ~45s.

Storage-side fixes (stop growth):
- Drop the extractRelationshipTriples "preserve forward" path in
  RdfRepository.createOrUpdate; the translator is the source of truth and the
  surrounding orchestration already rewrites the current relationship set.
  This also removes a wasted CONSTRUCT round-trip per entity write.
- bulkStoreRelationships now does per-source-entity DELETE WHERE with a
  predicate-exclusion FILTER for lineage edges, so relationships that no
  longer exist actually leave the store.
- Wire RdfRepository.clearAllGlossaryTermRelations() into RdfIndexApp's
  initializeJob (the method existed but had no callers).
- Flip recreateIndex default to true and move the cron to Saturday midnight
  ("0 0 * * 6"). Add reloadOntologies() so CLEAR ALL doesn't leave the
  ontology graph empty before indexing starts.
- Include a 2.0.1 post-data migration that updates existing installed_apps
  rows; the app loader is insert-only on upgrade.
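
The per-source-entity delete described above could look roughly like the following sketch. It is not the code in bulkStoreRelationships: the class name, method name, and parameters are hypothetical, the IRIs are assumed to be already validated, and the real query presumably scopes `?p` to relationship predicates rather than matching every predicate. Because the SPARQL `DELETE WHERE` shorthand does not accept a FILTER, the sketch uses the full `DELETE { } WHERE { }` form.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: builds a SPARQL update that removes one entity's outgoing triples. */
final class RelationshipDeleteSketch {

  /**
   * Deletes every triple whose subject is sourceIri inside graphIri, except
   * triples whose predicate is listed in excludedPredicateIris (for example
   * lineage edges, which are managed elsewhere).
   */
  static String buildDelete(String graphIri, String sourceIri, List<String> excludedPredicateIris) {
    String excluded = excludedPredicateIris.stream()
        .map(p -> "<" + p + ">")
        .collect(Collectors.joining(", "));
    // With nothing to exclude, the FILTER is omitted entirely.
    String filter = excludedPredicateIris.isEmpty()
        ? ""
        : " FILTER(?p NOT IN (" + excluded + "))";
    String pattern = "GRAPH <" + graphIri + "> { <" + sourceIri + "> ?p ?o";
    return "DELETE { " + pattern + " } }\n"
        + "WHERE { " + pattern + " ." + filter + " } }";
  }
}
```

Running the delete once per source entity before re-inserting its current relationship set is what lets stale edges leave the store, while the exclusion list keeps the delete from touching predicates owned by another writer.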

Connectivity / concurrency fixes (isolate API latency from Fuseki health):
- Add 2s connectTimeout to every JenaFusekiStorage HttpClient and fast-fail
  on ConnectException / ClosedChannelException / HttpConnectTimeoutException
  instead of retrying. Introduce a 5-failure/30s circuit breaker.
- Route all RdfUpdater mutators through AsyncService.execute with a bounded
  pendingWrites gate (cap 1000, drop-on-overflow with logged warning) so a
  dead Fuseki can no longer block request threads or starve the AsyncService
  pool.
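
A minimal sketch of the three mechanisms named above, to make the intended behaviour concrete. None of these classes exist in the repository under these names; they are illustrative. The sketch also assumes that "5-failure/30s" means "open after 5 consecutive failures, then reject calls for 30 seconds", which the commit message does not spell out.

```java
import java.net.ConnectException;
import java.net.http.HttpClient;
import java.net.http.HttpConnectTimeoutException;
import java.nio.channels.ClosedChannelException;
import java.time.Duration;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

/** Hypothetical helper: 2s connect timeout plus fast-fail classification. */
final class FusekiConnectivitySketch {
  static HttpClient newClient() {
    return HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(2)).build();
  }

  /** True when the failure or any of its causes means the server is unreachable. */
  static boolean isConnectivityFailure(Throwable t) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c instanceof ConnectException
          || c instanceof ClosedChannelException
          || c instanceof HttpConnectTimeoutException) {
        return true;
      }
    }
    return false;
  }
}

/** Hypothetical breaker: opens after N consecutive failures, closes after a cool-down. */
final class CircuitBreakerSketch {
  private final int failureThreshold;
  private final long openMillis;
  private final LongSupplier clock; // injected so the cool-down can be tested
  private int consecutiveFailures;
  private boolean open;
  private long openedAt;

  CircuitBreakerSketch(int failureThreshold, long openMillis, LongSupplier clock) {
    this.failureThreshold = failureThreshold;
    this.openMillis = openMillis;
    this.clock = clock;
  }

  synchronized boolean allowRequest() {
    if (open && clock.getAsLong() - openedAt >= openMillis) {
      open = false; // cool-down elapsed: let traffic through again
      consecutiveFailures = 0;
    }
    return !open;
  }

  synchronized void recordSuccess() {
    consecutiveFailures = 0;
  }

  synchronized void recordFailure() {
    if (++consecutiveFailures >= failureThreshold) {
      open = true;
      openedAt = clock.getAsLong();
    }
  }
}

/** Hypothetical gate: caps queued writes and drops new ones on overflow. */
final class PendingWriteGate {
  private final int cap;
  private final AtomicInteger pending = new AtomicInteger();

  PendingWriteGate(int cap) {
    this.cap = cap;
  }

  /** Returns false when the write was dropped; the caller is expected to log a warning. */
  boolean trySubmit(Executor executor, Runnable write) {
    if (pending.incrementAndGet() > cap) {
      pending.decrementAndGet();
      return false;
    }
    try {
      executor.execute(() -> {
        try {
          write.run();
        } finally {
          pending.decrementAndGet(); // free the slot even if the write throws
        }
      });
      return true;
    } catch (RejectedExecutionException e) {
      pending.decrementAndGet();
      return false;
    }
  }
}
```

With values matching the commit message, a caller would construct `new CircuitBreakerSketch(5, 30_000, System::currentTimeMillis)` and `new PendingWriteGate(1000)`, check `allowRequest()` before each write, and call `recordFailure()` only when `isConnectivityFailure` returns true. Dropping on overflow trades completeness of the RDF mirror for bounded memory and unblocked request threads; a later full reindex is what restores consistency.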
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
18 lines
775 B
SQL
-- Post data migration script for Task workflow cutover - OpenMetadata 2.0.1

-- RdfIndexApp: switch to weekly Saturday cron and recreate-on-each-run.
-- Previous defaults (daily, incremental) were producing unbounded triple growth
-- because relationship-removal paths weren't fully reconciled. With per-run
-- CLEAR ALL the dataset always converges to the current MySQL state; weekly
-- cadence keeps the per-run cost from saturating Fuseki.
UPDATE installed_apps
SET json = jsonb_set(
        jsonb_set(json::jsonb, '{appConfiguration,recreateIndex}', 'true'),
        '{appSchedule,cronExpression}',
        '"0 0 * * 6"'
    )
WHERE name = 'RdfIndexApp';

UPDATE apps_marketplace
SET json = jsonb_set(json::jsonb, '{appConfiguration,recreateIndex}', 'true')
WHERE name = 'RdfIndexApp';