OpenMetadata/bootstrap/sql/migrations/native/2.0.1/postgres/postDataMigrationSQLScript.sql
Sriharsha Chintalapani a91262d284 fix(rdf): converge Fuseki state on weekly rebuilds and isolate API latency
RdfIndexApp ran daily and never reconciled removed relationships, so triples
grew unboundedly across runs. When Fuseki crash-looped on the resulting disk
pressure, every entity-write hook blocked synchronously on the unreachable
server (no HTTP connect timeout, 3-retry loop on ConnectException), saturating
the bounded AsyncService pool and pushing login to ~45s.

Storage-side fixes (stop growth):
- Drop the extractRelationshipTriples "preserve forward" path in
  RdfRepository.createOrUpdate; the translator is the source of truth and the
  surrounding orchestration already rewrites the current relationship set.
  This also removes a wasted CONSTRUCT round-trip per entity write.
- bulkStoreRelationships now does per-source-entity DELETE WHERE with a
  predicate-exclusion FILTER for lineage edges, so relationships that no
  longer exist actually leave the store.
- Wire RdfRepository.clearAllGlossaryTermRelations() into RdfIndexApp's
  initializeJob (the method existed but had no callers).
- Flip recreateIndex default to true and move the cron to Saturday midnight
  ("0 0 * * 6"). Add reloadOntologies() so CLEAR ALL doesn't leave the
  ontology graph empty before indexing starts.
- Include a 2.0.1 post-data migration that updates existing installed_apps
  rows; the app loader is insert-only on upgrade.

Connectivity / concurrency fixes (isolate API latency from Fuseki health):
- Add 2s connectTimeout to every JenaFusekiStorage HttpClient and fast-fail
  on ConnectException / ClosedChannelException / HttpConnectTimeoutException
  instead of retrying. Introduce a 5-failure/30s circuit breaker.
- Route all RdfUpdater mutators through AsyncService.execute with a bounded
  pendingWrites gate (cap 1000, drop-on-overflow with logged warning) so a
  dead Fuseki can no longer block request threads or starve the AsyncService
  pool.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 07:30:57 -07:00

18 lines
775 B
SQL

-- Post data migration script for Task workflow cutover - OpenMetadata 2.0.1
-- RdfIndexApp: switch to weekly Saturday cron and recreate-on-each-run.
-- Previous defaults (daily, incremental) were producing unbounded triple growth
-- because relationship-removal paths weren't fully reconciled. With per-run
-- CLEAR ALL the dataset always converges to the current MySQL state; weekly
-- cadence keeps the per-run cost from saturating Fuseki.
UPDATE installed_apps
SET json = jsonb_set(
jsonb_set(json::jsonb, '{appConfiguration,recreateIndex}', 'true'),
'{appSchedule,cronExpression}',
'"0 0 * * 6"'
)
WHERE name = 'RdfIndexApp';
UPDATE apps_marketplace
SET json = jsonb_set(json::jsonb, '{appConfiguration,recreateIndex}', 'true')
WHERE name = 'RdfIndexApp';