OpenMetadata/bootstrap/sql/migrations/native/2.0.1/mysql/postDataMigrationSQLScript.sql
Sriharsha Chintalapani a91262d284 fix(rdf): converge Fuseki state on weekly rebuilds and isolate API latency
RdfIndexApp ran daily and never reconciled removed relationships, so triples
grew unboundedly across runs. When Fuseki crash-looped on the resulting disk
pressure, every entity-write hook blocked synchronously on the unreachable
server (no HTTP connect timeout, 3-retry loop on ConnectException), saturating
the bounded AsyncService pool and pushing login latency to ~45s.

Storage-side fixes (stop growth):
- Drop the extractRelationshipTriples "preserve forward" path in
  RdfRepository.createOrUpdate; the translator is the source of truth and the
  surrounding orchestration already rewrites the current relationship set.
  This also removes a wasted CONSTRUCT round-trip per entity write.
- bulkStoreRelationships now does per-source-entity DELETE WHERE with a
  predicate-exclusion FILTER for lineage edges, so relationships that no
  longer exist actually leave the store.
- Wire RdfRepository.clearAllGlossaryTermRelations() into RdfIndexApp's
  initializeJob (the method existed but had no callers).
- Flip recreateIndex default to true and move the cron to Saturday midnight
  ("0 0 * * 6"). Add reloadOntologies() so CLEAR ALL doesn't leave the
  ontology graph empty before indexing starts.
- Add a 2.0.1 post-data migration that updates existing installed_apps and
  apps_marketplace rows, since the app loader is insert-only on upgrade.
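
A minimal sketch of the per-source-entity delete. It assumes relationship triples live in the default graph with the source entity as subject. SPARQL 1.1's DELETE WHERE shorthand accepts only a quad pattern, so applying a FILTER requires the full DELETE { ... } WHERE { ... } form, which is what the sketch emits. RelationshipDeleteBuilder and every IRI below are hypothetical placeholders, not OpenMetadata's bulkStoreRelationships code or vocabulary.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: builds one per-source-entity SPARQL 1.1 update that removes
// the entity's outgoing triples except those whose predicate is excluded (e.g. lineage).
final class RelationshipDeleteBuilder {

  // Characters SPARQL 1.1 forbids inside an IRIREF: controls/space and <>"{}|^`\
  private static final Pattern FORBIDDEN_IRI_CHARS =
      Pattern.compile("[\\x00-\\x20<>\"{}|^`\\\\]");

  private RelationshipDeleteBuilder() {}

  // Wraps a value as <iri>, refusing empty input and characters that would break the IRIREF.
  static String iri(String value) {
    if (value.isEmpty() || FORBIDDEN_IRI_CHARS.matcher(value).find()) {
      throw new IllegalArgumentException("Unsafe IRI: " + value);
    }
    return "<" + value + ">";
  }

  // DELETE { s ?p ?o } WHERE { s ?p ?o . FILTER (?p NOT IN (...)) } against the default graph.
  static String deleteOutgoing(String sourceIri, List<String> excludedPredicates) {
    String s = iri(sourceIri);
    StringBuilder sb = new StringBuilder()
        .append("DELETE { ").append(s).append(" ?p ?o }\n")
        .append("WHERE {\n  ").append(s).append(" ?p ?o .\n");
    if (!excludedPredicates.isEmpty()) {
      String keep = excludedPredicates.stream()
          .map(RelationshipDeleteBuilder::iri)
          .collect(Collectors.joining(", "));
      sb.append("  FILTER (?p NOT IN (").append(keep).append("))\n");
    }
    return sb.append("}").toString();
  }
}
```

For example, passing a placeholder upstream-lineage predicate as the exclusion leaves that edge out of the match. Lineage therefore survives while every other outgoing edge of the entity is removed.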

Connectivity / concurrency fixes (isolate API latency from Fuseki health):
- Add 2s connectTimeout to every JenaFusekiStorage HttpClient and fast-fail
  on ConnectException / ClosedChannelException / HttpConnectTimeoutException
  instead of retrying. Introduce a 5-failure/30s circuit breaker.
- Route all RdfUpdater mutators through AsyncService.execute with a bounded
  pendingWrites gate (cap 1000, drop-on-overflow with logged warning) so a
  dead Fuseki can no longer block request threads or starve the AsyncService
  pool.
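
A minimal sketch of the connectivity side. It reads "5-failure/30s" as two rules: the breaker opens after 5 consecutive failures, and it refuses calls for 30s before letting requests through again. That interpretation is an assumption and may differ from the real JenaFusekiStorage breaker. FusekiCircuitBreaker and its method names are hypothetical. HttpClient.Builder.connectTimeout and the three exception types are standard JDK classes.

```java
import java.net.ConnectException;
import java.net.http.HttpClient;
import java.net.http.HttpConnectTimeoutException;
import java.nio.channels.ClosedChannelException;
import java.time.Duration;
import java.util.function.LongSupplier;

// Hypothetical sketch, not JenaFusekiStorage's code: a consecutive-failure circuit
// breaker plus the connect-timeout client and fast-fail classification described above.
final class FusekiCircuitBreaker {
  private final int failureThreshold;      // e.g. 5 consecutive failures open the breaker
  private final long openMillis;           // e.g. 30s before requests are let through again
  private final LongSupplier clockMillis;  // injectable clock so tests can advance time
  private int consecutiveFailures;
  private boolean open;
  private long openedAtMillis;

  FusekiCircuitBreaker(int failureThreshold, Duration openFor, LongSupplier clockMillis) {
    if (failureThreshold < 1) {
      throw new IllegalArgumentException("failureThreshold must be >= 1");
    }
    this.failureThreshold = failureThreshold;
    this.openMillis = openFor.toMillis();
    this.clockMillis = clockMillis;
  }

  // Defaults matching the numbers in the commit message, on a monotonic clock.
  static FusekiCircuitBreaker withDefaults() {
    return new FusekiCircuitBreaker(5, Duration.ofSeconds(30), () -> System.nanoTime() / 1_000_000L);
  }

  // Every Fuseki HttpClient gets a 2s connect timeout so a dead host fails fast.
  static HttpClient newClient() {
    return HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(2)).build();
  }

  // False while open: callers skip Fuseki instead of waiting on it.
  // Once openMillis has elapsed, requests pass again (simplified half-open state).
  synchronized boolean allowRequest() {
    return !open || clockMillis.getAsLong() - openedAtMillis >= openMillis;
  }

  synchronized void recordSuccess() {
    consecutiveFailures = 0;
    open = false;
  }

  // A failure at or past the threshold (including a failed trial) (re)opens the breaker.
  synchronized void recordFailure() {
    consecutiveFailures++;
    if (consecutiveFailures >= failureThreshold) {
      open = true;
      openedAtMillis = clockMillis.getAsLong();
    }
  }

  // Connection-level failures are not retried; walks a bounded cause chain.
  static boolean isConnectFailure(Throwable t) {
    Throwable c = t;
    for (int depth = 0; c != null && depth < 16; depth++, c = c.getCause()) {
      if (c instanceof ConnectException
          || c instanceof ClosedChannelException
          || c instanceof HttpConnectTimeoutException) {
        return true;
      }
    }
    return false;
  }
}
```

A caller would check allowRequest() before each Fuseki call and record the outcome. When isConnectFailure(e) is true, it would give up immediately instead of entering a retry loop. This simplified half-open state admits every caller once the 30s elapse. A stricter breaker would admit a single probe request.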

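A minimal sketch of the bounded pendingWrites gate. It uses a plain ExecutorService in place of AsyncService and java.util.logging in place of the project's logger. PendingWritesGate is a hypothetical name, and the cap is a constructor argument (1000 in this commit). The request thread never blocks: once the cap is reached, or the executor rejects the task, the write is dropped with a warning and counted.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Logger;

// Hypothetical sketch: a non-blocking cap on queued RDF writes in front of an executor.
// ExecutorService stands in for AsyncService; java.util.logging for the project's logger.
final class PendingWritesGate {
  private static final Logger LOG = Logger.getLogger(PendingWritesGate.class.getName());

  private final ExecutorService executor;
  private final int maxPending;                 // e.g. 1000
  private final AtomicInteger pending = new AtomicInteger();
  private final AtomicInteger dropped = new AtomicInteger();

  PendingWritesGate(ExecutorService executor, int maxPending) {
    if (maxPending < 1) {
      throw new IllegalArgumentException("maxPending must be >= 1");
    }
    this.executor = executor;
    this.maxPending = maxPending;
  }

  // Never blocks the request thread: returns false, logs, and counts the drop on overflow.
  boolean submit(Runnable write) {
    if (pending.incrementAndGet() > maxPending) {
      pending.decrementAndGet();
      dropped.incrementAndGet();
      LOG.warning(() -> "Dropping RDF write: " + maxPending + " writes already pending");
      return false;
    }
    try {
      executor.execute(() -> {
        try {
          write.run();
        } catch (RuntimeException e) {
          LOG.warning(() -> "RDF write failed: " + e);
        } finally {
          pending.decrementAndGet(); // frees a slot whether the write succeeded or not
        }
      });
      return true;
    } catch (RejectedExecutionException e) {
      pending.decrementAndGet();
      dropped.incrementAndGet();
      LOG.warning(() -> "Dropping RDF write: executor rejected it (" + e.getMessage() + ")");
      return false;
    }
  }

  int pending() { return pending.get(); }
  int dropped() { return dropped.get(); }
}
```

The gate increments the counter before checking it, which keeps it lock-free. Concurrent submitters racing at the cap may briefly see it exceeded and drop a write slightly early. That errs on the side of protecting the shared pool.
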
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 07:30:57 -07:00

-- Post data migration script for Task workflow cutover - OpenMetadata 2.0.1
-- RdfIndexApp: switch to weekly Saturday cron and recreate-on-each-run.
-- Previous defaults (daily, incremental) were producing unbounded triple growth
-- because relationship-removal paths weren't fully reconciled. With per-run
-- CLEAR ALL the dataset always converges to the current MySQL state; weekly
-- cadence keeps the per-run cost from saturating Fuseki.
UPDATE installed_apps
SET json = JSON_SET(
    json,
    '$.appConfiguration.recreateIndex', CAST('true' AS JSON),
    '$.appSchedule.cronExpression', '0 0 * * 6'
)
WHERE name = 'RdfIndexApp';
UPDATE apps_marketplace
SET json = JSON_SET(json, '$.appConfiguration.recreateIndex', CAST('true' AS JSON))
WHERE name = 'RdfIndexApp';