OpenMetadata/bootstrap/sql/migrations/native/2.0.1/postgres/postDataMigrationSQLScript.sql

-- Post data migration script for Task workflow cutover - OpenMetadata 2.0.1

-- RdfIndexApp: switch to a weekly Saturday cron and a full rebuild on every run.
-- The previous defaults (daily, incremental) produced unbounded triple growth
-- because relationship-removal paths were not fully reconciled. With a per-run
-- CLEAR ALL, the dataset always converges to the state of the source database
-- (Postgres for this script); the weekly cadence keeps per-run cost from
-- saturating Fuseki.
--
-- Also rewrite `entities` to `["all"]`. Before the upgrade, an operator could
-- have narrowed RDF indexing to a subset of entity types. With
-- recreateIndex=true, each run issues a CLEAR ALL before indexing, which would
-- otherwise wipe triples for entity types that still exist in the database but
-- are missing from the subset list. Resetting the list to `["all"]` ensures the
-- post-CLEAR-ALL run repopulates the graph fully; operators can re-narrow after
-- the migration if they need partial indexing.
UPDATE installed_apps
SET json = jsonb_set(
    jsonb_set(
        jsonb_set(json::jsonb, '{appConfiguration,recreateIndex}', 'true'),
        '{appSchedule,cronExpression}',
        '"0 0 * * 6"'
    ),
    '{appConfiguration,entities}',
    '["all"]'::jsonb
)
WHERE name = 'RdfIndexApp';
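
-- Illustrative only (not part of the original migration): a sketch of how an
-- operator might re-narrow `entities` once the first full rebuild has finished.
-- The entity types listed are placeholder assumptions; substitute the types
-- you actually index. The statement is left commented out so the migration
-- never runs it. With recreateIndex=true each run still issues a CLEAR ALL, so
-- entity types left out of the list will be absent from the graph afterwards.
--
-- UPDATE installed_apps
-- SET json = jsonb_set(
--     json::jsonb,
--     '{appConfiguration,entities}',
--     '["table", "dashboard"]'::jsonb
-- )
-- WHERE name = 'RdfIndexApp';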

UPDATE apps_marketplace
SET json = jsonb_set(json::jsonb, '{appConfiguration,recreateIndex}', 'true')
WHERE name = 'RdfIndexApp';
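
-- Optional sanity check (a sketch, not part of the original migration): after
-- the script has run, a query along these lines can be executed manually, for
-- example in psql, to confirm that the three fields were rewritten. The column
-- aliases are arbitrary. It is left commented out so the migration does not
-- execute it.
--
-- SELECT name,
--        json::jsonb #>> '{appConfiguration,recreateIndex}' AS recreate_index,
--        json::jsonb #>> '{appSchedule,cronExpression}'     AS cron_expression,
--        json::jsonb #>  '{appConfiguration,entities}'      AS entities
-- FROM installed_apps
-- WHERE name = 'RdfIndexApp';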

Task redesign (#25894)

* Task Redesign: Add Task entity & tests
* Task Redesign: Add Task entity & tests
* Task Redesign: Add Permissions checks for Task APIs
* Task UI changed to the new APIs
* Migrate UI and APIs to new tasks system including suggestions
* Add Suggestions integration
* Activity Feed Refactor
* ActivityFeed -> ActivityStream publisher
* Activity Feed redesign
* Activity Feed redesign, adding tests
* Incident Manager update
* Migrate Incidents to new tasks
* Migrate Incidents to new tasks
* Update generated TypeScript types
* Update generated TypeScript types
* feat(tasks): add domain-aware task cutover and workflow v2 migration
* test(tasks): cover domain filters and task feed visibility flows
* Address comments
* Fix workflow tests to use new Task entity API and fix UserApprovalTaskV2 candidate transformation
  Migrated 9 WorkflowDefinitionResourceIT tests from legacy Feed/Thread API to the new Task entity API (UserApprovalTaskV2 creates Task entities, not Thread entities). Fixed a bug in UserApprovalTaskV2 where candidates were passed as raw EntityReferences instead of being transformed into users/teams FQN arrays for SetApprovalAssigneesImpl.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix tests
* refactor: stabilize task entity workflows
* refactor: finish task entity cutover and activity migration
* refactor: migrate legacy thread feed during cutover
* refactor: split legacy thread rename and archive migrations
* Merge main; fix tests
* Update generated TypeScript types
* feat: advance task redesign through phase 2
* Merge main; fix tests
* Update generated TypeScript types
* Fix failing tests
* Update generated TypeScript types
* finish phase 6 of the design, configurable task forms
* Update generated TypeScript types
* Update generated TypeScript types
* Fix linting
* Address gitar comments
* Address gitar comments
* Fix build
* Address gitar comments
* fix build
* Add task custom forms
* Fix tests
* Address tests
* Apply UI lint autofixes
* Fix tests
* Fix linter
* Fix task patching
* Fix tests
* Fix playwright tests
* fix java checkstyle
* Add python sdk support for tasks, announcements
* Fix playwright tests
* Fix playwright tests
* Fix playwright tests
* Fix python tests
* Fix python tests
* Fix linting workflows
* fix pycheck
* fix pycheck
* Fix tests
* Fix build
* Address deviations from main and fix tests
* Fix integration tests
* Fix integration tests
* Fix integration tests
* Update generated TypeScript types
* Fix Playwright tests
* Fix Playwright tests
* feat(incident): wire incident manager to task-first architecture (#27369)
* feat(incident): wire incident manager to task-first architecture
  Connect the incident manager to the task redesign so it works end-to-end: resolve data persistence, backward transitions, reopen from resolved, and incident discovery via TCRS.
* Update generated TypeScript types
* refactor: single-query incident task lookup with parameterized statuses
  Replace two sequential queries (Open, InProgress) in getOrCreateIncident with one findByAboutAndTypeAndStatuses query using @BindList for status IN (...).
  ---------
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Fix Playwright tests
* Update generated TypeScript types
* Fix linter
* Fix tests
* Fix tests
* Fix checkstyle
* Fix tests
* Fix checkstyle
* Update FeedResourceIT.java
* Update TableRepository.java
* fix tests
* Update ActivityFeedProvider.tsx
* fix tests
* fix tests
* Address Task comments
* Fix unit test
* Fix the feed summary panel showing on landing page
* Fix comment functionality
* Fix pytests
* Fix failing playwright tests
* Fix test flakiness
* Fix ui-checkstyle
* Fix advanced search spec failure
* Fix playwright tests
  Co-authored-by: Copilot <copilot@github.com>
* Fix checkstyle
* Fix the flaky tests
  Co-authored-by: Copilot <copilot@github.com>
* fix checkstyle
* Reduce the workflow polling
* Update generated TypeScript types
* skip failing tests
  Co-authored-by: Copilot <copilot@github.com>
* Fix ui-checkstyle

---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
Co-authored-by: IceS2 <pablo.takara@getcollate.io>
Co-authored-by: karanh37 <karanh37@gmail.com>
Co-authored-by: Karan Hotchandani <33024356+karanh37@users.noreply.github.com>
Co-authored-by: Copilot <copilot@github.com>
2026-04-23 13:52:30 +00:00

fix(rdf): converge Fuseki state on weekly rebuilds and isolate API latency

RdfIndexApp ran daily and never reconciled removed relationships, so triples grew unboundedly across runs. When Fuseki crash-looped on the resulting disk pressure, every entity-write hook blocked synchronously on the unreachable server (no HTTP connect timeout, 3-retry loop on ConnectException), saturating the bounded AsyncService pool and pushing login to ~45s.

Storage-side fixes (stop growth):
- Drop the extractRelationshipTriples "preserve forward" path in RdfRepository.createOrUpdate; the translator is the source of truth and the surrounding orchestration already rewrites the current relationship set. This also removes a wasted CONSTRUCT round-trip per entity write.
- bulkStoreRelationships now does per-source-entity DELETE WHERE with a predicate-exclusion FILTER for lineage edges, so relationships that no longer exist actually leave the store.
- Wire RdfRepository.clearAllGlossaryTermRelations() into RdfIndexApp's initializeJob (the method existed but had no callers).
- Flip recreateIndex default to true and move the cron to Saturday midnight ("0 0 * * 6"). Add reloadOntologies() so CLEAR ALL doesn't leave the ontology graph empty before indexing starts.
- Include a 2.0.1 post-data migration that updates existing installed_apps rows; the app loader is insert-only on upgrade.

Connectivity / concurrency fixes (isolate API latency from Fuseki health):
- Add 2s connectTimeout to every JenaFusekiStorage HttpClient and fast-fail on ConnectException / ClosedChannelException / HttpConnectTimeoutException instead of retrying. Introduce a 5-failure/30s circuit breaker.
- Route all RdfUpdater mutators through AsyncService.execute with a bounded pendingWrites gate (cap 1000, drop-on-overflow with logged warning) so a dead Fuseki can no longer block request threads or starve the AsyncService pool.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 14:30:57 +00:00

fix(rdf): schema default + migration force entities=[all] for safe full reindex

- rdfIndexingAppConfig.json: flip recreateIndex.default from false to true so any UI form / config generation path that surfaces the schema default agrees with the install JSON files and the new full-rebuild semantics.
- 2.0.1 migration (MySQL + Postgres): in addition to flipping recreateIndex=true and the weekly Saturday cron, also rewrite appConfiguration.entities to ["all"]. Pre-upgrade an operator could have narrowed RDF indexing to a subset of entity types; the new recreateIndex=true semantics issues CLEAR ALL before indexing, which would otherwise wipe triples for excluded entity types and leave the graph permanently missing them. Forcing entities back to ["all"] ensures the post-CLEAR-ALL run repopulates the graph fully. Operators can re-narrow after the migration if they need partial indexing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:49:48 +00:00