OpenMetadata/bootstrap/sql/migrations/native
sonika-shah 0a07caefc2
fix(migration): preempt PDTS duplicates and recover invalid index in 1.12.9 (#28238)
* fix(migration): preempt PDTS duplicates and recover invalid index in 1.12.9

CREATE UNIQUE INDEX CONCURRENTLY aborts when it hits existing duplicate
keys but leaves an invalid index behind. On migration retry, IF NOT EXISTS
no-ops successfully and gets checksum-logged, after which ADD CONSTRAINT
USING INDEX fails permanently with "index ... is not valid". Hit at a
customer with two duplicate table.systemProfile rows on a 10M-row PDTS.

Adds two idempotent statements before the existing constraint build:

- DO block: drops the invalid index and clears its migration-log entry
  when indisvalid=false. No-op on fresh DBs and on already-migrated
  environments (where the index is valid and owned by the constraint).

- DELETE: collapses duplicate rows via single hash aggregate on the
  4-column key + targeted self-join. Reads only key columns (no json
  scan), only touches rows in actual duplicate groups, no-op on clean DBs.
  Efficient on multi-million-row PDTS tables.

Existing CREATE INDEX / ALTER TABLE / ANALYZE statements byte-identical
so checksum-matched skips for already-migrated environments still apply.

* fix(migration): self-heal in one pass + schema-scoped invalid-index probe

Addresses Copilot review on #28238:

1. One-pass self-heal. MigrationFile.parseSQLFiles filters already-logged
   statements at parse time (MigrationFile.java:83). Clearing the CREATE
   log entry from inside a DO block doesn't bring CREATE back into the
   current pass's execution list — it would only re-run on the next
   migration cycle, leaving the same-pass ALTER to fail again. Replace
   the "DROP + clear log" pattern with "DROP + rebuild inline" so a
   valid index exists before ALTER runs in the same pass.

   Inline rebuild uses non-concurrent CREATE UNIQUE INDEX, which takes a
   brief ACCESS EXCLUSIVE lock on the table. Acceptable because this
   path fires only when the environment is already in a degraded state.
   Normal-path customers go through the CONCURRENTLY build below.

2. Schema-scoped invalid-index probe. pg_class.relname is not
   schema-unique. Anchor the lookup via
   i.indrelid = 'profiler_data_time_series'::regclass and DROP by index
   OID (invalid_idx::regclass), so an invalid index with the same name
   in another schema cannot accidentally trigger this branch.

Existing CREATE INDEX / ALTER TABLE / ANALYZE statements byte-identical
to before this PR, so checksum-matched skips still apply for
already-migrated environments. Test gap (Copilot's third comment) for
the recovery scenario tracked as follow-up — existing migration tests
in MigrationWorkflowReprocessingTest are mock-based; verifying recovery
end-to-end needs Postgres integration infrastructure.

* chore(migration): trim verbose comments in 1.12.9/postgres/schemaChanges.sql

Statement bodies unchanged — checksums identical. Detailed mechanism
write-ups live in the commit log / PR description; the file keeps just
the load-bearing intent comment above each statement.

* fix(migration): scope PDTS dedup to operation IS NOT NULL

Addresses Copilot review on PR #28238 discussion r3264066840.

Postgres UNIQUE treats NULLs as DISTINCT by default, so the constraint
on (entityFQNHash, extension, operation, timestamp) permits multiple
rows where operation IS NULL — i.e. table.tableProfile and
table.columnProfile rows that share the other key columns.

The previous dedup used GROUP BY which treats NULLs as equal, so it
would have collapsed retry-induced tableProfile / columnProfile pairs
that the restored constraint never actually blocked. Restricting the
subquery to operation IS NOT NULL (plus a defensive entityFQNHash IS
NOT NULL) aligns dedup with constraint semantics.

DMG's customer rows were all table.systemProfile (operation = INSERT),
so this still removes the customer dupes correctly. tableProfile /
columnProfile retry duplicates — if they exist — stay as-is, which is
the same outcome the unique constraint would produce on its own.

* perf(migration): boost work_mem / maintenance_work_mem for PDTS dedup at scale

Mirrors the tuning pattern from 1.9.9/postgres/postDataMigrationSQLScript.sql
(same table, same operation class). On 50M-row PDTS the dedup DELETE's hash
aggregate spills to disk with default work_mem=4MB, adding ~30-60s of disk
I/O. Bumping work_mem to 256MB keeps the aggregate in memory;
maintenance_work_mem=512MB lets CREATE UNIQUE INDEX CONCURRENTLY sort in
memory too.

Session-level (not SET LOCAL) because schemaChanges runs in autocommit
(CREATE INDEX CONCURRENTLY requires it) — SET LOCAL would reset between
statements. RESET at the end of the file restores defaults before the
connection returns to the Hikari pool.

Expected runtime impact at customer scale:
  20M rows:  ~30s tuned vs ~40s default
  50M rows:  ~40s tuned vs ~90s default (avoids spill)

* chore(migration): trim comments on PDTS dedup additions

* chore(migration): drop 1.9.9 reference from mem comment
2026-05-20 18:27:33 +05:30
..
1.1.0
1.1.1
1.1.2 Issue 8930 - Update profiler timestamp from seconds to milliseconds (#12948) 2023-08-25 08:47:16 +02:00
1.1.5 only add collation to hash columns (#13201) 2023-09-15 12:49:11 +05:30
1.1.6 Add 1.1.6 migrations dir (#13305) 2023-09-22 09:45:00 +02:00
1.1.7 Prep v1.1.7 migrations to address test cases & suites (#13345) 2023-09-27 11:49:21 +02:00
1.2.0 Migration Fixes (#16131) 2024-05-07 22:07:25 +05:30
1.2.1 fix: comment in sql migration (#13979) 2023-11-15 10:32:11 +01:00
1.2.3 Minor: Fix migration location for unity catalog (#14339) 2024-01-03 18:26:11 +05:30
1.2.4 Fix #13982: Fix userFQN encoding while creating mentions (#14496) 2023-12-25 17:28:13 -08:00
1.3.0 Migration Fixes (#16131) 2024-05-07 22:07:25 +05:30
1.3.1 fix: move migration to 1.3.1 (#15463) 2024-03-05 15:30:43 +01:00
1.3.2 Remove SQls from 1.3.2 (#15917) 2024-04-16 18:51:03 +05:30
1.3.3 Move migration for apps to 1.3.3 all together (#15944) 2024-04-18 14:26:05 +05:30
1.4.0 ISSUE #2681 - Add Missing test parameters in PSQL (#25323) 2026-01-16 12:09:15 +01:00
1.4.2 Fix Test Suite Filter (#16615) 2024-06-12 10:40:05 +05:30
1.4.4 Fix #16788: Optimize feed query performance issues introduced in 1.4.2 (#16862) 2024-07-01 19:58:47 -07:00
1.4.5 MINOR - Clean automations_workflow in 1.4.5 (#17006) 2024-07-12 13:54:46 +02:00
1.4.6 Move Migration to 1.4.6 (#17095) 2024-07-19 12:16:53 +05:30
1.4.7 Migrate NameHash (#17317) 2024-08-06 18:41:37 +05:30
1.5.0 Improve count/feed api performance for 1.5 (#17576) 2024-08-23 11:20:34 -07:00
1.5.6 [Search] Indexing Fixes (#18048) 2024-09-30 23:39:27 +05:30
1.5.7 migration: fix duplicate param key insertion (#20802) 2025-04-15 14:10:51 +02:00
1.5.9 MINOR - Move appName migration to 1.5.9 (#18435) 2024-10-28 16:29:56 +01:00
1.5.11 Fix Search Index Contention (#18605) 2024-11-12 20:36:23 +05:30
1.5.15 Domain Policy Update to be non-system (#19060) 2024-12-15 01:18:12 +05:30
1.6.0 Feat# Implementation of Custom Workflows (#23023) 2025-10-08 18:57:44 +05:30
1.6.2 Improvement #19065 : Support removing existing enumKeys (for enum type custom property) (#19054) 2025-01-07 19:25:59 -08:00
1.6.3 Cleanup App data (#19571) 2025-01-28 19:22:33 +05:30
1.6.7 MINOR: chore: added missing timestamp indexes for time series tables (#20373) 2025-03-24 07:43:07 +01:00
1.7.0 Add cleanup apps_extension_time_series (#20857) 2025-04-16 14:54:11 +05:30
1.7.1 Escape ? to causing issues in jdbi binding (#21381) 2025-05-23 17:13:45 +05:30
1.7.2 FIX - Automation Workflows should not be updated by the SM & cleanup migration (#21435) 2025-06-03 12:17:14 +02:00
1.7.4 Disabled bot creating activity feeds (#21773) 2025-06-14 19:21:00 +05:30
1.8.0 Add Data Contracts Specification and APIs (#21164) 2025-06-04 06:36:28 +02:00
1.8.1 Fix #20621: User Status Tracking in the System (#21911) 2025-07-02 14:37:36 -07:00
1.8.2 Fix #20145: Implemented Prefix For Dashboard Service (#21585) 2025-07-08 18:54:35 +02:00
1.8.4 MINOR - Add columns.description in search settings (#22299) 2025-07-15 09:21:57 +02:00
1.8.5 Added missing migration sql files [1.8.5 and 1.10.2] (#24399) 2025-11-18 08:02:35 +01:00
1.8.7 Feature: Security Service (#22450) 2025-07-31 06:38:21 +02:00
1.8.8 Feature: Security Service (#22450) 2025-07-31 06:38:21 +02:00
1.8.9 Feature: Security Service (#22450) 2025-07-31 06:38:21 +02:00
1.9.0 MINOR - Add Tests & fix migrations (#22714) 2025-08-03 15:19:54 +02:00
1.9.2 Add missing domain migrations for entity version history (#23032) 2025-08-21 14:33:37 +05:30
1.9.5 MINOR - Move migrations to 1.9.5 (#23095) 2025-08-28 09:23:21 +02:00
1.9.6 ISSUE #1534 - Profiler Refactor for Metadata Extraction Application (#23200) 2025-09-05 13:07:04 +02:00
1.9.9 Minor fix broken 1.9.8 migrations (#23487) 2025-09-22 13:13:25 +00:00
1.9.10 Fixes #23356: Databricks & UnityCatalog OAuth and Azure AD Auth (#23561) 2025-10-03 19:53:19 +05:30
1.9.11 add entityType.keyword aggregation in searchSettings.json (#23559) 2025-09-25 17:04:49 +05:30
1.10.0 Move migrations to 1.11.x (#24074) 2025-10-30 01:02:45 +05:30
1.10.2 Added missing migration sql files [1.8.5 and 1.10.2] (#24399) 2025-11-18 08:02:35 +01:00
1.10.3 MINOR: dbt migration fix (#23980) 2025-10-23 12:54:34 +02:00
1.10.4 chore: move dbt migration to 1.11 (#24076) 2025-11-03 08:46:47 +01:00
1.10.5 TRUNCATE Flowable History Tables in both 1.10.5 and 1.10.7 Migration (#24323) 2025-11-13 21:05:31 +00:00
1.10.6 Fixes #24132: Airbyte Cloud Support (#24261) 2025-11-11 16:24:09 +05:30
1.10.7 TRUNCATE Flowable History Tables in both 1.10.5 and 1.10.7 Migration (#24323) 2025-11-13 21:05:31 +00:00
1.10.8 Fix email configuration templates default value from 'collate' to 'openmetadata' (#24352) 2025-11-17 08:39:41 +01:00
1.11.0 Moved AI Application and LLM Model entities migrations to 1.12.0 (#25659) 2026-02-02 08:50:37 +01:00
1.11.1 chore: realign main migration with 1.11.1 branch (#24938) 2025-12-22 09:03:28 +01:00
1.11.2 Fix #24578: Datamodels not visible if . in service name (#24779) 2025-12-27 10:00:26 -08:00
1.11.4 Fix search percentile rank scoring (#24859) 2025-12-23 18:06:27 +00:00
1.11.5 Tagging explanation (#24817) 2026-01-08 17:02:40 +01:00
1.11.6 Fix: remove overrideLineage config from database service metadata pipeline (#25379) 2026-01-20 09:08:26 +05:30
1.11.8 Fixes #24546: Add sobjectNames field for multi-object selection in Salesforce connector (#24547) 2026-02-02 16:05:59 +01:00
1.11.9 Add bulk apis for pipeline status (#25731) 2026-02-10 18:14:06 +05:30
1.11.11 Fix-20713: Add support for metadata ingestion using local file in REST connector (#26036) 2026-02-23 21:50:26 +05:30
1.11.12 Fix #26178: Add support for IAM auth for redshift (#26179) 2026-03-02 21:57:28 +05:30
1.12.0 fix(lineage): service nodes appearing in entity lineage view and empty By Service view (#27258) 2026-04-17 00:55:16 -07:00
1.12.1 Continuous indexing to handle failures (#26111) 2026-03-18 16:23:04 +05:30
1.12.2 Fixes #26225: Add index and FORCE INDEX for listLastTestCaseResultsForTestSuite (MySQL) (#26235) 2026-03-06 07:55:41 -08:00
1.12.4 Move Migration to 1.12.4 from 1.12.3 (#26629) 2026-03-20 09:41:15 +00:00
1.12.5 Update indexing schedule (#27204) 2026-04-10 19:15:08 +05:30
1.12.6 fix(lineage): service nodes appearing in entity lineage view and empty By Service view (#27258) 2026-04-17 00:55:16 -07:00
1.12.7 Fixes #27158: ingestion slowdown from tag_usage seq-scan on Postgres (#27745) 2026-05-05 10:30:11 +05:30
1.12.8 Add migrations to ensure PII are really enabled (#27921) 2026-05-08 15:39:29 +00:00
1.12.9 fix(migration): preempt PDTS duplicates and recover invalid index in 1.12.9 (#28238) 2026-05-20 18:27:33 +05:30
1.13.0 fix(glossary): preserve all relation types between same term pair (#28172) 2026-05-16 11:37:21 -07:00
2.0.0 feat: add ContextMemory entity (Context Center memories) (#28224) 2026-05-19 18:10:46 +02:00
2.0.1 fix(rdf): converge Fuseki state on weekly rebuilds and isolate API latency (#28117) 2026-05-15 17:36:06 -07:00