Commit graph

26 commits

Author SHA1 Message Date
Sriharsha Chintalapani
4bb6574815
fix(glossary): preserve all relation types between same term pair (#28172)
* fix(glossary): preserve all relation types between same term pair

The entity_relationship primary key (fromId, toId, relation) caused the
second INSERT for the same (term, term) pair to UPSERT and overwrite the
json discriminator, silently dropping any previously stored relationType.
Adding the same target term with "synonym" then "seeAlso" left only one
relation visible on GET.

Extend the PK to (fromId, toId, relation, relationType) so each typed
relation lives in its own row. The new column defaults to '' for every
non-glossary edge, leaving existing call sites and queries semantically
unchanged. CollectionDAO.deleteWithRelationType and countByRelationType
now filter on the column directly instead of JSON_EXTRACT. The 1.13.0
schemaChanges migration backfills relationType from the existing json
for glossaryTerm RELATED_TO rows, then atomically swaps the PK.
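
The overwrite described above can be reproduced in miniature. The following is only a sketch against an in-memory SQLite table, not the real MySQL/Postgres schema: the column list is trimmed, `INSERT OR REPLACE` stands in for the service's upsert, and the relation ordinal `15` and the term IDs are placeholders chosen for illustration.

```python
import sqlite3


def stored_relation_types(primary_key: str) -> list[str]:
    """Insert two typed relations for one term pair and return what survives."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"""
        CREATE TABLE entity_relationship (
            fromId TEXT NOT NULL,
            toId TEXT NOT NULL,
            relation INTEGER NOT NULL,
            relationType TEXT NOT NULL DEFAULT '',
            json TEXT,
            PRIMARY KEY ({primary_key})
        )
        """
    )
    for relation_type in ("synonym", "seeAlso"):
        # INSERT OR REPLACE is an assumption standing in for the real upsert.
        conn.execute(
            "INSERT OR REPLACE INTO entity_relationship "
            "(fromId, toId, relation, relationType, json) VALUES (?, ?, ?, ?, ?)",
            ("term-a", "term-b", 15, relation_type,
             f'{{"relationType": "{relation_type}"}}'),
        )
    rows = conn.execute(
        "SELECT relationType FROM entity_relationship ORDER BY relationType"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


# Old key: the second insert collides with the first and replaces it.
old_rows = stored_relation_types("fromId, toId, relation")
# Extended key: each typed relation keeps its own row.
new_rows = stored_relation_types("fromId, toId, relation, relationType")
```

With the three-column key only the last relation type written remains; with `relationType` in the key both rows are kept.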

Adds three integration tests in GlossaryTermResourceIT covering the
add path, the targeted-removal path, and tag-usage cleanup when a
table tagged with a glossary term is hard-deleted.
2026-05-16 11:37:21 -07:00
Eugenio
483461a003
Add migrations to ensure PII are really enabled (#27921)
This is especially needed for instances that had already upgraded to 1.12.0 or later; those instances skipped the migration that was cherry-picked into 1.12.6.
2026-05-08 15:39:29 +00:00
Teddy
219c5683fa
ISSUE #3032 (#27912)
* feat: move flat sampling to sampling config + dynamic sampling option

* feat: move flat sampling on the backend to sample profile config object

* feat: fix circular import

* feat: align UI with new profiler config

* feat: fix json schema

* feat: align python imports with new schema path

* feat: update migration to look at extension

* feat: remove enable

* feat: remove enable

* feat: added titles to sample config

* feat: generated ts classes

* feat: addressed comments

* feat: change sample config instantiation to match new structure

* feat: removed backward compatible fields

* feat: ran java linting

* feat: updated imports to point to generated files

* feat: added dynamic sampler resolution logic

* feat: ran python linting

* feat: remove duplicate migration

* chore: merge upstream and clean conflicts

* feat: update logic to support dynamic and static sampling

* feat: adjust sample config call

* feat: test for static, dynamic, row count and tier methods

* feat: more sample config unit tests

* feat: added tests for metric and sampling

* feat: added tests to validate fallback is not called in metric computers

* feat: strengthen profiler validation tests

* feat: fix sampling config

* feat: fix sampling config

* feat: fix sampling config

* feat: generated typescript models

* feat: fixed missing dq pipeline migration

* feat: fixed static check

* feat: fixed ci failures

* feat: fixed ci failures

* feat: fixed unit tests failure and linting

* feat: fixed integration tests failures

* chore: fix burstiq refactor

* chore: fix trino ci failures

* chore: revert baseline.json file

* chore: fix sampler available burstiq changes

* feat: added smart sampling radio button

* feat: ignore static checks errors

* feat: ran ts linting

* feat: fix burstiq infinite recursion issue with dynamic as default

* feat: translate i18n keys

* feat: fix failing tests
2026-05-07 09:01:18 -07:00
Vishnu Jain
d02051a941
Fix/mcp oauth databricks (#27922)
* Fix DCR rejection of client_secret_basic auth method

* fix(mcp): tolerate duplicate OAuth client creds and widen mcp_state column

* test(mcp): cover client_secret-only matching duplicate case

* gitarbot fix
2026-05-06 07:30:42 +02:00
Mayur Singal
60a2e6546e
Migrate Databricks from sqlalchemy-databricks to databricks-sqlalchemy (#26896)
* Update Databricks Dependency to databricks-sqlalchemy

* Update generated TypeScript types

* address comments and pyformat

* pyformat

* fix log filtering

* address comments

* fix static unit tests

* fix rule for static type

* pyformat

* update baseline

* revert basepyright changes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
2026-05-04 18:53:24 +05:30
Sriharsha Chintalapani
8cec97b52c
Containers: FQN-driven hierarchy listings + cascade-delete orphan fix (#27878)
* Containers: FQN-driven hierarchy listings + cascade-delete orphan fix

Stops `?root=true&service=...` and `/containers/.../children` from leaking
deeply-nested orphans, fixes the source bug that produced them, and corrects
the 1.13.0 fqnHash pattern index opclass.

Listing path
- ListFilter.getFqnPrefixCondition now binds both <param>Hash and
  <param>HashChild ('<hash>.%' and '<hash>.%.%') so depth-aware listings
  can require "exactly one segment below the prefix" via a single LIKE +
  NOT LIKE pair on fqnHash. Same shape works at any tree depth.
- ContainerDAO.listRoot{Before,After,Count} swap the NOT EXISTS anti-join
  on entity_relationship for fqnHash NOT LIKE :serviceHashChild. The FQN
  is the canonical hierarchy in OpenMetadata; the relationship table is
  no longer consulted for hierarchical listings.
- ContainerRepository.listChildren rewritten: no parent-by-name lookup, no
  findToWithOffset/countFindTo on entity_relationship, no second-hop
  hydration. Single SQL roundtrip + slim projection via
  listDirectChildSummariesByParentHash. Orphans whose parent CONTAINS row
  is missing are now correctly placed under their FQN-implied parent.
- Both endpoints honour ?include=non-deleted|all|deleted; ChildrenPageCache
  key includes the include tag so toggling the UI Deleted switch doesn't
  return a stale page from the other side.
- ContainerResource.listChildren accepts ?include= for parity with the
  root listing.
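
The "exactly one segment below the prefix" predicate from the listing path can be sketched as follows. This is an illustration on an in-memory SQLite table, not the actual ContainerDAO query: the table name `container`, the bind names `hash` and `hashChild`, and the readable placeholder hashes are all assumptions made for the example.

```python
import sqlite3


def direct_children(fqn_hashes: list[str], parent_hash: str) -> list[str]:
    """Return rows exactly one segment below parent_hash using LIKE + NOT LIKE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE container (fqnHash TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO container VALUES (?)", [(h,) for h in fqn_hashes]
    )
    rows = conn.execute(
        "SELECT fqnHash FROM container "
        "WHERE fqnHash LIKE :hash AND fqnHash NOT LIKE :hashChild "
        "ORDER BY fqnHash",
        {
            "hash": f"{parent_hash}.%",         # anything below the prefix
            "hashChild": f"{parent_hash}.%.%",  # two or more segments below
        },
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


# Placeholder hashes; real fqnHash values are opaque hash segments.
tree = ["svc", "svc.a", "svc.a.b", "svc.a.b.c", "svc.d", "other.x"]
under_service = direct_children(tree, "svc")
under_a = direct_children(tree, "svc.a")
```

The same pair of patterns works at any depth because only the prefix changes; no join against a relationship table is needed to decide depth.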

Cascade-delete orphan source (EntityRepository.processDeletionBatch)
- Removed the redundant pre-batch-delete of relationships and the
  swallow-all try/catch in the per-child loop. cleanup() per entity now
  owns row removal AND relationship deletion atomically; exceptions
  propagate so the loop stops on first failure with per-child atomicity.
  Stops the orphan-without-relationships pattern that the listing change
  defends against.

Migration correction (1.13.0 postgres fqnHash pattern indexes)
- Recreate 23 idx_*_fqnhash_pattern indexes with text_pattern_ops instead
  of varchar_pattern_ops. The planner casts the column to text when the
  LIKE RHS is text-typed (every JDBC setString call), so
  varchar_pattern_ops doesn't match the resulting (fqnhash)::text ~~
  expression. Confirmed via EXPLAIN ANALYZE on a 580k-row table: the same
  query drops from ~470ms cold (Parallel Seq Scan) to <1ms (Index Scan).

Tests
- ListFilterTest: 3 unit tests covering both binds, dotted/quoted service
  name special-char handling, and include= flowing through alongside the
  service prefix.
- ContainerResourceIT: 8 integration tests covering depth correctness at
  every level (5-level chain), orphan exclusion at root, orphan
  discoverability under FQN-implied parent, sibling subtree isolation,
  the include toggle on both endpoints, and large-batch hard-delete
  leaving no orphan rows or relationships.

Closes #27870 (subset of its listing-side intent shipped here as a single
FQN-depth predicate; PR's cascade fix and both new tests picked up
verbatim).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address review comments on #27878

- ContainerDAO.listRoot* override now defaults :serviceHashChild to '%.%.%'
  via rootListingParams() when ?service= is absent. Previous code
  unconditionally referenced the bind, so ?root=true without a service
  filter crashed at runtime with a missing-named-parameter error.
- Migration 1.13.0/postgres/schemaChanges.sql now DROP INDEX CONCURRENTLY
  IF EXISTS before each CREATE so already-upgraded environments (which
  have the original varchar_pattern_ops indexes) get the index recreated
  with text_pattern_ops on next deploy. Fresh installs see the DROP as
  a no-op. Comment block updated to record the recreate intent.
- ChildrenPageCache include tag for ALL changed from "all" to "a" so the
  CacheKeys.childrenPage Javadoc's "1-2 char" promise holds (now nd/a/d
  are all <=2 chars).
- ContainerRepository.includeToBindString Javadoc corrected: it described
  the SQL as a CASE expression, but listDirectChildSummariesByParentHash
  actually uses a three-branch OR chain.
- ListFilterTest: added test_noServiceFilter_doesNotBindServicePatterns
  as a regression guard for the missing-bind bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix java style

* Address second review pass on #27878

- EntityRepository.processDeletionBatch wraps per-child cleanup exceptions
  with entityType + entityId context before re-throwing. The exception
  still propagates (so the loop still stops, failure-semantics contract
  unchanged); operators now get a stack trace that names the row that
  blocked a large recursive delete instead of an opaque error.
- CacheKeys.childrenPage Javadoc now lists the actual include tags
  ("nd" / "a" / "d") and points at ChildrenPageCache.includeTag as the
  authoritative source. Earlier comment still mentioned "all" after the
  switch to single-letter tags.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Test: ?root=true without service filter end-to-end (#27878 review)

Adds test_rootListing_withoutServiceFilter_returnsRootsAcrossAllServices
to ContainerResourceIT. Creates two distinct storage services, each with
a root container and a child container, then asserts that GET
/containers?root=true (no service filter):

- Succeeds (rootListingParams() defaults :serviceHashChild to '%.%.%' so
  the SQL has its bind even when ListFilter.getServiceCondition didn't
  add it).
- Includes root containers from both services (cross-service listing
  works without a service prefix narrowing the candidate set).
- Excludes child containers from either service (depth check still
  applied via the default bind).

Regression guard for the bug Copilot's review pass flagged at
CollectionDAO.java:784: 'GET /containers?root=true (no service) crashes
at runtime due to a missing named parameter.'

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Use generated name column instead of JSON extract in container summary queries

storage_container_entity has 'name' as a STORED generated column derived
from json->>'name' (see bootstrap/sql/schema/postgres.sql). Both slim
projection queries (findContainerSummaryRows and listDirectChildSummariesByParentHash)
were redundantly extracting it via JSON_UNQUOTE(JSON_EXTRACT(...)) on MySQL
and json->>'name' on Postgres — work the database had already done at insert
time.

Reading 'name' as a column directly:
  - Saves one JSON op per row on every page fetch
  - Lets ORDER BY name sort on the indexed generated column rather than a
    per-row JSON-extracted expression

displayName, fullyQualifiedName, and description stay as JSON extracts —
they aren't generated columns. (description in particular shouldn't be:
free-text fields can be many KB and a STORED generated column would
double the row size on disk.)

Row mapper unchanged — column labels in the SELECT list still match.

* Fix inaccurate ListFilterTest comment and Javadoc link to private method

ListFilterTest: the prefix-pattern comment said the LIKE patterns 'exclude'
direct/grandchildren — patterns themselves match, the SQL's NOT LIKE is
what excludes. Rewrote to show how ContainerDAO.listRoot* combines LIKE
and NOT LIKE on the two binds.

CacheKeys.childrenPage: the @link pointed at ChildrenPageCache#includeTag
which is private static; Javadoc tooling renders that as an unresolved
link. Redirected to the public Include enum the tag is derived from.

* Log original exception in recursive batch delete catch before wrapping

Wrapping the caught RuntimeException into a new one (with entity context
in the message) preserves the original via the cause chain, but the outer
exception mapper sees the wrapper and renders a generic 500 — the original
type information doesn't surface to operators investigating a failed
delete.

Adds a LOG.error before the wrap so the original exception (with full type
and stack) lands in the logs adjacent to the entity context, giving
operators enough signal to diagnose what actually blocked the delete.
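
The log-then-wrap pattern described above, as a minimal sketch. The real code lives in the Java EntityRepository; the function and callback names here are hypothetical, and only the ordering (log the original, wrap with entity context, let the exception stop the loop) is taken from the commit text.

```python
import logging

LOG = logging.getLogger("batch_delete_sketch")


def process_deletion_batch(children, cleanup):
    """Clean up each child in order; stop at the first failure."""
    deleted = []
    for entity_type, entity_id in children:
        try:
            cleanup(entity_type, entity_id)
        except RuntimeError as exc:
            # Log the original first so its type and stack reach the logs.
            LOG.error("Cleanup failed for %s %s", entity_type, entity_id,
                      exc_info=exc)
            # Then wrap with entity context; the cause chain keeps the original.
            raise RuntimeError(
                f"Failed to delete {entity_type} {entity_id}"
            ) from exc
        deleted.append(entity_id)
    return deleted


seen = []


def flaky_cleanup(entity_type, entity_id):
    """Hypothetical cleanup that fails on the second child."""
    seen.append(entity_id)
    if entity_id == "c2":
        raise RuntimeError("row locked")


try:
    process_deletion_batch(
        [("container", "c1"), ("container", "c2"), ("container", "c3")],
        flaky_cleanup,
    )
    failure = None
except RuntimeError as wrapped:
    failure = wrapped
```

Because the wrapper is re-raised, the third child is never attempted, which matches the stop-on-first-failure contract the commit describes.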

* Restore failure-semantics comment block on recursive batch delete wrap

* use Entity.SEPARATOR instead of hard-coding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* fix check style

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: sonika-shah <58761340+sonika-shah@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-04 18:44:42 +05:30
Sriharsha Chintalapani
5620121e50
SearchIndex: tunable index settings + per-stage latency metrics (#27865)
* SearchIndex: configurable index settings + per-stage latency metrics

Adds two diagnostic and operational improvements to the distributed search
indexing pipeline so operators can both tune cluster behavior per
installation and pinpoint where reindex latency is being spent.

Configurable index settings (per-installation, no code changes needed)
- New SearchIndexing app config fields: liveIndexSettings (post-promote),
  bulkIndexSettings (during reindex), and per-entity overrides.
- DefaultRecreateHandler applies bulk overrides on staged-index creation
  (e.g. refresh=-1, replicas=0, async translog) and reverts to live values
  before alias swap. Optional force-merge before swap.
- Safety revert ensures the promoted index never inherits a disabled
  refresh interval, even if the admin only configured bulk overrides.
- Live UX is preserved: refresh defaults to 1s so users and agents that
  read-after-write see near-real-time results.
- New IndexManagementClient methods (updateIndexSettings, forceMerge)
  with implementations for OpenSearch and Elasticsearch.
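
One way to picture the apply-then-revert flow, as a rough sketch only. The dictionary shape, both function names, and the choice of which keys get restored are assumptions; only the 1s live refresh default and the rule that a disabled refresh interval must not survive promotion come from the description above.

```python
LIVE_REFRESH_DEFAULT = "1s"  # live default stated in the commit message


def settings_for_staging(live: dict, bulk: dict) -> dict:
    """Settings for the staged index: live values with bulk overrides on top."""
    return {**live, **bulk}


def settings_for_promotion(live: dict, bulk: dict) -> dict:
    """Settings to re-apply before the alias swap (hypothetical helper)."""
    # Restore every key the bulk overrides touched that has a live value.
    restored = {key: live[key] for key in bulk if key in live}
    # Safety revert: never promote an index with the bulk refresh interval
    # when the admin configured no live value for it.
    if "refresh_interval" in bulk and "refresh_interval" not in live:
        restored["refresh_interval"] = LIVE_REFRESH_DEFAULT
    return restored


live_cfg = {"number_of_replicas": 1}
bulk_cfg = {"refresh_interval": "-1", "number_of_replicas": 0}
staged = settings_for_staging(live_cfg, bulk_cfg)
promoted = settings_for_promotion(live_cfg, bulk_cfg)
```

Here the admin configured only bulk overrides for refresh, yet the promoted settings still carry a 1s refresh interval.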

Per-stage latency metrics (consumer-vs-producer attribution)
- StageStatsTracker accumulates per-stage wall-clock time alongside
  existing counters; added timing-only addStageTime() so per-record
  callbacks and per-batch wall-clock don't double-count.
- DB migration 1.13.0 adds readerTimeMs / processTimeMs / sinkTimeMs /
  vectorTimeMs columns to search_index_server_stats. Existing rows get
  DEFAULT 0; aggregation queries SUM the new columns.
- Reader timing wraps PartitionWorker.readEntitiesKeyset (DB latency).
  Process timing wraps the doc-build join in OpenSearch and Elasticsearch
  bulk sinks (CPU/serialization). Sink timing wraps client.indices().bulk
  (pure search-cluster latency), attributed per participating tracker.
- DistributedJobStatsAggregator surfaces totalTimeMs on each StepStats so
  the UI can compute avg latency = totalTimeMs / successRecords and
  throughput = successRecords / (totalTimeMs / 1000) on every WebSocket
  push without server-side derivation.
- New per-server aggregation query (getStatsByServer) for distributed
  visibility, fed into SearchIndexJob.ServerStats with timing fields.
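
The two client-side derivations mentioned above are simple ratios. A small sketch, with a hypothetical function name and result keys, and with division-by-zero guards that are an assumption about how an empty stage might be handled:

```python
def stage_summary(total_time_ms: int, success_records: int) -> dict:
    """Derive avg latency (ms/record) and throughput (records/s) for a stage."""
    avg_latency_ms = (
        total_time_ms / success_records if success_records else None
    )
    records_per_second = (
        success_records / (total_time_ms / 1000) if total_time_ms else None
    )
    return {
        "avgLatencyMs": avg_latency_ms,
        "recordsPerSecond": records_per_second,
    }


sink = stage_summary(total_time_ms=4000, success_records=2000)
idle = stage_summary(total_time_ms=0, success_records=0)
```

For 2000 records over 4000 ms this gives 2.0 ms per record and 500 records per second.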

UI: each of the four stage cards (Reader / Process / Sink / Vector) shows
"Latency: X ms · Y r/s" when timing is available; per-entity table gains
Sink avg + Sink throughput columns. Docs panel updated. New SearchIndexing
config section added with sane defaults that preserve current behavior.

Tests: 6 new StageStatsTracker timing tests, new aggregator test that
asserts StepStats.totalTimeMs is populated at job and per-entity level.
All existing tests updated for new arg shapes; 60 unit tests pass.

The pattern operators see: Reader avg climbing means DB-side issue
(cache/autovacuum); Sink avg climbing means OS-side issue (segments/
back-pressure); only one entity's row climbing identifies the offender.
2026-05-02 20:11:06 -07:00
Sriharsha Chintalapani
b118a87df2
Add text_pattern_ops index on entity-table fqnHash for Postgres listings (#27868)
* Add text_pattern_ops index on entity-table fqnHash for Postgres listings

Service-filtered listings (`?service=` / `?database=` / `?databaseSchema=` /
`?parent=` / `?apiCollection=` / `?spreadsheet=` / `?testSuite=`) compile
to `<table>.fqnHash LIKE 'prefix%'` via ListFilter.getFqnPrefixCondition.
The unique B-tree on `fqnHash` uses default `text_ops` opclass and the
column inherits the database default collation (`en_US.UTF-8` on managed
Postgres / RDS), neither of which lets the planner satisfy LIKE prefix
from the index. Cold count(*) and the page query both fall back to a
parallel seq scan over the JSONB heap — measured at ~3s on a ~580k-row
storage_container_entity even after VACUUM/ANALYZE tuning and an RDS
upsize. The unfiltered listing (`?limit=15`) clears the same dataset in
~215ms because it uses `idx_storage_container_entity_deleted_name_id`
from 1.8.2, which the LIKE predicate cannot.

Append a `text_pattern_ops` partial index on `fqnHash` for every entity
table that hits getFqnPrefixCondition (24 tables: chart_entity through
worksheet_entity). The `text_pattern_ops` opclass supports LIKE prefix
regardless of column collation, switching the cold count(*) plan from
parallel seq scan to bitmap index scan.

MySQL is unaffected: every entity-table `fqnHash` column already ships
with `CHARACTER SET ascii COLLATE ascii_bin`, a binary collation that
lets the existing unique B-tree answer LIKE prefix predicates directly.
The MySQL counterpart gets a documentation-only comment explaining the
asymmetry so the next migration audit doesn't have to re-derive it.
2026-05-02 17:25:56 -07:00
Sriharsha Chintalapani
ecc4b17579
Redis caching for container ancestors and children-page (#27858)
* Cache resolved ancestor chains in Redis

The /containers/name/{fqn}/ancestors endpoint runs on every detail-page
render to populate breadcrumbs. The resolution itself is one indexed
findReferencesByFqns call (already slim) plus FQN string walking, but the
DB round-trip and JSON deserialization are repeated for every navigation.
Bundle this behind Redis with the same shape as CachedReadBundle.

Cache key: om:anc:container:{fqnHash} → JSON List<EntityReference>, TTL =
entityTtlSeconds (default 5 min).

Invalidation:
- Writer drops its own key on update/delete (EntityRepository.invalidateCache)
- Cross-instance: the existing CacheInvalidationPubSub handler now also
  drops the ancestors key for the published FQN.
- Renames are self-healing: the new FQN is a different key, the old key
  TTL-expires.
- Display-name drift on a remote ancestor is bounded by TTL — acceptable
  since breadcrumb metadata is cosmetic.
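
The read-through and invalidation behaviour can be sketched with an in-memory stand-in for Redis. Everything except the key format and the 5-minute default TTL is an assumption: `TtlCache`, `get_ancestors`, and `load_from_db` are hypothetical names, and the fqnHash is passed in as an opaque string rather than computed.

```python
import json
import time


class TtlCache:
    """Tiny in-memory stand-in for a Redis client with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def delete(self, key):
        self._entries.pop(key, None)


def ancestors_key(fqn_hash: str) -> str:
    return f"om:anc:container:{fqn_hash}"


def get_ancestors(cache, fqn_hash, load_from_db, ttl_seconds=300):
    """Return cached ancestors, loading and caching them on a miss."""
    key = ancestors_key(fqn_hash)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    ancestors = load_from_db(fqn_hash)
    cache.set(key, json.dumps(ancestors), ttl_seconds)
    return ancestors


now = [0.0]
cache = TtlCache(clock=lambda: now[0])
db_calls = []


def load_from_db(fqn_hash):
    db_calls.append(fqn_hash)
    return [{"name": "s3"}, {"name": "bucket"}]


first = get_ancestors(cache, "h1.h2.h3", load_from_db)
second = get_ancestors(cache, "h1.h2.h3", load_from_db)  # cache hit
now[0] = 301.0
third = get_ancestors(cache, "h1.h2.h3", load_from_db)   # TTL expired
cache.delete(ancestors_key("h1.h2.h3"))                   # writer invalidation
fourth = get_ancestors(cache, "h1.h2.h3", load_from_db)
```

A rename needs no explicit handling in this shape: the new FQN hashes to a different key and the old entry simply expires.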

The cache is wired into ContainerRepository.getAncestors only — generalising
to other hierarchical entity types is straightforward when more /ancestors
endpoints land.
2026-05-01 18:52:15 -07:00
sonika-shah
52548550e8
fix migration: update legacy relatedTerms in glossaryTerm version history after the glossary term relation changes (#27770)
* fix: strip stale relatedTerms from glossary term version snapshots

Extends PR #26586. That fix cleaned glossary_term_entity but not the
version snapshots in entity_extension, so GET /versions/{v} still
500s on any pre-1.13 term whose relatedTerms had legacy shape:

  UnrecognizedPropertyException: Unrecognized field "id"
  (class TermRelation, has only "term" and "relationType")

Predicate matches only legacy snapshots — first item has bare `id`
(EntityReference) instead of `term` (TermRelation). Skips correctly-
shaped snapshots written on 1.13+.

Stripping is safe: relatedTerms is loaded from entity_relationship at
read time post-#25886.

* v1130: transform legacy relatedTerms in version snapshots instead of stripping

Replace the SQL UPDATE that stripped relatedTerms from entity_extension
version snapshots with a Java migration that wraps each legacy
EntityReference[] item as TermRelation[] (term + relationType="relatedTo").

Version reads deserialize entity_extension JSON directly without
rehydrating from entity_relationship, so a strip would lose history per
version. The transform preserves it.

Designed for tables with millions of rows: keyset paginated by
PK (id, extension), batched updates, idempotent on re-run.
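
The per-snapshot transform and its idempotency check can be sketched as below. This is not the Java migration itself: the function names are hypothetical, and the snapshot is modelled as a plain dictionary. The legacy test (first item has a bare `id` and no `term`) and the `relatedTo` default follow the description above.

```python
def is_legacy_related_terms(related_terms) -> bool:
    """True when the first item looks like an EntityReference, not a TermRelation."""
    if not related_terms:
        return False
    first = related_terms[0]
    return isinstance(first, dict) and "id" in first and "term" not in first


def upgrade_snapshot(snapshot: dict) -> dict:
    """Wrap legacy relatedTerms items as {term, relationType}; no-op otherwise."""
    related_terms = snapshot.get("relatedTerms")
    if not is_legacy_related_terms(related_terms):
        return snapshot
    upgraded = dict(snapshot)
    upgraded["relatedTerms"] = [
        {"term": reference, "relationType": "relatedTo"}
        for reference in related_terms
    ]
    return upgraded


legacy = {
    "name": "revenue",
    "relatedTerms": [{"id": "42", "type": "glossaryTerm"}],
}
once = upgrade_snapshot(legacy)
twice = upgrade_snapshot(once)  # second run leaves the snapshot unchanged
```

Running the transform twice yields the same result as running it once, which is what makes a re-run of the migration safe.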

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mysql): remove leftover entity_extension strip in v1130 post-migration

The previous edit added the comment pointer above the legacy
UPDATE entity_extension SET json = JSON_REMOVE(... '$.relatedTerms') block
without removing it. On MySQL that SQL would have stripped relatedTerms
from version snapshots BEFORE the Java transform runs, defeating the
migration and losing related-term history. Postgres was already correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 03:35:45 +00:00
Eugenio
88c44502ae
feat: Add auto-classification support for storage service containers (#26495)
* Add schema support for container auto-classification

Extend container entity schema to support sample data storage, enabling
PII detection and classification workflows on storage service containers.

Changes:
- Add sampleData field to container.json for storing sample data
- Create storageServiceAutoClassificationPipeline.json schema defining
  configuration for storage service auto-classification pipelines
- Update workflow.json to include StorageServiceAutoClassificationPipeline
  as a supported pipeline type

This provides the schema foundation for running auto-classification
workflows on S3, GCS, and other storage service containers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Add backend support for container sample data and classification

Implement Java backend functionality to handle sample data ingestion,
storage, and PII masking for container entities.

Changes:
- ContainerRepository: Add sample data retrieval and storage operations
- EntityRepository: Extend sample data support to container entities
- ContainerResource: Add REST endpoint for container sample data ingestion
- PIIMasker: Extend PII masking to support container entities

This enables the backend to process and store sample data from storage
service containers and apply PII masking rules during data retrieval.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Extend classifiable entity types to include containers

Add Container to the ClassifiableEntityType union, enabling PII detection
and auto-classification workflows to process storage service containers
alongside database tables.

Changes:
- Update ClassifiableEntityType from Table-only to Union[Table, Container]
- Import Container entity type
- Update module docstring to reflect current support

This type extension allows the PII processor to handle both database
tables and storage containers uniformly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Add container sample data ingestion to OpenMetadata API

Implement container-specific API mixin for sample data operations and
integrate it into the main OpenMetadata client.

Changes:
- Add OMetaContainerMixin with ingest_container_sample_data method
- Handle binary data encoding (base64) and serialization errors
- Register mixin in OpenMetadata class hierarchy
- Mirror table sample data ingestion patterns for consistency
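
A minimal sketch of the binary handling mentioned above, assuming sample data is a list of column names plus rows of cells. The function names and payload shape are hypothetical and do not reflect the actual mixin signature; returning `None` on a serialization error is likewise an assumption.

```python
import base64
import json


def encode_cell(value):
    """Base64-encode binary cells so the row can be sent as JSON."""
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(bytes(value)).decode("ascii")
    return value


def serialize_sample_data(columns, rows):
    """Return a JSON payload, or None when a cell cannot be serialized."""
    payload = {
        "columns": columns,
        "rows": [[encode_cell(cell) for cell in row] for row in rows],
    }
    try:
        return json.dumps(payload)
    except (TypeError, ValueError):
        return None


ok_payload = serialize_sample_data(["id", "blob"], [[1, b"\x00\x01"]])
bad_payload = serialize_sample_data(["id"], [[object()]])
```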

This provides the Python API layer for ingesting sample data from
storage service containers into OpenMetadata.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Implement storage service samplers for S3 and GCS

Add sampler implementations for storage services to extract sample data
from structured containers (Parquet, CSV) for auto-classification.

Changes:
- Create base StorageSamplerInterface for storage service sampling
- Implement S3Sampler for AWS S3 containers with structured file support
- Implement GCSSampler for Google Cloud Storage containers
- Support column extraction and data sampling for structured formats
- Handle dataModel-based column definitions from containers

Storage samplers read container metadata, fetch file contents, and
generate sample datasets for downstream PII detection.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Update PII processor to support container entities

Extend the base PII processor to handle both Table and Container
entities with unified column extraction logic.

Changes:
- Add _get_entity_columns helper to extract columns from Table or Container
- Handle Container entities with optional dataModel.columns structure
- Improve column matching with safe fallback for missing columns
- Use generic entity reference in error reporting
- Add early return when entity has no columns to process

This enables PII detection to run on storage containers the same way
it processes database tables.
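
The unified column extraction might look roughly like this. The dataclasses below are simplified stand-ins for the generated Table and Container models, and `get_entity_columns` is a hypothetical rendering of the `_get_entity_columns` helper named above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Column:
    name: str


@dataclass
class Table:
    columns: List[Column] = field(default_factory=list)


@dataclass
class DataModel:
    columns: List[Column] = field(default_factory=list)


@dataclass
class Container:
    dataModel: Optional[DataModel] = None


def get_entity_columns(entity: Union[Table, Container]) -> List[Column]:
    """Return the entity's columns, or an empty list when it has none."""
    if isinstance(entity, Table):
        return entity.columns or []
    if isinstance(entity, Container):
        # Containers only carry columns when a dataModel is present.
        if entity.dataModel is None:
            return []
        return entity.dataModel.columns or []
    return []


table_cols = get_entity_columns(Table(columns=[Column("email")]))
container_cols = get_entity_columns(
    Container(dataModel=DataModel(columns=[Column("ssn")]))
)
unstructured_cols = get_entity_columns(Container())
```

An empty result lets the caller return early instead of running PII detection on an entity with nothing to classify.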

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Add storage service support to sampler processor

Extend the sampler processor to handle both database and storage service
entities with appropriate sampler class selection.

Changes:
- Detect service type from source config (Database vs Storage)
- Import StorageServiceAutoClassificationPipeline
- Handle both Table and Container entity types in _run method
- Add column validation for Container entities (via dataModel.columns)
- Create storage-specific sampler interfaces for S3 and GCS
- Update sampler_interface to support Container entities
- Improve error messages with entity type context

The processor now dynamically selects database or storage samplers based
on the pipeline configuration type.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Add storage fetcher strategy for container classification

Implement fetcher strategy pattern for storage services to retrieve
containers for auto-classification workflows.

Changes:
- Add StorageFetcherStrategy to handle storage service entity fetching
- Update EntityFetcher to select appropriate strategy based on service type
- Support both DatabaseService and StorageService in strategy selection
- Import StorageService type for service detection
- Improve error messages with specific service type information

The fetcher now dynamically creates database or storage-specific
strategies to retrieve entities based on pipeline configuration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Register auto-classification pipeline in storage service specs

Add AutoClassification pipeline support to S3 and GCS storage service
specifications, enabling UI and workflow registration.

Changes:
- Add AutoClassification to S3ServiceSpec supported pipelines
- Add AutoClassification to GCSServiceSpec supported pipelines
- Import StorageServiceAutoClassificationPipeline in both specs

This registers the auto-classification workflow type for storage
services in the ingestion framework's service registry.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Add container support to metadata sink and patch operations

Extend metadata sink and patch mixin to handle container entities,
enabling sample data ingestion and tag updates for containers.

Changes:
- Add Container to MetadataRestSink entity type handling
- Implement container sample data ingestion in sink._run
- Add Container to PatchMixin tag operations
- Import Container entity type in both modules

This completes the metadata ingestion pipeline by allowing the sink
to persist sample data and classification tags for container entities.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Update classification workflow for storage service support

Extend the auto-classification workflow to handle both database and
storage service pipelines with unified step orchestration.

Changes:
- Import StorageServiceAutoClassificationPipeline
- Add type checking for both Database and Storage pipeline configs
- Remove unnecessary cast, use direct type checks
- Add validation warning for unsupported config types
- Preserve enableAutoClassification flag behavior for both types

The workflow now supports running PII detection and classification
on both database tables and storage containers based on config type.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Add unit tests for container classification components

Add test coverage for container-specific fetcher and sampler components.

Changes:
- Add test_container_fetcher.py for StorageFetcherStrategy tests
- Add test_container_sampler_processor.py for container sampler tests

Tests validate:
- Storage service fetcher strategy selection and instantiation
- Container sampler processor initialization and execution
- Proper handling of Container entities vs Table entities

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Reorganize integration tests by entity type

Restructure auto-classification integration tests into separate
directories for databases and containers to improve organization.

Changes:
- Move database classification tests to databases/ subdirectory
- Move conftest.py, init.sql, and test_tag_processor.py into databases/
- Container tests already organized in containers/ subdirectory
- Remove old flat test structure

This organization makes it clearer which tests target database entities
vs storage container entities in classification workflows.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Properly retrieve sample data

* Update generated TypeScript types

* Apply Gitar bot

* Fix tests

* feat: Add supportsProfiler to storage connection schemas

Add supportsProfiler field to storage connection schemas (S3, GCS, ADLS,
Custom Storage) to enable auto-classification pipeline support for storage
services. This aligns with the backend changes in PR #26495 that added
container auto-classification functionality.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: Add UI support for storage service auto-classification

- Update IngestionWorkflowUtils to route storage services to storage-specific
  auto-classification schema
- Modify getSupportedPipelineTypes to filter pipeline types based on service
  category (storage services only show AutoClassification, not Profiler)
- Update AddIngestionButton to pass serviceCategory parameter
- Add unit test to verify storage services only get AutoClassification option

This enables users to configure and run auto-classification agents on storage
services (S3, GCS, ADLS) for PII detection on containers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Add BucketArn field to S3BucketResponse model

AWS S3 API now returns a BucketArn field in list_buckets() responses.
Add this optional field to prevent Pydantic extra_forbidden validation errors.

Error: BucketArn Extra inputs are not permitted [type=extra_forbidden]

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Add Container permissions to AutoClassificationBotPolicy

Add Container entity permissions to AutoClassificationBotPolicy to allow the
autoClassification-bot to apply tags and sample data to storage containers.
Previously, the bot only had permissions for Table entities, causing
permission denied errors when running auto-classification on storage services.

Changes:
- Add Container rule with EditAll and ViewAll operations to policy seed data
- Create migrations for MySQL and PostgreSQL to update existing installations

Error fixed: Principal: CatalogPrincipal{name='autoclassification-bot'}
operations [EditTags] not allowed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Update generated TypeScript types

* fix: Add fallback for storage service type detection in sampler

Add fallback logic to detect storage services by source type name when
the pipeline config type check fails. This handles cases where the Airflow
environment might not have the updated schema/package with
StorageServiceAutoClassificationPipeline.

Changes:
- Add fallback detection for s3, gcs, azuredatalake, customstorage
- Add debug logging for service type detection
- Preserve primary isinstance check for proper type detection

This fixes the "No module named 'metadata.ingestion.source.database.gcs'"
error when running storage auto-classification pipelines.
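
The fallback pattern can be sketched roughly as follows. The class below is an empty stand-in for the generated `StorageServiceAutoClassificationPipeline` model, and the function name, logger name, and arguments are assumptions made for illustration.

```python
import logging
from typing import Optional

logger = logging.getLogger("sampler.service_type")  # hypothetical logger name

# Source type names taken from the change list above.
STORAGE_SOURCE_TYPES = frozenset({"s3", "gcs", "azuredatalake", "customstorage"})


class StorageServiceAutoClassificationPipeline:
    """Empty stand-in for the generated pipeline config class."""


def is_storage_service(source_config: object, source_type: Optional[str]) -> bool:
    # Primary check: the parsed config is the storage pipeline model.
    if isinstance(source_config, StorageServiceAutoClassificationPipeline):
        logger.debug("Storage service detected from config type")
        return True
    # Fallback: the environment may carry an older package without the
    # storage model, so fall back to the source type name.
    detected = (source_type or "").lower() in STORAGE_SOURCE_TYPES
    logger.debug("Fallback detection for %r: storage=%s", source_type, detected)
    return detected
```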

* Guide to support new entities in classification agent

* docs: Update auto-classification guide with debugging learnings

Add critical troubleshooting information discovered during container
classification debugging:

1. storeSampleData defaults to false
   - Sample data NOT ingested unless explicitly enabled
   - Document why this is by design (avoid large datasets)
   - Add troubleshooting steps to verify flag is set

2. Service type detection fallback pattern
   - Explain why fallback is needed (Airflow package caching)
   - Show complete implementation with source type lists
   - Add debug logging pattern

3. Troubleshooting section
   - Sample data not appearing: check storeSampleData, database, logs
   - Module import errors: service type detection issues
   - PII tags not applied: config and data issues

4. Common pitfalls additions
   - Emphasize storeSampleData default value
   - Service type detection in cached environments

These updates reflect real debugging scenarios and will help future
developers avoid the same issues.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Apply gitar bot suggestions

* Fix suggestions, linting, and SonarCloud issues

* More gitar bot suggestions

* Fix compile error

* Fix linting

* Fix broken tests

* Fix unorganized import

* Improve config parsing

This lets us correctly resolve the polymorphic `source` properties when the config does not provide enough fields for Pydantic to discriminate between models (e.g., mistaking a database source config for a storage source config).

* Gitar bot comment

* Fix s3 source test

* Apply comments from reviews

* Extract candidate column logic in samplers

* Fix tests

* Fix container customization test

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-04-24 06:29:16 -07:00
Teddy
47c88d49ce
ISSUE #3031 - Dynamic Sampling Config (#27184)
* feat: move flat sampling to sampling config + dynamic sampling option

* feat: move flat sampling on the backend to sample profile config object

* feat: fix circular import

* feat: align UI with new profiler config

* feat: fix json schema

* feat: align python imports with new schema path

* feat: update migration to look at extension

* feat: remove enable

* feat: remove enable

* feat: added titles to sample config

* feat: generated ts classes

* feat: addressed comments

* feat: change sample config instantiation to match new structure

* feat: removed backward compatible fields

* feat: ran java linting

* UI fixes, tests and locale changes

* fix failing test

* fix ui check style

* fix failing profiler test

* feat: fix ci failures

* feat: generated ts classes

* feat: fix ci failure

* fix: failing ci

* address comments

* fix failing test

* fix: ci failure

---------

Co-authored-by: Harshit Shah <dinkushah169@gmail.com>
2026-04-17 10:46:06 -07:00
sonika-shah
077982c348
Move ontology/glossary relation migration from 1.14.0 back to 1.13.0 (#27431)
* Move ontology/glossary relation migration from 1.14.0 back to 1.13.0

Ontology feature will ship in 1.13.0, not 1.14.0. Move the glossary term
relation migrations (relationType backfill, settings insert, stale
relatedTerms strip, conceptMappings backfill) back to the 1.13.0
postDataMigrationSQLScript for both MySQL and PostgreSQL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Restore empty 1.14.0 SQL migration files for Java migration framework

The V114 MigrationUtil.java package requires the 1.14.0 migration
directory to exist with SQL files for the migration to be picked up.
Keep them as empty files (matching convention of other versions with
no post-data SQL).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add schemaChanges.sql and comment all 1.14.0 SQL migration files

Add both schemaChanges.sql and postDataMigrationSQLScript.sql for
mysql and postgres with a comment explaining the directory is required
for the V114 Java migrations to be picked up by the migration framework.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix missing trailing newline in postgres postDataMigrationSQLScript

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* address feedback

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Karan Hotchandani <33024356+karanh37@users.noreply.github.com>
2026-04-16 16:45:01 +00:00
Sriharsha Chintalapani
bb0daa180e
RDF, cleanup relations and remove unnecessary bindings, add distributed mode for RDF reindex (#26902)
* RDF, cleanup relations and remove unnecessary bindings, add distributed mode for RDF reindex

* Update generated TypeScript types

* Address comments from copilot

* Update generated TypeScript types

* fix test issues

* Fix minor UI bugs

* Add the missing filters

* Fix RDF export API error

* Add export functionality

* Fix ui-checkstyle

* Fix java checkstyle

* Fix unit tests

* Fix and increase the coverage for KnowledgeGraph.spec.ts

* Fix tests

* Remove rdf as default in playwright and local docker

* fix ui-checkstyle

* Address comments

* Potential fix for pull request finding 'CodeQL / Artifact poisoning'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Address copilot comments

* Address copilot comments

* Fix tests

* Fix docker

* Update openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/rdf/distributed/DistributedRdfIndexCoordinator.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Address copilot review comments: license headers, JSON escaping, type safety, border-color, stop semantics

Agent-Logs-Url: https://github.com/open-metadata/OpenMetadata/sessions/c026e52e-162b-4c9a-9874-43791d4aaac1

Co-authored-by: harshach <38649+harshach@users.noreply.github.com>

* Show error toast for unsupported export format in KnowledgeGraph

Agent-Logs-Url: https://github.com/open-metadata/OpenMetadata/sessions/c026e52e-162b-4c9a-9874-43791d4aaac1

Co-authored-by: harshach <38649+harshach@users.noreply.github.com>

* Fix docker

* Fix docker for playwright

* Fix docker for playwright

* Fix tests

* Fix tests

* Fix docker

* Fix docker

* Fix glossary and pagination spec flakiness

* update the missing translations

* Fix docker

* Fix docker

* Fix integration test

* Fix fuseki not starting

* Fixed the run local docker script

* worked on comments

* Fix flakiness in knowledge graph tests

* Fix checkstyle

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: harshach <38649+harshach@users.noreply.github.com>
2026-04-14 13:24:41 -07:00
Rajdeep Singh
5e1416447f
fix(sampler): Respect randomizedSample flag at 100% percentage sampling (#26966)
* fix(sampler): respect randomizedSample flag at 100% percentage sampling

When profileSample is 100% with PERCENTAGE type, the sampler
short-circuits and returns the raw dataset without any randomization,
even when randomizedSample is True (the default).

Split the combined condition so:
- No profileSample set -> return raw dataset (no sampling configured)
- 100% PERCENTAGE + randomizedSample=False -> return raw dataset (optimization)
- 100% PERCENTAGE + randomizedSample=True -> go through normal sampling path
  which applies RandomNumFn/df.sample for proper row shuffling

Fixes #21304
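
A rough sketch of the split condition. The enum and function names are hypothetical, and the handling of an unset flag follows the rule stated later in this commit (only an explicit True randomizes); the real samplers may structure this differently.

```python
from enum import Enum
from typing import Optional


class ProfileSampleType(Enum):
    PERCENTAGE = "PERCENTAGE"
    ROWS = "ROWS"


def should_return_raw_dataset(
    profile_sample: Optional[float],
    sample_type: ProfileSampleType,
    randomized_sample: Optional[bool],
) -> bool:
    """Decide whether the sampler can skip the sampling query entirely."""
    if profile_sample is None:
        # No sampling configured.
        return True
    is_full_percentage = (
        sample_type is ProfileSampleType.PERCENTAGE and profile_sample == 100
    )
    # Fast path only when randomization was not explicitly requested.
    return is_full_percentage and randomized_sample is not True
```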

* Address review: use 'is False' for Optional[bool] and add unit tests

- Fix randomizedSample check from 'not' to 'is False' in both SQASampler
  and DatalakeSampler to correctly handle None (Optional[bool] default=True)
- Add unit tests verifying 100% PERCENTAGE behavior for randomizedSample
  values True, False, and None

* Add ORDER BY on random column in fetch_sample_data for true randomization

The get_dataset() fix ensures 100% PERCENTAGE + randomizedSample routes
through get_sample_query() which produces a CTE with a random column.
Now fetch_sample_data() detects that column and applies ORDER BY before
LIMIT, so each call returns a different subset of rows.

Also add real-DB integration tests using SQLite for the 100% PERCENTAGE
edge case (True, False, None).

* Address review: remove stale comment, unused import, add return assertions

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Address review: move ORDER BY to get_sample_query, clean up fetch_sample_data

- Move ORDER BY rnd.c.random into get_sample_query() PERCENTAGE branch,
  gated on randomizedSample is not False (mirrors ABSOLUTE branch pattern)
- Revert fetch_sample_data() to original: remove ds_columns variable,
  random_column detection, and ORDER BY logic (ordering now handled in CTE)
- Remove duplicate assertions in DatalakeSampler100Pct tests

* Address review: None defaults to False for randomizedSample

Per TeddyCr's feedback, randomization is computationally heavy and
should not be the default. Changed from 'is False'/'is not False' to
truthiness checks so None (unset) behaves the same as False.

Only explicit randomizedSample=True triggers ORDER BY and skips the
100% fast path. This is consistent with the ABSOLUTE branch which
already uses truthiness checks.

* Fix integration test: None should skip sample_query (matches truthiness semantics)

* fix(tests): update BigQuery view sampling expected queries with ORDER BY

BigQuery views fall through to SQASampler.get_sample_query() which now
adds ORDER BY rnd.random when randomizedSample is enabled. Update the
expected SQL strings in test_sampling_for_views and
test_sampling_view_with_partition to match.

* refactor: use explicit is False for randomizedSample checks

Address review comments: SampleConfig.randomizedSample defaults to True,
so only an explicit False should disable randomization. Using is False
/ is not False instead of truthiness ensures None follows the model
default (enabled) rather than being incorrectly treated as disabled.

* ci: re-trigger checks after SIGSEGV flake

* refactor: only explicit True randomizes, add non-determinism tests

* test: increase non-determinism iterations to reduce flakiness

* chore: added randomize as false

* fix: align randomizedSample defaults with schema (false)

* fix: remove ORDER BY from BigQuery test expectations

BigQuery sampling tests create SampleConfig without setting
randomizedSample, which now defaults to False. Since ORDER BY
is only added when randomizedSample is True, the expected query
strings should not include ORDER BY.

Also fix inaccurate docstring in test_sample.py.

* test: increase non-determinism test iterations to reduce flakiness

Increase fetch_sample_data loop from 10 to 20 iterations to further
reduce the theoretical probability of a false failure in the
randomized ordering test.

---------

Co-authored-by: Teddy <teddy.crepineau@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-14 10:28:54 -07:00
sonika-shah
733921f510
Fix: align glossary term relation type colors with design system (#27142)
* Fix: align glossary term relation type colors with design system

System-defined relation types (relatedTo, synonym, antonym, etc.) were
initialized with old Ant Design palette colors (#1890ff, #722ed1, …) while
the frontend RELATION_META constants had been updated to the new design
system colors (#1570ef, #b42318, …). Because renderColorBadge used
record.color (from the backend) unconditionally, the stale Ant Design
colors were always displayed instead of the intended ones.

- Frontend: renderColorBadge now treats RELATION_META as authoritative for
  system-defined types so the correct design-system color is always shown,
  regardless of what color value is stored in the backend.
- Backend (SettingsCache.java): default colors updated for new installs.
- DB migration (2.0.0): postDataMigrationSQLScript added for MySQL and
  PostgreSQL to update colors in existing deployments without touching
  user-added custom relation types.
- Tests: unit tests for renderColorBadge color-resolution logic; integration
  test asserting all ten system-defined types return the expected hex values
  from the API.
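
The color-resolution rule can be illustrated with a small language-neutral sketch, written here in Python although the real `renderColorBadge` lives in the frontend. The function name is hypothetical, and the pairing of relation types to hex values below is assumed purely for illustration.

```python
from typing import Dict, Optional

# Assumed pairing for illustration only; the real RELATION_META covers all
# ten system-defined types.
RELATION_META_COLORS: Dict[str, str] = {
    "relatedTo": "#1570ef",
    "synonym": "#b42318",
}


def resolve_badge_color(relation_type: str, stored_color: Optional[str]) -> Optional[str]:
    """System-defined types use the design-system color; custom types keep theirs."""
    system_color = RELATION_META_COLORS.get(relation_type)
    if system_color is not None:
        return system_color
    return stored_color
```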

Fixes #openmetadata/OpenMetadata

* Remove dev-only MySQL 2.0.0 migration script

* Remove dev-only PostgreSQL 2.0.0 migration script

* Fix: align glossary term relation settings colors and remove duplicate 1.13.0 migration

Remove glossary term relation migrations mistakenly re-added in 1.13.0 and
update relation type colors in the 1.14.0 migration INSERT to use design
system tokens instead of old Ant Design colors.

* fix lint

* add more test

* address feedback

* fix prettier formatting in test file

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* remove GlossaryTermRelationSettings test file from branch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:03:35 +00:00
Suman Maharana
a06b7e74cc
Chore: Remove iceberg standalone connector (#26365)
* Chore: Remove iceberg standalone connector

* add migration scripts

* Update generated TypeScript types

* py_format

* address comments

* Addressed changes

* add tests

* migrate to custom database

* fix tests

* fix tests

* fix migrations

* hard delete existing ingestion pipelines for iceberg

* Update generated TypeScript types

* Delete openmetadata-ui/src/main/resources/ui/src/generated/entity/services/ingestionPipelines/ingestionPipeline.ts

* Delete openmetadata-ui/src/main/resources/ui/src/generated/entity/automations/workflow.ts

* Delete openmetadata-ui/src/main/resources/ui/src/generated/api/automations/createWorkflow.ts

* Delete openmetadata-ui/src/main/resources/ui/src/generated/api/services/ingestionPipelines/createIngestionPipeline.ts

* Delete openmetadata-ui/src/main/resources/ui/src/generated/api/services/createDatabaseService.ts

* Delete openmetadata-ui/src/main/resources/ui/src/generated/entity/automations/testServiceConnection.ts

* Update generated TypeScript types

* Update bootstrap/sql/migrations/native/1.13.0/mysql/postDataMigrationSQLScript.sql

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-02 14:55:23 +00:00
Sriharsha Chintalapani
ed58077197
MCP services (#23623)
2026-04-01 22:15:20 +05:30
Ram Narayan Balaji
10cf2f9ea0
Move ontology/glossary relation migration from 1.13.0 to 1.14.0 (#26755)
The glossary term relation migration (relationType backfill, default
glossaryTermRelationSettings insert, relatedTerms cleanup, conceptMappings
backfill) was accidentally placed in the 1.13.0 migration scripts. This
commit moves it to the correct 1.14.0 slot, restoring 1.13.0 to its
original content (computeMetrics profiler pipeline cleanup only).

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 14:53:10 +05:30
sonika-shah
aff1343643
fix: strip stale relatedTerms from glossary_term_entity JSON to fix 500 on listAfter (#26586)
* fix: strip stale relatedTerms from glossary_term_entity JSON to fix 500 on listAfter

Pre-1.13.0, relatedTerms was stored as EntityReference[] directly in the
glossary_term_entity JSON column. PR #25886 changed relatedTerms to TermRelation[]
and moved storage to entity_relationship table, but missed adding a migration to
clean up the old EntityReference data still present in existing rows.

When listAfter() deserializes the entity JSON, Jackson fails with:
  UnrecognizedPropertyException: Unrecognized field "id" (class TermRelation)

The existing migration already backfilled entity_relationship rows with
relationType="relatedTo", so stripping relatedTerms from entity JSON is safe —
the data is already in entity_relationship and will be loaded from there.
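
The effect of the cleanup on a single stored row can be sketched as below. The actual fix is a SQL migration; this Python function is only an illustration of the transformation, and its name is hypothetical.

```python
import json


def strip_related_terms(entity_json: str) -> str:
    """Drop the stale relatedTerms key from a stored glossary term document."""
    entity = json.loads(entity_json)
    # The relations already live in entity_relationship, so the key is redundant.
    entity.pop("relatedTerms", None)
    return json.dumps(entity)
```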

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Ram Narayan Balaji <81347100+yan-3005@users.noreply.github.com>
2026-03-20 10:29:26 +05:30
Vishnu Jain
0e8de77dd0
Mcp impersonation (#26488)
* fix MCP bot impersonation and app registration

* add MCP audit log impersonation and change event publishing

* add unit tests for MCP audit log and impersonation context

* fix getMcpBotName startup race and remove unused WEBSOCKET_HANDLER

* Fix: enforce limits in CreateTestCaseTool like other create tools

* Fix: add migration for McpApplicationBot impersonation

* Move allowBotImpersonation to app definition schema instead of hardcoding

* Update generated TypeScript types

* Fix McpAuthFilter error handling

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-03-19 00:19:18 +05:30
Sriharsha Chintalapani
6d99ba2dc0
Glossary relations (#25886)
* Glossary Term Relations

* Add GlossaryTerm Relations

* Add GlossaryTerm Relations, add custom relations, ontology explorer

* Add Translations

* Update generated TypeScript types

* Address comments

* Address comments

* Address comments

* Update generated TypeScript types

* Update yarn.lock after merging cytoscape dependencies from glossary_relations

* fix zoom in and out functionality and added missing translate keys

* fix test

* Remove unwanted changes

* nit

* nit

* nit

* Remove conflict test

* nit

* fix test

* Add test for ontology explorer

* New yarn lock and 2.0.0 schema changes missed during merge conflicts

* Revamped glossary term relation settings

* Refactor code

* Addressed comments

* nit

* Update generated TypeScript types

* Java Checkstyle and Yarn lock

* Update generated TypeScript types

* fix unit test

* Remove 2.0.0 migration folders placed at wrong loc

* Merge main

* fix navigation to relation graph in glossary

* fix ontology explorer spec

* Added filter support in the data mode

* Fix glossary term relation CI failures

### Canonical Relation Storage (GlossaryTermRepository)

* Introduced `computeCanonicalRelationType()` to normalize relation direction
  using UUID ordering (lower UUID is always treated as "from")
* Prevents duplicate and inconsistent relation rows when created from either side
* Updated `setTermRelations()` and `addRelation()` to store canonical relation types
* Fixed `setFields()` read logic:

  * Invert relation type for `fromRecords` (entity is the TO side)
  * Keep `toRecords` unchanged
* Updated `deleteBidirectionalRelatedTo()` to match canonical storage format
* Added `RequestEntityCache.invalidate()` after relation mutations to ensure consistency
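
A minimal sketch of the canonicalization idea. The inverse map is a hypothetical example, the function name only echoes `computeCanonicalRelationType()`, and Python's UUID ordering is used here, which may not match the ordering used by the Java code.

```python
import uuid
from typing import Dict, Tuple

# Hypothetical inverse pairs; symmetric types map to themselves by default.
INVERSE_RELATION: Dict[str, str] = {"broader": "narrower", "narrower": "broader"}


def canonicalize(
    from_id: uuid.UUID, to_id: uuid.UUID, relation_type: str
) -> Tuple[uuid.UUID, uuid.UUID, str]:
    """Store every relation with the lower UUID on the "from" side."""
    if from_id <= to_id:
        return from_id, to_id, relation_type
    # Swapping the endpoints means the relation must be read in reverse.
    return to_id, from_id, INVERSE_RELATION.get(relation_type, relation_type)
```

With this rule, creating the relation from either side yields the same stored row, which is why reads for the "to" side need to invert the type again.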

### Lazy RDF Resource Initialization

* Added `RdfRepository.getInstanceOrNull()` for null-safe access without throwing
* Refactored `RdfResource` constructor to avoid eager `RdfRepository.getInstance()` call
* Enabled resource registration even when Fuseki is not initialized
* Introduced lazy getters:

  * `getRdfRepository()`
  * `getSemanticSearchEngine()`
* Updated all endpoints to guard with null checks before `isEnabled()`

  * Return `503 Service Unavailable` when RDF is not ready

### Graceful Test Degradation (Fuseki-dependent tests)

* Added `TestSuiteBootstrap.isFusekiEnabled()` to detect Fuseki availability
* `GlossaryOntologyExportIT`:

  * Falls back to Testcontainers-based local Fuseki when bootstrap Fuseki is unavailable
* `GlossaryTermRelationIT`:

  * Skipped via `assumeTrue` when Fuseki is unavailable
* `MetricResourceIT`:

  * Skips RDF-specific tests when Fuseki is unavailable

* fix package conflicts

* nit

* Fix merge conflicts, Python test, RDF reliability, and VectorDocBuilder tests

- Fix Python test_patch_glossary_term_related_terms to use TermRelation
  instead of EntityReferenceList (schema changed relatedTerms type)
- Rewrite VectorDocBuilder tests for current buildEmbeddingFields API
- Improve JenaFusekiStorage retry logic to retry on all HTTP errors
- Increase Fuseki tmpfs size to prevent disk space exhaustion in tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix pycheck

* Address all 8 PR review findings

1. Add authorization check on getTermRelationGraph endpoint
2. Add null guard on getBaseUri() to prevent NPE
3. Add React key prop on RelatedTermTagButton in map renders
4. Mark RdfResource lazy-init fields as volatile for thread safety
5. Replace exception messages with generic errors in API responses
6. Unify DEFAULT_RELATION_TYPES between CSV and repository (10 types)
7. Add jitter backoff to deadlock retry in CollectionDAO
8. Replace N+1 queries in prefetchGraphTerms with batch fetch
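
Item 7 can be sketched as a retry loop with jittered exponential backoff. The exception class, function name, and default values are assumptions for illustration; the real retry lives in the Java DAO layer.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DeadlockError(Exception):
    """Hypothetical stand-in for a database deadlock exception."""


def retry_with_jitter(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return operation()
        except DeadlockError:
            if attempt == max_attempts:
                raise
            # Full jitter: sleep a random fraction of an exponentially
            # growing ceiling so concurrent retries do not collide again.
            ceiling = base_delay * (2 ** (attempt - 1))
            sleep(rng() * ceiling)
            attempt += 1
```

The `sleep` and `rng` parameters exist only so the behavior can be tested without real delays.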

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix Fuseki tmpfs exhaustion and GlossaryTermRelationIT double init

- Remove tmpfs size limit on Fuseki container to prevent disk exhaustion
- Guard RdfUpdater.initialize() in GlossaryTermRelationIT to skip if
  already initialized by bootstrap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix duplicate edges, null term NPE, and silent exception in graph builder

- Deduplicate edges in buildGraph() using edgesSeen set
- Skip TermRelation entries with null term references to prevent NPE
- Add warning log when glossary term relation settings fail to load
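
The first two fixes can be sketched as follows. The tuple shape and the choice of deduplication key (source, target, relation type) are assumptions; the real `buildGraph()` may key edges differently.

```python
from typing import Iterable, List, Optional, Set, Tuple

# (source term, target term or None, relation type)
Relation = Tuple[str, Optional[str], str]
Edge = Tuple[str, str, str]


def build_edges(relations: Iterable[Relation]) -> List[Edge]:
    edges_seen: Set[Edge] = set()
    edges: List[Edge] = []
    for source, target, relation_type in relations:
        if target is None:
            # Skip relations whose term reference is missing.
            continue
        key = (source, target, relation_type)
        if key in edges_seen:
            continue
        edges_seen.add(key)
        edges.append(key)
    return edges
```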

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix cardinality count after canonical swap and double-checked locking

- getRelationCount now matches inverse relation type for fromRecords
  where the term is the target, fixing cardinality bypass after
  bidirectional UUID canonicalization
- Use double-checked locking in RdfResource.getSemanticSearchEngine()
  to prevent duplicate instance creation under concurrency
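
The double-checked locking pattern for a lazily initialized dependency looks roughly like this. `LazyRef` is a hypothetical helper written for illustration, not a class from the codebase; a factory that returns None models a backend that is not ready yet.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyRef(Generic[T]):
    def __init__(self, factory: Callable[[], Optional[T]]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._value: Optional[T] = None

    def get(self) -> Optional[T]:
        value = self._value
        if value is None:  # first check, without the lock
            with self._lock:
                if self._value is None:  # second check, under the lock
                    self._value = self._factory()
                value = self._value
        return value
```

A caller that receives None can answer with an error such as 503 instead of failing at construction time.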

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: anuj-kumary <anujf0510@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Ram Narayan Balaji <ramnarayanb3005@gmail.com>
Co-authored-by: Ram Narayan Balaji <81347100+yan-3005@users.noreply.github.com>
2026-03-18 10:51:03 +05:30
Sriharsha Chintalapani
12b364313c
Fix Metrics collection; reduce no.of metrics; improve slow request lo… (#25751)
* Fix Metrics collection; reduce number of metrics; improve slow request logging

* Move sync calls to search & rdf to async

* Improve slow request tracking

* Improve slow request tracking

* Add clear breakdown in slow request

* Batch TestCaseRepository calls

* Batch API calls

* Initial Implementation of ReadEngine

* Improvements with ReadEngine/WriteEngine

* Improvements with ReadEngine/WriteEngine

* Improvements with ReadEngine/WriteEngine

* Improve by removing unnecessary ser/de

* Additional improvements with PatchFieldsPlanner

* Further performance improvements

* Further performance improvements

* Address comments

* Merge from main

* Address comments

* Address comments

* Address latest feedback - 2/21

* fix merge conflict

* Address Slow Request review

* Address the comments

* Address comments; Fix tests

* Fixes to the failing tests

* Fix bugs in tests

* Fix checkstyle

* Address playwright tests

* Fix tests

* Fix bugs

* Fix tests

* address comments

* Fix issues from playwright

* Fix playwright tests

* Fix tests for playwright

* Address comments

* Fix glossary test

* fix checkstyle

* Fix playwright issues

* Fix playwright issues - incrementalChangeDesc

* Restore ApprovalTaskWorkflow in GlossaryTerm and TestCase repositories

The slow_request branch accidentally removed entity-specific ApprovalTaskWorkflow
overrides, causing the generic parent to use checkUpdatedByTaskAssignee instead of
checkUpdatedByReviewer. This broke Glossary approval and TestCase approval Playwright tests.

- GlossaryTermRepository: restore ApprovalTaskWorkflow with checkUpdatedByReviewer
- TestCaseRepository: restore ApprovalTaskWorkflow, preDelete guard, updateReviewers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix base ApprovalTaskWorkflow to use reviewer check instead of task assignee

The centralized ApprovalTaskWorkflow in EntityRepository was using
checkUpdatedByTaskAssignee instead of checkUpdatedByReviewer, breaking
approval workflows for all entity types. Added verifyReviewer() as a
top-level static method on EntityRepository and restored missing
updateReviewers() and preDelete IN_REVIEW guards in DataContract,
DataProduct, Metric, and Tag repositories. Removed now-redundant
entity-specific ApprovalTaskWorkflow overrides from GlossaryTerm and
TestCase repositories.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix regression introduced in backend tests; make the playwright tests stable

* Stabilize the playwright tests

* Stabilize the playwright tests

* Improve playwright tests

* Improve playwright tests

* Fix team playwrights

* Fix merge from main

* Fix playwright tests

* Fix playwright tests

* Batch domain/data product asset counts into single ES aggregation queries

Replace N individual ES count queries with single aggregation query per
entity type. Domain counts roll up child counts to parent domains.
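
The roll-up step can be sketched as below, assuming the single aggregation query returns a direct asset count per domain and that a child domain's fully qualified name is its parent's name plus a dot-separated suffix. `DomainCountRollup` is a hypothetical helper, not the repository code, and it ignores quoted names that contain dots.

```java
import java.util.HashMap;
import java.util.Map;

final class DomainCountRollup {
  /**
   * Takes direct counts keyed by domain FQN (as one aggregation response might
   * provide them) and returns totals where each domain also includes the counts
   * of all of its descendants.
   */
  static Map<String, Long> rollUp(Map<String, Long> directCounts) {
    Map<String, Long> totals = new HashMap<>();
    for (Map.Entry<String, Long> entry : directCounts.entrySet()) {
      String fqn = entry.getKey();
      while (true) {
        // Credit the domain itself, then walk up to each ancestor in turn.
        totals.merge(fqn, entry.getValue(), Long::sum);
        int dot = fqn.lastIndexOf('.');
        if (dot < 0) {
          break; // reached a top-level domain
        }
        fqn = fqn.substring(0, dot);
      }
    }
    return totals;
  }
}
```

With direct counts of 2 for `Sales`, 3 for `Sales.EMEA`, and 1 for `Sales.EMEA.Spain`, the totals come out as 6, 4, and 1 respectively.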

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Improve Playwright test reliability and expand CI shards

Add polling waits for async ES indexing, fix lineage edge selectors,
use API-based setup for domain/data product widget tests, and expand
CI from 6 to 8 shards with dedicated graph/landing projects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Improve test reliability with response checks and guards

- Add API response status checks in create() for Domain, DataProduct,
  Glossary, TableClass, and UserClass — silent API failures now throw
  immediately with status code and response body
- Add guards in selectDataProduct() and addAssetsToDataProduct() for
  undefined name/fqn — clear error messages instead of cryptic
  "locator.fill: value: expected string, got undefined"
- Fix GlossaryPermissions double navigation — remove redundant
  redirectToHomePage + sidebarClick before glossary.visitEntityPage()
- Increase OnlineUsers timeout from 5s to 15s for CI resource pressure
- Increase Tour badge timeout from 10s to 20s
- Fix visitGlossaryPage: wait for loader before clicking menuitem
- Remove chromium testIgnore for graph/landing/stateful test files
  (these must run in chromium project for 6-shard CI workflow)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Remove all networkidle waits and improve CI reliability

- Remove ~780 networkidle waits across 144 test/utility files — these
  hang or resolve prematurely under CI load causing false negatives
- Add polling.ts with waitForSearchIndexed and waitForPageLoaded helpers
- Convert checkAssetsCount and search functions to expect.poll() for
  async ES indexing tolerance
- Increase expect timeout to 15s for CI environments
- Split CI into 8 shards with dedicated projects (stateful/graph/landing)
  to reduce thread contention
- Fix GITHUB_STEP_SUMMARY size overflow (base64 screenshots → table)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Fix genuine test failures from networkidle removal

- GlossaryPagination: Fix waitForResponse race conditions - register
  listener BEFORE the triggering action, add **/ URL prefix
- LanguageOverride: Fix selector from getByText('EN') to
  getByText('English - EN') matching actual dropdown text
- NestedColumnsExpandCollapse: Fix URL glob pattern, use dispatchEvent
  to avoid inner Link navigation, add waitForResponse for filtered search
- lineage.ts: Revert dragConnection hover approach that broke React
  Flow connection mode, keep direct dispatchEvent
- customizeLandingPage.ts: Remove waitForURL that hangs after page.goto
- Teams.spec.ts: Add isJoinable: false for private team creation
- UserDetails.spec.ts: Revert Escape/clickOutside save flow that
  dismissed edit mode before saving roles
- Users.spec.ts: Revert Data Consumer permissions test to original
  simple approach using fixtures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Relax OnlineUsers activity time assertion

The "Online now" exact match fails under CI load because the activity
timestamp may show as "X seconds ago" or "X minutes ago" by the time
the page renders. Changed to accept any recent activity format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Fix 4 genuine test failures from CI run

1. saveCustomizeLayoutPage: Use response predicate matching both
   POST (create) and PUT (update) patterns instead of glob that
   only matched updates. Fixes 180s timeout in drag-and-drop test
   when layout doesn't exist yet (fullyParallel=true).

2. GlossaryMiscOperations: Add test.slow(true) — test does 9
   sequential page navigations that exceed the 60s timeout.

3. DomainDataProductsWidgets "Assign Widgets": Add test.slow(true)
   — calls addAndVerifyWidget twice, each with multiple navigations.

4. DomainFilterQueryFilter: Add waitForAllLoadersToDisappear before
   clicking domain-dropdown after search operations that trigger
   page re-renders.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Fix AutoPilot test — reload page after API status poll

The AutoPilot status banner never appeared because:
1. checkAutoPilotStatus polls the workflow API directly via apiContext
   (outside the browser), not through page network requests
2. The UI uses WebSocket for live updates, but the socket connection
   is only established when the page loads with status=RUNNING
3. Since the page loaded before the workflow started, the socket was
   never connected, so the UI never received the completion event

Fix: reload the page after checkAutoPilotStatus confirms the workflow
finished, so the UI renders with the current state. Also increase the
banner visibility timeout to 30s for CI environments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Fix flaky tests — entity collisions, missing cleanup, expect timeout

- Replace Date.now() with uuid() for entity names in CustomProperties tests
  to prevent collisions when parallel workers execute within the same millisecond
- Fix FollowingWidget: move shared adminUser create/delete to top-level
  base.beforeAll/afterAll to prevent duplicate user creation across 11
  parallel test.describe blocks
- Add missing afterAll cleanup to OnlineUsers, Metric, CustomPropertyAdvanceSearch,
  and CustomProperties tests to prevent entity/user leaks between runs
- Replace hardcoded metric name in MetricSearch with uuid-based name
- Add global expect timeout of 15s (up from 5s default) for CI resilience

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix Playwright CI: include UI in build-once Maven build

The build-once optimization (#26423) used -DonlyBackend -pl !openmetadata-ui
which produces a tar.gz without the compiled React app. The Docker container
starts but cannot serve the login page, causing auth.setup.ts to time out
on all 6 shards waiting for input[id="email"] to appear.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix CodeQL security warnings

- Replace Math.random() with crypto.randomUUID() for test data generation
- Escape backslash characters in CSS selectors for glossary FQN values
- Use page.getByTestId() instead of raw CSS selectors in entity utils
- Increase RSA key size from 512 to 2048 bits in JwtFilterTest
- Skip archive entries containing '..' in JsonUtils.getResourcesFromJarFile

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Fix user cleanup to prevent 'Email Already Exists' failures

- Glossary.spec.ts: Fix typo user3.create→delete in afterAll, add missing adminUser.delete
- Teams.spec.ts: Add afterAll cleanup hooks for 3 nested describe blocks that were missing them (EditUser, DataConsumer, Owner)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Add afterAll cleanup hooks and fix test reliability

- InputOutputPorts.spec.ts: Add afterAll for domain/tables/topics/dashboards
- Users.spec.ts: Add top-level afterAll for all shared entities
- Entity.spec.ts: Add afterAll for shared + per-entity-type cleanup
- Pagination.spec.ts: Add afterAll for 13 describe blocks (services, DBs, etc.)
- DataProductRename.spec.ts: Add afterAll cleanup
- TestCaseIncidentPermissions.spec.ts: Add afterAll for users/roles/policies/table
- ImpactAnalysis.spec.ts: Add afterAll for all 7 entity types
- NestedColumnsExpandCollapse.spec.ts: Add afterAll for 4 describe blocks
- DataProductPermissions.spec.ts: Add afterAll cleanup
- ServiceEntityPermissions.spec.ts: Add afterAll for testUser + per-entity
- ServiceForm.spec.ts: Add afterAll for adminUser
- domain.ts: Replace waitForTimeout(2000) with proper loader/tab waits

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Trigger Playwright CI

* Playwright: Fix 2 failures and 26 flaky tests with proper waits

Fix remaining 2 genuine failures:
- DomainDataProductsWidgets: add test.slow(true) for ES indexing lag
- Users.spec.ts: add test.slow(true) and loader waits for owner search

Fix 26 flaky tests by addressing 5 root cause patterns:
- Response listener after trigger: MetricCustomUnitFlow, DomainUIInteractions
- Missing loader wait after navigation: 16 tests across CustomizeDetailPage,
  DataProductPersonaCustomization, DataContracts, ExploreTree, and others
- Element not rendered after API response: EntityVersionPages, ODCSImportExport
- DOM not settled after loader: Domains nested rename
- Permission cache propagation: GlossaryPermissions

Shared utility improvements:
- waitForPatchResponse uses entity-specific URL pattern
- openColumnDetailPanel accepts entityEndpoint param with API response wait
- Entity.spec.ts uses dynamic entity.endpoint instead of hardcoded tables

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Fix addOwner retry to wait for search API response

The owner search retry loop was refilling the search input but not
waiting for the API response before checking item visibility. This
caused the poll to repeatedly check stale/empty results.

Fix: await search response and loader detach in each retry iteration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Fix owner listitem selector — remove exact match

The owner selection list items include avatar initials (e.g., "G") in their
accessible name, making exact: true fail since the accessible name is
"G UserName" not just "UserName". Switching to substring matching fixes
the Users.spec.ts persistent failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Fix 10 remaining flaky tests with proper waits

- ColumnLevelTests: loader wait after visiting test case panel
- DataQualityPermissions: loader wait after visiting test suite page
- IncidentManagerDateFilter: loader wait after page reload
- InputOutputPorts: wait for warning alert before asserting
- Lineage: replace 5 hardcoded waitForTimeout(500) with loader waits
- CustomizeDetailPage: dialog close waits, fix missing await on expect
- DataProductPersonaCustomization: loader wait + modal visibility check
- GlossaryPermissions: increase permission propagation wait, loader wait
- GlossaryHierarchy: loader waits after modal close and glossary select
- ExploreTree: loader waits after API response before UI interaction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix CodeQL security alerts: incomplete escaping and Zip Slip

1. entity.ts: Use JSON.stringify().slice(1,-1) for proper escaping of
   both backslashes and double quotes in filter values, replacing the
   incomplete .replace(/"/g, '\\"') approach.

2. JsonUtils.java: Strengthen Zip Slip protection by normalizing paths
   via Paths.get().normalize() and rejecting entries starting with "/"
   or resolving to parent traversal after normalization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix tests

* Fix tests

* Fix recordChange field name mismatches and CodeQL alert

- ServiceEntityRepository: recordChange("ingestionAgent") → "ingestionRunner"
  to match the JSON property name. The shouldCompare() gate in PATCH flow
  was silently dropping ingestionRunner changes because the field name
  didn't match patchedFields.
- DataContractRepository: compareAndUpdate("status") → "entityStatus"
  to match the JSON property name, same root cause.
- JsonUtils: Simplify Zip Slip check to string-based validation to
  satisfy CodeQL taint analysis.
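
To illustrate the field-name mismatch described in the first two bullets, here is a deliberately simplified sketch. The names `recordChange`, `shouldCompare`, and `patchedFields` come from the message above, but the class, signatures, and logic are assumptions made for illustration and do not reproduce the real EntityRepository code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

final class PatchChangeRecorder {
  private final Set<String> patchedFields; // JSON property names touched by the PATCH
  private final Map<String, Object> recorded = new LinkedHashMap<>();

  PatchChangeRecorder(Set<String> patchedFields) {
    this.patchedFields = patchedFields;
  }

  // Hypothetical gate: only fields named in the patch are compared.
  boolean shouldCompare(String fieldName) {
    return patchedFields.contains(fieldName);
  }

  void recordChange(String fieldName, Object oldValue, Object newValue) {
    if (!shouldCompare(fieldName)) {
      return; // a mismatched name exits here without any error
    }
    if (!Objects.equals(oldValue, newValue)) {
      recorded.put(fieldName, newValue);
    }
  }

  Map<String, Object> recorded() {
    return recorded;
  }
}
```

In this sketch, a patch that touches `ingestionRunner` records nothing when the caller passes `"ingestionAgent"`, which is the kind of silent drop the fix addresses by using the JSON property name.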

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Remove serial mode from Users.spec.ts to prevent cascade failures

A single flaky test failure was causing ~19 tests across 5 unrelated
describe blocks to be skipped. Matches main branch behavior (parallel).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Fix flaky tests — missing awaits, hardcoded waits, silent catches

- DataProductPersonaCustomization: add missing await on expect() calls
- TestCaseIncidentPermissions: poll for incident creation instead of one-shot query
- TestCaseResultPermissions: add loader wait after Data Quality tab click
- GlossaryPermissions: replace waitForTimeout(3000) with toPass() retry
- BulkImport: remove 4 unnecessary waitForTimeout calls
- importUtils/testCases: replace waitForTimeout(500) with grid visibility assert
- GlossaryAssets: add loader wait, remove silent .catch(() => false) pattern

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix CodeQL Zip Slip alert with Path.normalize() sanitization

CodeQL doesn't recognize String.contains("..") as proper Zip Slip
mitigation. Use Path.normalize() + isAbsolute/startsWith checks which
CodeQL's taint analysis model understands.
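
A minimal sketch of this style of check follows. `ArchiveEntryGuard` and `isSafeEntryName` are hypothetical names, and the exact conditions in `JsonUtils` may differ.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class ArchiveEntryGuard {
  /** Returns true only if the entry name stays inside the extraction root. */
  static boolean isSafeEntryName(String entryName) {
    // Collapse "." and ".." segments before inspecting the result.
    Path normalized = Paths.get(entryName).normalize();
    // Reject rooted names; the explicit "/" test keeps behavior the same on Windows.
    if (normalized.isAbsolute() || entryName.startsWith("/")) {
      return false;
    }
    // After normalization, any remaining ".." can only be a leading segment.
    return !normalized.startsWith("..");
  }
}
```

Normalizing first means a name such as `a/../b.json` is accepted because it resolves to `b.json`, while `a/../../b` is rejected because it still escapes the root.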

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix Playwright flaky tests: modal visibility, toast race, query card assertion

- DataProductPersonaCustomization: wait for dialog close before clicking add-widget-button
- entity.ts restoreEntity: dismiss stale toast before restore to avoid race condition
- QueryEntity: replace page.$$() with auto-retrying expect().toBeVisible()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix flaky TableResourceIT by preventing parallel multi-domain rule mutation

Both test_multipleDomainInheritance (TableResourceIT) and
test_csvImportEntityRuleValidation (DatabaseServiceResourceIT) toggle
the global "Multiple Domains are not allowed" rule. When running
concurrently, one overwrites the other's setting causing spurious
failures. Add @ResourceLock("MULTI_DOMAIN_RULE") to serialize only
these two tests while keeping all others concurrent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Mohit Yadav <105265192+mohityadav766@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 13:38:31 -07:00
Pere Miquel Brull
62c12a133d
Fix 1.13.0 preview→enabled migration for event subscriptions (#26473)
* Fix preview→enabled migration for event_subscription_entity and QRTZ tables

The 1.13.0 migration renamed `preview` to `enabled` in `apps_marketplace`
and `installed_apps`, but missed the `event_subscription_entity` table.

The ReverseMetadata app stores the full App entity as an escaped JSON
string inside `event_subscription_entity.json -> config -> app`. Since
it's a string value (not a nested JSON object), standard JSON path
operations can't reach the `"preview"` field — string replacement is
needed instead.
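
The idea can be sketched in Java as below. The migration itself is not shown here, so `EscapedJsonRename` is a hypothetical illustration of the string-replacement approach, and it assumes the key is serialized without whitespace before the colon.

```java
final class EscapedJsonRename {
  /**
   * Renames the "preview" key inside an embedded, escaped JSON string.
   * In the outer document the inner quotes appear as \" so the search
   * pattern must include the backslashes.
   */
  static String renamePreviewToEnabled(String outerJson) {
    return outerJson.replace("\\\"preview\\\":", "\\\"enabled\\\":");
  }
}
```

Because the pattern starts with a backslash, an unescaped top-level `"preview"` key in the outer document is left untouched.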

Also truncates QRTZ tables to clear stale Quartz job data that may
contain old App JSON. Both schedulers re-create their jobs from the
database on startup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Use DELETE instead of TRUNCATE for QRTZ cleanup to respect FK constraints

TRUNCATE fails on tables referenced by foreign keys in MySQL (and
without CASCADE in PostgreSQL). Switch to DELETE FROM with correct
FK ordering (children before parents) and add missing child tables
(QRTZ_SIMPROP_TRIGGERS, QRTZ_BLOB_TRIGGERS).
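
As a rough illustration of children-before-parents ordering, the sketch below builds DELETE statements for the trigger-related tables, assuming the stock Quartz JDBC schema. `QuartzCleanup` is a hypothetical helper, and the migration's full table list and exact order are not shown in this message.

```java
import java.util.ArrayList;
import java.util.List;

final class QuartzCleanup {
  // Ordered so that every table is emptied before the table it references.
  static final List<String> DELETE_ORDER = List.of(
      "QRTZ_SIMPLE_TRIGGERS",  // child of QRTZ_TRIGGERS
      "QRTZ_CRON_TRIGGERS",    // child of QRTZ_TRIGGERS
      "QRTZ_SIMPROP_TRIGGERS", // child of QRTZ_TRIGGERS
      "QRTZ_BLOB_TRIGGERS",    // child of QRTZ_TRIGGERS
      "QRTZ_TRIGGERS",         // child of QRTZ_JOB_DETAILS
      "QRTZ_JOB_DETAILS");

  /** Returns one DELETE statement per table, in foreign-key-safe order. */
  static List<String> deleteStatements() {
    List<String> statements = new ArrayList<>();
    for (String table : DELETE_ORDER) {
      statements.add("DELETE FROM " + table);
    }
    return statements;
  }
}
```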

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 15:17:59 +01:00
Pere Miquel Brull
f890e004ce
Move preview-to-enabled migrations from 1.11.13 to 1.13.0 (#26281)
The migrations renaming the 'preview' property to 'enabled' in apps
were incorrectly placed under 1.11.13. Move them to 1.13.0 where they
belong, since this change targets the next major release.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 12:36:32 +01:00
Teddy
40bf82f604
Minor: move 2.0 migrations (#26236)
* FIX - Redshift converter (#26229)

(cherry picked from commit ce8e1e5b5b)

* chore: move 2.0 migration to 1.13.0

---------

Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2026-03-05 08:11:15 -08:00