Commit graph

60 commits

Author SHA1 Message Date
Sriharsha Chintalapani
09af9fc801
Fixes #4003: bulk + async restore for large entity hierarchies (#27997)
* fix(restore): bulk + async restore for large entity hierarchies

EntityRepository.restoreEntity walked descendants synchronously, taking
4+ minutes on a 12k-table database and exceeding typical proxy timeouts.
restoreChildren now groups CONTAINS children by type and dispatches one
bulkRestoreSubtree per type, batching DB writes, version history,
change events, and cache invalidation; the existing ES cascade handles
descendant index updates in one update_by_query.

Adds an async option (?async=true) on the deep-hierarchy restore
endpoints that returns 202 Accepted with a job id and runs the restore
on AsyncService, emitting WebSocket notifications on
restoreEntityChannel. Java SDK adds .restore().async().execute() fluent
builders on Tables/Databases plus restoreServerAsync on
EntityServiceBase; Python SDK mirrors this with
restore_request().with_async().execute() and restore_async() helpers
on BaseEntity, exposing a new AsyncJobResponse type.

Tests: EntityRepositoryRestoreTest verifies the per-type grouping and
bulk dispatch path; RestoreFluentAPITest covers the Java SDK fluent
behavior; RestoreHierarchyIT exercises sync and async restore against a
real DB→schemas→tables tree end-to-end; test_restore_async.py covers
the Python SDK paths.

Fixes #4003
2026-05-20 17:57:40 -07:00
Sriharsha Chintalapani
86e4127e9b
feat(context-center): /v1/contextCenter namespace + Java/Python SDK support (#28237)
* refactor(context-center): move endpoints under /v1/contextCenter namespace

Renames the Context Center API paths to disambiguate from the external Drive
Service connector. Singular /v1/drive/* and plural /v1/drives/* were trivially
easy to confuse; the new prefix makes namespace ownership obvious.

  /v1/drive/files     -> /v1/contextCenter/files
  /v1/drive/folders   -> /v1/contextCenter/folders
  /v1/knowledgeCenter -> /v1/contextCenter/pages

Drive Service endpoints (/v1/drives/*, /v1/services/driveServices) and the
generic /v1/attachments endpoint are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(context-center): nest drive files/folders under /drive sub-namespace

Keeps the Drive concept visible in the URL while staying under the
contextCenter prefix.

  /v1/contextCenter/files   -> /v1/contextCenter/drive/files
  /v1/contextCenter/folders -> /v1/contextCenter/drive/folders

Pages stays at /v1/contextCenter/pages (not a Drive concept).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(context-center): add Java + Python SDK support for files, folders, pages

Wires the Context Center entities into both SDK fluent surfaces so callers do
not have to hit /v1/contextCenter/drive/files, /v1/contextCenter/drive/folders,
or /v1/contextCenter/pages directly.

Java (openmetadata-sdk):
  * ContextFileService + PageService alongside the existing FolderService.
    Standard CRUD comes from EntityServiceBase; entity-specific methods
    expose move (files), vote / follower / hierarchy (pages), and folder
    contents (folders).
  * Folders, ContextFiles, Pages fluent wrappers with Creator and Finder
    builders, plus static convenience methods.
  * OpenMetadataClient.folders() / contextFiles() / pages() accessors.
  * OM.Folder / OM.ContextFile / OM.Page entry points.

Python (ingestion):
  * Folders, ContextFiles, Pages entity facades over BaseEntity for CRUD,
    list, retrieve, and search via the existing ometa client.
  * Top-level metadata.sdk re-exports + lowercase aliases.

Binary download and multipart upload are intentionally not exposed yet — those
endpoints need streaming / multipart support that the SDK HTTP layer does not
currently provide. Page voting / follower / hierarchy operations are Java-only
for the same reason (no underlying ometa methods).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* style: apply spotless line wrap in DriveFileUploadIT

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix playwright test

* fix(context-center): route entity delete + playwright regex through new path

DeleteWidgetClassBase still returned the old 'knowledgeCenter' URL segment for
KnowledgePage / KnowledgeCenter entity deletes, so the UI was issuing
DELETE /api/v1/knowledgeCenter/{id}?hardDelete=... — which the server no
longer serves, returning 404. ContextCenter playwright spec also had a leftover
escaped regex (/\/api\/v1\/knowledgeCenter\/.../) matching the same legacy URL.

Point both at /api/v1/contextCenter/pages so the Knowledge Center / Context
Center delete flows reach the server again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(context-center): re-align playwright delete matchers with new path

A prior fix-up commit (844a8736d2) reverted two Playwright waitForResponse
matchers back to /api/v1/knowledgeCenter, but the UI now correctly issues
the request against /api/v1/contextCenter/pages (after the DeleteWidgetClassBase
fix in c34552217f). Restoring the matchers so they line up with the live
server endpoint and the tests can observe the response.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(context-center): widen hard-delete poll window to 30s

testHardDeleteRemovesObjectFromMinIO polled for the file to disappear within
10 seconds, but the hard-delete chain (soft-delete -> async background worker
-> search/relationship cleanup -> row drop -> MinIO unlink) regularly exceeds
that window in CI. Awaitility was returning the entity still present at
status 200 inside the deadline.

Bump atMost to 30s with a 200ms poll interval (matching FolderResourceIT's
hardDeleteEntity pattern) so the test reflects the real async budget instead
of a tight, machine-dependent guess.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix playwright tests

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Rohit0301 <rj03012002@gmail.com>
Co-authored-by: Rohit Jain <60229265+Rohit0301@users.noreply.github.com>
2026-05-20 08:28:10 -07:00
Ram Narayan Balaji
fc74d08cb0
fix(deps): remove explicit jersey version pins to inherit 3.1.11 from BOM (#28106)
jersey-client and jersey-apache-connector were pinned to 3.1.9 in
openmetadata-sdk and openmetadata-integration-tests. The root pom.xml
jersey-bom already manages all org.glassfish.jersey.* artifacts at 3.1.11.
Removing the explicit pins lets both modules inherit 3.1.11 from the BOM,
consistent with the rest of the project.
2026-05-19 17:19:04 +00:00
Pere Miquel Brull
7485c5b421
feat: add ContextMemory entity (Context Center memories) (#28224)
* feat(spec): add ContextMemory + CreateContextMemory JSON schemas

* feat(jdbi3): add ContextMemoryDAO

* feat: register contextMemory entity type constant

* feat(service): add ContextMemory repository, resource, mapper

* feat(bootstrap): add context_memory table DDL

* test(service): ContextMemory resource CRUD test

* fix(context-memory): address review (relationship types, stable FQN, status msg, test name)

- storeRelationships: rootMemory -> Relationship.CONTAINS, parentMemory -> Relationship.HAS
  so the root-ancestor and direct-parent hierarchies are distinguishable.
- setFullyQualifiedName: derive from the immutable name only (drop mutable
  primaryEntity/owner derivation that destabilized nameHash on update).
- validateStatusTransition: separate "no transitions defined" from "disallowed transition".
- Rename ContextMemoryResourceTest -> ContextMemoryStatusTransitionTest (pure unit test).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(context-memory): add ContextMemoryIT + SDK ContextMemoryService

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(spec): register contextMemory in EntityLink.g4 ENTITY_TYPE grammar

EntityLinkGrammarTest.testAllEntityTypesHaveGrammarOrExclusion enumerates every
Entity.java constant and requires each to be in the EntityLink grammar or the
test's exclusion list. ContextMemory is a normal EntityRepository-backed
top-level entity (like learningResource / contextFile), so it belongs in the
ENTITY_TYPE rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(context-memory): override owner ITs for creator-as-owner default

ContextMemoryMapper.defaultOwners() intentionally assigns the creating
user as owner when the create request omits owners. BaseEntityIT's
patch_entityUpdateOwner_200 and patch_entityUpdateOwnerFromNull_200
assert "no owner initially" for any supportsOwners entity, so both
failed for ContextMemory.

Override both in ContextMemoryIT: keep the PATCH-replace-owner contract,
change only the precondition to expect the creator as the sole initial
owner (asserted by count, not a hardcoded principal). Mapper unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update generated TypeScript types

Add the generated ContextMemory TS types (entity/context/contextMemory.ts,
api/context/createContextMemory.ts). The schemas were on the branch but their
generated types were missing, failing the TypeScript Type Generation check on
this fork PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(context-memory): address review (relationship cleanup, owner scope, validations)

Copilot review on the ContextMemory entity:
- #1 record primaryEntity/relatedEntities/root/parent/source*/machineRepresentation
  in version history; usageCount/lastUsedAt documented as untracked telemetry
- #2 clear stale HAS/RELATED_TO/CONTAINS edges before re-adding in storeRelationships
- #4 default creator as owner only on create; PUT without owners no longer
  silently replaces previously set owners
- #5 schema documents that any status is allowed at creation; transitions
  enforced only on update
- #6 setFullyQualifiedName via FullyQualifiedName.build with skip-if-set guard
- #7 validate shared principal type is user/team/domain
- #8 reject self-reference for parentMemory/rootMemory
- #10 inline Entity.CONTEXT_MEMORY, drop redundant constant

Regenerate ContextMemory TS types for the schema doc change; add IT coverage
for the self-reference and invalid-shared-principal validations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(context-memory): don't blanket-delete relationships (domain data loss)

The #2 cleanup via deleteTo(memory, CONTEXT_MEMORY, HAS, null) also matched the
framework's domain --HAS--> memory edge (storeDomains runs before
storeRelationships in storeRelationshipsInternal, on every create and update),
silently dropping domain assignments.

storeRelationships is now add-only (addRelationship upserts, so re-running on
update is idempotent). Stale-edge cleanup moved to ContextMemoryUpdater using
the framework's updateFromRelationship(s) helpers, which delete only the
specific changed refs and record the version change. parentMemory now uses
Relationship.PARENT_OF (distinct from primaryEntity's HAS and the framework's
domain HAS) so the parent edge can be maintained without collision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(bootstrap): move context_memory DDL from 2.0.1 to 2.0.0

The context_memory table belongs in the 2.0.0 migration. Relocated the
MySQL and Postgres DDL verbatim; the 2.0.1 schemaChanges.sql files are
restored to their original task_migration_mapping-only content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(bootstrap): add ENGINE=InnoDB to context_memory MySQL DDL

Explicit engine clause, consistent with the task/search-index tables in the
same migration and robust to any server default change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(context-memory): preserve sanitized/validated fields; validate relatedEntities

Review follow-ups:
- ContextMemoryMapper no longer re-sets description/owners/domains/tags/displayName
  after copy(). copy() sanitizes description (stored-XSS) and validates owners and
  domains; re-setting the raw request values bypassed both. Only ContextMemory-
  specific fields are set now.
- prepare() now assigns the result of EntityUtil.populateEntityReferences back onto
  relatedEntities so orphaned/invalid refs are filtered instead of persisted.
- ContextMemoryIT Javadoc now references ContextMemoryRepository#setCreatorAsDefaultOwner
  (the defaultOwners mapper method no longer exists).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 18:10:46 +02:00
Sriharsha Chintalapani
5c53151d16
Fixes #24294: support re-parenting a Container via PATCH (#28201)
* feat(container): support re-parenting a Container via PATCH (#24294)

Allow the PATCH API to update a Container's `parent`, cascading the FQN
change to every descendant container, nested column FQN, tag-usage row,
entity-link, policy condition, and search-index document — same shape as
GlossaryTerm re-parenting. Scoped to same-StorageService moves; cross-service
parents are rejected with HTTP 400. Adds parent-aware fluent SDK methods in
Java (`Containers.under(...)`, `FluentContainer.withParent(...)`/`withoutParent()`)
and Python (`Containers.set_parent`, `Containers.clear_parent`), unit tests
for the validation logic, 11 integration tests covering the cascade and
rejection paths, and 9 + 2 SDK tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(container): reject re-parent PATCH on oversized subtrees (#24294)

The single-transaction cascade in updateParent locks every descendant row
in storage_container_entity, rewrites their JSON, renames their tag_usage
rows, and issues an Elasticsearch updateByQuery — all while holding
row locks that block any concurrent write on the subtree. At ~10k+
descendants this becomes a multi-minute outage on the cluster.

Add an indexed COUNT(*) preflight in ContainerUpdater.updateParent that
short-circuits with HTTP 400 when the moved subtree exceeds
openmetadata.container.maxReparentDescendants (default 10000), pointing
the operator at the system property if they have measured the impact
and accept it. Run BEFORE invalidateCacheForRenameCascade so a rejected
request pays no cache-eviction cost.

Tests: 5 new unit tests in ContainerRepositoryParentValidationTest cover
under/at/over-limit and the system-property override; 2 new IT methods
in ContainerResourceIT exercise the end-to-end reject path with a
test-scoped low threshold and confirm at-limit moves still succeed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(container): address PR #28201 review — perf, cycle detection, IT race (#24294)

Four fixes from the gitar-bot review:

1. Performance — validateContainerParent short-circuits when the proposed
   parent id matches the original. Skips an unnecessary Entity.getEntity
   round-trip on every container PATCH/PUT that doesn't touch the parent
   field (description edits, tag adds, etc.).

2. Cycle detection — second-line ID-based ancestor-chain walk added in
   ContainerUpdater.validateAncestorChainCycle. Uses
   relationshipDAO.findFrom (direct DB) so a descendant with a briefly
   stale FQN can't bypass the FQN-prefix check. Visited-set bounded.

3. IT race condition — drop System.setProperty in
   patch_containerParent_rejectsOversizedSubtree_400 and
   patch_containerParent_allowsMoveAtConfiguredLimit_200. Add a
   package-accessible test-override field (set/clear via public static
   methods) plus @ResourceLock(MAX_REPARENT_DESCENDANTS_TEST_LOCK) to
   serialize any test that mutates the override, even though the class
   runs methods concurrently.

4. SQL build comment — document why ContainerDAO.updateFqn interpolates
   values via String.format (mirrors EntityDAO pattern, FQN values are
   server-computed, escapeApostrophe handles the only SQL metacharacter
   that can appear in a validated entity name).

Tests: ContainerRepositoryParentValidationTest extended to 12 cases
(adds parent-unchanged short-circuit assertion + override-priority
coverage). Full ContainerResourceIT still 255/255.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(container): address PR #28201 copilot review (#24294)

Two findings from copilot-pull-request-reviewer:

1. setFields perf regression — restored the conditional
   fields.contains(FIELD_PARENT) ? getContainerParent(c) : c.getParent()
   guard. The unconditional load forced an extra entity_relationship
   lookup on every container GET, which is a measurable regression on
   the hot path. The PATCH flow still loads parent because
   CONTAINER_PATCH_FIELDS includes parent (so fields.contains is true
   there). Full ContainerResourceIT still 255/255.

2. updateEntityLinks shallow walk — previously only iterated direct
   children, leaving deep descendants' (grandchildren+) legacy feed
   thread entityLinks pointing at the old FQN and breaking activity-
   feed navigation after a multi-level move. Now takes the
   renamedContainers snapshot captured by invalidateCacheForRenameCascade
   and rewrites each descendant's entityLink by swapping the FQN
   prefix (oldFqn → newFqn) — consistent with the same prefix-substitution
   ContainerDAO.updateFqn applies to the JSON/fqnHash.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 07:01:40 -07:00
sonika-shah
a160fcb145
fix(entity): null-safe updateColumns for entities without columns (#28047)
* fix(entity): null-safe updateColumns for entities without columns

PATCH on a File without columns (PDF/image/etc.) NPEs in
ColumnEntityUpdater.updateColumns because FileRepository.entitySpecificUpdate
unconditionally invokes the columns updater. recordListChange already
null-coalesces internally, but the subsequent `for (Column updated :
updatedColumns)` iteration does not — any updater calling updateColumns
with a null list hits the same path.

Null-coalesce origColumns and updatedColumns at the top of
ColumnEntityUpdater.updateColumns so any optional-columns entity is safe.

* test(entity): generic PATCH-add-tag and PATCH-add-glossary-term coverage

Add two generic tests to BaseEntityIT that any entity inheriting from it
must pass:

- patch_addClassificationTag_200_OK
- patch_addGlossaryTerm_200_OK

Both create a minimal entity, set a single tag/glossary-term label,
PATCH, and assert the label is present. Gated on supportsTags &&
supportsPatch, so an entity that opts out keeps doing so.

This gives every new EntityRepository subclass (e.g. the next entity
type someone adds) automatic defense against the class of bug where
the tag/glossary PATCH path NPEs on an unrelated optional field
(updateColumns with null columns being the original instance).

* test(it): migrate drive entity ITs to BaseEntityIT, add FolderResourceIT

Migrate FileResourceIT, DirectoryResourceIT, and SpreadsheetResourceIT
onto BaseEntityIT<T, K> so they automatically inherit the ~60 generic
entity tests (CRUD, owners, tags PATCH, glossary PATCH, soft-delete,
versions, custom extensions, etc.). Add a brand-new FolderResourceIT
covering the previously untested folder entity type.

The previous standalone harnesses were a migration gap from the bulk
"Faster tests" PR (#24948) — only WorksheetResourceIT got the full
BaseEntityIT plumbing; the rest of the drive family did not. This meant
bugs like the updateColumns NPE fixed in this PR did not surface in
the IT suite for File and friends.

Each entity sets feature flags conservatively:
- supportsFollowers/Domains/DataProducts/CustomExtension/BulkAPI/
  DataContract = false (matches the existing minimal surface area)
- Folder additionally sets supportsVersionHistory/GetByVersion = false
  since the FolderResource doesn't expose /versions

Entity-specific tests (column handling, directory hierarchy, root
filter, FQN structure, etc.) are preserved. Existing redundant smoke
tests (createMinimal, deleteById, getByName, ID/name not-found) are
removed since BaseEntityIT covers them.

Also add openmetadata-sdk/.../drives/FolderService.java so Folder has
SDK access (basePath /v1/drive/folders).

* fix(it): make FolderResourceIT pass inherited BaseEntityIT tests

Three real issues surfaced by the inherited generic tests after the
drive-IT migration:

1. FolderResource.list was missing @Min(0) / @Max(1000000) on the limit
   query param, so negative or excessive values were silently accepted.
   Add validation to match DirectoryResource/WorksheetResource. This
   fixes BaseEntityIT.get_entityListWithInvalidLimit_4xx.

2. FolderResource hard-delete is asynchronous — it kicks off
   deleteByIdAsync and returns 200 before the row is gone. The generic
   delete_entityAsAdmin_hardDelete_200 test asserts the entity is no
   longer fetchable immediately. Override hardDeleteEntity in
   FolderResourceIT to poll on include=deleted until the async delete
   completes.

3. BaseEntityIT.get_deletedEntityVersion_200 calls getVersion(...) but
   only gated on supportsSoftDelete/supportsPatch, missing the
   supportsGetByVersion gate. For entities like Folder that don't expose
   /versions, the test threw UnsupportedOperationException from the
   subclass override. Add the missing gate.

* chore: address gitar-bot review on PR #28047

- SpreadsheetResourceIT.test_listSpreadsheetsWithRootParameter was
  vacuously passing if the list came back empty (the for-loop body
  silently runs zero times). Add an explicit
  assertFalse(getData().isEmpty()) so a regression in the ?root=true
  filter actually fails the test.

- FolderResource.list parameter description said "1 to 1000000" but
  the annotation is @Min(0) — 0 is a valid limit (returns empty page).
  Align the description with the annotation.

* test(it): restore entity-specific drive tests dropped during BaseEntityIT migration

The previous migration commit was too aggressive in deleting tests it
assumed BaseEntityIT covers. Restore the entity-specific tests:

SpreadsheetResourceIT (8 restored):
- test_createSpreadsheetWithOptionalFields (displayName + description)
- test_updateSpreadsheet (Spreadsheet-specific path/size fields)
- test_spreadsheetWithWorksheets (@Disabled — worksheet relationship)
- test_listSpreadsheetsByService (?service filter)
- test_spreadsheetFQNPatterns (nested directory FQN construction)
- test_spreadsheetsWithAndWithoutDirectory (directory-presence variations)
- test_listSpreadsheetsWithRootParameterAndPagination (root + pagination)
- test_listSpreadsheetsWithRootParameterEmptyResult
- test_listSpreadsheetsWithRootParameterAcrossMultipleServices (@Disabled)

FileResourceIT (2 restored):
- test_createFileWithDisplayName
- test_fileWithAllOptionalFields

DirectoryResourceIT (1 restored):
- test_createDirectoryWithAllFields

BaseEntityIT still covers the genuinely redundant tests that stayed
deleted (CRUD smoke tests, get-by-id, get-by-name with fields, delete,
non-existent ID/FQN, fluent-SDK find variants).

* chore: address remaining gitar-bot + copilot review threads

FolderResource:
- Grammar: "Limit the number folders" -> "Limit the number of folders"

FolderResourceIT:
- Javadoc said Folder "supports ... followers" but supportsFollowers=false.
  Align doc with actual capability flags.
- Awaitility hard-delete poll was catching `Exception` and treating any
  failure as success (would mask transient 500s). Narrow to ApiException
  with statusCode==404; re-throw everything else so Awaitility surfaces
  real errors.

FileResourceIT:
- Pass String IDs to service.update for consistency with the rest of the
  IT suite (createdFile.getId().toString()).
- Compare service references via getFullyQualifiedName() instead of
  getName(); the latter only matches for top-level services and would
  silently break if the reference schema changes.

SpreadsheetResourceIT:
- Import @Disabled and use the short annotation form instead of
  @org.junit.jupiter.api.Disabled (project standards prohibit FQNs in
  annotations).
- Strengthen test_listSpreadsheetsWithRootParameter: assert the root
  spreadsheet we created appears in ?root=true results AND the child
  spreadsheet does NOT, so a broken root filter actually fails the test
  instead of passing on an empty list.

* test(it): restore all remaining originally-deleted drive tests

Per reviewer request, restore the rest of the tests that the migration
commit removed under the (incorrect) assumption they were fully
redundant with BaseEntityIT.

FileResourceIT (+8): test_createFileMinimal, test_createFileWithDescription,
  test_deleteFile, test_findFileById, test_findFileByNameWithFields,
  test_getFileByNameWithFields, test_getFileWithNonExistentId_shouldFail,
  test_getFileByNameWithNonExistentFQN_shouldFail.

DirectoryResourceIT (+10): test_createDirectoryMinimalRequest, test_getByName,
  test_getByNameWithFields, test_deleteDirectory, test_findDirectoryById,
  test_findDirectoryByName, test_findDirectoryWithFields,
  test_createMultipleDirectories, test_getNonExistentDirectory_fails,
  test_getByNameNonExistent_fails.

SpreadsheetResourceIT (+10): test_createSpreadsheet, test_createSpreadsheetMinimal,
  test_getSpreadsheetById, test_getSpreadsheetByName, test_deleteSpreadsheet,
  test_finderWithFields, test_finderByNameWithFields, test_getByNameWithFields,
  test_createMultipleSpreadsheetsUnderSameService, test_patchSpreadsheetAttributes.

BaseEntityIT generic coverage stays intact — the subclass tests and the
inherited tests now coexist (deliberate overlap, opted in by reviewer).

Counts vs ab535900da (original): File 14→18 (+4 column tests added),
Directory 15→15, Spreadsheet 27→27.

* test(it): restore exact original test bodies (assertNotNull lines included)

Prior commits restored test methods by name but trimmed bodies — e.g.,
dropped `assertNotNull(created)`, `assertNotNull(driveService)`, and other
redundant-but-original assertions. Reviewer asked for the tests "as is",
so this commit replays each restored method body byte-for-byte from
ab535900da (the introducing commit, PR #24948).

Preserved on top of the verbatim bodies:
- FileResourceIT: getFullyQualifiedName() comparison instead of getName()
  in test_createAndGetFile / test_fileWithAllOptionalFields (gitar-bot
  review fix).
- SpreadsheetResourceIT: @Disabled (short form) instead of
  @org.junit.jupiter.api.Disabled (gitar-bot review fix). The skip state
  and reasons on test_spreadsheetWithWorksheets and
  test_listSpreadsheetsWithRootParameterAcrossMultipleServices match
  the original — they were disabled in PR #24948 due to backend gaps,
  not by this PR.

* fix(entity): secondary NPE + logic bug in ColumnEntityUpdater.updateColumns

Copilot review caught two pre-existing bugs in the same method this PR
already touches:

1. NPE on added column tags. `added.getTags().stream()` NPEs when a
   column has no tags (Column schema defaults tags to null). Wrap with
   listOrEmpty(added.getTags()) so the stream is safe.

2. Inverted carry-forward condition. The original
       if (nullOrEmpty(addedColumn.getTags()) && nullOrEmpty(deleted.getTags()))
           addedColumn.setTags(deleted.getTags());
   only copied when BOTH sides were empty — a no-op. The intent is "if
   the added column has no tags but the deleted column did, preserve
   them". Flip to !nullOrEmpty(deleted.getTags()).

Both fit the scope of this PR (null-safety in updateColumns) so
including them rather than spinning a separate PR.

* revert: keep ColumnEntityUpdater tag-carry-forward condition untouched

Earlier commit (4a9a639) flipped
  nullOrEmpty(addedColumn.getTags()) && nullOrEmpty(deleted.getTags())
to
  nullOrEmpty(addedColumn.getTags()) && !nullOrEmpty(deleted.getTags())
based on a reviewer-bot suggestion. That logic change is out of scope
for this PR (the PR is null-safety for updateColumns, not semantics of
the carry-forward branch).

The line dates back to the original ColumnEntityUpdater implementation
and changing its behaviour silently — without a dedicated test or
release-note — could affect any flow where a column is "redefined"
(same name, different dataType/ordinal) on PUT or PATCH. Reverting to
the pre-existing form. If the carry-forward really is broken, that
deserves its own PR with regression coverage.

The NPE fix (listOrEmpty(added.getTags()) on the .stream() call)
remains.
2026-05-18 03:27:45 +00:00
Sriharsha Chintalapani
1352d67cf4
feat(dar): Granted lifecycle, filters, sort, and self-service create policy (#28044)
* feat(dar): add Granted lifecycle, filters, sort, and self-service create policy

Splits the Data Access Request lifecycle into Approved (awaiting grant) and
Granted (active access) so the UI can show an "approved – awaiting grant"
banner that clears once an admin marks the request as granted. Adds an
indexed approvedBy/approvedById/approvedAt on Task, captured at the approve
transition through a new direct-persist helper. Introduces a dedicated
/v1/tasks/dataAccessRequests endpoint pre-scoped to category=DataAccess with
DAR filters (dataset, service, status, requestedBy, approver, accessType)
and an asc/desc sort on createdAt; generic /v1/tasks gains service/approver
filters too. DataConsumerPolicy now grants Create on resource=task so
authenticated non-admins can file a DAR (fixes "operations [Create] not
allowed"). Reworks the workflow handler so transitions whose targetTaskStatus
is non-terminal (Approved, Granted) don't close the task, and updates
CreateTask.isTerminalTaskStatus to allow advancing between Approved →
Granted stages. Adds a new "active" statusGroup that includes the DAR
lifecycle states while preserving the existing open/closed semantics that
Glossary-style workflows depend on. Includes a Postgres + MySQL migration
for the indexed approvedById generated column and integration coverage in
DataAccessRequestIT spanning the new lifecycle, filters, sorting, approver
capture, and the non-admin policy path.

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: anuj-kumary <anujf0510@gmail.com>
Co-authored-by: Ram Narayan Balaji <ramnarayanb3005@gmail.com>
Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
2026-05-16 07:35:15 -07:00
Ram Narayan Balaji
02615a3aaf
fix(deps): remove explicit jersey-client version pin in openmetadata-sdk (#28067)
Commit 97b8d998 bumped jersey-client from 3.1.9 to 3.1.10 in openmetadata-sdk/pom.xml,
but the root pom.xml BOM already pins all jersey artifacts at 3.1.11. The explicit
<version>3.1.10</version> override caused the SDK module to resolve an older version
than every other module in the repo. Removing the pin lets the SDK inherit 3.1.11
from the BOM, consistent with the rest of the project.
2026-05-13 12:41:44 +00:00
dependabot[bot]
97b8d998ce
chore(deps): bump org.glassfish.jersey.core:jersey-client (#28001)
Bumps org.glassfish.jersey.core:jersey-client from 3.1.9 to 3.1.10.

---
updated-dependencies:
- dependency-name: org.glassfish.jersey.core:jersey-client
  dependency-version: 3.1.10
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Harsh Vador <58542468+harsh-vador@users.noreply.github.com>
2026-05-12 21:46:41 -07:00
Sriharsha Chintalapani
22a6c10072
Context center (#27558)
Some checks are pending
Integration Tests - MySQL + Elasticsearch / Detect Changes (push) Waiting to run
Integration Tests - MySQL + Elasticsearch / integration-tests-mysql-elasticsearch (push) Blocked by required conditions
Integration Tests - PostgreSQL + OpenSearch / Detect Changes (push) Waiting to run
Integration Tests - PostgreSQL + OpenSearch / integration-tests-postgres-opensearch (push) Blocked by required conditions
Java Checkstyle / java-checkstyle (push) Waiting to run
Maven Collate Tests / maven-collate-ci (push) Waiting to run
OpenMetadata Service Unit Tests / Detect Changes (push) Waiting to run
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (mysql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (postgresql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / k8s_operator-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests-status (push) Blocked by required conditions
Publish Package to Maven Central Repository / publish-maven-packages (push) Waiting to run
* Add Context Center: Migrate Knowledge Center , Images/ PDFs document support

* Add Context Center: Migrate Knowledge Center , Images/ PDFs document support

* Address PR #27558 review comments

- KnowledgePageRepository: null-safe pageType in getHierarchyWithSearch
  and getHierarchyWithSearchForActivePage so the /search/hierarchy
  endpoint no longer NPEs when the pageType query param is omitted. The
  ES/OS client helpers already skip the pageType term when the value is
  null or empty, so this is a pure null-guard.
- ContextFileResource.uploadFile: when a failure happens after the
  ContextFileContent row is created (e.g. inside extractionService.submit),
  the cleanup path now hard-deletes that content row so the DB is not
  left with an orphaned record.
- ContextFileResource: replace the raw Content-Disposition string with a
  buildContentDisposition helper that emits both the legacy quoted
  filename= and the RFC 5987 filename*=UTF-8'' parameter with
  percent-encoded bytes, so international filenames round-trip while
  staying header-injection safe. sanitizeFileName also falls back to
  "download" on null/blank input.
- ContextFileResourceTest: new cases for sanitizeFileName null/blank
  fallbacks and for buildContentDisposition ASCII/unicode/space/injection
  behaviour (18 tests, all passing).

* Address copilot review comments on PR #27558

- AssetRepository.getByFqnPrefix: swap arguments so (assetType, fqnPrefix)
  matches the DAO signature — previous ordering always missed the index.
- FolderResource / ContextFileResource getEntitySpecificOperations: return
  List.of() instead of null so callers iterating the returned list cannot
  NPE.
- SearchUtils.getPageHierarchy: replace UUID.fromString with a parseUuid
  helper that returns null for missing/malformed values and logs a warning
  instead of failing the whole hierarchy response.
- DaoListFilter: qualify the pageType column with the caller-provided
  tableName, rename getArticleCondition to getPageTypeCondition (legacy
  no-arg method kept as @Deprecated wrapper for compatibility).
- Elastic/OpenSearch client processPageHierarchyHits: replace the per-hit
  getChildrenCountForPage search (N+1) with a single pass over the batch
  that derives childrenCount from pages whose parent is in the same
  result set. Also drops the now-unused helper and its throws clause.
- openmetadata-sdk/pom.xml: mark JWT, JAX-RS client, Apache HttpClient,
  jakarta.json, parsson, and JUnit Jupiter as <optional>true</optional>
  so they don't leak into SDK consumers that only use the core client.
- InMemoryAssetService: use the shared AsyncService executor for upload
  /read/delete instead of the JVM common ForkJoinPool.
- sample-pricing.xlsx: replace the plain-text placeholder with a real
  minimal XLSX workbook so detection-based and extraction-based code
  paths see a valid Microsoft Excel 2007+ file.

* Use one filters aggregation for page hierarchy childrenCount

Follow-up to b8458e2868. The previous fix derived childrenCount from
pages whose parent appeared in the same batch — that worked for
listPageHierarchyForActivePage (which fetches all depths) but always
returned 0 on the plain listPageHierarchy path (which only fetches one
depth), so top-level listings lost the count semantically.

Replace with a single `filters` aggregation keyed by page id: each
named bucket matches descendants via a fullyQualifiedName prefix query
against the page's FQN. That gives accurate direct-descendant counts
for every returned page in one aggregation round-trip, still O(1)
additional search requests regardless of batch size.

* Add allowedFields entries for contextFile, folder, page

Fixes SearchSettingsHandlerTest.testEveryAssetTypeHasCorrespondingAllowedFields.

searchSettings.json already had assetTypeConfigurations for contextFile,
folder, and page but no matching allowedFields entries, so the test that
asserts every assetType has a corresponding allowedFields block failed
with 'Asset type contextFile has no corresponding allowedFields entry'.

Adds the three missing blocks with the fields that each index actually
exposes — name / displayName (with .keyword and .ngram variants),
description, fqn, fqnParts, tags/tier/domains/dataProducts, plus
entity-specific fields (fileType/contentType/extractedText for
contextFile, parent.displayName for folder/page, pageType for page).

* Fix ui checkstyle

* Fix Java checkstyle

* Address PR #27558 copilot review round 2

- ES/OS populateChildrenCounts: add fqnDepth == parentDepth + 1 to the
  per-page filter so childrenCount is direct children only, matching the
  field name and the UI's isLeaf check semantics. Previously matched all
  descendants.
- ES/OS buildPageNestedSearchHierarchy: filter out hits with a null id
  before Collectors.toMap, which would otherwise NPE when SearchUtils
  drops a malformed UUID.
- SearchUtils.getPageHierarchy: wrap PageType.fromValue in a parsePageType
  helper that logs and returns null on unknown values, so a single bad
  hit can no longer break the whole hierarchy response.
- TestSuiteBootstrap.setupMinIO: pin minio/minio to
  RELEASE.2024-01-16T16-07-38Z instead of :latest so a newly-published
  image cannot break integration tests without a code change.
- createContextFile.json: rewrite the assetId description to be provider
  agnostic (S3 / Azure Blob / in-memory / no-op) and flag it as the legacy
  path, preferring headContentId / ContextFileContent.

* Update generated TypeScript types

* Address PR #27558 copilot review round 3

- bootstrap/sql/migrations/native/2.0.0/mysql/schemaChanges.sql:
  - asset_entity: add PRIMARY KEY (id); mark all generated columns STORED
    for consistency with the other drive/knowledge tables in the same
    migration; compute deleted as a real boolean via
    IFNULL(JSON_EXTRACT(json, '$.deleted'), FALSE) so the boolean index
    behaves correctly.
  - knowledge_center: mark name, updatedAt, updatedBy, pageType as STORED
    and apply the same deleted expression so the existing indexes on
    name and (fqnHash, deleted) are reliable on fresh installs.
  - drive_folder / context_file / context_file_content: update the
    deleted generated column to use the same boolean-safe expression.
- ElasticSearch/OpenSearch hierarchy search: add an explicit sort on
  fullyQualifiedName ASC with _id ASC as tiebreaker so from/size
  pagination is deterministic and cannot skip/duplicate pages between
  requests.

* Fix UI checkstyle

* Address PR #27558 copilot review round 4

- createPage.json: rewrite the field descriptions for name, displayName,
  owners, reviewers, and entityStatus. They were copy/pasted from other
  schemas ('query', 'tag') and were misleading in generated docs and
  clients.
- NoOpAssetService.generateDownloadUrlWithExpiry: return asset.getUrl()
  instead of a synthetic 'https://cdn.example.com/...' URL. The old
  behaviour let clients attempt downloads that would never resolve when
  object storage was disabled; returning the asset's own (empty) URL
  surfaces the misconfiguration cleanly.
- AzureAssetService: normalize the prefix path the same way S3 does.
  Previously a null/blank prefix produced the literal 'null/' prefix,
  writing blobs under the wrong key. New formatPrefix returns "" for
  null/blank and ensures exactly one trailing '/' for a real prefix.
- AssetRepository.getByFQN: treat null *or* empty list as 'not found',
  matching getByFqnPrefix. Callers previously received an empty list
  silently when the DAO returned [] instead of a 404.

* Update generated TypeScript types

* Fix UI checkstyle

* Address PR #27558 copilot review round 5

- AssetDAO.update / AssetRepository.update: switch the UPDATE target from
  fqnHash to id. Two assets can share the same fullyQualifiedName (e.g.
  successive revisions of the same context file), so the old SQL could
  silently update sibling rows.
- ContextFileExtractionService: run the extraction pipeline on a
  dedicated fixed thread pool instead of AsyncService.getExecutorService.
  process() blocks on assetService.read(...).join(), and S3/Azure reads
  are themselves scheduled on AsyncService — sharing the same bounded
  pool risks starving those reads (and deadlocking) once every thread is
  busy running extractions.
- postgres/schemaChanges.sql: wrap the generated deleted column in
  COALESCE((json ->> 'deleted')::boolean, false) (and the asset_entity
  CAST variant) so an absent 'deleted' key is stored as FALSE, not NULL.
  Otherwise "non-deleted" filters based on the boolean index drop rows
  silently. Matches the MySQL IFNULL(..., FALSE) side of the migration.
- ContextFileUploadSupport.sanitizeEntityName: treat null/blank input as
  'file' instead of NPE-ing on replaceAll. Multipart uploads can arrive
  without filename metadata; the upload should still succeed with a
  stable generated name.

* Remove macOS-only @rollup/rollup-darwin-arm64 dev dep

I pinned this during local troubleshooting to get a Vite dev server
running on macOS (rollup's optional native binary was missing). CI runs
on Linux, where yarn install --frozen-lockfile refuses the package
('The platform \"linux\" is incompatible with this module'), which
broke license-header, lint-src, lint-playwright, i18n-sync, app-docs,
and ui-coverage-tests for PR #27558.

rollup re-resolves its native binary per platform — there's no reason
to pin the darwin one. Remove it from package.json and drop the
matching '@rollup/rollup-darwin-arm64@^4.60.2' block from yarn.lock.

* Re-declare optional SDK test deps on integration-tests classpath

KnowledgeCenterIT failed in CI with
'java.lang.NoClassDefFoundError: org/glassfish/jersey/apache/connector/ApacheConnectorProvider'
after I marked the JAX-RS client stack in openmetadata-sdk as
<optional>true</optional> during review round 2. That change stops the
deps from leaking to every SDK consumer, but integration-tests actually
uses org.openmetadata.sdk.test.util.RestClient, so the optional deps
must be re-declared on its own classpath.

Adds jakarta.ws.rs-api, jersey-client, jersey-apache-connector,
httpclient, jakarta.json-api, and parsson to
openmetadata-integration-tests/pom.xml as <scope>test</scope>.

* Fix IT failures from CI integration-tests-mysql-elasticsearch

1. MySQL deleted column: revert the IFNULL wrapper to plain
   (json -> '$.deleted'). My earlier
   IFNULL(JSON_EXTRACT(json, '$.deleted'), FALSE) hit
   'Incorrect integer value: false for column deleted' on fresh installs
   because MySQL cannot coerce the resulting JSON scalar into TINYINT(1)
   when the column is STORED. The bare '(json -> '$.deleted')' form is
   what other OM tables already use, and MySQL converts JSON true/false
   to 1/0 directly for the BOOLEAN column. STORED + PRIMARY KEY stay
   in place.
2. DriveFileUploadIT: raise the four short atMost(5s) awaits to 20s
   with explicit pollDelay(ZERO) + pollInterval(200ms).
   K8sOMJobOperatorIT sets a global Awaitility pollInterval of 5s at
   class setup; any subsequent test with atMost <= 5s hits
   'Timeout must be greater than the poll delay'. Overriding the
   per-call poll settings insulates these asserts from the global
   leak.

* Document SDK test-utility optional deps

In review round 2 we marked jersey-client, jersey-apache-connector,
jakarta.ws.rs-api, httpclient, jakarta.json-api, parsson, java-jwt, and
junit-jupiter-api as <optional>true</optional> on openmetadata-sdk so
that core SDK consumers don't inherit a heavy JAX-RS + JUnit stack.
openmetadata-integration-tests hit this immediately with
NoClassDefFoundError from RestClient; its own pom now re-declares the
deps.

Add a "Test utilities" section to the SDK README that lists the
optional deps downstream test-utility consumers must re-declare (with
the concrete <scope>test</scope> XML snippet) and explains the error
they'd otherwise see.

* NoOpAssetService: never return null from generateDownloadUrlWithExpiry

In review round 4 I changed this method to return asset.getUrl() when
the asset is non-null. But Asset.url is optional in the schema, so
asset.getUrl() itself can be null — which breaks the implied "never
returns null" contract downstream callers rely on (AttachmentResource
only null-checks defensively).

Normalize null and blank URLs to an empty string so the method's
non-null, non-blank contract holds even when storage is disabled and
the asset was never populated with a URL.

* AssetServiceFactory: swap to NoOp when re-initialized with storage off

init(...) previously only assigned NoOpAssetService when instance was
null. On a re-init with object storage toggled off (config reload, test
teardown, etc.), the previously wired S3/Azure/InMemory provider stayed
live and kept serving real IO against a backend the operator thought
was disabled.

Replace the instance with a fresh NoOp when storage is disabled unless
the instance is already a NoOp (idempotent on repeated disabled
inits).

* Type create-request domains arrays as fullyQualifiedEntityName

The three new KC/Drive create schemas (createFolder, createContextFile,
createPage) had domains as an array of unconstrained strings. The rest
of the OM API models domain references as FQNs, and the shared
basic.json#/definitions/fullyQualifiedEntityName is the convention for
this.

Point all three items refs at fullyQualifiedEntityName so generated
clients see a consistent FQN type and requests get validated for
non-empty length/format rather than any string.

* Update generated TypeScript types

* Address PR #27558 copilot review 4144965142

- ContextFileExtractionService: switch the default thread pool to
  a static final DEFAULT_EXECUTOR, so every production instance of the
  service reuses the same pool instead of leaking a fresh fixed pool
  per construction (tests especially create multiple instances).
  Threads remain daemons, so the pool never blocks JVM shutdown.
- ObjectDeleteQueueService: when queueCapacity is 0, use a
  SynchronousQueue so "reject-if-all-workers-busy, no buffering" holds.
  Previous Math.max(1, queueCapacity) silently allocated a 1-slot
  ArrayBlockingQueue, contradicting the caller's stated capacity and
  potentially buffering one task past the semaphore's accounting.

Not fixing:
- SearchUtils @Slf4j 'LOG' vs 'log'. OM's openmetadata-service/lombok.config
  sets 'lombok.log.fieldName = LOG', so @Slf4j correctly generates
  'LOG' for every class in this module. The reviewer's concern only
  applies to projects without that directive. Verified clean compile.

* Address PR #27558 copilot review 4144917449

- knowledgeCenterTags.json: change mutuallyExclusive from the string
  "false" to the JSON boolean false. The Classification schema declares
  this as `"type": "boolean"`; jackson's lenient string->boolean
  coercion masked it until now, but strict validators would reject and
  the other OM bootstrap tag files that use the correct boolean
  (piiTagsWithRecognizers.json) model what this should look like.

- ContextFileExtractionService.process: guard the updateContent
  updater with the same head-content check already used in
  updateFile. Previously, if headContentId flipped between the
  initial check and the status writes, updateFile would no-op while
  updateContent still marked the now-stale content "Analyzing",
  leaving it stuck once the later early-return fires.

- AzureAssetService.upload: stream the InputStream straight to the
  blob using the known asset.getSize() instead of reading the whole
  payload into a byte[] via IOUtils.toByteArray. Matches the S3
  streaming behaviour and avoids full-file heap pressure / OOM risk
  on larger files. Buffered fallback retained when size is unknown.

- Size fields modeled as integer: flip fileSize / size on
  createContextFile.json, contextFile.json, asset.json,
  createAsset.json, and contextFileContent.json from
  "type": "number" to "type": "integer" with "format": "int64" and
  "minimum": 0. Byte counts are inherently whole numbers; floating
  point loses precision above 2^53 and makes validation murky.
  Update the (double) call sites in ContextFileResource,
  ContextFileUploadSupport, and AttachmentResource to match.

Not fixing:
- ContextEntityPromptService "unused Authorizer import" — false
  positive, the class uses it in the constructor.
- NoOpAssetService.generateDownloadUrlWithExpiry null return — already
  fixed earlier in commit a4a2dcc91d (returns "" when url is
  null/blank).

* AssetService.read: run inline instead of hopping through AsyncService

Every caller of AssetService.read(...) immediately .join()s on the
returned future:

- ContextFileExtractionService.process reads + extracts
- ContextFileResource.downloadFile reads + streams back
- AttachmentResource.serveAsset reads + streams back
- QueuedDeleteAssetService just delegates

None of them exploit the async nature, but the S3/Azure/InMemory
implementations all wrapped the blocking fetch in
AsyncService.executeAsync or CompletableFuture.supplyAsync on a
bounded pool. That created a starvation path when any caller thread
was already running on AsyncService (or could monopolize it under
load) — join() would block the caller while the submitted read
task fought for a free worker.

Switch S3, Azure, and InMemory read() to execute on the caller's
thread and return CompletableFuture.completedFuture(...). Interface
is unchanged so existing .join() callers keep working; the extra
thread hop and the potential for AsyncService starvation are both
gone. Combined with the dedicated context-file-extraction pool, the
extraction pipeline no longer touches AsyncService for any
asset-read step.

* Address PR #27558 copilot review 4151211562

- FolderIndex / ContextFileIndex: stop re-setting entityType, deleted,
  owners, totalVotes inside buildSearchIndexDocInternal. Those common
  fields are populated by populateCommonFields in the SearchIndex
  template method (Phase 1) before Phase 3 calls the entity-specific
  internal builder, so the explicit puts were redundant and silently
  overrode the template output. Aligns with PageIndex convention and
  updates the unit tests to assert the internal builder sets only
  entity-specific fields.

- ContextFileTextExtractor: bound the Tika BodyContentHandler at
  MAX_CANONICAL_TEXT_LENGTH instead of passing -1 (unbounded) so a
  pathological image cannot drive OCR to accumulate arbitrary output
  on the heap.

- ContextFileExtractionService: replace the unbounded
  Executors.newFixedThreadPool backing queue with a ThreadPoolExecutor
  using an ArrayBlockingQueue + AbortPolicy. Without a bounded queue
  the RejectedExecutionException handling in submit(...) was dead
  code; with it, an overloaded server surfaces a "retry later"
  failure status instead of silently accumulating work.

- S3AssetService / AssetService / AssetServiceFactory /
  QueuedDeleteAssetService: make AssetService extend AutoCloseable
  with a default no-op, override close() in S3AssetService to release
  the S3Client and S3Presigner connection pools, and register a
  shutdown hook in AssetServiceFactory that closes the current
  provider on JVM exit (and on re-init when the provider changes).

- bootstrap 2.0.0 MySQL schemaChanges: change the deleted generated
  column from (json -> '$.deleted') to
  (JSON_EXTRACT(json, '$.deleted') IS TRUE) so rows where the JSON
  key is absent resolve to FALSE instead of NULL. Avoids filter misses
  on the composite (fqnHash, deleted) index.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix Java checkstyle

* Fix integration test compile + S3 generateDownloadURL

ContextFileIT / DriveFileUploadIT compile failures came from the
fileSize schema switch to integer/int64 — the generated setter/getter
is now Integer. Replace the double literals with ints and the
assertEquals(double, ...) sites with intValue() so the (int, int)
overload resolves unambiguously.

Also override S3AssetService.generateDownloadURL to return a
short-lived presigned URL (mirroring AzureAssetService) instead of
inheriting the default, which would return the raw S3 key from
asset.url. Addresses review 4151282021.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Revert MySQL deleted column back to bare json -> expression

The JSON_EXTRACT(...) IS TRUE form broke integration tests — GET after
create started returning 404, consistent with MySQL evaluating the
IS TRUE predicate against the JSON scalar in a way that stored 1
instead of 0 for freshly-created rows (deleted=false).

Restoring the bare (json -> '$.deleted') expression used pre-review.
Rows with the key missing will store NULL on the generated column,
which is a theoretical concern the review flagged but does not affect
current code paths (all inserts write json.deleted explicitly).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix Transi18next import path in KnowledgeCenter components

Two KnowledgeCenter files imported Transi18next from
'utils/CommonUtils', which is where Collate's UI re-exports it from.
OpenMetadata core exports Transi18next from 'utils/i18next/LocalUtil'
(same path every other core file uses). The Collate-style import
broke the production Vite/Rollup build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Harden ContextFileIT.testFileAppearsInSearch against async indexing

The test used a fixed Thread.sleep(2000) then a single assertEquals
on the status code. That was flaky two ways: ES indexing is async
and the 2s window is not always enough, and on a fresh cluster the
context_file_search_index itself may not exist yet at first query
(yielding 500).

Replace with an await() loop that polls every 200ms for up to 30s
and asserts both status==200 AND that the newly-created file's UUID
appears in the response. Matches the assertSearchContainsFile
helper in DriveFileUploadIT.

Also URL-encode the namespaced query string so the uniqueName
does not break the query parsing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Make playwright editor shortcuts platform-aware

The SHORTCUTS constant in playwright/constant/KnowledgeCenter.constant.ts
hard-coded "Meta+b" / "Meta+z" / etc. On macOS Meta is Cmd and those
shortcuts trigger bold / undo / copy as expected, but on the Linux CI
runners Meta is the Super (Windows) key — so every ProseMirror
formatting and history test just pressed Super+b, which does nothing,
and the test then fails waiting for the <strong>…</strong> element
(or for the undone text to disappear).

Detect the runner platform and use Meta on macOS, Control everywhere
else — matching the same pattern in src/constants/KnowledgeCenter.constant.ts.

Unblocks the 6 KnowledgeCenterTextEditor failures across Admin / Data
Consumer / Data Steward roles (Text Formatting + Undo/Redo). Slash
commands keep passing because they don't depend on modifier keys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Run prettier on DateTimeUtils.ts

CI's lint-src job fails because ESLint+Prettier --fix produces a
non-empty diff against the committed tree. Local prettier pass
trimmed the indentation and added a trailing comma in the imports
block. No behavioral change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix Knowledge Page entity-link + DAO filter regressions from the port

Downloaded the failing playwright traces from the PR's postgres e2e run
and walked each one. Three distinct bugs, all present because the
Collate-side overrides (overrides/EntityUtilClassCollate.ts and the
DaoExtension.KnowledgeExtensionDAO custom SQL) were not carried over
into OpenMetadata core when KnowledgeCenter was merged up.

1) CollectionDAO.KnowledgePageDAO: override listCount / listBefore /
   listAfter (plus helper SQL queries) so that
   `GET /v1/knowledgeCenter?entityId=X&entityType=topic` actually INNER
   JOINs entity_relationship and returns only pages whose
   relatedEntities contains the target entity. Without this the base
   EntityDAO ignored entityId/entityType entirely and returned every
   page, which is why the "Knowledge Articles" widget on a data asset
   page showed the 15 fixture articles instead of the one just attached
   — and why updateDataAsset timed out waiting for the linked article.
   Uses OWNS relation for user/team filters (same semantics Collate
   uses) and HAS for every other entity type.

2) EntityUtilClassBase + EntityUtils.getEntityLinkFromType: add
   EntityType.KNOWLEDGE_PAGE cases that route to getKnowledgePagePath.
   Before this, mention notifications for Knowledge Pages fell through
   to the default `/table/<fqn>` branch (confirmed in the captured
   page-snapshot: the mention link pointed at `/table/Article_eEqrWeeU`),
   which 404'd on the Table API and rendered an error page — so the
   entity-header-display-name textarea never appeared and the User
   Mentions test timed out. Search results on Explore had the same
   problem, rendering every Knowledge Page result card with href="/".

3) EntityUtilClassBase.getEntityByFqn / ENTITY_PATCH_API_MAP /
   getResourceEntityFromEntityType: handle KNOWLEDGE_PAGE end-to-end so
   the detail-page fetch, patches, and policy lookups all route through
   the knowledgeCenter REST API rather than falling back to the generic
   entity utilities (which don't know about the 'page' entity type).

Verified against the real trace artifacts from CI run 24790718035:
- shard 3 Knowledge Center page test — widget shows 10 unrelated
  "Article_*" fixture items instead of the created one → root cause
  is the missing DAO JOIN (#1).
- shard 3 User Mentions test — notification link is /table/, not
  /knowledge-center/ (#2).
- shard 3 Reviewer Workflow — data consumer's knowledge-center goto
  renders "No data available" because getEntityByFqn fell back to a
  table fetch for a page FQN (#3).
- shard 5 ExplorePageRightPanel_KnowledgeCenter (22 failures) —
  search result card links are "/explore/" (empty), same root cause
  as (#2) inside getEntityLinkFromType default branch.

Compiles: mvn -pl openmetadata-service -q -DskipTests compile passes;
tsc --noEmit reports no new errors in the touched files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address remaining PR #27558 review feedback

Seven actionable fixes drawn from the still-open review threads; the
rest of the open threads in copilot's bot reviews are either already
addressed in earlier commits or stale against the current code and
are being resolved on the review UI alongside this commit.

- AssetRepository.getByFQN: the LOG.error message said "asset with id"
  but was printing the FQN. Relabel to "asset with FQN" for accurate
  troubleshooting (thread #42).

- KnowledgePageMapper.createToEntity: stop mutating the inbound
  CreatePage by calling create.withRelatedEntities(...). Build the
  effective list as a local variable and pass it to copy(...). Prevents
  the Organization fallback from leaking into the caller's request
  object, which is surprising when the request is re-used or logged
  (thread #43).

- FolderIndex: default childrenCount to 0 when the entity hasn't yet
  had its children recomputed (e.g. a freshly created folder). Prevents
  the numeric field from being indexed as missing, which broke range
  and sort queries that assume it is always present (thread #46).

- NoOpAssetService and InMemoryAssetService: override
  generateDownloadURL to delegate to generateDownloadUrlWithExpiry,
  matching S3/Azure. Without this, callers using the non-expiry API
  got asset.getUrl() (often empty for these providers), yielding broken
  download links (threads #39, #45).

- ObjectDeleteQueueService: register a JVM shutdown hook in the
  singleton's initializer that calls stop(). The service already
  implements Dropwizard Managed, but nothing currently wires it into
  the application lifecycle, so non-daemon delete-worker threads were
  at risk of keeping the JVM alive after ungraceful termination. The
  hook is a belt-and-suspenders fallback to the Managed path
  (threads #52, #53).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add java-checkstyle skill for Claude + Codex agents

CI keeps surfacing "Java checkstyle failed — please run mvn spotless:apply"
comments on PRs (including this branch). CLAUDE.md and AGENTS.md already
mentioned the command, but a one-line prose note in the middle of each file
wasn't enough to make it a reliable habit.

This commit:

- Adds a dedicated invocable skill at .claude/skills/java-checkstyle/SKILL.md
  (for the Claude Code harness) and a mirror at
  .agents/skills/java-checkstyle/SKILL.md (for Codex-style agents). Both
  describe the same procedure: when / why to run spotless, the `-pl <module>`
  scoping option, the verify-only `spotless:check` form, the expected
  diff shape, and the rule to never hand-edit formatting around a plugin
  error.

- Promotes the existing one-liners in CLAUDE.md and AGENTS.md to explicit
  "run before finishing any Java task" instructions, pointing at the skill so
  agents have a reusable procedure to invoke rather than improvising.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Harden AttachmentResource upload/download against three regressions

Carried over from the latest AttachmentResource review. Three issues:

1. Content-Disposition header injection (security) — downloadAsset() built
   the header by direct string interpolation of asset.getFileName(). A
   filename containing double-quotes or CRLF could inject arbitrary HTTP
   headers. ContextFileResource already has a sanitize + RFC-5987 encode
   helper; rather than duplicate it, promote
   ContextFileUploadSupport.sanitizeFileName / buildContentDisposition to
   public, delete the duplicates from ContextFileResource (now delegators),
   and reuse the shared helpers from AttachmentResource.

2. Unbounded upload buffering (performance / DoS) — createAssetFromUpload
   read the full multipart body into a byte[] via IOUtils.toByteArray
   before checking against MAX_FILE_SIZE. An attacker could send an
   arbitrarily large body and exhaust heap before the validation ran.
   Replace with ContextFileUploadSupport.bufferUpload(), which streams to
   a bounded temp file and throws MaxFileSizeExceededException the moment
   the configured limit is passed; translate that into the same
   AttachmentException size-validation error the previous code raised.
   Promoted BufferedUpload and MaxFileSizeExceededException to public so
   the attachments package can consume them.

3. Startup NPE when objectStorage is null (bug) — initialize() called
   config.getObjectStorage().getMaxFileSize() without a null guard, so a
   deployment that doesn't configure object storage would NPE on server
   start. Added the same guard ContextFileResource.initialize() already
   uses, gave MAX_FILE_SIZE a safe 5 MiB default, and also null-guarded
   the S3-configuration branch of the CDN URL lookup so a pure-Azure or
   pure-NoOp setup doesn't fall off the end of the ternary.

Ran mvn spotless:apply — picks up formatting-only changes in
CollectionDAO.java and FolderIndex.java as a side effect of the shared
helper additions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add ui-checkstyle skill + fix residual import-order drift

CI's UI Checkstyle workflow has three per-area jobs (lint-src,
lint-playwright, lint-core-components) that reformat the files changed
in the PR and fail if the reformat produces a diff. CLAUDE.md and
AGENTS.md didn't previously document this flow, so re-running the fix
was a guessing game — the two lint-core-components and lint-playwright
failures on this branch came from stale import order left over from the
main→context_center merge.

This commit:

- Adds a dedicated invocable skill at .claude/skills/ui-checkstyle/SKILL.md
  (Claude Code harness) and a mirror at .agents/skills/ui-checkstyle/SKILL.md
  (Codex-style agents). Both describe the exact three-command sequence CI
  runs — organize-imports-cli → eslint --fix → prettier --write — the
  per-area file scoping, the `--check` dry-run mode, and the rule that
  organize-imports must run BEFORE prettier (otherwise the indentation /
  trailing-comma round-trip leaves a dirty diff).

- Promotes the existing one-liner in CLAUDE.md and AGENTS.md to an explicit
  "run before finishing any UI task" instruction that points at the skill.

- Fixes two residual import-order drifts (KnowledgePagesHierarchy.tsx,
  EntityUtilClassBase.ts) surfaced by running the skill's sequence locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix UI checkstyle on EntityUtilClassBase.ts

ESLint --fix inserted a blank line between the KNOWLEDGE_PAGE guard and the
fallback return in getEntityByFqn. Committing the formatted version.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix ContextFileIT.testFileAppearsInSearch flaky 500 from query_string parsing

The previous polling search used the namespaced unique name as a free-text
q= argument. The namespace prefix contains '-' which the ES 9.x query_string
parser treats as a NOT operator, producing a deterministic 500 across the
full 30s polling window even when the document was indexed.

Switch to the direct get-by-id endpoint (/v1/search/get/{index}/doc/{id}),
which performs a real-time ES GET with no query_string parsing and no
analyzer involvement — the most reliable signal that the document was
indexed. Bump the timeout to 60s and capture the response body on any
non-200 so future regressions surface the real ES error.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Fix knowledge center icon

* update knowledge center to context center

Co-authored-by: Copilot <copilot@github.com>

* Revert "update knowledge center to context center"

This reverts commit f0cca5fd65.

* Fix UI checkstyle: sort tag*-related imports in SearchClassBase

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix Jest coverage failures in KnowledgeCenter Layout and right panel

KnowledgeCenterLayout was importing i18n directly from LocalUtil, but the
global setupTests mock for that module only exposes t/on. Switch to the
useTranslation() hook so it picks up the react-i18next mock that already
provides i18n.dir(), matching how LeftSidebar and RichTextEditor use the
direction.

EntityRightPanelClassBase.getKnowLedgeArticlesWidget now returns the
KnowledgePages component instead of null. Update the corresponding test
case to assert the new return value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix playwright tests and bugs

Co-authored-by: Copilot <copilot@github.com>

* Fix checkstyle

* Fix /knowledgeCenter/search/hierarchy 500 by removing _id sort

ES 9.x and OpenSearch 3.x reject sorts on the _id field by default
(indices.id_field_data.enabled is false), causing every call to
listPageHierarchy{,ForActivePage} to fail the search_phase_execution_exception
"all shards failed" we see in the screenshot. The _id sort was added
in 4a75852a7e as a tiebreaker for from/size pagination, but
fullyQualifiedName is already a keyword field with doc_values and is
unique per page (name is unique within a parent's children) — so no
tiebreaker is needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Cascade hard-delete to descendant pages in search index

KnowledgeCenter pages are nested via FQN (parent.fqn -> parent.fqn.child),
not via a parent.id field on the child doc. The default deleteOrUpdateChildren
case for entity type "page" uses page.id field matching, which doesn't exist
on child page docs — so a recursive hard-delete on the parent removed the
parent from search but left every descendant orphaned in the index. Stale
docs only disappeared on a full reindex.

This logic was overridden in the collate fork's SearchRepositoryExt; it was
lost during the migration when the override class was removed. Fold the
override into the base SearchRepository as a Page-specific case that calls
deleteEntityByFQNPrefix, which deletes by fullyQualifiedName.keyword prefix
match — covering every descendant.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Add page/folder/contextFile/securityService to SearchIndexingApp picker

The Search Indexing Application's "Entities" picker shows "No data" when
typing "Page" because the enum in src/utils/ApplicationSchemas/SearchIndexingApplication.json
does not include the Knowledge Center / Drive entity types added on this
branch. The collate fork carried these in SearchIndexingApplication-collate.json
(included page); folder, contextFile and securityService are new on this
branch and never made it into the picker enum during the migration.

Without them in the enum, users cannot select these entity types for
targeted reindex, even though every other reindex code path supports them.

src/jsons/applicationSchemas/* is generated by parseSchemas.js from
src/utils/ApplicationSchemas/* at build time and is gitignored, so only
the source schema is updated here.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Restore live index settings on per-entity distributed-promote path

DefaultRecreateHandler exposes two finalization paths:

  - finalizeReindex(...)        — centralized end-of-job promotion. Calls
                                  applyLiveServingSettings + maybeForceMerge
                                  before the alias swap, reverting the bulk
                                  overrides (refresh_interval=-1, replicas=0,
                                  async translog) back to live values
                                  (refresh=1s, replicas=1, durable translog).

  - promoteEntityIndex(ctx, ok) — per-entity promotion. Used by the distributed
                                  search-indexer's "promote as soon as all
                                  partitions for an entity complete" callback
                                  (DistributedSearchIndexExecutor.promoteEntityIndex).
                                  Swaps the alias and cleans up old indices —
                                  but never restored live settings.

When an entity finishes its partitions before the final reconciliation
(typically the smallest entities — e.g. knowledge `page` with ~11 rows),
its index is promoted via the per-entity path, the alias swap succeeds,
and the bulk-build overrides become the new live settings. refresh_interval
stays at -1 in production, so live writes after the reindex are buffered in
the translog and never reach searchable segments until a manual _refresh.
Externally this surfaces as "create an article, hierarchy is empty until I
re-trigger reindex" — exactly the user-reported bug.

Mirror the finalizeReindex sequence by calling applyLiveServingSettings
(and maybeForceMerge for parity) at the top of the promote block in
promoteEntityIndex, before the alias swap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Wire jobData into per-entity reindex promotion handler

DefaultRecreateHandler.applyLiveServingSettings reads from the handler's
jobData field (live + bulk index-settings overrides on the EventPublisherJob).
The per-entity distributed-promotion path in DistributedSearchIndexExecutor
created its own DefaultRecreateHandler instance and never called
withJobData(jobData) on it. With jobData=null, buildRevertJson returns null
and applyLiveServingSettings silently no-ops — meaning the previous fix
(b272de85f9) never actually re-applied live settings on the per-entity
promote path, even though the call was reached.

currentJob.getJobConfiguration() is the EventPublisherJob the strategy
created. Wire it into the new handler at construction time, mirroring the
withJobData call DistributedIndexingStrategy already makes on the strategy's
own handler instance.

With this change, the per-entity promote path now logs

  "Applying live serving settings to staged index '...' for entity 'page':
   {\"number_of_replicas\":1,\"refresh_interval\":\"1s\", ...}"

before the alias swap, and post-promotion `_settings` show
refresh_interval=1s instead of the stuck -1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Fix delete failure

* Fix java checkstyle

* Fix article deletion issue

* refactor(test): streamline Knowledge Center List setup and teardown processes

* Fix GlossaryTags

* Add missing pieces in knowledge articles

* Fix checkstyle

* Remove reviewer workflow spec

* remove unused util

* Fix the localization changes

* Fix unit tests

* deleted unused svg

* added missing svg

* improved ux of save button & autofocus on title

* lint fixes

* Update page index

* Make calculateFqnDepth static

* fixed the kc imports

* import fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: mohitdeuex <mohit.y@deuexsolutions.com>
Co-authored-by: Rohit0301 <rj03012002@gmail.com>
Co-authored-by: Rohit Jain <60229265+Rohit0301@users.noreply.github.com>
Co-authored-by: Harsh Vador <harsh.vador@somaiya.edu>
2026-05-08 10:56:04 -07:00
Anujkumar Yadav
80375a7dc6
Add data access request support (#27879)
* Add DAR tasks

* Removed UI related changes of DAR

* nit

* Update generated TypeScript types

* fix linting issue

* Removed all languages changes

* nit

* removed white space

* add request data access button with owner/status conditions

* fix lint issue

* fix minor validation for data access button

* fix lint issue

* fix data access button visiable condition

* fix java lint checks and fix test cases

* nit

* fix test

* fix(tasks): model CreateTask.about as entityLink, validate target entity

Replace `about` (FQN string) + `aboutType` (string) with a single
`about` field of type entityLink (`<#E::{entityType}::{fqn}>`). The
resource layer parses the link and resolves it via
`Entity.getEntityReferenceByName(type, fqn, NON_DELETED)`, which
guarantees the target asset exists and is not soft-deleted.

Why: long-FQN data assets were rejected with `[query param name size
must be between 1 and 256]` because the modal was constructing a Task
`name` from the FQN. The `about` was modelled as a free string with
no schema validation that the target was a real, non-deleted entity.
The Threads API already uses entityLink for this exact purpose; tasks
now align with that pattern. The link is supplied as a hidden field
by the UI — users never see it.

Also fixes the missing `@ExtendWith(TestNamespaceExtension.class)` on
`DataAccessRequestIT` that caused four test failures in CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix unit test failure

* fix(test): await workflow stage transition in DataAccessRequestIT

The workflow advances the task from pending-workflow-start to review
asynchronously. Asserting on the object returned by create() was a
race condition. Use Awaitility to poll until the stage is review,
matching the pattern in IncidentTaskIntegrationIT.

---------

Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Ram Narayan Balaji <ramnarayanb3005@gmail.com>
Co-authored-by: Ram Narayan Balaji <81347100+yan-3005@users.noreply.github.com>
2026-05-07 17:56:44 +05:30
Sriharsha Chintalapani
ad9e1b7823
Containers: batch container data-model column tag retrieval to avoid subtree fan-out (#27836)
Some checks are pending
Integration Tests - MySQL + Elasticsearch / Detect Changes (push) Waiting to run
Integration Tests - MySQL + Elasticsearch / integration-tests-mysql-elasticsearch (push) Blocked by required conditions
Integration Tests - PostgreSQL + OpenSearch / Detect Changes (push) Waiting to run
Integration Tests - PostgreSQL + OpenSearch / integration-tests-postgres-opensearch (push) Blocked by required conditions
Java Checkstyle / java-checkstyle (push) Waiting to run
Maven Collate Tests / maven-collate-ci (push) Waiting to run
OpenMetadata Service Unit Tests / Detect Changes (push) Waiting to run
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (mysql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (postgresql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / k8s_operator-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests-status (push) Blocked by required conditions
Publish Package to Maven Central Repository / publish-maven-packages (push) Waiting to run
* Containers with deep nesting causing performance issues due to tag fetch

* Batch derived-tag fetch across data-model columns

populateDataModelColumnTags previously called addDerivedTagsGracefully
once per flattened column, which internally batches across that column's
own tags but issues a separate derived-tag DB lookup for every column.
On data models with many columns (or struct types with deep nesting)
this becomes an N+1 pattern.

Refactor:
- Pre-compute Map<String, Column> hashToColumn once (LinkedHashMap to
  preserve column order) so we no longer hash each FQN twice — once
  for the target-hash list and again on lookup.
- After fetching tags by target hash, flatten all returned TagLabels
  into a single list and call TagLabelUtil.batchFetchDerivedTags(...)
  once for the whole data model.
- Per column, use addDerivedTagsWithPreFetched(columnTags, derivedMap)
  to avoid further DB lookups.
- Fall back to the per-column addDerivedTagsGracefully path if the
  batch derived-tag fetch raises, preserving existing semantics.

Net effect: total derived-tag DB queries drop from O(N) to 1 regardless
of column count or nesting depth.


Co-authored-by: sonika-shah <58761340+sonika-shah@users.noreply.github.com>
2026-04-30 20:55:55 -07:00
Ram Narayan Balaji
368fae160b
Revert "Feature #18173: Version API Improvements" (#26307) (#27837)
Some checks are pending
Integration Tests - MySQL + Elasticsearch / Detect Changes (push) Waiting to run
Integration Tests - MySQL + Elasticsearch / integration-tests-mysql-elasticsearch (push) Blocked by required conditions
Integration Tests - PostgreSQL + OpenSearch / Detect Changes (push) Waiting to run
Integration Tests - PostgreSQL + OpenSearch / integration-tests-postgres-opensearch (push) Blocked by required conditions
Java Checkstyle / java-checkstyle (push) Waiting to run
Maven Collate Tests / maven-collate-ci (push) Waiting to run
OpenMetadata Service Unit Tests / Detect Changes (push) Waiting to run
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (mysql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (postgresql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / k8s_operator-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests-status (push) Blocked by required conditions
Publish Package to Maven Central Repository / publish-maven-packages (push) Waiting to run
* Revert "Feature #18173: Version API Improvements, Last x versions order by desc, versions from specific timeline, versions for specific metadata changes, sdk support and UI integration (#26307)"

This reverts commit e4d3e423e1.

* fix: apply ruff formatting after conflict resolution in Python files
2026-04-30 11:23:42 +00:00
Sriharsha Chintalapani
51ecf4502f
Task redesign (#25894)
* Task Redesign: Add Task entity & tests

* Task Redesign: Add Task entity & tests

* Task Redesign: Add Permissions checks for Task APIs

* Task UI changed to the new APIs

* Migrate UI and APIs to new tasks system inlcuding suggestions

* Add Suggestions integration

* Activity Feed Refactor

* ActivityFeed -> ActivityStream publisher

* Activity Feed redesign

* Activity Feed redesign, adding tests

* Incident Manager update

* Migrate Incidents to new tasks

* Migrate Incidents to new tasks

* Update generated TypeScript types

* Update generated TypeScript types

* feat(tasks): add domain-aware task cutover and workflow v2 migration

* test(tasks): cover domain filters and task feed visibility flows

* Address comments

* Fix workflow tests to use new Task entity API and fix UserApprovalTaskV2 candidate transformation

Migrated 9 WorkflowDefinitionResourceIT tests from legacy Feed/Thread API to the new
Task entity API (UserApprovalTaskV2 creates Task entities, not Thread entities). Fixed
a bug in UserApprovalTaskV2 where candidates were passed as raw EntityReferences instead
of being transformed into users/teams FQN arrays for SetApprovalAssigneesImpl.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix tests

* refactor: stabilize task entity workflows

* refactor: finish task entity cutover and activity migration

* refactor: migrate legacy thread feed during cutover

* refactor: split legacy thread rename and archive migrations

* Merge main; fix tests

* Update generated TypeScript types

* feat: advance task redesign through phase 2

* Merge main; fix tests

* Update generated TypeScript types

* Fix failing tests

* Update generated TypeScript types

* fininsh phase 6 of the design, configurable task forms

* Update generated TypeScript types

* Update generated TypeScript types

* Fix linting

* Address gitar comments

* Address gitar comments

* Fix build

* Address giar comments

* fix build

* Add task custom forms

* Fix tests

* Address tests

* Apply UI lint autofixes

* Fix tess

* Fix linter

* Fix task patching

* Fix tests

* Fix playwright tests

* fix java checkstyle

* Add python sdk support for tasks, annoucements

* Fix playwright tests

* Fix playwright tests

* Fix playwright tests

* Fix python tests

* Fix python tests

* Fix linting workflows

* fix pycheck

* fix pycheck

* Fix tests

* Fix build

* Address deviations from main and fix tests

* Fix integration tests

* Fix integration tests

* Fix integration tests

* Update generated TypeScript types

* Fix Playwright tests

* Fix Playwright tests

* feat(incident): wire incident manager to task-first architecture (#27369)

* feat(incident): wire incident manager to task-first architecture

Connect the incident manager to the task redesign so it works
end-to-end: resolve data persistence, backward transitions,
reopen from resolved, and incident discovery via TCRS.

* Update generated TypeScript types

* refactor: single-query incident task lookup with parameterized statuses

Replace two sequential queries (Open, InProgress) in
getOrCreateIncident with one findByAboutAndTypeAndStatuses
query using @BindList for status IN (...).

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Fix Playwright tests

* Update generated TypeScript types

* Fix linter

* Fix tests

* Fix tests

* Fix checkstyle

* Fix tests

* Fix checkstyle

* Update FeedResourceIT.java

* Update TableRepository.java

* fix tests

* Update ActivityFeedProvider.tsx

* fix tests

* fix tests

* Address Task comments

* Fix unit test

* Fix the feed summary panel showing on landing page

* Fix comment functionality

* Fix pytests

* Fix failing playwright tests

* Fix test flakiness

* Fix ui-checkstyle

* Fix advanced search spec failure

* Fix playwright tests

Co-authored-by: Copilot <copilot@github.com>

* Fix checkstyle

* Fix the flaky tests

Co-authored-by: Copilot <copilot@github.com>

* fix checkstyle

* Reduce the workflow polling

* Update generated TypeScript types

* skip failing tests

Co-authored-by: Copilot <copilot@github.com>

* Fix ui-checkstyle

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
Co-authored-by: IceS2 <pablo.takara@getcollate.io>
Co-authored-by: karanh37 <karanh37@gmail.com>
Co-authored-by: Karan Hotchandani <33024356+karanh37@users.noreply.github.com>
Co-authored-by: Copilot <copilot@github.com>
2026-04-23 15:52:30 +02:00
Sriharsha Chintalapani
e4d3e423e1
Feature #18173: Version API Improvements, Last x versions order by desc, versions from specific timeline, versions for specific metadata changes, sdk support and UI integration (#26307)
* Feature #18173: Improve Version API, through paginatio, get x latest versions, specifict time, specific metadata changes

* Feature #18173: Version API Improvements, Last x versions order by desc, versions from specific timeline, versions for specific metadata changes, sdk support and UI integration

* Update generated TypeScript types

* address comments

* fix py check

* Address comments

* Address comments

* Fix tests

* Fix tests

* Fix tests

* Better way to lookup versions

* Fix pytests

* Fix tests

* Address comments

* chore(migrations): move version API schema additions from 1.13.0 to 1.12.7

Moves the PR's new entity_extension columns (versionNum, changedFieldKeys),
indexes, and backfill scripts from the 1.13.0 migration directory into a
new 1.12.7 directory. Keeps 1.13.0 identical to upstream main; only this
PR's additions land in 1.12.7.

Also updates MigrationSqlStatementHashTest to exercise the relocated files.

* fix(versions): address CI failures and review feedback

- testAPI.test.ts: update getTestCaseVersionList mock expectation to include
  the new params argument (APIClient.get is called with { params } since the
  function now supports limit/offset/fieldChanged).

- PaginatedVersionHistory.spec.ts: replace banned networkidle waits and
  waitForSelector with web-first assertion on version-button visibility
  (satisfies playwright/no-networkidle and playwright/no-wait-for-selector).

- EntityVersionTimeLine.tsx: implement infinite scroll via IntersectionObserver
  on a sentinel element at the bottom of the version list. Hooks up the
  onLoadMore/hasMore/isLoadingMore props that were in the interface but
  previously unused.

- EntityVersionPage.component.tsx: fix stale-closure bugs in fetchMoreVersions
  (gitar-bot review). Use versionListRef for currentOffset and
  isLoadingMoreRef to gate concurrent invocations so IntersectionObserver
  double-firing does not cause duplicate appends.

- EntityResource.java: accept offset > 0 with default limit when no
  fieldChanged is provided, so pagination params are no longer silently
  ignored (Copilot review).

- datamodel_generation.py: raise explicit errors if generated files or
  expected replacement targets are missing, instead of silently succeeding
  when the generator output drifts (Copilot review).

* fix(checkstyle): format Java, ESLint/Prettier on UI, relax datamodel_generation strict check

- Java: spotless:apply on EntityResource.java (line-break formatting).
- Python: relax datamodel_generation.py DIRECT_IMPORT_FIXES check — replacement
  targets are alternative forms the generator may or may not emit. Only
  require the final marker ('from .paging import Paging') is present after
  replacements; the prior strict per-target check broke 'make generate'.
- UI lint: organize-imports, ESLint --fix, Prettier on all version-related
  files touched by the PR (resolves lint-src + lint-playwright CI checks).
- EntityVersionTimeLine: guard IntersectionObserver effect with isLoadingMore
  so the observer is torn down while a fetch is in flight (Copilot review).
- EntityVersionTimeline.test.tsx: add unit tests covering sentinel rendering
  conditions (hasMore, onLoadMore) and the isLoadingMore observer-guard
  (Copilot review).

* fix(ui-checkstyle): prettier+eslint on EntityVersionTimeline.test.tsx

Collapse import line and reorder JSX props (callbacks last) per repo
lint rules. Reruns ui-checkstyle-changed caught these in the new test
file from the previous commit.

* test(playwright): address @aniketkatkar97 review on PaginatedVersionHistory spec

- Add waitUntil: 'domcontentloaded' to every page.goto() call.
- Wait for loaders (waitForAllLoadersToDisappear) before asserting the
  version-button to avoid racing the initial entity render.
- Replace the manual { timeout: 15_000 } on versionSelectors.nth(1) with
  an explicit waitForResponse on the second paginated /versions call
  (offset > 0). This deterministically synchronises on the infinite-scroll
  fetch instead of a wall-clock timeout.

* fix: address Copilot review — one-shot observer + local SQL splitter

1. EntityVersionTimeLine.tsx: call observer.unobserve(entry.target) as
   soon as the sentinel first intersects so onLoadMore fires only once
   per attached observer. The effect reattaches a fresh observer after
   isLoadingMore flips back to false, so subsequent pages still load
   — we just no longer rely on the parent's in-flight ref as the sole
   stopgap against repeated fires for the same page.

2. MigrationSqlStatementHashTest.java: replace Flyway's non-public
   org.flywaydb.core.internal.* parser classes with a small, local SQL
   statement splitter. Handles line (--) and block comments, single-,
   double-, and backtick-quoted strings, backslash escapes, and doubled-
   quote escapes. Removes a brittle dependency on Flyway internals that
   could break on upgrades.

Tested:
- mvn test -pl openmetadata-service -Dtest=MigrationSqlStatementHashTest
  → 2 tests pass.
- yarn test EntityVersionTimeline.test.tsx → 8/8 tests pass.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: sonika-shah <sonika-shah@users.noreply.github.com>
Co-authored-by: mohitdeuex <mohit.y@deuexsolutions.com>
Co-authored-by: sonika-shah <sonikashah94@gmail.com>
Co-authored-by: Mohit Yadav <105265192+mohityadav766@users.noreply.github.com>
2026-04-23 12:17:40 +02:00
Mohit Yadav
7bb8e40b65
Fix column filtering on Lineage (#25353)
* Fix Column Filtering and add path preserve

* Preserve only column with matching filter

* Add Test

* update param

* Add UI work

* Lanaguage

* Add proper translations for column-filter locale keys (#25360)

* Initial plan

* Add proper translations for column-filter locale keys across all 18 languages

Co-authored-by: karanh37 <33024356+karanh37@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: karanh37 <33024356+karanh37@users.noreply.github.com>

* fix filtering

* Fix ui : Dropdown filters (Domains, Owners, Tag, Tier, Service, etc.) were not showing in the Impact Analysis view and normal lineage view.

* put back searchbox for column level

* Fix query_filter not working for tag/domain/tier in lineage APIs -> table level filtering

* fix: hasNodeLevelFilters bypassing ES filters causing empty results

* fix: tag filter incorrectly sent to column_filter on table-level page

* Fix Impact Analysis search and filtering with path preservation
  Summary of changes:
  - Backend: Path preservation for search, accurate pagination counts, wildcard query parsing, OR logic for name/displayName
  - Frontend: Column-level search now matches both table names and column names

* Table level: Search → query_filter (matches table names)                    Column level: Search → column_filter only (matches column names)

* Fix column impact analysis: depth-aware filtering, tag aggregation, and nested column support

* address gitar bot feedback : lineage filter — add service to path preservation, fix OR semantics, rename preserve_paths, guard NPE on fromEntity

* fix: use unfiltered depth counts in lineage pagination info, remove 10k doc fetch

* fix: Impact Analysis — fix upstream BFS, always run BFS unfiltered and apply query filter as in-memory post-filter to support multi-depth traversal, fix column
  filter OR-within-type semantics, rename preserve_paths param, and add integration tests

   instead of passing queryFilter into the BFS (which blocked traversal through non-matching intermediate nodes), we now run BFS with no
  filter to discover the full graph topology, then apply the filter after all nodes and edges are collected using the existing
  applyInMemoryFiltersWithPathPreservationForEntityCount.

* fix: lineage Impact Analysis — unfiltered BFS with post-filter for multi-depth traversal, upstream BFS direction fix, remove dead ES query column filter code, fix stale useCallback deps, add SDK methods and integration tests

* fix: remove column_filter from UI calls where backend doesn't support it (exportAsync, platformLineage, dataQualityLineage, paginationInfo), fix stale useCallback deps in LineageProvider

* fix: Impact Analysis — unfiltered BFS for multi-depth filter traversal, upstream direction fix, table/column tag separation, dead code cleanup, stale UI deps, node depth dropdown fix

* fix: remove dead columnFilter plumbing from CustomControls, clear column filters on Table mode switch, fix QueryFilterParser search+filter OR logic, add search combo integration tests, log warn on tag fetch failure

* fix: depth-based pagination sort

* ui: performance optimization — avoids redundant lookups

* handle matchesMultipleFiltersWithMetadata

* fix: upstream/downstream count not updating in table view

* fix UI changes

* fix api issues

* fix: Impact Analysis — move to ES-native filtering with unfiltered BFS, filtered pagination counts, tag name enrichment

* address comments

* fix: Impact Analysis — ES-native filtered traversal, batch tag enrichment, depth filtering with filters, SDK entityType support

* fix tests

* fix failing tests

* fix backend test

* add tests for code coverage

* add tests for code coverage

* fix: add id.keyword sub-field to  ES index mappings to fix lineage filter dropdowns for topics, dashboards, and other non-table entities

* address comments

* fix service type filter case

* address gitar bot feedback

* fix tests

* fix build

* Fix the bugs

* Fix the bugs

* Fix all things related to  Lineag, Impact Analysis

* Update generated TypeScript types

* Fix all things related to  Lineag, Impact Analysis

* Fix Mapping for ids for container and test suite

* test: enhance lineage spec to cover all the missing cases (#26796)

* test: enhance lineage spec to cover all the missing cases

* fix searchIndex mapping

* fix tests

* added filter spec

* fix filter issues

* fix lineageSearchSelect

* update database service filter tests

* iterate over all the entity for service filter

* update impact analysis fixes

* update tests management

* add missing test case

* fix tests

* fix column level lineage tests

* fix apiEndpoint issue

* improved lineage connection assertion

* fix tests

* fix column level linage issues

* fix missing import

* update test import from pages

* fix mlModel spell issue

* fix node pagination and right panel spec

* refactor lineage tests to improve entity creation and visibility checks

* fix license header

* fix build

* fix tests

* fix tests

* UI linter fixes

* address comments

* fix unit tests

* remove redundant method

* improve tests

* fix impact analysis tests

* fix impact analysis

* Fix Export via Async and add tests

* update tests

* fix issues

* Spotless fix

* fix impact analysis

* Fix issue with lineage export

* Fix serviceType filtering

* fix multiple calls issue

* fix lint issues

* fix uni tests

* fix test issues

* fix lineage settings spec

* fix all the tests

* Remove fix me

* fix lint issue

* fix failing specs

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: karanh37 <33024356+karanh37@users.noreply.github.com>
Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com>
Co-authored-by: sonika-shah <58761340+sonika-shah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io>
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-04-06 09:01:15 -07:00
Sriharsha Chintalapani
ed58077197
MCP services (#23623) 2026-04-01 22:15:20 +05:30
Sriharsha Chintalapani
410c852f4a
Add Json Logging (#26357)
* Add Json Logging

* Fix comments

* Fix tests

* Centralize junit.platform.version in root pom

* Fix test-config-mcp.yaml - update to JSON logging

* Fix logback.xml to use LOG_LEVEL for backward compatibility

* Reverted to text format for test env  test-config-mcp.yaml

* Add the ability to switch between text/json logging

* Fix comments

* Fix json logging

* Address Comments

* Address Comments

---------

Co-authored-by: sonika-shah <58761340+sonika-shah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Mohit Yadav <105265192+mohityadav766@users.noreply.github.com>
2026-03-31 16:15:07 -07:00
Laura
6f23854edd
Add queryText field in aggregations (#26828)
* Add queryText field in aggregations

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-03-30 22:29:36 +02:00
Sriharsha Chintalapani
b7797fe3ef
Airflow 3.x API based connector (#26624)
* Add Airflow Connector with API integration

* Add Airflow Connector with API integration

* Update generated TypeScript types

* Add Airflow Connector with API integration improvements

* fix: username password flow for airflow 3, example yaml file, & sidebar docs

* fix type in UI

* Fix integration tests, fixed UI rendering and docs, improved OpenLineageResolver

* Fix pytests

* move connector

* Update generated TypeScript types

* fix: response parsing for astronomer airflow

* feat: added service account auth for airflow rest connection when composer managed airflow along with token

* fix: airflow rest api connection class converter and airflow.md

* feat: add mwaa config support for authentication

* s3 & column lineage

* Update generated TypeScript types

* fix: test airflow mwaa client

* fix: removed unused method, and extra code for parsing response

* fix: git pr checks

* fix: removed airflowapi integration tests that requires real host instance and added test with mocking

* fix test

* improve test coverage

* push coverage

* fix: gitar comments

* fix: removed redundant files

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Keshav Mohta <68001229+keshavmohta09@users.noreply.github.com>
Co-authored-by: Keshav Mohta <keshavmohta09@gmail.com>
Co-authored-by: ulixius9 <mayursingal9@gmail.com>
2026-03-26 17:15:41 +01:00
Sriharsha Chintalapani
6d99ba2dc0
Glossary relations (#25886)
* Glossary Term Relations

* Add GlossaryTerm Relations

* Add GlossaryTerm Relations, Add custom relations, onotolgoy explorer

* Add Translations

* Update generated TypeScript types

* Address comments

* Address comments

* Address comments

* Update generated TypeScript types

* Update yarn.lock after merging cytoscape dependencies from glossary_relations

* fix zoom in and out functionality and added missing translate keys

* fix test

* Remove unwanted changes

* nit

* nit

* nit

* Remove conflict test

* nit

* fix test

* Add test for ontology explorer

* New yarn lock and 2.0.0 schema changes missed during merge conflicts

* Revamped glossary term relation settings

* Refactor code

* Addressed comments

* nit

* Update generated TypeScript types

* Java Checkstyle and Yarn lock

* Update generated TypeScript types

* fix unit test

* Remove 2.0.0 migration folders placed at wrong loc

* Merge main

* fix navigation to relation graph in glossary

* fix ontology explorer spec

* Added filter support in the data mode

* Fix glossary term relation CI failures

### Canonical Relation Storage (GlossaryTermRepository)

* Introduced `computeCanonicalRelationType()` to normalize relation direction
  using UUID ordering (lower UUID is always treated as "from")
* Prevents duplicate and inconsistent relation rows when created from either side
* Updated `setTermRelations()` and `addRelation()` to store canonical relation types
* Fixed `setFields()` read logic:

  * Invert relation type for `fromRecords` (entity is the TO side)
  * Keep `toRecords` unchanged
* Updated `deleteBidirectionalRelatedTo()` to match canonical storage format
* Added `RequestEntityCache.invalidate()` after relation mutations to ensure consistency

### Lazy RDF Resource Initialization

* Added `RdfRepository.getInstanceOrNull()` for null-safe access without throwing
* Refactored `RdfResource` constructor to avoid eager `RdfRepository.getInstance()` call
* Enabled resource registration even when Fuseki is not initialized
* Introduced lazy getters:

  * `getRdfRepository()`
  * `getSemanticSearchEngine()`
* Updated all endpoints to guard with null checks before `isEnabled()`

  * Return `503 Service Unavailable` when RDF is not ready

### Graceful Test Degradation (Fuseki-dependent tests)

* Added `TestSuiteBootstrap.isFusekiEnabled()` to detect Fuseki availability
* `GlossaryOntologyExportIT`:

  * Falls back to Testcontainers-based local Fuseki when bootstrap Fuseki is unavailable
* `GlossaryTermRelationIT`:

  * Skipped via `assumeTrue` when Fuseki is unavailable
* `MetricResourceIT`:

  * Skips RDF-specific tests when Fuseki is unavailable

* fix package conflicts

* nit

* Fix merge conflicts, Python test, RDF reliability, and VectorDocBuilder tests

- Fix Python test_patch_glossary_term_related_terms to use TermRelation
  instead of EntityReferenceList (schema changed relatedTerms type)
- Rewrite VectorDocBuilder tests for current buildEmbeddingFields API
- Improve JenaFusekiStorage retry logic to retry on all HTTP errors
- Increase Fuseki tmpfs size to prevent disk space exhaustion in tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix pycheck

* Address all 8 PR review findings

1. Add authorization check on getTermRelationGraph endpoint
2. Add null guard on getBaseUri() to prevent NPE
3. Add React key prop on RelatedTermTagButton in map renders
4. Mark RdfResource lazy-init fields as volatile for thread safety
5. Replace exception messages with generic errors in API responses
6. Unify DEFAULT_RELATION_TYPES between CSV and repository (10 types)
7. Add jitter backoff to deadlock retry in CollectionDAO
8. Replace N+1 queries in prefetchGraphTerms with batch fetch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix Fuseki tmpfs exhaustion and GlossaryTermRelationIT double init

- Remove tmpfs size limit on Fuseki container to prevent disk exhaustion
- Guard RdfUpdater.initialize() in GlossaryTermRelationIT to skip if
  already initialized by bootstrap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix duplicate edges, null term NPE, and silent exception in graph builder

- Deduplicate edges in buildGraph() using edgesSeen set
- Skip TermRelation entries with null term references to prevent NPE
- Add warning log when glossary term relation settings fail to load

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix cardinality count after canonical swap and double-checked locking

- getRelationCount now matches inverse relation type for fromRecords
  where the term is the target, fixing cardinality bypass after
  bidirectional UUID canonicalization
- Use double-checked locking in RdfResource.getSemanticSearchEngine()
  to prevent duplicate instance creation under concurrency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: anuj-kumary <anujf0510@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Ram Narayan Balaji <ramnarayanb3005@gmail.com>
Co-authored-by: Ram Narayan Balaji <81347100+yan-3005@users.noreply.github.com>
2026-03-18 10:51:03 +05:30
Mohit Yadav
f65e55c81f
[Fix-25563] - Issue in Search Entity By Key (#26482)
* Fix Issue Entity Not found , entity missing from search

* Fix Tests

* Fix Broken relationship issue for upstreamEntityRelationship

* Add Exponential Retry

* Remove Entity Not found from query

* Address Reciew Comments

* Address more review

* Fix Missing Columns Index

* Fix Join add stats

* Column stats and Merged Main

* Add separate custom bulk processor for column

* Fix reindex job falsely killed after 1 hour by orphan monitor

  The OrphanJobMonitor uses job.updatedAt as a liveness signal, but
  updatedAt was only set during state transitions (READY, RUNNING),
  never refreshed during processing. After 10 minutes the job appeared
  orphaned; at the 1-hour recovery window it was force-failed.

  Fix: touch updatedAt alongside the lock refresh (every 60s) so the
  staleness check stays satisfied while the coordinator is alive.

* Set Failure callback for columnBulkProcessor

* fix review comments

* address review comments from claude

* Review Mocked tests

* Address review comments. Both fixed:

                           1. retryOnConflict(3) restored on all 4 call sites — updateEntity and upsertDocument in both ElasticSearchEntityManager and OpenSearchEntityManager. This handles shard-level
                           version conflicts independently from the 429 retry logic in SearchRetryUtil.
                           2. drainPendingColumnFutures race fixed in both bulk sinks — replaced iterate + clear() with poll() loop, which atomically removes each element from the deque so no
                           concurrently-added futures can be lost

* Get proper error message from Elastic and OpenSearch

* Fix Breaking Test
2026-03-16 18:24:25 +05:30
Sriharsha Chintalapani
12b364313c
Fix Metrics collection; reduce no.of metrics; improve slow request lo… (#25751)
* Fix Metrics collection; reduce no.of metrics; improve slow request logging

* Move sync calls to search & rdf to async

* Improve slow request tracking

* Improve slow request tracking

* Add clear breakdown in slow request

* Batch TestCaseRepository calls

* Batch API calls

* Initial Implementation of ReadEngine

* Improvements with ReadEngine/WriteEngine

* Improvements with ReadEngine/WriteEngine

* Improvements with ReadEngine/WriteEngine

* Improve by removing unnecessary ser/de

* Additional improvements with PatchFieldsPlanner

* Further performance improvements

* Further performance improvements

* Address comments

* Merge from main

* Address comments

* Address comments

* Address latest feedback - 2/21

* fix merge conflict

* Address Slow Request review

* Address the comments

* Address comments; Fix tests

* Fixes to the failing tests

* Fix bugs in tests

* Fix checkstyle

* Address playwright tests

* Fix tests

* Fix bugs

* Fix tests

* address comments

* Fix issues from playwright

* Fix playwright tests

* Fix tests for playwright

* Address comments

* Fix glossary test

* fix checkstyle

* Fix playwright issues

* Fix playwright issues - incrementalChagneDesc

* Restore ApprovalTaskWorkflow in GlossaryTerm and TestCase repositories

The slow_request branch accidentally removed entity-specific ApprovalTaskWorkflow
overrides, causing the generic parent to use checkUpdatedByTaskAssignee instead of
checkUpdatedByReviewer. This broke Glossary approval and TestCase approval Playwright tests.

- GlossaryTermRepository: restore ApprovalTaskWorkflow with checkUpdatedByReviewer
- TestCaseRepository: restore ApprovalTaskWorkflow, preDelete guard, updateReviewers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix base ApprovalTaskWorkflow to use reviewer check instead of task assignee

The centralized ApprovalTaskWorkflow in EntityRepository was using
checkUpdatedByTaskAssignee instead of checkUpdatedByReviewer, breaking
approval workflows for all entity types. Added verifyReviewer() as a
top-level static method on EntityRepository and restored missing
updateReviewers() and preDelete IN_REVIEW guards in DataContract,
DataProduct, Metric, and Tag repositories. Removed now-redundant
entity-specific ApprovalTaskWorkflow overrides from GlossaryTerm and
TestCase repositories.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix regression introduced in backend tests; make the playwright tests stable

* Stabilize the playwright tests

* Stabilize the playwright tests

* Improve playwright tests

* Improve playwright tests

* Fix team playwrights

* Fix merge from main

* Fix playwrigt tests

* Fix playwright tests

* Batch domain/data product asset counts into single ES aggregation queries

Replace N individual ES count queries with single aggregation query per
entity type. Domain counts roll up child counts to parent domains.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Improve Playwright test reliability and expand CI shards

Add polling waits for async ES indexing, fix lineage edge selectors,
use API-based setup for domain/data product widget tests, and expand
CI from 6 to 8 shards with dedicated graph/landing projects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Improve test reliability with response checks and guards

- Add API response status checks in create() for Domain, DataProduct,
  Glossary, TableClass, and UserClass — silent API failures now throw
  immediately with status code and response body
- Add guards in selectDataProduct() and addAssetsToDataProduct() for
  undefined name/fqn — clear error messages instead of cryptic
  "locator.fill: value: expected string, got undefined"
- Fix GlossaryPermissions double navigation — remove redundant
  redirectToHomePage + sidebarClick before glossary.visitEntityPage()
- Increase OnlineUsers timeout from 5s to 15s for CI resource pressure
- Increase Tour badge timeout from 10s to 20s
- Fix visitGlossaryPage: wait for loader before clicking menuitem
- Remove chromium testIgnore for graph/landing/stateful test files
  (these must run in chromium project for 6-shard CI workflow)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Remove all networkidle waits and improve CI reliability

- Remove ~780 networkidle waits across 144 test/utility files — these
  hang or resolve prematurely under CI load causing false negatives
- Add polling.ts with waitForSearchIndexed and waitForPageLoaded helpers
- Convert checkAssetsCount and search functions to expect.poll() for
  async ES indexing tolerance
- Increase expect timeout to 15s for CI environments
- Split CI into 8 shards with dedicated projects (stateful/graph/landing)
  to reduce thread contention
- Fix GITHUB_STEP_SUMMARY size overflow (base64 screenshots → table)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Fix genuine test failures from networkidle removal

- GlossaryPagination: Fix waitForResponse race conditions - register
  listener BEFORE the triggering action, add **/ URL prefix
- LanguageOverride: Fix selector from getByText('EN') to
  getByText('English - EN') matching actual dropdown text
- NestedColumnsExpandCollapse: Fix URL glob pattern, use dispatchEvent
  to avoid inner Link navigation, add waitForResponse for filtered search
- lineage.ts: Revert dragConnection hover approach that broke React
  Flow connection mode, keep direct dispatchEvent
- customizeLandingPage.ts: Remove waitForURL that hangs after page.goto
- Teams.spec.ts: Add isJoinable: false for private team creation
- UserDetails.spec.ts: Revert Escape/clickOutside save flow that
  dismissed edit mode before saving roles
- Users.spec.ts: Revert Data Consumer permissions test to original
  simple approach using fixtures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Relax OnlineUsers activity time assertion

The "Online now" exact match fails under CI load because the activity
timestamp may show as "X seconds ago" or "X minutes ago" by the time
the page renders. Changed to accept any recent activity format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Fix 4 genuine test failures from CI run

1. saveCustomizeLayoutPage: Use response predicate matching both
   POST (create) and PUT (update) patterns instead of glob that
   only matched updates. Fixes 180s timeout in drag-and-drop test
   when layout doesn't exist yet (fullyParallel=true).

2. GlossaryMiscOperations: Add test.slow(true) — test does 9
   sequential page navigations that exceed the 60s timeout.

3. DomainDataProductsWidgets "Assign Widgets": Add test.slow(true)
   — calls addAndVerifyWidget twice, each with multiple navigations.

4. DomainFilterQueryFilter: Add waitForAllLoadersToDisappear before
   clicking domain-dropdown after search operations that trigger
   page re-renders.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Fix AutoPilot test — reload page after API status poll

The AutoPilot status banner never appeared because:
1. checkAutoPilotStatus polls the workflow API directly via apiContext
   (outside the browser), not through page network requests
2. The UI uses WebSocket for live updates, but the socket connection
   is only established when the page loads with status=RUNNING
3. Since the page loaded before the workflow started, the socket was
   never connected, so the UI never received the completion event

Fix: reload the page after checkAutoPilotStatus confirms the workflow
finished, so the UI renders with the current state. Also increase the
banner visibility timeout to 30s for CI environments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Fix flaky tests — entity collisions, missing cleanup, expect timeout

- Replace Date.now() with uuid() for entity names in CustomProperties tests
  to prevent collisions when parallel workers execute within the same millisecond
- Fix FollowingWidget: move shared adminUser create/delete to top-level
  base.beforeAll/afterAll to prevent duplicate user creation across 11
  parallel test.describe blocks
- Add missing afterAll cleanup to OnlineUsers, Metric, CustomPropertyAdvanceSearch,
  and CustomProperties tests to prevent entity/user leaks between runs
- Replace hardcoded metric name in MetricSearch with uuid-based name
- Add global expect timeout of 15s (up from 5s default) for CI resilience

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix Playwright CI: include UI in build-once Maven build

The build-once optimization (#26423) used -DonlyBackend -pl !openmetadata-ui
which produces a tar.gz without the compiled React app. The Docker container
starts but cannot serve the login page, causing auth.setup.ts to timeout
on all 6 shards waiting for input[id="email"] to appear.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix CodeQL security warnings

- Replace Math.random() with crypto.randomUUID() for test data generation
- Escape backslash characters in CSS selectors for glossary FQN values
- Use page.getByTestId() instead of raw CSS selectors in entity utils
- Increase RSA key size from 512 to 2048 bits in JwtFilterTest
- Skip archive entries containing '..' in JsonUtils.getResourcesFromJarFile

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Fix user cleanup to prevent 'Email Already Exists' failures

- Glossary.spec.ts: Fix typo user3.create→delete in afterAll, add missing adminUser.delete
- Teams.spec.ts: Add afterAll cleanup hooks for 3 nested describe blocks that were missing them (EditUser, DataConsumer, Owner)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Add afterAll cleanup hooks and fix test reliability

- InputOutputPorts.spec.ts: Add afterAll for domain/tables/topics/dashboards
- Users.spec.ts: Add top-level afterAll for all shared entities
- Entity.spec.ts: Add afterAll for shared + per-entity-type cleanup
- Pagination.spec.ts: Add afterAll for 13 describe blocks (services, DBs, etc.)
- DataProductRename.spec.ts: Add afterAll cleanup
- TestCaseIncidentPermissions.spec.ts: Add afterAll for users/roles/policies/table
- ImpactAnalysis.spec.ts: Add afterAll for all 7 entity types
- NestedColumnsExpandCollapse.spec.ts: Add afterAll for 4 describe blocks
- DataProductPermissions.spec.ts: Add afterAll cleanup
- ServiceEntityPermissions.spec.ts: Add afterAll for testUser + per-entity
- ServiceForm.spec.ts: Add afterAll for adminUser
- domain.ts: Replace waitForTimeout(2000) with proper loader/tab waits

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Trigger Playwright CI

* Playwright: Fix 2 failures and 26 flaky tests with proper waits

Fix remaining 2 genuine failures:
- DomainDataProductsWidgets: add test.slow(true) for ES indexing lag
- Users.spec.ts: add test.slow(true) and loader waits for owner search

Fix 26 flaky tests by addressing 5 root cause patterns:
- Response listener after trigger: MetricCustomUnitFlow, DomainUIInteractions
- Missing loader wait after navigation: 16 tests across CustomizeDetailPage,
  DataProductPersonaCustomization, DataContracts, ExploreTree, and others
- Element not rendered after API response: EntityVersionPages, ODCSImportExport
- DOM not settled after loader: Domains nested rename
- Permission cache propagation: GlossaryPermissions

Shared utility improvements:
- waitForPatchResponse uses entity-specific URL pattern
- openColumnDetailPanel accepts entityEndpoint param with API response wait
- Entity.spec.ts uses dynamic entity.endpoint instead of hardcoded tables

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Fix addOwner retry to wait for search API response

The owner search retry loop was refilling the search input but not
waiting for the API response before checking item visibility. This
caused the poll to repeatedly check stale/empty results.

Fix: await search response and loader detach in each retry iteration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Fix owner listitem selector — remove exact match

The owner selection list items include avatar initials (e.g., "G") in their
accessible name, making exact: true fail since the accessible name is
"G UserName" not just "UserName". Switching to substring matching fixes
the Users.spec.ts persistent failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Fix 10 remaining flaky tests with proper waits

- ColumnLevelTests: loader wait after visiting test case panel
- DataQualityPermissions: loader wait after visiting test suite page
- IncidentManagerDateFilter: loader wait after page reload
- InputOutputPorts: wait for warning alert before asserting
- Lineage: replace 5 hardcoded waitForTimeout(500) with loader waits
- CustomizeDetailPage: dialog close waits, fix missing await on expect
- DataProductPersonaCustomization: loader wait + modal visibility check
- GlossaryPermissions: increase permission propagation wait, loader wait
- GlossaryHierarchy: loader waits after modal close and glossary select
- ExploreTree: loader waits after API response before UI interaction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix CodeQL security alerts: incomplete escaping and Zip Slip

1. entity.ts: Use JSON.stringify().slice(1,-1) for proper escaping of
   both backslashes and double quotes in filter values, replacing the
   incomplete .replace(/"/g, '\\"') approach.

2. JsonUtils.java: Strengthen Zip Slip protection by normalizing paths
   via Paths.get().normalize() and rejecting entries starting with "/"
   or resolving to parent traversal after normalization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix tests

* Fix tests

* Fix recordChange field name mismatches and CodeQL alert

- ServiceEntityRepository: recordChange("ingestionAgent") → "ingestionRunner"
  to match the JSON property name. The shouldCompare() gate in PATCH flow
  was silently dropping ingestionRunner changes because the field name
  didn't match patchedFields.
- DataContractRepository: compareAndUpdate("status") → "entityStatus"
  to match the JSON property name, same root cause.
- JsonUtils: Simplify Zip Slip check to string-based validation to
  satisfy CodeQL taint analysis.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Remove serial mode from Users.spec.ts to prevent cascade failures

A single flaky test failure was causing ~19 tests across 5 unrelated
describe blocks to be skipped. Matches main branch behavior (parallel).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Playwright: Fix flaky tests — missing awaits, hardcoded waits, silent catches

- DataProductPersonaCustomization: add missing await on expect() calls
- TestCaseIncidentPermissions: poll for incident creation instead of one-shot query
- TestCaseResultPermissions: add loader wait after Data Quality tab click
- GlossaryPermissions: replace waitForTimeout(3000) with toPass() retry
- BulkImport: remove 4 unnecessary waitForTimeout calls
- importUtils/testCases: replace waitForTimeout(500) with grid visibility assert
- GlossaryAssets: add loader wait, remove silent .catch(() => false) pattern

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix CodeQL Zip Slip alert with Path.normalize() sanitization

CodeQL doesn't recognize String.contains("..") as proper Zip Slip
mitigation. Use Path.normalize() + isAbsolute/startsWith checks which
CodeQL's taint analysis model understands.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix Playwright flaky tests: modal visibility, toast race, query card assertion

- DataProductPersonaCustomization: wait for dialog close before clicking add-widget-button
- entity.ts restoreEntity: dismiss stale toast before restore to avoid race condition
- QueryEntity: replace page.$$() with auto-retrying expect().toBeVisible()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix flaky TableResourceIT by preventing parallel multi-domain rule mutation

Both test_multipleDomainInheritance (TableResourceIT) and
test_csvImportEntityRuleValidation (DatabaseServiceResourceIT) toggle
the global "Multiple Domains are not allowed" rule. When running
concurrently, one overwrites the other's setting causing spurious
failures. Add @ResourceLock("MULTI_DOMAIN_RULE") to serialize only
these two tests while keeping all others concurrent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Mohit Yadav <105265192+mohityadav766@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 13:38:31 -07:00
sonika-shah
7bf6677276
Fix Asset count mismatch for Teams across different views (#26168)
* Fix Asset count mismatch for Teams across different views

* add playwright

* Improve asset count aggregation by filtering on owners.type = team for teams asset count

* address gitar

* move test to integration tests

* move test to integration tests

---------

Co-authored-by: Harsh Vador <harsh.vador@somaiya.edu>
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2026-03-05 14:49:47 +05:30
Sriharsha Chintalapani
48b6b7a804
Improvement #26033: StorageServices missing form the entities SDK in the python client (#26164)
* StorageServices missing form the entities SDK in the python client

* fix(sdk): address table fluent review comments and pyright warning

* Fix checkstyle

* fix(sdk): align table reference typing and tests for CI

* test(sdk): isolate TestCaseMockTest default client state

* Fix pycheck
2026-03-02 07:04:55 -08:00
Sriharsha Chintalapani
ffa8084629
Fix #23101: Persona derived via teams (#26052)
* Fix #23101: Persona derived via teams

* Update generated TypeScript types

* Address comments

* add UI support

* add support for more test

* add locale files

* add more verification

* add more tests & implement fixtures

* Address comments

---------

Co-authored-by: Sriharsha Chintalapani <harsha.ch@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Harsh Vador <harsh.vador@somaiya.edu>
Co-authored-by: Harsh Vador <58542468+harsh-vador@users.noreply.github.com>
2026-03-01 21:04:15 -08:00
Suman Maharana
7d437b17c2
Fix dbt tab disappears after metadata ingest (#26044) 2026-02-26 19:07:02 +05:30
Copilot
6ed4b28925
Add POST /api/v1/users/generateToken endpoint for simplified token generation (#25052)
* Initial plan

* Add POST /api/v1/users/generateToken endpoint with permission checks

Co-authored-by: harshach <38649+harshach@users.noreply.github.com>

* Address code review: improve authorization logic and add interface documentation

Co-authored-by: harshach <38649+harshach@users.noreply.github.com>

* Address code review: use JWTTokenExpiry enum for type safety and proper default

Co-authored-by: harshach <38649+harshach@users.noreply.github.com>

* Update generated TypeScript types

* Add more tests

* add appropriate test support for bot

* address gitar

* POST generate token to return actual token rather than fernet encrypted value

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: harshach <38649+harshach@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io>
Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com>
Co-authored-by: Harsh Vador <harsh.vador@somaiya.edu>
Co-authored-by: aji-aju <ajithprasad770@gmail.com>
Co-authored-by: Ajith Prasad <37380177+aji-aju@users.noreply.github.com>
2026-02-24 16:53:42 +05:30
IceS2
c591a609b8
FIX: add support for offset and limit on listing aggregations (#25943)
* add support for offset and limit on listing aggregations

* add tests

* fix couple issues

* fix couple issues

* fix couple issues

* fix couple issues

* fix couple issues

---------

Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2026-02-19 17:31:21 +01:00
Teddy
dfa632d5bb
fix: remove unused path parameter (#25825)
* fix: remove unused path parameter

* fix: remove extra path parameter from feed resource

* fix: remove extra path parameter in team resource

* fix: remove duplicate resource

* fix: remove duplicate resource

* fix: test-container errror

* fix: testContainer version

* fix: remove unsued body
2026-02-16 08:36:34 -08:00
Pere Miquel Brull
e6958defd1
FIX - SDK 2.0 minor fixes (#25839)
* FIX - SDK 2.0 minor fixes

* Add integration tests for SDK bug fixes, remove temp unit tests

Move regression tests for #2906 from mocked unit tests into proper
integration tests that exercise the full SDK against a live server:
search filters, search_advanced, delete_lineage, custom properties
with Pydantic UUID, get_versions with Pydantic UUID, and CSV export
without ERROR logs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix format

* fix format

* Fix missing required params in delete_lineage for basedpyright

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix

* java sdk

* fix sdk

* fix comments

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 17:16:43 +01:00
Sriharsha Chintalapani
b244798f22
Add bulk apis for pipeline status (#25731)
* Add bulk apis for pipeline status

* Update generated TypeScript types

* Fix gitar comments

* Update generated TypeScript types

* Fix pycheck

* Address comments

* Fix databricks test

* Move schema changes to 1.11.9

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: harshsoni2024 <harshsoni2024@gmail.com>
2026-02-10 18:14:06 +05:30
Sriharsha Chintalapani
7d2c4803f9
Add validations for input/output ports (#25601)
* Add validations for input/output ports

* Add validations for input/output ports

* Fix tests

* addresss comments

---------

Co-authored-by: Sid <30566406+siddhant1@users.noreply.github.com>
Co-authored-by: Sriharsha Chintalapani <harsha.ch@gmail.com>
2026-02-08 10:17:49 -08:00
Sriharsha Chintalapani
4cbd28704a
BulkAPIs should use bulkWrite/bulkUpdate methods to reduce the no.of queries and db connections (#25709)
* Add 20% threashold on bulk api connections and semaphores to control it

* Address comments

* Add bulk apis to use bulkWrite/bulkUpdate methods to avoid using too many db connections

* Add batch updates and remove semaphores

* Fix test failures; address comments

* Fix test failures

* Fix test failures

* Fix test failures

* Add comment section for bulk API support in DatabaseSchemaResourceIT

* Add CsvImportResult import to multiple test classes

---------

Co-authored-by: Ayush Shah <ayush@getcollate.io>
2026-02-08 10:15:45 -08:00
Pere Miquel Brull
6a1fb712c0
TEST - Add Data Contract ODCS tests (#25588)
* TEST - Add Data Contract ODCS tests

* fix

* fix playwrights

* improve validations

* Update generated TypeScript types

* improve validations

* improve validations

* improve validations

* Add RBAC permission tests for ODCS Import/Export

- Add ODCSImportExportPermissions.spec.ts with comprehensive RBAC tests
- Test scenarios for Admin, Data Consumer, Data Steward roles
- Test scenarios for users with DataContract EditAll and ViewOnly permissions
- Test table owner permissions for importing contracts
- Test API-level permission enforcement (403 for unauthorized import)
- Verify export is allowed for users with view permissions
- Verify import buttons are hidden for users without Create/EditAll permissions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix schema validation type compatibility for ODCS imports

- Add type compatibility checking for schema validation that considers
  types within the same family as compatible (e.g., VARCHAR/STRING,
  BIGINT/INT, DOUBLE/NUMBER)
- Make type mismatches non-blocking for contract creation - they are
  tracked for informational purposes but don't fail validation
- Fix test assertions for ODCS type mapping (STRING logical type maps
  to ColumnDataType.STRING, not VARCHAR)
- Fix test assertions for null vs empty list handling in validation
  responses

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix ODCS Playwright test failures

- Fix SLA schema format in OM round trip test (timeUnit -> unit, lowercase values)
- Fix table ID reference (entity.id -> entityResponseData.id)
- Remove tests with incorrect assumptions about UI button visibility
  (permissions are enforced at API level, not by hiding UI elements)
- Remove Table Owner permission test (ownership doesn't grant DataContract Create permission)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Display type mismatch warnings in ODCS Import UI

When schema validation detects type mismatches (e.g., expected INT, got STRING),
the UI now shows these as warnings in the success panel instead of hiding them.
The chip displays "Passed with Warnings" with a warning color scheme, and
each type mismatch is listed with a warning icon.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* improve validations

* Fix OM contract import showing ODCS validation errors

- Changed parseOpenMetadataContent to check for 'name' field instead of 'entity'
- Made parse error panel format-aware: shows ODCS required fields (APIVersion,
  Kind, Status) for ODCS imports, and OM required field (name) for OM imports
- Added translation key for OM format required fields error message

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* improve validations

* fix test

* Fix ODCS roles fixture to use correct field name

The ODCS schema defines the role identifier field as 'role', not 'name'.
Using 'name' caused the parser to set role to null, which made the
export default to 'data-consumer'. Fixed all role definitions in test
fixtures to use the correct 'role' field.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix ODCS merge mode test to check for correct role name

The test was expecting 'data-consumer' in the exported YAML, but the
input ODCS_VALID_FULL_YAML contains roles 'data_admin' and 'analyst'.
Updated assertion to check for 'analyst' which matches the input.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 15:10:39 +01:00
Suman Maharana
8c2ed92330
Fix #2778 - deleted pipelines shown in observability widget (#25553)
* Fix #2778 - deleted pipelines shown in observability widget

* address gitar comms

* fix tests

---------

Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2026-02-03 11:18:38 +05:30
Sriharsha Chintalapani
b09f4828c4
Learning Resources (#25005)
* Add Learning Resources with-in product

* Translations

* Add Learning Resources in-line with-in product

* Add Learning Resources in-line with-in product

* Potential fix for code scanning alert no. 1844: Incomplete URL substring sanitization

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Update generated TypeScript types

* Update the design

* Update the design

* Add leanring resources

* Update generated TypeScript types

* Add learning resources

* Update generated TypeScript types

* Address comments

* Address comments

* fixed build issue

* fix java checkstyle

* fixed initital bugs

* fixed less file name

* resolve conflict

* fixed failing unit test

* Address update issues, add more playwright tests

* Address update issues, add more playwright tests

* fixed code quality and updated all the missed pages with leanrning icon

* fixed invalid translation

* Added icon for rules library

* fixed unit tests

* replaced string with constants

* addressed comments

* resolved backend merge conflict

* removed plural label

* fixed header actions position

* fixed git-r comment

* added fixme to a test

* fixed label

* fixed flaky test

* Update generated TypeScript types

* removed playwright config file

* hide column view

* playwright fixes

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dhruv Parmar <83108871+dhruvjsx@users.noreply.github.com>
Co-authored-by: sonika-shah <58761340+sonika-shah@users.noreply.github.com>
2026-01-25 07:20:14 -08:00
Sriharsha Chintalapani
23f82b7e4a
Bulk import for columns metadata across the assets (#24012)
* Column Bulk Operations

* Column Bulk Operations

* Update generated TypeScript types

* fix build issues

* Merge with main

* Update generated TypeScript types

* Refactor bulk column update page

* Fix bulk edit

* Updates to bulk import

* Updates to bulk import

* Updates to bulk import

* Address comments

* Address comments

* minor layout changes

* update filters

* fix tests

* Update pt-pt.json

* fix tests

* Add Filters, fix tests

* refactor tests

* fix tests

* fix tests

* fix tests

* Update generated TypeScript types

* fix tests

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: anuj-kumary <anujf0510@gmail.com>
Co-authored-by: karanh37 <karanh37@gmail.com>
Co-authored-by: Karan Hotchandani <33024356+karanh37@users.noreply.github.com>
2026-01-23 15:11:09 +05:30
Sriharsha Chintalapani
102b8bed38
Ordinal position (#25198)
* Add ordinal position sorting in API & Backend

* Add ordinal position sorting in API & Backend

* Add ordinal position sorting in API & Backend

* Add ordinal position sorting in API & Backend

* Addressed PR comments

* fixed unit tests

* Fix ordinal position

* fix tests

---------

Co-authored-by: Rohit0301 <rj03012002@gmail.com>
Co-authored-by: Rohit Jain <60229265+Rohit0301@users.noreply.github.com>
Co-authored-by: Karan Hotchandani <33024356+karanh37@users.noreply.github.com>
2026-01-21 20:36:28 -08:00
IceS2
b5120addda
FIX #25285: Fix Import for Test Cases (#25408)
* Fix Import for Test Cases

* Fix Tests

* Add result as partial success

* Throw error if failure to attach tests to bundle suite

* Update TestCaseRepository.java
2026-01-21 13:10:48 +01:00
Sriharsha Chintalapani
c23591023f
ODCS 3.1 support, ability export/import from OM -> ODCS and ODCS -> OM, ability merge or replace existing contract (#25132)
* ODCS 3.1 support, ability export/import from OM -> ODCS and ODCS -> OM, ability merge or replace exisiting contract

* ODCS 3.1 support, ability export/import from OM -> ODCS and ODCS -> OM, ability merge or replace exisiting contract

* Update generated TypeScript types

* Improve UI and validations

* Improve UI and validations, add more test coverage

* Update generated TypeScript types

* New design for import ODCS data contacts using MUI

* Unit tests for odcs contract imports

* Fix playwright for ODCS import/export

* Fix typescript issues in playwright

* Addressed review comments

* Remove duplicate odcs files

* Remove duplicate odcs files

* Address review comments after design team review

* Create separate util for repeated actions

* fix playwright tests

* ODCSImportExport playwright tests

* Fix data contract playwright tests

* Address review comments

* fix failing  e2e tests

* fix test failures

* fix tests

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
Co-authored-by: Shrabanti Paul <shrabantipaul@Shrabantis-MacBook-Pro.local>
Co-authored-by: SumanMaharana <sumanmaharana786@gmail.com>
2026-01-20 20:13:00 -08:00
Sriharsha Chintalapani
e6e5c3024f
Add fields param for Data Product I/0 list API (#25305)
* Add fields param for Data Product I/0 list API

* Add entityType for I/O Ports

* fix java checkstyle

* fix gitar comments
2026-01-20 17:52:16 -08:00
Sriharsha Chintalapani
682d8d141c
SFTP Connector for Drive Service (#25304)
* SFTP Connector for Drive Service

* Update generated TypeScript types

* SFTP Connector for Drive Service

* Fix code stying

* Fix pycheck

* Address gitar comments

* SFTP - address comments

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>
2026-01-20 13:41:19 -08:00
Ram Narayan Balaji
628901b963
Show Relevant Exclude Fields in EventBased Workflows (#25272)
* initial - Exclude Fields for Workflows

* Fix tests and exclude filter match

* Same Thread Execution

* Remove WorkflowEventConsumer from yaml

* Disable tests

* fix java checkstyle

* Add workflow exclude field type

* Review Comments

* Robust AppResourceIT

---------

Co-authored-by: anuj-kumary <anujf0510@gmail.com>
2026-01-19 18:27:36 +05:30
Sriharsha Chintalapani
aaa72512ef
Improve Data Products IO Ports APIs (#25202)
* Improve Data Products IO Ports APIs

* Update generated TypeScript types

* Address gitar comments

* Address gitar comments

* Address gitar comments

* Address gitar comments

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sid <30566406+siddhant1@users.noreply.github.com>
2026-01-12 22:42:38 -08:00
Sriharsha Chintalapani
5c5d1dbe9b
SDK Improvements (#25017) 2026-01-02 11:07:22 -08:00
Sriharsha Chintalapani
ab535900da
Faster tests (#24948)
* Add Parallel tests using the new SDK

* Make tests faster and use new SDK

* Add SDK based parallel tests

* Add SDK based parallel tests

* Fix from main

* Add Fluen APIs for Tests

* Add Fluen APIs for Tests

* Add missing Fluent APIs for SDK

* Add missing Fluent APIs for SDK - Data Contracts

* Migrate all the integration tests to new module

* Migrate all the integration tests to new module

* Improve pagination test performance

* Fix tests

* Migration Complete

* Fix the code styling; add github workflows, fix tags parallel issues

* Update migration tracker, address flaky tests

* Address comments

* rename env -> bootstrap for java package

* Fix YAML syntax in playwright-sso-tests.yml and update integration test workflows
2025-12-26 23:47:49 -08:00
Sriharsha Chintalapani
f8627a7d59
Add Input/Output ports for Data Product (#24554)
* Add Input/Output ports for Data Product

* Update generated TypeScript types

* Fix tests

* Update generated TypeScript types

* fix

* fix

* fix static

* Update generated TypeScript types

* trigger

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2025-12-23 09:38:57 +01:00
Ram Narayan Balaji
a12587ff9d
Fix: "Problem deserializing 'setterless' property 'dataProducts': get method returned null" SDK issue (#24857) 2025-12-17 21:42:09 +05:30
Karan Hotchandani
c8501f2f4f
preparing 1.12 branch (#24870) 2025-12-17 18:36:03 +05:30