mirror of
https://github.com/open-metadata/OpenMetadata
synced 2026-05-24 09:39:11 +00:00
1418 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
b62db6224f
|
feat(tasks): policy-driven authorization with self-approval guard (#28315)
* feat(tasks): policy-driven authorization with self-approval guard Moves Task resolve/close/reassign authorization from ~150 lines of custom Java in TaskRepository into the policy engine. Adds ResolveTask, CloseTask, ReassignTask MetadataOperation values, isTaskFiler/isTaskAssignee/isTaskReviewer SpEL conditions, and a new TaskAuthorPolicy seed. Closes the self-approval gap where a filer who was also in the assignees list could approve their own task (now denied via deny rule). TaskResourceContext.getOwners now returns target entity owners so isOwner() retains its conventional meaning; v200 migration backfills the new policy attachment on the DataConsumer role for upgrades. |
||
|
|
1dcf8dd60f
|
MCP Tool Usage (#28352)
* MCP Tool Usage * Update generated TypeScript types * Address PR review feedback on MCP usage tracking Reorder UA heuristic so VS Code wins over Claude CLI for composite User-Agents, refactor to a predicate list, and sanitise the resolved client name (trim, strip control chars, cap at 64 chars). Bound the schema field to match. Bound the latency aggregation lists in McpUsageResource with reservoir sampling so summary/per-tool percentile estimates stay valid without unbounded heap growth. Skip null-timestamp rows in the history loop and update the stale /history Swagger description to reflect the ok/fail shape. Convert CallToolOutcome to a Java record and update the recorder flow to use accessor methods. Fix the pre-existing regression in McpImpersonationTest where the mock still wired the legacy callTool path. Add DefaultToolContextTest with direct coverage for classifyException (all four ErrorCategory buckets, cause-chain walk, null message in chain) and the unknown-tool outcome. |
||
|
|
cdd0b3a0d0
|
fix: return 400 for malformed JSON Patch pointers instead of 500 (#28316)
Some checks are pending
Integration Tests - MySQL + Elasticsearch / Detect Changes (push) Waiting to run
Integration Tests - MySQL + Elasticsearch / integration-tests-mysql-elasticsearch (push) Blocked by required conditions
Integration Tests - PostgreSQL + Elasticsearch + Redis / Detect Changes (push) Waiting to run
Integration Tests - PostgreSQL + Elasticsearch + Redis / integration-tests-postgres-elasticsearch-redis (push) Blocked by required conditions
Integration Tests - PostgreSQL + OpenSearch / Detect Changes (push) Waiting to run
Integration Tests - PostgreSQL + OpenSearch / integration-tests-postgres-opensearch (push) Blocked by required conditions
Java Checkstyle / java-checkstyle (push) Waiting to run
Maven Collate Tests / maven-collate-ci (push) Waiting to run
OpenMetadata Service Unit Tests / Detect Changes (push) Waiting to run
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / k8s_operator-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests-status (push) Blocked by required conditions
Publish Package to Maven Central Repository / publish-maven-packages (push) Waiting to run
Client patches with paths missing the leading '/' (e.g., "displayName" instead of "/displayName") triggered jakarta.json.JsonException from JsonPointerImpl, which fell through the exception mapper and surfaced as an unhandled 500 (and Sentry alert) on PATCH endpoints such as ClassificationResource. - JsonUtils.applyPatch now validates each operation's 'path' and 'from' upfront, throwing IllegalArgumentException with a clear RFC 6901 message before the cryptic library exception fires. - CatalogGenericExceptionMapper maps jakarta.json.JsonException to 400 as defense in depth, covering other RFC 6902 violations (e.g., out-of-range array index, replace on missing path) that were also returning 500. - Added JsonUtilsTest cases for malformed 'path' and 'from' pointers. |
||
|
|
7042153e32
|
feat(context-center): search indexing + vector body text for memories/pages (#28314)
* feat(context-center): enable search indexing + vector body text for memories/pages ContextMemory was indexable in the schema layer but supportsSearch was false, so live indexing and bulk reindex did not include it. Vector embeddings for ContextMemory and Page fell through to the default description-only body text extractor, which produced near-empty embeddings since the actual content lives in title/question/answer (ContextMemory) and displayName/page payload (Page). Changes: - Add ES/OS index mapping for context_memory_search_index across en/ru/zh/jp - Register contextMemory in indexMapping.json with parentAliases=[all] - ContextMemoryIndex (TaggableIndex) flattens shareConfig into visibility + sharedWithIds, normalizes source UUIDs, and populates entity refs with display names - Wire SearchIndexFactory.buildIndex() + flip ContextMemoryRepository supportsSearch=true so create/update/delete fire live indexing - Flip supportsSearchIndex=true in ContextMemoryIT to inherit BaseEntityIT's 4 search-index tests - ContextMemoryBodyTextContributor concatenates title/summary/question/answer/ description for the vector embedding instead of just description - PageBodyTextContributor adds title (displayName) and, for QuickLink pages, the destination URL alongside the markdown description - Register both contributors via static initializers in their owning EntityRepositories, per the VectorBodyTextContributor convention Tests: 25 new unit tests across ContextMemoryIndexTest (10), ContextMemoryBodyTextContributorTest (6), PageBodyTextContributorTest (9). All passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(context-center): address Copilot review feedback on indexing PR - PageBodyTextContributor: fall back to page.getName() when displayName is null/blank so vectors always have a title (matches the convention in SearchIndex.populateCommonFields) - PageBodyTextContributor: log the exception object (not e.getMessage()) so the stack trace is available when debug logging is on - ContextMemoryIndex: null-guard each principal entry in shareConfig.sharedWith before dereferencing, so a malformed payload cannot NPE the indexer Added 2 tests covering both behaviors; existing tests adjusted for the new title-fallback default. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
6150892cbd
|
fix(ingestion-pipeline): keep pipelineStatus working when queued-status client returns immutable list (#28283)
GET /services/ingestionPipelines/{fqn}/pipelineStatus returned HTTP 500
(UnsupportedOperationException with null message) whenever a
PipelineServiceClient.getQueuedPipelineStatus implementation returned an
immutable list such as Collections.emptyList() — e.g. when the hybrid
runner is offline. The repository was assigning the returned list to its
accumulator and then calling addAll on it, which fails on an immutable
list and prevents the DB-backed status history from being returned.
|
||
|
|
f3cb89f671
|
feat(ui): allow admins to set default landing-page panel color (#28285)
* feat(ui): allow admins to set default landing-page panel color Adds an optional panelBackgroundColor field to ThemeConfiguration and a matching color picker on the admin Theme settings page. The landing page welcome panel now falls back to this admin-configured color when no per-user or per-persona override exists, preserving existing customization. |
||
|
|
d66afdef1c
|
feat(mysql): support custom queryHistoryTable for usage & lineage (#28260)
* feat(mysql): support custom queryHistoryTable for usage & lineage
MySQL lineage/usage extraction reads query history from one of two
hardcoded tables: `mysql.general_log` (default) and `mysql.slow_log`
(when `useSlowLogs=true`). Many deployments don't grant ingestion users
access to either of these system tables and instead replicate query
history into a custom table or view, leaving the connector with no way
to point at the replacement.
Add an optional `queryHistoryTable` field to MysqlConnection. When set,
it overrides the table read by the lineage/usage SQL and by the
test-connection probe — for both the general-log and slow-log paths.
Column expectations (`argument` vs `sql_text`, `event_time` vs
`start_time`) still follow `useSlowLogs`, so the custom table must
expose columns compatible with the selected path.
Mirrors the existing `queryHistoryTable` pattern in the Trino connector
(parameterized `{query_history_table}` placeholder in the SQL template,
resolved at format time from the service connection).
Fixes #28089
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Update generated TypeScript types
* fix(mysql): add queryHistoryTable to SDK builder example
The exhaustive MysqlConnection construction in the SDK builder example
enumerates every connection field; basedpyright (run with
--baselinemode=discard) flagged the newly added queryHistoryTable as a
missing constructor argument. Pass it explicitly as None to keep the
example exhaustive and the static-checks job green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
|
||
|
|
b1781f7324
|
feat(workflow): add inputPorts, outputPorts, glossaryTerms as WorkflowTriggerFields (#28120)
* feat(workflow): add inputPorts, outputPorts, glossaryTerms as WorkflowTriggerFields - Add inputPorts, outputPorts, glossaryTerms to workflowTriggerFields.json enum - Unit tests in FilterEntityImplTest verifying all three fields pass the passesFieldBasedFilter check, including include/exclude config - Integration test scenario appended to test_CustomApprovalWorkflowForNewEntities: adds table as inputPort, removes it, adds as outputPort — each port change must produce an approval task proving the workflow trigger fires correctly * Update generated TypeScript types * fix(workflow): fix inputPorts/outputPorts changes not triggering governance workflows - Add inputPorts, outputPorts, glossaryTerms to WorkflowTriggerFields enum - Fix executeBulkPortsOperation to update entity changeDescription so FilterEntityImpl reads the correct changed fields - Add governance-bot impersonation to applyPatchEntityFieldAction to prevent entityStatus updates from triggering spurious workflow signals - Pass caller username through port endpoints to fix self-approval check incorrectly removing the reviewer when entity updatedBy was set to the reviewer from a prior task resolution --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> |
||
|
|
7485c5b421
|
feat: add ContextMemory entity (Context Center memories) (#28224)
* feat(spec): add ContextMemory + CreateContextMemory JSON schemas * feat(jdbi3): add ContextMemoryDAO * feat: register contextMemory entity type constant * feat(service): add ContextMemory repository, resource, mapper * feat(bootstrap): add context_memory table DDL * test(service): ContextMemory resource CRUD test * fix(context-memory): address review (relationship types, stable FQN, status msg, test name) - storeRelationships: rootMemory -> Relationship.CONTAINS, parentMemory -> Relationship.HAS so the root-ancestor and direct-parent hierarchies are distinguishable. - setFullyQualifiedName: derive from the immutable name only (drop mutable primaryEntity/owner derivation that destabilized nameHash on update). - validateStatusTransition: separate "no transitions defined" from "disallowed transition". - Rename ContextMemoryResourceTest -> ContextMemoryStatusTransitionTest (pure unit test). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(context-memory): add ContextMemoryIT + SDK ContextMemoryService Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(spec): register contextMemory in EntityLink.g4 ENTITY_TYPE grammar EntityLinkGrammarTest.testAllEntityTypesHaveGrammarOrExclusion enumerates every Entity.java constant and requires each to be in the EntityLink grammar or the test's exclusion list. ContextMemory is a normal EntityRepository-backed top-level entity (like learningResource / contextFile), so it belongs in the ENTITY_TYPE rule. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(context-memory): override owner ITs for creator-as-owner default ContextMemoryMapper.defaultOwners() intentionally assigns the creating user as owner when the create request omits owners. BaseEntityIT's patch_entityUpdateOwner_200 and patch_entityUpdateOwnerFromNull_200 assert "no owner initially" for any supportsOwners entity, so both failed for ContextMemory. Override both in ContextMemoryIT: keep the PATCH-replace-owner contract, change only the precondition to expect the creator as the sole initial owner (asserted by count, not a hardcoded principal). Mapper unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Update generated TypeScript types Add the generated ContextMemory TS types (entity/context/contextMemory.ts, api/context/createContextMemory.ts). The schemas were on the branch but their generated types were missing, failing the TypeScript Type Generation check on this fork PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(context-memory): address review (relationship cleanup, owner scope, validations) Copilot review on the ContextMemory entity: - #1 record primaryEntity/relatedEntities/root/parent/source*/machineRepresentation in version history; usageCount/lastUsedAt documented as untracked telemetry - #2 clear stale HAS/RELATED_TO/CONTAINS edges before re-adding in storeRelationships - #4 default creator as owner only on create; PUT without owners no longer silently replaces previously set owners - #5 schema documents that any status is allowed at creation; transitions enforced only on update - #6 setFullyQualifiedName via FullyQualifiedName.build with skip-if-set guard - #7 validate shared principal type is user/team/domain - #8 reject self-reference for parentMemory/rootMemory - #10 inline Entity.CONTEXT_MEMORY, drop redundant constant Regenerate ContextMemory TS types for the schema doc change; add IT coverage for the self-reference and invalid-shared-principal validations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(context-memory): don't blanket-delete relationships (domain data loss) The #2 cleanup via deleteTo(memory, CONTEXT_MEMORY, HAS, null) also matched the framework's domain --HAS--> memory edge (storeDomains runs before storeRelationships in storeRelationshipsInternal, on every create and update), silently dropping domain assignments. storeRelationships is now add-only (addRelationship upserts, so re-running on update is idempotent). Stale-edge cleanup moved to ContextMemoryUpdater using the framework's updateFromRelationship(s) helpers, which delete only the specific changed refs and record the version change. parentMemory now uses Relationship.PARENT_OF (distinct from primaryEntity's HAS and the framework's domain HAS) so the parent edge can be maintained without collision. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(bootstrap): move context_memory DDL from 2.0.1 to 2.0.0 The context_memory table belongs in the 2.0.0 migration. Relocated the MySQL and Postgres DDL verbatim; the 2.0.1 schemaChanges.sql files are restored to their original task_migration_mapping-only content. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(bootstrap): add ENGINE=InnoDB to context_memory MySQL DDL Explicit engine clause, consistent with the task/search-index tables in the same migration and robust to any server default change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(context-memory): preserve sanitized/validated fields; validate relatedEntities Review follow-ups: - ContextMemoryMapper no longer re-sets description/owners/domains/tags/displayName after copy(). copy() sanitizes description (stored-XSS) and validates owners and domains; re-setting the raw request values bypassed both. Only ContextMemory- specific fields are set now. - prepare() now assigns the result of EntityUtil.populateEntityReferences back onto relatedEntities so orphaned/invalid refs are filtered instead of persisted. - ContextMemoryIT Javadoc now references ContextMemoryRepository#setCreatorAsDefaultOwner (the defaultOwners mapper method no longer exists). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
2b70fb4957
|
Fixes open-metadata/openmetadata-collate#4122: flatten nested children to avoid ES mapping depth limit (#28214)
* fix(search): flatten nested schemaFields children to avoid ES mapping depth limit
Search reindexing failed at the Sink stage with "Failed to parse" for
Topics and API Endpoints whose schemas nest records deeper than ~17
levels. The recursive messageSchema/requestSchema/responseSchema
`schemaFields[].children` tree pushed the Elasticsearch object path past
the default `index.mapping.depth.limit` of 20, so the bulk item was
rejected with a mapper_parsing_exception.
Map the recursive `children` field as `flattened` (auto-translated to
`flat_object` on OpenSearch by OsUtils) so the entire child subtree
collapses into a single field, capping object depth at 3 regardless of
how deep the schema nests. Top-level `schemaFields.{name,description,...}`
keep their normal analyzed mapping, so search/sort/aggregations there are
unchanged. zh/topic had no `children` mapping at all, so it is added to
prevent dynamic mapping from re-introducing the depth blow-up.
Drop the `schemaFields.children.keyword` boosted fields from
TopicIndex/APIEndpointIndex getFields() since `flattened` has no
`.keyword` sub-field.
Fixes #4122
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(search): flatten nested schemaFields children to avoid ES mapping depth limit
Search reindexing failed at the Sink stage with "Failed to parse" for
Topics and API Endpoints whose schemas nest records deeper than ~17
levels. The recursive messageSchema/requestSchema/responseSchema
`schemaFields[].children` tree pushed the Elasticsearch object path past
the default `index.mapping.depth.limit` of 20, so the bulk item was
rejected with a mapper_parsing_exception.
Map the recursive `children` field as `flattened` (auto-translated to
`flat_object` on OpenSearch by OsUtils) so the entire child subtree
collapses into a single field, capping object depth at 3 regardless of
how deep the schema nests. Top-level `schemaFields.{name,description,...}`
keep their normal analyzed mapping, so search/sort/aggregations there are
unchanged. zh/topic had no `children` mapping at all, so it is added to
prevent dynamic mapping from re-introducing the depth blow-up.
Drop the `schemaFields.children.keyword` boosted fields from
TopicIndex/APIEndpointIndex getFields() since `flattened` has no
`.keyword` sub-field.
Fixes open-metadata/openmetadata-collate#4122
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(search): flatten nested children for table, container, dashboardDataModel, worksheet, searchIndex
Extends the schemaFields depth-limit fix to the remaining entities whose
recursive `columns`/`fields` `children` trees blow past Elasticsearch's
default `index.mapping.depth.limit` of 20, failing search indexing with
mapper_parsing_exception ("Failed to parse"):
- table, dashboardDataModel, worksheet -> columns.children
- container -> dataModel.columns.children
- searchIndex -> fields.children
Map the recursive `children` field as `flattened` (auto-translated to
`flat_object` on OpenSearch) in all 4 locales. table/dashboardDataModel/
container had no `children` mapping at all, so it is added to stop dynamic
mapping from re-introducing the depth blow-up; worksheet/searchIndex had
it as a one-level object and are converted.
Also drop `.keyword` references to the now-flattened children fields,
which `flattened` does not provide:
- SearchEntityIndex.getFields(): remove `fields.children.name.keyword`
- searchSettings.json: retarget Topic/APIEndpoint/Table exact-match
boosts and the documented field list off `*.children*.keyword` onto
the flattened `*.children.name` virtual keys (addresses PR review).
Fixes open-metadata/openmetadata-collate#4122
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(ui): update search settings mock for flattened columns.children
The "Restore default search settings" Playwright test deep-equals the
reset `table` config against `mockEntitySearchConfig`. Update the mock's
`columns.children` searchField from `columns.children.name.keyword` to
`columns.children.name` to match searchSettings.json, since the
flattened `children` mapping no longer exposes a `.keyword` sub-field.
Fixes open-metadata/openmetadata-collate#4122
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(search): drop flattened children field from container highlightFields
`dataModel.columns.children.name` was listed in the container
`highlightFields`. Now that `children` is mapped as `flattened`
(`flat_object` on OpenSearch), highlighting a flat-object sub-field is
rejected — OpenSearch fails the search with
`search_phase_execution_exception` (HTTP 400), so every container
`/api/v1/search/query` returned 500 and the Explore summary panel could
not open.
Remove the field from container `highlightFields`. Highlighting of
deeply-nested child column names is dropped (accepted flattened
trade-off); top-level column highlighting is unaffected. Container is
the only entity whose `highlightFields` referenced a recursive
`children` path.
Fixes open-metadata/openmetadata-collate#4122
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
776b3d1460
|
fix(search-index): add warning records to table and stop marking clean reindex jobs as failed (#28227)
* fix(search-index): stop marking clean reindex jobs as failed A distributed search reindex was being stored with status `failed` and an empty `failureContext` even though `failedRecords` was 0 across every stage. Root cause: two independent entity-count implementations disagree. `DistributedIndexingStrategy.getEntityTotal` (ListFilter(null)) seeds `entityStats.totalRecords`, while `PartitionCalculator.getEntityCount` (ListFilter(Include.ALL)) sizes the actual partitions. For a churny time-series entity (e.g. testCaseResolutionStatus) the two drift — the pre-count saw 11, the partition plan covered 9. All 9 were indexed cleanly, but `StatsReconciler` kept the stale pre-count as the job total, producing a phantom `total > success` gap. `hasIncompleteProcessing` escalated that gap to `COMPLETED_WITH_ERRORS` -> `ACTIVE_ERROR`, which `OmAppJobListener` collapsed to `FAILED`. Changes: - hasIncompleteProcessing now treats only `failedRecords > 0` as an error; a total/success gap is never a failure. - updateEntityStats sets per-entity totalRecords from the partition plan, the authoritative "what we process", so total and success agree. - getEntityTotal's time-series path uses ListFilter(Include.ALL) to match PartitionCalculator at the source. - Thread warningRecords end-to-end (SearchIndexJob.EntityTypeStats, the coordinator, the stats mapper/aggregator, StatsReconciler) so warnings are counted instead of silently dropped — and never counted as failures. - Record stale-relationship orphans (READER_RELATIONSHIP_WARNING) in the search_index_failures table for operator visibility; countFailuresByJobId excludes them so failureRecordCount stays a failure count. Fixes open-metadata/openmetadata-collate#4099 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(search-index): include warnings in processedRecords and reconciler total Addresses PR review: making totalRecords = success + failed + warnings left two counters out of sync. - toEntityTypeStats / getJobWithAggregatedStats now include warnings in processedRecords, so getProgressPercent() reaches 100% when a job finishes with warnings instead of appearing stuck below 100%. - StatsReconciler.reconcile computedTotal now includes readerWarnings, so the "Stats discrepancy detected" warning no longer fires when the gap is fully explained by stale-relationship warnings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
b69a4026fd
|
feat(context-center): sortBy on lists, drive-file move, IT coverage (#28096)
* feat(context-center): add sortBy, drive-file move, and IT coverage
- Add `sortBy`/`sortOrder`/`offset` to the article list (`/v1/knowledgeCenter`)
and drive-file list (`/v1/drive/files`). When `sortBy` is set, the request is
routed through `listInternalFromSearch` against OpenSearch; cursor pagination
(`before`/`after`) is rejected with HTTP 400 to keep the contract explicit.
Supported values: `name`, `createdAt` (aliased to `updatedAt` for v1),
`updatedAt`. Default `sortOrder` is `desc` when `sortBy` is set.
- Add `PUT /v1/drive/files/{id}/move` for moving a drive file between folders
or to the drive root. Sync handler (ContextFile is a leaf, no FQN cascade).
Body: new `MoveContextFileRequest` schema with an optional `folder`
EntityReference.
- Fix `ContextFileUpdater.entitySpecificUpdate` so folder changes are recorded
in the change description and the `CONTAINS` relationship edge is rewired
to the new folder. Without this, the move would silently update JSON only.
- Integration tests in `ContextFileIT` (move between folders, move to root,
non-existent folder, unprivileged-user 403, sort by updatedAt/name, cursor
combo rejection) and `KnowledgeCenterIT` (sort by updatedAt, name, createdAt
alias, cursor combo rejection).
|
||
|
|
c2dd3be26d
|
feat(query-runner): storage config + websocket notifications (#27542)
* feat(query-runner): add storage config + websocket notifications
Adds schema + infrastructure for the Collate Query Runner to stream results
to object storage and notify the UI via websocket when a query completes.
- queryRunnerRequest: runtime-injected storageConfig {bucketName, prefix}
and resultPath. Credentials never enter the payload — the worker fetches
them via a server callback so secret: refs resolve in-worker via
CustomSecretStr.
- queryRunnerResponse: resultPath for S3-backed results (mutually exclusive
with inline results).
- WebSocketManager: queryRunnerChannel constant.
- WebsocketNotificationHandler + QueryRunnerMessage: COMPLETED/FAILED
notifications for the UI hook.
- SOCKET_EVENTS.QUERY_RUNNER_CHANNEL so the UI hook can subscribe.
Paired with open-metadata/openmetadata-collate query_runner_socket_flow.
* Update generated TypeScript types
* formatting
* intermediate messgage
* chore(deps): bump sqlalchemy-pytds to ~=1.0
0.3.x's connector calls tds.skipall, but python-tds 1.x moved that to
tds_base.skipall — every server-side cursor fetch raises AttributeError
on TABNAME / COLINFO tokens. 1.0+ uses the correct module path.
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
|
||
|
|
1b35775d8a
|
fix(dq-dashboard): denormalize certification onto testCase index + filter plumbing (#28084)
* fix(dq-dashboard): denormalize certification onto testCase index and add filter plumbing * fix(dq-dashboard): cascade certification to all table-child search indices |
||
|
|
6c8441a06d
|
fix(search): skip soft-delete script propagation to time-series child aliases (#28064)
* Entity Time Series * test case results * Add Playwright * fix: declare 'summary' on TestSuiteIndex so reindex preserves lastResultTimestamp TestSuiteIndex.buildSearchIndexDocInternal computes the top-level lastResultTimestamp field from testSuite.getTestCaseResultSummary(). TestSuiteRepository registers a fetcher for that under the field name "summary"; the reindex path only invokes fetchers whose field is in getRequiredReindexFields(). Without "summary" declared, the fetcher does not run, testCaseResultSummary stays null on the entity, and the Index takes its else branch (TestSuiteIndex.java:41) writing lastResultTimestamp=0L on every reindexed suite. That field is exactly what the DQ /data-quality/test-suites list page sorts by (TestSuites.component.tsx:175), so a full reindex collapses every basic suite to the 1970 epoch and "most recently run first" stops working. Co-authored-by: mohitdeuex <mohit.y@deuexsolutions.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Mohit Yadav <105265192+mohityadav766@users.noreply.github.com> |
||
|
|
1352d67cf4
|
feat(dar): Granted lifecycle, filters, sort, and self-service create policy (#28044)
* feat(dar): add Granted lifecycle, filters, sort, and self-service create policy Splits the Data Access Request lifecycle into Approved (awaiting grant) and Granted (active access) so the UI can show an "approved – awaiting grant" banner that clears once an admin marks the request as granted. Adds an indexed approvedBy/approvedById/approvedAt on Task, captured at the approve transition through a new direct-persist helper. Introduces a dedicated /v1/tasks/dataAccessRequests endpoint pre-scoped to category=DataAccess with DAR filters (dataset, service, status, requestedBy, approver, accessType) and an asc/desc sort on createdAt; generic /v1/tasks gains service/approver filters too. DataConsumerPolicy now grants Create on resource=task so authenticated non-admins can file a DAR (fixes "operations [Create] not allowed"). Reworks the workflow handler so transitions whose targetTaskStatus is non-terminal (Approved, Granted) don't close the task, and updates CreateTask.isTerminalTaskStatus to allow advancing between Approved → Granted stages. Adds a new "active" statusGroup that includes the DAR lifecycle states while preserving the existing open/closed semantics that Glossary-style workflows depend on. Includes a Postgres + MySQL migration for the indexed approvedById generated column and integration coverage in DataAccessRequestIT spanning the new lifecycle, filters, sorting, approver capture, and the non-admin policy path. Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: anuj-kumary <anujf0510@gmail.com> Co-authored-by: Ram Narayan Balaji <ramnarayanb3005@gmail.com> Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com> |
||
|
|
606fe7ba37
|
fix(rdf): converge Fuseki state on weekly rebuilds and isolate API latency (#28117)
* fix(rdf): converge Fuseki state on weekly rebuilds and isolate API latency
RdfIndexApp ran daily and never reconciled removed relationships, so triples
grew unboundedly across runs. When Fuseki crash-looped on the resulting disk
pressure, every entity-write hook blocked synchronously on the unreachable
server (no HTTP connect timeout, 3-retry loop on ConnectException), saturating
the bounded AsyncService pool and pushing login to ~45s.
Storage-side fixes (stop growth):
- Drop the extractRelationshipTriples "preserve forward" path in
RdfRepository.createOrUpdate; the translator is the source of truth and the
surrounding orchestration already rewrites the current relationship set.
This also removes a wasted CONSTRUCT round-trip per entity write.
- bulkStoreRelationships now does per-source-entity DELETE WHERE with a
predicate-exclusion FILTER for lineage edges, so relationships that no
longer exist actually leave the store.
- Wire RdfRepository.clearAllGlossaryTermRelations() into RdfIndexApp's
initializeJob (the method existed but had no callers).
- Flip recreateIndex default to true and move the cron to Saturday midnight
("0 0 * * 6"). Add reloadOntologies() so CLEAR ALL doesn't leave the
ontology graph empty before indexing starts.
- Include a 2.0.1 post-data migration that updates existing installed_apps
rows; the app loader is insert-only on upgrade.
Connectivity / concurrency fixes (isolate API latency from Fuseki health):
- Add 2s connectTimeout to every JenaFusekiStorage HttpClient and fast-fail
on ConnectException / ClosedChannelException / HttpConnectTimeoutException
instead of retrying. Introduce a 5-failure/30s circuit breaker.
- Route all RdfUpdater mutators through AsyncService.execute with a bounded
pendingWrites gate (cap 1000, drop-on-overflow with logged warning) so a
dead Fuseki can no longer block request threads or starve the AsyncService
pool.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(rdf): address PR review — preserve relationships, scope DELETEs, surface ontology failures
PR #28117 review feedback. Addresses 13 findings across gitar-bot and Copilot:
Storage correctness:
- JenaFusekiStorage.storeEntity now keeps URI-valued triples (relationships)
and only refreshes literal-valued triples. A metadata-only PATCH would
otherwise wipe every inter-entity edge until the next weekly recreate-index,
and async ordering between updateEntity and addRelationship could leave the
graph missing edges (Copilot #1, #2).
- RdfRepository.removeRelationship wraps the DELETE in the knowledge named
graph and uses getRelationshipPredicate so the predicate URI matches what
addRelationship actually wrote (e.g. UPSTREAM → prov:wasDerivedFrom). The
previous bare DELETE in the default graph was a silent no-op (Copilot #3).
- RdfBatchProcessor now calls a new RdfRepository.clearOutgoingEntityRelationships
for every entity in the batch, not just those with current edges. An entity
whose last outgoing relationship was removed in MySQL contributes zero
RelationshipData entries, so bulkStoreRelationships' per-source DELETE
never fired for it (Copilot #4).
- bulkStoreRelationships no longer swallows non-connect DELETE errors —
DELETE WHERE on a source with no edges is a no-op, so exceptions there
are real failures (malformed SPARQL, auth, server errors) and should
surface (Copilot #5).
Visibility:
- reloadOntologies() now checks areOntologiesLoaded() after load and throws
if still empty. OntologyLoader.loadOntologies catches internally, so the
old reloadOntologies always appeared to succeed (Copilot #6).
- clearAllGlossaryTermRelations rethrows on failure instead of silently
logging — the indexer's caller can now react to cleanup failures (Copilot #10).
- clearAllGlossaryTermRelations pulls custom predicate URIs from
GlossaryTermRelationSettings and includes them in the DELETE FILTER. The
hardcoded list missed any custom predicates an admin configured (Copilot #7).
Quality:
- Set / LinkedHashSet imported instead of using java.util.* fully qualified
in JenaFusekiStorage and RdfBatchProcessor (gitar-bot #2).
- RdfIndexAppTest uses InOrder to assert clearAll → reloadOntologies
ordering — a plain verify would have accepted a future change that
reordered the calls (Copilot #9).
- Documented the residual gap that HttpClient.connectTimeout only bounds
TCP connect, not request bodies; circuit breaker + bounded pendingWrites
contain the blast radius (Copilot #8).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(rdf): expect per-source clear on batches whose relationships are all filtered
The two EventSubscription-skip tests used verifyNoInteractions on the RDF
repository mock, which was valid before because filtered batches never
touched RDF. The new per-source reconciliation clear in
RdfBatchProcessor.processBatchRelationships now runs for every batch entity
regardless of whether its relationships survive filtering — that's
deliberate, since stale RDF state for those source entities still needs
to be reconciled even when their current MySQL edges all point to excluded
entity types. Switch the assertions to verify clearOutgoingEntityRelationships
is the sole interaction (no bulkAdd, no addRelationship).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(rdf): address remaining PR review nits
Three findings from the second gitar-bot review pass:
- Replace the fully qualified `org.openmetadata.schema.configuration.GlossaryTermRelationSettings` / `SettingsType` / `SettingsCache` references in clearAllGlossaryTermRelations with imports, matching the project's existing convention. Other pre-existing FQN usages in the same file are left alone (not part of this PR's scope).
- Make expandPredicateCurie throw IllegalArgumentException on null/empty input instead of silently defaulting to `om:relatedTo`. The current caller already null-guards so the path is unreachable today, but a future caller could otherwise silently miss-clean a misconfigured predicate.
- Document why the lineage predicate URIs in the reconciliation DELETE filter (UPSTREAM / hasLineageDetails) are literal-hardcoded rather than baseUri-derived: they match what addLineageWithDetails actually writes (also hardcoded at RdfRepository.java:423,435). Switching the filter to be baseUri-derived would stop matching the stored lineage triples on non-default baseUri deployments and would incorrectly delete them. Comment added in both clearOutgoingEntityRelationships and bulkStoreRelationships so the next reader doesn't get nudged into "fixing" it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(rdf): surface cleanup failures, sync fallback predicates, time-bound reads
Addresses the three unresolved Copilot findings from review 4295208187:
- Drop the try/catch around clearAllGlossaryTermRelations in initializeJob.
clearAllGlossaryTermRelations rethrows specifically so the indexer can fail
loudly; wrapping it again let an unreconciled graph slip past as a
"successful" run. The outer execute() handler will now mark the run FAILED.
- Sync DEFAULT_GLOSSARY_TERM_RELATION_PREDICATES with what SettingsCache
actually bootstraps (SettingsCache.java:355-486): adds skos:exactMatch (the
real default for `synonym`), om:antonym, om:partOf, om:hasPart, rdfs:seeAlso.
Keeps legacy om:* URIs from the stale getGlossaryTermRelationPredicateUri
switch so a cleanup run still scrubs pre-SettingsCache data.
- Apply READ_TIMEOUT_MS (10s) via QueryExecution.setTimeout on every read path
(executeSparqlQuery for SELECT/CONSTRUCT/ASK/DESCRIBE, getEntity, getAllGraphs,
getTripleCount, testConnection, the ontology presence check). A Fuseki that
accepts the TCP connection but stalls mid-query no longer hangs reads
indefinitely. UPDATE-side calls still rely on the connect timeout + circuit
breaker + bounded pendingWrites since Jena's RDFConnection.update API
doesn't expose a per-request timeout cleanly; comment near the constant
notes the gap and a viable follow-up via UpdateExecHTTPBuilder.timeout.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(rdf): qualify EntityRelationship in test to fix compile
RdfIndexAppTest references EntityRelationship.class in two verify() calls
that I added in the previous commit, but the class was never imported into
the test file. CI's openmetadata-service test compile fails with "cannot
find symbol class EntityRelationship", which cascades into 11 dependent
checks (build x2, openmetadata-service-unit-tests, three Java integration
test workflows, two Python integration test shards that build OM as a
setup step, Test Report aggregate, maven-sonarcloud-ci, and the unit-test
status gate). Use the fully qualified
org.openmetadata.schema.type.EntityRelationship to match how every other
reference in this file already spells it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(rdf): drop QueryExecution.setTimeout — removed in Jena 5 used by IT classpath
GlossaryOntologyExportIT was failing on RdfUpdater.initialize with
NoSuchMethodError: 'void org.apache.jena.query.QueryExecution.setTimeout(long,
java.util.concurrent.TimeUnit)'. openmetadata-service builds against Jena 4.10
(apache-jena-libs), but openmetadata-integration-tests directly pulls in
jena-core/jena-arq 5.0.0, and Jena 5 removed the setTimeout overloads from
the QueryExecution interface. Compile passes, integration test JVM links the
5.x class and bombs at the first read path (loadOntology's ASK check).
Strip the nine setTimeout calls and the READ_TIMEOUT_MS constant. A clean
read-side timeout that works on both Jena 4 and 5 needs to be plumbed via
QueryExecutionHTTPBuilder.timeout / UpdateExecHTTPBuilder.timeout instead of
RDFConnection — bigger change than this PR should carry. The comment near
CONNECT_TIMEOUT now records that history so the next reader knows why we
don't simply re-add setTimeout. Protection against a stalled-but-accepting
Fuseki still relies on the 5-failure circuit breaker + bounded pendingWrites
gate in RdfUpdater.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(rdf): align ontology-loaded check, predicate URIs, and CURIE fallback
Three real bugs flagged by Copilot's later review passes:
- areOntologiesLoaded() looked for `"boolean" : true` (space before colon) but
JenaFusekiStorage formats ASK results without that space, so the check never
matched and reloadOntologies() always threw. recreateIndex=true (now the
default) ran into this on the very first scheduled run. Normalise whitespace
before checking.
- bulkAddRelationships wrote `om:<lowercase relationshipType>` directly, while
removeRelationship uses getRelationshipPredicate which maps a handful of
types to prov:* (UPSTREAM → prov:wasDerivedFrom, USES → prov:used, etc.).
Triples written by the indexer were therefore unreachable by the live
remove hook. Pre-compute predicateUri via getRelationshipPredicate in
bulkAddRelationships and pass it through a new field on RelationshipData
so JenaFusekiStorage uses the same URI both paths agree on. The legacy
RelationshipData(5-arg) ctor still works for callers that don't have a
predicate handy; bulkStoreRelationships falls back to the old shape there.
- expandPredicateCurie returned bare strings like `customRel` unchanged, but
createPropertyFromUri's default branch writes `<baseUri>ontology/customRel`.
Custom relation predicates expressed as local names would never match the
cleanup FILTER. Mirror createPropertyFromUri: full URIs pass through, bare
local names get the OM-ontology prefix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(rdf): schema default + migration force entities=[all] for safe full reindex
- rdfIndexingAppConfig.json: flip recreateIndex.default from false to true so
any UI form / config generation path that surfaces the schema default agrees
with the install JSON files and the new full-rebuild semantics.
- 2.0.1 migration (MySQL + Postgres): in addition to flipping recreateIndex=true
and the weekly Saturday cron, also rewrite appConfiguration.entities to
["all"]. Pre-upgrade an operator could have narrowed RDF indexing to a subset
of entity types; the new recreateIndex=true semantics issues CLEAR ALL before
indexing, which would otherwise wipe triples for excluded entity types and
leave the graph permanently missing them. Forcing entities back to ["all"]
ensures the post-CLEAR-ALL run repopulates the graph fully. Operators can
re-narrow after the migration if they need partial indexing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(rdf): scope storeEntity DELETE to translator-managed predicates
Replace the literal-only FILTER(!isIRI(?o)) in JenaFusekiStorage.storeEntity
with a predicate-scoped DELETE so translator-emitted URI triples (tags,
glossary terms, owner, domain, tier, data products, structured sub-resources)
are refreshed from the new model on every entity write, while hook-managed
predicates (om:UPSTREAM, om:hasLineageDetails, om:owns / om:contains / ...)
stay intact.
Previously, with !isIRI(?o), every URI-valued triple survived storeEntity
forever — when a tag was removed or an owner changed, the old URI coexisted
with the new one because no hook ever cleans those up (tags live in
tag_usage, not entity_relationship; owners' translator-side predicate
om:hasOwner is not what the OWNS hook writes).
The DELETE set is the union of:
- RdfPropertyMapper.TRANSLATOR_MANAGED_DIRECT_PREDICATES, a static list of
predicates that may shrink to empty between writes (so the current model
walk wouldn't see them) — rdf:type, om:hasOwner, prov:wasAttributedTo,
om:hasTag, om:hasGlossaryTerm, om:hasTier, om:belongsToDomain,
om:hasDataProduct, dct:source, om:sourceUrl, plus the structured-resource
attachment predicates (om:hasLifeCycle / hasCertification / hasExtension /
hasCustomProperty).
- the predicates the current model actually emits for the entity subject,
covering JSON-LD context-driven predicates that aren't in the static list.
Added two coverage tests on RdfPropertyMapperTest: the static set contains
the documented core predicates, and never contains lineage-hook predicates
(om:UPSTREAM, prov:wasDerivedFrom, om:hasLineageDetails) — that overlap
would let storeEntity wipe lineage edges on every entity update.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(rdf): scope reconciliation DELETE to relationship-hook predicates only
Both clearOutgoingEntityRelationships (in RdfRepository) and the per-source
DELETE inside JenaFusekiStorage.bulkStoreRelationships used to clear ANY
outgoing edge whose object was a baseUri/entity/ URI (with only the three
lineage predicates excluded). That swept up translator-managed URI triples
(om:hasTag, om:hasGlossaryTerm, om:hasOwner, om:belongsToDomain, …) which
bulkAddRelationships does not re-emit, so reconciliation runs were
permanently destroying tag/owner/domain links.
Switch the filter to opt-in: only delete triples whose predicate is in
RELATIONSHIP_HOOK_PREDICATES, derived from the Relationship enum via the
existing getRelationshipPredicate mapping. The set excludes the lineage
predicates by skipping the UPSTREAM enum value (managed by
addLineageWithDetails). Translator-managed predicates aren't relationship
types so they're naturally not in the set; the new
RdfPredicatePartitionTest enforces the partition.
Refactored getRelationshipPredicate into a static
getRelationshipPredicateUri so it can be reused at class-init time to build
the predicate set without an instance. Added a small buildPredicateInList
helper exposed at package level for JenaFusekiStorage to reuse.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(rdf): scope bulk reconciliation to batch entities, not all relationship sources
bulkStoreRelationships used to compute its per-source DELETE set from the
relationships list, so any source URI mentioned by any row in the batch was
reconciled. RdfBatchProcessor passes BOTH outgoing relationships (sources
inside the batch) and incoming UPSTREAM lineage (sources outside the batch
where this batch's entity is the target). The outside-batch sources had
their OTHER outgoing edges wiped, even though the indexer never planned to
re-index them.
Add a 2-arg overload to RdfStorageInterface.bulkStoreRelationships that
takes an explicit Set<String> sourcesToReconcile. The default 1-arg method
keeps the legacy "derive from relationships" behavior for any plugin caller
that hasn't migrated. RdfRepository.bulkAddRelationships gains a matching
overload taking Set<EntitySourceRef>; RdfBatchProcessor passes its
batchSources (the entities IT is actually indexing in this pass).
JenaFusekiStorage.bulkStoreRelationships now iterates sourcesToReconcile for
the per-source DELETE instead of computing distinctSources from
relationships. The new buildEntityUri helper on the interface lets callers
(or the default delegate) build consistent source URIs.
QLeverStorage stubs the new overload (still UnsupportedOperationException).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(rdf): time-bound HTTP request bodies via CompletableFuture wrapper
Wrap every blocking RDFConnection call in the hot read/write paths
(storeEntity DELETE+LOAD, storeRelationship, bulkStoreRelationships,
getEntity, deleteEntity, executeSparqlQuery, executeSparqlUpdate) with a
CompletableFuture-based 10s request timeout. When Fuseki accepts the TCP
connection and then stalls on the response, the caller thread now frees
after 10s instead of waiting until the OS gives up on the socket (~60s).
We chose CompletableFuture over Jena's QueryExecution.setTimeout because
that overload was removed in Jena 5 (broke integration tests already once
in this PR), and over Jena's QueryExecutionHTTPBuilder / UpdateExecHTTPBuilder
because their API surface differs between Jena 4 and Jena 5 and our two
classpaths use different versions. The CompletableFuture wrapper is Jena-
API-agnostic.
On timeout the underlying HTTP request still leaks its (virtual) thread
until OS-level TCP give-up; that's bounded by the existing circuit breaker
(after 5 timeouts the breaker opens for 30s, short-circuiting subsequent
traffic).
Lower-traffic paths (loadTurtleFile, clearGraph, getAllGraphs, getTripleCount,
loadOntology, testConnection) keep using the direct connection.update /
connection.query / connection.load calls — they're protected by the
circuit breaker and the connect timeout, and adding wrappers there is
churn without proportional benefit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(rdf): document RdfUpdater async-ordering trade-off in submitAsync
Add a comment block in RdfUpdater.submitAsync explaining why we accept the
loss of per-entity ordering when submitting through AsyncService:
- EntityUpdater diff-applies changes per request, so add-then-remove of the
same edge within one API call nets to no-op (no hooks fire).
- Cross-request races reconcile at the next weekly recreate-index, which
rebuilds from MySQL.
- The alternative (per-entity striped lock) costs memory and adds latency
for the no-contention common case.
- Pointers for the future maintainer if an observed-in-production race
emerges: gate via ConcurrentHashMap<UUID, Semaphore>.
No behavior change. The two open Copilot threads on this trade-off
(M6CQYup, M6CYbM2) stay open so a future PR can pick them up if needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(rdf): atomic clear+insert, broader fallback predicate set, close temp models
Three follow-up findings from the latest Copilot pass:
- Atomicity (3249716506): clearOutgoingEntityRelationships + bulkAddRelationships
ran as two separate SPARQL updates. If bulkAddRelationships failed after the
clear succeeded, the batch entities had their relationships wiped without
the replacement edges in place — they stayed gone until the next weekly
recreate-index. Combine the per-source DELETE and the INSERT DATA into a
single SPARQL update inside JenaFusekiStorage.bulkStoreRelationships and
drop the now-redundant separate clear call from RdfBatchProcessor. Either
the whole reconciliation commits or none of it does. Also let
bulkStoreRelationships handle the zero-edge case (relationships empty,
sourcesToReconcile non-empty) so RdfBatchProcessor doesn't need a separate
clear for entities whose relationships were all removed in MySQL.
- Fallback predicate set (3249716532): when SettingsCache returns null,
getGlossaryTermRelationPredicate falls back to literal
`https://open-metadata.org/ontology/<relationType>` — so `broader` /
`narrower` / `exactMatch` get written as om:broader/om:narrower/om:exactMatch,
not skos:* equivalents. Without those URIs in DEFAULT_GLOSSARY_TERM_RELATION_
PREDICATES, a cleanup run during a transient settings-cache outage would
miss them. Added the three om:* fallback variants alongside the existing
skos:*/rdfs:* bootstrap defaults.
- Temp Model leaks (3249319886): bulkAddRelationships and removeRelationship
each create an ephemeral Jena Model just to mint property URIs. Wrapped
both in try/finally close() so the in-memory graphs are released right after
use. Jena 4's Model has a close() method but doesn't implement
java.lang.AutoCloseable so try-with-resources isn't possible there.
Copilot's "still deleting only non-IRI" finding (3249716480) is a stale-
snapshot false positive — JenaFusekiStorage.storeEntity has used predicate-
scoped DELETE via TRANSLATOR_MANAGED_DIRECT_PREDICATES since
|
||
|
|
4217e6db8d
|
fix(log-storage): plug clobber bugs in streamable S3 logs (partial.txt + logs.txt) (#27926)
* fix(api): make closeStream idempotent when log storage is not configured
closeStream used to throw IllegalStateException("Log storage is not
configured") which the resource layer translates to a 500 response.
That made the contract surprising for callers: any defensive cleanup
path (exit handlers, retry logic, generic teardown) had to know in
advance whether streaming was configured before calling close, or eat
spurious server errors.
Closing a stream is naturally idempotent — same shape as DELETE on a
non-existent resource. When log storage is not configured, return
silently with a debug log so callers can call close() defensively
without checking state first.
Adds a unit test covering the no-op path.
* Add design spec for streamable logs stability fix
Captures the design discussion for fixing partial.txt and logs.txt
clobber bugs in S3LogStorage when ingestion runs hit idle gaps longer
than the 5-minute stream timeout.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add full design flow doc for streamable ingestion logs
End-to-end documentation of the streamable logs feature: architecture,
storage layout, run lifecycle, read paths, abandoned-run recovery,
configuration, concurrency model, and observability. Reflects the
post-fix design captured in the streamable-logs-stability spec.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add implementation plan for streamable-logs stability fix
Step-by-step TDD plan grouped into 8 PR-sized tasks: config schema
additions, per-stream lock, pendingFlush + merge-always flush, multipart
removal, sweeper rewrite, /close rewrite, read-path correction, and
integration tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(log-storage): add config fields for streamable-logs stability fix
Adds streamTimeoutHours, cleanupIntervalMinutes, partialFlushIntervalMinutes,
earlyFlushWatermarkBytes, pendingFlushAlertAfterFailures. Deprecates
streamTimeoutMinutes in favor of streamTimeoutHours. Pure schema-only
change; no Java code consumes these fields yet.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(log-storage): add deprecated:true keyword and clarify watermark unit
Addresses code review on Task 1: project convention uses the JSON Schema
deprecated keyword alongside description annotation. Also clarifies that
earlyFlushWatermarkBytes default (5242880) equals 5 MB.
* feat(log-storage): wire new stability-fix config fields into S3LogStorage
Reads streamTimeoutHours, cleanupIntervalMinutes, partialFlushIntervalMinutes,
earlyFlushWatermarkBytes, pendingFlushAlertAfterFailures from
LogStorageConfiguration with sane defaults. No behavioral change yet —
values are stored but not consumed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(log-storage): broaden streamTimeoutMinutes deprecation warning + drop FQN
Addresses code review on Task 2: warning now fires whenever
streamTimeoutMinutes is set (not only for values < 30 min), since the
field is deprecated for all deployments. Also imports java.lang.reflect.Field
in the test helper instead of using a fully-qualified name (CLAUDE.md
no-FQN rule).
* refactor(log-storage): add per-stream ReentrantLock for S3LogStorage
Introduces streamLocks map and acquire/release helpers. appendLogs,
writePartialLogsForStream, closeStream, and cleanupExpiredStreams all
serialize on the per-stream lock. No behavior change; locking is
pure mutual-exclusion at this point.
* fix(log-storage): close iterator.remove race in cleanupExpiredStreams
Move iterator.remove() inside the per-stream lock to prevent a window
where a concurrent appendLogs sees the still-present closed StreamContext
and writes to a closed stream. Also clarifies the comment on flush(fqn,runId)
ordering and documents that streamLocks accumulates monotonically until
Tasks 7 and 8 add cleanup.
* feat(log-storage): track pendingFlush queue and totalLinesAppended counter
Each appendLogs now also populates per-stream pendingFlush (lines awaiting
flush) and totalLinesAppended (monotonic logical line counter). State is
written but not yet consumed; the new flush logic in the next commit reads it.
* fix(log-storage): document thread-safety + lifecycle on Task 4 maps, add test
Addresses review on Task 4: documents that pendingFlush ArrayList values
may only be accessed under the per-stream lock; clarifies that
consecutiveFlushFailures is written and consumed in Task 5 (not just
consumed); aligns its type with AtomicInteger for consistency with
the other counters; adds a test for the trailing-newline trim path.
* fix(log-storage): merge-always partial.txt PUT and persist offset in S3 metadata
Replaces the old writePartialLogsForStream that skipped the read-merge step
when partialLogOffsets[streamKey] was 0 (the canonical 80MB->KB clobber bug).
The new flush always reads existing partial.txt, appends a snapshot of
pendingFlush, and PUTs with offset state in S3 user-defined metadata.
Also adds an early-flush watermark trigger so high-burst writes don't
pile up unbounded in pendingFlush.
Closes the partial.txt-clobber half of the streamable-logs-stability spec.
* fix(log-storage): replace task-number comments with intent-describing language
Addresses code review on Task 5: production code comments should describe
invariants, not the planning-doc task that filled the gap. Also clarifies
the parse-before-lock and the byte-counter atomicity assumption.
* refactor(log-storage): remove MultipartS3OutputStream, rewrite closeStream as server-side copy
appendLogs no longer initiates a multipart upload; bytes flow only through
pendingFlush -> partial.txt PUTs.
closeStream now: (1) drains pendingFlush via final partial.txt PUT,
(2) issues CopyObjectRequest from partial.txt to logs.txt server-side,
(3) deletes partial.txt and the .active marker, (4) drops in-memory state.
Idempotent: a second /close sees no partial.txt (NoSuchKeyException) and
returns gracefully.
Closes the logs.txt-clobber half of the streamable-logs-stability spec
and finalizes the canonical /close flow.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(log-storage): plug listener/lock leaks, propagate SSE on copy, recover counter from metadata
Addresses code review on combined Tasks 6+8:
- dropStreamState now removes activeListeners entries (SSE listener leak fix).
- cleanupExpiredStreams now removes streamLocks entries on expire (lock leak fix).
- copyPartialToLogs applies SSE configuration to CopyObjectRequest (was unencrypted on copy).
- writePartialLogsForStreamLocked reads last-flushed-line metadata from existing
partial.txt and uses it to keep totalLinesAppended monotonic across restarts.
- consecutiveFlushFailures reset uses computeIfAbsent + set(0) instead of allocating
a new AtomicInteger every successful flush.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(log-storage): rewrite sweeper as cleanupAbandonedStreams (24h/1h)
Bumps the idle threshold from 5 min to streamTimeoutHours (default 24h)
and the poll interval from 1 min to cleanupIntervalMinutes (default 1h).
On expire, finalizes the abandoned run by copying partial.txt -> logs.txt
server-side, deleting partial.txt, and dropping in-memory state — same
end-state as closeStream.
Also wires partialFlushIntervalMinutes into the periodic flush schedule
and removes the legacy streamTimeoutMs field that no longer drives behavior.
* fix(log-storage): preserve streamLocks entry on cleanup retry path
Addresses code review on Task 7: streamLocks.remove was unconditionally
in the finally block of finalizeAbandonedStream, so it ran even when the
sweeper returned early to retry next tick on a copy failure. That meant
the next sweep tick would create a fresh ReentrantLock, and any
concurrent appendLogs in the meantime would contend on a different lock
object than the retry, defeating mutual exclusion.
Now we only remove the lock entry once finalization has succeeded
(after dropStreamState). The retry path leaves the lock in place so
the next tick and any concurrent appendLogs see the same lock identity.
* fix(log-storage): include pendingFlush snapshot in mid-run reads
getCombinedLogsForActiveStream now appends the in-memory pendingFlush
snapshot to the partial.txt body when reading mid-run, so the UI's
paginated GET surfaces the most recent tail even before the next
scheduled flush has happened.
Only appends pendingFlush when a partial.txt file exists, avoiding
duplication in the fallback path where recentLogsCache already
includes those lines.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(log-storage): tighten Task 9 read path safety + invariant comment
Addresses review on Task 9: the unsafe null-lock fallback in the
pendingFlush append path is removed (it was structurally unreachable
but a latent hazard for future lifecycle changes). The pendingFlush
read now happens entirely under the per-stream lock, with a
conservative skip if no lock entry exists.
Also documents the recentLogsCache-vs-pendingFlush invariant in the
fallback path and adds a total-count assertion to the new test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(log-storage): add bug-reproducer ITs for streamable-logs stability
- testIdleGapDoesNotClobberPartial: two log bursts within an open run;
asserts both are present in the read response.
- testCloseProducesLogsTxtMatchingPartial: write, close, read; asserts
content survives the close.
- testCloseIsIdempotent: a second /close is a graceful no-op.
Tests are tolerant of the storage backend in the test environment
(DefaultLogStorage in CI may not persist; S3LogStorage in S3-configured
environments). Deep behavioral coverage is in S3LogStorageTest unit tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(log-storage): address final-review critical bugs
- closeStream and finalizeAbandonedStream now propagate PUT failures
from writePartialLogsForStreamLocked (which returns boolean).
closeStream throws IOException; the sweeper retains state for retry.
Fixes silent data loss when the final flush PUT fails.
- streamLocks entries are no longer removed; this prevents an
acquire-vs-remove race that would break mutual exclusion. Memory
growth is bounded by maxConcurrentStreams in practice.
- cleanupAbandonedStreams re-checks expiration inside the per-stream
lock so a stream that was bumped by appendLogs between the scan
and the lock acquisition is not finalized.
- deleteLogs now acquires the per-stream lock before mutating state.
- getCombinedLogsForActiveStream appends pendingFlush in BOTH the
S3-found and memory-fallback branches, so reads aren't truncated
when recentLogsCache evicts oldest at its 1000-line cap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(log-storage): use pendingFlush as canonical mid-run read source (no duplicates)
The previous Issue 5 fix appended pendingFlush unconditionally, which
caused duplicate lines in the read response when the fallback branch
used recentLogsCache (since both are populated by the same appendLogs).
Now: in the foundPartialFile branch, append pendingFlush AFTER the S3
body (non-overlapping by construction). In the fallback branch
(no partial.txt yet), use pendingFlush directly as the canonical
source — this is more complete than recentLogsCache (1000-line cap)
and avoids the duplicate issue. recentLogsCache remains a defensive
fallback for the rare case where pendingFlush is empty in the fallback
path.
* Update generated TypeScript types
* chore(log-storage): drop dead abortIncompleteMultipartUpload lifecycle rule
The multipart upload write path was removed; the bucket lifecycle's
abortIncompleteMultipartUpload(7 days) rule served only as migration
cleanup for in-flight uploads from the old code at deploy time. After
the migration window it does nothing.
Drops the rule from configureLifecyclePolicy, the AWS SDK import, the
"7 days multipart cleanup" string in the startup log, and the
corresponding bullet in docs/streamable-logs.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: ignore docs/superpowers/
Local-only working notes (specs, plans) live there and shouldn't be tracked.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(log-storage): tolerate DefaultLogStorage in CI for streamable-logs ITs
CI runs the integration tests against the bootstrap config which uses
DefaultLogStorage (delegates to k8s/Airflow which isn't running). The
storage returns:
- "No pods found for this pipeline" sentinel for getLogs
- non-2xx status (the SDK wraps it as statusCode -1) for /close
Adjustments:
- testIdleGapDoesNotClobberPartial: parse JSON, only assert when total>0.
When storage actually persists (S3 deployments), assert BOTH bursts
are present — that's the real "no clobber" check.
- postClose helper: tolerate any exception from the close call
(idempotency is the contract; transient errors are acceptable).
The deep behavioural coverage continues to live in S3LogStorageTest unit
tests where mock S3 is the storage backend.
* test
* fix
* Update generated TypeScript types
* fix
* Update generated TypeScript types
* fix(log-storage): record UTF-8 byte length in partial.txt total-bytes metadata
String.length() returns UTF-16 code units; for non-ASCII content this
diverged from the actual S3 object size, breaking the drift cross-check
documented in docs/streamable-logs.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(log-storage): address PR review findings on S3LogStorage
Plumbs the documented timing knobs (cleanupIntervalMinutes, partialFlushIntervalMinutes,
earlyFlushWatermarkBytes, pendingFlushAlertAfterFailures) through LogStorageConfiguration
so operators can actually tune them. Replaces the unbounded streamLocks ConcurrentHashMap
with a Guava Striped<Lock> capped at 256 stripes, eliminating the per-(fqn, runId) memory
leak and the acquire-vs-remove race that a per-key map would have. Adds a Multipart
Upload + UploadPartCopy concatenation path for partial.txt >= 5 MB, avoiding the O(n^2)
total transfer and full in-JVM body merge that the prior GET+PUT-everything strategy hit
on long-running pipelines. Realigns docs/streamable-logs.md with the actual schema and
implementation, drops the broken superpowers/* spec link, and renames the misleading
testIdleGapDoesNotClobberPartial IT (which posted bursts back-to-back without simulating
any gap).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
|
||
|
|
495ca16e62
|
feat: policy agent (#27396) | ||
|
|
ed5e68c2a5
|
Make SearchIndexing distributed-only (#27971)
* Make search indexing distributed-only * Update generated TypeScript types * Address search index review comments * Normalize search index entity aliases * Return defensive search index config copies * Update search indexing application docs * Share staged reindex context mapping * Speed up distributed job polling discovery * Use database polling for distributed job discovery * Address distributed search indexing review comments * Address distributed indexing polling review * Add SearchIndex promotion test coverage * Fix distributed reindex finalization review comments * Fix SearchIndex app Playwright run history parsing --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> |
||
|
|
4d476e5f58
|
feat: track MCP tool-call usage (#28045)
* feat: track MCP tool-call usage with per-user/per-tool breakdown REST API * Update generated TypeScript types * address review: imports, window validation, license headers * Update generated TypeScript types * address review: reuse limits extension for MCP usage, isolate by appName --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> |
||
|
|
1b8b4f0c63
|
refactor(schema): extract chart Function/KPIDetails into chartFunctions.json (#28049)
* refactor(schema): extract chart Function/KPIDetails into chartFunctions.json Moves the shared `function` and `kpiDetails` definitions out of `dataInsightCustomChart.json` into a dedicated leaf schema so `lineChart`, `summaryCard`, `formulaHolder`, and `dataInsightCustomChartResultList` can `$ref` them without pulling in the full chart entity model. This breaks the generated-code import chain that ran from Collate's `dataInsightQueryConfig` (which `oneOf`s `lineChart`/`summaryCard`) back into the entity schema. `javaType` annotations are preserved so generated Java classes remain at `org.openmetadata.schema.dataInsight.custom.Function` and `org.openmetadata.schema.dataInsight.custom.KPIDetails` — existing imports are unaffected. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(ingestion): resolve basedpyright errors exposed by schema regen - Guard `err.response.status_code` access with None checks in Grafana client and dbt config loader (reportOptionalMemberAccess). - Type MCP notification dict as Dict[str, Any] so adding `params` does not get narrowed to dict[str, str] (reportArgumentType). - Suppress reportPrivateImportUsage on requests.utils / requests.compat re-exports we rely on. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(ingestion): type send_request payload as Dict[str, Any] Same JsonType narrowing as send_notification — initial dict literal infers as dict[str, str], blocking the later `params` assignment. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * format * test(ingestion): regression coverage for chartFunctions schema extraction Guards the dataInsightCustomChart ⇄ lineChart/summaryCard circular import fix on two levels: - Schema-level: chartFunctions.json must own `function`/`kpiDetails`, and lineChart/summaryCard must not $ref back into dataInsightCustomChart. - Python-level: each generated module must succeed as the cold entry point into the cycle (purges sys.modules + parent package __dict__). Parametrized over the four generated modules so we cover every import-order permutation without duplicating boilerplate. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * format --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
f4cb7d0f14
|
feat(ingestion): add QuestDB database connector (#27604)
* feat(ingestion): add QuestDB database connector QuestDB speaks the PostgreSQL wire protocol but implements a minimal pg_catalog, so the default PG dialect queries fail on the CHAR->DOUBLE cast in pg_class.relkind. This connector routes SQLAlchemy inspection through information_schema and short-circuits constraint/index lookups (QuestDB has no PK/FK/unique/indexes), letting CommonDbSourceService handle the rest of the topology unchanged. - Fixed /qdb target in the psycopg2 URL regardless of databaseName (which remains the OpenMetadata display name) - get_database_names defaults to 'qdb' instead of 'default' - 12 unit tests + live-verified against QuestDB 9.3.5 on localhost:8812 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(questdb): address review feedback — rename to QuestDB, wire UI Code review fixes for PR #27604: Blockers resolved: - Rename Questdb -> QuestDB across schema, enum, Python classes, and all generated TS files. Matches peer connectors (PinotDB, DynamoDB) and the product's actual brand. Changing post-merge would be a breaking migration. - Remove sslConfig from schema. QuestDB's sslConfig was declared but never wired — ssl_manager.check_ssl_and_init is @singledispatch and has no QuestDBConnection registration, so enabling SSL in the UI was a silent no-op. Can be added in a follow-up with an explicit psycopg2 wiring. Warnings resolved: - authType now in schema's required array — was failing with opaque 401. - Delete dead queries.py (QUESTDB_TEST_GET_TABLES was defined but never imported). - Add bytea -> LargeBinary to the type map (verified via live information_schema probe against QuestDB 9.3.5 — all other native types normalize to standard PG names that were already mapped). - Complete type annotations on utils._get_table_names, _get_columns, _information_schema_type. - Dialect patch test now uses a real PGDialect_psycopg2 instance instead of a MagicMock dialect, so it catches signature drift against the real SQLAlchemy Inspector contract. Added a separate test that verifies get_table_names emits a query against information_schema.tables (not pg_catalog). - Add ingestion_logger() to utils.py with a debug log on dialect patching. - _empty_view_definition now returns None instead of "" to match how other dialects signal the absence of a DDL. Also fixes UI visibility (QuestDB was missing from the service picker): - Regenerate 15 TS enum files via json2ts.sh -> quicktype so the new DatabaseServiceType.QuestDB value flows through the UI. - Register service-icon-questdb.png in ServiceIconUtils.ts. - Add locales/en-US/Database/QuestDB.md connector form docs. - Add quicktype as a devDependency — json2ts.sh needs it and it wasn't installed. Docs: update skills/connector-building and skills/standards/registration to reflect reality — i18n locale files are not needed, icon + locale MD registration steps are, and Services.constant.ts is deprecated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * skill * fix(questdb): restore databaseSchema field for test connection test_connection_db_schema_sources reads service_connection.databaseSchema directly with no hasattr guard. Removing it from the schema in the prior review fix broke GetTables and GetViews steps: 'QuestDBConnection' object has no attribute 'databaseSchema' Restored as an optional string with a clearer description (defaults to public when unset). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix owners * add yaml * Update generated TypeScript types * Sync package.json and yarn.lock with main * Fix: ingestion files , Added Lineage for questdb tests and UI changes, Refactored code * FIX: python_checkstyle * Fix: test and unused param * Fix: yield_table enforcing tabletype to partition, Refactored lineage * Fix: Failing test and remove print statement * FIX: python_checkstyle and added error handling * FIX: Resolved comments * FIX: failing tests and schema cleaning * Minor change * Fix: Failing unit tests * Fix: Unit test unrelated changes ignored * FIX: tests * Fix: Failing test due to extra parameter in yaml --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Akash Verma <akashverma@Akashs-MacBook-Pro-2.local> Co-authored-by: Akash Verma <138790903+akashverma0786@users.noreply.github.com> |
||
|
|
7e0ee80c28
|
feat(search): add Google Gemini embedding provider (#27974)
Some checks are pending
Integration Tests - MySQL + Elasticsearch / Detect Changes (push) Waiting to run
Integration Tests - MySQL + Elasticsearch / integration-tests-mysql-elasticsearch (push) Blocked by required conditions
Integration Tests - PostgreSQL + OpenSearch / integration-tests-postgres-opensearch (push) Blocked by required conditions
Integration Tests - PostgreSQL + OpenSearch / Detect Changes (push) Waiting to run
Java Checkstyle / java-checkstyle (push) Waiting to run
Maven Collate Tests / maven-collate-ci (push) Waiting to run
OpenMetadata Service Unit Tests / Detect Changes (push) Waiting to run
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (mysql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (postgresql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / k8s_operator-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests-status (push) Blocked by required conditions
Publish Package to Maven Central Repository / publish-maven-packages (push) Waiting to run
* Add design: Google Gemini embedding client Adds a fourth embedding provider (google) alongside openai/bedrock/djl, using the Generative Language API with a single API key. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Add implementation plan: Google Gemini embedding client 7 tasks covering schema change + regen, client implementation, validation tests, error path tests, request shape tests, switch wiring, and final verification. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(spec): add google embedding provider config block Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(search): add GoogleEmbeddingClient with happy-path test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(search): extract MODELS_PREFIX constant in GoogleEmbeddingClient The string "models/" appeared in both DEFAULT_BASE_URL and the buildRequestBody method. Extract it as a named constant per project standards. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(search): add constructor validation tests for GoogleEmbeddingClient Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(search): add blank model id test and clarify null-modelId workaround Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(search): add HTTP error and malformed response tests for GoogleEmbeddingClient * test(search): tighten empty values array assertion to check message Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(search): verify Google embedding request URL, headers, and body shape Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(search): extract endpoint constant and harden extractBody helper Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(search): wire google embedding provider into SearchRepository switch * test(search): cover null dimension and custom endpoint, drop redundant comment Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Update generated TypeScript types * Remove internal planning docs from PR These were workflow scaffolding (design spec + implementation plan) generated by the superpowers brainstorming/planning flow; they belong in the local development trail, not the PR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Address PR review comments - GoogleEmbeddingClient.buildRequest: handle endpoint with existing query string by switching the key separator from '?' to '&' as needed; document why the API key travels in the URL (Google Generative Language API requirement, not Bearer-header). - GoogleEmbeddingClient.extractErrorMessage: replace empty catch block with a trace-level log to comply with the 'no empty catch' standard. - elasticSearchConfiguration.json: clarify google.endpoint description so operators know it must be the full ':embedContent' URL, not a base URL. - GoogleEmbeddingClientTest.extractBody: await onComplete via CompletableFuture.get(5s) instead of relying on synchronous publisher delivery; surface onError properly. - New test: testEndpointWithExistingQueryStringUsesAmpersand verifies the '?' / '&' separator logic. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Update generated TypeScript types * Wire google embedding provider into openmetadata.yaml defaults - Add `google:` block under naturalLanguageSearch with env-var fallbacks (GOOGLE_API_KEY, GOOGLE_EMBEDDING_MODEL_ID, GOOGLE_EMBEDDING_DIMENSION, GOOGLE_API_ENDPOINT). - Update embeddingProvider option list comment to include "google". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Use gemini-embedding-001 default and pass outputDimensionality The previous default (text-embedding-004) is rejected on some Google projects with `404: not found for API version v1beta, or is not supported for embedContent`. Switch to gemini-embedding-001 — the current GA model, available at v1beta and broadly accessible. - GoogleEmbeddingClient.buildRequestBody: include outputDimensionality from the configured embeddingDimension. Required for gemini-embedding-001 (defaults to 3072 dims otherwise) and supported as a truncation hint by text-embedding-004. - elasticSearchConfiguration.json + openmetadata.yaml: change default embeddingModelId to gemini-embedding-001 and document the outputDimensionality semantics on the embeddingDimension field. - GoogleEmbeddingClientTest.testRequestBodyShape: assert outputDimensionality=768 in the captured body and use gemini-embedding-001 as the test fixture model. - SystemRepository.getEmbeddingConfigurationMessage: add a `google` case so /api/v1/system/status surfaces the configured model/endpoint instead of "Unknown provider 'google'". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Update generated TypeScript types * Guard against missing google config in SystemRepository diagnostic If `embeddingProvider=google` but the `google` config block is absent, calling `nlpConfig.getGoogle().getEndpoint()` would NPE and produce a misleading "Unable to determine embedding configuration" message. Add an explicit null check that yields a clear diagnostic instead. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Validate google.endpoint contains :embedContent at construction A custom endpoint missing the `:embedContent` action used to silently produce 404s at runtime. Fail fast at startup with a clear message showing the expected URL form, so misconfiguration surfaces in logs instead of in vector-search failures. - Update testCustomEndpointConstruction to use a valid full URL. - Add testCustomEndpointWithoutEmbedContentThrows. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(spec): add modelId chat field to google block Adds a `modelId` property to the natural-language-search `google` block, parallel to how the `openai` block exposes both `modelId` (chat) and `embeddingModelId` (embedding). This enables Gemini-based NLQ filter extraction (chat completions via :generateContent) on top of the existing embedding support. Default: gemini-2.5-flash. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Update generated TypeScript types * Update generated TypeScript types * trigger --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> |
||
|
|
882ef3f8c5
|
add nlq to OpenMetadataApplicationConfig (#27988)
* add nlq to OpenMetadataApplicationConfig * move config under naturalLanguageSearch * openai client * Update generated TypeScript types --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com> |
||
|
|
22a6c10072
|
Context center (#27558)
Some checks are pending
Integration Tests - MySQL + Elasticsearch / Detect Changes (push) Waiting to run
Integration Tests - MySQL + Elasticsearch / integration-tests-mysql-elasticsearch (push) Blocked by required conditions
Integration Tests - PostgreSQL + OpenSearch / Detect Changes (push) Waiting to run
Integration Tests - PostgreSQL + OpenSearch / integration-tests-postgres-opensearch (push) Blocked by required conditions
Java Checkstyle / java-checkstyle (push) Waiting to run
Maven Collate Tests / maven-collate-ci (push) Waiting to run
OpenMetadata Service Unit Tests / Detect Changes (push) Waiting to run
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (mysql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (postgresql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / k8s_operator-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests-status (push) Blocked by required conditions
Publish Package to Maven Central Repository / publish-maven-packages (push) Waiting to run
* Add Context Center: Migrate Knowledge Center , Images/ PDFs document support * Add Context Center: Migrate Knowledge Center , Images/ PDFs document support * Address PR #27558 review comments - KnowledgePageRepository: null-safe pageType in getHierarchyWithSearch and getHierarchyWithSearchForActivePage so the /search/hierarchy endpoint no longer NPEs when the pageType query param is omitted. The ES/OS client helpers already skip the pageType term when the value is null or empty, so this is a pure null-guard. - ContextFileResource.uploadFile: when a failure happens after the ContextFileContent row is created (e.g. inside extractionService.submit), the cleanup path now hard-deletes that content row so the DB is not left with an orphaned record. - ContextFileResource: replace the raw Content-Disposition string with a buildContentDisposition helper that emits both the legacy quoted filename= and the RFC 5987 filename*=UTF-8'' parameter with percent-encoded bytes, so international filenames round-trip while staying header-injection safe. sanitizeFileName also falls back to "download" on null/blank input. - ContextFileResourceTest: new cases for sanitizeFileName null/blank fallbacks and for buildContentDisposition ASCII/unicode/space/injection behaviour (18 tests, all passing). * Address copilot review comments on PR #27558 - AssetRepository.getByFqnPrefix: swap arguments so (assetType, fqnPrefix) matches the DAO signature — previous ordering always missed the index. - FolderResource / ContextFileResource getEntitySpecificOperations: return List.of() instead of null so callers iterating the returned list cannot NPE. - SearchUtils.getPageHierarchy: replace UUID.fromString with a parseUuid helper that returns null for missing/malformed values and logs a warning instead of failing the whole hierarchy response. - DaoListFilter: qualify the pageType column with the caller-provided tableName, rename getArticleCondition to getPageTypeCondition (legacy no-arg method kept as @Deprecated wrapper for compatibility). - Elastic/OpenSearch client processPageHierarchyHits: replace the per-hit getChildrenCountForPage search (N+1) with a single pass over the batch that derives childrenCount from pages whose parent is in the same result set. Also drops the now-unused helper and its throws clause. - openmetadata-sdk/pom.xml: mark JWT, JAX-RS client, Apache HttpClient, jakarta.json, parsson, and JUnit Jupiter as <optional>true</optional> so they don't leak into SDK consumers that only use the core client. - InMemoryAssetService: use the shared AsyncService executor for upload /read/delete instead of the JVM common ForkJoinPool. - sample-pricing.xlsx: replace the plain-text placeholder with a real minimal XLSX workbook so detection-based and extraction-based code paths see a valid Microsoft Excel 2007+ file. * Use one filters aggregation for page hierarchy childrenCount Follow-up to |
||
|
|
219c5683fa
|
ISSUE #3032 (#27912)
* feat: move flat sampling to sampling config + dynamic sampling option * feat: move flat sampling on the backend to sample profile conifg object * feat: fix circular import * feat: align UI with new profiler config * feat: fix json schema * feat: align python imports with new schema path * feat: update migration to look at extension * feat: remove enable * feat: remove enable * feat: added titles to sample config * feat: generated ts classes * feat: addressed comments * feat: change sample config instantiation to match new structure * feat: removed backward compatible fields * feat: ran java linting * feat: updated imports to point to generated files * feat: added dynamic sampler resolution logic * feat: ran python linting * feat: remove duplicate migration * chore: merge upstream and clean conflicts * feat: update logic to support dynamic and static sampling * feat: adjust sample config call * feat: test for statis, dynamic, row count and tier methods * feat: more sample config unit tests * feat: added tests for metric and sampling * feat: added tests to validate fallback is not called i nmetric computers * feat: strengthen profiler validation tests * feat: fix sampling config * feat: fix sampling config * feat: fix sampling config * feat: generated typescript models * feat: fixed missing dq pipeline migration * feat: fixed static check * feat: fixed ci failures * feat: fixed ci failures * feat: fixed unit tests faioure and linting * feat: fixed integration tests failures * chore: fixe burstiq refactor * chore: fix trino ci failures * chore: revert baseline.json file * chore: fix sampler availabl burst iq changes * feat: added smart sampling radio button * feat: ignore static checks errors * feat: ran ts linting * feat burstiq infinite recursion issue with dynamic as default * feat: translate i8n keys * feat: fix failing tests |
||
|
|
4c07b28c82
|
Add alias marketplace (#27943)
* Add alias marketplace * wire fingerprint and embeddings in domain_index_mapping |
||
|
|
80375a7dc6
|
Add data access request support (#27879)
* Add DAR tasks
* Removed UI related changes of DAR
* nit
* Update generated TypeScript types
* fix linting issue
* Removed all languages changes
* nit
* removed white space
* add request data access button with owner/status conditions
* fix lint issue
* fix minor validation for data access button
* fix lint issue
* fix data access button visiable condition
* fix java lint checks and fix test cases
* nit
* fix test
* fix(tasks): model CreateTask.about as entityLink, validate target entity
Replace `about` (FQN string) + `aboutType` (string) with a single
`about` field of type entityLink (`<#E::{entityType}::{fqn}>`). The
resource layer parses the link and resolves it via
`Entity.getEntityReferenceByName(type, fqn, NON_DELETED)`, which
guarantees the target asset exists and is not soft-deleted.
Why: long-FQN data assets were rejected with `[query param name size
must be between 1 and 256]` because the modal was constructing a Task
`name` from the FQN. The `about` was modelled as a free string with
no schema validation that the target was a real, non-deleted entity.
The Threads API already uses entityLink for this exact purpose; tasks
now align with that pattern. The link is supplied as a hidden field
by the UI — users never see it.
Also fixes the missing `@ExtendWith(TestNamespaceExtension.class)` on
`DataAccessRequestIT` that caused four test failures in CI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix unit test failure
* fix(test): await workflow stage transition in DataAccessRequestIT
The workflow advances the task from pending-workflow-start to review
asynchronously. Asserting on the object returned by create() was a
race condition. Use Awaitility to poll until the stage is review,
matching the pattern in IncidentTaskIntegrationIT.
---------
Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Ram Narayan Balaji <ramnarayanb3005@gmail.com>
Co-authored-by: Ram Narayan Balaji <81347100+yan-3005@users.noreply.github.com>
|
||
|
|
b12506fc6d
|
Add container entity type (#27957)
* Add alias marketplace * wire fingerprint and embeddings in domain_index_mapping * add container entity to dataAssetEmbeddings * add container to VECTOR_INDEXABLE_ENTITIES * Move changes for marketplace to another PR |
||
|
|
e91c90c144
|
fix: validate custom property name charset (#27808)
* fix: validate custom property name charset
Tighten custom property name validation to block characters that break
downstream parsers, with verified empirical reproduction:
- `"` causes HTTP 500 on PUT /metadata/types/{id}
- `:` breaks CSV import — exporter writes `key:value;key:value`, importer
splits at first colon, treats prefix as the field name
- `^` breaks OpenSearch query when the name is in
searchSettings.searchFields — Lucene reads `^` as the boost separator
in `field^boost`
- `$` breaks CSV import via java.util.regex.Matcher.replaceAll which
interprets `$<letter>` as a backreference
Adds a `customPropertyName` definition in basic.json and switches
customProperty.json to reference it. Adds a defensive regex check in
TypeRepository.validateProperty so the API returns 400 with a clear
error message even if schema validation is bypassed.
Tests cover allowed-charset acceptance, the four blocked characters,
leading-character validation, max-length enforcement, and unbalanced
brackets.
* Update generated TypeScript types
* test: add schema-vs-Java consistency test for custom property name
Guards against drift between basic.json#customPropertyName and the
TypeRepository regex/length constants. If either side is updated
without the other, CI fails with a message pointing to both files.
The Java validator is kept (better error message + covers internal
callers that bypass the HTTP layer); the consistency test guarantees
the two definitions cannot drift.
* fix: extend custom property name charset after gap-coverage matrix
Re-ran the matrix on previously-untested chars (+ ? * ~ ` \) across all
17 property types × create/patch/CSV/search:
- + ? * ~ ` all pass cleanly on every operation × every property type — add to allow list
- \ fails CSV roundtrip for entityReference and entityReferenceList types
(escape inconsistency in CSV serialization) — add to block list
Updates the regex, schema description, Java validator error message, and
adds the new chars to the allow/block integration tests. Consistency
unit tests in TypeRepositoryTest continue to pass.
Final allow set: alphanumeric _ - . / & % # @ ! , ; = | ' + ? * ~ `
space ( ) < > [ ] { }
Final block set: " : ^ $ \
* Update generated TypeScript types
* updated the custom property name validation
* added name suffix in custom property name
* lint fixes
* include backslash in invalid char
Co-authored-by: Copilot <copilot@github.com>
* fixed the playwright issue
Co-authored-by: Copilot <copilot@github.com>
* lint fix
* fix check style
* Drop redundant Java validator for custom property name; tighten IT assertions
Schema is the single source of truth: jsonschema2pojo emits @Pattern + @Size
on CustomProperty.name from basic.json#/definitions/customPropertyName, and
@Valid on TypeResource.addOrUpdateProperty enforces them at the HTTP boundary.
The hand-written Pattern constant, validateCustomPropertyName, and the
schema-vs-Java sync test were duplicating that rule and could never reach the
HTTP user (Bean Validation always fires first via @Valid).
Tighten the new TypeResourceIT cases from assertThrows(Exception.class) to
assertThrows(InvalidRequestException.class) so a regression to a different
exception type or status code fails loudly.
* restrict few more special characters from Cp name
* minor fix
* Disallow & < > in custom property names; align IT cases
Schema-side counterpart to the UI changes in the previous two commits:
basic.json#/definitions/customPropertyName now blocks &, <, > alongside the
existing " : ^ $ \\. The DOMPurify pass on the UI sanitizes &, <, > into HTML
entities, which produced inconsistent persisted values; rejecting them at the
schema layer prevents that drift across all write paths.
IT updates:
- Drop &, <, > from the allowed-charset cases (and the "withMatched(pair)And<more>" composite)
- Add &, <, > to the disallowed-charset cases
- Drop "<" leading-character case (now covered as a disallowed character)
- Drop "<" / ">" unbalanced-bracket cases
* Update generated TypeScript types
* Close PATCH bypass for custom property name validation on Type
Bean Validation runs for the dedicated PUT /types/{id} (addOrUpdateProperty)
because the resource declares @Valid CustomProperty, and the createOrUpdate
path can't carry customProperties at all (CreateType schema doesn't include
the field). PATCH /types/{id} accepts an opaque JsonPatch, so @Valid never
reaches into the resulting customProperties[] — a JSON Patch like
[{"op":"add","path":"/customProperties/-","value":{"name":"bad:colon",...}}]
persisted bad-named properties (verified live: HTTP 200 before this fix).
Run Hibernate Validator programmatically inside TypeRepository.prepare() so
every write path enforces the schema-derived @Pattern / @Size / @NotNull on
each CustomProperty. The rule still lives only in basic.json — picked up via
the generated @Pattern annotation, executed via ValidatorUtil.validate.
Tests in TypeResourceIT:
- test_patchCannotAddCustomPropertyWithDisallowedName — seeds a valid property
to ensure /customProperties exists, then PATCHes appending a name with ':',
asserts InvalidRequestException and verifies the bad name is not persisted
- test_patchCanAddCustomPropertyWithValidName — guards against the fix
rejecting valid PATCH-driven additions
* Block * in custom property names — breaks ES field-path lookup
When the property name appears in extension.<propertyName>^boost entries of
searchSettings.searchFields, OpenSearch treats * as a field-path wildcard.
The literal * field never matches its own wildcard pattern, so the field
gets silently skipped from the query and Explore search returns no hit for
the value. Bisected against the running server: of 12 candidate Lucene-special
chars, only * actually breaks the mainline UI search flow. ? ~ ( ) { } [ ] /
! and space all returned hits via the searchFields path because OS looks up
the field literally and only treats * as a wildcard at that layer.
Updates the regex + description in basic.json/customProperty.json, the UI
regex in regex.constants.ts, the validation message across 19 locales, the
generated TS docstrings, the Playwright invalid-name fixtures and spec, and
the IT TypeResourceIT case (with*asterisk moves from allowed to disallowed).
* Validate only newly-added custom properties; isolate PATCH IT to fresh types
prepare() previously validated the entire customProperties[] on every Type
write. An upgraded instance with a legacy property whose name contained a
now-banned char would then reject any subsequent PUT/PATCH on that type,
even when the write only adds a different valid property. Move the name
validation into TypeUpdater.updateCustomProperties() and scope it to the
`added` list computed by recordListChange against the original entity. New
properties are still validated; pre-existing names are left alone.
Replace the IT PATCH cases' shared `topic` Type with a fresh, namespaced
entity-category Type per test (createEntityTypeForTest). The shared `topic`
was mutated concurrently by other tests in the class — combined with
PATCH's lack of per-type locking, that produced lost-update races and
flaky asserts. The fresh per-test type has customProperties: [] from
creation, so the patch sets the array directly without a seed property.
* chore: prettier formatting on the new asterisk-rejection test
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* docs: add + ? ~ ` to JSDoc allow-list to match the regex
* fix(it): request customProperties field on read-back in PATCH IT
Type.customProperties is a lazy field — TypeRepository.setFields only
populates it when the request URL includes ?fields=customProperties. The
default getTypeById helper omits the param, so the read-back always saw
customProperties == null. That made test_patchCanAdd... fail (the just-
persisted property wasn't visible) and made test_patchCannotAdd... pass
for the wrong reason (would have stayed green even if the bad name had
slipped through validation).
Add a fields-aware getTypeById overload and use it in both PATCH cases.
Empirically verified against the live server: good name returns 200 +
appears in customProperties, bad name returns 400 + does not.
* minor fix
* playwright test fix
* removed unecessary test
* blocked ~ and / from custom property name
* lint-fix
* Block / and ~ in custom property names (JSON Pointer reservations)
Forward slash and tilde are reserved by JSON Pointer (RFC 6901): / is the
path separator and ~ is the escape lead-in (~0 = ~, ~1 = /). Allowing
them in a property name shifts the burden onto every caller that builds
a JSON Patch by string interpolation; a raw `/extension/${propertyName}`
either splits into the wrong number of segments or contains an invalid
escape sequence, and the server applies the patch to the wrong key (or
400s outright).
This surfaced as a reproducible failure in the table-cp Playwright suite:
the preceding test ended with `path: \`/extension/${propertyName}\`` where
propertyName ended in `/`. The server addressed extension[name-without-/][""]
instead of extension[name-with-/], returned 400, and TableClass.patch
overwrote entityResponseData with the error body — stripping id and FQN.
The next test fell into the search-based navigation path with an empty
search term and timed out at 180s.
Tighten the schema regex in openmetadata-spec/.../basic.json#customPropertyName
to drop / and ~ from the allowed set; update the human-readable description
in basic.json and customProperty.json to call out the RFC 6901 reservation.
Move the with/slash and with~tilde cases from the allowed-charset IT to
the disallowed-charset IT in TypeResourceIT.
* Update generated TypeScript types
* Use fresh per-test Type in custom-property name validation IT
The five charset/length/lead-char tests added in this PR previously mutated
the shared built-in TABLE_ENTITY_TYPE under @Execution(CONCURRENT). The
PUT path acquires TYPE_PROPERTY_LOCKS so concurrent writes serialize, but
relying on that lock for test isolation is fragile — the PATCH-driven IT
in the same class already uses a per-test fresh Type via
createEntityTypeForTest(client, ns, ...) for exactly this reason
(see
|
||
|
|
3beb1e020b
|
Improve cache warmup configuration and availability (#27948)
* Fix cache warmup app config rendering * Add optional relationship cache warmup * Restore relationship repository in warmup test * Update generated TypeScript types * Disable cache warmup when cache is unavailable * Address cache warmup review comments * Address Copilot cache warmup comments * Memoize app detail tabs --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> |
||
|
|
3741a6e5fd
|
fix(tagLabel): lenient appliedAt date deserialization (#27771)
Some checks are pending
Integration Tests - MySQL + Elasticsearch / Detect Changes (push) Waiting to run
Integration Tests - MySQL + Elasticsearch / integration-tests-mysql-elasticsearch (push) Blocked by required conditions
Integration Tests - PostgreSQL + OpenSearch / Detect Changes (push) Waiting to run
Integration Tests - PostgreSQL + OpenSearch / integration-tests-postgres-opensearch (push) Blocked by required conditions
Java Checkstyle / java-checkstyle (push) Waiting to run
Maven Collate Tests / maven-collate-ci (push) Waiting to run
OpenMetadata Service Unit Tests / Detect Changes (push) Waiting to run
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (mysql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (postgresql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / k8s_operator-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests-status (push) Blocked by required conditions
Publish Package to Maven Central Repository / publish-maven-packages (push) Waiting to run
* fix(tagLabel): lenient appliedAt date deserialization
Accept ISO-8601 datetimes with or without fractional seconds for
TagLabel.appliedAt. Python clients omit fractional when a datetime's
microsecond is zero, emitting strings like "2026-04-24T10:27:06Z" that
the strict global SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSSSS'Z'")
rejects, causing PATCH operations to fail with "Failed to convert
JsonValue to target class".
* test(JsonUtils): expect JsonParsingException wrapping JsonMappingException
JsonUtils.readValue catches JsonProcessingException and rethrows as
JsonParsingException, so the public-API caller observes the wrapper.
Assert on the wrapper type and verify the cause chain carries the
underlying Jackson mapping exception with the field path context.
|
||
|
|
60a2e6546e
|
Migrate Databricks from sqlalchemy-databricks to databricks-sqlalchemy (#26896)
Some checks are pending
Integration Tests - MySQL + Elasticsearch / Detect Changes (push) Waiting to run
Integration Tests - MySQL + Elasticsearch / integration-tests-mysql-elasticsearch (push) Blocked by required conditions
Integration Tests - PostgreSQL + OpenSearch / Detect Changes (push) Waiting to run
Maven Collate Tests / maven-collate-ci (push) Waiting to run
Integration Tests - PostgreSQL + OpenSearch / integration-tests-postgres-opensearch (push) Blocked by required conditions
Java Checkstyle / java-checkstyle (push) Waiting to run
OpenMetadata Service Unit Tests / Detect Changes (push) Waiting to run
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (mysql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (postgresql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / k8s_operator-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests-status (push) Blocked by required conditions
Publish Package to Maven Central Repository / publish-maven-packages (push) Waiting to run
* Update Databricks Dependency to databricks-sqlalchemy * Update generated TypeScript types * address comments and pyformat * pyformat * fix log filtering * address comments * fix static unit tests * fix rule for static type * pyformat * update baseline * revert basepyright changes --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com> |
||
|
|
e07f03334d
|
Patching error message verbosity (#27692)
* patching error message * fix: resolve unresolved merge conflict in patch_mixin.py Keep the patch_body in the warning log; the main side dropped it. * fix(patch): replace full body dump with op:path summary on patch failure Avoid leaking patch values (descriptions, sample data, tags) in WARNING logs. The Jackson server message in 'Reason:' still carries the offending value for the failing field, which is enough to debug deserialization errors without dumping every field being changed. |
||
|
|
5620121e50
|
SearchIndex: tunable index settings + per-stage latency metrics (#27865)
Some checks are pending
Integration Tests - MySQL + Elasticsearch / Detect Changes (push) Waiting to run
Integration Tests - MySQL + Elasticsearch / integration-tests-mysql-elasticsearch (push) Blocked by required conditions
Integration Tests - PostgreSQL + OpenSearch / Detect Changes (push) Waiting to run
Integration Tests - PostgreSQL + OpenSearch / integration-tests-postgres-opensearch (push) Blocked by required conditions
Java Checkstyle / java-checkstyle (push) Waiting to run
Maven Collate Tests / maven-collate-ci (push) Waiting to run
OpenMetadata Service Unit Tests / Detect Changes (push) Waiting to run
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (mysql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (postgresql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / k8s_operator-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests-status (push) Blocked by required conditions
Publish Package to Maven Central Repository / publish-maven-packages (push) Waiting to run
* SearchIndex: configurable index settings + per-stage latency metrics Adds two diagnostic and operational improvements to the distributed search indexing pipeline so operators can both tune cluster behavior per installation and pinpoint where reindex latency is being spent. Configurable index settings (per-installation, no code changes needed) - New SearchIndexing app config fields: liveIndexSettings (post-promote), bulkIndexSettings (during reindex), and per-entity overrides. - DefaultRecreateHandler applies bulk overrides on staged-index creation (e.g. refresh=-1, replicas=0, async translog) and reverts to live values before alias swap. Optional force-merge before swap. - Safety revert ensures the promoted index never inherits a disabled refresh interval, even if the admin only configured bulk overrides. - Live UX is preserved: refresh defaults to 1s so users and agents that read-after-write see near-real-time results. - New IndexManagementClient methods (updateIndexSettings, forceMerge) with implementations for OpenSearch and Elasticsearch. Per-stage latency metrics (consumer-vs-producer attribution) - StageStatsTracker accumulates per-stage wall-clock time alongside existing counters; added timing-only addStageTime() so per-record callbacks and per-batch wall-clock don't double-count. - DB migration 1.13.0 adds readerTimeMs / processTimeMs / sinkTimeMs / vectorTimeMs columns to search_index_server_stats. Existing rows get DEFAULT 0; aggregation queries SUM the new columns. - Reader timing wraps PartitionWorker.readEntitiesKeyset (DB latency). Process timing wraps the doc-build join in OpenSearch and Elasticsearch bulk sinks (CPU/serialization). Sink timing wraps client.indices().bulk (pure search-cluster latency), attributed per participating tracker. - DistributedJobStatsAggregator surfaces totalTimeMs on each StepStats so the UI can compute avg latency = totalTimeMs / successRecords and throughput = successRecords / (totalTimeMs / 1000) on every WebSocket push without server-side derivation. - New per-server aggregation query (getStatsByServer) for distributed visibility, fed into SearchIndexJob.ServerStats with timing fields. UI: each of the four stage cards (Reader / Process / Sink / Vector) shows "Latency: X ms · Y r/s" when timing is available; per-entity table gains Sink avg + Sink throughput columns. Docs panel updated. New SearchIndexing config section added with sane defaults that preserve current behavior. Tests: 6 new StageStatsTracker timing tests, new aggregator test that asserts StepStats.totalTimeMs is populated at job and per-entity level. All existing tests updated for new arg shapes; 60 unit tests pass. The pattern operators see: Reader avg climbing means DB-side issue (cache/autovacuum); Sink avg climbing means OS-side issue (segments/ back-pressure); only one entity's row climbing identifies the offender. |
||
|
|
620d1b6ad9
|
Cache audit fixes: tag rename + relationship invalidation, bundle warmup (#27864)
* Cache audit fixes: tag rename invalidation, relationship invalidation, bundle warmup Fix two write-through cache correctness bugs and enhance the warmup app: Bug A — Tag/Glossary/Classification rename now invalidates the cached entity JSON for every entity tagged with the renamed tag. Adds invalidateCacheForTaggedEntities helpers in EntityRepository that use the search index (same source updateClassificationTagByFqnPrefix already uses) to enumerate affected entities, then call invalidateCacheForEntity for each. Wired into TagRepository, ClassificationRepository, and both rename + parent- move paths in GlossaryTermRepository. Bug B — Direct addRelationship/deleteRelationship and bulk variants now invalidate the cached bundle/owners/domains on both sides. Bot/Domain/Data Product (already in UNCACHED_ENTITY_TYPES) short-circuit in invalidateCacheForEntity so cascade-heavy delete paths don't pay for Redis ops on keys that were never written. Warmup enhancements — new BundleWarmupBatcher pre-warms the per-entity bundle cache (tags + certification) using batched tag_usage queries; relations stay lazy. CacheWarmupApp adds per-entity-type checkpoint resume, opt-in SETNX-based distributed claim for multi-instance deploys, and per-entity-type cache.warmup.coverage / cache.warmup.bundle.coverage gauges. CacheProvider gains scanCount; CacheMetrics gains coverage gauges and a completed_runs counter. cacheWarmupAppConfig.json drops dead consumerThreads/queueSize and adds warmBundles + enableDistributedClaim flags. Tests — BundleWarmupBatcherTest covers the batcher with mocked DAO/cache. TagRenameCacheIT and RelationshipCacheInvalidationIT cover the bug fixes end-to-end under the cache-tests / postgres-os-redis profile. * Update generated TypeScript types * Address PR review: claim ownership, config wiring, coverage, keyspace Six follow-ups against PR #27864: - releaseClaim now does a compare-and-delete: GET the owner first and only DEL when it matches our instanceId. Previously a 10-min TTL could expire mid-warm, another instance could claim, and we'd blindly delete its lock. - warmBundles / enableDistributedClaim are now read from the user-supplied app config map (matching the cacheWarmupAppConfig.json schema fields) instead of being JVM-system-property-only. System properties remain a fallback for bootstrap / unedited records. - Per-entity-type checkpoint and claim keys now embed cacheConfig.redis .keyspace so two environments sharing one Redis with different keyspaces no longer collide on warmup metadata. - reportCoverage now prefers scanCount over current-run success delta when available, so resumed runs report end-state coverage instead of the artificially low partial number. - CacheWarmupApp class Javadoc updated to reflect that bundle pre-warm is on by default; intentionally-not-done text was stale. - RelationshipCacheInvalidationIT Javadoc clarifies that the assertion is on the cacheable table side; Domain itself is in UNCACHED_ENTITY_TYPES. - CacheProvider.scanCount Javadoc documents the O(DBSIZE) cost so callers don't drop it on the request path. - EntityRepository.invalidateCacheForTaggedEntities Javadoc strengthens the search-lag tradeoff section with the actual fallback (entity TTL). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * PR review: accept string and Boolean forms for warmup config flags readAppConfigFlags previously only accepted Boolean.TRUE / Boolean.FALSE. Depending on how the app config arrives (typed POJO, raw JSON, YAML env-var override, API string body) the same logical value can land as a "true"/"false" string — and the instanceof Boolean check would silently ignore it and fall back to the JVM system property. Operators setting the flag in the UI would see no effect. Added a small parseBooleanFlag helper that handles both shapes. |
||
|
|
457afabcf0
|
MINOR - React and Java telemetry (#27773)
Some checks are pending
Integration Tests - MySQL + Elasticsearch / Detect Changes (push) Waiting to run
Integration Tests - MySQL + Elasticsearch / integration-tests-mysql-elasticsearch (push) Blocked by required conditions
Integration Tests - PostgreSQL + OpenSearch / integration-tests-postgres-opensearch (push) Blocked by required conditions
Integration Tests - PostgreSQL + OpenSearch / Detect Changes (push) Waiting to run
Java Checkstyle / java-checkstyle (push) Waiting to run
Maven Collate Tests / maven-collate-ci (push) Waiting to run
OpenMetadata Service Unit Tests / Detect Changes (push) Waiting to run
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (mysql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (postgresql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / k8s_operator-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests-status (push) Blocked by required conditions
Publish Package to Maven Central Repository / publish-maven-packages (push) Waiting to run
* chore: added sentry for UI * chore: added sentry for UI * chore: added sentry for UI * chore: added sentry for UI and backend * chore: move setup to config * Update generated TypeScript types * chore: addressed CI comments * chore: handle no ui startup in IndexResource --------- Co-authored-by: IceS2 <pablo.takara@getcollate.io> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> |
||
|
|
368fae160b
|
Revert "Feature #18173: Version API Improvements" (#26307) (#27837)
Some checks are pending
Integration Tests - MySQL + Elasticsearch / Detect Changes (push) Waiting to run
Integration Tests - MySQL + Elasticsearch / integration-tests-mysql-elasticsearch (push) Blocked by required conditions
Integration Tests - PostgreSQL + OpenSearch / Detect Changes (push) Waiting to run
Integration Tests - PostgreSQL + OpenSearch / integration-tests-postgres-opensearch (push) Blocked by required conditions
Java Checkstyle / java-checkstyle (push) Waiting to run
Maven Collate Tests / maven-collate-ci (push) Waiting to run
OpenMetadata Service Unit Tests / Detect Changes (push) Waiting to run
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (mysql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (postgresql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / k8s_operator-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests-status (push) Blocked by required conditions
Publish Package to Maven Central Repository / publish-maven-packages (push) Waiting to run
* Revert "Feature #18173: Version API Improvements, Last x versions order by desc, versions from specific timeline, versions for specific metadata changes, sdk support and UI integration (#26307)"
This reverts commit
|
||
|
|
46390da897
|
fix(athena): ingest Iceberg table properties from $properties metatable (#27715) | ||
|
|
7140502804
|
feat(nlq): add modelId to OpenAI NLQ configuration (#27789)
* feat(nlq): add modelId to OpenAI NLQ configuration Mirrors the bedrock provider, which already exposes a modelId for query transformation alongside its embeddingModelId. The OpenAI provider only had embeddingModelId, leaving the chat-completions model hardcoded in the client. Adding modelId here lets operators pick the chat model (e.g. gpt-4o-mini, gpt-4o) per deployment without code changes. * Update generated TypeScript types --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> |
||
|
|
b582ecc81c
|
sap-success-factors (#27664) | ||
|
|
88c44502ae
|
feat: Add auto-classification support for storage service containers (#26495)
Some checks failed
Java Checkstyle / java-checkstyle (push) Waiting to run
Maven Collate Tests / maven-collate-ci (push) Waiting to run
OpenMetadata Service Unit Tests / Detect Changes (push) Waiting to run
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (mysql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (postgresql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / k8s_operator-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests-status (push) Blocked by required conditions
Integration Tests - MySQL + Elasticsearch / Detect Changes (push) Has been cancelled
Integration Tests - PostgreSQL + OpenSearch / Detect Changes (push) Has been cancelled
Publish Package to Maven Central Repository / publish-maven-packages (push) Has been cancelled
Integration Tests - PostgreSQL + OpenSearch / integration-tests-postgres-opensearch (push) Has been cancelled
Integration Tests - MySQL + Elasticsearch / integration-tests-mysql-elasticsearch (push) Has been cancelled
* Add schema support for container auto-classification Extend container entity schema to support sample data storage, enabling PII detection and classification workflows on storage service containers. Changes: - Add sampleData field to container.json for storing sample data - Create storageServiceAutoClassificationPipeline.json schema defining configuration for storage service auto-classification pipelines - Update workflow.json to include StorageServiceAutoClassificationPipeline as a supported pipeline type This provides the schema foundation for running auto-classification workflows on S3, GCS, and other storage service containers. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Add backend support for container sample data and classification Implement Java backend functionality to handle sample data ingestion, storage, and PII masking for container entities. Changes: - ContainerRepository: Add sample data retrieval and storage operations - EntityRepository: Extend sample data support to container entities - ContainerResource: Add REST endpoint for container sample data ingestion - PIIMasker: Extend PII masking to support container entities This enables the backend to process and store sample data from storage service containers and apply PII masking rules during data retrieval. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Extend classifiable entity types to include containers Add Container to the ClassifiableEntityType union, enabling PII detection and auto-classification workflows to process storage service containers alongside database tables. Changes: - Update ClassifiableEntityType from Table-only to Union[Table, Container] - Import Container entity type - Update module docstring to reflect current support This type extension allows the PII processor to handle both database tables and storage containers uniformly. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Add container sample data ingestion to OpenMetadata API Implement container-specific API mixin for sample data operations and integrate it into the main OpenMetadata client. Changes: - Add OMetaContainerMixin with ingest_container_sample_data method - Handle binary data encoding (base64) and serialization errors - Register mixin in OpenMetadata class hierarchy - Mirror table sample data ingestion patterns for consistency This provides the Python API layer for ingesting sample data from storage service containers into OpenMetadata. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Implement storage service samplers for S3 and GCS Add sampler implementations for storage services to extract sample data from structured containers (Parquet, CSV) for auto-classification. Changes: - Create base StorageSamplerInterface for storage service sampling - Implement S3Sampler for AWS S3 containers with structured file support - Implement GCSSampler for Google Cloud Storage containers - Support column extraction and data sampling for structured formats - Handle dataModel-based column definitions from containers Storage samplers read container metadata, fetch file contents, and generate sample datasets for downstream PII detection. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Update PII processor to support container entities Extend the base PII processor to handle both Table and Container entities with unified column extraction logic. Changes: - Add _get_entity_columns helper to extract columns from Table or Container - Handle Container entities with optional dataModel.columns structure - Improve column matching with safe fallback for missing columns - Use generic entity reference in error reporting - Add early return when entity has no columns to process This enables PII detection to run on storage containers the same way it processes database tables. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Add storage service support to sampler processor Extend the sampler processor to handle both database and storage service entities with appropriate sampler class selection. Changes: - Detect service type from source config (Database vs Storage) - Import StorageServiceAutoClassificationPipeline - Handle both Table and Container entity types in _run method - Add column validation for Container entities (via dataModel.columns) - Create storage-specific sampler interfaces for S3 and GCS - Update sampler_interface to support Container entities - Improve error messages with entity type context The processor now dynamically selects database or storage samplers based on the pipeline configuration type. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Add storage fetcher strategy for container classification Implement fetcher strategy pattern for storage services to retrieve containers for auto-classification workflows. Changes: - Add StorageFetcherStrategy to handle storage service entity fetching - Update EntityFetcher to select appropriate strategy based on service type - Support both DatabaseService and StorageService in strategy selection - Import StorageService type for service detection - Improve error messages with specific service type information The fetcher now dynamically creates database or storage-specific strategies to retrieve entities based on pipeline configuration. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Register auto-classification pipeline in storage service specs Add AutoClassification pipeline support to S3 and GCS storage service specifications, enabling UI and workflow registration. Changes: - Add AutoClassification to S3ServiceSpec supported pipelines - Add AutoClassification to GCSServiceSpec supported pipelines - Import StorageServiceAutoClassificationPipeline in both specs This registers the auto-classification workflow type for storage services in the ingestion framework's service registry. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Add container support to metadata sink and patch operations Extend metadata sink and patch mixin to handle container entities, enabling sample data ingestion and tag updates for containers. Changes: - Add Container to MetadataRestSink entity type handling - Implement container sample data ingestion in sink._run - Add Container to PatchMixin tag operations - Import Container entity type in both modules This completes the metadata ingestion pipeline by allowing the sink to persist sample data and classification tags for container entities. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Update classification workflow for storage service support Extend the auto-classification workflow to handle both database and storage service pipelines with unified step orchestration. Changes: - Import StorageServiceAutoClassificationPipeline - Add type checking for both Database and Storage pipeline configs - Remove unnecessary cast, use direct type checks - Add validation warning for unsupported config types - Preserve enableAutoClassification flag behavior for both types The workflow now supports running PII detection and classification on both database tables and storage containers based on config type. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Add unit tests for container classification components Add test coverage for container-specific fetcher and sampler components. Changes: - Add test_container_fetcher.py for StorageFetcherStrategy tests - Add test_container_sampler_processor.py for container sampler tests Tests validate: - Storage service fetcher strategy selection and instantiation - Container sampler processor initialization and execution - Proper handling of Container entities vs Table entities 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Reorganize integration tests by entity type Restructure auto-classification integration tests into separate directories for databases and containers to improve organization. Changes: - Move database classification tests to databases/ subdirectory - Move conftest.py, init.sql, and test_tag_processor.py into databases/ - Container tests already organized in containers/ subdirectory - Remove old flat test structure This organization makes it clearer which tests target database entities vs storage container entities in classification workflows. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Properly retrieve sample data * Update generated TypeScript types * Apply Gitar bot * Fix tests * feat: Add supportsProfiler to storage connection schemas Add supportsProfiler field to storage connection schemas (S3, GCS, ADLS, Custom Storage) to enable auto-classification pipeline support for storage services. This aligns with the backend changes in PR #26495 that added container auto-classification functionality. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * feat: Add UI support for storage service auto-classification - Update IngestionWorkflowUtils to route storage services to storage-specific auto-classification schema - Modify getSupportedPipelineTypes to filter pipeline types based on service category (storage services only show AutoClassification, not Profiler) - Update AddIngestionButton to pass serviceCategory parameter - Add unit test to verify storage services only get AutoClassification option This enables users to configure and run auto-classification agents on storage services (S3, GCS, ADLS) for PII detection on containers. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: Add BucketArn field to S3BucketResponse model AWS S3 API now returns a BucketArn field in list_buckets() responses. Add this optional field to prevent Pydantic extra_forbidden validation errors. Error: BucketArn Extra inputs are not permitted [type=extra_forbidden] 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: Add Container permissions to AutoClassificationBotPolicy Add Container entity permissions to AutoClassificationBotPolicy to allow the autoClassification-bot to apply tags and sample data to storage containers. Previously, the bot only had permissions for Table entities, causing permission denied errors when running auto-classification on storage services. Changes: - Add Container rule with EditAll and ViewAll operations to policy seed data - Create migrations for MySQL and PostgreSQL to update existing installations Error fixed: Principal: CatalogPrincipal{name='autoclassification-bot'} operations [EditTags] not allowed 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Update generated TypeScript types * fix: Add fallback for storage service type detection in sampler Add fallback logic to detect storage services by source type name when the pipeline config type check fails. This handles cases where the Airflow environment might not have the updated schema/package with StorageServiceAutoClassificationPipeline. Changes: - Add fallback detection for s3, gcs, azuredatalake, customstorage - Add debug logging for service type detection - Preserve primary instanceof check for proper type detection This fixes the "No module named 'metadata.ingestion.source.database.gcs'" error when running storage auto-classification pipelines. * Guide to support new entities in classification agent * docs: Update auto-classification guide with debugging learnings Add critical troubleshooting information discovered during container classification debugging: 1. storeSampleData defaults to false - Sample data NOT ingested unless explicitly enabled - Document why this is by design (avoid large datasets) - Add troubleshooting steps to verify flag is set 2. Service type detection fallback pattern - Explain why fallback is needed (Airflow package caching) - Show complete implementation with source type lists - Add debug logging pattern 3. Troubleshooting section - Sample data not appearing: check storeSampleData, database, logs - Module import errors: service type detection issues - PII tags not applied: config and data issues 4. Common pitfalls additions - Emphasize storeSampleData default value - Service type detection in cached environments These updates reflect real debugging scenarios and will help future developers avoid the same issues. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Apply gitar bot suggestions * Fix suggestions, linting, and SonarCloud issues * More gitar bot suggestions * Fix compile error * Fix linting * Fix broken tests * Fix unorganized import * Improve config parsing This is so that we rightly discover polymorphic properties of `source` when the config does not provide enough fields for Pydantic to correctly discriminate between models (e.g: confusing database source config with storage source config) * Gitar bot comment * Fix s3 source test * Apply comments from reviews * Extract cantidate column logic in samplers * Fix tests * Fix container customization test --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> |
||
|
|
1c8b30002d
|
Add streamDeadlineMinutes to AiPlatform gRPC config (#27693)
* add grpc timeout var * Update generated TypeScript types * enhance description * Update generated TypeScript types * simplify description --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> |
||
|
|
5d9dbfa2d1
|
fix(domains): add dryRun support to bulk asset add/remove for domain warnings (#27141)
* fix(domains): add dryRun support to bulk asset add/remove for domain warnings (#22673) Adds a `dryRun` boolean field to `BulkAssets` so callers can preview the impact of moving or removing assets from a domain before committing the change. When `dryRun=true` on PUT /v1/domains/{name}/assets/add: - Returns per-asset impact messages describing which domain the asset would be moved from and which data product relationships would be lost - No writes are performed When `dryRun=true` on PUT /v1/domains/{name}/assets/remove: - Returns per-asset messages stating the asset will be removed and which data product relationships would also be broken - No writes are performed The per-asset message includes the entity type (table, dashboard, etc.) so the UI can show a precise warning modal before the user confirms. Changes: - openmetadata-spec: added `dryRun` field to bulkAssets.json schema - DomainRepository: bulkAssetsOperation respects dryRun flag, computes and returns impact via buildDryRunImpactResponse/buildDryRunImpactMessage - Integration tests: DomainBulkAssetsDryRunIT covers add/remove dryRun with and without data product side effects, and verifies no writes occur Fixes #22673 * Update generated TypeScript types * fix: address PR review comments for dryRun bulk assets - Return informative messages for all dryRun add cases: first-time add, already-in-target-domain, and domain move (fixes empty message issue) - Extract shared filterDataProductsByDomain to deduplicate logic between getAffectedDataProductsForDryRun and cleanupDataProducts - Add test for first-time dryRun add (asset with no existing domain) - Run mvn spotless:apply Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(ui): warn before moving assets between domains When adding assets to a domain, first call the new dryRun endpoint to preview the impact. If the move would reassign assets from another domain or break data-product links, surface a confirmation modal with the backend-provided impact messages before committing the change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(ui): wire dryRun confirmation on domain asset remove Extend the dryRun warning modal to the asset-remove path (previously only on add) so users see when removing an asset will break data product relationships. Extract the shared impact-detection helper into DomainDryRunUtils so add/remove paths use the same matcher. Add Playwright coverage for both flows: warning modal appears on impactful changes, cancel aborts without committing, and commit fires automatically when dryRun reports no impact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api): add hasSideEffects flag to BulkResponse for dryRun previews Add a structured hasSideEffects boolean to the BulkResponse schema so clients can detect impactful dryRun outcomes (cross-domain moves, broken data product relationships) without parsing the human-readable message text. DomainRepository.bulkAssetsOperation sets the flag when a preview detects a real side effect, and the integration tests now assert the flag directly. The UI util drops its English phrase matcher in favor of filtering successRequest by hasSideEffects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Update generated TypeScript types * refactor(ui): extract DomainAssetDryRunModal with Untitled UI styling Build a shared modal component for the domain asset dryRun warning, rendered with Untitled UI primitives (ModalOverlay/Dialog/Alert/Button) instead of the generic ConfirmationModal. The new modal surfaces: - a warning Alert at the top explaining that changes will have side effects - a structured per-asset row with entity-type icon, clickable FQN link, and the backend impact message - an affected-count indicator in the footer alongside cancel/confirm buttons Both add (AssetsSelectionModal) and remove (AssetsTabs) paths use the same component, passing their own confirm label, header, and testid. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ui): align dryRun modal footer buttons on a single row Dialog.Footer hard-codes a 2-column grid for its children. Passing the count plus two buttons as three siblings made the second button wrap onto a new line. Group Cancel and Confirm in one wrapper so the grid sees just two cells: count on the left, button row on the right. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ui): rename DomainAssetDryRunModal to follow component conventions Agent-Logs-Url: https://github.com/open-metadata/OpenMetadata/sessions/bbd40239-487d-4272-95dc-abb04d416693 Co-authored-by: siddhant1 <30566406+siddhant1@users.noreply.github.com> * fix(ui): mock DomainAssetDryRunModal in AssetsTabs tests AssetsTabs now renders DomainAssetDryRunModal, which pulls in react-aria-components via the core-components package. The core-components package ships its own React copy in its node_modules, so Jest resolves two React instances and ModalOverlay's useContext returns null on render -- breaking every AssetsTabs test regardless of isOpen. Mocking the modal in the test isolates the render from that dependency graph. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(e2e): drive dryRun confirmation modal in domain asset remove spec DomainDataProductsWidgets "assets are removed" clicks delete on a topic that is also attached to a data product, so the new bulk-asset flow pops the DomainAssetDryRunModal instead of removing immediately. The spec now waits for the dryRun preview response, confirms the modal via "Remove Anyway", and then waits for the real remove call. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ui): give DomainAssetDryRunModal a dedicated testid The modal was reusing data-testid="confirmation-modal", which is already set on ConfirmationModal, TeamHierarchy, UserTab, and GlossaryTermTab. Tests had to scope it with hasText filters on the English title to disambiguate. Renames it to domain-dry-run-modal and updates the Playwright specs (Domains, DomainAdvanced, DomainDataProductsWidgets) to select the modal directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Siddhant <siddhant@MacBook-Pro-621.local> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: siddhant1 <30566406+siddhant1@users.noreply.github.com> |
||
|
|
df9e5e0a58
|
Fivetran improvements (#27270)
* Fix Fivetran connector: stage processor error reporting, messaging service names, and schema support
- Report failures to status when stage processor throws an exception in topology_runner
- Add get_messaging_service_names() to PipelineServiceSource for messaging lineage support
- Add messagingServiceNames to pipelineServiceMetadataPipeline JSON schema
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add task types and pipeline execution status to Fivetran connector
- Set taskType="sync" on Fivetran pipeline tasks
- Implement yield_pipeline_status() to derive execution history from
succeeded_at/failed_at timestamps in connector details
- Add failed_at to mock dataset for test coverage
- Add tests for task type, status with both/one/no timestamps
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Enhance Fivetran connector with destination DB log queries, ELT task phases, and lineage fixes
The Fivetran REST API sync-history endpoint has very limited retention
(entries age out within hours). This change queries the destination
warehouse's fivetran_metadata.log table directly for comprehensive sync
history with accurate per-phase timing, falling back to the REST API
when the destination DB is unavailable.
Key changes:
Destination DB sync history:
- Resolve the destination warehouse's DatabaseService from the service
registry using the existing dbServiceNames lineage configuration
- Query fivetran_metadata.log table with sqlglot-generated quoted
identifiers (dialect-aware for Snowflake uppercase, Postgres lowercase)
- Parse LOG events (sync_start, extract_summary, write_to_table_start/end,
sync_end, sync_stats) to derive per-phase timing and status
- Graceful fallback to REST API on any failure (unsupported destination
type, missing service, query error)
- Time-bounded queries (90-day retention) to avoid unbounded fetchall()
ELT task phases (Extract → Process → Load):
- Replace single "sync" task with three distinct pipeline tasks
representing Fivetran's ELT phases
- Each task has independent timing and status derived from LOG events
- sync_stats durations used as fallback when intermediate events are
missing (e.g., incremental syncs with no data changes)
- Task DAG wiring via downstreamTasks for UI rendering
Lineage fixes:
- Fix service name fallback: change `or "*"` to `or []` to prevent
building FQNs with literal "*" as service name
- Resolve pipeline entity once per connector instead of per-table (N+1)
- Fetch destination details once per group instead of per-connector (N+1)
- Support messaging sources (Kafka, Confluent Cloud) with topic lineage
- Column-level lineage via Fivetran schema API
- Self-lineage prevention (source == destination entity)
Client robustness:
- Add null/type guards to run_paginator for None API responses
- Fix get_connector_details/get_destination_details to return {} instead
of None on failure
- Fix base64 token encoding to use .decode("ascii") instead of str()[2:-1]
- Fix type annotation from Optional[Response] to Optional[dict]
- Remove unused Response import
Other improvements:
- Display name shows source first: "postgres <> Snowflake"
- Task type "Process" (not "Transform") for the processing phase
- sourceUrl only on Pipeline, not on individual Task objects
- Add copyright headers to models.py and service_spec.py
- Add return type annotations to model properties
- 34 unit tests covering DB query path, fallback scenarios, lineage
resolution, column lineage, and task status building
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix review findings: null guard pipeline_entity, cascade extract failure, schedule interval edge cases
- Add null guard after pipeline_entity resolution in lineage yield to
prevent AttributeError crash when pipeline entity is not found
- Cascade extract failure to process/load status instead of reporting
false success when write events or sync_stats timestamps exist
- Handle non-hour-divisible schedule intervals (e.g. 90 min) and clamp
values >= 24 hours to daily cron
- Add 5 tests covering all three fixes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update generated TypeScript types for messagingServiceNames schema change
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix client null guards and replace deprecated datetime.utcnow()
- Add isinstance(response, dict) guards in get_connector_schema_details
and get_connector_column_lineage, consistent with get_destination_details
- Replace datetime.utcnow() with datetime.now(timezone.utc) for Python 3.12+
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix operator precedence, null-safe config access, and tz-aware datetime fallback
- Add parentheses around messagingServiceNames to fix operator precedence
when lineageInformation is None
- Use (dict.get("config") or {}) pattern to handle explicit null values
from the Fivetran API without AttributeError
- Use tz-aware datetime.min fallback in sync sorting to avoid TypeError
with tz-aware DB timestamps
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Apply Python formatting (black/isort) to changed files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix Python formatting to match CI black==22.3.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix destination service resolution and zero-change sync status
- Filter _resolve_destination_service by dest_service_type to avoid
resolving to the source database service instead of the destination
- Mark process_status as Successful for zero-change incremental syncs
where extract succeeds and sync_end is SUCCESSFUL but no write events
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Re-trigger CI for flaky WorkflowDefinitionResourceIT
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Address PR review feedback: SQLAlchemy reflection, constant cleanup, tests
- Revert out-of-scope topology_runner.py change (Harsha)
- Remove UNSUPPORTED_DESTINATION_TYPES — Fivetran Platform Connector
is available on all destinations including Databricks
- Move FIVETRAN_STATUS_MAP and HISTORICAL_SYNC_FIELDS to module level
- Replace sqlglot + raw SQL with SQLAlchemy MetaData.reflect() and
select(), use yield_per(100) for OOM protection
- Extract _try_parse_json helper to reduce nesting in _parse_sync_events
- Standardize StatusType enum usage (remove .value calls)
- Fix operator precedence in get_db_service_names/get_storage_service_names
- Add unit tests for schedule interval edge cases, malformed JSON,
multi-sync parsing, fallback task statuses
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Revert test_runner.py test for reverted topology_runner change
Remove TestRunStageProcessorErrorReporting test class since the
corresponding topology_runner.py status.failed() change was reverted
as out of scope for this PR.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix Python formatting (black)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add test coverage; address review comments
* Add test coverage; address review comments
* Update generated TypeScript types
* Address review comments
* Address comments
* Add test coverage
* Address comments
* chore: remove connector-audit prompt file (deleted upstream in main)
Co-authored-by: harshach <38649+harshach@users.noreply.github.com>
* Address comments for loading all events
* Fix sonar warnings
---------
Co-authored-by: Aydin Geeringh <aydingeeringh@Mac.lan>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Aydin Geeringh <47472853+aydingeeringh@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: harshach <38649+harshach@users.noreply.github.com>
|
||
|
|
6f8d107bee
|
fix: added description to paging field to not overwite python import (#27673)
* fix: added description to paging field to not overwite python import * Update generated TypeScript types --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> |
||
|
|
51ecf4502f
|
Task redesign (#25894)
* Task Redesign: Add Task entity & tests * Task Redesign: Add Task entity & tests * Task Redesign: Add Permissions checks for Task APIs * Task UI changed to the new APIs * Migrate UI and APIs to new tasks system inlcuding suggestions * Add Suggestions integration * Activity Feed Refactor * ActivityFeed -> ActivityStream publisher * Activity Feed redesign * Activity Feed redesign, adding tests * Incident Manager update * Migrate Incidents to new tasks * Migrate Incidents to new tasks * Update generated TypeScript types * Update generated TypeScript types * feat(tasks): add domain-aware task cutover and workflow v2 migration * test(tasks): cover domain filters and task feed visibility flows * Address comments * Fix workflow tests to use new Task entity API and fix UserApprovalTaskV2 candidate transformation Migrated 9 WorkflowDefinitionResourceIT tests from legacy Feed/Thread API to the new Task entity API (UserApprovalTaskV2 creates Task entities, not Thread entities). Fixed a bug in UserApprovalTaskV2 where candidates were passed as raw EntityReferences instead of being transformed into users/teams FQN arrays for SetApprovalAssigneesImpl. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix tests * refactor: stabilize task entity workflows * refactor: finish task entity cutover and activity migration * refactor: migrate legacy thread feed during cutover * refactor: split legacy thread rename and archive migrations * Merge main; fix tests * Update generated TypeScript types * feat: advance task redesign through phase 2 * Merge main; fix tests * Update generated TypeScript types * Fix failing tests * Update generated TypeScript types * fininsh phase 6 of the design, configurable task forms * Update generated TypeScript types * Update generated TypeScript types * Fix linting * Address gitar comments * Address gitar comments * Fix build * Address giar comments * fix build * Add task custom forms * Fix tests * Address tests * Apply UI lint autofixes * Fix tess * Fix linter * Fix task patching * Fix tests * Fix playwright tests * fix java checkstyle * Add python sdk support for tasks, annoucements * Fix playwright tests * Fix playwright tests * Fix playwright tests * Fix python tests * Fix python tests * Fix linting workflows * fix pycheck * fix pycheck * Fix tests * Fix build * Address deviations from main and fix tests * Fix integration tests * Fix integration tests * Fix integration tests * Update generated TypeScript types * Fix Playwright tests * Fix Playwright tests * feat(incident): wire incident manager to task-first architecture (#27369) * feat(incident): wire incident manager to task-first architecture Connect the incident manager to the task redesign so it works end-to-end: resolve data persistence, backward transitions, reopen from resolved, and incident discovery via TCRS. * Update generated TypeScript types * refactor: single-query incident task lookup with parameterized statuses Replace two sequential queries (Open, InProgress) in getOrCreateIncident with one findByAboutAndTypeAndStatuses query using @BindList for status IN (...). --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * Fix Playwright tests * Update generated TypeScript types * Fix linter * Fix tests * Fix tests * Fix checkstyle * Fix tests * Fix checkstyle * Update FeedResourceIT.java * Update TableRepository.java * fix tests * Update ActivityFeedProvider.tsx * fix tests * fix tests * Address Task comments * Fix unit test * Fix the feed summary panel showing on landing page * Fix comment functionality * Fix pytests * Fix failing playwright tests * Fix test flakiness * Fix ui-checkstyle * Fix advanced search spec failure * Fix playwright tests Co-authored-by: Copilot <copilot@github.com> * Fix checkstyle * Fix the flaky tests Co-authored-by: Copilot <copilot@github.com> * fix checkstyle * Reduce the workflow polling * Update generated TypeScript types * skip failing tests Co-authored-by: Copilot <copilot@github.com> * Fix ui-checkstyle --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com> Co-authored-by: IceS2 <pablo.takara@getcollate.io> Co-authored-by: karanh37 <karanh37@gmail.com> Co-authored-by: Karan Hotchandani <33024356+karanh37@users.noreply.github.com> Co-authored-by: Copilot <copilot@github.com> |
||
|
|
b47c219954
|
FEATURE: Add SAP S4/HANA Dashboard Connector (#27242)
* FEATURE: Add SAP S4/HANA Dashboard Connector * Added ssl config * fix: verifySSL default to no-ssl and rename test connection file to sapS4Hana.json Co-authored-by: Akash Verma <138790903+akashverma0786@users.noreply.github.com> * Update generated TypeScript types --------- Co-authored-by: Akash Verma <akashverma@Akashs-MacBook-Pro-2.local> Co-authored-by: Gitar <noreply@gitar.ai> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> |
||
|
|
e4d3e423e1
|
Feature #18173: Version API Improvements, Last x versions order by desc, versions from specific timeline, versions for specific metadata changes, sdk support and UI integration (#26307)
* Feature #18173: Improve Version API, through paginatio, get x latest versions, specifict time, specific metadata changes * Feature #18173: Version API Improvements, Last x versions order by desc, versions from specific timeline, versions for specific metadata changes, sdk support and UI integration * Update generated TypeScript types * address comments * fix py check * Address comments * Address comments * Fix tests * Fix tests * Fix tests * Better way to lookup versions * Fix pytests * Fix tests * Address comments * chore(migrations): move version API schema additions from 1.13.0 to 1.12.7 Moves the PR's new entity_extension columns (versionNum, changedFieldKeys), indexes, and backfill scripts from the 1.13.0 migration directory into a new 1.12.7 directory. Keeps 1.13.0 identical to upstream main; only this PR's additions land in 1.12.7. Also updates MigrationSqlStatementHashTest to exercise the relocated files. * fix(versions): address CI failures and review feedback - testAPI.test.ts: update getTestCaseVersionList mock expectation to include the new params argument (APIClient.get is called with { params } since the function now supports limit/offset/fieldChanged). - PaginatedVersionHistory.spec.ts: replace banned networkidle waits and waitForSelector with web-first assertion on version-button visibility (satisfies playwright/no-networkidle and playwright/no-wait-for-selector). - EntityVersionTimeLine.tsx: implement infinite scroll via IntersectionObserver on a sentinel element at the bottom of the version list. Hooks up the onLoadMore/hasMore/isLoadingMore props that were in the interface but previously unused. - EntityVersionPage.component.tsx: fix stale-closure bugs in fetchMoreVersions (gitar-bot review). Use versionListRef for currentOffset and isLoadingMoreRef to gate concurrent invocations so IntersectionObserver double-firing does not cause duplicate appends. - EntityResource.java: accept offset > 0 with default limit when no fieldChanged is provided, so pagination params are no longer silently ignored (Copilot review). - datamodel_generation.py: raise explicit errors if generated files or expected replacement targets are missing, instead of silently succeeding when the generator output drifts (Copilot review). * fix(checkstyle): format Java, ESLint/Prettier on UI, relax datamodel_generation strict check - Java: spotless:apply on EntityResource.java (line-break formatting). - Python: relax datamodel_generation.py DIRECT_IMPORT_FIXES check — replacement targets are alternative forms the generator may or may not emit. Only require the final marker ('from .paging import Paging') is present after replacements; the prior strict per-target check broke 'make generate'. - UI lint: organize-imports, ESLint --fix, Prettier on all version-related files touched by the PR (resolves lint-src + lint-playwright CI checks). - EntityVersionTimeLine: guard IntersectionObserver effect with isLoadingMore so the observer is torn down while a fetch is in flight (Copilot review). - EntityVersionTimeline.test.tsx: add unit tests covering sentinel rendering conditions (hasMore, onLoadMore) and the isLoadingMore observer-guard (Copilot review). * fix(ui-checkstyle): prettier+eslint on EntityVersionTimeline.test.tsx Collapse import line and reorder JSX props (callbacks last) per repo lint rules. Reruns ui-checkstyle-changed caught these in the new test file from the previous commit. * test(playwright): address @aniketkatkar97 review on PaginatedVersionHistory spec - Add waitUntil: 'domcontentloaded' to every page.goto() call. - Wait for loaders (waitForAllLoadersToDisappear) before asserting the version-button to avoid racing the initial entity render. - Replace the manual { timeout: 15_000 } on versionSelectors.nth(1) with an explicit waitForResponse on the second paginated /versions call (offset > 0). This deterministically synchronises on the infinite-scroll fetch instead of a wall-clock timeout. * fix: address Copilot review — one-shot observer + local SQL splitter 1. EntityVersionTimeLine.tsx: call observer.unobserve(entry.target) as soon as the sentinel first intersects so onLoadMore fires only once per attached observer. The effect reattaches a fresh observer after isLoadingMore flips back to false, so subsequent pages still load — we just no longer rely on the parent's in-flight ref as the sole stopgap against repeated fires for the same page. 2. MigrationSqlStatementHashTest.java: replace Flyway's non-public org.flywaydb.core.internal.* parser classes with a small, local SQL statement splitter. Handles line (--) and block comments, single-, double-, and backtick-quoted strings, backslash escapes, and doubled- quote escapes. Removes a brittle dependency on Flyway internals that could break on upgrades. Tested: - mvn test -pl openmetadata-service -Dtest=MigrationSqlStatementHashTest → 2 tests pass. - yarn test EntityVersionTimeline.test.tsx → 8/8 tests pass. --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: sonika-shah <sonika-shah@users.noreply.github.com> Co-authored-by: mohitdeuex <mohit.y@deuexsolutions.com> Co-authored-by: sonika-shah <sonikashah94@gmail.com> Co-authored-by: Mohit Yadav <105265192+mohityadav766@users.noreply.github.com> |