Accidentally re-introduced by a merge with the pre-amend commit. The active
.java test is unchanged; this only deletes the dead duplicate copy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The new it/tests/search/*IT.java tests perform server-global, destructive
search operations on the shared test server — full reindex with
recreateIndex=true (ReindexStopUnderLoadIT), OpenSearch container pause
(LiveIndexRetryIT via EsOutageInjector), and alias swaps / reindex triggers
(ReindexAliasSwapIT, NoDuplicatesDuringReindexIT, DbToEsCountReconciliationIT,
ReindexStatsIT, IndexFieldExplosionIT, IndexMappingTemplatesIT,
NestedHierarchyIndexIT).
They guarded themselves only with @ResourceLock("SEARCH_INDEX_APP"/
"SEARCH_INDEX_RETRY"), which serializes them against each other but not
against the hundreds of entity-CRUD ITs that declare no such lock. Running
concurrently, a reindex/recreate or ES pause wiped or hid the freshly-indexed
documents of every other in-flight test, producing board-wide "Entity should
be present in search index" 3-minute timeouts.
Move them into the existing sequential-tests execution (forkCount=1,
parallel.enabled=false) and exclude them from parallel-tests across all CI
profiles, mirroring the existing tests/search/scale handling. The glob
**/tests/search/*IT.java matches the package directly and not the already
excluded scale/ subpackage.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The new it/tests/search/*IT.java tests perform server-global, destructive
search operations on the shared test server — full reindex with
recreateIndex=true (ReindexStopUnderLoadIT), OpenSearch container pause
(LiveIndexRetryIT via EsOutageInjector), and alias swaps / reindex triggers
(ReindexAliasSwapIT, NoDuplicatesDuringReindexIT, DbToEsCountReconciliationIT,
ReindexStatsIT, IndexFieldExplosionIT, IndexMappingTemplatesIT,
NestedHierarchyIndexIT).
They guarded themselves only with @ResourceLock("SEARCH_INDEX_APP"/
"SEARCH_INDEX_RETRY"), which serializes them against each other but not
against the hundreds of entity-CRUD ITs that declare no such lock. Running
concurrently, a reindex/recreate or ES pause wiped or hid the freshly-indexed
documents of every other in-flight test, producing board-wide "Entity should
be present in search index" 3-minute timeouts.
Move them into the existing sequential-tests execution (forkCount=1,
parallel.enabled=false) and exclude them from parallel-tests across all CI
profiles, mirroring the existing tests/search/scale handling. The glob
**/tests/search/*IT.java matches the package directly and not the already
excluded scale/ subpackage.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The util clicked the Delete Property confirm save-button and returned
immediately. Custom property deletion is implemented in CustomPropertiesPageV1
as a JSON-patch on the Type entity (rest/metadataTypeAPI.ts updateType ->
PATCH /api/v1/metadata/types/{id}). While that PATCH was in flight the
ConfirmationModal stayed mounted with its Confirm button in a `loading`
state, leaving an <ant-modal-wrap ant-modal-centered> in the DOM that
intercepted pointer events. Callers like GlossaryImportExport.spec.ts loop
through deleteCreatedProperty + settingClick(GLOSSARY_TERM); on slow AUT
runs the leftover modal mask intercepted the next iteration's click on
[data-testid="app-bar-item-settings"] and the 180s test budget was burned
waiting for the sidebar item to become actionable. PR #27952 addressed the
wrong modal; the trace in nightly run aut/26261444729 shows the visible
dialog is the Delete Property confirm, not the version-history drawer.
Await the PATCH 200 on /metadata/types/ and assert the modal's body-text
has unmounted. ConfirmationModal uses destroyOnClose, so body-text detach
is the cleanest signal that the mask is gone. save-button cannot be used
for the detach assertion because its testid briefly swaps to loading-button
while the PATCH is in flight.
Co-authored-by: Siddhant <siddhant@MacBook-Pro-751.local>
* fix(search): index usageSummary so reindex preserves Explore weekly-usage sort
TableIndex.getRequiredReindexFields() declared "columns" but not
"usageSummary". usageSummary is fields-gated in TableRepository
(clearFields nulls it unless requested), so the reindex path — which
fetches only the declared required fields — dropped it from the table
search document's _source. Explore's "Sort by Weekly Usage" reads
_source.usageSummary.weeklyStats.count, so it silently broke after any
full reindex even though live-served docs looked fine.
Add "usageSummary" to the required reindex field set so the reindexed
document carries it, matching the live entity payload.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(search): update search index on usage report (live path)
Usage is recorded via direct DAO writes in UsageRepository, bypassing
EntityRepository.update — so the entity-lifecycle SearchIndexHandler
never fires, and a reported usage never reached the search document.
The search doc kept a stale/absent usageSummary until the next full
reindex, so Explore "Sort by Weekly Usage" didn't reflect freshly
reported usage live.
After recording usage, push the refreshed entity into the search index
(updateEntity re-fetches with all fields, so usageSummary is included).
Table usage rolls up to its schema + database, so refresh those docs
too. Search failures are logged, not propagated — the usage write is
already committed.
Together with the reindex-fields change this keeps usageSummary present
in _source on both the live and reindex paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(search): use index required fields for usage reindex, not "*"
The live-path usage search update called SearchRepository.updateEntity,
which re-fetches the entity with getFields("*"). For the rolled-up
database/schema that hydrates every child table — tens of thousands on
large catalogs — risking OOM, the exact over-fetch the reindex-fields
work exists to avoid.
Fetch each affected entity (table, schema, database) with only its
index's required reindex fields via ReindexingUtil.getSearchIndexFields,
then updateEntityIndex(entity) directly. Mirrors the reindex pipeline's
field selection and keeps the rollup cheap and bounded.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(search): bound updateEntity field fetch to index required fields
SearchRepository.updateEntity(EntityReference) re-fetched the entity with
getFields("*") before re-indexing. This runs on every live entity update
(SearchIndexHandler.onEntityUpdated) and tag-propagation refresh, for all
entity types. For container entities (database/schema) "*" hydrates every
child — tens of thousands of tables on large catalogs — so a single live
update could OOM the server. That's a far bigger blast radius than the
usage path alone.
Fetch only the fields the entity's search index declares as required
(searchIndexFactory.getReindexFieldsFor) — the same set the reindex
pipeline uses — so live updates and reindex stay consistent and bounded.
Reverts the per-call workaround in UsageRepository.updateUsageInSearch
back to plain updateEntity calls, now that updateEntity itself is bounded.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(usage): switch expression + drop schema/database live refresh
Address PR #28350 review:
- addUsage: convert the break/mutable-variable switch to a Java 21
switch expression (no mutable response, no break boilerplate).
- updateUsageInSearch: only refresh the reported entity's search doc.
Dropped the cascade to the rolled-up schema + database — usage
reporting can be high-volume and the table doc is the surface that
matters ("Sort by Weekly Usage"); schema/database usageSummary
reconciles on the next reindex. This also removes the redundant
second table fetch and the unguarded schema/database refs the bots
flagged. The single updateEntity call is bounded (required fields).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Use safe list
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(alerts): guard isBot() against deleted users to avoid aborting batches
AlertsRuleEvaluator.isBot() called Entity.getEntityByName with
Include.NON_DELETED without catching EntityNotFoundException. When a
change event's userName referenced a user that had been deleted (common
for short-lived test fixtures torn down by afterAll), the exception
escaped the SpEL filter and was caught in AbstractEventConsumer.execute
as "Error in polling events for alert : Entity not found: user ...".
That outer catch logs the error and lets the finally block advance the
offset by batchSize, silently dropping every other event in the batch.
The ActivityFeedAlert subscription was hit hardest because every event
flows through its isBot() filter rule.
Catch EntityNotFoundException and treat an unresolvable actor as not-a-bot.
* fix(alerts): null-safe getIsBot() + IT + unblock ActivityAPI spec
- AlertsRuleEvaluator.isBot(): use Boolean.TRUE.equals(user.getIsBot())
so a null isBot field on the resolved user doesn't NPE on auto-unbox.
- Add AlertsRuleEvaluatorResourceIT#test_isBot_returnsFalseWhenActorUserDeleted
covering the original bug — create a user, evaluate isBot() (false),
delete the user, evaluate isBot() again and assert it still returns
false instead of throwing.
- Remove test.describe.fixme from ActivityAPI.spec.ts so the spec runs
again now that the underlying batch-abort is fixed.
---------
Co-authored-by: Siddhant <siddhant@MacBook-Pro-751.local>
POST/PUT-create with column.extension inlined in the request body never
wrote a row to entity_extension — the data only landed in the table JSON,
while every reader (and the PATCH original reload) looks in entity_extension
via getColumnExtension(). The override clobbered the inline data with null
on every read, leaving the Custom Properties tab blank and causing any
JsonPatch op walking /columns/N/extension/... to fail with "no mapping for
the name 'extension'" on the reloaded original.
One shared single-column writer for both create and update paths:
- Add EntityRepository.storeColumnExtension(UUID, Column) — the upsert that
used to live inside EntityUpdater.updateColumnExtension, lifted out so the
create paths can reach it.
- Add EntityRepository.storeColumnExtensions(UUID, List<Column>) — single
entity walker that flattens via EntityUtil.getFlattenedEntityField and
calls the primitive per column.
- Add EntityRepository.getColumnsForExtensionPersistence(T) hook overridden
in TableRepository and DashboardDataModelRepository to return getColumns().
- Wire into createNewEntityFlush alongside storeExtension.
Update path simplifications:
- updateColumns now uses the shared storeColumnExtension primitive.
- updateColumnExtension (the EntityUpdater-private duplicate writer) is
deleted.
- Existing-column branch skips the upsert when stored.extension equals
updated.extension — previously every update pass rewrote identical JSON
for every column with a non-null extension.
- Added-columns branch now persists extensions via the walker. updateColumns'
main loop skips columns with no stored match, so inline column.extension
on a freshly-added column was silently dropped on PUT-update too.
Regression coverage in ColumnCustomPropertiesIT covers both tableColumn
and dashboardDataModelColumn inline-on-create persistence, plus the
PUT-update-adds-column path.
* feat(tasks): policy-driven authorization with self-approval guard
Moves Task resolve/close/reassign authorization from ~150 lines of custom
Java in TaskRepository into the policy engine. Adds ResolveTask, CloseTask,
ReassignTask MetadataOperation values, isTaskFiler/isTaskAssignee/isTaskReviewer
SpEL conditions, and a new TaskAuthorPolicy seed. Closes the self-approval
gap where a filer who was also in the assignees list could approve their
own task (now denied via deny rule). TaskResourceContext.getOwners now
returns target entity owners so isOwner() retains its conventional meaning;
v200 migration backfills the new policy attachment on the DataConsumer role
for upgrades.
* MCP Tool Usage
* Update generated TypeScript types
* Address PR review feedback on MCP usage tracking
Reorder UA heuristic so VS Code wins over Claude CLI for composite
User-Agents, refactor to a predicate list, and sanitise the resolved
client name (trim, strip control chars, cap at 64 chars). Bound the
schema field to match.
Bound the latency aggregation lists in McpUsageResource with reservoir
sampling so summary/per-tool percentile estimates stay valid without
unbounded heap growth. Skip null-timestamp rows in the history loop and
update the stale /history Swagger description to reflect the ok/fail
shape. Convert CallToolOutcome to a Java record and update the recorder
flow to use accessor methods.
Fix the pre-existing regression in McpImpersonationTest where the mock
still wired the legacy callTool path. Add DefaultToolContextTest with
direct coverage for classifyException (all four ErrorCategory buckets,
cause-chain walk, null message in chain) and the unknown-tool outcome.
* fix(reindex): batch-prefetch upstream lineage off doc-build threads
Reindex doc-build executor runs 50 virtual workers each calling
`getLineageData -> findFrom` per entity, holding a Hikari connection
for the duration. On JDK 21 + HikariCP 7's synchronized borrow path,
those virtual threads get pinned to carriers and stall for >60s, which
fires the connection leak detector and freezes the reindex.
SelectiveFieldReindexUIIT asserted _source.usageSummary.weeklyStats.count
on both the live baseline and after reindex. CI showed it failing at the
live baseline: usage is not indexed via the live path (a usage report
returns its own ChangeEvent and doesn't drive SearchIndexHandler), so
usageSummary only ever lands in the search doc via a reindex — and only
once TableIndex.getRequiredReindexFields() requests it (OM PR #28350).
The test's premise that the live baseline always populates usageSummary
was wrong, so the assertion (and its helper, JSON_MAPPER field, and now-
orphaned imports) is removed. The usage seed scaffolding is kept so an
after-reindex-only assertion can be re-added once #28350 merges and this
branch rebases on main. Coverage for columns/queries/worksheets/testCase/
testSuite surfaces is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Client patches with paths missing the leading '/' (e.g., "displayName"
instead of "/displayName") triggered jakarta.json.JsonException from
JsonPointerImpl, which fell through the exception mapper and surfaced
as an unhandled 500 (and Sentry alert) on PATCH endpoints such as
ClassificationResource.
- JsonUtils.applyPatch now validates each operation's 'path' and 'from'
upfront, throwing IllegalArgumentException with a clear RFC 6901
message before the cryptic library exception fires.
- CatalogGenericExceptionMapper maps jakarta.json.JsonException to 400
as defense in depth, covering other RFC 6902 violations (e.g.,
out-of-range array index, replace on missing path) that were also
returning 500.
- Added JsonUtilsTest cases for malformed 'path' and 'from' pointers.
* Fixes#28245: ingest valueless Databricks/Unity Catalog tags
Databricks/Unity Catalog exposes system-generated (and some user-defined)
tags as (tag_name, tag_value=null). The connectors mapped tag_name ->
Classification and tag_value -> Tag, so an empty tag_value was either
skipped (Unity Catalog) or coerced to a "NONE" sentinel (Databricks).
When tag_value is empty, fall back to a dedicated per-connector
classification (DATABRICKS_TAGS / UNITY_CATALOG_TAGS) and use tag_name
verbatim as the tag under it (no dot-splitting). Valued tags are
unchanged: classification = tag_name, tag = tag_value.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Address review: harden valueless-tag mapping
- Treat whitespace-only tag_value as valueless (strip-based check) so it
falls back to the *_TAGS classification instead of being silently
dropped downstream by get_ometa_tag_and_classification.
- Skip rows with empty/None tag_name in the Databricks connector, for
parity with Unity Catalog, so an empty classification name is never
sent to the API.
- Add tests for whitespace-only tag_value (both connectors) and the
empty tag_name skip (Databricks).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI surfaced two failures:
1) SearchAvailableDuringReindexUIIT 503 "all shards failed" mid-flight.
Right after a recreate-mode reindex swaps an alias onto a freshly-
created index, that index's shards can still be initialising, so
OpenSearch answers 503 for a beat — shard-allocation lag, not a
search blackout. The await loop used .ignoreNoExceptions() so the
transient crashed the run. Added SearchQueryHelper
.probeIndexToleratingShardLag(budget) which retries 5xx within a
15s budget (a genuine sustained outage still fails) and switched
both mid-flight probes (single-table + all-kinds) to it. The
post-reindex eventual-consistency probe stays a clean query.
2) SelectiveFieldReindexUIIT 401 on /api/v1/search/query. The _source
assertion issued the query via ui.context().request() (Playwright
APIRequestContext), but auth is injected into localStorage which
only the SPA reads — the APIRequestContext sends no Bearer header.
Switched to the authenticated SDK HTTP client (same path
SearchQueryHelper uses); the _source shape is identical regardless
of caller.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(migration): backfill task domains in v1.12.8 for domain-scoped activity feed visibility
Tasks created via approval workflows lacked domains, causing domain-scoped users
to miss them in the activity feed. This migration backfills domains on both
thread_entity (1.12.x) and task_entity (2.x) tables.
- thread_entity: reads tasks where JSON_EXTRACT(json,'$.domains') IS NULL in
batches, resolves domains from the linked entity, and sets $.domains=[] or
the real domain list. Uses json->'domains' IS NULL on Postgres.
- task_entity: inserts HAS-relationship rows from existing MENTIONED_IN rows
via INSERT IGNORE / ON CONFLICT DO NOTHING.
- Failure is non-fatal: logs an error and skips rather than blocking startup.
* fix(migration): correlate NOT EXISTS on fromId to handle multi-domain tasks
* refactor(migration): fix transient-failure caching and extract SQL builder methods
* fix(migration): properly distinguish lookup failures from no-domains; fix counter and skip-on-failure
* feat(migration): move task domain backfill migration from 1.12.8 to 1.12.9
* fix(migration): mark unresolvable thread rows as migrated to prevent infinite loop
Previously, any thread row whose target entity could not be resolved
(malformed JSON, EntityLink parse failure, or non-EntityNotFound errors)
returned null from resolveThreadTaskDomains and was skipped without
updating. Because the batch read uses WHERE \$.domains IS NULL ORDER BY
createdAt LIMIT 500 with no offset, the same failing rows would be
re-fetched in every subsequent batch — once all unaffected rows had
been processed, the loop would spin forever on the remaining failures.
Now any unrecoverable failure marks the row with \$.domains = [] (same
as the legitimate no-domains case), logs a WARN, and increments a
markedDoneOnError counter surfaced at the end of the run. A stall
detector also breaks the loop if a non-empty batch produces zero
updates, guarding against the residual case where the mark UPDATE
itself keeps failing.
Also adds a class-level reindex note: the migration only writes to DB,
so the tasks search index must be rebuilt post-upgrade for the activity
feed (which queries Elasticsearch/OpenSearch) to reflect the backfilled
domains.
* fix(migration): address PR review — non-fatal policy migrations + deleted-row filter
Two issues called out in PR #28013 review:
1. Policy migrations could block server startup. addTriggerOperationToDefaultBotPolicies
handled errors internally, but addTriggerRuleToDataStewardPolicy only catches
EntityNotFoundException — a DB error from policyDAO.update() or a
JsonProcessingException from JsonUtils.pojoToJson() would propagate through
@SneakyThrows and fail the migration step. Wrap the policy calls in their own
try-catch (matching the task domain migration's non-fatal pattern) and drop
@SneakyThrows.
2. INSERT...SELECT for task_entity domain backfill did not filter on
entity_relationship.deleted. Adds er_about.deleted = FALSE,
er_domain.deleted = FALSE, and ex.deleted = FALSE to avoid deriving task
domains from soft-deleted relationships.
Adds two SQL-shape tests asserting the deleted filters are present in both MySQL
and Postgres variants.
* fix(migration): drop ON CONFLICT target so backfill survives PK shape change
The Postgres `entity_relationship` PK is 3 columns on 1.12.x (fromid, toid,
relation) but 4 columns on 2.x (... + relationtype). Naming a 3-column target
in `ON CONFLICT (fromId, toId, relation)` would fail at parse-time on the 4-col
schema with "no unique or exclusion constraint matching the ON CONFLICT
specification", silently breaking the task-domain backfill on forward upgrades.
Use a bare `ON CONFLICT DO NOTHING` — Postgres applies it to any unique/PK
violation, matching MySQL's `INSERT IGNORE` semantics. The `NOT EXISTS` in the
SELECT still prevents intra-statement duplicates; ON CONFLICT is just the
race-safety net.
Test asserts the bare form to prevent regressions reintroducing a column list.
* fix(migration): also match JSON null domains in thread_entity backfill
The WHERE clause only matched rows where $.domains was SQL NULL (key missing),
which left out tasks where Jackson serialized the unset field as "domains": null
explicitly. JSON_EXTRACT / -> returns JSON null in that case, not SQL NULL, so
"IS NULL" did not match.
Repro: a 1.13 task created before CreateApprovalTaskImpl.withDomains(...) shipped
serializes "domains": null. On upgrade to main, v1129 read 9 thread tasks but
only backfilled 2 — the 7 with explicit JSON null were silently skipped, and the
v200 promotion to task_entity then carried that empty state forward.
Broaden the WHERE on both dialects:
MySQL : JSON_EXTRACT(json,'$.domains') IS NULL
OR JSON_TYPE(JSON_EXTRACT(json,'$.domains')) = 'NULL'
Postgres : json->'domains' IS NULL
OR jsonb_typeof(json->'domains') = 'null'
After backfill, $.domains is written as an array (empty or populated), so neither
clause matches the updated row — no infinite loop. Tests assert both branches of
the OR for each dialect.
* fix(migration): v200 task promotion must resolve inherited domains
queryDomainsForEntity did a raw entity_relationship lookup for
"domain --HAS--> entity" rows. For entities that inherit their domain from a
parent (e.g. glossary terms inheriting from their parent glossary), no direct
HAS row exists — inheritance is computed at read time by the repository layer
(see GlossaryTermRepository.inheritDomains). The raw SQL silently returned an
empty list, so v200 promoted such tasks into task_entity with no domains in the
JSON and no domain HAS task rows in entity_relationship, breaking the activity
feed for domain-scoped users.
Switch resolveDomainsForTaskAbout to load the entity via EntityRepository.get
with FIELD_DOMAINS, then call ei.getDomains(). That path already handles every
entity type's inheritance rule, matching what v1129 does on the thread_entity
side.
Also fix the alreadyExists branch in migrateThreadTasksToTaskEntity. Force-
migrate previously skipped that branch's domain reconciliation entirely, so
tasks already promoted by a pre-fix v200 run would stay broken even after the
fix shipped. Now the alreadyExists path also resolves and inserts domain HAS
rows; insertTaskDomainRelationships swallows duplicate-key on already-present
rows, keeping the call idempotent.
Removes the now-unused queryDomainsForEntity / buildDomainReference helpers.
* perf(migration): cache resolved domains per (entityType, entityId) in v200
When task migration runs against an install with many tasks pointing at a small
number of target entities (the typical pattern — hundreds of tasks per glossary
term), calling EntityRepository.get for each task re-loads the entity and
re-walks its inheritance chain. For 100K tasks across ~100 unique entities,
that is ~100K full-entity loads vs the ~100 actually required.
Add a per-migration HashMap cache keyed by entityType::entityId. The migration
runs single-threaded on startup so a plain HashMap is sufficient. Transient
lookup failures are not cached so a later task can retry the same entity. The
cache lives for the JVM lifetime but only grows during v200.
Empirical cost per task drops from ~5-20ms (cold repo.get) to ~0.1ms (cache
hit) once the working set is loaded.
* fix(migration): bound v200 domain cache and make cached lists unmodifiable
Per gitar review on PR #28013. The static DOMAIN_CACHE was an unbounded
HashMap. Two defensive improvements:
1. Bound the cache via LinkedHashMap with access-order LRU eviction at 10K
entries. A pathological install with millions of unique target entities
(e.g. one task per distinct table) can no longer grow the cache without
limit and OOM the migration step. Each entry is small (~100 bytes), so the
cap costs ~1 MB at saturation while still absorbing the realistic working
set in one pass.
2. Wrap cached lists with Collections.unmodifiableList so that a future
downstream caller mutating the returned list cannot silently corrupt the
cache entry for all subsequent lookups of the same entity.
No synchronization needed; v200 runs single-threaded on startup.
* fix(migration): drop ex.deleted = FALSE from task_entity NOT EXISTS check
Per copilot review on PR #28013. With ex.deleted = FALSE in the NOT EXISTS
subquery, SELECT can yield candidate rows that collide on the PK with an
existing soft-deleted row (deleted is not part of (fromId, toId, relation,
relationType)). INSERT IGNORE silently skips the collision, the affected-row
count drops below BATCH_SIZE, and the while loop terminates early — leaving
later candidates unprocessed.
Tasks are hard-deleted only in this codebase, so a soft-deleted domain HAS
task row is not a state we expect to encounter, but the asymmetry between
the SELECT's NOT EXISTS and the PK's collision behavior is a real correctness
bug for any installation that does have such rows. Drop the inner deleted
filter so NOT EXISTS treats any row (active or soft-deleted) as already
present; the SELECT then only yields genuinely-new candidates, and inserted
count accurately reflects remaining work.
Outer er_about.deleted = FALSE and er_domain.deleted = FALSE filters stay,
since we still don't want to propagate soft-deleted MENTIONED_IN or
soft-deleted domain assignments forward into new HAS rows.
Tests flipped from "ex.deleted = FALSE must be present" to "ex.deleted must
be absent" to pin the new contract.
* fix(migration): use List.copyOf for v200 domain cache entries
Per copilot review on PR #28013. Collections.unmodifiableList wraps the
underlying list but does not snapshot it — if a later read of the same
entity through the repository layer mutates the list backing the cached
reference, the cached value silently changes too.
Switch to List.copyOf which produces an independent immutable snapshot,
so cache entries are genuinely stable for the lifetime of the migration.
- nightly workflow: reformat the Topology comment block (drop the
column-aligned space padding that read as "weird spaces").
- nightly workflow: hoist the stress cohort sizes (simpleReindex
tables/topics/dashboards/pipelines, searchAvailable tables) into
workflow_dispatch inputs with the current values as defaults, so
they're tunable from the Actions UI per run.
- remove openmetadata-integration-tests/REINDEX_TEST_PLAN.md — a
planning/tracking doc that doesn't belong in the repo.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Register page.waitForEvent('download') at the top of performTestCaseExport,
before any click actions, to eliminate a race condition.
Test case export always takes the async path: /exportAsync returns a jobId,
the server processes it, then fires a WebSocket COMPLETED event which triggers
downloadFile — a programmatic <a download href="blob:..."> click. For an
empty table the job completes almost instantly. When the WebSocket fires
before Playwright has finished configuring blob-download capture via CDP
(which happens asynchronously after waitForEvent is called), Chromium treats
the blob-URL click as a page navigation instead of a download, closing the
page context and throwing:
Error: page.waitForEvent: Target page, context or browser has been closed
Moving the listener to the very top of the function gives Playwright the full
duration of the subsequent awaits (visibility checks, form wait, button state)
to complete its CDP setup — eliminating the race.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(snowflake): log discovered databases and filter-out reasons
SHOW DATABASES only returns databases the ingestion role can see, so a
missing database could mean either a privilege/share gap on the role or
an exclusion by databaseFilterPattern. The logs distinguished neither:
the raw result was never logged, and filtered databases were collected
into the status summary without a live log line.
Log the SHOW DATABASES result (count + names) at INFO, and log each
filtered-out database at INFO with the value matched and whether FQN
filtering was used. This makes "we see fewer databases than expected"
diagnosable from the ingestion logs alone.
* fix(snowflake): %-style logging, narrow filter_name type, list at DEBUG
- Convert f-string log calls to %-style so the formatting cost is paid
only when the level is enabled (no f-string interpolation on suppressed
DEBUG / large-list payloads).
- Narrow filter_name to str: use new_database when useFqnForFiltering is
on but fqn.build() returned None, so filter_by_database (str-typed
parameter) stops getting str | None. Fixes the basedpyright error
surfaced by extracting the inline expression.
- Move the full SHOW DATABASES name list to DEBUG; INFO keeps the count.
Addresses Copilot's note about INFO log volume on accounts with many
databases.
* fix(powerbi): flush sink buffer before lineage resolution
Adds a Barrier sentinel record that, when yielded from a source,
triggers a synchronous flush of MetadataRestSink's bulk buffer.
PowerBI's yield_dashboard_lineage override yields a Barrier before
delegating to super(), so that target-entity lookups via get_by_name
resolve against committed entities instead of returning None for
items still in the sink's bulk buffer.
Effect: intra-workspace and backward-cross-workspace lineage is
captured on the first ingestion run instead of requiring a re-run.
Forward cross-workspace lineage (target in a workspace not yet
scanned in the current run) remains a separate concern.
* chore(powerbi): satisfy ruff + basedpyright on barrier changes
- Use PEP 604 `str | None` in the new Barrier dataclass (UP045).
- Add explicit strict= to zip() in the new test (B905).
- write_barrier returns Either[Entity] to match _flush_buffer.
- Suppress the basedpyright Either(...) reportCallIssue false positive
(same pattern baselined ~1700x) and the rule-less generator-return
variance on the defensive Barrier register.
- ruff format the touched test files.
kubernetes==36.0.0 regressed in-cluster authentication, breaking the
KubernetesSecretsManager in the hybrid runner.
Mechanism:
- Configuration.auth_settings() in v36 looks up the bearer token under
api_key['BearerToken'], but load_incluster_config() / load_kube_config()
still write it to api_key['authorization']. The mismatch means no
Authorization header is sent — the API treats the request as
system:anonymous and returns 403 Forbidden when reading the secret.
The caller surfaces this as "password authentication failed."
Proof it's the client, not env/RBAC:
- curl with the same mounted SA token returns 200.
- kubernetes 35.0.0 works; 36.0.0 doesn't.
Upstream is open and unfixed:
- https://github.com/kubernetes-client/python/issues/2582
- https://github.com/kubernetes-client/python/issues/2584
The previous unbounded `>=21.0.0` pin caused the post-2026-05-19 image
build to pull 36.0.0. Capping to <36 keeps us on the working 35.x line
and guards against further 36.x regressions until upstream ships a
patch — at which point this becomes `!=36.0.0` or a fixed `>=36.x`.
Real bugs:
- UiTestServer: external mode (OM_URL+OM_ADMIN_TOKEN) now honours the
operator token instead of minting a local one the external server
won't trust; no TokenRefresher for the static external token.
- UiSession.uiUrl(): strip the /api REST base before appending UI
paths instead of relying on URI.resolve (fragile for relative paths
/ trailing-slash bases → /api/<route> 404s).
- CpuSampler.percentile(): index off (length-1); floor(p*length)
returned the max for small n, overstating p95.
- OidcEnvBuilder: keep OM's own JWKS in AUTHENTICATION_PUBLIC_KEYS
alongside the mock IdP's — SSO mode still validates OM-minted
internal/bot tokens.
- DataQualityDashboardPage.tryClickDimensionCard: stop swallowing
click/navigation failures as "card absent"; only true absence skips.
- UiSessionExtension: don't save a trace for TestAbortedException
(a skipped assumption is not a failure).
Robustness / cleanup:
- GoogleSsoBootstrapUIIT: build expected authority from
MockOidcServer.NETWORK_ALIAS/PORT instead of a hardcoded :1080.
- EntityLoaderSmokeUIIT: log load duration instead of asserting a
wall-clock bound (flaky on shared runners).
- ReindexHelpers.stopAppAndWait: drop unused stopRequestedAt.
- nightly workflow: dedupe apt package list.
- Javadoc fixes (UiSessionExtension AuthStrategy ref, IncidentManager
seed count 18 -> 20).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`Pipeline Alert` called `test.slow()` inside the test body. Combined with
any outer timeout multiplier this produced a 9-minute effective timeout
instead of the expected ~3 minutes. Removed the redundant call.
Entity declarations (`table1`, `table2`, `pipeline`, `domain`) were
module-level `const` constructed at import time. This caused entity
IDs to be frozen before `beforeAll` ran, leading to stale or empty
`fullyQualifiedName` values when observability creation details were
built. Moved declarations to `let` and initialised them inside `beforeAll`
so they always reflect the actual API response.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Add `showSearch` to the two column `Select` dropdowns in `ParameterForm`
that previously had no search capability:
- Partition column select (tableRowInsertedCountToBeBetween / columnName)
- Generic column select (data.name === 'column')
Fixes#28303
* Add Tests field back
* test(dq): address review feedback on TestSuiteListAfterReindex spec
Trim the list query to the params that drive the exists(tests) filter
and assert the exact 200 status from the async reindexEntities endpoint.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(security): upgrade Apache Airflow to 3.2.1 and Flask to 3.1.3 to resolve CVEs
* Fix: Gitar bot comments and failing dependency requirement
* Fix: Failing tests , pycheckstyle and gitarcomment
* Fix: Remove changes not needed after rebasing with main
* Fix: Airflow-api-tests failing due to 'Can't append to data files in parallel mode.'
---------
Co-authored-by: Akash Verma <akashverma@Akashs-MacBook-Pro-2.local>
Co-authored-by: IceS2 <pablo.takara@getcollate.io>
Test #8 ("should queue concurrent 401s behind a single refresh call") was
passing for the wrong reason. The click on app-bar-item-explore was a
no-op (NavLink to the same route), but the document.body click listener
in AppContainer.tsx was firing analytics.track on every click anywhere —
that PUT to /api/v1/analytics/web/events/collect 401'd, was caught by
the in-app axios interceptor, refreshed, and the test happened to see a
200 on /api/v1/auth/refresh.
PR #28232 removed that global click listener (it was dead-ended analytics
surface area). The hidden trigger is gone, so test #8 has no actual
authenticated request to 401 — waitForResponse for /refresh times out.
Fix: navigate the user away from /explore (back to /my-data) before the
expiry wait, so the subsequent app-bar-item-explore click is a real
route change that fires the page's API calls and 401s through the
in-app refresh path the test name promises.
Co-authored-by: Siddhant <siddhant@MacBook-Pro-751.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(context-center): enable search indexing + vector body text for memories/pages
ContextMemory was indexable in the schema layer but supportsSearch was false,
so live indexing and bulk reindex did not include it. Vector embeddings for
ContextMemory and Page fell through to the default description-only body text
extractor, which produced near-empty embeddings since the actual content lives
in title/question/answer (ContextMemory) and displayName/page payload (Page).
Changes:
- Add ES/OS index mapping for context_memory_search_index across en/ru/zh/jp
- Register contextMemory in indexMapping.json with parentAliases=[all]
- ContextMemoryIndex (TaggableIndex) flattens shareConfig into visibility +
sharedWithIds, normalizes source UUIDs, and populates entity refs with
display names
- Wire SearchIndexFactory.buildIndex() + flip ContextMemoryRepository
supportsSearch=true so create/update/delete fire live indexing
- Flip supportsSearchIndex=true in ContextMemoryIT to inherit BaseEntityIT's
4 search-index tests
- ContextMemoryBodyTextContributor concatenates title/summary/question/answer/
description for the vector embedding instead of just description
- PageBodyTextContributor adds title (displayName) and, for QuickLink pages,
the destination URL alongside the markdown description
- Register both contributors via static initializers in their owning
EntityRepositories, per the VectorBodyTextContributor convention
Tests: 25 new unit tests across ContextMemoryIndexTest (10),
ContextMemoryBodyTextContributorTest (6), PageBodyTextContributorTest (9).
All passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(context-center): address Copilot review feedback on indexing PR
- PageBodyTextContributor: fall back to page.getName() when displayName is
null/blank so vectors always have a title (matches the convention in
SearchIndex.populateCommonFields)
- PageBodyTextContributor: log the exception object (not e.getMessage()) so
the stack trace is available when debug logging is on
- ContextMemoryIndex: null-guard each principal entry in shareConfig.sharedWith
before dereferencing, so a malformed payload cannot NPE the indexer
Added 2 tests covering both behaviors; existing tests adjusted for the new
title-fallback default.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The two reindex-availability tests read near-identical but distinct
system-property namespaces (jpw.searchAvailable.* for the single-table
test, jpw.searchAvailableAllKinds.* for the all-kinds test). Passing
one test's flags to the other silently no-ops and falls back to
defaults — easy to do, hard to notice, and burns a long run before you
realise scale/workers never applied.
Both tests now log their resolved config at startup (tables/scale,
workers, total, per-kind counts) and warn loudly if they detect the
OTHER test's property prefix on the command line. One glance at the
log now confirms whether your -D flags took effect.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(e2e): wait for Resolved incident to be indexed before DQ tab assertion
After posting the Resolved status for the uniqueness test case, the test
immediately navigated to the DQ tab and asserted the incident status
was "Resolved". Because there was no ES indexing wait after the status
transition, the assertion raced against the indexer and saw "New".
Extended `waitForIncidentToBeIndexed` with an optional `expectedStatus`
param so callers can block until a specific resolution status appears in
the API. Used it in the beforeAll hook right after the POST to Resolved.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix checkstyle
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>