* added fix for 26694
* updated code as per Gitar comment
* added E2E test cases
---------
Co-authored-by: Satender <sommy@Satenders-MacBook-Pro.local>
Add the embedding fields (fingerprint, textToEmbed, chunkIndex,
chunkCount, parentId) to all locale variants of the context_memory
mapping and include dataAssetEmbeddings in its parentAliases.
Without fingerprint, OsUtils.addKnnVectorSettings() returned early
and never injected the knn_vector embedding field; without the
alias, vector search at /dataAssetEmbeddings/_search never fanned
out to the context memory index. Both gates are now satisfied so
ContextMemory participates in semantic search alongside tables,
glossary terms, and knowledge pages.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Fix bot name matching on the Bots page search
* Fixes #26970: migrate Bots search to API-based name/email lookup with robust partial email matching
* Fix flaky Playwright setup by making Table/User creation idempotent and hardening glossary/tag and cleanup flows
* Fixes #26970: avoid full bot scan by switching search result resolution to direct getBotByName lookups
* Fixes #26970: align bot user search with deleted toggle and tighten tour retry timeout handling
* Fixes #26970: keep bot search API-driven, align wildcard matching with name/email expectations, and revert unrelated Playwright changes
* fix: stabilize bot search behavior and flaky Playwright flows across bots, glossary, lineage, and announcements
* fix: limit PR scope to bot search by reverting unrelated Tour Playwright changes
* fix: remove unrelated Playwright changes and keep bot search scope focused
* fix: optimize bot search scalability with paginated user-index retrieval and bounded-concurrency bot resolution
* fix: harden Bots API search with bounded pagination/concurrency and consistent active-search refresh behavior
* fix: prevent stale bot search state and strengthen Bots Playwright coverage with deterministic positive/negative assertions
* test: scope Playwright fixes to bot flow and remove unrelated test changes
* Add local bot search and stabilize tests
* Fixes: keep Bot search API-driven for complete results and stabilize bot cleanup assertions in Playwright
* chore: revert out-of-scope bot Playwright test changes
* test: add bot search e2e coverage and tighten bot API response assertions
* test: stabilize bot search no-match assertions using filter placeholder testid
* fix: add search API wait in bot Playwright flow and refactor bot-user mapping helper
* fix: extract reusable searchbar helper with search API wait and deduplicate bot user mapping logic
* fix(playwright): make bot search test stable by removing brittle API wait
* Refactor bot search integration and Playwright synchronization for reliable name/email query behavior
* Fix bot search regressions by preserving getBotByName compatibility export, stabilizing BotListV1 memoized enrichment, and skipping Playwright search API wait for empty terms
* fix: stabilize bot search Playwright API wait to prevent encoded-query timeout flakes
* fix: eliminate Playwright search response race by pre-registering waiter and matching query-specific GET /search/query responses
* fix: normalize bot search queryFilter format and harden bot-user resolution flow
* refactor code
* fix bot search
* fix checkstyle
* fix display name search
* address gitar
* remove unwanted code
* address gitar and improve performance
* address gitar
* fix bot spec
---------
Co-authored-by: Harsh Vador <58542468+harsh-vador@users.noreply.github.com>
Co-authored-by: Harsh Vador <harsh.vador@somaiya.edu>
The util clicked the Delete Property confirm save-button and returned
immediately. Custom property deletion is implemented in CustomPropertiesPageV1
as a JSON-patch on the Type entity (rest/metadataTypeAPI.ts updateType ->
PATCH /api/v1/metadata/types/{id}). While that PATCH was in flight the
ConfirmationModal stayed mounted with its Confirm button in a `loading`
state, leaving an <ant-modal-wrap ant-modal-centered> in the DOM that
intercepted pointer events. Callers like GlossaryImportExport.spec.ts loop
through deleteCreatedProperty + settingClick(GLOSSARY_TERM); on slow AUT
runs the leftover modal mask intercepted the next iteration's click on
[data-testid="app-bar-item-settings"] and the 180s test budget was burned
waiting for the sidebar item to become actionable. PR #27952 addressed the
wrong modal; the trace in nightly run aut/26261444729 shows the visible
dialog is the Delete Property confirm, not the version-history drawer.
Await the PATCH 200 on /metadata/types/ and assert the modal's body-text
has unmounted. ConfirmationModal uses destroyOnClose, so body-text detach
is the cleanest signal that the mask is gone. save-button cannot be used
for the detach assertion because its testid briefly swaps to loading-button
while the PATCH is in flight.
Co-authored-by: Siddhant <siddhant@MacBook-Pro-751.local>
* fix(search): index usageSummary so reindex preserves Explore weekly-usage sort
TableIndex.getRequiredReindexFields() declared "columns" but not
"usageSummary". usageSummary is fields-gated in TableRepository
(clearFields nulls it unless requested), so the reindex path — which
fetches only the declared required fields — dropped it from the table
search document's _source. Explore's "Sort by Weekly Usage" reads
_source.usageSummary.weeklyStats.count, so it silently broke after any
full reindex even though live-served docs looked fine.
Add "usageSummary" to the required reindex field set so the reindexed
document carries it, matching the live entity payload.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(search): update search index on usage report (live path)
Usage is recorded via direct DAO writes in UsageRepository, bypassing
EntityRepository.update — so the entity-lifecycle SearchIndexHandler
never fires, and a reported usage never reached the search document.
The search doc kept a stale/absent usageSummary until the next full
reindex, so Explore "Sort by Weekly Usage" didn't reflect freshly
reported usage live.
After recording usage, push the refreshed entity into the search index
(updateEntity re-fetches with all fields, so usageSummary is included).
Table usage rolls up to its schema + database, so refresh those docs
too. Search failures are logged, not propagated — the usage write is
already committed.
Together with the reindex-fields change this keeps usageSummary present
in _source on both the live and reindex paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(search): use index required fields for usage reindex, not "*"
The live-path usage search update called SearchRepository.updateEntity,
which re-fetches the entity with getFields("*"). For the rolled-up
database/schema that hydrates every child table — tens of thousands on
large catalogs — risking OOM, the exact over-fetch the reindex-fields
work exists to avoid.
Fetch each affected entity (table, schema, database) with only its
index's required reindex fields via ReindexingUtil.getSearchIndexFields,
then updateEntityIndex(entity) directly. Mirrors the reindex pipeline's
field selection and keeps the rollup cheap and bounded.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(search): bound updateEntity field fetch to index required fields
SearchRepository.updateEntity(EntityReference) re-fetched the entity with
getFields("*") before re-indexing. This runs on every live entity update
(SearchIndexHandler.onEntityUpdated) and tag-propagation refresh, for all
entity types. For container entities (database/schema) "*" hydrates every
child — tens of thousands of tables on large catalogs — so a single live
update could OOM the server. That's a far bigger blast radius than the
usage path alone.
Fetch only the fields the entity's search index declares as required
(searchIndexFactory.getReindexFieldsFor) — the same set the reindex
pipeline uses — so live updates and reindex stay consistent and bounded.
Reverts the per-call workaround in UsageRepository.updateUsageInSearch
back to plain updateEntity calls, now that updateEntity itself is bounded.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(usage): switch expression + drop schema/database live refresh
Address PR #28350 review:
- addUsage: convert the break/mutable-variable switch to a Java 21
switch expression (no mutable response, no break boilerplate).
- updateUsageInSearch: only refresh the reported entity's search doc.
Dropped the cascade to the rolled-up schema + database — usage
reporting can be high-volume and the table doc is the surface that
matters ("Sort by Weekly Usage"); schema/database usageSummary
reconciles on the next reindex. This also removes the redundant
second table fetch and the unguarded schema/database refs the bots
flagged. The single updateEntity call is bounded (required fields).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Use safe list
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(alerts): guard isBot() against deleted users to avoid aborting batches
AlertsRuleEvaluator.isBot() called Entity.getEntityByName with
Include.NON_DELETED without catching EntityNotFoundException. When a
change event's userName referenced a user that had been deleted (common
for short-lived test fixtures torn down by afterAll), the exception
escaped the SpEL filter and was caught in AbstractEventConsumer.execute
as "Error in polling events for alert : Entity not found: user ...".
That outer catch logs the error and lets the finally block advance the
offset by batchSize, silently dropping every other event in the batch.
The ActivityFeedAlert subscription was hit hardest because every event
flows through its isBot() filter rule.
Catch EntityNotFoundException and treat an unresolvable actor as not-a-bot.
* fix(alerts): null-safe getIsBot() + IT + unblock ActivityAPI spec
- AlertsRuleEvaluator.isBot(): use Boolean.TRUE.equals(user.getIsBot())
so a null isBot field on the resolved user doesn't NPE on auto-unbox.
- Add AlertsRuleEvaluatorResourceIT#test_isBot_returnsFalseWhenActorUserDeleted
covering the original bug — create a user, evaluate isBot() (false),
delete the user, evaluate isBot() again and assert it still returns
false instead of throwing.
- Remove test.describe.fixme from ActivityAPI.spec.ts so the spec runs
again now that the underlying batch-abort is fixed.
---------
Co-authored-by: Siddhant <siddhant@MacBook-Pro-751.local>
POST/PUT-create with column.extension inlined in the request body never
wrote a row to entity_extension — the data only landed in the table JSON,
while every reader (and the PATCH original reload) looks in entity_extension
via getColumnExtension(). The override clobbered the inline data with null
on every read, leaving the Custom Properties tab blank and causing any
JsonPatch op walking /columns/N/extension/... to fail with "no mapping for
the name 'extension'" on the reloaded original.
One shared single-column writer for both create and update paths:
- Add EntityRepository.storeColumnExtension(UUID, Column) — the upsert that
used to live inside EntityUpdater.updateColumnExtension, lifted out so the
create paths can reach it.
- Add EntityRepository.storeColumnExtensions(UUID, List<Column>) — single
entity walker that flattens via EntityUtil.getFlattenedEntityField and
calls the primitive per column.
- Add EntityRepository.getColumnsForExtensionPersistence(T) hook overridden
in TableRepository and DashboardDataModelRepository to return getColumns().
- Wire into createNewEntityFlush alongside storeExtension.
Update path simplifications:
- updateColumns now uses the shared storeColumnExtension primitive.
- updateColumnExtension (the EntityUpdater-private duplicate writer) is
deleted.
- Existing-column branch skips the upsert when stored.extension equals
updated.extension — previously every update pass rewrote identical JSON
for every column with a non-null extension.
- Added-columns branch now persists extensions via the walker. updateColumns'
main loop skips columns with no stored match, so inline column.extension
on a freshly-added column was silently dropped on PUT-update too.
Regression coverage in ColumnCustomPropertiesIT covers both tableColumn
and dashboardDataModelColumn inline-on-create persistence, plus the
PUT-update-adds-column path.
* feat(tasks): policy-driven authorization with self-approval guard
Moves Task resolve/close/reassign authorization from ~150 lines of custom
Java in TaskRepository into the policy engine. Adds ResolveTask, CloseTask,
ReassignTask MetadataOperation values, isTaskFiler/isTaskAssignee/isTaskReviewer
SpEL conditions, and a new TaskAuthorPolicy seed. Closes the self-approval
gap where a filer who was also in the assignees list could approve their
own task (now denied via deny rule). TaskResourceContext.getOwners now
returns target entity owners so isOwner() retains its conventional meaning;
v200 migration backfills the new policy attachment on the DataConsumer role
for upgrades.
* MCP Tool Usage
* Update generated TypeScript types
* Address PR review feedback on MCP usage tracking
Reorder UA heuristic so VS Code wins over Claude CLI for composite
User-Agents, refactor to a predicate list, and sanitise the resolved
client name (trim, strip control chars, cap at 64 chars). Bound the
schema field to match.
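A minimal Python sketch of the ordering and sanitisation described above, for illustration only (the real change is in Java). The rule substrings, the function names, and the fallback to the sanitised raw User-Agent are assumptions; only the "VS Code before Claude CLI" ordering, the trim / control-character strip, and the 64-character cap come from this message.

```python
import unicodedata

MAX_CLIENT_NAME_LENGTH = 64  # cap stated in the commit message

# Ordered predicate list: the first match wins, so VS Code is checked
# before Claude CLI. The substrings are hypothetical placeholders.
CLIENT_RULES = [
    ("VS Code", lambda ua: "vscode" in ua),
    ("Claude CLI", lambda ua: "claude" in ua),
]


def sanitize_client_name(raw):
    """Strip control characters, trim whitespace, and cap the length."""
    if raw is None:
        return ""
    # Unicode category "Cc" covers control characters such as \x00, \r, \n.
    cleaned = "".join(ch for ch in raw if unicodedata.category(ch) != "Cc")
    return cleaned.strip()[:MAX_CLIENT_NAME_LENGTH]


def resolve_client(user_agent):
    """Return the first matching client name, else the sanitised raw value."""
    ua = (user_agent or "").lower()
    for name, matches in CLIENT_RULES:
        if matches(ua):
            return sanitize_client_name(name)
    # Assumed fallback: report the sanitised User-Agent itself.
    return sanitize_client_name(user_agent)
```

Because the rules are a plain ordered list, changing precedence for a composite User-Agent is a one-line reorder rather than a rewrite of nested conditionals.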
Bound the latency aggregation lists in McpUsageResource with reservoir
sampling so summary/per-tool percentile estimates stay valid without
unbounded heap growth. Skip null-timestamp rows in the history loop and
update the stale /history Swagger description to reflect the ok/fail
shape. Convert CallToolOutcome to a Java record and update the recorder
flow to use accessor methods.
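For reference, a small Python sketch of reservoir sampling (Algorithm R) applied to latency values. It is an illustration under assumptions, not the McpUsageResource code: the class name, the capacity parameter, and the nearest-rank percentile are all hypothetical choices.

```python
import math
import random


class LatencyReservoir:
    """Keeps a uniform random sample of at most `capacity` latencies."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.samples = []
        self.seen = 0
        self._rng = rng if rng is not None else random.Random()

    def add(self, latency_ms):
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(latency_ms)
            return
        # Keep the new value with probability capacity / seen by drawing a
        # slot in [0, seen) and replacing only when it falls in the sample.
        slot = self._rng.randrange(self.seen)
        if slot < self.capacity:
            self.samples[slot] = latency_ms

    def percentile(self, p):
        """Nearest-rank percentile estimate over the sample; None if empty."""
        if not self.samples:
            return None
        ordered = sorted(self.samples)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]
```

Memory stays bounded by `capacity` however many rows are scanned, and every observed value has the same chance of being in the sample, which is why percentile estimates remain usable.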
Fix the pre-existing regression in McpImpersonationTest where the mock
still wired the legacy callTool path. Add DefaultToolContextTest with
direct coverage for classifyException (all four ErrorCategory buckets,
cause-chain walk, null message in chain) and the unknown-tool outcome.
* fix(reindex): batch-prefetch upstream lineage off doc-build threads
Reindex doc-build executor runs 50 virtual workers each calling
`getLineageData -> findFrom` per entity, holding a Hikari connection
for the duration. On JDK 21 + HikariCP 7's synchronized borrow path,
those virtual threads get pinned to carriers and stall for >60s, which
fires the connection leak detector and freezes the reindex.
Client patches with paths missing the leading '/' (e.g., "displayName"
instead of "/displayName") triggered jakarta.json.JsonException from
JsonPointerImpl, which fell through the exception mapper and surfaced
as an unhandled 500 (and Sentry alert) on PATCH endpoints such as
ClassificationResource.
- JsonUtils.applyPatch now validates each operation's 'path' and 'from'
upfront, throwing IllegalArgumentException with a clear RFC 6901
message before the cryptic library exception fires.
- CatalogGenericExceptionMapper maps jakarta.json.JsonException to 400
as defense in depth, covering other RFC 6902 violations (e.g.,
out-of-range array index, replace on missing path) that were also
returning 500.
- Added JsonUtilsTest cases for malformed 'path' and 'from' pointers.
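The upfront check can be sketched in Python as follows. This is an illustration of the RFC 6901 rule only (a pointer is either empty or starts with '/'); the function name and error wording are hypothetical, and the actual fix is in JsonUtils.applyPatch.

```python
def validate_patch_pointers(operations):
    """Reject JSON Patch operations whose 'path' or 'from' is malformed."""
    for index, op in enumerate(operations):
        for key in ("path", "from"):
            if key not in op:
                continue
            pointer = op[key]
            # RFC 6901: a JSON Pointer is "" or a sequence of '/'-prefixed tokens.
            valid = isinstance(pointer, str) and (
                pointer == "" or pointer.startswith("/")
            )
            if not valid:
                raise ValueError(
                    "Invalid JSON Pointer in operation %d, field '%s': %r. "
                    "A pointer must be empty or start with '/' (RFC 6901)."
                    % (index, key, pointer)
                )
```

Failing early with a descriptive error lets the API layer answer with a 400 instead of letting a library exception surface as a 500.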
* Fixes #28245: ingest valueless Databricks/Unity Catalog tags
Databricks/Unity Catalog exposes system-generated (and some user-defined)
tags as (tag_name, tag_value=null). The connectors mapped tag_name ->
Classification and tag_value -> Tag, so an empty tag_value was either
skipped (Unity Catalog) or coerced to a "NONE" sentinel (Databricks).
When tag_value is empty, fall back to a dedicated per-connector
classification (DATABRICKS_TAGS / UNITY_CATALOG_TAGS) and use tag_name
verbatim as the tag under it (no dot-splitting). Valued tags are
unchanged: classification = tag_name, tag = tag_value.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Address review: harden valueless-tag mapping
- Treat whitespace-only tag_value as valueless (strip-based check) so it
falls back to the *_TAGS classification instead of being silently
dropped downstream by get_ometa_tag_and_classification.
- Skip rows with empty/None tag_name in the Databricks connector, for
parity with Unity Catalog, so an empty classification name is never
sent to the API.
- Add tests for whitespace-only tag_value (both connectors) and the
empty tag_name skip (Databricks).
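The resulting mapping rule, as a minimal sketch. The function name and signature are hypothetical and do not mirror the connector code; only the behaviour (valued tags unchanged, empty or whitespace-only tag_value falls back to the *_TAGS classification, empty tag_name is skipped) is taken from this message.

```python
def map_tag(tag_name, tag_value, fallback_classification):
    """Return (classification, tag), or None when the row should be skipped."""
    if tag_name is None or not tag_name.strip():
        # Never send an empty classification name to the API.
        return None
    if tag_value is None or not tag_value.strip():
        # Valueless tag: use the per-connector classification and keep
        # tag_name verbatim as the tag (no dot-splitting).
        return (fallback_classification, tag_name)
    # Valued tag: unchanged behaviour.
    return (tag_name, tag_value)
```

For example, `map_tag("system.certified", None, "UNITY_CATALOG_TAGS")` yields the tag `system.certified` under `UNITY_CATALOG_TAGS`, while `map_tag("pii", "email", "DATABRICKS_TAGS")` keeps `pii` as the classification.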
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(migration): backfill task domains in v1.12.8 for domain-scoped activity feed visibility
Tasks created via approval workflows lacked domains, causing domain-scoped users
to miss them in the activity feed. This migration backfills domains on both
thread_entity (1.12.x) and task_entity (2.x) tables.
- thread_entity: reads tasks where JSON_EXTRACT(json,'$.domains') IS NULL in
batches, resolves domains from the linked entity, and sets $.domains=[] or
the real domain list. Uses json->'domains' IS NULL on Postgres.
- task_entity: inserts HAS-relationship rows from existing MENTIONED_IN rows
via INSERT IGNORE / ON CONFLICT DO NOTHING.
- Failure is non-fatal: logs an error and skips rather than blocking startup.
* fix(migration): correlate NOT EXISTS on fromId to handle multi-domain tasks
* refactor(migration): fix transient-failure caching and extract SQL builder methods
* fix(migration): properly distinguish lookup failures from no-domains; fix counter and skip-on-failure
* feat(migration): move task domain backfill migration from 1.12.8 to 1.12.9
* fix(migration): mark unresolvable thread rows as migrated to prevent infinite loop
Previously, any thread row whose target entity could not be resolved
(malformed JSON, EntityLink parse failure, or non-EntityNotFound errors)
returned null from resolveThreadTaskDomains and was skipped without
updating. Because the batch read uses WHERE $.domains IS NULL ORDER BY
createdAt LIMIT 500 with no offset, the same failing rows would be
re-fetched in every subsequent batch — once all unaffected rows had
been processed, the loop would spin forever on the remaining failures.
Now any unrecoverable failure marks the row with $.domains = [] (same
as the legitimate no-domains case), logs a WARN, and increments a
markedDoneOnError counter surfaced at the end of the run. A stall
detector also breaks the loop if a non-empty batch produces zero
updates, guarding against the residual case where the mark UPDATE
itself keeps failing.
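The loop shape can be sketched in Python over in-memory rows. Everything here is a simplified assumption: `resolve` and `write` are hypothetical callables standing in for domain resolution and the UPDATE, failures are modelled as exceptions rather than null returns, and rows are plain dicts.

```python
def backfill_domains(rows, resolve, write, batch_size=500):
    """Backfill row['domains']; returns (updated, marked_done_on_error)."""
    updated = 0
    marked_done_on_error = 0
    while True:
        # No offset: each pass re-reads the first still-pending rows.
        batch = [row for row in rows if row.get("domains") is None][:batch_size]
        if not batch:
            break
        batch_updates = 0
        for row in batch:
            failed = False
            try:
                domains = resolve(row)
            except Exception:
                # Unrecoverable lookup: mark the row done with an empty list.
                domains, failed = [], True
            try:
                write(row, domains)
            except Exception:
                continue  # row stays pending; the stall detector covers this
            batch_updates += 1
            if failed:
                marked_done_on_error += 1
        updated += batch_updates
        if batch_updates == 0:
            break  # stall detector: a non-empty batch produced no updates
    return updated, marked_done_on_error
```

Without the "mark done on error" branch, a row whose lookup always fails would be selected again on every pass; without the stall check, a write that always fails would produce the same endless loop.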
Also adds a class-level reindex note: the migration only writes to DB,
so the tasks search index must be rebuilt post-upgrade for the activity
feed (which queries Elasticsearch/OpenSearch) to reflect the backfilled
domains.
* fix(migration): address PR review — non-fatal policy migrations + deleted-row filter
Two issues called out in PR #28013 review:
1. Policy migrations could block server startup. addTriggerOperationToDefaultBotPolicies
handled errors internally, but addTriggerRuleToDataStewardPolicy only caught
EntityNotFoundException — a DB error from policyDAO.update() or a
JsonProcessingException from JsonUtils.pojoToJson() would propagate through
@SneakyThrows and fail the migration step. Wrap the policy calls in their own
try-catch (matching the task domain migration's non-fatal pattern) and drop
@SneakyThrows.
2. INSERT...SELECT for task_entity domain backfill did not filter on
entity_relationship.deleted. Adds er_about.deleted = FALSE,
er_domain.deleted = FALSE, and ex.deleted = FALSE to avoid deriving task
domains from soft-deleted relationships.
Adds two SQL-shape tests asserting the deleted filters are present in both MySQL
and Postgres variants.
* fix(migration): drop ON CONFLICT target so backfill survives PK shape change
The Postgres `entity_relationship` PK is 3 columns on 1.12.x (fromid, toid,
relation) but 4 columns on 2.x (... + relationtype). Naming a 3-column target
in `ON CONFLICT (fromId, toId, relation)` would fail at parse-time on the 4-col
schema with "no unique or exclusion constraint matching the ON CONFLICT
specification", silently breaking the task-domain backfill on forward upgrades.
Use a bare `ON CONFLICT DO NOTHING` — Postgres applies it to any unique/PK
violation, matching MySQL's `INSERT IGNORE` semantics. The `NOT EXISTS` in the
SELECT still prevents intra-statement duplicates; ON CONFLICT is just the
race-safety net.
Test asserts the bare form to prevent regressions reintroducing a column list.
* fix(migration): also match JSON null domains in thread_entity backfill
The WHERE clause only matched rows where $.domains was SQL NULL (key missing),
which left out tasks where Jackson serialized the unset field as "domains": null
explicitly. JSON_EXTRACT / -> returns JSON null in that case, not SQL NULL, so
"IS NULL" did not match.
Repro: a 1.13 task created before CreateApprovalTaskImpl.withDomains(...) shipped
serializes "domains": null. On upgrade to main, v1129 read 9 thread tasks but
only backfilled 2 — the 7 with explicit JSON null were silently skipped, and the
v200 promotion to task_entity then carried that empty state forward.
Broaden the WHERE on both dialects:
MySQL : JSON_EXTRACT(json,'$.domains') IS NULL
OR JSON_TYPE(JSON_EXTRACT(json,'$.domains')) = 'NULL'
Postgres : json->'domains' IS NULL
OR jsonb_typeof(json->'domains') = 'null'
After backfill, $.domains is written as an array (empty or populated), so neither
clause matches the updated row — no infinite loop. Tests assert both branches of
the OR for each dialect.
* fix(migration): v200 task promotion must resolve inherited domains
queryDomainsForEntity did a raw entity_relationship lookup for
"domain --HAS--> entity" rows. For entities that inherit their domain from a
parent (e.g. glossary terms inheriting from their parent glossary), no direct
HAS row exists — inheritance is computed at read time by the repository layer
(see GlossaryTermRepository.inheritDomains). The raw SQL silently returned an
empty list, so v200 promoted such tasks into task_entity with no domains in the
JSON and no domain HAS task rows in entity_relationship, breaking the activity
feed for domain-scoped users.
Switch resolveDomainsForTaskAbout to load the entity via EntityRepository.get
with FIELD_DOMAINS, then call ei.getDomains(). That path already handles every
entity type's inheritance rule, matching what v1129 does on the thread_entity
side.
Also fix the alreadyExists branch in migrateThreadTasksToTaskEntity. Force-
migrate previously skipped that branch's domain reconciliation entirely, so
tasks already promoted by a pre-fix v200 run would stay broken even after the
fix shipped. Now the alreadyExists path also resolves and inserts domain HAS
rows; insertTaskDomainRelationships swallows duplicate-key on already-present
rows, keeping the call idempotent.
Removes the now-unused queryDomainsForEntity / buildDomainReference helpers.
* perf(migration): cache resolved domains per (entityType, entityId) in v200
When task migration runs against an install with many tasks pointing at a small
number of target entities (the typical pattern — hundreds of tasks per glossary
term), calling EntityRepository.get for each task re-loads the entity and
re-walks its inheritance chain. For 100K tasks across ~100 unique entities,
that is ~100K full-entity loads vs the ~100 actually required.
Add a per-migration HashMap cache keyed by entityType::entityId. The migration
runs single-threaded on startup so a plain HashMap is sufficient. Transient
lookup failures are not cached so a later task can retry the same entity. The
cache lives for the JVM lifetime but only grows during v200.
Empirical cost per task drops from ~5-20ms (cold repo.get) to ~0.1ms (cache
hit) once the working set is loaded.
* fix(migration): bound v200 domain cache and make cached lists unmodifiable
Per gitar review on PR #28013. The static DOMAIN_CACHE was an unbounded
HashMap. Two defensive improvements:
1. Bound the cache via LinkedHashMap with access-order LRU eviction at 10K
entries. A pathological install with millions of unique target entities
(e.g. one task per distinct table) can no longer grow the cache without
limit and OOM the migration step. Each entry is small (~100 bytes), so the
cap costs ~1 MB at saturation while still absorbing the realistic working
set in one pass.
2. Wrap cached lists with Collections.unmodifiableList so that a future
downstream caller mutating the returned list cannot silently corrupt the
cache entry for all subsequent lookups of the same entity.
No synchronization needed; v200 runs single-threaded on startup.
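A rough Python analogue of the bounded cache, for illustration only. The class and method names are hypothetical, `OrderedDict` stands in for the access-ordered LinkedHashMap, and storing a tuple is a Python stand-in for handing out a list that callers cannot mutate.

```python
from collections import OrderedDict


class BoundedDomainCache:
    """LRU cache keyed by entityType::entityId with a fixed entry cap."""

    def __init__(self, max_entries=10_000):
        self._max_entries = max_entries
        self._entries = OrderedDict()

    @staticmethod
    def _key(entity_type, entity_id):
        return f"{entity_type}::{entity_id}"

    def get(self, entity_type, entity_id):
        key = self._key(entity_type, entity_id)
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, entity_type, entity_id, domains):
        key = self._key(entity_type, entity_id)
        self._entries[key] = tuple(domains)  # immutable to callers
        self._entries.move_to_end(key)
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used

    def __len__(self):
        return len(self._entries)
```

Like the migration's cache, this sketch has no locking and is only appropriate for single-threaded use.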
* fix(migration): drop ex.deleted = FALSE from task_entity NOT EXISTS check
Per copilot review on PR #28013. With ex.deleted = FALSE in the NOT EXISTS
subquery, SELECT can yield candidate rows that collide on the PK with an
existing soft-deleted row (deleted is not part of (fromId, toId, relation,
relationType)). INSERT IGNORE silently skips the collision, the affected-row
count drops below BATCH_SIZE, and the while loop terminates early — leaving
later candidates unprocessed.
Tasks are hard-deleted only in this codebase, so a soft-deleted domain HAS
task row is not a state we expect to encounter, but the asymmetry between
the SELECT's NOT EXISTS and the PK's collision behavior is a real correctness
bug for any installation that does have such rows. Drop the inner deleted
filter so NOT EXISTS treats any row (active or soft-deleted) as already
present; the SELECT then only yields genuinely-new candidates, and inserted
count accurately reflects remaining work.
Outer er_about.deleted = FALSE and er_domain.deleted = FALSE filters stay,
since we still don't want to propagate soft-deleted MENTIONED_IN or
soft-deleted domain assignments forward into new HAS rows.
Tests flipped from "ex.deleted = FALSE must be present" to "ex.deleted must
be absent" to pin the new contract.
* fix(migration): use List.copyOf for v200 domain cache entries
Per copilot review on PR #28013. Collections.unmodifiableList wraps the
underlying list but does not snapshot it — if a later read of the same
entity through the repository layer mutates the list backing the cached
reference, the cached value silently changes too.
Switch to List.copyOf which produces an independent immutable snapshot,
so cache entries are genuinely stable for the lifetime of the migration.
Register page.waitForEvent('download') at the top of performTestCaseExport,
before any click actions, to eliminate a race condition.
Test case export always takes the async path: /exportAsync returns a jobId,
the server processes it, then fires a WebSocket COMPLETED event which triggers
downloadFile — a programmatic <a download href="blob:..."> click. For an
empty table the job completes almost instantly. When the WebSocket fires
before Playwright has finished configuring blob-download capture via CDP
(which happens asynchronously after waitForEvent is called), Chromium treats
the blob-URL click as a page navigation instead of a download, closing the
page context and throwing:
Error: page.waitForEvent: Target page, context or browser has been closed
Moving the listener to the very top of the function gives Playwright the full
duration of the subsequent awaits (visibility checks, form wait, button state)
to complete its CDP setup — eliminating the race.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(snowflake): log discovered databases and filter-out reasons
SHOW DATABASES only returns databases the ingestion role can see, so a
missing database could mean either a privilege/share gap on the role or
an exclusion by databaseFilterPattern. The logs distinguished neither:
the raw result was never logged, and filtered databases were collected
into the status summary without a live log line.
Log the SHOW DATABASES result (count + names) at INFO, and log each
filtered-out database at INFO with the value matched and whether FQN
filtering was used. This makes "we see fewer databases than expected"
diagnosable from the ingestion logs alone.
* fix(snowflake): %-style logging, narrow filter_name type, list at DEBUG
- Convert f-string log calls to %-style so the formatting cost is paid
only when the level is enabled (no f-string interpolation on suppressed
DEBUG / large-list payloads).
- Narrow filter_name to str: use new_database when useFqnForFiltering is
on but fqn.build() returned None, so filter_by_database (str-typed
parameter) stops getting str | None. Fixes the basedpyright error
surfaced by extracting the inline expression.
- Move the full SHOW DATABASES name list to DEBUG; INFO keeps the count.
Addresses Copilot's note about INFO log volume on accounts with many
databases.
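A short sketch of the logging pattern, assuming a hypothetical logger name and helper function (this is not the connector code). With %-style arguments the logging module defers string formatting until a record is actually emitted, so a suppressed DEBUG call never formats the name list.

```python
import logging

logger = logging.getLogger("snowflake_logging_sketch")  # hypothetical name
logger.setLevel(logging.INFO)


def log_discovered_databases(database_names):
    """Log the count at INFO and the full list only at DEBUG."""
    # %-style: arguments are interpolated only if the record is emitted.
    logger.info("SHOW DATABASES returned %d databases", len(database_names))
    logger.debug("Discovered databases: %s", database_names)
```

An f-string such as `logger.debug(f"Discovered databases: {database_names}")` would build the full string before the level check, which is the cost the %-style form avoids on accounts with many databases.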
* fix(powerbi): flush sink buffer before lineage resolution
Adds a Barrier sentinel record that, when yielded from a source,
triggers a synchronous flush of MetadataRestSink's bulk buffer.
PowerBI's yield_dashboard_lineage override yields a Barrier before
delegating to super(), so that target-entity lookups via get_by_name
resolve against committed entities instead of returning None for
items still in the sink's bulk buffer.
Effect: intra-workspace and backward-cross-workspace lineage is
captured on the first ingestion run instead of requiring a re-run.
Forward cross-workspace lineage (target in a workspace not yet
scanned in the current run) remains a separate concern.
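The barrier idea, reduced to a self-contained sketch. `BufferedSink`, its methods, and the record shape are hypothetical stand-ins for MetadataRestSink and its bulk buffer; only the concept (a sentinel record that forces a synchronous flush before lookups) comes from this change.

```python
from dataclasses import dataclass


@dataclass
class Barrier:
    """Sentinel record: asks the sink to flush before processing continues."""
    reason: str = ""


class BufferedSink:
    """Toy sink that buffers (name, payload) records and commits in bulk."""

    def __init__(self, flush_size=100):
        self.flush_size = flush_size
        self.buffer = []
        self.committed = {}

    def write(self, record):
        if isinstance(record, Barrier):
            self.flush()  # synchronous flush; the barrier itself is not stored
            return
        self.buffer.append(record)
        if len(self.buffer) >= self.flush_size:
            self.flush()

    def flush(self):
        for name, payload in self.buffer:
            self.committed[name] = payload
        self.buffer.clear()

    def get_by_name(self, name):
        # Only committed entities are visible, mirroring a server-side lookup.
        return self.committed.get(name)
```

In this sketch a lookup made right after `write(("dataset.a", ...))` returns None until a `Barrier` is written, which is the first-run lineage gap the change closes.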
* chore(powerbi): satisfy ruff + basedpyright on barrier changes
- Use PEP 604 `str | None` in the new Barrier dataclass (UP045).
- Add explicit strict= to zip() in the new test (B905).
- write_barrier returns Either[Entity] to match _flush_buffer.
- Suppress the basedpyright Either(...) reportCallIssue false positive
(same pattern baselined ~1700x) and the rule-less generator-return
variance on the defensive Barrier register.
- ruff format the touched test files.
kubernetes==36.0.0 regressed in-cluster authentication, breaking the
KubernetesSecretsManager in the hybrid runner.
Mechanism:
- Configuration.auth_settings() in v36 looks up the bearer token under
api_key['BearerToken'], but load_incluster_config() / load_kube_config()
still write it to api_key['authorization']. The mismatch means no
Authorization header is sent — the API treats the request as
system:anonymous and returns 403 Forbidden when reading the secret.
The caller surfaces this as "password authentication failed."
Proof it's the client, not env/RBAC:
- curl with the same mounted SA token returns 200.
- kubernetes 35.0.0 works; 36.0.0 doesn't.
Upstream is open and unfixed:
- https://github.com/kubernetes-client/python/issues/2582
- https://github.com/kubernetes-client/python/issues/2584
The previous unbounded `>=21.0.0` pin caused the post-2026-05-19 image
build to pull 36.0.0. Capping to <36 keeps us on the working 35.x line
and guards against further 36.x regressions until upstream ships a
patch — at which point this becomes `!=36.0.0` or a fixed `>=36.x`.
`Pipeline Alert` called `test.slow()` inside the test body. Combined with
any outer timeout multiplier this produced a 9-minute effective timeout
instead of the expected ~3 minutes. Removed the redundant call.
Entity declarations (`table1`, `table2`, `pipeline`, `domain`) were
module-level `const` constructed at import time. This caused entity
IDs to be frozen before `beforeAll` ran, leading to stale or empty
`fullyQualifiedName` values when observability creation details were
built. Moved declarations to `let` and initialised them inside `beforeAll`
so they always reflect the actual API response.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Add `showSearch` to the two column `Select` dropdowns in `ParameterForm`
that previously had no search capability:
- Partition column select (tableRowInsertedCountToBeBetween / columnName)
- Generic column select (data.name === 'column')
Fixes #28303
* Add Tests field back
* test(dq): address review feedback on TestSuiteListAfterReindex spec
Trim the list query to the params that drive the exists(tests) filter
and assert the exact 200 status from the async reindexEntities endpoint.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(security): upgrade Apache Airflow to 3.2.1 and Flask to 3.1.3 to resolve CVEs
* Fix: Gitar bot comments and failing dependency requirement
* Fix: Failing tests, pycheckstyle, and Gitar comment
* Fix: Remove changes not needed after rebasing with main
* Fix: Airflow-api-tests failing due to 'Can't append to data files in parallel mode.'
---------
Co-authored-by: Akash Verma <akashverma@Akashs-MacBook-Pro-2.local>
Co-authored-by: IceS2 <pablo.takara@getcollate.io>
Test #8 ("should queue concurrent 401s behind a single refresh call") was
passing for the wrong reason. The click on app-bar-item-explore was a
no-op (NavLink to the same route), but the document.body click listener
in AppContainer.tsx was firing analytics.track on every click anywhere —
that PUT to /api/v1/analytics/web/events/collect 401'd, was caught by
the in-app axios interceptor, refreshed, and the test happened to see a
200 on /api/v1/auth/refresh.
PR #28232 removed that global click listener (it was dead-ended analytics
surface area). The hidden trigger is gone, so test #8 has no actual
authenticated request to 401 — waitForResponse for /refresh times out.
Fix: navigate the user away from /explore (back to /my-data) before the
expiry wait, so the subsequent app-bar-item-explore click is a real
route change that fires the page's API calls and 401s through the
in-app refresh path the test name promises.
Co-authored-by: Siddhant <siddhant@MacBook-Pro-751.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(context-center): enable search indexing + vector body text for memories/pages
ContextMemory was indexable in the schema layer but supportsSearch was false,
so live indexing and bulk reindex did not include it. Vector embeddings for
ContextMemory and Page fell through to the default description-only body text
extractor, which produced near-empty embeddings since the actual content lives
in title/question/answer (ContextMemory) and displayName/page payload (Page).
Changes:
- Add ES/OS index mapping for context_memory_search_index across en/ru/zh/jp
- Register contextMemory in indexMapping.json with parentAliases=[all]
- ContextMemoryIndex (TaggableIndex) flattens shareConfig into visibility +
sharedWithIds, normalizes source UUIDs, and populates entity refs with
display names
- Wire SearchIndexFactory.buildIndex() + flip ContextMemoryRepository
supportsSearch=true so create/update/delete fire live indexing
- Flip supportsSearchIndex=true in ContextMemoryIT to inherit BaseEntityIT's
4 search-index tests
- ContextMemoryBodyTextContributor concatenates title/summary/question/answer/
description for the vector embedding instead of just description
- PageBodyTextContributor adds title (displayName) and, for QuickLink pages,
the destination URL alongside the markdown description
- Register both contributors via static initializers in their owning
EntityRepositories, per the VectorBodyTextContributor convention
Tests: 25 new unit tests across ContextMemoryIndexTest (10),
ContextMemoryBodyTextContributorTest (6), PageBodyTextContributorTest (9).
All passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(context-center): address Copilot review feedback on indexing PR
- PageBodyTextContributor: fall back to page.getName() when displayName is
null/blank so vectors always have a title (matches the convention in
SearchIndex.populateCommonFields)
- PageBodyTextContributor: log the exception object (not e.getMessage()) so
the stack trace is available when debug logging is on
- ContextMemoryIndex: null-guard each principal entry in shareConfig.sharedWith
before dereferencing, so a malformed payload cannot NPE the indexer
Added 2 tests covering both behaviors; existing tests adjusted for the new
title-fallback default.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(e2e): wait for Resolved incident to be indexed before DQ tab assertion
After posting the Resolved status for the uniqueness test case, the test
immediately navigated to the DQ tab and asserted the incident status
was "Resolved". Because there was no ES indexing wait after the status
transition, the assertion raced against the indexer and saw "New".
Extended `waitForIncidentToBeIndexed` with an optional `expectedStatus`
param so callers can block until a specific resolution status appears in
the API. Used it in the beforeAll hook right after the POST to Resolved.
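The idea behind the extended helper is a poll-until-status loop. The sketch below illustrates the pattern in Python rather than the Playwright helper itself; `fetch_status`, the timeout values, and the injectable `sleep`/`clock` parameters are assumptions made for illustration.

```python
import time
from typing import Callable, Optional


def wait_for_incident_indexed(
    fetch_status: Callable[[], Optional[str]],
    expected_status: Optional[str] = None,
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the incident is indexed and, if given, has the expected status."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()  # hypothetical API lookup; None means "not indexed yet"
        if status is not None and expected_status in (None, status):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"incident never reached status {expected_status!r}")
        sleep(interval_s)


# Demo: the index lags two polls behind the status transition.
observed = iter(["New", "New", "Resolved"])
polls = []


def fake_fetch_status() -> str:
    status = next(observed)
    polls.append(status)
    return status


final_status = wait_for_incident_indexed(
    fake_fetch_status, expected_status="Resolved", sleep=lambda _: None
)
```

Without `expected_status` the loop returns on the first indexed result, which is the race the commit describes: the caller sees "New" and proceeds.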
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix checkstyle
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Change date format from 'dMMM, yy' to 'd MMM, yy' so axis ticks
render as '30 Apr, 26' instead of '30Apr, 26'.
Fixes open-metadata/openmetadata-collate#4184
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(ui-core): add xs button size and link/anchor stories
- Add `xs` size variant to Button with `text-xs`, `px-2 py-1`, `gap-0.5`, `rounded-md` tokens
- Update Sizes and IconOnly stories to include xs
- Add LinkColorWithTrailingIcon story (link-color + trailing icon)
- Add AsLink story covering primary/secondary/tertiary buttons rendered as anchor tags, with icon, and disabled state
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(ui-core): fix xs button icon sizing and add xs size stories
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style(ui-core): format AsLink story props for readability
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* perf(glossary): batch related-term hydration via new /glossaryTerms/byIds endpoint
Sentry on release-1-13 flagged an N+1 on the Glossary Term Relations Graph
tab (transaction /glossary/.../relations_graph, p95 ~1.4s) — eight
sequential GET /api/v1/glossaryTerms/{id}?fields=relatedTerms,children,
parent,owners calls at ~180ms each. Two recursive resolution loops in
useOntologyExplorer.ts (loadNextTermPage:658-683 and fetchGraphDataFromDatabase:822-847)
fan out per-Id getGlossaryTermsById calls to hydrate cross-glossary
related terms after the initial paginated load, recursing up to 5 levels
deep. The customer hit a depth-1 cascade that produced ~8+ HTTP round
trips for a single page visit.
Adds GET /v1/glossaryTerms/byIds?ids=u1,u2,...&fields=... that returns a
single hydrated List<GlossaryTerm>, capped at 200 ids per request to
stay well under URL length limits and to isolate a single bad Id to one
batch. Missing/deleted/unauthorized Ids are silently dropped, matching
the old Promise.allSettled semantics so callers don't need to change
their error handling. Both resolution loops now call the batch endpoint
once per BATCH_SIZE (100) chunk instead of fanning out per-Id; depth-1
goes from 8 round trips to 1.
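A minimal sketch of the chunked batch pattern, written in Python for brevity rather than the hook's TypeScript. `fetch_batch` stands in for a call to the byIds endpoint and is an assumption, as is the fake used in the demo; only `BATCH_SIZE = 100` and the silent-drop behaviour come from the description above.

```python
from typing import Callable, Dict, Iterator, List

BATCH_SIZE = 100  # client-side chunk size named in the commit message

Term = Dict[str, str]


def chunked(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def hydrate_related_terms(
    ids: List[str], fetch_batch: Callable[[List[str]], List[Term]]
) -> List[Term]:
    """One request per chunk instead of one request per id."""
    terms: List[Term] = []
    for chunk in chunked(ids, BATCH_SIZE):
        terms.extend(fetch_batch(chunk))
    return terms


# Demo with a fake endpoint that silently drops ids it cannot resolve.
calls: List[List[str]] = []


def fake_fetch_batch(chunk: List[str]) -> List[Term]:
    calls.append(list(chunk))
    return [{"id": i} for i in chunk if not i.startswith("missing")]


eight_ids = [f"term-{n}" for n in range(7)] + ["missing-1"]
hydrated = hydrate_related_terms(eight_ids, fake_fetch_batch)
round_trips_for_eight = len(calls)
```

Eight ids fit in one chunk, so the depth-1 case costs a single round trip, and the dropped id simply does not appear in the result.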
Tests: backend IT covers happy-path, fields hydration, silent-skip of
missing ids, and empty-input semantics. Playwright spec opens the
Relations Graph tab on a term with a cross-glossary relation and
asserts zero per-Id /glossaryTerms/{id}?fields=relatedTerms... requests
fire — failing if anyone re-introduces the resolution N+1. The new
batch endpoint is asserted to be called at least once, evidencing the
new path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(glossary): address review feedback on byIds endpoint and N+1 fix
Reviewers (gitar-bot + Copilot) raised 7 findings across the backend, the
client hook, and the Playwright spec. Addressed all:
Backend (GlossaryTermResource.byIds):
- Catch AuthorizationException alongside EntityNotFoundException so a
single unauthorized id can't 403 the whole batch — matches the
documented "silently omit missing/unauthorized" contract.
- Switched the OpenAPI response schema from a single GlossaryTerm to
@ArraySchema so generated clients see the correct array shape.
- Dropped `required = true` on the `ids` query param: the implementation
tolerates blank/missing (returns []) and the IT pins that behavior, so
the spec was lying. Description now states the contract explicitly.
Backend tests:
- Added the two negative tests the PR description claimed but the file
was missing: malformed UUID -> 400 and >200 ids -> 400.
Frontend (useOntologyExplorer resolution loops):
- On a whole-batch failure, set `aborted = true`, break out of the
current chunk loop, and clear missingIds before the next pass. Before
this, the same failing batch was silently retried up to
MAX_RESOLUTION_DEPTH - 1 more times for no benefit.
Playwright (GlossaryRelationsGraphPerf):
- Authenticate the new page via AdminClass.login(page) before the
request listener attaches; previously `browser.newPage()` + a
separate `performAdminLogin(browser)` left the test page unauth'd, so
the request listener never saw the API calls and the spec hung on
the wait-for-response timeout.
- Fixed the `relatedTerms` JSON-Patch shape (it stores TermRelation
objects, not bare EntityReferences). With the old shape the relation
never landed, the resolution loop never fired, byIds was never
called, and the spec hit a wait-for-response timeout (the 1.2-minute
retries observed in CI).
- Replaced the dual `/rdf/glossary/graph` OR `/glossaryTerms/byIds`
signal with byIds-only: for `scope === 'term'` the rdf graph endpoint
isn't called even when rdfEnabled, so listening for it just added
flake. Bumped the timeout to 60s for cold-CI runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(glossary): IT failures on byIds — wrong JSON path + URL header limit
Two integration tests added in the previous commit failed in CI:
1) getGlossaryTermsByIds_honorsFieldsParam_hydratesRelatedTerms
The assertion read `relatedTerms.get(0).path("id")` but glossary term
relatedTerms is a List<TermRelation> serialized as
{relationType, term: {id, ...}} — the id is nested under `term`, not
at the top of the array element. Fixed to `.path("term").path("id")`.
2) getGlossaryTermsByIds_tooManyIds_returns400
Sending 201 UUIDs in a query string puts the URL at ~7.5 KB which
trips Jetty's 8 KB request-header limit; the request was rejected
with 431 Request Header Fields Too Large before reaching the
server-side cap check, so the test never saw the documented 400.
Two-part fix:
- Lower MAX_BATCH_BY_IDS from 200 to 100. 100 ids at 37 chars each
(36-char UUID plus separator) is ~3.7 KB, well below 8 KB. This also matches the
client's BATCH_SIZE in useOntologyExplorer.ts (so the client can
now use the whole window without hitting the cap defensively).
- Test uses 101 ids (still tiny URL) so the cap check actually
fires and returns the documented 400.
Updated Javadoc and the client-side BATCH_SIZE comment to reflect
the new alignment.
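The size arithmetic can be checked with a few lines of Python. This only measures the comma-separated `ids` value; the rest of the request line and the other headers are not modelled and are presumably what pushes the 201-id request over the limit.

```python
import uuid

HEADER_LIMIT_BYTES = 8 * 1024  # the 8 KB Jetty limit cited above


def ids_value_length(count: int) -> int:
    """Length of a comma-separated list of `count` canonical UUID strings."""
    ids = [str(uuid.uuid4()) for _ in range(count)]
    return len(",".join(ids))  # 36 chars per UUID plus one comma between ids


len_100 = ids_value_length(100)  # 100 * 36 + 99
len_101 = ids_value_length(101)  # 101 * 36 + 100
len_201 = ids_value_length(201)  # 201 * 36 + 200
```

At 101 ids the value is still under half the limit, which is why the lowered cap lets the server-side check answer with 400 before any header limit is involved.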
The python failure on test_validations_datalake.py reproduces across
3.10/3.11/3.12 in the same parameterized case and is unrelated to the
glossary changes in this PR — pre-existing on main.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(glossary): address two new review comments on byIds
1) glossaryAPI.ts comment said the batch endpoint supports up to 200 ids
but the backend cap was lowered to 100 in the previous commit. Updated
the comment to match (and link the rationale — 100 keeps the URL
under Jetty's 8 KB header limit).
2) The two 400-response IT tests asserted `contains("400") || contains
(<substring>)`, which would let a 500 with "invalid" or "too many"
in the response body silently pass. Tightened both to require BOTH
the HTTP 400 status AND the expected message substring.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(glossary): assert HTTP 400 by exception type, not message substring
The previous tightening required `e.getMessage().contains("400")`, but
the SDK's OpenMetadataHttpClient.handleErrorResponse puts ONLY the
error body in the exception message (no "HTTP 400" prefix in the parsed
path), so the assertion failed on real 400 responses with bodies like
"ids parameter contains an invalid UUID" / "Too many ids: 101 (max 100)".
Use assertThrows(InvalidRequestException.class, ...) instead — the SDK
throws InvalidRequestException ONLY for HTTP 400 (other statuses surface
as ApiException or status-specific subclasses like ForbiddenException),
so the type assertion locks the status code as strongly as a body
substring check would. Substring check stays for the body content.
Removes the no-longer-used `Assertions.fail` static import.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(glossary): address four review comments on byIds
Backend (GlossaryTermResource.java):
- Add includeRelations query param for parity with GET /{id} so callers
can pass per-relation include controls (owners:non-deleted, etc.)
through the batch path. Forwards to the existing 6-arg getInternal.
- Add a catch (RuntimeException) per-id so unexpected failures
(validation, downstream 5xx surfaced as WebApplicationException, etc.)
don't fail the whole batch. EntityNotFoundException /
AuthorizationException stay on debug; the broader catch logs at warn
so a real bug isn't silently swallowed.
Frontend (useOntologyExplorer.ts):
- Extract the duplicated related-term resolution loop into a top-level
`resolveRelatedTerms(terms)` helper. Both call sites
(loadNextTermPage, fetchGraphDataFromDatabase) now do
`await resolveRelatedTerms(...)`, ~100 fewer LoC and no risk of the
two implementations drifting.
- Change the failure semantics from "abort the whole resolution on
first batch failure" to "remember the failed Ids in a skip set, keep
going". This restores best-effort hydration (matching the old
Promise.allSettled behavior on the client) without falling back into
the gitar-bot-flagged retry-the-same-batch-MAX_DEPTH-times footgun:
the skip set causes collectMissingRelatedTermIds to never hand the
same Ids back on subsequent depth passes.
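The skip-set semantics can be sketched as follows. This is a Python illustration under stated assumptions, not the `resolveRelatedTerms` helper itself: the term shape (`id`, `relatedIds`), the `fetch_batch` callable, and the default depth and batch size are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Set

Term = Dict[str, object]


def resolve_related_terms(
    terms: List[Term],
    fetch_batch: Callable[[List[str]], List[Term]],
    max_depth: int = 5,
    batch_size: int = 100,
) -> List[Term]:
    """Best-effort hydration: failed or unresolvable ids are skipped, never retried."""
    known: Dict[str, Term] = {str(t["id"]): t for t in terms}
    skipped: Set[str] = set()
    for _ in range(max_depth):
        missing = sorted(
            rid
            for t in known.values()
            for rid in t.get("relatedIds", [])
            if rid not in known and rid not in skipped
        )
        if not missing:
            break
        for start in range(0, len(missing), batch_size):
            chunk = missing[start:start + batch_size]
            try:
                fetched = fetch_batch(chunk)
            except Exception:
                skipped.update(chunk)  # remember the failed ids and keep going
                continue
            for term in fetched:
                known[str(term["id"])] = term
            # ids the server silently dropped must not be requested again either
            skipped.update(i for i in chunk if i not in known)
    return list(known.values())


# Demo: "bad" always fails; "b" resolves and points at "c".
store = {"b": {"id": "b", "relatedIds": ["c"]}, "c": {"id": "c", "relatedIds": []}}
requests: List[List[str]] = []


def flaky_fetch(chunk: List[str]) -> List[Term]:
    requests.append(list(chunk))
    if "bad" in chunk:
        raise RuntimeError("batch failed")
    return [store[i] for i in chunk if i in store]


resolved = resolve_related_terms(
    [{"id": "a", "relatedIds": ["b", "bad"]}], flaky_fetch, batch_size=1
)
```

In the demo the failing id is requested exactly once, while the healthy chain `a -> b -> c` is still hydrated across depth passes.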
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(glossary): pin cross-glossary relation surfacing in graph + list paths
User reported the glossary graph endpoint returning partial data (focused
term + a sibling, both marked glossaryTermIsolated, edges = []) when the
focused term has cross-glossary related terms. Adds two ITs that
exercise the exact scenarios.
1) GlossaryTermResourceIT#listGlossaryTerms_hydratesCrossGlossaryRelatedTerms
Hits GET /v1/glossaryTerms?glossary=<A>&fields=relatedTerms with a
focused term in glossary A pointing at a related term in glossary B.
Asserts BOTH the single-entity GET and the bulk-list endpoint return
the cross-glossary relation — pins the bulk hydration path
(GlossaryTermRepository.setFieldsInBulk -> fetchAndSetRelatedTerms)
against the single-entity path (setFields -> getRelatedTerms) as
producing the same shape.
2) RdfGlossaryGraphIT#crossGlossaryRelationSurfacesAsEdgeAndNodeInScopedGraph
Creates focused (in glossary A) + relatedAcross (in glossary B),
adds the relation, hits GET /v1/rdf/glossary/graph?glossaryId=<A>,
asserts the response contains: focused NOT marked
glossaryTermIsolated, relatedAcross as a secondary node, edge
between them.
Both tests pass against current main on a clean DB, which means the
user's reported failure mode does not reproduce from a freshly seeded
fixture. Likely causes:
- Stale RDF data in the user's deployment (Fuseki has the terms but
is missing the relation triples; SPARQL returns nodes but 0 edges
and the nodes.isEmpty() fallback in RdfRepository.java:1618
doesn't fire because nodes ARE present)
- The deployment may be on a code version that predates the canonical
relation storage fix from PR #25886
- User-specific relation type / glossary structure not captured here
The tests now stand as regression guards: any future change that
breaks cross-glossary surfacing on either path will fail CI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Revert "test(glossary): pin cross-glossary relation surfacing in graph + list paths"
This reverts commit 460459b00c.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(restore): bulk + async restore for large entity hierarchies
EntityRepository.restoreEntity walked descendants synchronously, taking
4+ minutes on a 12k-table database and exceeding typical proxy timeouts.
restoreChildren now groups CONTAINS children by type and dispatches one
bulkRestoreSubtree per type, batching DB writes, version history,
change events, and cache invalidation; the existing ES cascade handles
descendant index updates in one update_by_query.
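The per-type grouping can be illustrated with a short Python sketch. The child record shape and the `bulk_restore_subtree` callable are assumptions for illustration; the real logic lives in the Java `EntityRepository` and does considerably more (version history, change events, cache invalidation).

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Child = Dict[str, str]


def group_children_by_type(children: List[Child]) -> Dict[str, List[str]]:
    """Bucket child ids by entity type, preserving first-seen type order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for child in children:
        grouped[child["type"]].append(child["id"])
    return dict(grouped)


def restore_children(
    children: List[Child],
    bulk_restore_subtree: Callable[[str, List[str]], None],
) -> None:
    """Dispatch one bulk restore per entity type instead of one call per child."""
    for entity_type, ids in group_children_by_type(children).items():
        bulk_restore_subtree(entity_type, ids)


# Demo: three children of two types produce two bulk dispatches.
dispatched: List[Tuple[str, List[str]]] = []
restore_children(
    [
        {"type": "table", "id": "t1"},
        {"type": "storedProcedure", "id": "sp1"},
        {"type": "table", "id": "t2"},
    ],
    lambda entity_type, ids: dispatched.append((entity_type, ids)),
)
```

The number of dispatches scales with the number of child types rather than the number of children, which is the property the commit relies on for large hierarchies.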
Adds an async option (?async=true) on the deep-hierarchy restore
endpoints that returns 202 Accepted with a job id and runs the restore
on AsyncService, emitting WebSocket notifications on
restoreEntityChannel. Java SDK adds .restore().async().execute() fluent
builders on Tables/Databases plus restoreServerAsync on
EntityServiceBase; Python SDK mirrors this with
restore_request().with_async().execute() and restore_async() helpers
on BaseEntity, exposing a new AsyncJobResponse type.
Tests: EntityRepositoryRestoreTest verifies the per-type grouping and
bulk dispatch path; RestoreFluentAPITest covers the Java SDK fluent
behavior; RestoreHierarchyIT exercises sync and async restore against a
real DB→schemas→tables tree end-to-end; test_restore_async.py covers
the Python SDK paths.
Fixes #4003
* docs(ingestion): design for runtime diagnostics subsystem
Proposal for an always-available, opt-in (loggerLevel=DEBUG) diagnostics
layer inside the ingestion framework so connector runs that hang, OOM, or
slow down produce enough live evidence to identify the root cause in
`kubectl logs` — without `py-spy`, `kubectl debug`, or ptrace.
Grounded in three concrete production cases:
- The Snowflake "hang" that was actually a logging recursion bug in
StreamableLogHandler (fixed by PR #28160) but took ~6 hours and one
wrong-theory fix to identify.
- Recurring OOMKills with no last-state evidence and no way to attribute
growth to a specific object type or stage.
- "Is it stuck or just slow?" with no way to answer from outside the pod.
The design is gated entirely on the existing `workflowConfig.loggerLevel`
(no new env vars, no new config fields). When off, the module is dead
code. When on (~250 KB / <0.01% CPU), it provides:
- An operation registry of "what each thread is doing right now"
- SIGUSR1 / SIGUSR2 handlers for on-demand dumps to stderr
- A watchdog thread that auto-logs hangs at 60s and auto-dumps at 300s
- A heartbeat thread emitting one structured progress line every 30s
- A memory tracker (RSS / cgroup / GC top-types on dump)
- Stage-backpressure visibility (queue depths between source/processor/sink)
- HTTP introspection of OMetaClient and DB cursor execute()
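As a rough illustration of the first and third items, an operation registry with a watchdog-style check might look like the sketch below. The class and method names are hypothetical and not taken from the design document; the injectable clock exists only to make the demo deterministic, and nested operations on one thread are not handled.

```python
import threading
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List, Tuple


class OperationRegistry:
    """Tracks what each thread is doing right now and since when."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lock = threading.Lock()
        self._ops: Dict[int, Tuple[str, float]] = {}  # thread id -> (name, start)

    @contextmanager
    def track(self, name: str) -> Iterator[None]:
        """Register the current thread's operation for the duration of the block."""
        tid = threading.get_ident()
        with self._lock:
            self._ops[tid] = (name, self._clock())
        try:
            yield
        finally:
            with self._lock:
                self._ops.pop(tid, None)

    def stuck(self, threshold_s: float) -> List[Tuple[int, str, float]]:
        """Return (thread id, operation, age) for operations older than the threshold."""
        now = self._clock()
        with self._lock:
            return [
                (tid, name, now - started)
                for tid, (name, started) in self._ops.items()
                if now - started >= threshold_s
            ]


# Demo with a fake clock: an operation that has been running for 61 seconds.
fake_now = [0.0]
registry = OperationRegistry(clock=lambda: fake_now[0])
with registry.track("fetch tables"):
    fake_now[0] = 61.0
    hung = registry.stuck(60.0)  # what a watchdog pass at the 60s mark would log
after = registry.stuck(60.0)  # the operation is unregistered once the block exits
```

A watchdog thread would call `stuck(60.0)` periodically and log the result; the same snapshot could back the signal-triggered dumps.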