Commit graph

761 commits

Author SHA1 Message Date
mohitdeuex
e90a48fadd Update name 2026-05-21 18:37:04 +05:30
mohitdeuex
4f78c6cf05 test(review): address pmbrull nits on PR #28008
- nightly workflow: reformat the Topology comment block (drop the
  column-aligned space padding that read as "weird spaces").
- nightly workflow: hoist the stress cohort sizes (simpleReindex
  tables/topics/dashboards/pipelines, searchAvailable tables) into
  workflow_dispatch inputs with the current values as defaults, so
  they're tunable from the Actions UI per run.
- remove openmetadata-integration-tests/REINDEX_TEST_PLAN.md — a
  planning/tracking doc that doesn't belong in the repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 18:33:43 +05:30
mohitdeuex
b933eee132 test(review): address PR #28008 bot review comments
Real bugs:
- UiTestServer: external mode (OM_URL+OM_ADMIN_TOKEN) now honours the
  operator token instead of minting a local one the external server
  won't trust; no TokenRefresher for the static external token.
- UiSession.uiUrl(): strip the /api REST base before appending UI
  paths instead of relying on URI.resolve (fragile for relative paths
  / trailing-slash bases → /api/<route> 404s).
- CpuSampler.percentile(): index off (length-1); floor(p*length)
  returned the max for small n, overstating p95.
- OidcEnvBuilder: keep OM's own JWKS in AUTHENTICATION_PUBLIC_KEYS
  alongside the mock IdP's — SSO mode still validates OM-minted
  internal/bot tokens.
- DataQualityDashboardPage.tryClickDimensionCard: stop swallowing
  click/navigation failures as "card absent"; only true absence skips.
- UiSessionExtension: don't save a trace for TestAbortedException
  (a skipped assumption is not a failure).

Robustness / cleanup:
- GoogleSsoBootstrapUIIT: build expected authority from
  MockOidcServer.NETWORK_ALIAS/PORT instead of a hardcoded :1080.
- EntityLoaderSmokeUIIT: log load duration instead of asserting a
  wall-clock bound (flaky on shared runners).
- ReindexHelpers.stopAppAndWait: drop unused stopRequestedAt.
- nightly workflow: dedupe apt package list.
- Javadoc fixes (UiSessionExtension AuthStrategy ref, IncidentManager
  seed count 18 -> 20).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 18:08:46 +05:30
Mohit Yadav
23ccb2dcff
Merge branch 'main' into java-playwrights 2026-05-21 16:44:15 +05:30
IceS2
14f880636a
ci(airflow-apis-tests): migrate Sonar step to sonarqube-scan-action@v7 with retry + add workflow_dispatch (#28292)
* ci(airflow-apis-tests): retry Sonar PR scan on JRE-provisioning flake

Mirror the py-tests pattern: migrate from the deprecated
sonarsource/sonarcloud-github-action@master to
SonarSource/sonarqube-scan-action@v7, mark the PR scan
continue-on-error, and add a sleep+retry step so a transient
'Failed to query JRE metadata' from Sonar's JRE-provisioning
endpoint no longer fails the job on first attempt. Hoist the
shared sonar args into a workflow-level SONAR_OPTS env.

* ci(airflow-apis-tests): allow workflow_dispatch + run Sonar step on it

Add workflow_dispatch trigger so the Sonar retry path can be
exercised from the Actions UI without opening a PR, and extend
the Sonar PR step (plus its wait+retry siblings) to run on the
dispatch event.

* ci(airflow-apis-tests): scope Sonar steps to pull_request_target only

Drop workflow_dispatch from the Sonar PR/retry step conditions so
manual runs don't fire the scanner with empty -Dsonar.pullrequest.*
flags (would create a branch entry in SonarCloud, per gitar-bot
review). Dispatch trigger stays for re-running the build/test
surface; Sonar will only fire on a real PR where the pull-request
context exists.
2026-05-20 10:33:47 +02:00
mohitdeuex
e1d6734acb Update webhook 2026-05-20 00:28:14 +05:30
mohitdeuex
3895cd1f70 Speed up nightly Playwright workflow + fix flaky reindex assertions
Workflow:
- Split into a build-image job that bakes openmetadata-server:jpw-snapshot
  via docker/build-push-action with a shared GHA layer cache, then exports
  the loaded image as a workflow artifact. Both matrix entries
  (elasticsearch + opensearch) download + `docker load` the same image
  and set OM_TEST_IMAGE so ContainerizedServer skips its own `docker build`.
  Result: one image build per workflow run (was 2x duplicated dist build
  + per-launch in-test builds) and stable cross-matrix correctness — the
  binary both shards run against comes from the exact same source SHA.
- workflow_dispatch input `includeSsoBootstrap` toggles whether the
  @Tag("sso-bootstrap") SSO bootstrap UIITs run; default off because they
  spin up their own ContainerizedServer (second OM lifecycle) for an env
  wiring check that doesn't change between most runs.
- Slack notification migrated to slackapi/slack-github-action@v2 with the
  incoming-webhook payload shape it now requires, and guarded behind
  `env.SLACK_WEBHOOK_URL != ''` so a missing secret no-ops instead of
  failing the post-step.
- Publish Test Report step set fail_on_test_failures=false — mvn verify
  already gates the job conclusion, and a flake in the report action
  shouldn't cascade into the Slack step.

Test fixes:
- SearchAvailableDuringReindexUIIT: baseline probe now asserts
  `>= seeded.countOf(TABLE)` instead of strict equality. The OM container
  is shared across the suite so the index can legitimately have residual
  entities from earlier tests; assertEventualConsistency already checks
  that none of *our* baseline entities go missing across the recreate.
- SimpleReindexTriggerUIIT: assertExploreCount now polls via Awaitility
  with a 2-minute budget, re-opening the Explore page on every tick.
  Playwright's `hasText` polled only the DOM, which wedges against a
  stale aggregation cache; re-issuing the search aggregation on each
  retry lets ES catch up after the alias swap.

Tagging:
- @Tag("sso-bootstrap") on GoogleSsoBootstrapUIIT + MockIdpSmokeUIIT, and
  the `ui-it` profile now reads `ui.it.excludedGroups` (default
  `sso-bootstrap`) so default `mvn verify -P ui-it` skips them. Pass
  `-Dui.it.excludedGroups=` to include them.
2026-05-19 22:39:27 +05:30
mohitdeuex
db481fdeac Address remaining PR review comments
Bugs
- IndexFieldExplosionIT: SCHEMA_ALIAS was `databaseSchema_search_index`; the
  canonical indexMapping.json name is `database_schema_search_index`.
- ExplorePage:
  - `tabTestId(GLOSSARY_TERMS)` produced `glossaries-tab`, but the UI builds
    the testid from the i18n label (`Glossary Terms` → `glossary terms-tab`).
  - `Tab.DASHBOARD_DATA_MODELS` path was `dashboardDataModels`; the Explore
    route segment is singular (`dashboardDataModel`).
  - Javadoc {@link} now points at the correct `openWithSearch` overload.
- UiSessionExtension: split video lifecycle so the `Video` handles are
  snapshotted before `context.close()` (pages() is empty after close) but
  `video.path()` is resolved AFTER close (Playwright finalises the file on
  close — calling .path() earlier blocks/fails).
- GoogleSsoSignInUIIT: removed the empty alternative from the
  `(my-data|explore|)` regex; it matched almost any post-auth URL and
  weakened the assertion.
- MockOidcServer: still requires a single fixed port (token `iss` claim has
  to match across container/host/browser), but the port is now overridable
  via `-Dom.mockOidc.port=NNNN` and a fast pre-flight `ServerSocket` probe
  fails clearly when the chosen port is busy. GoogleSsoSignInUIIT now reads
  the port from `MockOidcServer.PORT` instead of hard-coding 1080.

Test hygiene
- SearchAvailableDuringReindexUIIT: replaced `Thread.sleep` polling with
  Awaitility (`.atMost(REINDEX_TIMEOUT).pollInterval(PROBE_INTERVAL)`),
  giving the loop a real deadline and removing the antipattern.
- ClipboardHelper: replaced the fixed `waitForTimeout(300)` with bounded
  paste-retries until the hidden textarea has a non-empty value; textarea
  cleanup moved to a `finally` block.
- SimpleReindexTriggerUIIT / SearchAvailableDuringReindexUIIT: defaults are
  now PR-friendly (200/100/100/100 tables/topics/dashboards/pipelines and
  500 tables respectively) overridable via system properties; the nightly
  workflow sets the historical 5k stress numbers.

Quality
- DistributedAutoTuneReindexUIIT.distributedAutoTuneConfig now returns
  `Map.of(...)` instead of a mutable `HashMap`.
- SearchQueryHelper.SearchProbe defensively copies `ids` / `uniqueIds` to
  immutable collections in the canonical constructor.
- EntityLoader: every parameter and local that doesn't change is now
  `final`.
- AuthAssumptions: `toLowerCase` calls now pin `Locale.ROOT` to stay stable
  under Turkish / other surprising locales.

Docs
- PageObject javadoc: list of rules updated to reflect actual contract
  (Page Objects may expose `Locator`-returning accessors, `rawPage()` is a
  documented escape hatch).
- UI_TEST_CONVENTIONS.md: layering diagram now lists the real packages
  (`playwright.scenarios`, `playwright.ui.pages`, `it.auth`, `it.server`).
  Rule about Locator/Page softened to match the real contract. Headed-debug
  recipe points at `:openmetadata-integration-tests` (the
  `:openmetadata-java-playwright` module was removed). Stale references to
  MIGRATION_TRACKING.md and SearchAfterReindexUIIT replaced with
  REINDEX_TEST_PLAN.md and SimpleReindexTriggerUIIT.
- REINDEX_TEST_PLAN.md: helpers table now flagged as a planning shape with
  an explicit list of what's shipped today vs. what's still aspirational.
2026-05-19 22:03:29 +05:30
mohitdeuex
c6682464df Remove playwright-or 2026-05-19 21:24:47 +05:30
mohitdeuex
b015277df3 Address PR review comments: antlr CLI, URL encoding, catch-block split
- Install antlr4 CLI + native build deps in java-playwright PR and nightly
  workflows (yarn install of openmetadata-ui runs the .g4 → JS codegen,
  which fails with `antlr4: not found` otherwise).
- SearchClient: split combined IOException|InterruptedException catch so
  only InterruptedException re-sets the interrupt flag; an IOException
  shouldn't make unrelated higher-level code think the thread was
  interrupted.
- SearchQueryHelper.probeIndex: URL-encode `query` and `indexAlias` before
  splicing into the query string.
- OidcBackend.acquireToken: URL-encode DEFAULT_USER (contains `@`) and
  DEFAULT_PASSWORD in the password-grant form body.
- openmetadata-integration-tests/pom.xml: mark Playwright dependency as
  test-scoped.
2026-05-19 21:24:21 +05:30
mohitdeuex
0f766ac2dc Fix Java Playwright CI: build local image before runMigrations
Two coupled fixes for the ContainerFetchException seen in
https://github.com/open-metadata/OpenMetadata/actions/runs/26087848414:

1. ContainerizedServer.launch() now materialises the openmetadata-server
   image at the very top via a new ensureServerImageAvailable() helper.
   Previously runMigrations() ran first and tried to start a container
   using the jpw-snapshot tag — testcontainers then attempted a registry
   pull, fell over with ContainerFetchException, and the whole run failed
   before newServer()/buildLocalImageContainer() had a chance to build
   anything. The image build is now done once, before any container needs
   the tag. Honors OM_TEST_IMAGE override (skips local build).

2. Nightly workflow gets an explicit "Build openmetadata-dist tarball"
   step. The previous `mvn install -pl :openmetadata-integration-tests
   -am` doesn't transitively build openmetadata-dist (not a test
   dependency), so openmetadata-*.tar.gz was never produced — meaning
   ensureServerImageAvailable() would still fail in CI at
   locateDistTarball(). Added before the test run, after deps build.
2026-05-19 15:02:41 +05:30
Mohit Yadav
fb954a9141
ci: add Java Playwright UIIT workflow (dispatch-only) (#28251)
Lands java-playwright-nightly.yml on main so the workflow becomes
dispatchable. workflow_dispatch only registers when the workflow file
exists on the default branch; once merged, the suite can be run on
demand against any branch ref. Tracks EPIC #3731.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 14:37:37 +05:30
mohitdeuex
ddfc275b6c ci: disable nightly UIIT cron, keep workflow_dispatch only
Run the UI integration suite on demand while it stabilises; re-add the
schedule trigger once it is green on main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 14:07:14 +05:30
mohitdeuex
609269db07 Workflow changes 2026-05-19 14:01:41 +05:30
Mohit Yadav
6463edbb5a
Merge branch 'main' into java-playwrights 2026-05-18 20:03:01 +05:30
Harsh Vador
286a26f81f
ci(security-scan): post Snyk summary to Slack + fail on high/critical (#28200)
Some checks are pending
Integration Tests - MySQL + Elasticsearch / integration-tests-mysql-elasticsearch (push) Blocked by required conditions
Integration Tests - MySQL + Elasticsearch / Detect Changes (push) Waiting to run
Integration Tests - PostgreSQL + Elasticsearch + Redis / Detect Changes (push) Waiting to run
Integration Tests - PostgreSQL + Elasticsearch + Redis / integration-tests-postgres-elasticsearch-redis (push) Blocked by required conditions
Integration Tests - PostgreSQL + OpenSearch / Detect Changes (push) Waiting to run
Integration Tests - PostgreSQL + OpenSearch / integration-tests-postgres-opensearch (push) Blocked by required conditions
Java Checkstyle / java-checkstyle (push) Waiting to run
Maven Collate Tests / maven-collate-ci (push) Waiting to run
OpenMetadata Service Unit Tests / Detect Changes (push) Waiting to run
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / k8s_operator-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests-status (push) Blocked by required conditions
Publish Package to Maven Central Repository / publish-maven-packages (push) Waiting to run
* ci(security-scan): post Snyk summary to Slack + fail on high/critical

* fix slack post channel

* mention repo name

* address gitar
2026-05-17 10:36:11 -07:00
Harsh Vador
d5bc00d1da
ci(security-scan): readable Snyk job summary + consolidated Slack alert (#28170)
* generate snyk summary

* address gitar

* address gitar

* generate summary

* remove duplicate notification
2026-05-16 07:05:10 -07:00
Sriharsha Chintalapani
5696286b27
Address Transitive vulnerabilities (#28169)
* Address transitive vulnerabilities

* Address transitive vulnerabilities

* fix(deps): resolve pyOpenSSL/cryptography conflict and align constraint pins

CI dependency resolution failed because pyOpenSSL~=24.1.0 caps cryptography
at <43, conflicting with the cryptography>=44.0.1 bump. Widens pyOpenSSL to
>=24.3.0 (first version compatible with cryptography 44.x) and aligns the
airflow constraint file pins for cryptography and GitPython with the
upstream setup.py bumps so pip install -c can resolve.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 00:02:49 -07:00
Harsh Vador
bb5c64658e
ci: consolidate security scan Slack notifications into single combined alert (#28135)
Some checks are pending
Integration Tests - MySQL + Elasticsearch / Detect Changes (push) Waiting to run
Integration Tests - MySQL + Elasticsearch / integration-tests-mysql-elasticsearch (push) Blocked by required conditions
Integration Tests - PostgreSQL + Elasticsearch + Redis / Detect Changes (push) Waiting to run
Integration Tests - PostgreSQL + Elasticsearch + Redis / integration-tests-postgres-elasticsearch-redis (push) Blocked by required conditions
Integration Tests - PostgreSQL + OpenSearch / Detect Changes (push) Waiting to run
Integration Tests - PostgreSQL + OpenSearch / integration-tests-postgres-opensearch (push) Blocked by required conditions
Java Checkstyle / java-checkstyle (push) Waiting to run
Maven Collate Tests / maven-collate-ci (push) Waiting to run
OpenMetadata Service Unit Tests / Detect Changes (push) Waiting to run
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / k8s_operator-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests-status (push) Blocked by required conditions
Publish Package to Maven Central Repository / publish-maven-packages (push) Waiting to run
* ci: consolidate security scan Slack notifications into single combined alert

* address gitar

* add env
2026-05-15 21:40:05 -07:00
Sriharsha Chintalapani
64f49c1747
Cache improvements: lineage + search layers, observability, CI gate (#28012)
* cache: lineage cache, per-type metrics, invalidation registry, search-cache

Add Redis-backed lineage response cache and search response cache, both
gated by the existing CACHE_PROVIDER toggle and falling through to direct
computation when the cache is unavailable. The cache remains optional —
verified end-to-end by toggling CACHE_PROVIDER=none on a live stack and
confirming all paths continue to work (just without the L2 hit).

Coverage:
- CachedLineage wraps LineageRepository.getLineage with hybrid TTL +
  direct invalidation (60s default). Direct edits invalidate the affected
  root cache entries; transitive changes fall through to TTL.
- CachedSearchLayer wraps /api/v1/search/query with auth-aware caching
  (cache key includes principal so users with different ACLs don't share
  results). 30s default TTL.

Observability:
- /api/v1/system/cache/stats response now includes a metrics block with
  hits/misses/hitRatio/evictions/errors/writes plus read/write latency
  Timers, and a byType breakdown so coverage gaps are visible per
  entity-type and per cache-layer.

Correctness:
- New Invalidatable interface + CacheBundle registry + invalidateEntity
  helper so future cache layers plug in by implementing one method
  instead of editing multiple mutation paths.
- Edge mutations in LineageRepository.addLineage/deleteLineage invalidate
  both endpoints; entity mutations in EntityRepository.postUpdate /
  postDelete / restoreEntity invalidate the lineage rooted at the entity.
- Pub/sub handler in CacheBundle iterates registered Invalidatables so
  remote-pod evictions flow to all layers automatically.

Tooling:
- docker-compose.cache-off.yml overlay flips CACHE_PROVIDER=none for
  local A/B testing without tearing down DB/ES volumes.
- CachedSearchLayerIT exercises hit-on-second-call, distinct-query
  misses, distinct-page-size misses, and byType shape via the metrics
  endpoint. Each test gracefully no-ops when the cluster runs cache-off.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* cache: phase 2 ops + correctness — single-flight, slow-read, negative cache, admin endpoints

Builds on the phase 1 commit (c20a29b11b) with operability and correctness
items from .context/cache-improvements-design.md. All four pieces respect
the optional-cache contract: with CACHE_PROVIDER=none they no-op cleanly.

P2.3 — Single-flight on CachedSearchLayer
  Striped<Lock> keyed by SHA-cache-key. 100 concurrent users hitting the
  same uncached query collapse to one ES call instead of N. SearchResource
  now uses loadOrCompute so the lock-and-recheck pattern lives inside the
  cache layer; the supplier is the actual ES call kept tight to minimize
  lock-hold time. Non-200 upstreams bypass cache and refetch.

P2.6 — Slow cache reads logged
  RedisCacheProvider.get/hget timing checked against
  cache.slowReadThresholdMs (default 50ms). Exceeding fires a WARN log
  and bumps a new cache.reads.slow Micrometer counter exposed in
  /cache/stats.metrics.slowReads. Leading indicator of Redis pressure /
  network glitch / hot-key contention.

P2.4 — Negative caching for not-found entities
  NotFoundCache marks "we looked, no such entity" with a short TTL
  (default 30s) so repeated 404 lookups (typo'd FQNs, references to
  deleted entities) don't hammer the DB. Wired into
  EntityRepository.find(UUID) and findByName for the !fromCache path.
  Implements Invalidatable so the postCreate fan-out drops the marker
  on entity create — without that, create-then-immediately-read would
  404 for up to TTL.

  Added CacheBundle.invalidateEntity to EntityRepository.postCreate so
  newly-created entities reach every Invalidatable registry layer.

P2.5 — Admin cache ops endpoints
  GET  /api/v1/system/cache/keys?pattern=...      — SCAN keys, returns count
  POST /api/v1/system/cache/invalidate?pattern=.. — SCAN+UNLINK, returns deleted
  POST /api/v1/system/cache/invalidate/entity?type=&id=&fqn=
                                                  — fan to all Invalidatables

  All admin-only. Pattern endpoints document the "no broad globs" rule —
  we never want a SCAN over om:prod:* on a busy cluster. Per-entity
  endpoint goes through the existing Invalidatable registry so future
  cache layers are reachable from ops without ever touching this code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* cache: pipelined mget on CacheProvider + CachedReadBundle.getBatch

Adds a foundational batch-read primitive at the provider layer:

  CacheProvider.mget(List<String>) -> List<Optional<String>>

Default implementation does sequential per-key gets (correct, no batching
benefit). RedisCacheProvider overrides with a true pipelined version: all
GETs are queued under setAutoFlushCommands(false), then flushed once and
awaited as a single TCP round-trip. Records hits/misses through the
existing CacheMetrics counters and respects the slow-read threshold.

Per-key pipelining over true MGET — Redis Cluster requires same-slot keys
for MGET; pipelined per-key GETs work transparently across slots without
the constraint, at the same network cost.

CachedReadBundle.getBatch(entityType, ids) consumes the new primitive
for prefetch use cases (UI prefetch on hover, list-then-detail
navigation warmup). The list endpoint hot path itself does NOT use this
layer — list responses are SQL-batched via EntityRepository.setFieldsInBulk
which calls fieldFetchers in bulk, not per-row CachedReadBundle.get.
That's why bench3 showed list endpoints at neutral cache_off-to-on
ratio: lists already amortize at the SQL layer.

The mget primitive is what later phases will plug into when wiring
batch-prefetch to specific UI flows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(cache): use unique query in sameQueryHitsCacheOnSecondCall to avoid state pollution

Sequential test run on postgres-os-redis caught a flake: the test issued
3 identical "q=*" calls expecting at least 1 cold-write. By the time it
ran, prior tests in the same JVM session had already cached
(q=*, index=table_search_index, size=10), so call 1 was a hit, call 2
hit, call 3 hit — total writes=0, asserts failed.

Switching to a per-invocation nonce ensures we always start cold,
matching the pattern the other 3 tests in this class already use.

Confirmed via subsequent parallel-pass run on the same suite where the
test passed (different test ordering, fresh cache for that key).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* cache: drop search cache TTL from 30s to 2s for create-then-search freshness

Integration tests on the postgres-os-redis profile caught a real correctness
regression: tests that create an entity and Awaitility-poll for it to appear
in search timed out at 30s because our 30s search TTL pinned the
pre-create empty result for the entire test window. Same issue surfaces
in production: a user creates a domain / table / dashboard and immediately
searches for it would see "no results" for up to 30s.

2s caps the staleness while still catching the dominant UI access pattern:
multiple components in the same render frame fire identical search queries.
Those happen within milliseconds, well inside any reasonable TTL.

The longer-term fix is search-cache invalidation on entity writes (a
generation counter per entity-type, search keys include the generation,
writes bump the generation). That's design-doc-tracked in
.context/cache-improvements-design.md but deferred — the 2s TTL is good
enough for now, and the more complete invalidation strategy can be a
follow-up PR with its own dedicated tests.

Failing tests under 30s TTL that this fixes:
  - DomainAssetsColumnExclusionIT (domain create-then-search)
  - LineageImpactAnalysisIT (owner removal reflected in search)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: cache-tests profile runs full IT suite + new postgres+es+redis CI workflow

The cache-tests Maven profile previously ran only the four cache/* IT
classes — too narrow to catch cache-correctness regressions in the rest
of the codebase. Expanded it to mirror the mysql-elasticsearch profile
shape: sequential + parallel failsafe executions, full **/*IT.java
inclusion, postgres + elasticsearch + redis backend, with
cacheProvider=redis system property added so every test path exercises
the cache layer.

Locally, the focused-cache-only run is preserved via
  mvn verify -P cache-tests -Dit.test='**/cache/*IT'

New CI workflow integration-tests-postgres-elasticsearch-redis.yml
mirrors the structure of integration-tests-postgres-opensearch.yml:

  - Same triggers (push to main, PR target, merge_group, workflow_dispatch)
  - Same path filters (openmetadata-service/**, integration-tests/**, etc.)
  - Same Maven cache + JDK 21 setup
  - Runs `mvn verify -pl :openmetadata-integration-tests -Pcache-tests`
  - Surefire-report publication with fail_on_test_failures

Result: PRs touching cache code (or any read path) get automatic CI
coverage with redis enabled. Cache-invalidation and stale-data bugs
that previously only surfaced in production now have a CI gate before
merge — same protection that mysql-elasticsearch and postgres-opensearch
provide for the no-cache code paths.

Smoke verified locally: `mvn verify -P cache-tests -Dit.test='**/cache/*IT'`
runs both sequential and parallel passes (6 tests each), all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cache): address PR review feedback for cache improvements

Nine review-driven fixes spanning the cache PR (#28012):

RedisCacheProvider.mget (bug):
  - Restructured the auto-flush window so `setAutoFlushCommands(true)` is
    in the OUTER `finally` of the entire op. The previous structure had
    the restoration in an inner finally that only fired around the
    awaitAll call; an exception in the queueing loop or flushCommands()
    would leave the SHARED connection in auto-flush=false mode, making
    every subsequent op from any caller silently buffer indefinitely.

SearchResource (bug):
  - Removed the double-call on the non-cacheable response path. The
    supplier now captures the upstream Response object so the outer code
    can return it directly when the body isn't cacheable (non-200 or
    non-String entity) — previously the caller re-invoked
    searchRepository.search() on every error/non-200, doubling backend
    load for failing queries.

EntityRepository negative cache (edge case):
  - Hoisted the NotFoundCache fast-path OUTSIDE the `!fromCache` guard in
    both `find(UUID,...)` and `findByName(...)`. Default callers go in
    via `find(id, include)` which delegates with fromCache=true; the
    previous gate made the fast-path unreachable for the most common
    caller. Also added negative-cache population from the cached path's
    ExecutionException so repeated requests for a non-existent id do
    short-circuit after the first miss.

SystemResource cache endpoints (security + style):
  - `/cache/keys` and `/cache/invalidate` now validate the glob pattern
    via `validateCachePattern` — rejects pure wildcards or patterns with
    fewer than 6 literal characters before the first wildcard. Stops a
    careless or malicious admin from issuing `*` or `om:*` that would
    block the Redis cluster on a large keyspace. ReDoS-safe: linear
    char scan, no regex backtracking.
  - `/cache/invalidate/entity` now also calls
    `EntityRepository.invalidateCacheForEntity(...)` to evict the Guava
    L1 caches (`CACHE_WITH_ID`, `CACHE_WITH_NAME`) and propagate via the
    existing pub-sub channel — the previous code only invalidated the
    `INVALIDATABLES` registry layers, leaving stale L1 entries.
  - Replaced fully-qualified class names (`org.openmetadata.service.
    cache.CacheMetrics`, `jakarta.ws.rs.QueryParam`, `java.util.UUID`)
    with proper imports per the project style guide.

CachedLineage (edge case):
  - Single-flight stripe lock now keys on the FULL cache key
    `(rootId, upstreamDepth, downstreamDepth, includeDeleted)` instead
    of `rootId` alone. Concurrent requests for different depths or
    include-deleted flags on the same root no longer block each other.

CachedSearchLayer (doc):
  - Javadoc now correctly says default TTL is 2s (was incorrectly 30s)
    and explains why — see commit 41489056ff which dropped it from 30s
    after IT regressions where users couldn't see their own writes for
    half a minute.

CI workflow (bugs + security mitigation note):
  - Removed `if: steps.cache-output.outputs.exit-code == 0` from the
    `Set up JDK 21` and `Install Ubuntu dependencies` steps.
    `actions/cache@v4` exposes `cache-hit`, never `exit-code`; the
    expression always evaluated to false and those steps NEVER ran.
    Maven was using whatever JDK shipped with the runner.
  - Added explicit security note in the workflow header AND on the
    `Checkout` step documenting why `pull_request_target` is intentional
    and what the `safe to test` label gate accomplishes — CodeQL flags
    the pattern, the label gate is the accepted mitigation that mirrors
    every other integration-tests-*.yml workflow in this repo.

Verified:
  - mvn compile -pl openmetadata-service → BUILD SUCCESS
  - mvn test -pl openmetadata-service -Dtest=OpenMetadataAssetServletTest
    → 9/9 pass
  - mvn spotless:apply ran clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cache): only negative-cache on real EntityNotFoundException

The previous code caught every ExecutionException / UncheckedExecutionException
from the Guava cache loader and (a) populated NotFoundCache for 30s, (b)
rethrew as EntityNotFoundException. That conflated three very different
failure modes:

  1. Entity truly doesn't exist     → loader throws EntityNotFoundException
  2. Entity exists but is invalid   → loader throws IllegalStateException
  3. Transient DB / deser failure   → loader throws JdbiException, IOException

Cases 2 and 3 would poison the negative cache, turning a momentary DB
hiccup or a single bad row into a sustained 30s 404 for every caller that
asks for that id/fqn. Worse, the original cause was masked behind a
synthetic EntityNotFoundException, so logs and clients never saw the real
failure.

This change inspects e.getCause() and:
  - On EntityNotFoundException: populate NotFoundCache, rethrow the
    original (not a synthetic) so the caller's `instanceof` checks and
    message text still work.
  - On any other RuntimeException: rethrow unchanged — DB blips return
    5xx as before, validation errors surface, and the next request can
    re-attempt without hitting a poisoned cache.
  - On checked Throwable cause (rare for these loaders): wrap in
    RuntimeException so the contract is preserved.

Applied symmetrically to find(UUID, …) and findByName(String, …).

Addresses gitar-bot review on PR #28012:
https://github.com/open-metadata/OpenMetadata/pull/28012#discussion_r... (negative cache poisoning)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cache): copilot review — blank param, javadoc, mget hardening

Four review comments from PR #28012 review 4266159401:

SystemResource.invalidateCacheForEntity (line 1069 → blank query params):
  `?type=X&id=&fqn=` slipped past the required-params check because only
  `null` was treated as absent. Normalize blank id/fqn to null up front
  so the missing-both branch fires correctly and the downstream
  CacheBundle / EntityRepository calls receive a clean null instead of
  an empty string.

CacheKeys.search/childrenPage (line 116 → orphaned Javadoc):
  When the search() helper was added between the children-page Javadoc
  and the childrenPage() method, the Javadoc got stranded above the
  wrong method. Move it back so javadoc tooling generates accurate docs.

RedisCacheProvider.mget (line 610 → shared-connection auto-flush race):
  setAutoFlushCommands(false) toggles state on the shared Lettuce
  connection — two concurrent mgets could overlap and one caller's
  commands would buffer until the other restored auto-flush, surfacing
  as latency spikes / hangs on other paths sharing the connection.
  Wrap the pipeline in a new instance-level ReentrantLock so only one
  mget runs the auto-flush dance at a time. try/finally still restores
  auto-flush unconditionally; lock release sits in an outer finally.

RedisCacheProvider.mget (line 621 → unbounded f.get() on timeout):
  Previously LettuceFutures.awaitAll(...) returned a boolean we ignored;
  if it timed out, the subsequent f.get() calls were unbounded and would
  block the request thread until the Lettuce event loop eventually gave
  up. Capture the boolean, cancel non-done futures on timeout (so f.get()
  returns CancellationException instead of blocking), and log a warning
  with the timeout value and key count for operators.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cache): mget partial timeout must trip the circuit breaker

The previous mget rewrite cancelled in-flight futures on awaitAll timeout
but still called recordSuccess() at the end of the happy-path. That fed
consecutiveSuccesses on every partial timeout, so a Redis instance that
was consistently slow (answering some keys, dropping others) would
*never* trip the breaker — masking real backend degradation as healthy.

Branch on the captured allCompleted boolean:

  - all futures completed → recordSuccess() as before
  - partial timeout → recordFailure(TimeoutException) and bump
    CacheMetrics.recordError() so the breaker's sliding-window failure
    detector picks it up and the metric reflects the degraded state

No other behaviour change — the per-key fallback Optionals still surface
to callers either way.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cache): mget shorter critical section + cache/stats + cache/keys doc

Three review comments from PR #28012 second copilot pass:

RedisCacheProvider.mget (RedisCacheProvider.java:624 — shared-connection
hold time): previous code held setAutoFlushCommands(false) for the entire
queue+flush+await window. Other paths (single get/set/hget on the same
Lettuce connection) would buffer until our await finished. Shrink the
critical section to just queue+flush: once flushCommands() returns, the
batch is on the wire and we can restore auto-flush and release the
pipelineLock before awaiting. A slow Redis now blocks only the calling
thread, not every concurrent caller using the shared connection.
Cancel-on-timeout and breaker accounting are unchanged.

SystemResource.getCacheStats (line 962 — noisy WARN when cache disabled):
CacheMetrics.getInstance() logs WARN every call when the metrics singleton
isn't initialized, which happens whenever CACHE_PROVIDER=none. An ops
dashboard polling /system/cache/stats on a cache-off deployment would
spam the log. Gate the metrics call on cacheProvider.available() so the
WARN never fires in that configuration. Stats payload still includes
provider-level fields; just no `metrics` key when cache is off.

SystemResource.scanCacheKeys (line 1006 — OpenAPI lies about count param):
Description claimed "bounded by the count parameter" but no count param
exists; scanCount() walks the full cursor. Rewrote the description to
state the actual safety mechanism: the validateCachePattern enforces a
6-character literal prefix before any wildcard, so '*' and 'om:*' are
rejected at validation. Reflects what the endpoint actually does.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cache): copilot review pass 3 — hot-path L1 check + lineage hash + cleanups

Eight comments from the latest copilot review on PR #28012:

1. SystemResource.getCacheStats: gate metrics on cacheConfig.provider != none
   instead of cacheProvider.available(). When Redis is configured but the
   circuit breaker is tripped, app-level counters are exactly what an
   operator needs to diagnose the outage — suppressing them while the
   provider is "down but configured" hides the diagnostic signal. Also
   downgrade CacheMetrics.getInstance() WARN → DEBUG so a poller loop
   doesn't spam logs in the entirely-normal cache-off state.

2. CachedReadBundle.getBatch contract: the method is documented as
   returning a list 1:1 with entityIds, but bypass returned
   Collections.emptyList() and callers indexing by position would shift
   off the rails. Return a same-size list of nulls under bypass so the
   positional contract holds regardless of cache state.

3+4. CacheBundle.invalidateEntity / Invalidatable.invalidate javadocs
   claimed they were called from EntityRepository.postUpdate / postDelete
   / restoreEntity. They are NOT (only postCreate, the pub-sub handler,
   and the admin endpoint reach this path). Updated both javadocs to
   reflect actual call sites so future Invalidatables aren't built on a
   wrong invalidation contract.

5+6. EntityRepository.find / findByName: check Guava L1 (getIfPresent)
   FIRST, NotFoundCache only on L1 miss. The previous shape consulted
   NotFoundCache before L1, adding one Redis GET per cached read — a
   regression on the hottest read path. L1 hit now serves with zero
   Redis traffic; the negative cache short-circuits only when the loader
   would otherwise pay for a DB / Redis-L2 round trip.

7. CachedLineage redesign: variants for one root now live as fields of a
   single Redis hash (HSET / HGET) instead of separate keys. Invalidate
   is one DEL — O(1) — instead of SCAN-and-iterate (O(N) over keyspace).
   This matters because invalidate fires on the hot write path (entity
   updates and lineage-edge mutations) and the SCAN cost grew linearly
   with cache size. CacheKeys.lineageGraphPattern is gone; new helpers
   are lineageGraphHash(rootId) and lineageGraphField(up, down, incDel).

8. SystemResource.invalidateCacheForEntity: when only fqn is supplied,
   resolve to id server-side via Entity.getEntityRepository(type).
   findByName(...) before fanning out. Id-keyed cache layers (lineage,
   CACHE_WITH_ID, NotFoundCache id-side) need the UUID; the previous
   shape silently skipped them. Lookup failures are logged at DEBUG and
   the request still proceeds with fqn-only invalidation — admin
   force-invalidate is best-effort by design.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cache): lineage hash TTL claimed only by first writer (EXPIRE NX)

Previous shape called `hset(hashKey, fields, ttl)` which translated to
HSET + EXPIRE under the hood. Every variant write therefore reset the
hash's expiry — variant A cached at T=0 with TTL=60, variant B cached at
T=55, and A's effective lifetime jumped to 115s instead of the intended
60s. Under a constant trickle of variant writes on a hot root, the
"stale" variant could effectively live forever.

Split the operation:

  - CacheProvider.hset(key, fields) — new overload, no TTL touch.
    Defaults to a 365-day TTL so providers that don't override get
    a long-lived key rather than an immortal one.
  - CacheProvider.expireIfAbsent(key, ttl) — EXPIRE … NX semantics:
    set the TTL only when the key has no prior expiry. Default
    returns false (providers that can't express NX get no extension
    benefit, but no regression).
  - RedisCacheProvider implements both: HSET without expire, then
    EXPIRE with ExpireArgs.Builder.nx(). Falls back gracefully on
    Redis < 7.0 (logs at DEBUG, returns false).

CachedLineage.safeHset now uses the split shape — the first writer
to seed a hash establishes the 60s window; subsequent variant writes
leave the expiry alone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cache): mget unavailable-path alignment + lineage deser fallback

Two copilot review comments on PR #28012:

RedisCacheProvider.mget (line 646): when `available == false` we returned
`Collections.emptyList()`, violating the 1:1 positional contract that
callers (CachedReadBundle.getBatch and friends) rely on. Match the
error-fallback branch: return one Optional.empty() per requested key so
caller-side indexing stays aligned regardless of provider health.
Truly-empty input keeps returning empty list (no positions to align).

LineageRepository.getLineage (line 1345): unconditional readValue on the
cached JSON would throw and fail the request if Redis held a
partial/corrupted/old-schema value — turning cache corruption into a
persistent 500 until TTL expiry. Wrap the deserialize in try/catch; on
failure log WARN with the root id and depth, invalidate the affected
root's lineage hash, and fall through to a fresh computeLineage(). User
sees the same answer as cache-off; subsequent requests repopulate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cache): expireIfAbsent falls back to plain EXPIRE on NX failure

The previous shape returned false silently when EXPIRE … NX wasn't
supported (Redis < 7.0 syntax error, transient failure). That meant the
preceding HSET-without-ttl call could leave the lineage hash key with no
expiry at all, accumulating in Redis memory until the next manual
invalidation.

Catch the NX failure, log at DEBUG, and issue a plain EXPIRE so the key
still gets a bounded lifetime. The trade-off: on older Redis, every
variant write extends the expiry — strictly worse than the NX semantics
on a 7.0+ deployment, but vastly better than the alternative of
permanent unbounded keys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cache): copilot review pass 5 — dedicated mget conn + breaker + IT isolation + key collision

Five comments from the latest copilot review on PR #28012:

RedisCacheProvider.expireIfAbsent breaker bookkeeping (line 432, gitar-bot):
the NX-fallback path issued a plain EXPIRE without recordSuccess() /
recordFailure(), so a real network blip there was invisible to the
sliding-window failure detector. Both success and failure now feed the
breaker, consistent with every other Redis-calling method in the class.

RedisCacheProvider.mget shared-connection hazard (line 692): even with
pipelineLock, single-key callers using syncCommands/asyncCommands on the
*same* connection had their commands buffered for the duration of the
auto-flush-off window. Switched to a dedicated `pipelineConnection` /
`pipelineAsyncCommands` created at init time and closed on shutdown. The
shared connection's auto-flush is never toggled now, so unrelated request
paths can't be starved by mget. pipelineLock still serializes mget vs
mget on the dedicated connection.

SystemResource.invalidateCacheForEntity fqn→id resolution (line 1113):
the resolution call used `findByName(fqn, ALL, fromCache=true)`. That
path consults NotFoundCache and the L1/L2 caches, which an admin force-
invalidate is explicitly trying to recover from — a poisoned negative
entry would short-circuit the resolution and silently skip every id-keyed
cache layer. Switched to fromCache=false so the resolution always goes
to the DB; only then can we trust the id we hand to CacheBundle /
EntityRepository invalidation.

CachedSearchLayerIT.java parallel-execution flakiness (line 50): the
test assertions depend on deltas in the *global* /system/cache/stats
counters. Under @Execution(CONCURRENT) other ITs issuing searches in
parallel inflate the counters and the deltas either don't show up (false
negative) or come from someone else's hits (false positive that masks
broken cache keying). Marked @Isolated + ExecutionMode.SAME_THREAD so
the class runs alone within its window.

CachedSearchLayer.buildKey ambiguous encoding (line 220): fields were
joined with a raw `|` delimiter, no escaping. A query string containing
`|idx=foo` would produce the same preimage as a different (principal,
index, query) tuple — cache-key collision → wrong cached response served
to the wrong user. Added length-prefixed field encoding
(`name=<utf8-bytes>:value|`); two distinct logical tuples can no longer
serialize to the same hash input.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2026-05-13 06:41:09 -07:00
Harshit Shah
77a85bffde
[CI] Add on-demand Playwright search-nightly workflow (#27908)
* test(ci): add on-demand playwright search-nightly workflow

Create a manual Playwright search-nightly workflow with the same bootstrap, reporting, Slack notification, and cleanup structure as the SSO nightly job. Add a dedicated search-nightly Playwright project and a basic nightly search smoke spec without using issue-closing keywords for #3792.

* address comments

* revert changes

* minor updates
2026-05-13 12:18:31 +05:30
Mohit Yadav
8ac53bfecc
Merge branch 'main' into java-playwrights 2026-05-11 19:48:37 +05:30
mohitdeuex
74bda64340 Merge openmetadata-java-playwright into openmetadata-integration-tests
Folds the UI integration test module into the canonical integration-tests
module under a `ui-it` Maven profile. One test home, one classpath, no
more test-jar reinstall dance or cross-module IntelliJ classpath quirks.

Why: most of the value the UI test module shipped was reusable backend
infra (factories, search helpers, server harness) that worked fine
without a browser. Keeping it in a separate module forced multiple
unnecessary boundaries — test-jar publication, IntelliJ test-classes-
jar-tests phantom paths, src/test placement for AuthBackend code that
should have been in src/main, "where does this test go?" friction.

Layout in integration-tests:
  org/openmetadata/it/auth/    JwtAuthProvider + AuthBackend / OidcBackend
                                / AuthSession / TokenRefresher / ...
  org/openmetadata/it/server/   ContainerizedServer / ServerHandle /
                                ExternalServer / sso/ profile records
  org/openmetadata/it/search/   ReindexHelpers / SearchClient /
                                SearchAssertions / SearchQueryHelper
  org/openmetadata/it/ui/       SessionBrowser / UiSession /
                                UiSessionExtension / TraceRecorder /
                                ClipboardHelper / pages/
  org/openmetadata/it/scenarios/  *UIIT.java tests
  org/openmetadata/it/util/     SdkClients + UiTestServer / OssTestServer
  org/openmetadata/it/factories/  existing + EntityLoader

Build:
  - integration-tests pom gains com.microsoft.playwright:playwright
    (test scope). Other testcontainers / jwt deps already there.
  - test-jar publish-test-harness includes pattern expanded to ship
    server/, search/, ui/ packages alongside auth/, util/, factories/,
    bootstrap/. Downstream consumers (collate) inherit the full UI
    test harness, not just backend factories.
  - New `ui-it` profile runs `**/*UIIT.java` with skip.embedded.bootstrap
    =true, PW_VIDEO=true, per-method parallel @ 0.5 factor. Mirrors the
    failsafe execution from the old playwright module.
  - Existing parallel-tests executions across all profiles gain a
    `**/*UIIT.java` exclude so embedded-mode IT runs don't pick up UI
    tests they can't run.

Module removal:
  - openmetadata-java-playwright/ deleted.
  - parent pom <modules> entry removed.
  - .github/workflows/java-playwright-nightly.yml updated to build and
    test `openmetadata-integration-tests -P ui-it` instead.

Docs:
  - MIGRATION_TRACKING.md and CONVENTIONS.md from the old module are
    UI_MIGRATION_TRACKING.md / UI_TEST_CONVENTIONS.md at the
    integration-tests root.

No test code semantics changed — pure reorganization. The 4-5 backend-
flavored *UIIT.java tests we identified as misplaced (running against
SDK with vestigial UI checks) still live under scenarios/ for now; a
follow-up will rename them to *IT.java and have them target the
embedded TestSuiteBootstrap directly to drop their ~3-minute Docker boot
overhead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 19:13:29 +05:30
Sriharsha Chintalapani
d3bbbefe37
fix(rdf): dedupe lineage edges, surface Fuseki failures, port distributed-mode improvements (#27999)
* fix(rdf): dedupe lineage edges and broaden PROV-O coverage

The RDF Knowledge Graph endpoint was emitting two edges per lineage
relationship — once as `om:UPSTREAM` (forward) and once as
`prov:wasDerivedFrom` (reverse) — because the parser preserved each
predicate's native subject/object orientation instead of canonicalizing
both into a single `(upstream, downstream)` edge.

Also extend PROV-O coverage so external SPARQL clients can use the W3C
Provenance vocabulary directly:
- `prov:Entity` / `prov:Activity` / `prov:Agent` class typing on
  datasets / pipelines / users
- `prov:wasAttributedTo` mirror of `om:owners`
- `prov:generated` (inverse of existing `wasGeneratedBy`) and `prov:used`
  on lineageDetails so the Entity → Activity → Entity chain is complete
- `prov:hadPlan` + `prov:Plan` for SQL transformation recipes
- `prov:startedAtTime` / `prov:endedAtTime` on Activity instances
- `prov:wasAssociatedWith` Activity → Agent linking
- `prov:invalidatedAtTime` on soft-deleted entities

Other RDF cleanups in the same area:
- LineageDetails URIs are now deterministic (driven by from/to ids
  instead of a timestamp), so re-indexing collapses duplicate Activity
  resources via the existing DELETE+INSERT idempotency
- Skip emitting the redundant `om:owners` JSON-string literal — the
  mapped path already produces clean `om:hasOwner <agent>` triples
- Skip empty `[]` array literals in the unmapped path
- Propagate failures from `RdfRepository.{addRelationship,
  addLineageWithDetails, bulkAddRelationships,
  bulkAddGlossaryTermRelations}` instead of silently swallowing them,
  so downstream callers can surface the failure

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf-index-app): surface Fuseki failures in app run record

Per-entity and per-batch failures from the RDF index app used to be
logged via SLF4J only — they never made it into the AppRunRecord, so
the UI/run history showed "completed" even when every entity had
silently failed to write to Fuseki.

- `RdfBatchProcessor.processEntities` now captures the last error per
  entity, returns it in `BatchProcessingResult.lastError`, and
  accumulates relationship-processing failures into the same result.
- Relationship and lineage processing methods (`processBatchRelationships`,
  `processLineageRelationship`, `processGlossaryTermRelations`) return
  structured results with failure counts and last-error messages instead
  of `void`, so failures are visible to the partition worker.
- `RdfIndexApp` records the failure on `jobData` for both the
  distributed and non-distributed code paths, so users see a real
  error message in the run history (e.g.
  "Failed to write entity X to Fuseki: ConnectException").

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* perf(rdf-index-app): port distributed-mode improvements from SearchIndex

The RDF distributed-indexing fork was lagging behind several SearchIndex
improvements that addressed concrete reliability and throughput issues.
Port them across:

Core perf / reliability
- Precomputed partition start cursors: coordinator walks each entity
  once via keyset pagination at job init and caches the boundary cursor
  per (jobId, entityType, rangeStart). Workers consult the cache before
  falling back to the OFFSET-based path. Eliminates the previous O(N²)
  per-partition cursor lookup.
- `cancelInFlightPartitions` + `requestStop` + `checkAndUpdateJobCompletion`
  on the coordinator. Stop now cancels both PENDING and PROCESSING
  partitions in a single SQL update and immediately drives the job
  status from STOPPING → STOPPED, so the UI status no longer hangs
  while workers drain.
- Selective field hydration: `RdfPartitionWorker.readEntitiesKeyset`
  uses `ReindexingUtil.getSearchIndexFields(entityType)` instead of
  `List.of("*")`, avoiding expensive fetchers (e.g. fetchAndSetOwns)
  per batch.
- Partition heartbeat thread: virtual thread refreshes
  `lastUpdateAt` every 30s for partitions actively being processed by
  this server, so the stale reclaimer no longer interrupts active work.
- `MAX_IN_FLIGHT_PARTITIONS_PER_SERVER = 5` backpressure: claim path
  rejects when the server already holds 5 PROCESSING partitions, giving
  fair distribution across pods. Verified the existing claim DAO uses
  `FOR UPDATE SKIP LOCKED` for both MySQL and Postgres.
- Gate WebSocket stat broadcasts during the STOPPING phase so the
  Quartz-scheduler-driven STOPPED status push isn't overwritten.

Multi-server scaffolding (single-pod is unaffected)
- `RdfPollingJobNotifier`: DB-polling discovery for other server pods
  to find an in-flight RDF reindex they can join.
- `RdfEntityCompletionTracker`: per-entity-type partition tracking with
  callback firing once all partitions for an entity complete, foundation
  for early per-entity index promotion.

Tests: precomputed-cursor cache lookup, in-flight backpressure,
cancelInFlight delegation, completion tracker callback semantics,
notifier start/stop.

DAO additions on `rdf_index_partition`:
- `cancelInFlightPartitions(jobId, now)` — covers both PENDING and
  PROCESSING in one statement
- `countInFlightPartitionsForServer(jobId, serverId)` — backpressure
- `countPartitionsByStatus(jobId, status)` — used by completion check

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ui-apps): hide misleading data on synthetic 'CurrentConfig' row

When an app has no run history, AppRunsHistory fabricated a synthetic
placeholder row that looked like a real run — `runType: "CurrentConfig"`,
a fake `Run At` timestamp pulled from `appData.updatedAt`, an
ever-growing `Duration` (`now − updatedAt`), and an active `Stop` button
that targeted nothing.

Render `--` for `Run At`, `Run Type`, and `Duration` on synthetic rows,
and hide the `Stop` button so users no longer see "Run now → 19-minute
Running with Stop button" when the actual job never registered. Real
app runs are unaffected — they still display `runType` from the
backend (OnDemandJob, Hourly, Daily, Custom, etc.).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf): address PR review findings

Four issues raised in PR #27999 review:

- **Cursor format consistency in walkAndRecord** (bug):
  The defensive branch produced cursors via a custom `{name, id}` map
  while the regular path used `repo.getCursorValue()`. For entities
  with quoted names these encodings diverge — a quoted-name entity
  could land in the cache with a cursor incompatible with what the
  worker fetches via keyset pagination. Track the last seen entity
  reference and run it through `repo.getCursorValue()` in both paths.
  `encodeBoundaryCursor` is removed.

- **Adaptive scheduling in RdfPollingJobNotifier** (perf):
  The previous implementation woke the scheduler thread every 1s and
  short-circuited inside the poll method when idle. Reschedule the
  task at the appropriate interval (1s active / 30s idle) when
  `setParticipating` flips, so the thread genuinely sleeps when idle.

- **Cursor cache cleanup on startup recovery** (edge case):
  `partitionStartCursors` was only evicted by `refreshAggregatedJob`
  / `checkAndUpdateJobCompletion`. If a coordinator crashed mid-job
  and never reached either, the cache entry leaked until process
  restart. Add `evictStaleCursorCacheEntries()` invoked by
  `performStartupRecovery` that drops entries for jobs that no longer
  exist in the DB or are already terminal.

- **Consolidate describeError helpers** (quality):
  `describeError`, `describeBulkError`, and `describeLineageError` in
  `RdfBatchProcessor` all walked the cause chain and formatted a
  prefixed message with the same logic. Reduced to a single
  `describeError(prefix, error)` plus a thin `describeEntityError`
  adapter for the per-entity call site.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf-index-app): avoid double workerExecutor.shutdownNow() in stop()

stop() called workerExecutor.shutdownNow() inline AND through
cleanupLocalExecution -> shutdownWorkerExecutor, which broke the
DistributedRdfIndexExecutorTest.stopAndCoordinatorCleanupOnlyTearDownLocalExecutionOnce
verify(workerExecutor, times(1)).shutdownNow() expectation. Drop the
inline call — cleanupLocalExecution is the single owner of the
shutdown path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci: drop redundant DB matrix from openmetadata-service unit tests

The {mysql, postgresql} strategy matrix on openmetadata-service unit
tests doubled CI cost without adding signal: both jobs ran the same
surefire suite. The `-Pmysql` / `-Ppostgresql` profiles are defined
only in `openmetadata-sdk/pom.xml` (lines 190-206), set a single
`test.database` property, and that property is consumed exclusively by
the failsafe plugin (integration tests `*IT.java` / `*IntegrationTest.java`),
which only runs under `-Pintegration-tests` — not enabled here.

`openmetadata-service` itself has zero tests that read `test.database`
or use `MySQLContainer`/`PostgreSQLContainer` (verified by grep). The
only testcontainer-based DB code in the repo lives in
`openmetadata-integration-tests`, a different module that this workflow
doesn't build.

Run the unit suite once. The `openmetadata-service-unit-tests-status`
required-check aggregator is unaffected (it depends on the renamed job
which still has the same name).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf): address Copilot PR review findings

Six correctness issues raised on PR #27999:

- **Lineage-details DELETE was too broad** (RdfRepository): the cleanup
  step deleted *all* `<fromUri> om:hasLineageDetails ?d` triples,
  so reindexing one (fromId, toId) edge wiped lineage-details links
  for every other downstream of the same source entity. Pin the
  delete to the specific `<fromUri> om:hasLineageDetails <detailsUri>`
  triple. Same with prov:generated cleanup — anchor it to the
  specific detailsUri instead of any details resource.

- **Predicate not flipped during canonicalization** (RdfRepository):
  `parseEntityGraphEdgesFromResults` swapped subject/object for
  reverse-direction predicates (`prov:wasDerivedFrom`,
  `prov:wasInfluencedBy`) but kept the original predicate URI on the
  resulting EdgeInfo. Exported graphs could carry semantically
  invalid triples like `<upstream> prov:wasDerivedFrom <downstream>`.
  Add `forwardEquivalentPredicate` to substitute the OM-native
  forward predicate when the direction flips.

- **`dct:modified` was an invalid xsd:dateTime** (RdfPropertyMapper):
  `entity.getUpdatedAt().toString()` returns the epoch-millis Long as
  a string, but the literal was tagged `xsd:dateTime`. Convert via
  `Instant.ofEpochMilli(...).toString()` so the lexical form matches
  the type — same fix already in place for prov:invalidatedAtTime.

- **Unmapped EntityReference arrays were dropped entirely**
  (RdfPropertyMapper): the previous fix to skip noisy JSON-string
  literals also dropped fields like `domains`, `reviewers`, `voters`
  for entity contexts that don't have a JSON-LD mapping for them —
  the unmapped path was the only path emitting them, so nothing
  landed in RDF. Expand each array element through
  `addEntityReference` so the data still produces proper
  `om:<fieldName> <ref>` triples; mapped-path duplicates are
  collapsed by Jena's Model dedupe.

- **Partition failure detection missed reader errors**
  (DistributedRdfIndexExecutor): the EntityCompletionTracker was fed
  `result.errorMessage() != null`, but `RdfPartitionWorker` can
  increment `failedCount` from `readerErrors` without ever setting
  `lastError`. Use `result.failedCount() > 0` so partitions whose
  failures came from `ResultList.getErrors()` are also marked as
  failed when promoting an entity.

- **`COMPLETED_WITH_ERRORS` was hidden when failedRecords == 0**
  (RdfIndexApp): the coordinator marks a job COMPLETED_WITH_ERRORS
  whenever any partition is FAILED or CANCELLED, including for
  user-initiated stops where no record-level failures accrued. The
  monitor's `completedWithErrors` gate required `failedRecords > 0`,
  so those terminal states never hit `jobData.setFailure(...)` and
  the run record showed success. Drop the failedRecords precondition
  and tailor the fallback message based on whether there are
  record-level failures or partition-level only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf): separate relationship failures + type lineage as prov:Activity

Two more PR review findings on #27999:

- **Relationship failures inflated failedRecords stat**: `processEntities`
  was folding relationship/lineage edge failures into `failedCount`,
  which becomes `failedRecords` in the index stats. Records there mean
  entities, computed from entity counts in `totalRecords`. Counting
  per-edge relationship failures could push `failedRecords` above
  `processedRecords`/`totalRecords` and produce nonsensical
  per-entity stats.

  Track them separately: add `relationshipFailureCount` to
  `BatchProcessingResult` and `PartitionResult`. `failedCount` now stays
  entity-level. The completion tracker is fed the broader
  `result.hasAnyFailure()` so partitions where relationship triples
  failed don't get prematurely promoted as success even though their
  entity writes succeeded.

- **`detailsResource` wasn't typed as prov:Activity**: the resource
  carries Activity-shaped predicates (prov:startedAtTime,
  prov:endedAtTime, prov:used, prov:hadPlan, prov:wasGeneratedBy,
  prov:wasAssociatedWith) but only the OM-specific
  `om:LineageDetails` rdf:type. Add an explicit
  `rdf:type prov:Activity` so PROV-O reasoners and federated SPARQL
  clients recognize it as an Activity without having to learn the
  OM type.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf): label lineage edges relative to focal node

The Knowledge Graph view was labeling every edge with relation
type "upstream" as "Upstream" regardless of direction relative to the
focal node. For a focal node F, the raw stored relation `(F, X, upstream)`
means "F is upstream of X" — i.e. X is *downstream* of F. The previous
output labeled both `F → X` and `X → F` edges as "Upstream", which made
bidirectional lineage look like a duplicated relation.

Re-orient the label in `convertEdgesToGraphData` based on whether the
focal is the edge's source or target:
- focal → X → "Downstream"
- X → focal → "Upstream"
- non-focal-touching edges keep the raw relation label.

Reported on a sample-data table with a circular lineage cycle
(`dim_customer ↔ fact_orders`) where both directions showed "Upstream".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf): close remaining Copilot review gaps

Three findings from PR #27999's third review pass — all about failure
signals being silently dropped between layers:

- **`RdfIndexApp.processTask` ignored relationship failures**: only
  `result.failedCount() > 0` was treated as a failure, so partitions
  whose Fuseki relationship/lineage writes failed (incrementing
  `relationshipFailureCount` but not `failedCount`) never wrote
  `jobData.failure`. Switch to `result.hasAnyFailure()` and report the
  combined count.

- **`checkAndUpdateJobCompletion` ignored partition `lastError`**: a
  partition can finish COMPLETED with `lastError` set when a relationship
  bulk write was caught and recorded but didn't bump `failedRecords` or
  flip the partition to FAILED. The job would then go to COMPLETED even
  though there were real failures. Treat the presence of any
  `rdf_index_partition.lastError` as an error signal — promote to
  COMPLETED_WITH_ERRORS and aggregate sample errors into the job's
  errorMessage if it was blank.

- **`forwardEquivalentPredicate` mapped to a non-existent
  `om:DOWNSTREAM` URI**: OpenMetadata only stores lineage with
  `om:UPSTREAM` (forward) and `prov:wasDerivedFrom` (reverse PROV-O
  pair); there is no `om:DOWNSTREAM` predicate written anywhere — the
  downstream view is derived by reading the same UPSTREAM edge from the
  other side. Map both `prov:wasDerivedFrom` and `prov:wasInfluencedBy`
  to `om:UPSTREAM` (both are reverse-direction causation predicates: in
  `B wasDerivedFrom A` / `B wasInfluencedBy A` the source is A and
  effect is B, so the canonical forward predicate is the same).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Fix RDF tag mapper

* Fix all the comments

Cherry-picked from #27562 (without bin/ autogenerated noise).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Align RdfPropertyMapper tests with refactor and isolate ontology export IT

RdfPropertyMapperTest still referenced the removed addVotes helper and
expected addStructuredProperty to dispatch votes — both gone after votes
was added to IGNORED_PROPERTIES. Update the assertions accordingly.

GlossaryOntologyExportIT timed out on the full suite because it flips a
global RDF singleton in @BeforeAll and each test blocks a server thread on
synchronous Fuseki writes. SAME_THREAD only serialized methods within the
class — concurrent classes still raced for server threads. Adding @Isolated
matches the pattern already used by RdfResourceIT for the same reason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(rdf): align addCertification typing + relationType after predicate flip

Two findings on PR #27999 from the post-cherry-pick review pass:

- **`addCertification` mis-typed glossary-source certifications and
  skipped skos:Concept**: it always emitted `om:Tag` regardless of
  source, even though `resolveTagResource` returns a glossaryTerm URI
  when the certification points at a glossary term. It also didn't add
  `skos:Concept` (or the `createTypeResource("tag")` `skos:Concept` for
  classification tags), so SPARQL queries filtering certification
  targets by `a skos:Concept` missed them while `addTagLabel`-emitted
  tags were findable. Mirror `addTagLabel`: branch on source
  (`Glossary` vs `Classification`), emit the right primary type plus
  `skos:Concept` (glossary) or `om:Tag` (classification), and include
  `om:tagSource`.

- **`relationType` left stale after predicate flip**: when
  `parseEntityGraphEdgesFromResults` flipped subject/object for a
  reverse-direction predicate and rewrote `canonicalPredicate` to
  `om:UPSTREAM`, it kept the original `relationType` derived from the
  reverse predicate. So `prov:wasInfluencedBy` produced an EdgeInfo
  with `relationType=downstream` + `predicate=om:UPSTREAM` —
  internally inconsistent, and the mismatched `edgeKey` prevented
  dedup against an existing UPSTREAM edge with the same endpoints.
  Re-derive `relationType` from the canonical predicate after the
  flip.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf): close 2 review findings + add parser-helper unit tests

Two outstanding Copilot findings on PR #27999 plus targeted unit
coverage for the helpers that drive lineage canonicalization.

Findings:

- **`colLineageUri` collision risk** (RdfRepository): the deterministic
  key replaced non-alphanumerics in `toColumn` with `_`, so distinct
  column names (e.g. `a-b` vs `a_b`) collapsed onto the same URI, which
  would lose / overwrite column-lineage resources during reindex.
  Append the loop index as a tiebreaker so distinct columns keep
  distinct URIs.

- **`createTypeResource` missing dprod prefix** (RdfPropertyMapper):
  the `getNamespace` switch didn't recognize `dprod`, so
  `RdfUtils.getRdfType("dataProduct")` (returns `dprod:DataProduct`)
  produced an invalid `dprod:DataProduct` URI on the wire. Added the
  `DPROD_NS = https://ekgf.github.io/dprod/` constant and a `dprod`
  case in the switch.

Coverage:

- New `RdfParserHelpersTest` exercises the canonicalization helpers
  via reflection: `isReverseDirectionPredicate` (recognizes
  PROV-O causation predicates, ignores forward predicates),
  `forwardEquivalentPredicate` (both `wasDerivedFrom` and
  `wasInfluencedBy` collapse to `om:UPSTREAM` so dedup works),
  `relativeRelationLabel` (focal-relative Upstream/Downstream
  flipping with all the boundary cases — non-focal edges,
  non-lineage relations, null focal).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf): merge array contexts before per-field resolution

The third (low-confidence "suppressed") finding on review 4256830399
turned out to be a real duplication: when a field is mapped in one
context map of an array context but absent from another, the previous
processArrayContext ran processContextMappings once per map. The pass
where the field IS mapped emits the proper `om:hasOwner <ref>` triples
(plus `prov:wasAttributedTo`); the pass where the field is absent
falls through to processUnmappedField and emits an additional
`om:owners <ref>` triple. Net: two predicates for the same logical
relationship.

Verified on the live Fuseki: 113 `om:hasOwner` triples vs 112
`om:owners` triples — one set per pass.

Fix: flatten all context maps in the array into a single merged map
once, then iterate entity fields exactly once against that combined
view (later contexts win on key conflicts, matching JSON-LD context
merge semantics). Each field is resolved against the union of
mappings, so the unmapped fallback only fires for fields truly absent
from every context. Net effect: `prov:wasAttributedTo` count is
unchanged, `om:hasOwner` is unchanged, and the redundant `om:owners`
triples disappear.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(rdf): close 2 review findings on coordinator finalization race

Two findings from PR #27999 review 4259628860:

- **`checkAndUpdateJobCompletion` early-returned before lastError check
  could promote**: `refreshAggregatedJob` already marks the job COMPLETED
  when partitions all finish without `failedRecords`/`failedPartitions`,
  so `checkAndUpdateJobCompletion`'s subsequent `if (job.isTerminal())`
  short-circuit silently dropped the lastError signal. Move the
  partition-lastError check INTO `refreshAggregatedJob` so both code
  paths produce consistent terminal status — a partition that finished
  COMPLETED but carries a non-null lastError now correctly promotes the
  job to COMPLETED_WITH_ERRORS regardless of which finalizer wins the
  race.

- **`completePartition` / `failPartition` overwrote CANCELLED state**:
  the unconditional partition row update lost a concurrent Stop's
  CANCELLED status if a worker finished its batch after the Stop
  request landed but before noticing it. Add a status-guarded
  `updateIfProcessing` DAO method (UPDATE ... WHERE id = :id AND
  status = 'PROCESSING') and have both completion paths use it; if 0
  rows update, log and skip the side effects (no server-stat increment,
  no refreshAggregatedJob call) so the authoritative CANCELLED status
  stays. Mirrors the pattern SearchIndex's coordinator uses for the
  same race.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2026-05-11 06:14:50 -07:00
Harsh Vador
86e1d88386
security: Include branch name in security scan Slack alerts and fail only on high vulnerabilities (#27977)
* Add branch context to security scan Slack alerts and upload CSV findings summary

* change failing severity from medium to  high & address gitar

* fix csv formatting

* revert flattening changes
2026-05-11 10:41:48 +05:30
Sriharsha Chintalapani
b837ade95a
docs(github): require issue link, design, tests, UI recording in PR template (#27891)
Expands `.github/pull_request_template.md` to require a linked issue, a
high-level design (for large PRs), a structured Tests section (use cases,
unit + coverage %, backend/ingestion integration tests, Playwright, manual
steps), and a UI screen recording for any UI change. Adds a `/pr-checklist`
skill that walks the template, gathers evidence, and drafts the PR body
before opening via `gh pr create`.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 08:05:56 +02:00
mohitdeuex
fefa998b0a Add MockOidcServer testcontainer for SSO test infrastructure (S0 spike)
First step of the SSO-flow testing initiative. Wraps navikt/mock-oauth2-server
as a testcontainer wired into the OM Docker network under alias om-mock-idp
on port 1080.

The same URL — http://om-mock-idp:1080/<issuer> — is used by:
  - the OM container (via Docker network alias)
  - the Playwright browser on the host (via /etc/hosts loopback entry)
  - the iss claim in tokens issued by the mock IdP

so token validation, browser redirects, and OIDC discovery all line up
against one source of truth — required for the public/id_token flow where
the browser receives the token directly and iss is derived from the URL it
hit.

Setup cost: one /etc/hosts line (127.0.0.1 om-mock-idp), added once per
machine. CI workflow does it automatically. MockOidcServer.launch() throws
with a clear remediation message if the entry is missing.

MockIdpSmokeUIIT validates the network premise end-to-end: starts the
container standalone and confirms discovery + JWKS endpoints respond from
the host JVM with the expected om-mock-idp issuer URL.

Next (S1): SsoProfile sealed interface, ContainerizedServer.launch(profile)
overload, and the first Google SSO end-to-end test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 00:49:21 +05:30
mohitdeuex
a0e501ba11 Add openmetadata-java-playwright scenario test module
Phase 1 of EPIC #3731: Java-driven E2E scenarios for reindex + UI tests.
Reuses TestSuiteBootstrap as a test-jar dependency. Three execution modes:
- embedded (in-JVM, fast, backend-only)
- containerized (Testcontainers + prod server image, UI capable)
- external (connect to a running stack)

Includes 3 backend reindex scenarios (full / incremental / orphan cleanup),
1 Playwright UI scenario (search-after-reindex in Explore), and 2 CI
workflows (PR path-filter + nightly cron).

Satisfies #3767 and #3792.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:16:51 +05:30
Ariel Schulz
297c01cea7
Fix (#27660): Re-enable Exasol cli-e2e-tests after fixing issues (#27661)
* Re-enable Exasol cli-e2e-tests after fixing issues

* Revert accidental changes from branch switch

* Adapt exasol.yml for tests

* Add get_table_comment setup and re-enable test_vanilla_ingestion

* Add type hints to maintain signature

* SQLA-E does not include get_all_table_comments and will come later, so ignore for now

* Add return type too
2026-05-06 17:11:53 +05:30
Mayur Singal
60a2e6546e
Migrate Databricks from sqlalchemy-databricks to databricks-sqlalchemy (#26896)
Some checks are pending
Integration Tests - MySQL + Elasticsearch / Detect Changes (push) Waiting to run
Integration Tests - MySQL + Elasticsearch / integration-tests-mysql-elasticsearch (push) Blocked by required conditions
Integration Tests - PostgreSQL + OpenSearch / Detect Changes (push) Waiting to run
Maven Collate Tests / maven-collate-ci (push) Waiting to run
Integration Tests - PostgreSQL + OpenSearch / integration-tests-postgres-opensearch (push) Blocked by required conditions
Java Checkstyle / java-checkstyle (push) Waiting to run
OpenMetadata Service Unit Tests / Detect Changes (push) Waiting to run
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (mysql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (postgresql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / k8s_operator-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests-status (push) Blocked by required conditions
Publish Package to Maven Central Repository / publish-maven-packages (push) Waiting to run
* Update Databricks Dependency to databricks-sqlalchemy

* Update generated TypeScript types

* address comments and pyformat

* pyformat

* fix log filtering

* address comments

* fix static unit tests

* fix rule for static type

* pyformat

* update baseline

* revert basepyright changes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
2026-05-04 18:53:24 +05:30
Sid
ca2d0122db
test(playwright): add nightly SAML session renewal coverage (#27619)
* test(playwright): add nightly SAML session renewal spec

Covers OM's JWT refresh behavior for SAML sessions end-to-end against
the local Keycloak fixture: silent refresh after expiry, concurrent
401s queuing behind a single refresh call, and forced re-login when
the server-side SAML HttpSession is gone.

Reuses the snapshot/restore mechanism and keycloak-azure-saml provider
helper introduced in #27164; shortens samlConfiguration.security.token
Validity to 10s so the suite observes multiple expiry cycles in <60s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update openmetadata-ui/src/main/resources/ui/playwright/utils/sessionRenewal.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* test(playwright): drop expiry wait from refresh-on-reload SSO specs

The reactive 401 refresh path races with the AuthProvider useEffect that
wires tokenService.renewToken from authenticatorRef — if the 401 from
/users/loggedInUser lands before that effect commits the populated ref,
refreshToken() returns null and the user is logged out instead of refreshed.

With tokenValidity=10s (< EXPIRY_THRESHOLD_MILLES=60s), the UI's proactive
timer in startTokenExpiryTimer fires immediately on every mount, so
/auth/refresh is exercised on each reload regardless of expiry state.
Assertions on token rotation and session continuity still cover "silent
refresh works end-to-end".

The SAML-session-gone case still waits for expiry — it needs to.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(playwright): trigger refresh via SPA nav in SSO renewal specs

page.reload() remounts React and re-races the axios interceptor setup
in AuthProvider — the useEffect that wires authenticatorRef.renewIdToken
onto TokenService has a ref-typed dependency that doesn't reliably
re-run, so the first 401 after reload sometimes finds renewToken=null
and the interceptor silently logs the user out instead of refreshing.

Click the Explore sidebar link instead. The click triggers authenticated
API calls while staying inside the already-mounted React tree, so the
interceptor always reaches the wired TokenService. Spec now passes
10/10 locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Siddhant <siddhant@MacBook-Pro-621.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-05-04 11:48:45 +05:30
Chirag Madlani
d095413ed1
fix(ci): nightly workflow running stale project getting failed [skip-ci] (#27849)
Some checks are pending
Integration Tests - MySQL + Elasticsearch / Detect Changes (push) Waiting to run
Integration Tests - MySQL + Elasticsearch / integration-tests-mysql-elasticsearch (push) Blocked by required conditions
Integration Tests - PostgreSQL + OpenSearch / Detect Changes (push) Waiting to run
Integration Tests - PostgreSQL + OpenSearch / integration-tests-postgres-opensearch (push) Blocked by required conditions
Java Checkstyle / java-checkstyle (push) Waiting to run
Maven Collate Tests / maven-collate-ci (push) Waiting to run
OpenMetadata Service Unit Tests / Detect Changes (push) Waiting to run
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (mysql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (postgresql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / k8s_operator-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests-status (push) Blocked by required conditions
Publish Package to Maven Central Repository / publish-maven-packages (push) Waiting to run
2026-05-04 10:53:16 +05:30
miriann-uu
7b01731754
GEN-5164: Add cherry pick matrix (#27674) 2026-04-29 10:39:31 +05:30
Teddy
11e5ac95d4
chore: update sqlalchemy to 1.0.0 (#27776)
Some checks are pending
Integration Tests - MySQL + Elasticsearch / Detect Changes (push) Waiting to run
Integration Tests - MySQL + Elasticsearch / integration-tests-mysql-elasticsearch (push) Blocked by required conditions
Integration Tests - PostgreSQL + OpenSearch / Detect Changes (push) Waiting to run
Integration Tests - PostgreSQL + OpenSearch / integration-tests-postgres-opensearch (push) Blocked by required conditions
Java Checkstyle / java-checkstyle (push) Waiting to run
Maven Collate Tests / maven-collate-ci (push) Waiting to run
OpenMetadata Service Unit Tests / Detect Changes (push) Waiting to run
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (mysql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests (postgresql) (push) Blocked by required conditions
OpenMetadata Service Unit Tests / k8s_operator-unit-tests (push) Blocked by required conditions
OpenMetadata Service Unit Tests / openmetadata-service-unit-tests-status (push) Blocked by required conditions
Publish Package to Maven Central Repository / publish-maven-packages (push) Waiting to run
2026-04-28 11:07:26 -07:00
IceS2
e9c87c6adb
chore(ingestion): drop pylint, expand ruff (#27774)
* chore(ingestion): drop pylint, expand ruff to Stage 2c

Replace pylint with a coherent ruff-only stack (Stage 2c of the modernize
roadmap). Pylint is dropped from dev deps and CI workflows; ruff selected
ruleset expanded to ~22 families covering style, bug catchers, hygiene,
and the pylint port (PLE/PLC/PLW/PLR with the noisy "too-many-X"
complexity caps + magic-value disabled).

What's selected (with rationale in pyproject.toml):
  E, W, F, I, N         — style + correctness baseline + naming
  UP                    — pyupgrade (py>=3.10 modernizations)
  B, C4, C90, RET, SIM, TRY  — bug catchers
  PIE, ICN, T20, TC, TID, PTH, PERF  — hygiene
  PLE, PLC, PLW, PLR    — pylint port (PLR complexity caps ignored)
  RUF                   — ruff-native (incl. RUF100 unused-noqa)

What's removed:
  - .pylintrc (root) — duplicate of the ingestion pylint config
  - [tool.pylint.*] block in ingestion/pyproject.toml (~140 lines)
  - ingestion/plugins/{print_checker,import_checker}.py + tests + README
    (replaced by built-in T20 + TID251 banned-api respectively)
  - pylint dep from ingestion/setup.py and openmetadata-airflow-apis/pyproject.toml
  - `make lint` Makefile target + the pylint invocation in py_format_check
  - dead pylint TODO comment + ignored test entry in noxfile.py

Cwd-stable config: ruff is invoked both from the repo root (pre-commit,
CI) and from ingestion/ (`make py_format_check`). The `src`,
`extend-exclude`, and per-file-ignores entries are listed twice — once
relative to ingestion/ and once with the `ingestion/` prefix — so
first-party isort detection and exclusions match in both invocations.

Grandfathering: ran `ruff check --add-noqa` once + format-stable
iteration. ~12,130 noqa directives across ~1,400 files. Cleanup is
deferred to follow-up PRs that drop noqas one rule at a time.

Documentation sweep: replaced `make lint` references in CLAUDE.md,
AGENTS.md, DEVELOPER.md, copilot-instructions, and 6 SKILL files with
the apply+verify shape `make py_format && make py_format_check`.
`make py_format` is NOT a strict superset of pylint — it only applies
auto-fixable violations; `make py_format_check` catches the rest.

Basedpyright baseline regenerated: ruff format reflowed multi-line
signatures in ~70 files, shifting type-error column positions. The
basedpyright baseline matches by (file path, error code, range), so
column shifts caused 19 entries to mis-align. Net diff is small
(154 lines in/out of the 13MB baseline.json) — purely positional.

Verified locally:
  - make py_format_check         → All checks passed
  - nox --no-venv -s static-checks → 0 errors, 0 warnings, 0 notes

* chore(ingestion): finish ruff swap — nox lint session + skill docs

Three remaining stale-tooling references after Stage 2c:

  - `ingestion/noxfile.py` `lint` session was still calling `black --check`,
    `isort --check-only`, `pycln --diff`. Those tools aren't installed
    anywhere (we dropped them from dev deps). Replace with the ruff
    equivalents that mirror `make py_format_check`.
  - `skills/standards/code_style.md`: stack listed as `black + isort +
    pycln`; line length claimed 88 (black default). Both wrong: stack is
    ruff, line length is 120.
  - `skills/connector-building/SKILL.md`: `make py_format` comment said
    `# black + isort + pycln`. Same swap.

* chore(ingestion): keep main's baseline + globally ignore TRY400

Per gitar-bot's review on PR #27774:

1. Main's PR #27728 promoted ~60 `logger.warning()` → `logger.error()`
   inside `except` blocks. Those changes landed on main with their own
   baseline updates. Our PR doesn't promote anything — the merge from
   origin/main brought those `error` calls along with their baseline
   entries.

   The bot interpreted the `# noqa: TRY400` we added next to those lines
   as us silencing the rule case-by-case. Cleaner: globally ignore
   TRY400 in pyproject.toml, with a comment explaining why the codebase's
   `logger.error(...)` + separate `logger.debug(traceback.format_exc())`
   pattern is intentional. Strip ~430 per-line `# noqa: TRY400` markers
   from source.

2. Document that `S101` in `per-file-ignores` is a forward-looking
   entry — flake8-bandit (`S`) is not yet selected, so the rule is
   no-op today; the entry stays so when `S` lands later, tests don't
   immediately error.

Reverts the platform pin and Linux Docker–generated baseline. Keep
main's baseline intact and let CI surface the exact column-shifted
entries; the team will decide whether to fix in-place (revert format
on affected files) or add per-line `# pyright: ignore` markers.

* chore(ingestion): regen baseline for new connector type debt

Main's baseline was stale relative to recently-added connectors
(McpConnection, CustomDriveConnection) that lack common attributes
like `hostPort`, `database`, `catalog` etc. — all sites that access
those attributes via the union-typed `serviceConnection.root.config`
fire `reportAttributeAccessIssue` errors that aren't baselined.

71 errors + 58 warnings absorbed. Local macOS regen; pushing to see
CI's drift count. Per the basedpyright-baseline-and-ci PR experience,
macOS↔Linux column drift on this size of regen has historically been
1-7 residuals.
2026-04-28 07:21:59 +02:00
IceS2
84ed278720
chore(ingestion): enable basedpyright across the codebase via baseline (#27755)
* chore(ingestion): enable basedpyright across the codebase via baseline

Removes the ~25 paths from `[tool.basedpyright] ignore` (which excluded
roughly 90% of the codebase from type checking) and grandfathers the
existing violations into a baseline file. New violations in any
previously-ignored file now fail CI.

Changes:
- ingestion/pyproject.toml: drop the entire `ignore = [...]` block
- ingestion/setup.py: bump `basedpyright~=1.14` to `~=1.39.0`
- ingestion/.basedpyright/baseline.json (new, ~13MB): captures the
  starting violation set (~18.8K errors + ~37.4K warnings) so the
  migration is behavior-preserving. Regenerate with
  `cd ingestion && basedpyright -p pyproject.toml --baselinefile
  .basedpyright/baseline.json --writebaseline`. basedpyright analysis
  has minor non-determinism (similar to ruff's), so re-running
  --writebaseline a few times converges the baseline.
- ingestion/noxfile.py: pass `--baselinefile .basedpyright/baseline.json`
  to the basedpyright invocation in the `static-checks` session so CI
  honors the grandfathering. CI already runs the session via
  `cd ingestion && nox --no-venv -s static-checks` (py-tests.yml).
- ingestion/Makefile: `make static-checks` now delegates to
  `nox -s static-checks` so local invocations match CI exactly. Also
  drops the dead Python 3.9 / OM_SKIP_SDK_PY39 branch (we require
  Python >=3.10 since the previous modernization PR).
- .gitignore: add `.serena/` (local language-server cache)

* chore(ingestion): add nox to the dev dependency set

The static-checks Makefile target and the py-tests CI job both delegate
to `nox -s static-checks`, but nox was being installed as a separate
side step (`pip install nox` in `install_dev_env`, `uv pip install nox`
in the test-environment composite action). Listing it in dev extras
means a plain `pip install ingestion[dev]` brings it in.

* chore(ingestion): pin basedpyright analysis to py3.10; CI runs once

Following the basedpyright + multi-Python-version research:

- ingestion/pyproject.toml: add `pythonVersion = "3.10"` to
  [tool.basedpyright] so type-checking always analyzes for the lowest
  supported Python version. Forward-incompatible code (tomllib usage,
  PEP 695 generics, etc.) is caught at type-check time regardless of
  which Python interpreter runs the checker.
- .github/workflows/py-tests.yml: gate the "Run Static Checks" step on
  `matrix.py-version == '3.10'`. With pythonVersion pinned, results are
  identical across the matrix; running once avoids redundant work and
  keeps the baseline file deterministic. Unit tests still run on the
  full 3.10/3.11/3.12 matrix to verify runtime compatibility.
- ingestion/.basedpyright/baseline.json: regenerated cleanly with the
  new pythonVersion config (~18.8K errors / ~37.3K warnings, similar
  scale to the previous baseline). Aligns with the canonical
  type-check-on-floor / test-on-matrix pattern used by Pydantic, CPython,
  and other major Python projects.

* chore(ingestion): pin basedpyright pythonPlatform to Linux + regen baseline

CI's previous run still surfaced ~9 issues (2 errors + 7 warnings) that
weren't in the baseline. Root cause: my local environment differs from
CI's in three ways that affect type inference — Python interpreter
(3.11 vs 3.10), platform (Darwin vs Linux), and pip-resolved package
versions (couchbase, avro, trino, sqlalchemy stubs all differ slightly).

This commit closes the platform gap and regenerates the baseline from a
fresh CI-equivalent environment:

- ingestion/pyproject.toml: add `pythonPlatform = "Linux"` to
  [tool.basedpyright] so type-checking uses the Linux subset of stdlib /
  third-party stubs regardless of where the analyzer runs.
- ingestion/.basedpyright/baseline.json: regenerated against a fresh
  Python 3.10 venv installed via `uv pip install ingestion[test]` (the
  same install path CI's setup-openmetadata-test-environment composite
  action uses). New scale: ~18.7K errors / ~37.5K warnings — same
  ballpark as the previous baseline, with column positions now matching
  CI's environment.

Local-developer note: when running `make static-checks` from a venv
that doesn't mirror CI exactly (e.g. macOS, Python 3.11, different
package versions), you may see drift errors. The supported workflow for
regenerating the baseline is to mirror CI:
  python3.10 -m venv /tmp/ci-mirror
  source /tmp/ci-mirror/bin/activate
  uv pip install --upgrade pip "setuptools<81"
  uv pip install --no-build-isolation "cx_Oracle>=8.3.0,<9"
  uv pip install -e "ingestion[test]"
  uv pip install "basedpyright~=1.39.0" nox
  cd ingestion && basedpyright -p pyproject.toml \
      --baselinefile .basedpyright/baseline.json --writebaseline

* chore(ingestion): drop pythonPlatform pin and regen baseline from CI-mirror

The previous attempt added `pythonPlatform = "Linux"` thinking it would
make the local-generated baseline match CI. It did the opposite — Linux
platform stubs activate additional conditional code paths that weren't
analyzed before, so CI saw 101 errors instead of the prior 2 errors.

Reverting:
- Drop `pythonPlatform = "Linux"` from [tool.basedpyright]. Without it,
  basedpyright analyzes for the host platform; on CI's ubuntu-latest
  runner that's Linux automatically, but type-stub coverage stays the
  same as before (matching the d9196dff6b baseline).
- Regenerate ingestion/.basedpyright/baseline.json against a fresh
  Python 3.10 venv installed via `uv pip install ingestion[test]`
  (mirroring CI's setup-openmetadata-test-environment composite action).
  ~18.8K errors / 37.7K warnings captured — same scale as the working
  d9196dff6b version.

Local-developer note: any baseline regeneration done on macOS will drift
from CI's Linux env (different transitive package versions, different
stubs). The supported local mirror procedure:
  python3.10 -m venv /tmp/ci-mirror
  source /tmp/ci-mirror/bin/activate
  uv pip install --upgrade pip "setuptools<81"
  uv pip install --no-build-isolation "cx_Oracle>=8.3.0,<9"
  uv pip install -e "ingestion[test]"
  uv pip install "basedpyright~=1.39.0" nox
  cd ingestion && basedpyright -p pyproject.toml \\
      --baselinefile .basedpyright/baseline.json --writebaseline

* chore(ingestion): regen baseline from full CI install (mac arm64 mirror)

Prior CI-mirror only installed [test], skipping [all] and the four
--no-deps SA pins (sqlalchemy-redshift/databricks/ibmi, pydoris-custom).
That left ~75 connector packages out of the analysis env, so basedpyright
couldn't resolve types from databricks.sqlalchemy, GE 0.18 Batch,
sklearn BaseEstimator, airflow SQLAlchemy models, pandas/numpy stubs,
etc. CI saw 129 errors absent from the baseline.

Regenerated against a fresh py3.10 venv that mirrors
.github/actions/setup-openmetadata-test-environment exactly:
  uv pip install ./ingestion[dev]
  make generate
  uv pip install "setuptools<81"
  uv pip install --no-build-isolation "cx_Oracle>=8.3.0,<9"
  uv pip install --no-deps sqlalchemy-redshift==0.8.14 \
                            sqlalchemy-databricks==0.2.0 \
                            sqlalchemy-ibmi==0.9.3 \
                            pydoris-custom==1.1.0
  uv pip install ./ingestion[all]
  uv pip install ./ingestion[test]
  uv pip install nox

First run: 128 errors, 272 warnings — within 1 error of CI's 129/272.
Wrote baseline with 56,100 entries across 1,035 files. Verify run with
the new baseline reports 0/0/0.

macOS arm64 vs Linux x86_64 wheel resolution may leave a small residual
(~3-7 errors per the d9196dff6b precedent). Re-run --writebaseline 2-3x
if any show up in CI.

* chore(ingestion): silence avro.py:95 basedpyright residual

CI's Linux fastavro stub returns Schema as `str | List[Any]`, while
the macOS arm64 wheel narrows to `str` — the only error not absorbed
by the regenerated baseline. Add a targeted pyright: ignore on the
parse_avro_schema call instead of broadening behavior.

* chore(ingestion): tolerate cross-platform pyright ignore drift

CI's `--baselinemode=lock` (default) requires the baseline to match
exactly — neither up nor down. Two related issues:

1. The avro.py noqa silenced not just the surfaced error but 10
   cascading entries at line 95 (sub-errors propagating from the
   unresolved `schema` arg type). Baseline went `down by 10` → lock
   violated → exit 3 even with `0 errors` reported. Regenerate baseline
   so the 10 stale entries are dropped.

2. The macOS arm64 fastavro stub doesn't surface that error in the
   first place, so basedpyright treats the noqa as
   `reportUnnecessaryTypeIgnoreComment` locally — causing the opposite
   lock mismatch on CI (a warning entry that doesn't exist there).
   Disable the rule so platform-specific residuals can land without
   flapping between local and CI.

* chore(ingestion): use --baselinemode=discard for cross-platform tolerance

CI's implicit default is `lock`, which fails on any baseline change in
either direction (errors going up *or* down) via console.error → exit 3.
That cannot accommodate macOS arm64 vs Linux x86_64 stub drift: a
baseline regenerated locally always carries some entries that don't fire
on CI (and vice versa).

`auto` would tolerate the drift but silently overwrites the baseline
file — unacceptable in CI, where unreviewed changes never get committed
back.

`discard` is the right balance:
  - New errors not in the baseline still fail the run (early-return path
    in BaselineHandler.write before the lock/discard branch).
  - Stale baseline entries (errors that no longer fire on the current
    platform) print an info message and exit 0.
  - The baseline file is never modified.
2026-04-27 17:15:44 +02:00
IceS2
1fa0c79d27
chore(github): migrate issue templates to structured forms (#27710)
* chore(github): migrate issue templates to structured forms

- Convert bug_report, feature_request, doc_update to GitHub issue forms (YAML)
- Add connector_bug form with free-text Connector field
- Drop epic and feature_task templates (stale since 2022, no usage evidence)
- Add auto-label workflow that maps the Connector field to a namespaced
  connector:<name> label, falling back to connector:other on 0 or 2+ matches
- Labels are applied exclusively and auto-created with a grey "Connector"
  description when missing

* chore(github): drop redundant pipeline type field from connector_bug form

Feature area already covers metadata/lineage/profiler/usage distinction.

* fix(github): address PR review feedback

- bug_report.yml: add labels: ["bug"] for pattern consistency
- label-connector.yml: add contents: read permission (needed by checkout)
- label_connector.py: raise on unexpected HTTP status; accept 404 for
  idempotent GET-label and DELETE-label-from-issue; stop echoing the
  raw Connector field value into workflow logs
2026-04-24 14:08:20 +02:00
Mayur Singal
878421a644
fix: enable subprocess coverage tracking for CLI E2E tests (#27329)
* fix: enable subprocess coverage tracking for CLI E2E tests

CLI E2E tests run connectors via `subprocess.Popen("metadata ingest")`
but the subprocess coverage data was silently lost. Two issues:

1. Missing `parallel = true` in coverage config — parent pytest process
   and child subprocess both wrote to the same `.coverage` file, causing
   data collision. With parallel mode, each process writes to its own
   `.coverage.<pid>` file that `coverage combine` can merge.

2. `COVERAGE_PROCESS_START` used a relative path (`ingestion/pyproject.toml`)
   in sitecustomize.py. Resolved to absolute using `GITHUB_WORKSPACE`.

Evidence: Metabase (zero unit tests, only E2E) shows 53.6% on SonarCloud
with client.py at 17.2% — inspection of .coverage.metabase confirms only
import-time + in-process setup lines are present, with zero method body
coverage from the subprocess execution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove -a (append) flags incompatible with parallel coverage mode

`coverage run -a` and `coverage combine -a` conflict with `parallel = true`
in the coverage config. In parallel mode each process writes to its own
`.coverage.<pid>` file, and `coverage combine` merges them — no append needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* MINOR: Fix snowflake e2e (#26677)

* MINOR: Fix snowflake e2e

* fix pyformat

* improve snowflake test

* fix count

* mark flaky auto classification test

* improve test address comment

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 06:57:30 +02:00
Sid
0a98f5bf32
test(playwright): add nightly SSO login spec starting (#27164)
* test(playwright): add nightly SSO login spec starting with Okta

Extends Playwright coverage end-to-end for SSO login flows. Today's SSO
coverage (Features/SSOConfiguration.spec.ts) only asserts the config
form UI. This adds a new suite that configures OpenMetadata to an
external identity provider, drives a real login through the provider's
hosted UI, and validates the resulting session against the OM API.

Phase 1 ships Okta only (integrator-9351624.okta.com). Additional
providers (Auth0, Azure, Cognito, SAML, Google) plug into the same
dispatcher by adding a ProviderHelper implementation.

## What's new

- playwright/e2e/Auth/SSOLogin.spec.ts — two-test suite tagged @sso
  1. Asserts the SSO sign-in button renders on /signin with the correct
     brand label and that the basic-auth form is not shown.
  2. Clicks the button, drives the provider's login widget, follows the
     OAuth callback, completes first-run self-signup when needed,
     lands on /my-data, then verifies the JWT by calling
     GET /api/v1/users/loggedInUser and asserting the returned email
     matches SSO_USERNAME.

- playwright/utils/ssoAuth.ts — provider-agnostic orchestration:
  applyProviderConfig (PUT /api/v1/system/security/config),
  restoreBasicAuth, buildAuthContextFromJwt, verifyLoggedInUserMatches.
  Composes existing getApiContext/getAuthContext/getToken helpers — no
  token extraction or HTTP plumbing is reimplemented.

- playwright/utils/sso-providers/{index,okta}.ts — ProviderHelper
  interface plus the Okta Identity Engine widget driver. Defaults the
  dev tenant values from the committed openmetadata.yaml snippet so the
  spec only needs SSO_USERNAME/SSO_PASSWORD to run locally.

- playwright/constant/ssoAuth.ts — env var key constants,
  PROVIDER_BUTTON_TEXT map, and the BASIC_AUTH_CONFIG payload used for
  cleanup.

- playwright.config.ts — new 'sso-auth' project matching
  playwright/e2e/Auth/**/*.spec.ts with its own serial workers, and
  '**/Auth/**' added to the chromium project's testIgnore so these
  tests never run in the default suite.

## How provider switching works

beforeAll logs in as admin via basic auth, captures the admin JWT via
getToken(page) BEFORE the swap, then PUTs the Okta config. The admin
JWT survives the provider swap because OM's internal JWKS stays in
publicKeyUrls and the admin user's isAdmin flag is persisted in the DB.
afterAll rebuilds an API context from that JWT and restores basic auth,
making the spec fully idempotent — the same OM instance can run the
suite repeatedly without any manual cleanup.

## Running locally

    export SSO_PROVIDER_TYPE=okta
    export SSO_USERNAME='<okta-test-user>'
    export SSO_PASSWORD='<okta-test-password>'
    npx playwright test playwright/e2e/Auth/SSOLogin.spec.ts \
      --project=sso-auth --workers=1

Verified end-to-end against integrator-9351624.okta.com — both tests
pass in ~12s on an already-provisioned user, ~14s on first-run
self-signup. Cleanup leaves the server in basic-auth mode.

## Notes for reviewers

- The existing .github/workflows/playwright-sso-tests.yml already wires
  up the CI matrix and secret names; this change intentionally does
  NOT enable the cron schedule. That lands in a follow-up once one
  provider is stable for a few nightly runs.
- OKTA_SSO_CLIENT_ID / OKTA_SSO_DOMAIN / OKTA_SSO_PRINCIPAL_DOMAIN env
  vars can override the baked-in dev tenant defaults if a different
  Okta tenant is used in CI.

* ci: add dedicated SSO Login Nightly workflow

Adds .github/workflows/playwright-sso-login-nightly.yml, a standalone
workflow that runs the new SSOLogin spec nightly at 03:00 UTC instead
of piggy-backing on playwright-sso-tests.yml.

The existing playwright-sso-tests.yml is left untouched — it still
covers the SSO configuration form UI via SSOConfiguration.spec.ts and
its matrix/secrets wiring is unchanged. The new workflow complements
it with a real end-to-end login round-trip:

- Schedule: cron '0 3 * * *'
- Provider matrix: okta only for Phase 1 (extended as helpers ship)
- Invokes playwright/e2e/Auth/SSOLogin.spec.ts under the new
  sso-auth Playwright project with workers=1
- Wires provider credentials via secrets with the existing
  {PROVIDER}_SSO_USERNAME / {PROVIDER}_SSO_PASSWORD convention plus
  optional OKTA_SSO_CLIENT_ID / OKTA_SSO_DOMAIN /
  OKTA_SSO_PRINCIPAL_DOMAIN overrides
- Uses the shared setup-openmetadata-test-environment composite
  action, PostgreSQL, ingestion disabled — matching the existing SSO
  tests workflow
- Uploads the HTML report as an artifact on every run and cleans up
  the docker stack in a final always-run step

* refactor(playwright): simplify ssoAuth helpers

- verifyLoggedInUserMatches now asserts directly on the lowercased
  email field instead of building a candidate array and feeding it a
  long stringified failure message. The assertion failure already
  shows expected vs received, so the wrapper string was just noise.

- Drop buildAuthContextFromJwt — it was a one-line wrapper around
  getAuthContext. The spec calls getAuthContext directly now.

* refactor(playwright): address SSO suite review feedback

- Extract OM_BASE_URL from PLAYWRIGHT_TEST_BASE_URL (with the same
  http://localhost:8585 default as playwright.config.ts) and export
  it from constant/ssoAuth.ts. okta.ts and BASIC_AUTH_CONFIG both
  consume it, so callbackUrl, the OM JWKS entry in publicKeyUrls, and
  the basic-auth restore payload all match the test target — including
  CI runs against non-default hosts.

- Drop PROVIDER_BUTTON_TEXT. It was exported but never imported; the
  ProviderHelper.expectedButtonText field is the only source of truth
  for the SSO sign-in button label and the spec already reads from it.

- Restore the OM convention adminPrincipals: ['admin'] in the Okta
  config (matches conf/openmetadata.yaml's AUTHORIZER_ADMIN_PRINCIPALS
  default). The previous code was granting admin to whichever IdP user
  ran the suite — verifyLoggedInUserMatches only needs an authenticated
  session, not admin, so the elevation was unnecessary. This also drops
  the now-unused requireEnv on SSO_USERNAME inside okta.ts; the spec
  itself still gates on the env var via test.skip.

- Set workers: 1 on the sso-auth Playwright project. fullyParallel:
  false alone wasn't enough — the global workers: 3 on CI could still
  fan out across multiple Auth/**/*.spec.ts files in the future. The
  explicit limit enforces full isolation as more provider specs land.

* ci: avoid CodeQL "Excessive Secrets Exposure" in SSO Login Nightly

Replaces the dynamic secret lookup

    secrets[format('{0}_SSO_USERNAME', upper(matrix.provider))]

with a static reference

    secrets.OKTA_SSO_USERNAME

CodeQL flagged the dynamic indexing because GitHub Actions can only
mask & scope secrets that are referenced statically. With a computed
key, the runner has no way to know which single secret is needed and
conservatively materializes EVERY org and repo secret into the step's
environment — even though the test only reads OKTA_SSO_*. Static
references let GitHub expose only the two credentials this step
actually uses.

Phase 1's matrix is okta-only so the change is two lines. The added
inline comment documents the convention for future providers: add a
sibling step gated by `if: matrix.provider == '<provider>'` with that
provider's static secret references — do not bring back the
secrets[format(...)] pattern.

* refactor(playwright): capture/restore real security config in SSO suite

- Snapshot /system/security/config in beforeAll, restore exact payload in
  afterAll instead of PUTting a hand-rolled basic-auth baseline (preserves
  allowedDomains, forceSecureSessionCookie, adminPrincipals, etc.)
- Strip ldap/saml subtrees from the snapshot: GET returns empty-string
  placeholders the PUT validator rejects
- Require OKTA_SSO_{CLIENT_ID,DOMAIN,PRINCIPAL_DOMAIN} via getRequiredEnv;
  no more hardcoded tenant defaults
- Fail fast in beforeAll if admin JWT capture returns empty string so the
  server is never left stuck in SSO mode
- Shrink Okta provider override to just the fields Okta needs; sibling
  authorizer fields come from the captured snapshot

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci(sso-login): extract per-provider composite action

Restructures the nightly workflow so provider credentials stay statically
referenced for CodeQL while making it trivial to add new providers:

- New composite action .github/actions/sso-login-run bundles all shared
  setup + test-run logic; pulls non-secret provider config from the
  caller's vars context dynamically (${PROVIDER_UPPER}_SSO_*)
- playwright-sso-login-nightly.yml becomes a thin dispatcher with one
  real job per provider. Each job declares environment: test so it can
  resolve its password via a static secrets.<PROVIDER>_SSO_PASSWORD
  reference (no secrets[format(...)] dynamic lookup, CodeQL clean)
- Adding a provider = copy the okta job stanza, swap the secret name,
  add the provider to the dispatch input choices, register the helper
  in sso-providers/index.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(playwright): move Okta tenant config to a repo constant

The Okta tenant identifiers (clientId, domain, principalDomain) are
non-secret OAuth public values — visible on the hosted login page
during any sign-in. Keeping them in GitHub environment variables cost
setup friction (5 env vars to configure locally, each a potential typo)
without any security benefit. Move them back to a committed OKTA_TENANT
constant in okta.ts where a reviewer can see exactly which tenant the
suite is exercising.

Net effect:
- Local runs only need SSO_PROVIDER_TYPE, SSO_USERNAME, SSO_PASSWORD.
- The test environment in GH Actions keeps OKTA_SSO_USERNAME (variable)
  and OKTA_SSO_PASSWORD (secret); the three tenant variables are no
  longer consumed.
- Composite action drops the jq-based dynamic var extraction; the
  caller passes sso_username directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci(sso-login): move timeout-minutes from composite step to job level

Composite actions don't support timeout-minutes on individual steps —
that's a runner job field only. Move the 30-minute test timeout up to
the dispatcher job and bump to 45 minutes to cover docker + maven setup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci(sso-login): consolidate dispatcher + composite action into one file

Collapse the dispatcher workflow + composite action split into a single
~115-line workflow using a strategy matrix and dynamic
vars[format(...)] / secrets[format(...)] credential resolution keyed on
the matrix provider name.

Trade-off:
- CodeQL "Excessive Secrets Exposure" (low severity) will re-flag the
  dynamic secret lookup. Accepted in exchange for a single source of
  truth and true zero-workflow-churn multi-provider support.

Onboarding a new provider is now:
  1. Add its name to the matrix array + dispatch options list.
  2. Add <PROVIDER>_SSO_USERNAME (variable) + <PROVIDER>_SSO_PASSWORD
     (secret) in the test environment.
  3. Register the helper in sso-providers/index.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci(sso-login): drop provider-prefix bash step; use case-insensitive lookup

GitHub secret and variable names are case-insensitive, so
format('{0}_SSO_PASSWORD', matrix.provider) with the lowercase matrix
value resolves correctly against the uppercase conventional names like
OKTA_SSO_PASSWORD. That removes the need for a separate "Compute
provider prefix" step and its cross-step env-context plumbing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci(sso-login): drop redundant case-insensitivity comment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci(sso-login): pin playwright install to 1.57.0 to match package.json

The previous 1.51.1 pin was stale vs. the @playwright/test version in
package.json. The mismatch caused browser cache path divergence — the
install step wrote browsers under 1.51.1's cache and the test run
looked for them under 1.57.0's cache and failed with "browsers not
installed."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(playwright): address SSO suite review comments [skip ci]

- Drive Okta tenant (clientId, domain, principalDomain) from env vars,
  falling back to the existing nightly tenant values as defaults
- Use redirectToHomePage as the final assertion in the SSO login step
- Document why the /signup vs /my-data branch is conditional

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* saml

* test(playwright): add SAML providers to SSO login nightly

Extend the nightly SSO login matrix with Azure AD SAML and a self-contained
Keycloak SAML fixture (Azure-profile + Google-profile realms), so the suite
exercises the full SAML flow end-to-end without relying on a hosted IdP.

- docker/local-sso/keycloak-saml: Keycloak 26.3.3 compose + pre-imported
  realms bound to OM at localhost:8585, port-overridable via
  KEYCLOAK_SAML_PORT.
- playwright sso-providers: azure-saml helper (hosted tenant, non-secret
  federation metadata committed) and keycloak-saml factory that fetches the
  realm's IdP X509 at runtime.
- SSO assertion matches OM's actual SAML sign-in label ("Sign in with
  SAML SSO"), since providerName isn't propagated into the store for the
  SAML provider branch of getAuthConfig.
- Workflow starts/stops the Keycloak stack only for keycloak-* matrix rows
  and injects the fixture credentials inline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(playwright): fetch Azure SAML IdP cert at runtime

Drop the committed Azure Federated SSO X509 certificate and the
AZURE_SAML_IDP_CERTIFICATE env fallback from the azure-saml provider.
The cert now comes from Azure's federation metadata XML endpoint at test
start, mirroring how the Keycloak provider resolves its realm cert, so the
suite stays aligned with Azure's ~3-year cert rotations automatically.

- New saml-metadata.ts exporting fetchIdpX509Certificate(descriptorUrl,
  label), reused by azure-saml and keycloak-saml.
- azure-saml.buildConfigPayload is now async and pulls the cert from
  https://login.microsoftonline.com/<tenantId>/federationmetadata/2007-06/federationmetadata.xml
  before building the SAML payload.
- keycloak-saml drops its inline cert-fetching helpers and delegates to
  the shared util.
- Trim narration comments across the SSO suite to keep only the
  non-obvious rationale.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(playwright): drop hosted Azure SAML provider

The nightly Keycloak SAML fixture with Azure-profile attribute claims
exercises the same OM SAML code path as the hosted Azure AD tenant. The
hosted provider added external tenant/cert coupling without unique
coverage, so this removes it.

Drops the azure-saml helper, its env keys (AZURE_SAML_TENANT_ID /
AZURE_SAML_PRINCIPAL_DOMAIN), the dispatcher registration, and the
workflow dispatch option. Keycloak Azure/Google realms remain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(playwright): cover SSO session lifecycle end-to-end

Extends the SSO login spec beyond "can you log in" to the full session
round-trip: reload survives, same-context tabs inherit auth, sidebar
logout (with modal confirm) lands on /signin, and post-logout refresh
stays signed out.

Adds a describe-scoped userContext/userPage created in beforeAll so
tests 2-6 inherit the IdP-backed session; test 1 keeps its fresh
fixture for the unauthenticated assertion. Cleanup closes the user
context before restoring the server security config.

Verified locally against keycloak-azure-saml and keycloak-google-saml
realms: 6 passed each (was 2).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* remove slow from individual spec

* remove slow from beforeAll

* style(playwright): fix SSOLogin spec prettier issues

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(playwright): tighten SSO sign-in locator and await logout response

Address Copilot review comments on PR #27164:
- Use button.signin-button to match the pattern in SSOAuthentication.spec.ts.
- Await /api/v1/users/logout POST alongside the /signin navigation in
  the logout test to remove the race against the server response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix

* Update openmetadata-ui/src/main/resources/ui/playwright/e2e/Auth/SSOLogin.spec.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix

* test(playwright): resolve SSO creds via env vars, drop keycloak-google-saml

Route Keycloak credentials through the same `vars[format(...)]` /
`secrets[format(...)]` indirection as Okta via an `env_prefix` matrix
column, removing the hardcoded fixture literals from the workflow.
Password lookup falls back `vars || secrets` so fixture passwords can
live as vars while real provider secrets stay in secrets.

Also drop the keycloak-google-saml variant — same IdP and realm shape
as the Azure variant, so it adds CI cost without meaningful coverage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(playwright): post SSO login nightly results to Slack

Adds a per-provider Slack notification step mirroring the pattern used
by the postgresql/mysql nightly workflows — reuses the existing
`slack-cli.config.json` and `playwright-slack-report` CLI against the
`results.json` that the global JSON reporter already emits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(playwright): drop logout response wait in SSO spec

OktaAuthenticator.logout clears tokens locally with no backend call, and
GenericAuthenticator (SAML) hits `GET /auth/logout` — neither triggers
the `POST /api/v1/users/logout` the test was waiting on. The listener
never matched, so `Promise.all` hung past the 180s test timeout even
though the page had already navigated to /signin.

Rely on `waitForURL('**/signin')` + the signin button assertion, which
are the actual cross-provider success signals.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Siddhant <siddhant@MacBook-Pro-457.local>
Co-authored-by: Siddhant <siddhant@MacBook-Pro-529.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Siddhant <siddhant@MacBook-Pro-621.local>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-17 13:09:54 +05:30
Aniket Katkar
12ce3b614d
Chore(UI): consolidated UI checkstyle fix commands and modify workflow comment (#27402)
* feat: add consolidated UI checkstyle commands for all and changed files

* update prt to pr

* test commit to fail ui-checkstyle

* update the comment

* Revert "test commit to fail ui-checkstyle"

This reverts commit ed056f0629.

* Revert "update prt to pr"

This reverts commit 0666fa51a3.

* Worked on comments

* pull request target remove

* Revert "pull request target remove"

This reverts commit b61e98c16b.

* Worked on comments
2026-04-16 17:18:22 +05:30
Teddy
50c17502cf
MINOR - Enable merge group GH event (#27371)
* chore: added merge_group for github merge queue

* chore: remove unnecessary merger group on team labeler

* fix: added gates for merge queue and pull request events
2026-04-15 07:42:08 -07:00
Pere Miquel Brull
1dedc0cf15
Add k8s-operator unit tests to PR CI (#27387)
* Add k8s-operator unit tests to PR CI pipeline

The k8s operator tests only ran during manual release builds.
Add a path-filtered job so they run on PRs touching
openmetadata-k8s-operator/**, following the same Detect Changes
pattern used by the service unit tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove -DfailIfNoTests=false — we want to catch missing tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix k8s-operator tests: add surefire includes and remove unnecessary stub

Parent POM surefire includes only match org.openmetadata.service.*,
so operator tests under org.openmetadata.operator.* were silently
skipped. Override with **/*Test.java in the operator pom.xml.

Also remove unused KubernetesClient mock stub from
CronOMJobReconcilerTest.setUp — no test reaches the code path
that calls context.getClient(), causing UnnecessaryStubbingException.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Rename k8s-operator to k8s_operator in workflow outputs

Hyphens in output names are parsed as subtraction in GitHub Actions
expressions dot notation, so the job condition would never trigger.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix filesystem paths — underscore rename only applies to output keys

The replace_all incorrectly changed directory names from
openmetadata-k8s-operator to openmetadata-k8s_operator. Only the
GitHub Actions output key needs the underscore; all file paths must
use the actual hyphenated directory name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Drop -am flag from k8s-operator test command

openmetadata-service is a provided-scope dependency, so -am tries
to compile it including shaded ES/OS jars that aren't available in
a clean CI environment. The operator module compiles fine on its own.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix invalid YAML in conf/openmetadata.yaml

The CSP policy line has unescaped colons inside the value which the
YAML parser interprets as mapping indicators. Use a folded block
scalar (>-) so the value is parsed as a plain string.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Build k8s-operator deps before running tests

The operator depends on openmetadata-service (provided scope) which
won't be in the Maven cache on a cold CI runner. Build with -am
-DskipTests first, then run operator tests separately — same pattern
as docker-k8s-operator.yml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Reintroduce lenient client mock to prevent flaky NPE

The reconcile flow is time-dependent — tests using "0 * * * *" can
reach context.getClient() near the top of the hour. Stub the full
client.resources().inNamespace().resource().create() chain as lenient
so early-return tests aren't penalized but happy-path tests won't NPE.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Revert conf/openmetadata.yaml — fix belongs in a separate PR

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 15:48:18 +02:00
Harsh Vador
f4c939869d
ci(security): add Retire.js workflow to detect bundled JS vulnerabilities (#27315)
* ci(security): add Retire.js workflow to detect bundled JS vulnerabilities

* address gitar

* add om existing security scan workflow

* address gitar

* add slack support & remove PR check

* address gitar

* change job name

* address comment

* address comment
2026-04-15 19:12:53 +05:30
Sriharsha Chintalapani
bb0daa180e
RDF, cleanup relations and remove unnecessary bindings, add distributed mode for RDF reindex (#26902)
* RDF, cleanup relations and remove unnecessary bindings, add distributed mode for RDF reindex

* Update generated TypeScript types

* Address comments from copilot

* Update generated TypeScript types

* fix test issues

* Fix minor UI bugs

* Add the missing filters

* Fix RDF export API error

* Add export functionality

* Fix ui-checkstyle

* Fix java checkstyle

* Fix unit tests

* Fix and increase the coverage for KnowledgeGraph.spec.ts

* Fix tests

* Remove rdf as default in playwright and local docker

* fix ui-checkstyle

* Address comments

* Potential fix for pull request finding 'CodeQL / Artifact poisoning'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Address copilot comments

* Address copilot comments

* FIx tests

* FIx docker

* Update openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/rdf/distributed/DistributedRdfIndexCoordinator.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Address copilot review comments: license headers, JSON escaping, type safety, border-color, stop semantics

Agent-Logs-Url: https://github.com/open-metadata/OpenMetadata/sessions/c026e52e-162b-4c9a-9874-43791d4aaac1

Co-authored-by: harshach <38649+harshach@users.noreply.github.com>

* Show error toast for unsupported export format in KnowledgeGraph

Agent-Logs-Url: https://github.com/open-metadata/OpenMetadata/sessions/c026e52e-162b-4c9a-9874-43791d4aaac1

Co-authored-by: harshach <38649+harshach@users.noreply.github.com>

* Fix docker

* Fix docker for playwright

* Fix docker for playwright

* Fix tests

* Fix tests

* Fix docker

* Fix docker

* Fix glossary and pagination spec flakiness

* update the missing translations

* Fix docker

* Fix docker

* Fix integration test

* Fix fuseki not starting

* Fixed the run local docker script

* worked on comments

* Fix flakiness in knowledge graph tests

* Fix checkstyle

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: harshach <38649+harshach@users.noreply.github.com>
2026-04-14 13:24:41 -07:00
Chirag Madlani
4f7be5f014
fix(ci): filter blob pattern causing failure to sonarcloud (#27357)
* fix(ci): filter blob pattern causing failure to sonarcloud

* fix(ci): add missing backslash continuation in sonar-scanner command

Agent-Logs-Url: https://github.com/open-metadata/OpenMetadata/sessions/88d229f2-81dd-4662-8295-a3bb0df03815

Co-authored-by: chirag-madlani <12962843+chirag-madlani@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-14 20:06:21 +05:30
Aniket Katkar
3428dfbd6a
Chore(UI): Fix rbac tests not running on PR checks (#26994)
* Fix rbac tests not running on PR checks

* update the dependency

* Update the SearchRBAC dependency
2026-04-14 17:53:59 +05:30
Pere Miquel Brull
f6258819e7
ci: reduce checkout history footprint in PR workflows (#27221)
* ci: reduce checkout history footprint in PR workflows

Optimize actions/checkout usage to avoid downloading the full repo blob
history on every PR run. The repo is large, so cloning everything just
to run tests wastes minutes of CI time per job.

- py-operator-build-test.yml: drop fetch-depth: 0 (no history needed)
- openmetadata-service-unit-tests.yml: drop fetch-depth: 0 (Sonar is
  explicitly skipped via -Dsonar.skip=true); shallow-fetch PR base ref
- airflow-apis-tests.yml, py-tests.yml, yarn-coverage.yml: add
  filter: blob:none to Sonar jobs so commits/trees remain available
  for blame while blobs are fetched lazily on demand
- ui-checkstyle.yml: add filter: blob:none to all jobs that rely on
  tj-actions/changed-files (needs commit/tree metadata, not blobs)

* ci: drop fetch-depth: 0 from jobs that don't walk history

Follow-up audit after the initial pass. Four jobs were still declaring
fetch-depth: 0 (plus filter: blob:none in two cases) without actually
needing any history beyond HEAD.

ui-checkstyle.yml
- i18n-sync: runs 'yarn i18n' then 'git status --porcelain'. git status
  compares the working tree to HEAD; no history walk. Default depth 1
  is sufficient.
- app-docs: same pattern with 'yarn generate:app-docs'.

py-sonarcloud-nightly.yml
- py-unit-tests: only uploads a coverage artifact, no Sonar invocation.
- py-integration-tests: same.
- py-combine-coverage: does run SonarSource/sonarqube-scan-action, so
  it genuinely needs the commit graph — added filter: blob:none for
  parity with the PR Sonar jobs.

* ci: remove unused 'Fetch PR base branch' step from service unit tests

Copilot review flagged that the step was using --depth=1 while the main
checkout is also shallow, which would break any merge-base operation.
On investigation, nothing downstream actually uses the base ref: the
only command that runs after the checkout is 'mvn ... -Dsonar.skip=true',
which has no git dependency. The step was preserved defensively in the
previous commit, but it's dead code — cleanest fix is to delete it.
2026-04-13 10:46:17 -07:00
Chirag Madlani
917a36c6a4
Potential fix for code scanning alert no. 1842: Artifact poisoning (#27220)
* Potential fix for code scanning alert no. 1842: Artifact poisoning

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Pin Yarn version to 1.22.18 to fix artifact poisoning alert

Agent-Logs-Url: https://github.com/open-metadata/OpenMetadata/sessions/29aebdb5-eef0-4a2a-be01-489deef48d2b

Co-authored-by: chirag-madlani <12962843+chirag-madlani@users.noreply.github.com>

* Fix artifact poisoning in update-playwright-e2e-docs.yml: replace npm install -g yarn with pinned corepack

Agent-Logs-Url: https://github.com/open-metadata/OpenMetadata/sessions/550fba5a-bb13-45da-a144-b67599c9eaa4

Co-authored-by: chirag-madlani <12962843+chirag-madlani@users.noreply.github.com>

* Remove corepack prepare to eliminate artifact poisoning: use only corepack enable (bundled yarn)

Agent-Logs-Url: https://github.com/open-metadata/OpenMetadata/sessions/90f6ed8d-3f2b-4c3d-9a34-cd1f57c4d89c

Co-authored-by: chirag-madlani <12962843+chirag-madlani@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-10 16:12:28 +05:30
Sriharsha Chintalapani
b2b49db75e
MSAL Token Renewal Fix — Safari Session Loss (#27214)
* MSAL Token Renewal Fix — Safari Session Loss

* MSAL Token Renewal Fix — Safari Session Loss

* MSAL Token Renewal Fix — Safari Session Loss

* apply lint

* MSAL Token Renewal Fix — OIDC fix

* wait for token update

* fix unit tests

* Add SSO playwright tests

* Add tests

---------

Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com>
2026-04-09 17:45:00 -07:00
Mohit Yadav
3ec31e3e68
Make OpeNMetadata Service Unit Test Required (#27099) 2026-04-09 15:58:50 -07:00