Elgato_dark/OpenMetadata

mirror of https://github.com/open-metadata/OpenMetadata synced 2026-05-24 09:39:11 +00:00

Sriharsha Chintalapani 22a6c10072

Context center (#27558)

* Add Context Center: Migrate Knowledge Center, Images/PDFs document support

* Add Context Center: Migrate Knowledge Center, Images/PDFs document support

* Address PR #27558 review comments

- KnowledgePageRepository: null-safe pageType in getHierarchyWithSearch
  and getHierarchyWithSearchForActivePage so the /search/hierarchy
  endpoint no longer NPEs when the pageType query param is omitted. The
  ES/OS client helpers already skip the pageType term when the value is
  null or empty, so this is a pure null-guard.
- ContextFileResource.uploadFile: when a failure happens after the
  ContextFileContent row is created (e.g. inside extractionService.submit),
  the cleanup path now hard-deletes that content row so the DB is not
  left with an orphaned record.
- ContextFileResource: replace the raw Content-Disposition string with a
  buildContentDisposition helper that emits both the legacy quoted
  filename= and the RFC 5987 filename*=UTF-8'' parameter with
  percent-encoded bytes, so international filenames round-trip while
  staying header-injection safe. sanitizeFileName also falls back to
  "download" on null/blank input.
- ContextFileResourceTest: new cases for sanitizeFileName null/blank
  fallbacks and for buildContentDisposition ASCII/unicode/space/injection
  behaviour (18 tests, all passing).
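
For readers unfamiliar with the dual-parameter header, here is a minimal sketch of the idea behind buildContentDisposition. It is not the code from ContextFileResource: the class name, the '_' replacement character, and the choice to escape everything outside the RFC 3986 unreserved set are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; names and escaping choices are assumptions.
final class ContentDispositionSketch {
  // RFC 3986 "unreserved" characters; RFC 5987 allows all of them unescaped.
  private static final String UNRESERVED =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

  // Percent-encode every UTF-8 byte that is not an unreserved character.
  static String percentEncode(String value) {
    StringBuilder out = new StringBuilder();
    for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
      int unsigned = b & 0xFF;
      if (UNRESERVED.indexOf(unsigned) >= 0) {
        out.append((char) unsigned);
      } else {
        out.append(String.format("%%%02X", unsigned));
      }
    }
    return out.toString();
  }

  // Legacy fallback: quotes, backslashes, control and non-ASCII chars become '_'.
  static String asciiFallback(String value) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      boolean safe = c >= 0x20 && c < 0x7F && c != '"' && c != '\\';
      out.append(safe ? c : '_');
    }
    return out.toString();
  }

  // Emits both the quoted filename= and the RFC 5987 filename*= parameter.
  static String buildContentDisposition(String fileName) {
    String name = (fileName == null || fileName.trim().isEmpty()) ? "download" : fileName;
    return "attachment; filename=\"" + asciiFallback(name) + "\"; filename*=UTF-8''"
        + percentEncode(name);
  }
}
```

Because CR, LF, and double quotes never reach the header unescaped, a hostile filename cannot terminate the quoted string or start a new header line.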

* Address copilot review comments on PR #27558

- AssetRepository.getByFqnPrefix: swap arguments so (assetType, fqnPrefix)
  matches the DAO signature — previous ordering always missed the index.
- FolderResource / ContextFileResource getEntitySpecificOperations: return
  List.of() instead of null so callers iterating the returned list cannot
  NPE.
- SearchUtils.getPageHierarchy: replace UUID.fromString with a parseUuid
  helper that returns null for missing/malformed values and logs a warning
  instead of failing the whole hierarchy response.
- DaoListFilter: qualify the pageType column with the caller-provided
  tableName, rename getArticleCondition to getPageTypeCondition (legacy
  no-arg method kept as @Deprecated wrapper for compatibility).
- Elastic/OpenSearch client processPageHierarchyHits: replace the per-hit
  getChildrenCountForPage search (N+1) with a single pass over the batch
  that derives childrenCount from pages whose parent is in the same
  result set. Also drops the now-unused helper and its throws clause.
- openmetadata-sdk/pom.xml: mark JWT, JAX-RS client, Apache HttpClient,
  jakarta.json, parsson, and JUnit Jupiter as <optional>true</optional>
  so they don't leak into SDK consumers that only use the core client.
- InMemoryAssetService: use the shared AsyncService executor for
  upload/read/delete instead of the JVM common ForkJoinPool.
- sample-pricing.xlsx: replace the plain-text placeholder with a real
  minimal XLSX workbook so detection-based and extraction-based code
  paths see a valid Microsoft Excel 2007+ file.
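
The parseUuid change in the list above follows a common lenient-parsing pattern. A minimal sketch, assuming a plain stderr message in place of the real logger; the class name is hypothetical and this is not the SearchUtils code:

```java
import java.util.UUID;

// Illustrative sketch only; SearchUtils may differ in naming and logging.
final class UuidParsingSketch {
  // Returns null for missing or malformed input instead of throwing.
  static UUID parseUuid(String raw) {
    if (raw == null || raw.trim().isEmpty()) {
      return null;
    }
    try {
      return UUID.fromString(raw.trim());
    } catch (IllegalArgumentException e) {
      // Stand-in for a LOG.warn call: one bad hit should not fail the response.
      System.err.println("Skipping malformed UUID: " + raw);
      return null;
    }
  }
}
```

Callers then have to tolerate a null id, which is why later commits filter such hits out before building maps keyed by id.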

* Use one filters aggregation for page hierarchy childrenCount

Follow-up to b8458e2868. The previous fix derived childrenCount from
pages whose parent appeared in the same batch — that worked for
listPageHierarchyForActivePage (which fetches all depths) but always
returned 0 on the plain listPageHierarchy path (which only fetches one
depth), so top-level listings effectively lost the count.

Replace with a single `filters` aggregation keyed by page id: each
named bucket matches descendants via a fullyQualifiedName prefix query
against the page's FQN. That gives accurate direct-descendant counts
for every returned page in one aggregation round-trip, still O(1)
additional search requests regardless of batch size.

* Add allowedFields entries for contextFile, folder, page

Fixes SearchSettingsHandlerTest.testEveryAssetTypeHasCorrespondingAllowedFields.

searchSettings.json already had assetTypeConfigurations for contextFile,
folder, and page but no matching allowedFields entries, so the test that
asserts every assetType has a corresponding allowedFields block failed
with 'Asset type contextFile has no corresponding allowedFields entry'.

Adds the three missing blocks with the fields that each index actually
exposes — name / displayName (with .keyword and .ngram variants),
description, fqn, fqnParts, tags/tier/domains/dataProducts, plus
entity-specific fields (fileType/contentType/extractedText for
contextFile, parent.displayName for folder/page, pageType for page).

* Fix UI checkstyle

* Fix Java checkstyle

* Address PR #27558 copilot review round 2

- ES/OS populateChildrenCounts: add fqnDepth == parentDepth + 1 to the
  per-page filter so childrenCount is direct children only, matching the
  field name and the UI's isLeaf check semantics. Previously matched all
  descendants.
- ES/OS buildPageNestedSearchHierarchy: filter out hits with a null id
  before Collectors.toMap, which would otherwise NPE when SearchUtils
  drops a malformed UUID.
- SearchUtils.getPageHierarchy: wrap PageType.fromValue in a parsePageType
  helper that logs and returns null on unknown values, so a single bad
  hit can no longer break the whole hierarchy response.
- TestSuiteBootstrap.setupMinIO: pin minio/minio to
  RELEASE.2024-01-16T16-07-38Z instead of :latest so a newly-published
  image cannot break integration tests without a code change.
- createContextFile.json: rewrite the assetId description to be provider
  agnostic (S3 / Azure Blob / in-memory / no-op) and flag it as the legacy
  path, preferring headContentId / ContextFileContent.

* Update generated TypeScript types

* Address PR #27558 copilot review round 3

- bootstrap/sql/migrations/native/2.0.0/mysql/schemaChanges.sql:
  - asset_entity: add PRIMARY KEY (id); mark all generated columns STORED
    for consistency with the other drive/knowledge tables in the same
    migration; compute deleted as a real boolean via
    IFNULL(JSON_EXTRACT(json, '$.deleted'), FALSE) so the boolean index
    behaves correctly.
  - knowledge_center: mark name, updatedAt, updatedBy, pageType as STORED
    and apply the same deleted expression so the existing indexes on
    name and (fqnHash, deleted) are reliable on fresh installs.
  - drive_folder / context_file / context_file_content: update the
    deleted generated column to use the same boolean-safe expression.
- ElasticSearch/OpenSearch hierarchy search: add an explicit sort on
  fullyQualifiedName ASC with _id ASC as tiebreaker so from/size
  pagination is deterministic and cannot skip/duplicate pages between
  requests.

* Fix UI checkstyle

* Address PR #27558 copilot review round 4

- createPage.json: rewrite the field descriptions for name, displayName,
  owners, reviewers, and entityStatus. They were copy/pasted from other
  schemas ('query', 'tag') and were misleading in generated docs and
  clients.
- NoOpAssetService.generateDownloadUrlWithExpiry: return asset.getUrl()
  instead of a synthetic 'https://cdn.example.com/...' URL. The old
  behaviour let clients attempt downloads that would never resolve when
  object storage was disabled; returning the asset's own (empty) URL
  surfaces the misconfiguration cleanly.
- AzureAssetService: normalize the prefix path the same way S3 does.
  Previously a null/blank prefix produced the literal 'null/' prefix,
  writing blobs under the wrong key. New formatPrefix returns "" for
  null/blank and ensures exactly one trailing '/' for a real prefix.
- AssetRepository.getByFQN: treat null *or* empty list as 'not found',
  matching getByFqnPrefix. Previously, when the DAO returned [], callers
  silently received an empty list instead of a 404.
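
The formatPrefix behaviour described for AzureAssetService can be sketched as below. This is an assumption-laden illustration, not the actual method: the class name is hypothetical, and treating a prefix made only of slashes as empty is my own choice.

```java
// Illustrative sketch only; the real formatPrefix may handle edge cases differently.
final class PrefixSketch {
  // "" for null/blank input, otherwise the prefix with exactly one trailing '/'.
  static String formatPrefix(String prefix) {
    if (prefix == null || prefix.trim().isEmpty()) {
      return "";
    }
    String trimmed = prefix.trim();
    int end = trimmed.length();
    // Drop any run of trailing slashes before re-adding a single one.
    while (end > 0 && trimmed.charAt(end - 1) == '/') {
      end--;
    }
    // Assumption: a prefix consisting only of slashes is treated as no prefix.
    return end == 0 ? "" : trimmed.substring(0, end) + "/";
  }
}
```

With this shape, string concatenation such as formatPrefix(p) + key can never produce the literal "null/" key prefix.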

* Update generated TypeScript types

* Fix UI checkstyle

* Address PR #27558 copilot review round 5

- AssetDAO.update / AssetRepository.update: switch the UPDATE target from
  fqnHash to id. Two assets can share the same fullyQualifiedName (e.g.
  successive revisions of the same context file), so the old SQL could
  silently update sibling rows.
- ContextFileExtractionService: run the extraction pipeline on a
  dedicated fixed thread pool instead of AsyncService.getExecutorService.
  process() blocks on assetService.read(...).join(), and S3/Azure reads
  are themselves scheduled on AsyncService — sharing the same bounded
  pool risks starving those reads (and deadlocking) once every thread is
  busy running extractions.
- postgres/schemaChanges.sql: wrap the generated deleted column in
  COALESCE((json ->> 'deleted')::boolean, false) (and the asset_entity
  CAST variant) so an absent 'deleted' key is stored as FALSE, not NULL.
  Otherwise "non-deleted" filters based on the boolean index drop rows
  silently. Matches the MySQL IFNULL(..., FALSE) side of the migration.
- ContextFileUploadSupport.sanitizeEntityName: treat null/blank input as
  'file' instead of NPE-ing on replaceAll. Multipart uploads can arrive
  without filename metadata; the upload should still succeed with a
  stable generated name.

* Remove macOS-only @rollup/rollup-darwin-arm64 dev dep

I pinned this during local troubleshooting to get a Vite dev server
running on macOS (rollup's optional native binary was missing). CI runs
on Linux, where yarn install --frozen-lockfile refuses the package
('The platform "linux" is incompatible with this module'), which
broke license-header, lint-src, lint-playwright, i18n-sync, app-docs,
and ui-coverage-tests for PR #27558.

rollup re-resolves its native binary per platform — there's no reason
to pin the darwin one. Remove it from package.json and drop the
matching '@rollup/rollup-darwin-arm64@^4.60.2' block from yarn.lock.

* Re-declare optional SDK test deps on integration-tests classpath

KnowledgeCenterIT failed in CI with
'java.lang.NoClassDefFoundError: org/glassfish/jersey/apache/connector/ApacheConnectorProvider'
after I marked the JAX-RS client stack in openmetadata-sdk as
<optional>true</optional> during review round 2. That change stops the
deps from leaking to every SDK consumer, but integration-tests actually
uses org.openmetadata.sdk.test.util.RestClient, so the optional deps
must be re-declared on its own classpath.

Adds jakarta.ws.rs-api, jersey-client, jersey-apache-connector,
httpclient, jakarta.json-api, and parsson to
openmetadata-integration-tests/pom.xml as <scope>test</scope>.

* Fix IT failures from CI integration-tests-mysql-elasticsearch

1. MySQL deleted column: revert the IFNULL wrapper to plain
   (json -> '$.deleted'). My earlier
   IFNULL(JSON_EXTRACT(json, '$.deleted'), FALSE) hit
   'Incorrect integer value: false for column deleted' on fresh installs
   because MySQL cannot coerce the resulting JSON scalar into TINYINT(1)
   when the column is STORED. The bare '(json -> '$.deleted')' form is
   what other OM tables already use, and MySQL converts JSON true/false
   to 1/0 directly for the BOOLEAN column. STORED + PRIMARY KEY stay
   in place.
2. DriveFileUploadIT: raise the four short atMost(5s) awaits to 20s
   with explicit pollDelay(ZERO) + pollInterval(200ms).
   K8sOMJobOperatorIT sets a global Awaitility pollInterval of 5s at
   class setup; any subsequent test with atMost <= 5s hits
   'Timeout must be greater than the poll delay'. Overriding the
   per-call poll settings insulates these asserts from the global
   leak.

* Document SDK test-utility optional deps

In review round 2 we marked jersey-client, jersey-apache-connector,
jakarta.ws.rs-api, httpclient, jakarta.json-api, parsson, java-jwt, and
junit-jupiter-api as <optional>true</optional> on openmetadata-sdk so
that core SDK consumers don't inherit a heavy JAX-RS + JUnit stack.
openmetadata-integration-tests hit this immediately with
NoClassDefFoundError from RestClient; its own pom now re-declares the
deps.

Add a "Test utilities" section to the SDK README that lists the
optional deps downstream test-utility consumers must re-declare (with
the concrete <scope>test</scope> XML snippet) and explains the error
they'd otherwise see.

* NoOpAssetService: never return null from generateDownloadUrlWithExpiry

In review round 4 I changed this method to return asset.getUrl() when
the asset is non-null. But Asset.url is optional in the schema, so
asset.getUrl() itself can be null — which breaks the implied "never
returns null" contract downstream callers rely on (AttachmentResource
only null-checks defensively).

Normalize null and blank URLs to an empty string so the method's
non-null contract holds even when storage is disabled and the asset
was never populated with a URL.

* AssetServiceFactory: swap to NoOp when re-initialized with storage off

init(...) previously only assigned NoOpAssetService when instance was
null. On a re-init with object storage toggled off (config reload, test
teardown, etc.), the previously wired S3/Azure/InMemory provider stayed
live and kept serving real IO against a backend the operator thought
was disabled.

Replace the instance with a fresh NoOp when storage is disabled unless
the instance is already a NoOp (idempotent on repeated disabled
inits).
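
A rough sketch of the re-init rule, using stand-in types. The nested interface and classes are hypothetical placeholders, and the storage-enabled branch (always building a new provider) is an assumption; only the storage-disabled branch reflects what this commit describes.

```java
// Illustrative sketch only; types here are placeholders, not the OM classes.
final class AssetServiceFactorySketch {
  interface AssetService {}

  static final class NoOpAssetService implements AssetService {}

  static final class RealAssetService implements AssetService {}

  private static AssetService instance;

  static synchronized AssetService init(boolean storageEnabled) {
    if (storageEnabled) {
      // Assumption: an enabled init always wires a fresh real provider.
      instance = new RealAssetService();
    } else if (!(instance instanceof NoOpAssetService)) {
      // Covers both the first init (instance == null) and a re-init that
      // follows a real provider; an existing NoOp is left untouched.
      instance = new NoOpAssetService();
    }
    return instance;
  }
}
```

The instanceof check is what makes repeated disabled inits idempotent while still replacing a previously wired real provider.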

* Type create-request domains arrays as fullyQualifiedEntityName

The three new KC/Drive create schemas (createFolder, createContextFile,
createPage) had domains as an array of unconstrained strings. The rest
of the OM API models domain references as FQNs, and the shared
basic.json#/definitions/fullyQualifiedEntityName is the convention for
this.

Point all three items refs at fullyQualifiedEntityName so generated
clients see a consistent FQN type and requests get validated for
non-empty length/format rather than any string.

* Update generated TypeScript types

* Address PR #27558 copilot review 4144965142

- ContextFileExtractionService: switch the default thread pool to
  a static final DEFAULT_EXECUTOR, so every production instance of the
  service reuses the same pool instead of leaking a fresh fixed pool
  per construction (tests especially create multiple instances).
  Threads remain daemons, so the pool never blocks JVM shutdown.
- ObjectDeleteQueueService: when queueCapacity is 0, use a
  SynchronousQueue so "reject-if-all-workers-busy, no buffering" holds.
  Previous Math.max(1, queueCapacity) silently allocated a 1-slot
  ArrayBlockingQueue, contradicting the caller's stated capacity and
  potentially buffering one task past the semaphore's accounting.
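
The queue selection in the ObjectDeleteQueueService item can be sketched like this. The helper and class names are hypothetical; only the SynchronousQueue-for-zero rule comes from the commit.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;

// Illustrative sketch only; not the ObjectDeleteQueueService code.
final class DeleteQueueSketch {
  static BlockingQueue<Runnable> newWorkQueue(int queueCapacity) {
    if (queueCapacity < 0) {
      throw new IllegalArgumentException("queueCapacity must be >= 0");
    }
    if (queueCapacity == 0) {
      // No buffering: an offer succeeds only if a worker is already waiting.
      return new SynchronousQueue<>();
    }
    return new ArrayBlockingQueue<>(queueCapacity);
  }
}
```

A SynchronousQueue has no internal capacity, so offer(...) returns false whenever no consumer is waiting, which matches the "reject if all workers are busy" intent.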

Not fixing:
- SearchUtils @Slf4j 'LOG' vs 'log'. OM's openmetadata-service/lombok.config
  sets 'lombok.log.fieldName = LOG', so @Slf4j correctly generates
  'LOG' for every class in this module. The reviewer's concern only
  applies to projects without that directive. Verified clean compile.

* Address PR #27558 copilot review 4144917449

- knowledgeCenterTags.json: change mutuallyExclusive from the string
  "false" to the JSON boolean false. The Classification schema declares
  this as `"type": "boolean"`; Jackson's lenient string->boolean
  coercion masked it until now, but strict validators would reject it.
  The other OM bootstrap tag files that use the correct boolean
  (piiTagsWithRecognizers.json) show what this should look like.

- ContextFileExtractionService.process: guard the updateContent
  updater with the same head-content check already used in
  updateFile. Previously, if headContentId flipped between the
  initial check and the status writes, updateFile would no-op while
  updateContent still marked the now-stale content "Analyzing",
  leaving it stuck once the later early-return fires.

- AzureAssetService.upload: stream the InputStream straight to the
  blob using the known asset.getSize() instead of reading the whole
  payload into a byte[] via IOUtils.toByteArray. Matches the S3
  streaming behaviour and avoids full-file heap pressure / OOM risk
  on larger files. Buffered fallback retained when size is unknown.

- Size fields modeled as integer: flip fileSize / size on
  createContextFile.json, contextFile.json, asset.json,
  createAsset.json, and contextFileContent.json from
  "type": "number" to "type": "integer" with "format": "int64" and
  "minimum": 0. Byte counts are inherently whole numbers; floating
  point loses precision above 2^53 and makes validation murky.
  Update the (double) call sites in ContextFileResource,
  ContextFileUploadSupport, and AttachmentResource to match.
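
To make the 2^53 remark concrete, a small check of whether a byte count survives a trip through double. The class and method names are hypothetical and exist only for this illustration.

```java
import java.math.BigDecimal;

// Illustrative sketch only.
final class ByteCountPrecisionSketch {
  // True when converting the long to double keeps its exact value.
  static boolean survivesDouble(long bytes) {
    double asDouble = (double) bytes;
    // new BigDecimal(double) is an exact conversion, so this compares true values.
    return new BigDecimal(asDouble).compareTo(BigDecimal.valueOf(bytes)) == 0;
  }
}
```

survivesDouble(1L << 53) is true, while survivesDouble((1L << 53) + 1) is false, because 9007199254740993 rounds to 9007199254740992 as a double.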

Not fixing:
- ContextEntityPromptService "unused Authorizer import" — false
  positive, the class uses it in the constructor.
- NoOpAssetService.generateDownloadUrlWithExpiry null return — already
  fixed earlier in commit a4a2dcc91d (returns "" when url is
  null/blank).

* AssetService.read: run inline instead of hopping through AsyncService

Every caller of AssetService.read(...) immediately .join()s on the
returned future:

- ContextFileExtractionService.process reads + extracts
- ContextFileResource.downloadFile reads + streams back
- AttachmentResource.serveAsset reads + streams back
- QueuedDeleteAssetService just delegates

None of them exploit the async nature, but the S3/Azure/InMemory
implementations all wrapped the blocking fetch in
AsyncService.executeAsync or CompletableFuture.supplyAsync on a
bounded pool. That created a starvation path when any caller thread
was already running on AsyncService (or could monopolize it under
load) — join() would block the caller while the submitted read
task fought for a free worker.

Switch S3, Azure, and InMemory read() to execute on the caller's
thread and return CompletableFuture.completedFuture(...). Interface
is unchanged so existing .join() callers keep working; the extra
thread hop and the potential for AsyncService starvation are both
gone. Combined with the dedicated context-file-extraction pool, the
extraction pipeline no longer touches AsyncService for any
asset-read step.
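
The inline-read shape can be sketched as follows. The helper name and the Supplier parameter are hypothetical, and surfacing a runtime failure as an exceptionally completed future is an assumption the commit does not spell out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Illustrative sketch only; not the S3/Azure/InMemory read() code.
final class InlineReadSketch {
  // Runs the blocking fetch on the caller's thread; no executor is involved.
  static <T> CompletableFuture<T> readInline(Supplier<T> blockingFetch) {
    try {
      return CompletableFuture.completedFuture(blockingFetch.get());
    } catch (RuntimeException e) {
      // Assumption: failures are reported through the future, as an async
      // implementation would, so existing join() callers behave the same.
      CompletableFuture<T> failed = new CompletableFuture<>();
      failed.completeExceptionally(e);
      return failed;
    }
  }
}
```

Since the future is already complete when it is returned, a later join() cannot block waiting for a pool worker.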

* Address PR #27558 copilot review 4151211562

- FolderIndex / ContextFileIndex: stop re-setting entityType, deleted,
  owners, totalVotes inside buildSearchIndexDocInternal. Those common
  fields are populated by populateCommonFields in the SearchIndex
  template method (Phase 1) before Phase 3 calls the entity-specific
  internal builder, so the explicit puts were redundant and silently
  overrode the template output. Aligns with PageIndex convention and
  updates the unit tests to assert the internal builder sets only
  entity-specific fields.

- ContextFileTextExtractor: bound the Tika BodyContentHandler at
  MAX_CANONICAL_TEXT_LENGTH instead of passing -1 (unbounded) so a
  pathological image cannot drive OCR to accumulate arbitrary output
  on the heap.

- ContextFileExtractionService: replace the unbounded
  Executors.newFixedThreadPool backing queue with a ThreadPoolExecutor
  using an ArrayBlockingQueue + AbortPolicy. Without a bounded queue
  the RejectedExecutionException handling in submit(...) was dead
  code; with it, an overloaded server surfaces a "retry later"
  failure status instead of silently accumulating work.

- S3AssetService / AssetService / AssetServiceFactory /
  QueuedDeleteAssetService: make AssetService extend AutoCloseable
  with a default no-op, override close() in S3AssetService to release
  the S3Client and S3Presigner connection pools, and register a
  shutdown hook in AssetServiceFactory that closes the current
  provider on JVM exit (and on re-init when the provider changes).

- bootstrap 2.0.0 MySQL schemaChanges: change the deleted generated
  column from (json -> '$.deleted') to
  (JSON_EXTRACT(json, '$.deleted') IS TRUE) so rows where the JSON
  key is absent resolve to FALSE instead of NULL. Avoids filter misses
  on the composite (fqnHash, deleted) index.
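
The bounded-queue point in the ContextFileExtractionService item is easy to demonstrate in isolation. In this sketch the class name, thread name, and trySubmit helper are hypothetical; the ThreadPoolExecutor, ArrayBlockingQueue, and AbortPolicy combination is the one the commit names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; pool sizes and names are assumptions.
final class BoundedExtractionPoolSketch {
  static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
    return new ThreadPoolExecutor(
        threads,
        threads,
        0L,
        TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(queueCapacity),
        runnable -> {
          Thread t = new Thread(runnable, "extraction-sketch");
          t.setDaemon(true); // daemon threads never block JVM shutdown
          return t;
        },
        new ThreadPoolExecutor.AbortPolicy());
  }

  // Returns false when the pool is saturated instead of letting the exception escape.
  static boolean trySubmit(Executor pool, Runnable task) {
    try {
      pool.execute(task);
      return true;
    } catch (RejectedExecutionException e) {
      return false;
    }
  }
}
```

With one worker and one queue slot, a third task submitted while the first is still running is rejected; with the unbounded queue behind Executors.newFixedThreadPool it would have been accepted silently.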

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix Java checkstyle

* Fix integration test compile + S3 generateDownloadURL

ContextFileIT / DriveFileUploadIT compile failures came from the
fileSize schema switch to integer/int64 — the generated setter/getter
is now Integer. Replace the double literals with ints and the
assertEquals(double, ...) sites with intValue() so the (int, int)
overload resolves unambiguously.

Also override S3AssetService.generateDownloadURL to return a
short-lived presigned URL (mirroring AzureAssetService) instead of
inheriting the default, which would return the raw S3 key from
asset.url. Addresses review 4151282021.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Revert MySQL deleted column back to bare json -> expression

The JSON_EXTRACT(...) IS TRUE form broke integration tests — GET after
create started returning 404, consistent with MySQL evaluating the
IS TRUE predicate against the JSON scalar in a way that stored 1
instead of 0 for freshly-created rows (deleted=false).

Restoring the bare (json -> '$.deleted') expression used pre-review.
Rows with the key missing will store NULL on the generated column,
which is a theoretical concern the review flagged but does not affect
current code paths (all inserts write json.deleted explicitly).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix Transi18next import path in KnowledgeCenter components

Two KnowledgeCenter files imported Transi18next from
'utils/CommonUtils', which is where Collate's UI re-exports it from.
OpenMetadata core exports Transi18next from 'utils/i18next/LocalUtil'
(same path every other core file uses). The Collate-style import
broke the production Vite/Rollup build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Harden ContextFileIT.testFileAppearsInSearch against async indexing

The test used a fixed Thread.sleep(2000) then a single assertEquals
on the status code. That was flaky in two ways: ES indexing is async
and the 2s window is not always enough, and on a fresh cluster the
context_file_search_index itself may not exist yet at first query
(yielding 500).

Replace with an await() loop that polls every 200ms for up to 30s
and asserts both status==200 AND that the newly-created file's UUID
appears in the response. Matches the assertSearchContainsFile
helper in DriveFileUploadIT.

Also URL-encode the namespaced query string so the uniqueName
does not break the query parsing.
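
The test itself uses an await() loop; the underlying poll-until-deadline idea can be sketched without any test library. The helper below is hypothetical and is not the Awaitility API or the code in ContextFileIT.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Illustrative sketch only; a stand-in for a library-provided await().
final class PollingSketch {
  // Re-checks the condition every interval until it holds or the timeout passes.
  static boolean awaitCondition(BooleanSupplier condition, Duration timeout, Duration interval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(interval.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

In the scenario above the condition would be "the search call returned 200 and the response contains the new file's UUID", with a 30s timeout and a 200ms interval.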

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Make playwright editor shortcuts platform-aware

The SHORTCUTS constant in playwright/constant/KnowledgeCenter.constant.ts
hard-coded "Meta+b" / "Meta+z" / etc. On macOS Meta is Cmd and those
shortcuts trigger bold / undo / copy as expected, but on the Linux CI
runners Meta is the Super (Windows) key — so every ProseMirror
formatting and history test just pressed Super+b, which does nothing,
and the test then fails waiting for the <strong>…</strong> element
(or for the undone text to disappear).

Detect the runner platform and use Meta on macOS, Control everywhere
else — matching the same pattern in src/constants/KnowledgeCenter.constant.ts.

Unblocks the 6 KnowledgeCenterTextEditor failures across Admin / Data
Consumer / Data Steward roles (Text Formatting + Undo/Redo). Slash
commands keep passing because they don't depend on modifier keys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Run prettier on DateTimeUtils.ts

CI's lint-src job fails because ESLint+Prettier --fix produces a
non-empty diff against the committed tree. A local prettier pass
trimmed the indentation and added a trailing comma in the imports
block. No behavioral change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix Knowledge Page entity-link + DAO filter regressions from the port

Downloaded the failing playwright traces from the PR's postgres e2e run
and walked each one. Three distinct bugs, all present because the
Collate-side overrides (overrides/EntityUtilClassCollate.ts and the
DaoExtension.KnowledgeExtensionDAO custom SQL) were not carried over
into OpenMetadata core when KnowledgeCenter was merged up.

1) CollectionDAO.KnowledgePageDAO: override listCount / listBefore /
   listAfter (plus helper SQL queries) so that
   `GET /v1/knowledgeCenter?entityId=X&entityType=topic` actually INNER
   JOINs entity_relationship and returns only pages whose
   relatedEntities contains the target entity. Without this the base
   EntityDAO ignored entityId/entityType entirely and returned every
   page, which is why the "Knowledge Articles" widget on a data asset
   page showed the 15 fixture articles instead of the one just attached
   — and why updateDataAsset timed out waiting for the linked article.
   Uses OWNS relation for user/team filters (same semantics Collate
   uses) and HAS for every other entity type.

2) EntityUtilClassBase + EntityUtils.getEntityLinkFromType: add
   EntityType.KNOWLEDGE_PAGE cases that route to getKnowledgePagePath.
   Before this, mention notifications for Knowledge Pages fell through
   to the default `/table/<fqn>` branch (confirmed in the captured
   page-snapshot: the mention link pointed at `/table/Article_eEqrWeeU`),
   which 404'd on the Table API and rendered an error page — so the
   entity-header-display-name textarea never appeared and the User
   Mentions test timed out. Search results on Explore had the same
   problem, rendering every Knowledge Page result card with href="/".

3) EntityUtilClassBase.getEntityByFqn / ENTITY_PATCH_API_MAP /
   getResourceEntityFromEntityType: handle KNOWLEDGE_PAGE end-to-end so
   the detail-page fetch, patches, and policy lookups all route through
   the knowledgeCenter REST API rather than falling back to the generic
   entity utilities (which don't know about the 'page' entity type).

Verified against the real trace artifacts from CI run 24790718035:
- shard 3 Knowledge Center page test — widget shows 10 unrelated
  "Article_*" fixture items instead of the created one → root cause
  is the missing DAO JOIN (#1).
- shard 3 User Mentions test — notification link is /table/, not
  /knowledge-center/ (#2).
- shard 3 Reviewer Workflow — data consumer's knowledge-center goto
  renders "No data available" because getEntityByFqn fell back to a
  table fetch for a page FQN (#3).
- shard 5 ExplorePageRightPanel_KnowledgeCenter (22 failures) —
  search result card links are "/explore/" (empty), same root cause
  as (#2) inside getEntityLinkFromType default branch.

Compiles: mvn -pl openmetadata-service -q -DskipTests compile passes;
tsc --noEmit reports no new errors in the touched files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address remaining PR #27558 review feedback

Seven actionable fixes drawn from the still-open review threads; the
rest of the open threads in copilot's bot reviews are either already
addressed in earlier commits or stale against the current code and
are being resolved on the review UI alongside this commit.

- AssetRepository.getByFQN: the LOG.error message said "asset with id"
  but was printing the FQN. Relabel to "asset with FQN" for accurate
  troubleshooting (thread #42).

- KnowledgePageMapper.createToEntity: stop mutating the inbound
  CreatePage by calling create.withRelatedEntities(...). Build the
  effective list as a local variable and pass it to copy(...). Prevents
  the Organization fallback from leaking into the caller's request
  object, which is surprising when the request is re-used or logged
  (thread #43).

- FolderIndex: default childrenCount to 0 when the entity hasn't yet
  had its children recomputed (e.g. a freshly created folder). Prevents
  the numeric field from being indexed as missing, which broke range
  and sort queries that assume it is always present (thread #46).

- NoOpAssetService and InMemoryAssetService: override
  generateDownloadURL to delegate to generateDownloadUrlWithExpiry,
  matching S3/Azure. Without this, callers using the non-expiry API
  got asset.getUrl() (often empty for these providers), yielding broken
  download links (threads #39, #45).

- ObjectDeleteQueueService: register a JVM shutdown hook in the
  singleton's initializer that calls stop(). The service already
  implements Dropwizard Managed, but nothing currently wires it into
  the application lifecycle, so non-daemon delete-worker threads were
  at risk of keeping the JVM alive after ungraceful termination. The
  hook is a belt-and-suspenders fallback to the Managed path
  (threads #52, #53).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add java-checkstyle skill for Claude + Codex agents

CI keeps surfacing "Java checkstyle failed — please run mvn spotless:apply"
comments on PRs (including this branch). CLAUDE.md and AGENTS.md already
mentioned the command, but a one-line prose note in the middle of each file
wasn't enough to make it a reliable habit.

This commit:

- Adds a dedicated invocable skill at .claude/skills/java-checkstyle/SKILL.md
  (for the Claude Code harness) and a mirror at
  .agents/skills/java-checkstyle/SKILL.md (for Codex-style agents). Both
  describe the same procedure: when / why to run spotless, the `-pl <module>`
  scoping option, the verify-only `spotless:check` form, the expected
  diff shape, and the rule to never hand-edit formatting around a plugin
  error.

- Promotes the existing one-liners in CLAUDE.md and AGENTS.md to explicit
  "run before finishing any Java task" instructions, pointing at the skill so
  agents have a reusable procedure to invoke rather than improvising.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Harden AttachmentResource upload/download against three regressions

Carried over from the latest AttachmentResource review. Three issues:

1. Content-Disposition header injection (security) — downloadAsset() built
   the header by direct string interpolation of asset.getFileName(). A
   filename containing double-quotes or CRLF could inject arbitrary HTTP
   headers. ContextFileResource already has a sanitize + RFC-5987 encode
   helper; rather than duplicate it, promote
   ContextFileUploadSupport.sanitizeFileName / buildContentDisposition to
   public, delete the duplicates from ContextFileResource (now delegators),
   and reuse the shared helpers from AttachmentResource.

2. Unbounded upload buffering (performance / DoS) — createAssetFromUpload
   read the full multipart body into a byte[] via IOUtils.toByteArray
   before checking against MAX_FILE_SIZE. An attacker could send an
   arbitrarily large body and exhaust heap before the validation ran.
   Replace with ContextFileUploadSupport.bufferUpload(), which streams to
   a bounded temp file and throws MaxFileSizeExceededException the moment
   the configured limit is passed; translate that into the same
   AttachmentException size-validation error the previous code raised.
   Promoted BufferedUpload and MaxFileSizeExceededException to public so
   the attachments package can consume them.

3. Startup NPE when objectStorage is null (bug) — initialize() called
   config.getObjectStorage().getMaxFileSize() without a null guard, so a
   deployment that doesn't configure object storage would NPE on server
   start. Added the same guard ContextFileResource.initialize() already
   uses, gave MAX_FILE_SIZE a safe 5 MiB default, and also null-guarded
   the S3-configuration branch of the CDN URL lookup so a pure-Azure or
   pure-NoOp setup doesn't fall off the end of the ternary.
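
The bounded buffering described in item 2 can be sketched roughly as follows. This is a minimal illustration and not the actual ContextFileUploadSupport.bufferUpload() code: the class BoundedUploadSketch, the method bufferToTempFile, and the exception SizeLimitExceededException are hypothetical names chosen for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative sketch only; every name here is hypothetical, not the project's API. */
final class BoundedUploadSketch {
  private static final int CHUNK_SIZE = 8192;

  private BoundedUploadSketch() {}

  /** Hypothetical stand-in for a "max file size exceeded" exception. */
  static final class SizeLimitExceededException extends IOException {
    SizeLimitExceededException(final long maxBytes) {
      super(String.format("Upload exceeds the configured limit of %d bytes", maxBytes));
    }
  }

  /** Streams the body to a temp file and fails as soon as maxBytes is passed. */
  static Path bufferToTempFile(final InputStream body, final long maxBytes) throws IOException {
    final Path tempFile = Files.createTempFile("upload-", ".tmp");
    try (OutputStream sink = Files.newOutputStream(tempFile)) {
      copyWithLimit(body, sink, maxBytes);
    } catch (IOException e) {
      // Do not leave a partial file behind when the copy is rejected or fails.
      Files.deleteIfExists(tempFile);
      throw e;
    }
    return tempFile;
  }

  private static void copyWithLimit(
      final InputStream body, final OutputStream sink, final long maxBytes) throws IOException {
    final byte[] chunk = new byte[CHUNK_SIZE];
    long totalBytes = 0;
    int bytesRead;
    while ((bytesRead = body.read(chunk)) != -1) {
      totalBytes += bytesRead;
      // The check runs per chunk, so heap use stays bounded by CHUNK_SIZE.
      if (totalBytes > maxBytes) {
        throw new SizeLimitExceededException(maxBytes);
      }
      sink.write(chunk, 0, bytesRead);
    }
  }

  public static void main(final String[] args) throws IOException {
    final Path stored = bufferToTempFile(new ByteArrayInputStream(new byte[1024]), 2048);
    System.out.println(Files.size(stored));
    Files.deleteIfExists(stored);
  }
}
```

The point of the design is that the size check happens while streaming, so an oversized body is rejected after at most one extra chunk rather than after the whole payload has been read into memory.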

Ran mvn spotless:apply — picks up formatting-only changes in
CollectionDAO.java and FolderIndex.java as a side effect of the shared
helper additions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add ui-checkstyle skill + fix residual import-order drift

CI's UI Checkstyle workflow has three per-area jobs (lint-src,
lint-playwright, lint-core-components) that reformat the files changed
in the PR and fail if the reformat produces a diff. CLAUDE.md and
AGENTS.md didn't previously document this flow, so re-running the fix
was a guessing game — the two lint-core-components and lint-playwright
failures on this branch came from stale import order left over from the
main→context_center merge.

This commit:

- Adds a dedicated invocable skill at .claude/skills/ui-checkstyle/SKILL.md
  (Claude Code harness) and a mirror at .agents/skills/ui-checkstyle/SKILL.md
  (Codex-style agents). Both describe the exact three-command sequence CI
  runs — organize-imports-cli → eslint --fix → prettier --write — the
  per-area file scoping, the `--check` dry-run mode, and the rule that
  organize-imports must run BEFORE prettier (otherwise the indentation /
  trailing-comma round-trip leaves a dirty diff).

- Promotes the existing one-liner in CLAUDE.md and AGENTS.md to an explicit
  "run before finishing any UI task" instruction that points at the skill.

- Fixes two residual import-order drifts (KnowledgePagesHierarchy.tsx,
  EntityUtilClassBase.ts) surfaced by running the skill's sequence locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix UI checkstyle on EntityUtilClassBase.ts

ESLint --fix inserted a blank line between the KNOWLEDGE_PAGE guard and the
fallback return in getEntityByFqn. Committing the formatted version.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix ContextFileIT.testFileAppearsInSearch flaky 500 from query_string parsing

The previous polling search used the namespaced unique name as a free-text
q= argument. The namespace prefix contains '-' which the ES 9.x query_string
parser treats as a NOT operator, producing a deterministic 500 across the
full 30s polling window even when the document was indexed.

Switch to the direct get-by-id endpoint (/v1/search/get/{index}/doc/{id}),
which performs a real-time ES GET with no query_string parsing and no
analyzer involvement — the most reliable signal that the document was
indexed. Bump the timeout to 60s and capture the response body on any
non-200 so future regressions surface the real ES error.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Fix knowledge center icon

* update knowledge center to context center

Co-authored-by: Copilot <copilot@github.com>

* Revert "update knowledge center to context center"

This reverts commit f0cca5fd65.

* Fix UI checkstyle: sort tag*-related imports in SearchClassBase

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix Jest coverage failures in KnowledgeCenter Layout and right panel

KnowledgeCenterLayout was importing i18n directly from LocalUtil, but the
global setupTests mock for that module only exposes t/on. Switch to the
useTranslation() hook so it picks up the react-i18next mock that already
provides i18n.dir(), matching how LeftSidebar and RichTextEditor use the
direction.

EntityRightPanelClassBase.getKnowLedgeArticlesWidget now returns the
KnowledgePages component instead of null. Update the corresponding test
case to assert the new return value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix playwright tests and bugs

Co-authored-by: Copilot <copilot@github.com>

* Fix checkstyle

* Fix /knowledgeCenter/search/hierarchy 500 by removing _id sort

ES 9.x and OpenSearch 3.x reject sorts on the _id field by default
(indices.id_field_data.enabled is false), causing every call to
listPageHierarchy{,ForActivePage} to fail with the search_phase_execution_exception
"all shards failed" we see in the screenshot. The _id sort was added
in 4a75852a7e as a tiebreaker for from/size pagination, but
fullyQualifiedName is already a keyword field with doc_values and is
unique per page (name is unique within a parent's children) — so no
tiebreaker is needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Cascade hard-delete to descendant pages in search index

KnowledgeCenter pages are nested via FQN (parent.fqn -> parent.fqn.child),
not via a parent.id field on the child doc. The default deleteOrUpdateChildren
case for entity type "page" uses page.id field matching, which doesn't exist
on child page docs — so a recursive hard-delete on the parent removed the
parent from search but left every descendant orphaned in the index. Stale
docs only disappeared on a full reindex.

This logic was overridden in the collate fork's SearchRepositoryExt; it was
lost during the migration when the override class was removed. Fold the
override into the base SearchRepository as a Page-specific case that calls
deleteEntityByFQNPrefix, which deletes by fullyQualifiedName.keyword prefix
match — covering every descendant.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Add page/folder/contextFile/securityService to SearchIndexingApp picker

The Search Indexing Application's "Entities" picker shows "No data" when
typing "Page" because the enum in src/utils/ApplicationSchemas/SearchIndexingApplication.json
does not include the Knowledge Center / Drive entity types added on this
branch. The collate fork carried these in SearchIndexingApplication-collate.json
(included page); folder, contextFile and securityService are new on this
branch and never made it into the picker enum during the migration.

Without them in the enum, users cannot select these entity types for
targeted reindex, even though every other reindex code path supports them.

src/jsons/applicationSchemas/* is generated by parseSchemas.js from
src/utils/ApplicationSchemas/* at build time and is gitignored, so only
the source schema is updated here.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Restore live index settings on per-entity distributed-promote path

DefaultRecreateHandler exposes two finalization paths:

  - finalizeReindex(...)        — centralized end-of-job promotion. Calls
                                  applyLiveServingSettings + maybeForceMerge
                                  before the alias swap, reverting the bulk
                                  overrides (refresh_interval=-1, replicas=0,
                                  async translog) back to live values
                                  (refresh=1s, replicas=1, durable translog).

  - promoteEntityIndex(ctx, ok) — per-entity promotion. Used by the distributed
                                  search-indexer's "promote as soon as all
                                  partitions for an entity complete" callback
                                  (DistributedSearchIndexExecutor.promoteEntityIndex).
                                  Swaps the alias and cleans up old indices —
                                  but never restored live settings.

When an entity finishes its partitions before the final reconciliation
(typically the smallest entities — e.g. knowledge `page` with ~11 rows),
its index is promoted via the per-entity path, the alias swap succeeds,
and the bulk-build overrides become the new live settings. refresh_interval
stays at -1 in production, so live writes after the reindex are buffered in
the translog and never reach searchable segments until a manual _refresh.
Externally this surfaces as "create an article, hierarchy is empty until I
re-trigger reindex" — exactly the user-reported bug.

Mirror the finalizeReindex sequence by calling applyLiveServingSettings
(and maybeForceMerge for parity) at the top of the promote block in
promoteEntityIndex, before the alias swap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Wire jobData into per-entity reindex promotion handler

DefaultRecreateHandler.applyLiveServingSettings reads from the handler's
jobData field (live + bulk index-settings overrides on the EventPublisherJob).
The per-entity distributed-promotion path in DistributedSearchIndexExecutor
created its own DefaultRecreateHandler instance and never called
withJobData(jobData) on it. With jobData=null, buildRevertJson returns null
and applyLiveServingSettings silently no-ops — meaning the previous fix
(b272de85f9) never actually re-applied live settings on the per-entity
promote path, even though the call was reached.

currentJob.getJobConfiguration() is the EventPublisherJob the strategy
created. Wire it into the new handler at construction time, mirroring the
withJobData call DistributedIndexingStrategy already makes on the strategy's
own handler instance.

With this change, the per-entity promote path now logs

  "Applying live serving settings to staged index '...' for entity 'page':
   {\"number_of_replicas\":1,\"refresh_interval\":\"1s\", ...}"

before the alias swap, and post-promotion `_settings` show
refresh_interval=1s instead of the stuck -1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Fix delete failure

* Fix java checkstyle

* Fix article deletion issue

* refactor(test): streamline Knowledge Center List setup and teardown processes

* Fix GlossaryTags

* Add missing pieces in knowledge articles

* Fix checkstyle

* Remove reviewer workflow spec

* remove unused util

* Fix the localization changes

* Fix unit tests

* deleted unused svg

* added missing svg

* improved ux of save button & autofocus on title

* lint fixes

* Update page index

* Make calculateFqnDepth static

* fixed the kc imports

* import fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: mohitdeuex <mohit.y@deuexsolutions.com>
Co-authored-by: Rohit0301 <rj03012002@gmail.com>
Co-authored-by: Rohit Jain <60229265+Rohit0301@users.noreply.github.com>
Co-authored-by: Harsh Vador <harsh.vador@somaiya.edu>

2026-05-08 10:56:04 -07:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

About OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance. This is a multi-module project with Java backend services, React frontend, Python ingestion framework, and comprehensive Docker infrastructure.

For architecture deep dives, entity/repository/resource patterns, and end-to-end checklists for adding new entities or connectors, see DEVELOPER.md.

Architecture Overview

Backend: Java 21 + Dropwizard REST API framework, multi-module Maven project
Frontend: React + TypeScript, built with Webpack and Yarn; component library via openmetadata-ui-core-components (Tailwind CSS v4 with tw: prefix, react-aria-components foundation)
Ingestion: Python 3.10-3.11 with Pydantic 2.x, 75+ data source connectors
Database: MySQL (default) or PostgreSQL with Flyway migrations
Search: Elasticsearch 7.17+ or OpenSearch 2.6+ for metadata discovery
Infrastructure: Apache Airflow for workflow orchestration

Environment Setup

Python Virtual Environment (REQUIRED)

You MUST activate the Python venv before any Python work. OpenMetadata supports Python 3.10-3.11; 3.11 is recommended.

# First-time setup (creates venv at repo root):
# python3.11 -m venv env

# ALWAYS activate before running Python, make generate, make install_dev, etc:
source env/bin/activate

# Verify:
python --version   # Should show Python 3.10.x or 3.11.x

In worktrees: When Claude Code creates a Git worktree, the venv from the main repo is NOT copied. You need to either:

Create a new venv in the worktree: python3.11 -m venv env && source env/bin/activate && cd ingestion && make install_dev
Or symlink the main repo's venv: ln -s /path/to/main-repo/env env

Initial Dev Environment Setup

After activating the venv, install all dependencies:

source env/bin/activate

# Install ingestion module with all dev dependencies (required before make generate)
cd ingestion
make install_dev_env            # Full dev environment (edit mode + all extras)
# OR for lighter install:
make install_dev                # Just dev dependencies
cd ..

# Generate Pydantic models from JSON schemas (required after schema changes)
make generate

# Install UI dependencies
make yarn_install_cache

Other Environment Notes

Java: Java 21 required. Use mvn (Maven) for backend builds.
Node/Yarn: Use yarn (not npm) for frontend. Frontend root is openmetadata-ui/src/main/resources/ui/.
Docker services: Development services (MySQL, Elasticsearch, etc.) run via docker/development/docker-compose.yml:
```
docker compose -f docker/development/docker-compose.yml up -d
```

Essential Development Commands

Prerequisites and Setup

make prerequisites              # Check system requirements
source env/bin/activate         # ALWAYS activate venv first
cd ingestion && make install_dev_env  # Install Python dev dependencies
make generate                  # Generate Pydantic models from JSON schemas
make yarn_install_cache        # Install UI dependencies

Frontend Development

cd openmetadata-ui/src/main/resources/ui
yarn start                     # Start development server on localhost:3000
yarn test                      # Run Jest unit tests
yarn test path/to/test.spec.ts # Run a specific test file
yarn test:watch               # Run tests in watch mode
yarn playwright:run            # Run E2E tests
yarn lint                      # ESLint check
yarn lint:fix                  # ESLint with auto-fix
yarn build                     # Production build

Frontend CI Checkstyle (run before PR to match CI)

cd openmetadata-ui/src/main/resources/ui
yarn ui-checkstyle:changed         # One-shot checkstyle for changed files (excludes tsc)
yarn organize-imports:cli <files>  # Sort and organize imports
yarn lint:fix                      # ESLint auto-fix
yarn pretty:base --write <files>   # Prettier formatting
yarn license-header-fix <files>    # Add Apache 2.0 license headers
yarn i18n                          # Sync all 17 locale files with en-us.json
yarn generate:app-docs             # Regenerate application documentation
npx tsc --noEmit                   # TypeScript type check (catches errors early)

Backend Development

mvn clean package -DskipTests  # Build without tests
mvn clean package -DonlyBackend -pl '!openmetadata-ui'  # Backend only (quoted so the shell does not treat ! as history expansion)
mvn test                       # Run unit tests
mvn verify                     # Run integration tests
mvn spotless:apply             # Format Java code

Python Ingestion Development

cd ingestion
make install_dev_env           # Install in development mode
make generate                  # Generate Pydantic models from JSON schemas
make unit_ingestion_dev_env    # Run unit tests
make py_format                 # Apply ruff lint-fix + format
make py_format_check           # Verify lint + format (matches CI; catches non-auto-fixable issues)
make static-checks             # Run type checking with basedpyright

Full Local Environment

./docker/run_local_docker.sh -m ui -d mysql        # Complete local setup with UI
./docker/run_local_docker.sh -m no-ui -d postgresql # Backend only with PostgreSQL
./docker/run_local_docker.sh -s true               # Skip Maven build step

Testing

make run_e2e_tests             # Full E2E test suite
make unit_ingestion            # Python unit tests with coverage
yarn test:coverage             # Frontend test coverage

Backend Integration Tests

All backend API integration tests MUST be placed in openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ directory. Tests should:

Use naming convention *IT.java (Integration Test)
Extend BaseEntityIT<T, K> for entity CRUD tests
Be designed to run concurrently (use @Execution(ExecutionMode.CONCURRENT))
Use TestNamespace for test isolation
Use SdkClients for API calls (e.g., SdkClients.adminClient().tables().create(...))

# Run a specific integration test
mvn test -pl openmetadata-integration-tests -Dtest=TaskResourceIT

# Run all integration tests
mvn test -pl openmetadata-integration-tests

Code Generation and Schemas

OpenMetadata uses a schema-first approach with JSON Schema definitions driving code generation:

make generate                  # Generate all models from schemas
make py_antlr                  # Generate Python ANTLR parsers
make js_antlr                  # Generate JavaScript ANTLR parsers
yarn parse-schema              # Parse JSON schemas for frontend (connection and ingestion schemas)

Schema Architecture

Source schemas in openmetadata-spec/ define the canonical data models
Connection schemas are pre-processed at build time via parseSchemas.js to resolve all $ref references
Application schemas in openmetadata-ui/.../ApplicationSchemas/ are resolved at runtime using schemaResolver.ts
JSON schemas with $ref references to external files require resolution before use in forms

Key Directories

openmetadata-service/ - Core Java backend services and REST APIs
openmetadata-ui/src/main/resources/ui/ - React frontend application
ingestion/ - Python ingestion framework with connectors
openmetadata-spec/ - JSON Schema specifications for all entities
bootstrap/sql/ - Database schema migrations and sample data
conf/ - Configuration files for different environments
docker/ - Docker configurations for local and production deployment

Development Workflow

Schema Changes: Modify JSON schemas in openmetadata-spec/, then run mvn clean install on openmetadata-spec to update models
Backend: Develop in Java using Dropwizard patterns, test with mvn test, format with mvn spotless:apply
Frontend: Use React/TypeScript with components from openmetadata-ui-core-components, test with Jest/Playwright
Ingestion: Python connectors follow plugin pattern, use make install_dev_env for development
Full Testing: Use make run_e2e_tests before major changes

Frontend Architecture Patterns

React Component Patterns

File Naming: Components use ComponentName.component.tsx, interfaces use ComponentName.interface.ts
State Management: Use useState with proper typing, avoid any
Side Effects: Use useEffect with proper dependency arrays
Performance: Use useCallback for event handlers, useMemo for expensive computations
Custom Hooks: Prefix with use, place in src/hooks/, return typed objects
Internationalization: Use useTranslation hook from react-i18next, access with t('key')
Component Structure: Functional components only, no class components
Props: Define interfaces for all component props, place in .interface.ts files
Loading States: Use object state for multiple loading states: useState<Record<string, boolean>>({})
Error Handling: Use showErrorToast and showSuccessToast utilities from ToastUtils
Navigation: Use useNavigate from react-router-dom, not direct history manipulation
Data Fetching: Async functions with try-catch blocks, update loading states appropriately

State Management

Use Zustand stores for global state (e.g., useLimitStore, useWelcomeStore)
Keep component state local when possible with useState
Use context providers for feature-specific shared state (e.g., ApplicationsProvider)

Styling

Component Library: Use components from openmetadata-ui-core-components for all new UI work. This is the canonical component library — do not use MUI or introduce new MUI dependencies.
Available Components: Button, Input, Select, Modal, Table, Tabs, Pagination, Badge, Avatar, Checkbox, Dropdown, Form, Card, Tooltip, Toggle, Slider, Textarea, Tags, and more — all in openmetadata-ui-core-components/src/main/resources/ui/src/components/
Tailwind Classes: All Tailwind utility classes must use the tw: prefix (e.g., tw:flex, tw:text-sm, tw:bg-blue-500) to avoid conflicts with existing Ant Design/Less styles
Design Tokens: Use CSS custom properties defined in openmetadata-ui-core-components/src/main/resources/ui/src/styles/globals.css. Never use hardcoded color or spacing values. Semantic tokens include:
- Text: --color-text-primary, --color-text-secondary, --color-text-tertiary, --color-text-error-primary, etc.
- Border: --color-border-primary, --color-border-secondary, --color-border-error, --color-border-brand, etc.
- Background: --color-bg-primary, --color-bg-secondary, --color-bg-error-primary, --color-bg-brand-solid, etc.
- Shadows: --shadow-xs through --shadow-3xl
- Border radius: --radius-none through --radius-full
MUI: Do not use MUI — we are actively removing MUI from the codebase. Do not import from @mui/* or @emotion/*
Legacy: Ant Design components remain in existing code but should be replaced with openmetadata-ui-core-components equivalents when refactoring
Do not add unnecessary spacing between logs and code.
In Java, avoid wildcard imports (e.g., use import java.util.List; instead of import java.util.*;)
Custom styles in .less files with component-specific naming (legacy pattern, avoid for new code)
Follow BEM naming convention for custom CSS classes when writing raw CSS

UI considerations

Do not hardcode string literals anywhere in the UI. Use the useTranslation hook, as in const { t } = useTranslation(). For example, to display "Run", use {t('label.run')}, where label.run is a key defined in the locale files.

Application Configuration

Applications use ApplicationsClassBase for schema loading and configuration
Dynamic imports handle application-specific schemas and assets
Form schemas use React JSON Schema Form (RJSF) with custom UI widgets

Service Utilities

Each service type has dedicated utility files (e.g., DatabaseServiceUtils.tsx)
Connection schemas are imported statically and pre-resolved
Service configurations use switch statements to map types to schemas

Type Safety

All API responses have generated TypeScript interfaces in generated/
Custom types extend base interfaces when needed
Avoid type assertions unless absolutely necessary
Use discriminated unions for action types and state variants

Database and Migrations

Flyway handles schema migrations in bootstrap/sql/migrations/
Use Docker containers for local database setup
MySQL is the default; PostgreSQL is supported as an alternative
Sample data loaded automatically in development environment

Security and Authentication

JWT-based authentication with OAuth2/SAML support
Role-based access control defined in Java entities
Security configurations in conf/openmetadata.yaml
Never commit secrets - use environment variables or secure vaults

Code Generation Standards

Comments Policy

Do NOT add unnecessary comments - write self-documenting code
NEVER add single-line comments that describe what the code obviously does
Only include comments for:
- Complex business logic that isn't obvious
- Non-obvious algorithms or workarounds
- Public API JavaDoc documentation
- TODO/FIXME with ticket references
Bad examples (NEVER do this):
- // Create user before createUser()
- // Get client before SdkClients.adminClient()
- // Verify domain is set before assertNotNull(entity.getDomain())
- // User names are lowercased when the code toLowerCase() makes it obvious
If the code needs a comment to be understood, refactor the code to be clearer instead

Java Code Requirements

Always run mvn spotless:apply before you finish any task that touched .java files. CI runs mvn spotless:check and will fail the PR otherwise — the bot's exact suggestion is "Please run mvn spotless:apply in the root of your repository and commit the changes to this PR." Scope the run with -pl <module> for speed if only one module changed. When asked to "fix checkstyle" / "fix Java formatting" / "apply spotless", invoke the java-checkstyle skill (see .claude/skills/java-checkstyle/) rather than hand-editing formatting.

Method Size and Complexity (Kafka-Grade Standards)

Methods must be 15 lines or fewer (excluding blank lines and braces). If a method is longer, break it into smaller focused methods with descriptive names.

Maximum 3 levels of nesting. Use early returns to reduce nesting:

// BAD: deeply nested
if (entity != null) {
    if (entity.isActive()) {
        if (hasPermission(entity)) {
            process(entity);
        }
    }
}

// GOOD: early returns, flat
if (entity == null) return;
if (!entity.isActive()) return;
if (!hasPermission(entity)) return;
process(entity);

Maximum 10 cyclomatic complexity. Extract complex conditions into named methods:

// BAD: complex inline boolean
if (entity.getStatus() == ACTIVE && entity.getOwner() != null
    && !entity.isDeleted() && entity.getVersion() > 0.1) { ... }

// GOOD: self-documenting
if (isEligibleForProcessing(entity)) { ... }
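
One possible shape for the extracted predicate is sketched below. The Entity record, the Status enum, and the MINIMUM_VERSION constant are hypothetical and exist only to make the example self-contained.

```java
/** Illustrative sketch; Entity and Status are hypothetical types. */
final class EligibilitySketch {
  enum Status {
    ACTIVE,
    INACTIVE
  }

  record Entity(Status status, String owner, boolean deleted, double version) {}

  private static final double MINIMUM_VERSION = 0.1;

  private EligibilitySketch() {}

  /** Names the compound condition once so that call sites stay readable. */
  static boolean isEligibleForProcessing(final Entity entity) {
    return entity.status() == Status.ACTIVE
        && entity.owner() != null
        && !entity.deleted()
        && entity.version() > MINIMUM_VERSION;
  }

  public static void main(final String[] args) {
    final Entity published = new Entity(Status.ACTIVE, "data-team", false, 0.2);
    System.out.println(isEligibleForProcessing(published));
  }
}
```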

Maximum 5 parameters. Introduce a parameter object or builder for more.
Each method does one thing. If you can describe what a method does using "and" or "then", it should be two methods.
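
A parameter object for the five-parameter limit could look like the sketch below. SearchRequest and its fields are hypothetical; the example only shows how six loose arguments collapse into one validated value.

```java
import java.util.Objects;

/** Illustrative sketch; SearchRequest is a hypothetical parameter object. */
final class ParameterObjectSketch {
  /** Groups what would otherwise be six separate method parameters. */
  record SearchRequest(
      String index, String query, int from, int size, String sortField, boolean includeDeleted) {
    SearchRequest {
      Objects.requireNonNull(index, "index must not be null");
      if (from < 0 || size <= 0) {
        throw new IllegalArgumentException(
            String.format("Invalid page window: from=%d, size=%d", from, size));
      }
    }
  }

  private ParameterObjectSketch() {}

  /** Takes one argument instead of six; validation lives in the record. */
  static String describe(final SearchRequest request) {
    return String.format(
        "%s[%d..%d) sorted by %s",
        request.index(), request.from(), request.from() + request.size(), request.sortField());
  }

  public static void main(final String[] args) {
    final SearchRequest firstPage = new SearchRequest("pages", "owner:me", 0, 25, "name", false);
    System.out.println(describe(firstPage));
  }
}
```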

Naming and Readability

Names should make code read like prose — if you need a comment, the name isn't good enough
Methods: verb phrases — calculateScore(), findByName(), isValid()
Booleans: question-form — isActive, hasPermission, canRetry (never flag, status, check)
Variables: descriptive, no abbreviations — entityReference not er, retryCount not rc
Constants: UPPER_SNAKE_CASE — MAX_RETRY_COUNT, DEFAULT_PAGE_SIZE
No single-letter variables except in short lambdas or loop indices

Immutability and Defensive Design

Use final on local variables and parameters that don't change (which is most of them)
Use final on fields set in the constructor
Return Collections.unmodifiableList() / List.copyOf() from public methods, never expose internal mutable collections
Utility classes must be final with a private constructor
Prefer record for immutable data carriers where appropriate
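
The rules above can be illustrated with a short sketch. TagSet and TagRegistry are hypothetical names; the example shows a record that copies its input and a class that hands out snapshots instead of its internal list.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch; TagSet and TagRegistry are hypothetical types. */
final class ImmutabilitySketch {
  /** Immutable data carrier; the compact constructor copies the incoming list. */
  record TagSet(String owner, List<String> tags) {
    TagSet {
      tags = List.copyOf(tags);
    }
  }

  /** Mutable internally, but callers only ever receive unmodifiable snapshots. */
  static final class TagRegistry {
    private final List<String> tags = new ArrayList<>();

    void register(final String tag) {
      tags.add(tag);
    }

    List<String> tags() {
      return List.copyOf(tags);
    }
  }

  private ImmutabilitySketch() {}

  public static void main(final String[] args) {
    final TagRegistry registry = new TagRegistry();
    registry.register("PII");
    final List<String> snapshot = registry.tags();
    registry.register("Tier1");
    // The earlier snapshot is unaffected by the later registration.
    System.out.println(snapshot.size() + " " + registry.tags().size());
  }
}
```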

Error Handling

No empty catch blocks — at minimum, log the exception
No catch (Exception e) — catch the specific type you expect
No e.printStackTrace() — use the logger
Error messages must include context: "Table '%s' not found in database '%s'" not just "Not found"
No throw or return inside finally blocks — they mask the original exception
No exceptions for flow control — use conditionals for expected cases
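
A minimal sketch of these rules follows. It uses java.util.logging only to stay self-contained (the project's own logger would be used in real code), and readOrDefault and requireTableId are hypothetical helpers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Illustrative sketch; java.util.logging stands in for the project's logger. */
final class ErrorHandlingSketch {
  private static final Logger LOG = Logger.getLogger(ErrorHandlingSketch.class.getName());

  private ErrorHandlingSketch() {}

  /** A missing file is an expected case, so it is a conditional rather than a catch. */
  static String readOrDefault(final Path path, final String fallback) {
    if (!Files.isRegularFile(path)) {
      return fallback;
    }
    try {
      return Files.readString(path);
    } catch (IOException e) {
      // Specific exception type, logged with context instead of printStackTrace().
      LOG.log(Level.WARNING, String.format("Failed to read '%s'; using fallback", path), e);
      return fallback;
    }
  }

  /** The message names both the missing table and where it was looked up. */
  static String requireTableId(
      final Map<String, String> tableIds, final String table, final String database) {
    final String tableId = tableIds.get(table);
    if (tableId == null) {
      throw new IllegalArgumentException(
          String.format("Table '%s' not found in database '%s'", table, database));
    }
    return tableId;
  }

  public static void main(final String[] args) {
    System.out.println(readOrDefault(Path.of("no-such-file-for-sketch.txt"), "fallback"));
    System.out.println(requireTableId(Map.of("orders", "t1"), "orders", "sales"));
  }
}
```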

No Magic Strings — Define Constants

Never use raw string literals in .equals(), .contains(), or switch cases — define a constant or use an existing enum
If an enum already exists in openmetadata-spec/ schemas for those values, use it
If the same string appears in more than one place, it must be a named constant
One definition, one location — don't define the same constant in multiple classes

Prefer enums over string constants when the values form a closed set:

// BAD: magic strings scattered everywhere
if (taskStatus.equals("Open")) { ... }
if (config.getResources().get(0).equals("all")) { ... }

// GOOD: use existing enums or define constants
if (taskStatus == TaskStatus.OPEN) { ... }
private static final String RESOURCE_ALL = "all";

No Convoluted if/else Chains

More than 3 else if branches means the structure is wrong — refactor:
- else if chain on instanceof → switch with pattern matching (Java 21)
- else if chain on enum values → switch expression
- else if chain on .equals("string") → Map dispatch or enum lookup
- else if chain on .contains("string") → Map or list of predicates

Repeated compound conditions (same multi-part &&/|| expression in multiple places) → extract into a named method or Set.contains()

// BAD: 3-part condition repeated 3 times across the file
if (!tenantId.equals("common") && !tenantId.equals("organizations")
    && !tenantId.equals("consumers")) { ... }

// GOOD: define once, use everywhere
private static final Set<String> MULTI_TENANT_IDS =
    Set.of("common", "organizations", "consumers");

private boolean isSingleTenant(String tenantId) {
    return !MULTI_TENANT_IDS.contains(tenantId);
}
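
For the first refactoring listed above (an else if chain on instanceof), a switch with pattern matching might look like the sketch below. It assumes Java 21, and the Asset hierarchy is hypothetical.

```java
/** Illustrative sketch; requires Java 21. The Asset hierarchy is hypothetical. */
final class DispatchSketch {
  sealed interface Asset permits Table, Dashboard, Pipeline {}

  record Table(String name, int columnCount) implements Asset {}

  record Dashboard(String name, int chartCount) implements Asset {}

  record Pipeline(String name, int taskCount) implements Asset {}

  private DispatchSketch() {}

  /** Replaces an instanceof chain; the compiler checks that every subtype is handled. */
  static String summarize(final Asset asset) {
    return switch (asset) {
      case Table table ->
          String.format("table %s with %d columns", table.name(), table.columnCount());
      case Dashboard dashboard ->
          String.format("dashboard %s with %d charts", dashboard.name(), dashboard.chartCount());
      case Pipeline pipeline ->
          String.format("pipeline %s with %d tasks", pipeline.name(), pipeline.taskCount());
    };
  }

  public static void main(final String[] args) {
    System.out.println(summarize(new Table("orders", 3)));
  }
}
```

Because Asset is sealed, adding a fourth subtype without a matching case becomes a compile error rather than a silently skipped branch.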

No Code Duplication

If the same logic exists in two places, extract to a shared method
Near-identical methods (e.g., same logic for OpenSearch and ElasticSearch) should share a common implementation with only the engine-specific parts varying
Copy-pasted blocks within the same file should be extracted into a parameterized method

Class Size

Classes should be under 500 lines. Over 1000 lines is a design problem.
If a class is large, look for clusters of methods that operate on the same subset of fields — extract them into a new focused class
Resource classes should be thin orchestrators
Repository classes handle data access, not business logic

Modern Java (Java 21)

Use try-with-resources for all AutoCloseable objects
Use diamond operator <> — new ArrayList<>() not new ArrayList<String>()
Use pattern matching: if (obj instanceof String s) instead of cast
Use switch expressions instead of if/else if chains on enums or types
Use List.of(), Map.of(), Set.of() for immutable collection literals
Use Optional correctly: never as a field type, never as a parameter, never assign null to it
Use text blocks """ for multi-line strings
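
Several of these features fit in one small sketch. The class and method names are hypothetical; the example combines a text block, try-with-resources, Optional as a return type only, and instanceof pattern matching.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.Optional;

/** Illustrative sketch; all names are hypothetical. */
final class ModernJavaSketch {
  /** Text block: the common indentation is stripped, leaving two lines. */
  static final String SAMPLE_LINES = """
      alpha
      beta
      """;

  private ModernJavaSketch() {}

  /** Optional appears only as a return type; the reader is closed automatically. */
  static Optional<String> firstLine(final String text) throws IOException {
    try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
      return Optional.ofNullable(reader.readLine());
    }
  }

  /** Pattern matching binds the narrowed variable without an explicit cast. */
  static int lengthOf(final Object value) {
    if (value instanceof String text) {
      return text.length();
    }
    if (value instanceof List<?> items) {
      return items.size();
    }
    return 0;
  }

  public static void main(final String[] args) throws IOException {
    System.out.println(firstLine(SAMPLE_LINES).orElse("<empty>"));
    System.out.println(lengthOf(List.of("a", "b")));
  }
}
```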

Common Bug Patterns to Avoid

equals() without hashCode() (or vice versa)
equals() on arrays — use Arrays.equals()
Ignoring return values of String.replace(), File.delete()
collection.size() == 0 — use collection.isEmpty()
String concatenation inside loops — use StringBuilder
synchronized on non-final fields — the lock reference can change
toLowerCase() without Locale — always use toLowerCase(Locale.ROOT)
Double map lookups — use computeIfAbsent() or getOrDefault()
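
A few of these patterns are shown in corrected form below. The helper names are hypothetical and the sketch covers only the array comparison, ignored return value, locale, and double-lookup cases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch; helper names are hypothetical. */
final class BugPatternSketch {
  private BugPatternSketch() {}

  /** left.equals(right) would compare references; Arrays.equals compares contents. */
  static boolean sameBytes(final byte[] left, final byte[] right) {
    return Arrays.equals(left, right);
  }

  /** Strings are immutable, so the result of replace() has to be used. */
  static String normalizeName(final String rawName) {
    // Locale.ROOT keeps the result independent of the JVM's default locale.
    return rawName.replace(' ', '_').toLowerCase(Locale.ROOT);
  }

  /** One lookup via computeIfAbsent instead of containsKey followed by get/put. */
  static void addTag(
      final Map<String, List<String>> tagsByEntity, final String entity, final String tag) {
    tagsByEntity.computeIfAbsent(entity, key -> new ArrayList<>()).add(tag);
  }

  public static void main(final String[] args) {
    final Map<String, List<String>> tagsByEntity = new HashMap<>();
    addTag(tagsByEntity, "orders", "PII");
    addTag(tagsByEntity, "orders", "Tier1");
    System.out.println(normalizeName("Data Team") + " " + tagsByEntity.get("orders").size());
  }
}
```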

Testing

Generate production-ready code, not tutorial code
Create integration tests in openmetadata-integration-tests for new API endpoints
Never use Thread.sleep() in tests — use condition-based waiting or Awaitility
Bug fixes must include a test that fails without the fix
90% line coverage target on changed classes
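
As one way to read the Thread.sleep() rule, the sketch below waits on a completion signal with a bounded timeout instead of pausing for a fixed interval. It uses only the JDK (CountDownLatch) to stay self-contained; completesWithin is a hypothetical helper, and a library such as Awaitility would normally replace it in real tests.

```java
import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch; completesWithin is a hypothetical helper. */
final class ConditionWaitSketch {
  private ConditionWaitSketch() {}

  /** Runs the work on a background thread and waits for its completion signal. */
  static boolean completesWithin(final Runnable backgroundWork, final Duration timeout)
      throws InterruptedException {
    final CountDownLatch finished = new CountDownLatch(1);
    final Thread worker =
        new Thread(
            () -> {
              backgroundWork.run();
              finished.countDown();
            });
    worker.setDaemon(true);
    worker.start();
    // Returns as soon as the latch opens, or false once the timeout elapses.
    return finished.await(timeout.toMillis(), TimeUnit.MILLISECONDS);
  }

  public static void main(final String[] args) throws InterruptedException {
    System.out.println(completesWithin(() -> {}, Duration.ofSeconds(5)));
  }
}
```

The test takes only as long as the work needs, and the timeout is an upper bound rather than a delay that every run pays.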

Structure

Do not use Fully Qualified Names in code (e.g., org.openmetadata.schema.type.Status) — import the class instead
Do not import wildcard packages — import exactly the required classes
No commented-out code — version control maintains history
No TODOs without a ticket reference
One statement per line — no if (x) return y; on one line

TypeScript/Frontend Code Requirements

Always run the UI checkstyle sequence before you finish any task that touched .ts/.tsx/.js/.jsx/.json under openmetadata-ui/src/main/resources/ui/src/, .../playwright/, or openmetadata-ui-core-components/src/main/resources/ui/src/. CI's UI Checkstyle / lint-src|lint-playwright|lint-core-components jobs fail the PR otherwise. The order matters — run organize-imports-cli, then eslint --fix, then prettier --write; reversing organize-imports and prettier leaves a dirty diff (organize-imports uses 4-space indentation, prettier uses 2 + trailing commas). When asked to "fix UI checkstyle" / "run prettier" / "fix UI lint", invoke the ui-checkstyle skill (see .claude/skills/ui-checkstyle/) rather than hand-editing formatting.

NEVER use any type in TypeScript code - always use proper types
Use unknown when the type is truly unknown and add type guards
Import types from existing type definitions (e.g., RJSFSchema from @rjsf/utils)
Add // eslint-disable-next-line comments only when absolutely necessary
Import Organization — use yarn organize-imports:cli to auto-sort. Order:
1. External libraries (React, etc.)
2. Internal absolute imports from generated/, constants/, hooks/, etc.
3. Relative imports for utilities and components
4. Asset imports (SVGs, styles)
5. Type imports grouped separately when needed

CI Checkstyle Rules (enforced on every PR)

These checks run automatically in CI. Code that violates them will not merge.

No console.log/warn/error — no-console rule is enforced. Use the logger or remove.
Use === not == — eqeqeq (smart mode, except for null checks)
Max 200 characters per line — break long lines
Self-closing components — <Div /> not <Div></Div>
Sort JSX props alphabetically — callbacks last
Space after // in comments — // comment not //comment
Blank lines before function, class, export, return statements
Use it() consistently in tests — don't mix test() and it()
Blank lines around describe, it, beforeEach in test files
JSON keys sorted alphabetically in locale files (src/locale/**/*.json)
Apache 2.0 license header on every new source file — run yarn license-header-fix
i18n keys synced — after adding keys to en-us.json, run yarn i18n to sync all 17 locales
Prettier formatting — 2-space indent, single quotes, strict HTML whitespace

Playwright Test Rules (lint-playwright)

No waitForLoadState('networkidle') — flaky, use web-first assertions
No page.pause() — remove before committing
No .only on tests — blocks all other tests in CI
Prefer expect(locator).toBeVisible() over manual waitForSelector checks
Don't use { force: true } — fix the locator instead
Use locators, not element handles

Python Code Requirements

Use pytest, not unittest - write tests using pytest style with plain assert statements
Use pytest fixtures for test setup instead of setUp/tearDown methods
Use unittest.mock for mocking (MagicMock, patch) - this is compatible with pytest
Test classes should not inherit from TestCase - use plain classes prefixed with Test
Use assert x == y instead of self.assertEqual(x, y)
Use assert x is None instead of self.assertIsNone(x)
Use assert "text" in string instead of self.assertIn("text", string)
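The rules above can be sketched together in one small test module. This is a minimal illustration, not code from the repository: fetch_table_name, make_client, and the get_by_name client method are hypothetical names invented for the example, and it assumes pytest is installed.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock

import pytest


def fetch_table_name(client, fqn):
    """Hypothetical function under test: return the table name, or None if absent."""
    entity = client.get_by_name(fqn)
    return entity.name if entity is not None else None


def make_client():
    """Mock of an external metadata client (hypothetical API: get_by_name)."""
    client = MagicMock()
    client.get_by_name.return_value = SimpleNamespace(name="orders")
    return client


@pytest.fixture
def client():
    # Replaces setUp/tearDown: pytest injects a fresh mock into each test.
    return make_client()


class TestFetchTableName:  # plain class, no TestCase base
    def test_returns_name(self, client):
        # Instead of self.assertEqual(...)
        assert fetch_table_name(client, "svc.db.schema.orders") == "orders"

    def test_returns_none_when_missing(self, client):
        client.get_by_name.return_value = None
        # Instead of self.assertIsNone(...)
        assert fetch_table_name(client, "svc.db.schema.missing") is None

    def test_error_names_the_entity(self, client):
        client.get_by_name.side_effect = KeyError("svc.db.schema.gone")
        with pytest.raises(KeyError) as excinfo:
            fetch_table_name(client, "svc.db.schema.gone")
        # Instead of self.assertIn(...)
        assert "gone" in str(excinfo.value)
```

Keeping the mock construction in a plain factory (make_client) and wrapping it in the fixture is one possible layout; it lets the same setup be reused outside pytest's injection if needed.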

Python Ingestion Connector Guidelines

Keep connector-specific logic in connector-specific files, not in generic/shared files like builders.py
Example: Redshift IAM auth should be in ingestion/src/metadata/ingestion/source/database/redshift/connection.py, not in ingestion/src/metadata/ingestion/connections/builders.py
This keeps the codebase modular and prevents generic utilities from becoming cluttered with connector-specific edge cases
Use model_str() for Pydantic RootModel to string conversion — OpenMetadata schema types like ColumnName, EntityName, FullyQualifiedEntityName, and UUID are Pydantic RootModel[str] subclasses where str() returns "root='value'" instead of the raw value. Always use model_str() from metadata.ingestion.ometa.utils instead of manual hasattr(x, "root") / str(x.root) checks.
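The str() pitfall described above can be shown with a stdlib-only sketch. EntityNameStandIn and plain_str are hypothetical stand-ins written for this example: the first only mimics the "root='value'" string form of the wrapper types, and the second only stands in for model_str so the sketch runs without the ingestion package. Neither reflects the real implementation, which is not shown here.

```python
class EntityNameStandIn:
    """Hypothetical stand-in that mimics how a RootModel[str] wrapper prints."""

    def __init__(self, root: str) -> None:
        self.root = root

    def __str__(self) -> str:
        # Mirrors the "root='value'" form described above.
        return f"root={self.root!r}"


def plain_str(value) -> str:
    """Hypothetical helper: unwrap a root-carrying wrapper, else fall back to str()."""
    root = getattr(value, "root", None)
    return str(root) if root is not None else str(value)


name = EntityNameStandIn("orders")

# Interpolating the wrapper directly leaks the wrapper form into the string.
wrapped = f"service.db.{name}"

# Going through one shared helper yields the raw value.
unwrapped = f"service.db.{plain_str(name)}"
```

In connector code the call site would use model_str from metadata.ingestion.ometa.utils instead of plain_str. The point of the sketch is only that the unwrapping lives in one shared helper and is not repeated at each call site.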

Testing Philosophy

Test real behavior, not mock wiring - if a test requires mocking 3+ classes just to verify a method call, it's testing the wrong thing
Prefer integration tests over heavily-mocked unit tests. This project has full integration test infrastructure (OpenMetadataApplicationTest, Docker containers, real OpenSearch). Use it.
Mocks are for boundaries, not internals - mock external services (HTTP clients, third-party APIs), not your own classes. If you're mocking static methods left and right to test internal plumbing, write an integration test instead.
A test that mocks everything proves nothing - it only verifies that your mocks are wired correctly, not that the system works
Ask "what breaks if this test passes but the code is wrong?" - if the answer is "nothing, because everything real is mocked out", delete the test and write a better one
Test the outcome, not the implementation - assert on observable results (API responses, database state, stats values) rather than verifying internal method calls with verify()
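As a rough illustration of asserting on outcomes, the sketch below checks the resulting stats values and never verifies which internal methods were called. InMemoryStats and index_records are hypothetical names made up for the example; in this project the equivalent test would more likely run against the real integration test infrastructure.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStats:
    """Hypothetical stats sink; a real test would read these values back from the service."""

    success: int = 0
    failed: int = 0
    errors: list = field(default_factory=list)


def index_records(records, stats):
    """Hypothetical unit under test: count records with an id, record the rest as failures."""
    for record in records:
        if record.get("id"):
            stats.success += 1
        else:
            stats.failed += 1
            stats.errors.append(f"missing id: {record!r}")
    return stats


def test_index_records_reports_failures():
    records = [{"id": "a"}, {"name": "no-id"}, {"id": "b"}]
    stats = index_records(records, InMemoryStats())

    # Outcome-based assertions: these fail if the counting logic is wrong,
    # regardless of how index_records is structured internally.
    assert stats.success == 2
    assert stats.failed == 1
    assert "missing id" in stats.errors[0]
```

A version of this test that replaced InMemoryStats with a mock and only checked that some method was called would still pass if the counts were wrong, which is the failure mode the list above warns about.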

Response Format

Provide clean code blocks without unnecessary explanations
Assume readers are experienced developers
Focus on functionality over education
