Elgato_dark/OpenMetadata: OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

mirror of https://github.com/open-metadata/OpenMetadata synced 2026-05-24 09:39:11 +00:00

sonika-shah e91c90c144 fix: validate custom property name charset (#27808)		2026-05-07 15:35:43 +05:30

* fix: validate custom property name charset

  Tighten custom property name validation to block characters that break downstream parsers, with verified empirical reproduction:
  - `"` causes HTTP 500 on PUT /metadata/types/{id}
  - `:` breaks CSV import: exporter writes `key:value;key:value`, importer splits at first colon, treats prefix as the field name
  - `^` breaks OpenSearch query when the name is in searchSettings.searchFields; Lucene reads `^` as the boost separator in `field^boost`
  - `$` breaks CSV import via java.util.regex.Matcher.replaceAll which interprets `$<letter>` as a backreference

  Adds a `customPropertyName` definition in basic.json and switches customProperty.json to reference it. Adds a defensive regex check in TypeRepository.validateProperty so the API returns 400 with a clear error message even if schema validation is bypassed.

  Tests cover allowed-charset acceptance, the four blocked characters, leading-character validation, max-length enforcement, and unbalanced brackets.

* Update generated TypeScript types

* test: add schema-vs-Java consistency test for custom property name

  Guards against drift between basic.json#customPropertyName and the TypeRepository regex/length constants. If either side is updated without the other, CI fails with a message pointing to both files. The Java validator is kept (better error message + covers internal callers that bypass the HTTP layer); the consistency test guarantees the two definitions cannot drift.

* fix: extend custom property name charset after gap-coverage matrix

  Re-ran the matrix on previously-untested chars (+ ? * ~ ` \) across all 17 property types × create/patch/CSV/search:
  - + ? * ~ ` all pass cleanly on every operation × every property type; add to allow list
  - \ fails CSV roundtrip for entityReference and entityReferenceList types (escape inconsistency in CSV serialization); add to block list

  Updates the regex, schema description, Java validator error message, and adds the new chars to the allow/block integration tests. Consistency unit tests in TypeRepositoryTest continue to pass.

  Final allow set: alphanumeric _ - . / & % # @ ! , ; = | ' + ? * ~ ` space ( ) < > [ ] { }
  Final block set: " : ^ $ \

* Update generated TypeScript types

* updated the custom property name validation

* added name suffix in custom property name

* lint fixes

* include backslash in invalid char

  Co-authored-by: Copilot <copilot@github.com>

* fixed the playwright issue

  Co-authored-by: Copilot <copilot@github.com>

* lint fix

* fix check style

* Drop redundant Java validator for custom property name; tighten IT assertions

  Schema is the single source of truth: jsonschema2pojo emits @Pattern + @Size on CustomProperty.name from basic.json#/definitions/customPropertyName, and @Valid on TypeResource.addOrUpdateProperty enforces them at the HTTP boundary. The hand-written Pattern constant, validateCustomPropertyName, and the schema-vs-Java sync test were duplicating that rule and could never reach the HTTP user (Bean Validation always fires first via @Valid).

  Tighten the new TypeResourceIT cases from assertThrows(Exception.class) to assertThrows(InvalidRequestException.class) so a regression to a different exception type or status code fails loudly.

* restrict few more special characters from Cp name

* minor fix

* Disallow & < > in custom property names; align IT cases

  Schema-side counterpart to the UI changes in the previous two commits: basic.json#/definitions/customPropertyName now blocks &, <, > alongside the existing " : ^ $ \\. The DOMPurify pass on the UI sanitizes &, <, > into HTML entities, which produced inconsistent persisted values; rejecting them at the schema layer prevents that drift across all write paths.

  IT updates:
  - Drop &, <, > from the allowed-charset cases (and the "withMatched(pair)And<more>" composite)
  - Add &, <, > to the disallowed-charset cases
  - Drop "<" leading-character case (now covered as a disallowed character)
  - Drop "<" / ">" unbalanced-bracket cases

* Update generated TypeScript types

* Close PATCH bypass for custom property name validation on Type

  Bean Validation runs for the dedicated PUT /types/{id} (addOrUpdateProperty) because the resource declares @Valid CustomProperty, and the createOrUpdate path can't carry customProperties at all (CreateType schema doesn't include the field). PATCH /types/{id} accepts an opaque JsonPatch, so @Valid never reaches into the resulting customProperties[]; a JSON Patch like [{"op":"add","path":"/customProperties/-","value":{"name":"bad:colon",...}}] persisted bad-named properties (verified live: HTTP 200 before this fix).

  Run Hibernate Validator programmatically inside TypeRepository.prepare() so every write path enforces the schema-derived @Pattern / @Size / @NotNull on each CustomProperty. The rule still lives only in basic.json, picked up via the generated @Pattern annotation, executed via ValidatorUtil.validate.

  Tests in TypeResourceIT:
  - test_patchCannotAddCustomPropertyWithDisallowedName: seeds a valid property to ensure /customProperties exists, then PATCHes appending a name with ':', asserts InvalidRequestException and verifies the bad name is not persisted
  - test_patchCanAddCustomPropertyWithValidName: guards against the fix rejecting valid PATCH-driven additions

* Block * in custom property names: breaks ES field-path lookup

  When the property name appears in extension.<propertyName>^boost entries of searchSettings.searchFields, OpenSearch treats * as a field-path wildcard. The literal * field never matches its own wildcard pattern, so the field gets silently skipped from the query and Explore search returns no hit for the value.

  Bisected against the running server: of 12 candidate Lucene-special chars, only * actually breaks the mainline UI search flow. ? ~ ( ) { } [ ] / ! and space all returned hits via the searchFields path because OS looks up the field literally and only treats * as a wildcard at that layer.

  Updates the regex + description in basic.json/customProperty.json, the UI regex in regex.constants.ts, the validation message across 19 locales, the generated TS docstrings, the Playwright invalid-name fixtures and spec, and the IT TypeResourceIT case (withasterisk moves from allowed to disallowed).

* Validate only newly-added custom properties; isolate PATCH IT to fresh types

  prepare() previously validated the entire customProperties[] on every Type write. An upgraded instance with a legacy property whose name contained a now-banned char would then reject any subsequent PUT/PATCH on that type, even when the write only adds a different valid property. Move the name validation into TypeUpdater.updateCustomProperties() and scope it to the `added` list computed by recordListChange against the original entity. New properties are still validated; pre-existing names are left alone.

  Replace the IT PATCH cases' shared `topic` Type with a fresh, namespaced entity-category Type per test (createEntityTypeForTest). The shared `topic` was mutated concurrently by other tests in the class; combined with PATCH's lack of per-type locking, that produced lost-update races and flaky asserts. The fresh per-test type has customProperties: [] from creation, so the patch sets the array directly without a seed property.

* chore: prettier formatting on the new asterisk-rejection test

* Potential fix for pull request finding

  Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* docs: add + ? ~ ` to JSDoc allow-list to match the regex

* fix(it): request customProperties field on read-back in PATCH IT

  Type.customProperties is a lazy field; TypeRepository.setFields only populates it when the request URL includes ?fields=customProperties. The default getTypeById helper omits the param, so the read-back always saw customProperties == null. That made test_patchCanAdd... fail (the just-persisted property wasn't visible) and made test_patchCannotAdd... pass for the wrong reason (would have stayed green even if the bad name had slipped through validation).

  Add a fields-aware getTypeById overload and use it in both PATCH cases. Empirically verified against the live server: good name returns 200 + appears in customProperties, bad name returns 400 + does not.

* minor fix

* playwright test fix

* removed unecessary test

* blocked ~ and / from custom property name

* lint-fix

* Block / and ~ in custom property names (JSON Pointer reservations)

  Forward slash and tilde are reserved by JSON Pointer (RFC 6901): / is the path separator and ~ is the escape lead-in (~0 = ~, ~1 = /). Allowing them in a property name shifts the burden onto every caller that builds a JSON Patch by string interpolation; a raw `/extension/${propertyName}` either splits into the wrong number of segments or contains an invalid escape sequence, and the server applies the patch to the wrong key (or 400s outright).

  This surfaced as a reproducible failure in the table-cp Playwright suite: the preceding test ended with `path: \`/extension/${propertyName}\`` where propertyName ended in `/`. The server addressed extension[name-without-/][""] instead of extension[name-with-/], returned 400, and TableClass.patch overwrote entityResponseData with the error body, stripping id and FQN. The next test fell into the search-based navigation path with an empty search term and timed out at 180s.

  Tighten the schema regex in openmetadata-spec/.../basic.json#customPropertyName to drop / and ~ from the allowed set; update the human-readable description in basic.json and customProperty.json to call out the RFC 6901 reservation. Move the with/slash and with~tilde cases from the allowed-charset IT to the disallowed-charset IT in TypeResourceIT.

* Update generated TypeScript types

* Use fresh per-test Type in custom-property name validation IT

  The five charset/length/lead-char tests added in this PR previously mutated the shared built-in TABLE_ENTITY_TYPE under @Execution(CONCURRENT). The PUT path acquires TYPE_PROPERTY_LOCKS so concurrent writes serialize, but relying on that lock for test isolation is fragile; the PATCH-driven IT in the same class already uses a per-test fresh Type via createEntityTypeForTest(client, ns, ...) for exactly this reason (see `1864b0a6ac`).

  Switch the five PUT tests to the same pattern so no test mutates a shared Type, eliminating cross-test coupling regardless of whether the server-side lock is in place.

  Tests affected:
  - test_customPropertyNameAllowedCharacters_succeeds
  - test_customPropertyNameDisallowedCharacters_fails
  - test_customPropertyNameMustStartWithAlphanumeric_fails
  - test_customPropertyNameTooLong_fails
  - test_customPropertyNameUnbalancedBrackets_succeeds

* Align UI artifacts with the tightened custom-property-name regex

  Three small follow-ups flagged by reviewers:
  - regex.constants.ts: JSDoc above CUSTOM_PROPERTY_NAME_REGEX still listed / and ~ as allowed even though the pattern below was tightened to drop them. Update the comment to match the actual regex and call out the RFC 6901 reason so future edits don't reintroduce them.
  - CustomProperties.spec.ts: the "should accept a valid name with allowed special characters" test fed a hardcoded string containing ~ and /, which the new regex rejects; the assertion would fail. Drop those two characters so the input stays in the allowed set.
  - zh-cn.json: the Simplified Chinese translation of custom-property-name-validation was double-escaped (\\\" and \\\\), which would render to users as literal \" and \\ rather than " and \. Match the escaping pattern used by the other 18 locales.

* addressed gitar comment

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Rohit0301 <rj03012002@gmail.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Rohit Jain <60229265+Rohit0301@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
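
The JSON Pointer problem described in the commit message above can be reproduced in a few lines. The following is a minimal Python sketch, not code from the OpenMetadata repository: the function names and the sample property name `owner/` are hypothetical, and `BLOCKED_CHARS` is only the set of characters the commit text lists as blocked (the actual rule is a regex in basic.json that is not reproduced here). The escaping itself follows RFC 6901: `~` becomes `~0` and `/` becomes `~1`.

```python
# Characters the commit message lists as blocked in custom property names.
# Assumption: this set is read off the commit text; the real rule is a regex
# defined in basic.json and may differ in detail.
BLOCKED_CHARS = set('":^$\\&<>*/~')


def find_blocked_chars(name: str) -> list[str]:
    """Return the blocked characters present in name, in order of first appearance."""
    seen = []
    for ch in name:
        if ch in BLOCKED_CHARS and ch not in seen:
            seen.append(ch)
    return seen


def escape_json_pointer_token(token: str) -> str:
    """Escape one reference token per RFC 6901: '~' -> '~0', then '/' -> '~1'."""
    return token.replace("~", "~0").replace("/", "~1")


def split_json_pointer(pointer: str) -> list[str]:
    """Split a JSON Pointer into unescaped reference tokens (RFC 6901)."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    # Unescape '~1' before '~0' so that '~01' decodes to '~1', not '/'.
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


name = "owner/"  # hypothetical property name ending in '/'
naive = f"/extension/{name}"  # raw string interpolation
safe = f"/extension/{escape_json_pointer_token(name)}"

print(split_json_pointer(naive))        # ['extension', 'owner', '']
print(split_json_pointer(safe))         # ['extension', 'owner/']
print(find_blocked_chars("bad:colon"))  # [':']
```

The naive path resolves to three segments, with an empty final key, which matches the `extension[name-without-/][""]` behavior reported in the commit. Rejecting `/` and `~` at the schema layer removes the need for every caller to remember the escaping step.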
.claude	feat(ingestion): restore connector-audit skill (#27688)	2026-04-24 10:23:02 +02:00
.devcontainer	MINOR - DevContainer Setup for contribution (#26623)	2026-03-20 08:27:30 +01:00
.github	docs(github): require issue link, design, tests, UI recording in PR template (#27891)	2026-05-07 08:05:56 +02:00
bin	Set Indexing related executor threads priority to LOW (#27153)	2026-04-15 11:28:47 -07:00
bootstrap	Fix/mcp oauth databricks (#27922)	2026-05-06 07:30:42 +02:00
common	ISSUE #20212 - TestCase DP Propagation + Search Index Propagation Refactor & Issue (#26901)	2026-04-03 17:32:53 +00:00
conf	Containers: batch container data-model column tag retrieval to avoid subtree fan-out (#27836)	2026-04-30 20:55:55 -07:00
docker	Perf/redis cache metrics and indexes (#27499)	2026-04-23 12:18:53 +02:00
docs	feat: Add auto-classification support for storage service containers (#26495)	2026-04-24 06:29:16 -07:00
examples/python-sdk/data-quality	Create documentation resources for Data Quality as Code (closes #23800) (#24169)	2025-11-11 10:25:42 +00:00
ingestion	fix: API response for TableColumnCountToBeBetween (#27900)	2026-05-07 10:15:39 +05:30
openmetadata-airflow-apis	chore(ingestion): drop pylint, expand ruff (#27774)	2026-04-28 07:21:59 +02:00
openmetadata-clients	fix(security): upgrade Java dependencies to resolve CRITICAL and HIGH CVEs (#27940)	2026-05-07 09:19:10 +00:00
openmetadata-dist	Deprecate OpenMetadata Java client in favor of new Java SDK (#26388)	2026-03-10 21:30:39 -07:00
openmetadata-integration-tests	fix: validate custom property name charset (#27808)	2026-05-07 15:35:43 +05:30
openmetadata-k8s-operator	fix(security): upgrade Java dependencies to resolve CRITICAL and HIGH CVEs (#27940)	2026-05-07 09:19:10 +00:00
openmetadata-mcp	Fix/mcp oauth databricks (#27922)	2026-05-06 07:30:42 +02:00
openmetadata-sdk	Containers: batch container data-model column tag retrieval to avoid subtree fan-out (#27836)	2026-04-30 20:55:55 -07:00
openmetadata-service	fix: validate custom property name charset (#27808)	2026-05-07 15:35:43 +05:30
openmetadata-shaded-deps	fix(security): upgrade Java dependencies to resolve CRITICAL and HIGH CVEs (#27940)	2026-05-07 09:19:10 +00:00
openmetadata-spec	fix: validate custom property name charset (#27808)	2026-05-07 15:35:43 +05:30
openmetadata-ui	fix: validate custom property name charset (#27808)	2026-05-07 15:35:43 +05:30
openmetadata-ui-core-components	fix(grid): export GridItem component for external usage [skip-ci] (#27904)	2026-05-05 12:24:22 +00:00
openspec	Task redesign (#25894)	2026-04-23 15:52:30 +02:00
scripts	Reindex robustness: selective fields, cache fail-fast, stop actually stops (#27876)	2026-05-04 13:22:15 -07:00
skills	docs(github): require issue link, design, tests, UI recording in PR template (#27891)	2026-05-07 08:05:56 +02:00
.dockerignore	RDF, cleanup relations and remove unnecessary bindings, add distributed mode for RDF reindex (#26902)	2026-04-14 13:24:41 -07:00
.git-blame-ignore-revs	Minor: update git-blmae-ignore-revs, and uncomment ClassificationResourceTest tests code (#14431)	2023-12-18 19:16:29 -08:00
.gitignore	chore(ingestion): enable basedpyright across the codebase via baseline (#27755)	2026-04-27 17:15:44 +02:00
.nojekyll	shahsank3t published a site update	2021-08-04 06:23:29 +00:00
.pre-commit-config.yaml	chore(ingestion): migrate to ruff for format + isort + unused-import (#27739)	2026-04-27 10:05:28 +02:00
.snyk	Ignore _openmetadata_testutils from snyk (#21168)	2025-05-13 18:01:05 +05:30
adr-incident-manager-governance-workflows.md	Task redesign (#25894)	2026-04-23 15:52:30 +02:00
AGENTS.md	chore(ingestion): drop pylint, expand ruff (#27774)	2026-04-28 07:21:59 +02:00
APPLICATION.md	Rename app 'preview' property to 'enabled' (#26170)	2026-03-05 08:29:54 +01:00
CLAUDE.md	chore(ingestion): drop pylint, expand ruff (#27774)	2026-04-28 07:21:59 +02:00
CODE_OF_CONDUCT.md	Fix #412 - Add code of conduct for OpenMetadata community	2021-09-06 18:57:17 -07:00
CONTRIBUTING.md	addded more detail on issue creation in contributors page (#16583)	2024-06-09 14:02:36 -07:00
DEVELOPER.md	chore(ingestion): drop pylint, expand ruff (#27774)	2026-04-28 07:21:59 +02:00
generate_ts.sh	Feature: Generate TS From JSON (#19823)	2025-02-25 18:18:02 +05:30
INCIDENT_RESPONSE.md	Add threat model and incident response (#23603)	2025-09-28 13:17:23 -07:00
LICENSE	OpenMetadata snapshot release 0.3	2021-08-01 14:27:44 -07:00
Makefile	Chore(UI): consolidated UI checkstyle fix commands and modify workflow comment (#27402)	2026-04-16 17:18:22 +05:30
NOTICE	OpenMetadata snapshot release 0.3	2021-08-01 14:27:44 -07:00
package.json	fix: Resolve frontend security vulnerabilities in lodash and lodash-es (#27105)	2026-04-07 07:55:25 +00:00
pom.xml	fix(security): upgrade Java dependencies to resolve CRITICAL and HIGH CVEs (#27940)	2026-05-07 09:19:10 +00:00
README.md	Update README.md for column-level consistency (#24670)	2025-12-03 07:59:18 -08:00
SECURITY.md	Update vulnerability reporting instructions in SECURITY.md (#25651)	2026-01-30 14:03:09 -08:00
THREAT_MODEL.md	Add threat model and incident response (#23603)	2025-09-28 13:17:23 -07:00
yarn.lock	fix: Resolve frontend security vulnerabilities in lodash and lodash-es (#27105)	2026-04-07 07:55:25 +00:00

README.md

Empower your Data Journey with OpenMetadata

What is OpenMetadata?

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column-level lineage, and seamless team collaboration. It is one of the fastest-growing open-source projects with a vibrant community and adoption by a diverse set of companies in a variety of industry verticals. Based on Open Metadata Standards and APIs, supporting connectors to a wide range of data services, OpenMetadata enables end-to-end metadata management, giving you the freedom to unlock the value of your data assets.

Contents:

Features
Try our Sandbox
Install & Run
Roadmap
Documentation and Support
Contributors

OpenMetadata Consists of Four Main Components:

Metadata Schemas: These are the core definitions and vocabulary for metadata based on common abstractions and types. They also allow for custom extensions and properties to suit different use cases and domains.
Metadata Store: This is the central repository for storing and managing the metadata graph, which connects data assets, users, and tool-generated metadata in a unified way.
Metadata APIs: These are the interfaces for producing and consuming metadata, built on top of the metadata schemas. They enable seamless integration of user interfaces and tools, systems, and services with the metadata store.
Ingestion Framework: This is a pluggable framework for ingesting metadata from various sources and tools into the metadata store. It supports 84+ connectors for data warehouses, databases, dashboard services, messaging services, pipeline services, and more.

Key Features of OpenMetadata

Data Discovery: Find and explore all your data assets in a single place using various strategies, such as keyword search, data associations, and advanced queries. You can search across tables, topics, dashboards, pipelines, and services.

Data Collaboration: Communicate, converse, and cooperate with other users and teams on data assets. You can get event notifications, send alerts, add announcements, create tasks, and use conversation threads.

Data Quality and Profiler: Measure and monitor data quality with no-code tools to build trust in your data. You can define and run data quality tests, group them into test suites, and view the results in an interactive dashboard. With powerful collaboration, make data quality a shared responsibility in your organization.

Data Governance: Enforce data policies and standards across your organization. You can define data domains and data products, assign owners and stakeholders, and classify data assets using tags and terms. Use powerful automation features to auto-classify your data.

Data Insights and KPIs: Use reports and platform analytics to understand how your organization's data is doing. Data Insights provides a single-pane view of the key metrics that best reflect the state of your data. Define Key Performance Indicators (KPIs) and set goals within OpenMetadata to work towards better documentation, ownership, and tiering. You can set alerts against the KPIs and receive them on a specified schedule.

Data Lineage: Track and visualize the origin and transformation of your data assets end-to-end. You can view column-level lineage, filter queries, and edit lineage manually using a no-code editor.

Data Documentation: Document your data assets and metadata entities using rich text, images, and links. You can also add comments and annotations and generate data dictionaries and data catalogs.

Data Observability: Monitor the health and performance of your data assets and pipelines. You can view metrics such as data freshness, data volume, data quality, and data latency. You can also set up alerts and notifications for any anomalies or failures.

Data Security: Secure your data and metadata using various authentication and authorization mechanisms. You can integrate with different identity providers for single sign-on and define roles and policies for access control.

Webhooks: Integrate with external applications and services using webhooks. You can register URLs to receive metadata event notifications and integrate with Slack, Microsoft Teams, and Google Chat.

Connectors: Ingest metadata from various sources and tools using connectors. OpenMetadata supports 84+ connectors for data warehouses, databases, dashboard services, messaging services, pipeline services, and more.

Try our Sandbox

Take a look and play with sample data at http://sandbox.open-metadata.org

Install and Run OpenMetadata

Get up and running in a few minutes. See the OpenMetadata documentation for installation instructions.

Documentation and Support

We're here to help and make OpenMetadata even better! Check out the OpenMetadata documentation for a complete description of OpenMetadata's features. Join our Slack Community to get in touch with us if you want to chat, need help, or want to discuss new feature requirements.

Contributors

We ❤️ all contributions, big and small! Check out our CONTRIBUTING guide to get started, and let us know how we can help.

Don't want to miss anything? Give the project a ⭐ 🚀

A HUGE THANK YOU to all our supporters!

Stargazers

License

OpenMetadata is released under the Apache License, Version 2.0.