mirror of
https://github.com/open-metadata/OpenMetadata
synced 2026-05-24 09:39:11 +00:00
* chore(ingestion): drop pylint, expand ruff to Stage 2c
Replace pylint with a coherent ruff-only stack (Stage 2c of the modernize
roadmap). Pylint is dropped from dev deps and CI workflows; ruff selected
ruleset expanded to ~22 families covering style, bug catchers, hygiene,
and the pylint port (PLE/PLC/PLW/PLR with the noisy "too-many-X"
complexity caps + magic-value disabled).
What's selected (with rationale in pyproject.toml):
E, W, F, I, N — style + correctness baseline + naming
UP — pyupgrade (py>=3.10 modernizations)
B, C4, C90, RET, SIM, TRY — bug catchers
PIE, ICN, T20, TC, TID, PTH, PERF — hygiene
PLE, PLC, PLW, PLR — pylint port (PLR complexity caps ignored)
RUF — ruff-native (incl. RUF100 unused-noqa)
What's removed:
- .pylintrc (root) — duplicate of the ingestion pylint config
- [tool.pylint.*] block in ingestion/pyproject.toml (~140 lines)
- ingestion/plugins/{print_checker,import_checker}.py + tests + README
(replaced by built-in T20 + TID251 banned-api respectively)
- pylint dep from ingestion/setup.py and openmetadata-airflow-apis/pyproject.toml
- `make lint` Makefile target + the pylint invocation in py_format_check
- dead pylint TODO comment + ignored test entry in noxfile.py
Cwd-stable config: ruff is invoked both from the repo root (pre-commit,
CI) and from ingestion/ (`make py_format_check`). The `src`,
`extend-exclude`, and per-file-ignores entries are listed twice — once
relative to ingestion/ and once with the `ingestion/` prefix — so
first-party isort detection and exclusions match in both invocations.
Grandfathering: ran `ruff check --add-noqa` once + format-stable
iteration. ~12,130 noqa directives across ~1,400 files. Cleanup is
deferred to follow-up PRs that drop noqas one rule at a time.
Documentation sweep: replaced `make lint` references in CLAUDE.md,
AGENTS.md, DEVELOPER.md, copilot-instructions, and 6 SKILL files with
the apply+verify shape `make py_format && make py_format_check`.
`make py_format` is NOT a strict superset of pylint — it only applies
auto-fixable violations; `make py_format_check` catches the rest.
Basedpyright baseline regenerated: ruff format reflowed multi-line
signatures in ~70 files, shifting type-error column positions. The
basedpyright baseline matches by (file path, error code, range), so
column shifts caused 19 entries to mis-align. Net diff is small
(154 lines in/out of the 13MB baseline.json) — purely positional.
Verified locally:
- make py_format_check → All checks passed
- nox --no-venv -s static-checks → 0 errors, 0 warnings, 0 notes
* chore(ingestion): finish ruff swap — nox lint session + skill docs
Three remaining stale-tooling references after Stage 2c:
- `ingestion/noxfile.py` `lint` session was still calling `black --check`,
`isort --check-only`, `pycln --diff`. Those tools aren't installed
anywhere (we dropped them from dev deps). Replace with the ruff
equivalents that mirror `make py_format_check`.
- `skills/standards/code_style.md`: stack listed as `black + isort +
pycln`; line length claimed 88 (black default). Both wrong: stack is
ruff, line length is 120.
- `skills/connector-building/SKILL.md`: `make py_format` comment said
`# black + isort + pycln`. Same swap.
* chore(ingestion): keep main's baseline + globally ignore TRY400
Per gitar-bot's review on PR #27774:
1. Main's PR #27728 promoted ~60 `logger.warning()` → `logger.error()`
inside `except` blocks. Those changes landed on main with their own
baseline updates. Our PR doesn't promote anything — the merge from
origin/main brought those `error` calls along with their baseline
entries.
The bot interpreted the `# noqa: TRY400` we added next to those lines
as us silencing the rule case-by-case. Cleaner: globally ignore
TRY400 in pyproject.toml, with a comment explaining why the codebase's
`logger.error(...)` + separate `logger.debug(traceback.format_exc())`
pattern is intentional. Strip ~430 per-line `# noqa: TRY400` markers
from source.
2. Document that `S101` in `per-file-ignores` is a forward-looking
entry — flake8-bandit (`S`) is not yet selected, so the rule is
no-op today; the entry stays so when `S` lands later, tests don't
immediately error.
Reverts the platform pin and Linux Docker–generated baseline. Keep
main's baseline intact and let CI surface the exact column-shifted
entries; the team will decide whether to fix in-place (revert format
on affected files) or add per-line `# pyright: ignore` markers.
* chore(ingestion): regen baseline for new connector type debt
Main's baseline was stale relative to recently-added connectors
(McpConnection, CustomDriveConnection) that lack common attributes
like `hostPort`, `database`, `catalog` etc. — all sites that access
those attributes via the union-typed `serviceConnection.root.config`
fire `reportAttributeAccessIssue` errors that aren't baselined.
71 errors + 58 warnings absorbed. Local macOS regen; pushing to see
CI's drift count. Per the basedpyright-baseline-and-ci PR experience,
macOS↔Linux column drift on this size of regen has historically been
1-7 residuals.
582 lines
24 KiB
Markdown
582 lines
24 KiB
Markdown
# DEVELOPER.md — AI-Assisted Development Guide for OpenMetadata
|
|
|
|
This guide helps developers (and AI agents like Claude Code, Codex, Copilot) write correct, production-quality code in the OpenMetadata codebase. It covers the preferred workflow for each language, architecture patterns you must understand, and how to use the available skills.
|
|
|
|
For environment setup, build commands, and coding standards, see [CLAUDE.md](CLAUDE.md).
|
|
For connector-specific development, see [skills/README.md](skills/README.md).
|
|
|
|
---
|
|
|
|
## Table of Contents
|
|
|
|
- [Preferred Workflow by Language](#preferred-workflow-by-language)
|
|
- [Using the Skills](#using-the-skills)
|
|
- [Architecture Deep Dives](#architecture-deep-dives)
|
|
- [Schema-First Design](#schema-first-design)
|
|
- [Entity Model and Registry](#entity-model-and-registry)
|
|
- [REST Resource Pattern](#rest-resource-pattern)
|
|
- [JDBI3 Data Access Layer](#jdbi3-data-access-layer)
|
|
- [Database Migrations](#database-migrations)
|
|
- [Change Events and Audit](#change-events-and-audit)
|
|
- [Authorization (RBAC)](#authorization-rbac)
|
|
- [Search Infrastructure](#search-infrastructure)
|
|
- [Python Ingestion Topology](#python-ingestion-topology)
|
|
- [Frontend Patterns](#frontend-patterns)
|
|
- [Cross-Cutting Patterns](#cross-cutting-patterns)
|
|
|
|
---
|
|
|
|
## Preferred Workflow by Language
|
|
|
|
### Java Backend
|
|
|
|
```
|
|
1. /planning — Design the approach (which entities, endpoints, migrations?)
|
|
2. /tdd — Write a failing integration test in openmetadata-integration-tests/
|
|
3. Implement in openmetadata-service/
|
|
4. mvn spotless:apply
|
|
5. /test-enforcement — Verify 90% coverage, integration tests for all endpoints
|
|
6. /verification — Show passing test output
|
|
7. /code-review — Run java-reviewer agent for Kafka-grade quality check
|
|
```
|
|
|
|
**Key rules:**
|
|
- Start with the JSON schema if adding/modifying an entity (`openmetadata-spec/`)
|
|
- Always write the integration test first — it proves the API contract works
|
|
- Methods must be 15 lines or fewer, no magic strings, no convoluted if/else chains
|
|
- Run `mvn spotless:apply` before every commit
|
|
- Every new REST endpoint needs a corresponding `*IT.java` in `openmetadata-integration-tests/`
|
|
|
|
### React/TypeScript Frontend
|
|
|
|
```
|
|
1. /planning — Identify components, state management, API contracts
|
|
2. /tdd — Write Jest test for the component
|
|
3. Implement the component
|
|
4. yarn lint:fix
|
|
5. /test-enforcement — Verify Jest coverage + Playwright E2E if user-facing
|
|
6. /verification — Show lint + test output
|
|
7. /code-review — Run frontend-reviewer agent
|
|
```
|
|
|
|
**Key rules:**
|
|
- Use components from `openmetadata-ui-core-components`, never MUI
|
|
- All Tailwind classes use `tw:` prefix, all colors use CSS custom properties
|
|
- No string literals in JSX — use `t('label.key-name')` from `useTranslation()`
|
|
- No `any` type — use generated types from `generated/` or define proper interfaces
|
|
- New keys go in `locale/languages/en-us.json` using kebab-case under the appropriate namespace
|
|
- Run `yarn parse-schema` after any connection schema changes
|
|
|
|
### Python Ingestion
|
|
|
|
```
|
|
1. /planning — Choose connector architecture (SQLAlchemy vs REST vs SDK)
|
|
2. /connector-standards — Load the relevant standards
|
|
3. /tdd — Write pytest tests first
|
|
4. Implement using topology pattern
|
|
5. make py_format && make py_format_check
|
|
6. /test-enforcement — Verify 90% coverage
|
|
7. /verification — Show test + lint output
|
|
8. /connector-review — Full review against golden standards (for connectors)
|
|
```
|
|
|
|
**Key rules:**
|
|
- Use pytest style — plain `assert`, no `unittest.TestCase`
|
|
- Use the topology pattern (`ServiceSpec` + `TopologyNode`) for all connectors
|
|
- Keep connector-specific logic in the connector's directory, not in shared files
|
|
- Use generators (`yield`) for streaming entities — never accumulate in memory
|
|
- Implement pagination for all REST API calls
|
|
- Reuse HTTP sessions — create one `requests.Session()` per connector lifetime
|
|
|
|
---
|
|
|
|
## Using the Skills
|
|
|
|
### Slash Commands Quick Reference
|
|
|
|
| Command | When to use | What it does |
|
|
|---------|-------------|-------------|
|
|
| `/planning` | Starting any non-trivial task | Brainstorm approaches, get approval, create step-by-step plan |
|
|
| `/tdd` | Implementing any feature or fix | Guides RED-GREEN-REFACTOR cycle for Java/Python/TypeScript |
|
|
| `/test-enforcement` | Before creating a PR | Checks 90% coverage, integration tests, Playwright E2E |
|
|
| `/verification` | Before claiming "done" | Requires actual test output as evidence |
|
|
| `/code-review` | Before or during PR review | Two-stage review: spec compliance then code quality |
|
|
| `/systematic-debugging` | Bug with unclear root cause | 4-phase: gather evidence, hypothesize, verify, fix |
|
|
| `/playwright` | Adding E2E tests | Generates zero-flakiness Playwright tests following handbook |
|
|
| `/connector-standards` | Before connector work | Loads all 23 connector development standards |
|
|
| `/connector-review` | Reviewing connector PRs | Multi-agent review against golden standards |
|
|
| `/scaffold-connector` | Building a new connector | Generates JSON Schema, Python boilerplate, AI context |
|
|
| `/test-locally` | Testing in full environment | Builds and deploys local Docker stack |
|
|
|
|
### Workflow Routing
|
|
|
|
The `openmetadata-workflow` meta-skill (loaded at session start) routes tasks automatically:
|
|
|
|
| Task | Skills triggered in order |
|
|
|------|--------------------------|
|
|
| New feature (multi-file) | `/planning` -> `/tdd` -> `/test-enforcement` -> `/verification` |
|
|
| Bug fix | `/systematic-debugging` -> `/tdd` -> `/verification` |
|
|
| New API endpoint | `/planning` -> `/tdd` -> `/test-enforcement` (must include IT) |
|
|
| New connector | `/connector-standards` -> `/scaffold-connector` -> `/test-enforcement` |
|
|
| UI component | `/tdd` -> `/test-enforcement` (Jest + Playwright) |
|
|
| PR review | `/code-review` -> `/test-enforcement` |
|
|
|
|
---
|
|
|
|
## Architecture Deep Dives
|
|
|
|
### Schema-First Design
|
|
|
|
OpenMetadata uses a single-source-of-truth approach: JSON Schema definitions drive code generation across all languages.
|
|
|
|
**The pipeline:**
|
|
```
|
|
openmetadata-spec/src/main/resources/json/schema/
|
|
│
|
|
├── entity/ → Entity definitions (table.json, dashboard.json, ...)
|
|
│ ├── data/ → Data entities (table, topic, dashboard, pipeline, ...)
|
|
│ ├── services/ → Service entities (databaseService, dashboardService, ...)
|
|
│ │ └── connections/ → Connection configs per service type
|
|
│ ├── teams/ → Team/user entities
|
|
│ ├── policies/ → Governance entities
|
|
│ └── feed/ → Activity feed entities
|
|
│
|
|
├── api/ → API request/response objects (createTable.json, ...)
|
|
│
|
|
└── type/ → Shared type definitions (entityReference.json, tagLabel.json, ...)
|
|
|
|
↓ Code generation ↓
|
|
|
|
Java POJOs: jsonschema2pojo → openmetadata-spec/target/generated-sources/
|
|
Python models: datamodel-code-generator → ingestion/src/metadata/generated/
|
|
TypeScript: QuickType → openmetadata-ui/.../ui/src/generated/
|
|
UI forms: parseSchemas.js → resolved JSON for RJSF auto-rendering
|
|
```
|
|
|
|
**When you modify a schema:**
|
|
1. Edit the JSON schema in `openmetadata-spec/`
|
|
2. Run `make generate` (Python models)
|
|
3. Run `mvn clean install -pl openmetadata-spec` (Java POJOs)
|
|
4. Run `yarn parse-schema` (UI connection schemas only)
|
|
5. Add corresponding Flyway migration if the change affects the database
|
|
|
|
**Schema conventions:**
|
|
- `$id` must match the file path
|
|
- `title` is camelCase of the filename
|
|
- `javaType` follows `org.openmetadata.schema.{category}.{ClassName}`
|
|
- Use `$ref` for shared types
|
|
- Set `additionalProperties: false` on connection schemas
|
|
- Feature flags: `supportsMetadataExtraction`, `supportsProfiler`, `supportsDBTExtraction`, `supportsQueryComment`
|
|
|
|
### Entity Model and Registry
|
|
|
|
All entities implement `EntityInterface` and are registered in `Entity.java` — the central singleton registry.
|
|
|
|
**Entity.java key patterns:**
|
|
|
|
```java
|
|
// Entity type string constants (use these, never raw strings)
|
|
Entity.TABLE // "table"
|
|
Entity.DATABASE // "database"
|
|
Entity.DATABASE_SCHEMA // "databaseSchema"
|
|
Entity.DASHBOARD // "dashboard"
|
|
Entity.PIPELINE // "pipeline"
|
|
|
|
// Common field name constants (use these for field references)
|
|
Entity.FIELD_OWNERS
|
|
Entity.FIELD_TAGS
|
|
Entity.FIELD_DESCRIPTION
|
|
Entity.FIELD_DOMAINS
|
|
Entity.FIELD_FULLY_QUALIFIED_NAME
|
|
|
|
// FQN separator
|
|
Entity.SEPARATOR // "."
|
|
|
|
// Access repositories by entity type
|
|
EntityRepository<?> repo = Entity.getRepository(entityType);
|
|
|
|
// Build href for entity references
|
|
Entity.withHref(uriInfo, entityReference);
|
|
```
|
|
|
|
**Fully Qualified Names (FQN):**
|
|
|
|
Entities form a hierarchy with `.` separator:
|
|
```
|
|
databaseService.database.databaseSchema.table
|
|
databaseService.database.databaseSchema.table.column
|
|
dashboardService.dashboard
|
|
pipelineService.pipeline
|
|
```
|
|
|
|
Each `EntityRepository` must implement `setFullyQualifiedName()` to build the FQN from parent FQN + entity name.
|
|
|
|
### REST Resource Pattern
|
|
|
|
All entity REST resources extend `EntityResource<E, R extends EntityRepository<E>>`.
|
|
|
|
**Creating a new resource:**
|
|
|
|
```java
|
|
@Path("/v1/myEntities")
|
|
@Tag(name = "MyEntities")
|
|
@Produces(MediaType.APPLICATION_JSON)
|
|
@Consumes(MediaType.APPLICATION_JSON)
|
|
@Collection(name = "myEntities")
|
|
public class MyEntityResource extends EntityResource<MyEntity, MyEntityRepository> {
|
|
|
|
public static final String COLLECTION_PATH = "/v1/myEntities/";
|
|
public static final String FIELDS = "owners,tags,domain";
|
|
|
|
public MyEntityResource(Authorizer authorizer, Limits limits) {
|
|
super(Entity.MY_ENTITY, authorizer, limits);
|
|
}
|
|
|
|
// Inner class for list serialization (required, no body)
|
|
public static class MyEntityList extends ResultList<MyEntity> {}
|
|
|
|
// Standard endpoints are inherited from EntityResource:
|
|
// GET /v1/myEntities — list with pagination
|
|
// GET /v1/myEntities/{id} — get by ID
|
|
// GET /v1/myEntities/name/{fqn} — get by FQN
|
|
// POST /v1/myEntities — create
|
|
// PUT /v1/myEntities — create or update
|
|
// PATCH /v1/myEntities/{id} — JSON patch
|
|
// DELETE /v1/myEntities/{id} — soft/hard delete
|
|
}
|
|
```
|
|
|
|
**Conventions:**
|
|
- All methods receive `@Context SecurityContext` and `@Context UriInfo`
|
|
- Cursor-based pagination with `before`/`after` string params + `limit` int
|
|
- Field filtering via `fields` query param (comma-separated, maps to `FIELDS` constant)
|
|
- A dedicated `*Mapper` class handles Create DTO -> Entity mapping
|
|
- Override `getEntitySpecificOperations()` to register field-level view permissions
|
|
|
|
### JDBI3 Data Access Layer
|
|
|
|
OpenMetadata uses JDBI3 (not JPA/Hibernate) for database access. All repositories extend `EntityRepository<E>`.
|
|
|
|
**Creating a new repository:**
|
|
|
|
```java
|
|
@Slf4j
|
|
public class MyEntityRepository extends EntityRepository<MyEntity> {
|
|
|
|
public MyEntityRepository() {
|
|
super(
|
|
MyEntityResource.COLLECTION_PATH,
|
|
Entity.MY_ENTITY,
|
|
MyEntity.class,
|
|
Entity.getCollectionDAO().myEntityDAO(), // DAO interface
|
|
"", // patch fields
|
|
"" // put fields
|
|
);
|
|
supportsSearch = true; // enable ES indexing
|
|
}
|
|
|
|
// Required overrides:
|
|
|
|
@Override
|
|
public void setFullyQualifiedName(MyEntity entity) {
|
|
entity.setFullyQualifiedName(
|
|
FullyQualifiedName.build(entity.getService().getFullyQualifiedName(),
|
|
entity.getName()));
|
|
}
|
|
|
|
@Override
|
|
public void prepare(MyEntity entity, boolean update) {
|
|
// Validate references, populate service, resolve owners/tags
|
|
populateService(entity);
|
|
}
|
|
|
|
@Override
|
|
public void storeEntity(MyEntity entity, boolean update) {
|
|
store(entity, update);
|
|
}
|
|
|
|
@Override
|
|
public void storeRelationships(MyEntity entity) {
|
|
addServiceRelationship(entity, entity.getService());
|
|
}
|
|
}
|
|
```
|
|
|
|
**Key patterns:**
|
|
- `@Transaction` annotation for multi-step writes
|
|
- `Entity.getCollectionDAO()` provides type-safe DAO access
|
|
- Override `getFieldsStrippedFromStorageJson()` to exclude computed fields from JSON storage
|
|
- Bulk operations: override `storeEntities()`, `clearEntitySpecificRelationshipsForMany()`, `storeEntitySpecificRelationshipsForMany()`
|
|
|
|
### Database Migrations
|
|
|
|
OpenMetadata uses a **hybrid migration system**: native SQL migrations tracked in `SERVER_CHANGE_LOG`, with Flyway SQL parsers for robust semicolon handling.
|
|
|
|
**Adding a new migration:**
|
|
|
|
1. Create the version directory:
|
|
```
|
|
bootstrap/sql/migrations/native/{version}/
|
|
├── mysql/schemaChanges.sql
|
|
└── postgres/schemaChanges.sql
|
|
```
|
|
|
|
2. Write both MySQL and PostgreSQL variants:
|
|
```sql
|
|
-- MySQL
|
|
ALTER TABLE my_entity ADD COLUMN new_field JSON;
|
|
ALTER TABLE my_entity ADD INDEX idx_new_field ((new_field->>'$.key'));
|
|
|
|
-- PostgreSQL
|
|
ALTER TABLE my_entity ADD COLUMN new_field JSONB;
|
|
CREATE INDEX idx_new_field ON my_entity ((new_field->>'key'));
|
|
```
|
|
|
|
**Rules:**
|
|
- Always one `schemaChanges.sql` per database per version — no numbered sub-files
|
|
- Always provide both MySQL and PostgreSQL variants
|
|
- Migrations are tracked in `SERVER_CHANGE_LOG` (keyed by version)
|
|
- Never add new `v0xx` Flyway files — always use the native path
|
|
- Migrations must be idempotent where possible (`IF NOT EXISTS`, etc.)
|
|
- Extension migrations go in `bootstrap/sql/migrations/extensions/{name}/`
|
|
|
|
### Change Events and Audit
|
|
|
|
Every non-GET API response triggers the change event system via `ChangeEventHandler` (a JAX-RS `ContainerResponseFilter`).
|
|
|
|
**Flow:**
|
|
```
|
|
HTTP Response (non-GET)
|
|
→ ChangeEventHandler.process()
|
|
→ Extract ChangeEvent from response
|
|
→ Set userName from SecurityContext
|
|
→ Mask PII in entity JSON
|
|
→ Persist to changeEventDAO (unless ENTITY_NO_CHANGE)
|
|
→ Send to WebsocketNotificationHandler (real-time UI)
|
|
```
|
|
|
|
**Event types (`EventType` enum):**
|
|
- `ENTITY_CREATED` — new entity
|
|
- `ENTITY_UPDATED` — modified entity
|
|
- `ENTITY_SOFT_DELETED` — soft delete
|
|
- `ENTITY_DELETED` — hard delete
|
|
- `ENTITY_NO_CHANGE` — no-op (not persisted)
|
|
|
|
**When adding a new entity type:** change events are automatic if your resource extends `EntityResource`. No manual wiring needed. However, `Entity.QUERY` and `Entity.WORKFLOW` events are excluded by default.
|
|
|
|
### Authorization (RBAC)
|
|
|
|
Authorization uses the `Authorizer` interface, injected into all resource classes.
|
|
|
|
**Pattern in resources:**
|
|
```java
|
|
// All mutating operations must authorize:
|
|
authorizer.authorize(securityContext, operationContext, resourceContext);
|
|
|
|
// OperationContext wraps the MetadataOperation enum:
|
|
new OperationContext(Entity.TABLE, MetadataOperation.CREATE)
|
|
new OperationContext(Entity.TABLE, MetadataOperation.EDIT_TAGS)
|
|
|
|
// ResourceContextInterface provides the target entity
|
|
```
|
|
|
|
**Key classes:**
|
|
- `Authorizer` — interface with `authorize()`, `authorizeAdmin()`, `authorizeAdminOrBot()`
|
|
- `DefaultAuthorizer` — production implementation using `PolicyEvaluator`
|
|
- `NoopAuthorizer` — allows everything (for testing)
|
|
- `PolicyEvaluator` — evaluates rules against `SubjectContext` + `ResourceContextInterface` + `OperationContext`
|
|
- `SubjectCache` — caches permission evaluations per user
|
|
|
|
**When adding new resources:** ensure all mutating methods call `authorizer.authorize()` before execution. Override `getEntitySpecificOperations()` in the resource to register field-level view permissions.
|
|
|
|
### Search Infrastructure
|
|
|
|
Entities are indexed in Elasticsearch 7.17+ or OpenSearch 2.6+ for discovery.
|
|
|
|
**Key patterns:**
|
|
- Set `supportsSearch = true` in the repository constructor to enable indexing
|
|
- Search index classes in `openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/`
|
|
- Each entity type has a corresponding `*Index.java` that defines the search document structure
|
|
- Reindexing triggered via the reindex API or on entity create/update
|
|
|
|
### Python Ingestion Topology
|
|
|
|
Connectors use a **declarative topology** that defines the entity traversal order.
|
|
|
|
**Core concept:**
|
|
```python
|
|
# Topology defines the execution graph:
|
|
class DatabaseServiceTopology(ServiceTopology):
|
|
root = TopologyNode(
|
|
producer="get_services",
|
|
stages=[NodeStage(type_=DatabaseService, ...)],
|
|
children=["database"],
|
|
)
|
|
database = TopologyNode(
|
|
producer="get_database_names",
|
|
stages=[NodeStage(type_=Database, processor="yield_database", ...)],
|
|
children=["database_schema"],
|
|
)
|
|
database_schema = TopologyNode(
|
|
producer="get_database_schema_names",
|
|
stages=[NodeStage(type_=DatabaseSchema, processor="yield_database_schema", ...)],
|
|
children=["table"],
|
|
)
|
|
table = TopologyNode(
|
|
producer="get_tables_name_and_type",
|
|
stages=[NodeStage(type_=Table, processor="yield_table", ...)],
|
|
)
|
|
```
|
|
|
|
**Key patterns:**
|
|
- `producer` is a method name on the source class that yields entity names
|
|
- `processor` is a method name that yields the entity objects
|
|
- `TopologyRunnerMixin` drives the depth-first traversal automatically
|
|
- `TopologyContext` tracks the current position in the hierarchy (for FQN building)
|
|
- `Either` monad wraps all results: `Either(right=entity)` or `Either(left=error)`
|
|
- Fingerprinting via `sourceHash`: CREATE if new, PATCH if changed, SKIP if identical
|
|
- Nodes with `threads=True` enable parallel processing
|
|
|
|
**Source class hierarchy:**
|
|
```
|
|
Source (abstract)
|
|
→ DatabaseServiceSource (defines topology + abstract methods)
|
|
→ CommonDbSourceService (SQL extraction via SQLAlchemy)
|
|
→ PostgresSource, MySQLSource, SnowflakeSource, ...
|
|
```
|
|
|
|
**ServiceSpec pattern:**
|
|
```python
|
|
ServiceSpec = DefaultDatabaseSpec(
|
|
metadata_source_class=BigquerySource,
|
|
lineage_source_class=BigqueryLineageSource,
|
|
usage_source_class=BigqueryUsageSource,
|
|
profiler_class=BigQueryProfiler,
|
|
sampler_class=BigQuerySampler,
|
|
)
|
|
```
|
|
|
|
### Frontend Patterns
|
|
|
|
#### i18n Key Structure
|
|
|
|
Translation keys live in `locale/languages/en-us.json`:
|
|
|
|
```json
|
|
{
|
|
"label": {
|
|
"add-entity": "Add {{entity}}",
|
|
"activity-feed": "Activity Feed",
|
|
"activity-feed-plural": "Activity Feeds",
|
|
"delete-entity": "Delete {{entity}}"
|
|
},
|
|
"message": {
|
|
"entity-deleted-successfully": "{{entity}} deleted successfully!"
|
|
},
|
|
"server": {
|
|
"unexpected-error": "An unexpected error occurred"
|
|
}
|
|
}
|
|
```
|
|
|
|
**Conventions:**
|
|
- Keys use kebab-case: `add-data-product`, `activity-feed-and-task-plural`
|
|
- Namespaces: `label` (UI labels), `message` (user-facing messages), `server` (error messages)
|
|
- Interpolation: `{{paramName}}` double-brace mustache
|
|
- Plurals: append `-plural` suffix: `"activity"` / `"activity-plural"`
|
|
- Variants: `-uppercase`, `-lowercase`, `-with-colon`
|
|
|
|
**Usage:**
|
|
```tsx
|
|
const { t } = useTranslation();
|
|
// Simple
|
|
<span>{t('label.activity-feed')}</span>
|
|
// With parameter
|
|
<span>{t('label.add-entity', { entity: t('label.table') })}</span>
|
|
```
|
|
|
|
#### Component Library
|
|
|
|
Use `openmetadata-ui-core-components` for all new UI work:
|
|
- Components: Button, Input, Select, Modal, Table, Tabs, Pagination, Badge, Avatar, Checkbox, Dropdown, Form, Card, Tooltip, Toggle, Slider, Textarea, Tags
|
|
- Source: `openmetadata-ui-core-components/src/main/resources/ui/src/components/`
|
|
- Tailwind CSS v4 with `tw:` prefix for all utility classes
|
|
- CSS custom properties for design tokens (see `globals.css`)
|
|
|
|
#### State Management
|
|
|
|
| Scope | Tool | Example |
|
|
|-------|------|---------|
|
|
| Component-local | `useState` | Form inputs, toggle states |
|
|
| Feature-shared | Context providers | `ApplicationsProvider` |
|
|
| Global | Zustand stores | `useLimitStore`, `useWelcomeStore` |
|
|
|
|
#### Generated Types
|
|
|
|
TypeScript interfaces are generated from JSON schemas and live in:
|
|
```
|
|
openmetadata-ui/src/main/resources/ui/src/generated/
|
|
```
|
|
|
|
Always import from `generated/` for API response types. Never hand-write interfaces for schema-defined types.
|
|
|
|
---
|
|
|
|
## Cross-Cutting Patterns
|
|
|
|
### Design Patterns Used Across the Codebase
|
|
|
|
| Pattern | Where | Purpose |
|
|
|---------|-------|---------|
|
|
| Schema-first | Everywhere | JSON Schema drives all code generation |
|
|
| Topology | Python ingestion | Declarative traversal of entity hierarchies |
|
|
| Either monad | Python ingestion | Unified error handling without exceptions |
|
|
| Singledispatch | `MetadataRestSink` | Type-based routing for entity persistence |
|
|
| Registry | `Entity.java`, `Metrics` enum | Central lookup for entity types and metric implementations |
|
|
| Template method | Validators, repositories | Base class defines skeleton, subclasses fill in steps |
|
|
| Strategy via mixins | Profiler, sampler | SQA vs Pandas implementations composed via mixin |
|
|
| Dynamic import | Connectors, validators | Zero-config discovery by file path convention |
|
|
| Fingerprinting | Ingestion | `sourceHash` for incremental create/patch/skip |
|
|
| Mixin composition | `OpenMetadata` API client | 25+ specialized mixins for different entity operations |
|
|
| Factory | Interface/sampler/profiler | Create the right implementation for the service type |
|
|
| Cascade parsing | Lineage | SqlGlot -> SqlFluff -> SqlParse (each with timeout) |
|
|
|
|
### Performance Patterns
|
|
|
|
- **Pagination is mandatory** for all list APIs (REST and database)
|
|
- **Stream, don't accumulate** — use generators in Python, iterators in Java
|
|
- **Reuse HTTP sessions** — one `requests.Session()` per connector lifetime
|
|
- **Bound caches** with `lru_cache(maxsize=N)` or size-limited maps
|
|
- **Build lookup dictionaries in `prepare()`** for O(1) access instead of repeated iteration
|
|
- **Use `computeIfAbsent()`** instead of `containsKey()` + `get()` double lookups
|
|
- **No `Thread.sleep()` in tests** — use condition-based waiting
|
|
|
|
### Adding a New Entity (End-to-End Checklist)
|
|
|
|
1. **JSON Schema**: Create `openmetadata-spec/src/main/resources/json/schema/entity/{category}/{entity}.json`
|
|
2. **API Schema**: Create `openmetadata-spec/src/main/resources/json/schema/api/{category}/create{Entity}.json`
|
|
3. **Generate code**: `mvn clean install -pl openmetadata-spec` + `make generate`
|
|
4. **Entity constant**: Add `Entity.MY_ENTITY = "myEntity"` in `Entity.java`
|
|
5. **DAO**: Add `myEntityDAO()` method to `CollectionDAO`
|
|
6. **Repository**: Create `MyEntityRepository extends EntityRepository<MyEntity>`
|
|
7. **Mapper**: Create `MyEntityMapper`
|
|
8. **Resource**: Create `MyEntityResource extends EntityResource<MyEntity, MyEntityRepository>`
|
|
9. **Migration**: Create `bootstrap/sql/migrations/native/{version}/mysql/schemaChanges.sql` + postgres variant
|
|
10. **Search index**: Create `MyEntityIndex.java` if searchable
|
|
11. **Integration test**: Create `MyEntityIT extends BaseEntityIT<MyEntity, CreateMyEntity>` in `openmetadata-integration-tests/`
|
|
12. **Frontend**: Add generated types, API client methods, and UI components
|
|
13. **i18n**: Add labels to `en-us.json`
|
|
|
|
### Adding a New Connector (End-to-End Checklist)
|
|
|
|
1. **Connection schema**: `openmetadata-spec/src/main/resources/json/schema/entity/services/connections/{type}/{connector}.json`
|
|
2. **Service type enum**: Add to `{type}Service.json` `oneOf` list
|
|
3. **Generate code**: `make generate` + `mvn clean install -pl openmetadata-spec`
|
|
4. **ClassConverter** (if using `oneOf`): `openmetadata-service/src/main/java/org/openmetadata/service/secrets/converter/`
|
|
5. **Python source class**: `ingestion/src/metadata/ingestion/source/{type}/{connector}/`
|
|
- `connection.py` — connection handling
|
|
- `metadata.py` — metadata extraction
|
|
- `__init__.py` — ServiceSpec definition
|
|
6. **Unit tests**: `ingestion/tests/unit/topology/{type}/test_{connector}.py`
|
|
7. **UI integration**: Update `{type}ServiceUtils.tsx`, add MDX doc file
|
|
8. **Run**: `yarn parse-schema` for UI form generation
|