* chore(ingestion): drop pylint, expand ruff to Stage 2c
Replace pylint with a coherent ruff-only stack (Stage 2c of the modernize
roadmap). Pylint is dropped from dev deps and CI workflows; ruff selected
ruleset expanded to ~22 families covering style, bug catchers, hygiene,
and the pylint port (PLE/PLC/PLW/PLR with the noisy "too-many-X"
complexity caps + magic-value disabled).
What's selected (with rationale in pyproject.toml):
E, W, F, I, N — style + correctness baseline + naming
UP — pyupgrade (py>=3.10 modernizations)
B, C4, C90, RET, SIM, TRY — bug catchers
PIE, ICN, T20, TC, TID, PTH, PERF — hygiene
PLE, PLC, PLW, PLR — pylint port (PLR complexity caps ignored)
RUF — ruff-native (incl. RUF100 unused-noqa)
What's removed:
- .pylintrc (root) — duplicate of the ingestion pylint config
- [tool.pylint.*] block in ingestion/pyproject.toml (~140 lines)
- ingestion/plugins/{print_checker,import_checker}.py + tests + README
(replaced by built-in T20 + TID251 banned-api respectively)
- pylint dep from ingestion/setup.py and openmetadata-airflow-apis/pyproject.toml
- `make lint` Makefile target + the pylint invocation in py_format_check
- dead pylint TODO comment + ignored test entry in noxfile.py
Cwd-stable config: ruff is invoked both from the repo root (pre-commit,
CI) and from ingestion/ (`make py_format_check`). The `src`,
`extend-exclude`, and per-file-ignores entries are listed twice — once
relative to ingestion/ and once with the `ingestion/` prefix — so
first-party isort detection and exclusions match in both invocations.
Grandfathering: ran `ruff check --add-noqa` once + format-stable
iteration. ~12,130 noqa directives across ~1,400 files. Cleanup is
deferred to follow-up PRs that drop noqas one rule at a time.
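To plan the rule-by-rule cleanup, the suppressions can be tallied by rule
code. The script below is only an illustrative sketch, not part of this PR:
`tally_noqa` and the regex are assumptions, and it only recognizes the
`# noqa: CODE1, CODE2` form.

```python
import re
from collections import Counter

# Matches "# noqa: F401, TRY400" and captures the comma-separated code list.
NOQA_PATTERN = re.compile(r"#\s*noqa:\s*([A-Z]+[0-9]+(?:\s*,\s*[A-Z]+[0-9]+)*)")


def tally_noqa(source: str) -> Counter:
    """Count how many times each rule code is suppressed in `source`."""
    counts: Counter = Counter()
    for match in NOQA_PATTERN.finditer(source):
        for code in match.group(1).split(","):
            counts[code.strip()] += 1
    return counts


# Made-up sample input; in practice this would be the text of each source file.
sample = (
    "import os  # noqa: F401\n"
    "x = compute()  # noqa: PLW0603, F401\n"
)
print(tally_noqa(sample).most_common())
```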
Documentation sweep: replaced `make lint` references in CLAUDE.md,
AGENTS.md, DEVELOPER.md, copilot-instructions, and 6 SKILL files with
the apply+verify shape `make py_format && make py_format_check`.
`make py_format` is NOT a strict superset of pylint — it only fixes
auto-fixable violations; `make py_format_check` catches the rest.
Basedpyright baseline regenerated: ruff format reflowed multi-line
signatures in ~70 files, shifting type-error column positions. The
basedpyright baseline matches by (file path, error code, range), so
column shifts caused 19 entries to mis-align. Net diff is small
(154 lines in/out of the 13MB baseline.json) — purely positional.
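Why a column shift alone invalidates an entry can be seen in a toy model of
the matching. This is a simplified sketch based only on the description
above, not basedpyright's actual implementation; `BaselineKey`, `unmatched`,
and the file path are hypothetical.

```python
from typing import NamedTuple


class BaselineKey(NamedTuple):
    """Identity of one baselined diagnostic (hypothetical model)."""

    path: str
    code: str
    start_column: int
    end_column: int


def unmatched(reported: set[BaselineKey], baseline: set[BaselineKey]) -> set[BaselineKey]:
    """Return reported diagnostics that have no exact match in the baseline."""
    return reported - baseline


baseline = {BaselineKey("src/example.py", "reportAttributeAccessIssue", 12, 20)}

# Same file and same error code, but the formatter moved the expression 4 columns right.
after_format = {BaselineKey("src/example.py", "reportAttributeAccessIssue", 16, 24)}

# The shifted entry no longer matches, so it surfaces as a "new" error.
print(len(unmatched(after_format, baseline)))
```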
Verified locally:
- make py_format_check → All checks passed
- nox --no-venv -s static-checks → 0 errors, 0 warnings, 0 notes
* chore(ingestion): finish ruff swap — nox lint session + skill docs
Three remaining stale-tooling references after Stage 2c:
- `ingestion/noxfile.py` `lint` session was still calling `black --check`,
`isort --check-only`, `pycln --diff`. Those tools aren't installed
anywhere (we dropped them from dev deps). Replace with the ruff
equivalents that mirror `make py_format_check`.
- `skills/standards/code_style.md`: stack listed as `black + isort +
pycln`; line length claimed 88 (black default). Both wrong: stack is
ruff, line length is 120.
- `skills/connector-building/SKILL.md`: `make py_format` comment said
`# black + isort + pycln`. Same swap.
* chore(ingestion): keep main's baseline + globally ignore TRY400
Per gitar-bot's review on PR #27774:
1. Main's PR #27728 promoted ~60 `logger.warning()` → `logger.error()`
inside `except` blocks. Those changes landed on main with their own
baseline updates. Our PR doesn't promote anything — the merge from
origin/main brought those `error` calls along with their baseline
entries.
The bot interpreted the `# noqa: TRY400` we added next to those lines
as us silencing the rule case-by-case. Cleaner: globally ignore
TRY400 in pyproject.toml, with a comment explaining why the codebase's
`logger.error(...)` + separate `logger.debug(traceback.format_exc())`
pattern is intentional. Strip ~430 per-line `# noqa: TRY400` markers
from source.
2. Document that `S101` in `per-file-ignores` is a forward-looking
entry — flake8-bandit (`S`) is not yet selected, so the rule is
no-op today; the entry stays so when `S` lands later, tests don't
immediately error.
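For reference, the logging pattern from item 1 that TRY400 would flag looks
roughly like this. A minimal sketch: `parse_port` is a hypothetical function,
not code from the repository.

```python
import logging
import traceback
from typing import Optional

logger = logging.getLogger(__name__)


def parse_port(raw: str) -> Optional[int]:
    """Return the port as an int, or None when `raw` is not numeric."""
    try:
        return int(raw)
    except ValueError as exc:
        # Short message at ERROR level; TRY400 would suggest logger.exception() here.
        logger.error("Invalid port value [%s]: %s", raw, exc)
        # Full stack trace is emitted separately, at DEBUG level only.
        logger.debug(traceback.format_exc())
        return None
```

With `logger.exception(...)` the traceback would always be attached to the
ERROR record; the split above keeps it out of the logs unless DEBUG is enabled.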
Reverts the platform pin and Linux Docker–generated baseline. Keep
main's baseline intact and let CI surface the exact column-shifted
entries; the team will decide whether to fix in-place (revert format
on affected files) or add per-line `# pyright: ignore` markers.
* chore(ingestion): regen baseline for new connector type debt
Main's baseline was stale relative to recently-added connectors
(McpConnection, CustomDriveConnection) that lack common attributes
like `hostPort`, `database`, `catalog` etc. — all sites that access
those attributes via the union-typed `serviceConnection.root.config`
fire `reportAttributeAccessIssue` errors that aren't baselined.
71 errors + 58 warnings absorbed. Local macOS regen; pushing to see
CI's drift count. Per the basedpyright-baseline-and-ci PR experience,
macOS↔Linux column drift on this size of regen has historically been
1-7 residuals.
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
About OpenMetadata
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance. This is a multi-module project with Java backend services, React frontend, Python ingestion framework, and comprehensive Docker infrastructure.
For architecture deep dives, entity/repository/resource patterns, and end-to-end checklists for adding new entities or connectors, see DEVELOPER.md.
Architecture Overview
- Backend: Java 21 + Dropwizard REST API framework, multi-module Maven project
- Frontend: React + TypeScript, built with Webpack and Yarn; component library via `openmetadata-ui-core-components` (Tailwind CSS v4 with `tw:` prefix, react-aria-components foundation)
- Ingestion: Python 3.10-3.11 with Pydantic 2.x, 75+ data source connectors
- Database: MySQL (default) or PostgreSQL with Flyway migrations
- Search: Elasticsearch 7.17+ or OpenSearch 2.6+ for metadata discovery
- Infrastructure: Apache Airflow for workflow orchestration
Environment Setup
Python Virtual Environment (REQUIRED)
You MUST activate the Python venv before any Python work. OpenMetadata supports Python 3.10-3.11; 3.11 is recommended.
# First-time setup (creates venv at repo root):
# python3.11 -m venv env
# ALWAYS activate before running Python, make generate, make install_dev, etc:
source env/bin/activate
# Verify:
python --version # Should show Python 3.10.x or 3.11.x
In worktrees: When Claude Code creates a Git worktree, the venv from the main repo is NOT copied. You need to either:
- Create a new venv in the worktree: `python3.11 -m venv env && source env/bin/activate && cd ingestion && make install_dev`
- Or symlink the main repo's venv: `ln -s /path/to/main-repo/env env`
Initial Dev Environment Setup
After activating the venv, install all dependencies:
source env/bin/activate
# Install ingestion module with all dev dependencies (required before make generate)
cd ingestion
make install_dev_env # Full dev environment (edit mode + all extras)
# OR for lighter install:
make install_dev # Just dev dependencies
cd ..
# Generate Pydantic models from JSON schemas (required after schema changes)
make generate
# Install UI dependencies
make yarn_install_cache
Other Environment Notes
- Java: Java 21 required. Use `mvn` (Maven) for backend builds.
- Node/Yarn: Use `yarn` (not `npm`) for frontend. Frontend root is `openmetadata-ui/src/main/resources/ui/`.
- Docker services: Development services (MySQL, Elasticsearch, etc.) run via `docker/development/docker-compose.yml`: `docker compose -f docker/development/docker-compose.yml up -d`
Essential Development Commands
Prerequisites and Setup
make prerequisites # Check system requirements
source env/bin/activate # ALWAYS activate venv first
cd ingestion && make install_dev_env # Install Python dev dependencies
make generate # Generate Pydantic models from JSON schemas
make yarn_install_cache # Install UI dependencies
Frontend Development
cd openmetadata-ui/src/main/resources/ui
yarn start # Start development server on localhost:3000
yarn test # Run Jest unit tests
yarn test path/to/test.spec.ts # Run a specific test file
yarn test:watch # Run tests in watch mode
yarn playwright:run # Run E2E tests
yarn lint # ESLint check
yarn lint:fix # ESLint with auto-fix
yarn build # Production build
Frontend CI Checkstyle (run before PR to match CI)
cd openmetadata-ui/src/main/resources/ui
yarn ui-checkstyle:changed # One-shot checkstyle for changed files (excludes tsc)
yarn organize-imports:cli <files> # Sort and organize imports
yarn lint:fix # ESLint auto-fix
yarn pretty:base --write <files> # Prettier formatting
yarn license-header-fix <files> # Add Apache 2.0 license headers
yarn i18n # Sync all 17 locale files with en-us.json
yarn generate:app-docs # Regenerate application documentation
npx tsc --noEmit # TypeScript type check (catches errors early)
Backend Development
mvn clean package -DskipTests # Build without tests
mvn clean package -DonlyBackend -pl !openmetadata-ui # Backend only
mvn test # Run unit tests
mvn verify # Run integration tests
mvn spotless:apply # Format Java code
Python Ingestion Development
cd ingestion
make install_dev_env # Install in development mode
make generate # Generate Pydantic models from JSON schemas
make unit_ingestion_dev_env # Run unit tests
make py_format # Apply ruff lint-fix + format
make py_format_check # Verify lint + format (matches CI; catches non-auto-fixable issues)
make static-checks # Run type checking with basedpyright
Full Local Environment
./docker/run_local_docker.sh -m ui -d mysql # Complete local setup with UI
./docker/run_local_docker.sh -m no-ui -d postgresql # Backend only with PostgreSQL
./docker/run_local_docker.sh -s true # Skip Maven build step
Testing
make run_e2e_tests # Full E2E test suite
make unit_ingestion # Python unit tests with coverage
yarn test:coverage # Frontend test coverage
Backend Integration Tests
All backend API integration tests MUST be placed in openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ directory. Tests should:
- Use naming convention `*IT.java` (Integration Test)
- Extend `BaseEntityIT<T, K>` for entity CRUD tests
- Be designed to run concurrently (use `@Execution(ExecutionMode.CONCURRENT)`)
- Use `TestNamespace` for test isolation
- Use `SdkClients` for API calls (e.g., `SdkClients.adminClient().tables().create(...)`)
# Run a specific integration test
mvn test -pl openmetadata-integration-tests -Dtest=TaskResourceIT
# Run all integration tests
mvn test -pl openmetadata-integration-tests
Code Generation and Schemas
OpenMetadata uses a schema-first approach with JSON Schema definitions driving code generation:
make generate # Generate all models from schemas
make py_antlr # Generate Python ANTLR parsers
make js_antlr # Generate JavaScript ANTLR parsers
yarn parse-schema # Parse JSON schemas for frontend (connection and ingestion schemas)
Schema Architecture
- Source schemas in `openmetadata-spec/` define the canonical data models
- Connection schemas are pre-processed at build time via `parseSchemas.js` to resolve all `$ref` references
- Application schemas in `openmetadata-ui/.../ApplicationSchemas/` are resolved at runtime using `schemaResolver.ts`
- JSON schemas with `$ref` references to external files require resolution before use in forms
Key Directories
- `openmetadata-service/` - Core Java backend services and REST APIs
- `openmetadata-ui/src/main/resources/ui/` - React frontend application
- `ingestion/` - Python ingestion framework with connectors
- `openmetadata-spec/` - JSON Schema specifications for all entities
- `bootstrap/sql/` - Database schema migrations and sample data
- `conf/` - Configuration files for different environments
- `docker/` - Docker configurations for local and production deployment
Development Workflow
- Schema Changes: Modify JSON schemas in `openmetadata-spec/`, then run `mvn clean install` on openmetadata-spec to update models
- Backend: Develop in Java using Dropwizard patterns, test with `mvn test`, format with `mvn spotless:apply`
- Frontend: Use React/TypeScript with components from `openmetadata-ui-core-components`, test with Jest/Playwright
- Ingestion: Python connectors follow plugin pattern, use `make install_dev_env` for development
- Full Testing: Use `make run_e2e_tests` before major changes
Frontend Architecture Patterns
React Component Patterns
- File Naming: Components use `ComponentName.component.tsx`, interfaces use `ComponentName.interface.ts`
- State Management: Use `useState` with proper typing, avoid `any`
- Side Effects: Use `useEffect` with proper dependency arrays
- Performance: Use `useCallback` for event handlers, `useMemo` for expensive computations
- Custom Hooks: Prefix with `use`, place in `src/hooks/`, return typed objects
- Internationalization: Use `useTranslation` hook from react-i18next, access with `t('key')`
- Component Structure: Functional components only, no class components
- Props: Define interfaces for all component props, place in `.interface.ts` files
- Loading States: Use object state for multiple loading states: `useState<Record<string, boolean>>({})`
- Error Handling: Use `showErrorToast` and `showSuccessToast` utilities from ToastUtils
- Navigation: Use `useNavigate` from react-router-dom, not direct history manipulation
- Data Fetching: Async functions with try-catch blocks, update loading states appropriately
State Management
- Use Zustand stores for global state (e.g., `useLimitStore`, `useWelcomeStore`)
- Keep component state local when possible with `useState`
- Use context providers for feature-specific shared state (e.g., `ApplicationsProvider`)
Styling
- Component Library: Use components from `openmetadata-ui-core-components` for all new UI work. This is the canonical component library — do not use MUI or introduce new MUI dependencies.
- Available Components: Button, Input, Select, Modal, Table, Tabs, Pagination, Badge, Avatar, Checkbox, Dropdown, Form, Card, Tooltip, Toggle, Slider, Textarea, Tags, and more — all in `openmetadata-ui-core-components/src/main/resources/ui/src/components/`
- Tailwind Classes: All Tailwind utility classes must use the `tw:` prefix (e.g., `tw:flex`, `tw:text-sm`, `tw:bg-blue-500`) to avoid conflicts with existing Ant Design/Less styles
- Design Tokens: Use CSS custom properties defined in `openmetadata-ui-core-components/src/main/resources/ui/src/styles/globals.css`. Never use hardcoded color or spacing values. Semantic tokens include:
  - Text: `--color-text-primary`, `--color-text-secondary`, `--color-text-tertiary`, `--color-text-error-primary`, etc.
  - Border: `--color-border-primary`, `--color-border-secondary`, `--color-border-error`, `--color-border-brand`, etc.
  - Background: `--color-bg-primary`, `--color-bg-secondary`, `--color-bg-error-primary`, `--color-bg-brand-solid`, etc.
  - Shadows: `--shadow-xs` through `--shadow-3xl`
  - Border radius: `--radius-none` through `--radius-full`
- MUI: Do not use MUI — we are actively removing MUI from the codebase. Do not import from `@mui/*` or `@emotion/*`
- Legacy: Ant Design components remain in existing code but should be replaced with `openmetadata-ui-core-components` equivalents when refactoring
- Do not add unnecessary spacing between logs and code.
- In Java, avoid wildcard imports (e.g., use `import java.util.List;` instead of `import java.util.*;`)
- Custom styles in `.less` files with component-specific naming (legacy pattern, avoid for new code)
- Follow BEM naming convention for custom CSS classes when writing raw CSS
UI considerations
- Do not hardcode user-facing string literals anywhere. Use the `useTranslation` hook: `const { t } = useTranslation()`. For example, to render "Run", use `{t('label.run')}`; the label is defined in the locale files.
Application Configuration
- Applications use `ApplicationsClassBase` for schema loading and configuration
- Dynamic imports handle application-specific schemas and assets
- Form schemas use React JSON Schema Form (RJSF) with custom UI widgets
Service Utilities
- Each service type has dedicated utility files (e.g., `DatabaseServiceUtils.tsx`)
- Connection schemas are imported statically and pre-resolved
- Service configurations use switch statements to map types to schemas
Type Safety
- All API responses have generated TypeScript interfaces in `generated/`
- Custom types extend base interfaces when needed
- Avoid type assertions unless absolutely necessary
- Use discriminated unions for action types and state variants
Database and Migrations
- Flyway handles schema migrations in `bootstrap/sql/migrations/`
- Use Docker containers for local database setup
- Default MySQL, PostgreSQL supported as alternative
- Sample data loaded automatically in development environment
Security and Authentication
- JWT-based authentication with OAuth2/SAML support
- Role-based access control defined in Java entities
- Security configurations in `conf/openmetadata.yaml`
- Never commit secrets - use environment variables or secure vaults
Code Generation Standards
Comments Policy
- Do NOT add unnecessary comments - write self-documenting code
- NEVER add single-line comments that describe what the code obviously does
- Only include comments for:
  - Complex business logic that isn't obvious
  - Non-obvious algorithms or workarounds
  - Public API JavaDoc documentation
  - TODO/FIXME with ticket references
- Bad examples (NEVER do this):
  - `// Create user` before `createUser()`
  - `// Get client` before `SdkClients.adminClient()`
  - `// Verify domain is set` before `assertNotNull(entity.getDomain())`
  - `// User names are lowercased` when the code `toLowerCase()` makes it obvious
- If the code needs a comment to be understood, refactor the code to be clearer instead
Java Code Requirements
Always run mvn spotless:apply when generating/modifying .java files.
Method Size and Complexity (Kafka-Grade Standards)
- Methods must be 15 lines or fewer (excluding blank lines and braces). If a method is longer, break it into smaller focused methods with descriptive names.
- Maximum 3 levels of nesting. Use early returns to reduce nesting:
    // BAD: deeply nested
    if (entity != null) {
      if (entity.isActive()) {
        if (hasPermission(entity)) {
          process(entity);
        }
      }
    }

    // GOOD: early returns, flat
    if (entity == null) return;
    if (!entity.isActive()) return;
    if (!hasPermission(entity)) return;
    process(entity);
- Maximum 10 cyclomatic complexity. Extract complex conditions into named methods:
    // BAD: complex inline boolean
    if (entity.getStatus() == ACTIVE && entity.getOwner() != null && !entity.isDeleted() && entity.getVersion() > 0.1) { ... }

    // GOOD: self-documenting
    if (isEligibleForProcessing(entity)) { ... }
- Maximum 5 parameters. Introduce a parameter object or builder for more.
- Each method does one thing. If you can describe what a method does using "and" or "then", it should be two methods.
Naming and Readability
- Names should make code read like prose — if you need a comment, the name isn't good enough
- Methods: verb phrases — `calculateScore()`, `findByName()`, `isValid()`
- Booleans: question-form — `isActive`, `hasPermission`, `canRetry` (never `flag`, `status`, `check`)
- Variables: descriptive, no abbreviations — `entityReference` not `er`, `retryCount` not `rc`
- Constants: `UPPER_SNAKE_CASE` — `MAX_RETRY_COUNT`, `DEFAULT_PAGE_SIZE`
- No single-letter variables except in short lambdas or loop indices
Immutability and Defensive Design
- Use `final` on local variables and parameters that don't change (which is most of them)
- Use `final` on fields set in the constructor
- Return `Collections.unmodifiableList()` / `List.copyOf()` from public methods, never expose internal mutable collections
- Utility classes must be `final` with a private constructor
- Prefer `record` for immutable data carriers where appropriate
Error Handling
- No empty catch blocks — at minimum, log the exception
- No `catch (Exception e)` — catch the specific type you expect
- No `e.printStackTrace()` — use the logger
- Error messages must include context: `"Table '%s' not found in database '%s'"` not just `"Not found"`
- No `throw` or `return` inside `finally` blocks — they mask the original exception
- No exceptions for flow control — use conditionals for expected cases
No Magic Strings — Define Constants
- Never use raw string literals in `.equals()`, `.contains()`, or `switch` cases — define a constant or use an existing enum
- If an enum already exists in `openmetadata-spec/` schemas for those values, use it
- If the same string appears in more than one place, it must be a named constant
- One definition, one location — don't define the same constant in multiple classes
- Prefer enums over string constants when the values form a closed set:
    // BAD: magic strings scattered everywhere
    if (taskStatus.equals("Open")) { ... }
    if (config.getResources().get(0).equals("all")) { ... }

    // GOOD: use existing enums or define constants
    if (taskStatus == TaskStatus.OPEN) { ... }
    private static final String RESOURCE_ALL = "all";
No Convoluted if/else Chains
- More than 3 `else if` branches means the structure is wrong — refactor:
  - `else if` chain on `instanceof` → `switch` with pattern matching (Java 21)
  - `else if` chain on enum values → `switch` expression
  - `else if` chain on `.equals("string")` → `Map` dispatch or enum lookup
  - `else if` chain on `.contains("string")` → `Map` or list of predicates
- Repeated compound conditions (same multi-part `&&`/`||` expression in multiple places) → extract into a named method or `Set.contains()`
    // BAD: 3-part condition repeated 3 times across the file
    if (!tenantId.equals("common") && !tenantId.equals("organizations") && !tenantId.equals("consumers")) { ... }

    // GOOD: define once, use everywhere
    private static final Set<String> MULTI_TENANT_IDS = Set.of("common", "organizations", "consumers");
    private boolean isSingleTenant(String tenantId) {
      return !MULTI_TENANT_IDS.contains(tenantId);
    }
No Code Duplication
- If the same logic exists in two places, extract to a shared method
- Near-identical methods (e.g., same logic for OpenSearch and ElasticSearch) should share a common implementation with only the engine-specific parts varying
- Copy-pasted blocks within the same file should be extracted into a parameterized method
Class Size
- Classes should be under 500 lines. Over 1000 lines is a design problem.
- If a class is large, look for clusters of methods that operate on the same subset of fields — extract them into a new focused class
- Resource classes should be thin orchestrators
- Repository classes handle data access, not business logic
Modern Java (Java 21)
- Use try-with-resources for all `AutoCloseable` objects
- Use diamond operator `<>` — `new ArrayList<>()` not `new ArrayList<String>()`
- Use pattern matching: `if (obj instanceof String s)` instead of cast
- Use `switch` expressions instead of `if/else if` chains on enums or types
- Use `List.of()`, `Map.of()`, `Set.of()` for immutable collection literals
- Use `Optional` correctly: never as a field type, never as a parameter, never assign `null` to it
- Use text blocks `"""` for multi-line strings
Common Bug Patterns to Avoid
- `equals()` without `hashCode()` (or vice versa)
- `equals()` on arrays — use `Arrays.equals()`
- Ignoring return values of `String.replace()`, `File.delete()`
- `collection.size() == 0` — use `collection.isEmpty()`
- String concatenation inside loops — use `StringBuilder`
- `synchronized` on non-final fields — the lock reference can change
- `toLowerCase()` without `Locale` — always use `toLowerCase(Locale.ROOT)`
- Double map lookups — use `computeIfAbsent()` or `getOrDefault()`
Testing
- Generate production-ready code, not tutorial code
- Create integration tests in `openmetadata-integration-tests` for new API endpoints
- Never use `Thread.sleep()` in tests — use condition-based waiting or `Awaitility`
- Bug fixes must include a test that fails without the fix
- 90% line coverage target on changed classes
Structure
- Do not use Fully Qualified Names in code (e.g., `org.openmetadata.schema.type.Status`) — import the class instead
- Do not import wildcard packages — import exactly the required classes
- No commented-out code — version control maintains history
- No TODOs without a ticket reference
- One statement per line — no `if (x) return y;` on one line
TypeScript/Frontend Code Requirements
- NEVER use `any` type in TypeScript code - always use proper types
- Use `unknown` when the type is truly unknown and add type guards
- Import types from existing type definitions (e.g., `RJSFSchema` from `@rjsf/utils`)
- Add `// eslint-disable-next-line` comments only when absolutely necessary
- Import Organization — use `yarn organize-imports:cli` to auto-sort. Order:
  - External libraries (React, etc.)
  - Internal absolute imports from `generated/`, `constants/`, `hooks/`, etc.
  - Relative imports for utilities and components
  - Asset imports (SVGs, styles)
  - Type imports grouped separately when needed
CI Checkstyle Rules (enforced on every PR)
These checks run automatically in CI. Code that violates them will not merge.
- No `console.log/warn/error` — `no-console` rule is enforced. Use the logger or remove.
- Use `===` not `==` — `eqeqeq` (smart mode, except for `null` checks)
- Max 200 characters per line — break long lines
- Self-closing components — `<Div />` not `<Div></Div>`
- Sort JSX props alphabetically — callbacks last
- Space after `//` in comments — `// comment` not `//comment`
- Blank lines before `function`, `class`, `export`, `return` statements
- Use `it()` consistently in tests — don't mix `test()` and `it()`
- Blank lines around `describe`, `it`, `beforeEach` in test files
- JSON keys sorted alphabetically in locale files (`src/locale/**/*.json`)
- Apache 2.0 license header on every new source file — run `yarn license-header-fix`
- i18n keys synced — after adding keys to `en-us.json`, run `yarn i18n` to sync all 17 locales
- Prettier formatting — 2-space indent, single quotes, strict HTML whitespace
Playwright Test Rules (lint-playwright)
- No `waitForLoadState('networkidle')` — flaky, use web-first assertions
- No `page.pause()` — remove before committing
- No `.only` on tests — blocks all other tests in CI
- Prefer `expect(locator).toBeVisible()` over manual `waitForSelector` checks
- Don't use `{ force: true }` — fix the locator instead
- Use locators, not element handles
Python Code Requirements
- Use pytest, not unittest - write tests using pytest style with plain `assert` statements
- Use pytest fixtures for test setup instead of `setUp`/`tearDown` methods
- Use `unittest.mock` for mocking (MagicMock, patch) - this is compatible with pytest
- Test classes should not inherit from `TestCase` - use plain classes prefixed with `Test`
- Use `assert x == y` instead of `self.assertEqual(x, y)`
- Use `assert x is None` instead of `self.assertIsNone(x)`
- Use `assert "text" in string` instead of `self.assertIn("text", string)`
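A minimal sketch of a test module in this style. `count_tables` and `TestCountTables` are hypothetical names used only for illustration; shared setup would normally move into a pytest fixture, but it is inlined here so the sketch depends on the standard library only.

```python
from unittest.mock import MagicMock


def count_tables(client) -> int:
    """Return how many tables the client lists (hypothetical function under test)."""
    return len(client.list_tables())


class TestCountTables:
    """Plain class with the Test prefix; no unittest.TestCase inheritance."""

    def test_counts_listed_tables(self):
        client = MagicMock()
        client.list_tables.return_value = ["orders", "users"]

        # Plain assert instead of self.assertEqual(...)
        assert count_tables(client) == 2

    def test_empty_listing_counts_zero(self):
        client = MagicMock()
        client.list_tables.return_value = []

        assert count_tables(client) == 0
```

pytest collects the class through its `Test` prefix and the methods through their `test_` prefix, so no base class or runner boilerplate is needed.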
Python Ingestion Connector Guidelines
- Keep connector-specific logic in connector-specific files, not in generic/shared files like `builders.py`
- Example: Redshift IAM auth should be in `ingestion/src/metadata/ingestion/source/database/redshift/connection.py`, not in `ingestion/src/metadata/ingestion/connections/builders.py`
- This keeps the codebase modular and prevents generic utilities from becoming cluttered with connector-specific edge cases
- Use `model_str()` for Pydantic RootModel to string conversion — OpenMetadata schema types like `ColumnName`, `EntityName`, `FullyQualifiedEntityName`, and `UUID` are Pydantic `RootModel[str]` subclasses where `str()` returns `"root='value'"` instead of the raw value. Always use `model_str()` from `metadata.ingestion.ometa.utils` instead of manual `hasattr(x, "root")` / `str(x.root)` checks.
Testing Philosophy
- Test real behavior, not mock wiring - if a test requires mocking 3+ classes just to verify a method call, it's testing the wrong thing
- Prefer integration tests over heavily-mocked unit tests. This project has full integration test infrastructure (OpenMetadataApplicationTest, Docker containers, real OpenSearch). Use it.
- Mocks are for boundaries, not internals - mock external services (HTTP clients, third-party APIs), not your own classes. If you're mocking static methods left and right to test internal plumbing, write an integration test instead.
- A test that mocks everything proves nothing - it only verifies that your mocks are wired correctly, not that the system works
- Ask "what breaks if this test passes but the code is wrong?" - if the answer is "nothing, because everything real is mocked out", delete the test and write a better one
- Test the outcome, not the implementation - assert on observable results (API responses, database state, stats values) rather than verifying internal method calls with `verify()`
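On the Python side, "mock the boundary, assert the outcome" could look like the sketch below. `TableCatalog` and the endpoint path are hypothetical and exist only to illustrate the idea; they are not part of the codebase.

```python
from unittest.mock import MagicMock


class TableCatalog:
    """Our own code: real logic, exercised for real in the test."""

    def __init__(self, http_client):
        self._http = http_client

    def table_names(self, database: str) -> list[str]:
        payload = self._http.get(f"/databases/{database}/tables")
        return sorted(item["name"] for item in payload["data"])


def test_table_names_are_sorted():
    # Only the external boundary (the HTTP client) is faked.
    http_client = MagicMock()
    http_client.get.return_value = {"data": [{"name": "users"}, {"name": "orders"}]}

    names = TableCatalog(http_client).table_names("sales")

    # Assert on the observable result, not on which internal methods were called.
    assert names == ["orders", "users"]
```

If the sorting logic in `table_names` were broken, this test would fail, which is the property the checklist above asks for.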
Response Format
- Provide clean code blocks without unnecessary explanations
- Assume readers are experienced developers
- Focus on functionality over education