mirror of
https://github.com/open-metadata/OpenMetadata
synced 2026-05-24 09:39:11 +00:00
* chore(ingestion): drop pylint, expand ruff to Stage 2c
Replace pylint with a coherent ruff-only stack (Stage 2c of the modernize
roadmap). Pylint is dropped from dev deps and CI workflows; ruff selected
ruleset expanded to ~22 families covering style, bug catchers, hygiene,
and the pylint port (PLE/PLC/PLW/PLR with the noisy "too-many-X"
complexity caps + magic-value disabled).
What's selected (with rationale in pyproject.toml):
E, W, F, I, N — style + correctness baseline + naming
UP — pyupgrade (py>=3.10 modernizations)
B, C4, C90, RET, SIM, TRY — bug catchers
PIE, ICN, T20, TC, TID, PTH, PERF — hygiene
PLE, PLC, PLW, PLR — pylint port (PLR complexity caps ignored)
RUF — ruff-native (incl. RUF100 unused-noqa)
What's removed:
- .pylintrc (root) — duplicate of the ingestion pylint config
- [tool.pylint.*] block in ingestion/pyproject.toml (~140 lines)
- ingestion/plugins/{print_checker,import_checker}.py + tests + README
(replaced by built-in T20 + TID251 banned-api respectively)
- pylint dep from ingestion/setup.py and openmetadata-airflow-apis/pyproject.toml
- `make lint` Makefile target + the pylint invocation in py_format_check
- dead pylint TODO comment + ignored test entry in noxfile.py
Cwd-stable config: ruff is invoked both from the repo root (pre-commit,
CI) and from ingestion/ (`make py_format_check`). The `src`,
`extend-exclude`, and per-file-ignores entries are listed twice — once
relative to ingestion/ and once with the `ingestion/` prefix — so
first-party isort detection and exclusions match in both invocations.
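
A minimal sketch of the duplication idea above, assuming the entries are plain POSIX-style paths or globs. The helper name `cwd_stable_entries` and the sample paths are hypothetical and not taken from the actual pyproject.toml:

```python
from pathlib import PurePosixPath


def cwd_stable_entries(relative_to_ingestion: list[str], prefix: str = "ingestion") -> list[str]:
    """List every path twice: as seen from ingestion/ and as seen from the repo root."""
    entries: list[str] = []
    for path in relative_to_ingestion:
        entries.append(path)  # matches when ruff runs from ingestion/
        entries.append(str(PurePosixPath(prefix) / path))  # matches when ruff runs from the repo root
    return entries


print(cwd_stable_entries(["src", "tests"]))
```

Whichever directory the linter is started from, one of the two spellings resolves to the intended location.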
Grandfathering: ran `ruff check --add-noqa` once + format-stable
iteration. ~12,130 noqa directives across ~1,400 files. Cleanup is
deferred to follow-up PRs that drop noqas one rule at a time.
Documentation sweep: replaced `make lint` references in CLAUDE.md,
AGENTS.md, DEVELOPER.md, copilot-instructions, and 6 SKILL files with
the apply+verify shape `make py_format && make py_format_check`.
`make py_format` is NOT a strict superset of pylint: it only fixes
auto-fixable violations; `make py_format_check` catches the rest.
Basedpyright baseline regenerated: ruff format reflowed multi-line
signatures in ~70 files, shifting type-error column positions. The
basedpyright baseline matches by (file path, error code, range), so
column shifts caused 19 entries to mis-align. Net diff is small
(154 lines in/out of the 13MB baseline.json) — purely positional.
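
The mis-alignment can be illustrated with a simplified model of range-keyed matching. This is only a sketch: `BaselineKey`, `unmatched`, and the sample values are hypothetical, and basedpyright's real matching logic may differ in detail:

```python
from typing import NamedTuple


class BaselineKey(NamedTuple):
    """Hypothetical stand-in for a baseline entry keyed by (file path, error code, range)."""

    path: str
    code: str
    start_column: int
    end_column: int


def unmatched(baseline: set[BaselineKey], reported: list[BaselineKey]) -> list[BaselineKey]:
    """Return the reported errors that have no exact counterpart in the baseline."""
    return [error for error in reported if error not in baseline]


baseline = {BaselineKey("source/foo.py", "reportAttributeAccessIssue", 12, 20)}
before_format = [BaselineKey("source/foo.py", "reportAttributeAccessIssue", 12, 20)]
after_format = [BaselineKey("source/foo.py", "reportAttributeAccessIssue", 8, 16)]

print(unmatched(baseline, before_format))  # nothing new: the entry still lines up
print(unmatched(baseline, after_format))  # same error, shifted columns, so it looks new
```

Under this model a reformat that moves an expression sideways turns an already-baselined error into an apparently new one, which is why the regenerated baseline differs only in positions.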
Verified locally:
- make py_format_check → All checks passed
- nox --no-venv -s static-checks → 0 errors, 0 warnings, 0 notes
* chore(ingestion): finish ruff swap — nox lint session + skill docs
Three remaining stale-tooling references after Stage 2c:
- `ingestion/noxfile.py` `lint` session was still calling `black --check`,
`isort --check-only`, `pycln --diff`. Those tools aren't installed
anywhere (we dropped them from dev deps). Replace with the ruff
equivalents that mirror `make py_format_check`.
- `skills/standards/code_style.md`: stack listed as `black + isort +
pycln`; line length claimed 88 (black default). Both wrong: stack is
ruff, line length is 120.
- `skills/connector-building/SKILL.md`: `make py_format` comment said
`# black + isort + pycln`. Same swap.
* chore(ingestion): keep main's baseline + globally ignore TRY400
Per gitar-bot's review on PR #27774:
1. Main's PR #27728 promoted ~60 `logger.warning()` → `logger.error()`
inside `except` blocks. Those changes landed on main with their own
baseline updates. Our PR doesn't promote anything — the merge from
origin/main brought those `error` calls along with their baseline
entries.
The bot interpreted the `# noqa: TRY400` we added next to those lines
as us silencing the rule case-by-case. Cleaner: globally ignore
TRY400 in pyproject.toml, with a comment explaining why the codebase's
`logger.error(...)` + separate `logger.debug(traceback.format_exc())`
pattern is intentional. Strip ~430 per-line `# noqa: TRY400` markers
from source.
2. Document that `S101` in `per-file-ignores` is a forward-looking
entry: flake8-bandit (`S`) is not yet selected, so the rule is a
no-op today; the entry stays so that when `S` lands later, tests don't
immediately error.
Reverts the platform pin and Linux Docker–generated baseline. Keep
main's baseline intact and let CI surface the exact column-shifted
entries; the team will decide whether to fix in-place (revert format
on affected files) or add per-line `# pyright: ignore` markers.
* chore(ingestion): regen baseline for new connector type debt
Main's baseline was stale relative to recently-added connectors
(McpConnection, CustomDriveConnection) that lack common attributes
like `hostPort`, `database`, `catalog` etc. — all sites that access
those attributes via the union-typed `serviceConnection.root.config`
fire `reportAttributeAccessIssue` errors that aren't baselined.
71 errors + 58 warnings absorbed. Local macOS regen; pushing to see
CI's drift count. Per the basedpyright-baseline-and-ci PR experience,
macOS↔Linux column drift on this size of regen has historically been
1-7 residuals.
13 KiB
AGENTS.md
This file provides guidance to Codex (Codex.ai/code) when working with code in this repository.
About OpenMetadata
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance. This is a multi-module project with Java backend services, React frontend, Python ingestion framework, and comprehensive Docker infrastructure.
Architecture Overview
- Backend: Java 21 + Dropwizard REST API framework, multi-module Maven project
- Frontend: React + TypeScript + Ant Design, built with Webpack and Yarn
- Ingestion: Python 3.10-3.12 with Pydantic 2.x, 75+ data source connectors
- Database: MySQL (default) or PostgreSQL with Flyway migrations
- Search: Elasticsearch 7.17+ or OpenSearch 2.6+ for metadata discovery
- Infrastructure: Apache Airflow for workflow orchestration
Essential Development Commands
Prerequisites and Setup
make prerequisites # Check system requirements
make install_dev_env # Install all development dependencies
make yarn_install_cache # Install UI dependencies
Frontend Development
cd openmetadata-ui/src/main/resources/ui
yarn start # Start development server on localhost:3000
yarn test # Run Jest unit tests
yarn test path/to/test.spec.ts # Run a specific test file
yarn test:watch # Run tests in watch mode
yarn playwright:run # Run E2E tests
yarn lint # ESLint check
yarn lint:fix # ESLint with auto-fix
yarn build # Production build
Backend Development
mvn clean package -DskipTests # Build without tests
mvn clean package -DonlyBackend -pl !openmetadata-ui # Backend only
mvn test # Run unit tests
mvn verify # Run integration tests
mvn spotless:apply # Format Java code
Python Ingestion Development
cd ingestion
make install_dev_env # Install in development mode
make generate # Generate Pydantic models from JSON schemas
make unit_ingestion_dev_env # Run unit tests
make py_format # Apply ruff lint-fix + format
make py_format_check # Verify lint + format (matches CI; catches non-auto-fixable issues)
make static-checks # Run type checking with basedpyright
Full Local Environment
./docker/run_local_docker.sh -m ui -d mysql # Complete local setup with UI
./docker/run_local_docker.sh -m no-ui -d postgresql # Backend only with PostgreSQL
./docker/run_local_docker.sh -s true # Skip Maven build step
Testing
make run_e2e_tests # Full E2E test suite
make unit_ingestion # Python unit tests with coverage
yarn test:coverage # Frontend test coverage
Code Generation and Schemas
OpenMetadata uses a schema-first approach with JSON Schema definitions driving code generation:
make generate # Generate all models from schemas
make py_antlr # Generate Python ANTLR parsers
make js_antlr # Generate JavaScript ANTLR parsers
yarn parse-schema # Parse JSON schemas for frontend (connection and ingestion schemas)
Schema Architecture
- Source schemas in `openmetadata-spec/` define the canonical data models
- Connection schemas are pre-processed at build time via `parseSchemas.js` to resolve all `$ref` references
- Application schemas in `openmetadata-ui/.../ApplicationSchemas/` are resolved at runtime using `schemaResolver.ts`
- JSON schemas with `$ref` references to external files require resolution before use in forms
Key Directories
- `openmetadata-service/` - Core Java backend services and REST APIs
- `openmetadata-ui/src/main/resources/ui/` - React frontend application
- `ingestion/` - Python ingestion framework with connectors
- `openmetadata-spec/` - JSON Schema specifications for all entities
- `bootstrap/sql/` - Database schema migrations and sample data
- `conf/` - Configuration files for different environments
- `docker/` - Docker configurations for local and production deployment
Development Workflow
- Schema Changes: Modify JSON schemas in `openmetadata-spec/`, then run `mvn clean install` on openmetadata-spec to update models
- Backend: Develop in Java using Dropwizard patterns, test with `mvn test`, format with `mvn spotless:apply`
- Frontend: Use React/TypeScript with Ant Design components, test with Jest/Playwright
- Ingestion: Python connectors follow plugin pattern, use `make install_dev_env` for development
- Full Testing: Use `make run_e2e_tests` before major changes
Frontend Architecture Patterns
React Component Patterns
- File Naming: Components use `ComponentName.component.tsx`, interfaces use `ComponentName.interface.ts`
- State Management: Use `useState` with proper typing, avoid `any`
- Side Effects: Use `useEffect` with proper dependency arrays
- Performance: Use `useCallback` for event handlers, `useMemo` for expensive computations
- Custom Hooks: Prefix with `use`, place in `src/hooks/`, return typed objects
- Internationalization: Use `useTranslation` hook from react-i18next, access with `t('key')`
- Component Structure: Functional components only, no class components
- Props: Define interfaces for all component props, place in `.interface.ts` files
- Loading States: Use object state for multiple loading states: `useState<Record<string, boolean>>({})`
- Error Handling: Use `showErrorToast` and `showSuccessToast` utilities from ToastUtils
- Navigation: Use `useNavigate` from react-router-dom, not direct history manipulation
- Data Fetching: Async functions with try-catch blocks, update loading states appropriately
State Management
- Use Zustand stores for global state (e.g., `useLimitStore`, `useWelcomeStore`)
- Keep component state local when possible with `useState`
- Use context providers for feature-specific shared state (e.g., `ApplicationsProvider`)
Styling
- MUI Migration: The project is gradually migrating from Ant Design to Material-UI (MUI) v7.3.1
- Preferred Approach: Use MUI components v7.3.1 and styles wherever possible for new features
- Theme and Styles: MUI theme data and styles are defined in `openmetadata-ui-core-components`
- Colors and Design Tokens: Always reference theme colors and design tokens from the MUI theme, not hardcoded values
- Legacy Components: Ant Design components remain in existing code but should be replaced with MUI equivalents when refactoring
- Do not add unnecessary spacing between logs and code.
- In Java, avoid wildcard imports (e.g., use `import java.util.List;` instead of `import java.util.*;`)
- Custom styles in `.less` files with component-specific naming (legacy pattern)
- Follow BEM naming convention for custom CSS classes
- Use CSS modules where appropriate
UI considerations
- Do not hardcode user-facing string literals anywhere. Use the `useTranslation` hook, e.g. `const { t } = useTranslation()`. For example, to display "Run", use `{ t('label.run') }`; the label is defined in the locale files.
Application Configuration
- Applications use `ApplicationsClassBase` for schema loading and configuration
- Dynamic imports handle application-specific schemas and assets
- Form schemas use React JSON Schema Form (RJSF) with custom UI widgets
Service Utilities
- Each service type has dedicated utility files (e.g., `DatabaseServiceUtils.tsx`)
- Connection schemas are imported statically and pre-resolved
- Service configurations use switch statements to map types to schemas
Type Safety
- All API responses have generated TypeScript interfaces in `generated/`
- Custom types extend base interfaces when needed
- Avoid type assertions unless absolutely necessary
- Use discriminated unions for action types and state variants
Database and Migrations
- Flyway handles schema migrations in `bootstrap/sql/migrations/`
- Use Docker containers for local database setup
- MySQL is the default; PostgreSQL is supported as an alternative
- Sample data loaded automatically in development environment
Security and Authentication
- JWT-based authentication with OAuth2/SAML support
- Role-based access control defined in Java entities
- Security configurations in `conf/openmetadata.yaml`
- Never commit secrets - use environment variables or secure vaults
Code Generation Standards
Comments Policy
- Do NOT add unnecessary comments - write self-documenting code
- NEVER add single-line comments that describe what the code obviously does
- Only include comments for:
  - Complex business logic that isn't obvious
  - Non-obvious algorithms or workarounds
  - Public API JavaDoc documentation
  - TODO/FIXME with ticket references
- Bad examples (NEVER do this):
  - `// Create user` before `createUser()`
  - `// Get client` before `SdkClients.adminClient()`
  - `// Verify domain is set` before `assertNotNull(entity.getDomain())`
  - `// User names are lowercased` when the code `toLowerCase()` makes it obvious
- If the code needs a comment to be understood, refactor the code to be clearer instead
Java Code Requirements
- Always mention running `mvn spotless:apply` when generating/modifying .java files
- Use clear, descriptive variable and method names instead of comments
- Follow existing project patterns and conventions
- Generate production-ready code, not tutorial code
- Create integration tests in openmetadata-integration-tests
- Do not use fully qualified names in code (e.g., `org.openmetadata.schema.type.Status`); import the class and use its simple name instead
- Do not use wildcard imports; import exactly the classes that are required
TypeScript/Frontend Code Requirements
- NEVER use `any` type in TypeScript code - always use proper types
- Use `unknown` when the type is truly unknown and add type guards
- Import types from existing type definitions (e.g., `RJSFSchema` from `@rjsf/utils`)
- Follow ESLint rules strictly - the project enforces no-console, proper formatting
- Add `// eslint-disable-next-line` comments only when absolutely necessary
- Import Organization (in order):
  - External libraries (React, Ant Design, etc.)
  - Internal absolute imports from `generated/`, `constants/`, `hooks/`, etc.
  - Relative imports for utilities and components
  - Asset imports (SVGs, styles)
  - Type imports grouped separately when needed
Python Code Requirements
- Use pytest, not unittest - write tests using pytest style with plain `assert` statements
- Use pytest fixtures for test setup instead of `setUp`/`tearDown` methods
- Use `unittest.mock` for mocking (MagicMock, patch) - this is compatible with pytest
- Test classes should not inherit from `TestCase` - use plain classes prefixed with `Test`
- Use `assert x == y` instead of `self.assertEqual(x, y)`
- Use `assert x is None` instead of `self.assertIsNone(x)`
- Use `assert "text" in string` instead of `self.assertIn("text", string)`
Python Ingestion Connector Guidelines
- Keep connector-specific logic in connector-specific files, not in generic/shared files like `builders.py`
- Example: Redshift IAM auth should be in `ingestion/src/metadata/ingestion/source/database/redshift/connection.py`, not in `ingestion/src/metadata/ingestion/connections/builders.py`
- This keeps the codebase modular and prevents generic utilities from becoming cluttered with connector-specific edge cases
Testing Philosophy
- Test real behavior, not mock wiring - if a test requires mocking 3+ classes just to verify a method call, it's testing the wrong thing
- Prefer integration tests over heavily-mocked unit tests. This project has full integration test infrastructure (OpenMetadataApplicationTest, Docker containers, real OpenSearch). Use it.
- Mocks are for boundaries, not internals - mock external services (HTTP clients, third-party APIs), not your own classes. If you're mocking static methods left and right to test internal plumbing, write an integration test instead.
- A test that mocks everything proves nothing - it only verifies that your mocks are wired correctly, not that the system works
- Ask "what breaks if this test passes but the code is wrong?" - if the answer is "nothing, because everything real is mocked out", delete the test and write a better one
- Test the outcome, not the implementation - assert on observable results (API responses, database state, stats values) rather than verifying internal method calls with `verify()`
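
One way to read "test the outcome" is sketched below in Python. `InMemoryStore` and `register_user` are hypothetical names, and the in-memory store stands in for an external boundary; the test checks the resulting state instead of asserting that `save` was called:

```python
class InMemoryStore:
    """Hypothetical fake for an external storage boundary."""

    def __init__(self) -> None:
        self.rows: dict[str, dict[str, str]] = {}

    def save(self, key: str, value: dict[str, str]) -> None:
        self.rows[key] = value


def register_user(store: InMemoryStore, name: str) -> str:
    """Hypothetical unit under test: store a user under a normalized key."""
    display_name = name.strip()
    key = display_name.lower()
    store.save(key, {"displayName": display_name})
    return key


def test_register_user_persists_normalized_name():
    store = InMemoryStore()
    key = register_user(store, "  Alice ")
    # Assert on observable results, not on which internal methods were called.
    assert key == "alice"
    assert store.rows == {"alice": {"displayName": "Alice"}}
```

If `register_user` were rewritten to batch its writes or call a different store method, this test would still pass as long as the stored result stays the same.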
Response Format
- Provide clean code blocks without unnecessary explanations
- Assume readers are experienced developers
- Focus on functionality over education