OpenMetadata/ingestion/tests/integration/auto_classification/databases/conftest.py
IceS2 e9c87c6adb
chore(ingestion): drop pylint, expand ruff (#27774)
* chore(ingestion): drop pylint, expand ruff to Stage 2c

Replace pylint with a coherent ruff-only stack (Stage 2c of the modernize
roadmap). Pylint is dropped from dev deps and CI workflows; ruff's selected
ruleset is expanded to ~22 families covering style, bug catchers, hygiene,
and the pylint port (PLE/PLC/PLW/PLR, with the noisy "too-many-X"
complexity caps and magic-value checks disabled).

What's selected (with rationale in pyproject.toml):
  E, W, F, I, N         — style + correctness baseline + naming
  UP                    — pyupgrade (py>=3.10 modernizations)
  B, C4, C90, RET, SIM, TRY  — bug catchers
  PIE, ICN, T20, TC, TID, PTH, PERF  — hygiene
  PLE, PLC, PLW, PLR    — pylint port (PLR complexity caps ignored)
  RUF                   — ruff-native (incl. RUF100 unused-noqa)

What's removed:
  - .pylintrc (root) — duplicate of the ingestion pylint config
  - [tool.pylint.*] block in ingestion/pyproject.toml (~140 lines)
  - ingestion/plugins/{print_checker,import_checker}.py + tests + README
    (replaced by built-in T20 + TID251 banned-api respectively)
  - pylint dep from ingestion/setup.py and openmetadata-airflow-apis/pyproject.toml
  - `make lint` Makefile target + the pylint invocation in py_format_check
  - dead pylint TODO comment + ignored test entry in noxfile.py

Cwd-stable config: ruff is invoked both from the repo root (pre-commit,
CI) and from ingestion/ (`make py_format_check`). The `src`,
`extend-exclude`, and per-file-ignores entries are listed twice — once
relative to ingestion/ and once with the `ingestion/` prefix — so
first-party isort detection and exclusions match in both invocations.

Grandfathering: ran `ruff check --add-noqa` once + format-stable
iteration. ~12,130 noqa directives across ~1,400 files. Cleanup is
deferred to follow-up PRs that drop noqas one rule at a time.

Documentation sweep: replaced `make lint` references in CLAUDE.md,
AGENTS.md, DEVELOPER.md, copilot-instructions, and 6 SKILL files with
the apply+verify shape `make py_format && make py_format_check`.
`make py_format` is NOT a strict superset of pylint: it only fixes
auto-fixable violations; `make py_format_check` catches the rest.

Basedpyright baseline regenerated: ruff format reflowed multi-line
signatures in ~70 files, shifting type-error column positions. The
basedpyright baseline matches by (file path, error code, range), so
column shifts caused 19 entries to mis-align. Net diff is small
(154 lines in/out of the 13MB baseline.json) — purely positional.
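
Why a pure reformat can invalidate baseline entries, as a minimal sketch.
It assumes a simplified model in which each baselined error is keyed by
(path, code, start column, end column); `BaselineKey` and `unmatched` are
hypothetical names for illustration, not basedpyright's actual data
structures or matching algorithm.

```python
from typing import NamedTuple


class BaselineKey(NamedTuple):
    """Hypothetical key for one baselined error (simplified model)."""

    path: str
    code: str
    start_column: int
    end_column: int


def unmatched(baseline: set[BaselineKey], reported: list[BaselineKey]) -> list[BaselineKey]:
    """Return reported errors that have no exact match in the baseline."""
    return [key for key in reported if key not in baseline]


# The same error before and after a formatter reflowed the signature:
# the file and error code are unchanged, only the column range moved.
before = BaselineKey("source/example.py", "reportAttributeAccessIssue", 8, 16)
after = BaselineKey("source/example.py", "reportAttributeAccessIssue", 4, 12)

baseline = {before}
print(unmatched(baseline, [before]))  # still covered by the baseline
print(unmatched(baseline, [after]))   # surfaces as a "new" error
```

Under this model the error itself is identical; only its position differs,
which is why the regenerated baseline diff is purely positional.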

Verified locally:
  - make py_format_check         → All checks passed
  - nox --no-venv -s static-checks → 0 errors, 0 warnings, 0 notes

* chore(ingestion): finish ruff swap — nox lint session + skill docs

Three remaining stale-tooling references after Stage 2c:

  - `ingestion/noxfile.py` `lint` session was still calling `black --check`,
    `isort --check-only`, `pycln --diff`. Those tools aren't installed
    anywhere (we dropped them from dev deps). Replace with the ruff
    equivalents that mirror `make py_format_check`.
  - `skills/standards/code_style.md`: stack listed as `black + isort +
    pycln`; line length claimed 88 (black default). Both wrong: stack is
    ruff, line length is 120.
  - `skills/connector-building/SKILL.md`: `make py_format` comment said
    `# black + isort + pycln`. Same swap.

* chore(ingestion): keep main's baseline + globally ignore TRY400

Per gitar-bot's review on PR #27774:

1. Main's PR #27728 promoted ~60 `logger.warning()` → `logger.error()`
   inside `except` blocks. Those changes landed on main with their own
   baseline updates. Our PR doesn't promote anything — the merge from
   origin/main brought those `error` calls along with their baseline
   entries.

   The bot interpreted the `# noqa: TRY400` we added next to those lines
   as us silencing the rule case-by-case. Cleaner: globally ignore
   TRY400 in pyproject.toml, with a comment explaining why the codebase's
   `logger.error(...)` + separate `logger.debug(traceback.format_exc())`
   pattern is intentional. Strip ~430 per-line `# noqa: TRY400` markers
   from source.

2. Document that `S101` in `per-file-ignores` is a forward-looking
   entry: flake8-bandit (`S`) is not yet selected, so the rule is a
   no-op today; the entry stays so that when `S` lands later, tests
   don't immediately error.
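
The logging pattern from item 1, as a minimal sketch. TRY400 would ask for
`logger.exception(...)` inside the `except` block; the pattern below keeps a
one-line summary at ERROR level and emits the traceback only at DEBUG level.
`parse_port` and the logger name are hypothetical, chosen only for
illustration.

```python
import logging
import traceback
from typing import Optional

logger = logging.getLogger("example")


def parse_port(raw: str) -> Optional[int]:
    """Parse a port number, logging failures without raising."""
    try:
        return int(raw)
    except ValueError as exc:
        # Short, user-facing summary at ERROR level. TRY400 would flag
        # this call and suggest logger.exception instead.
        logger.error(f"Could not parse port {raw!r}: {exc}")
        # Full traceback only when DEBUG logging is enabled.
        logger.debug(traceback.format_exc())
        return None
```

With this split, default log output stays compact while the traceback
remains available by lowering the log level to DEBUG.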

Reverts the platform pin and Linux Docker–generated baseline. Keep
main's baseline intact and let CI surface the exact column-shifted
entries; the team will decide whether to fix in-place (revert format
on affected files) or add per-line `# pyright: ignore` markers.

* chore(ingestion): regen baseline for new connector type debt

Main's baseline was stale relative to recently-added connectors
(McpConnection, CustomDriveConnection) that lack common attributes
like `hostPort`, `database`, `catalog` etc. — all sites that access
those attributes via the union-typed `serviceConnection.root.config`
fire `reportAttributeAccessIssue` errors that aren't baselined.

71 errors + 58 warnings absorbed. Local macOS regen; pushing to see
CI's drift count. Per the basedpyright-baseline-and-ci PR experience,
macOS↔Linux column drift on this size of regen has historically been
1-7 residuals.
2026-04-28 07:21:59 +02:00


import os

import pytest
from testcontainers.postgres import PostgresContainer

from _openmetadata_testutils.factories.metadata.generated.schema.api.classification.create_classification import (
    CreateClassificationRequestFactory,
)
from _openmetadata_testutils.factories.metadata.generated.schema.api.classification.create_tag import (
    CreateTagRequestFactory,
)
from _openmetadata_testutils.factories.metadata.generated.schema.type.recognizer import (
    RecognizerFactory,
)
from _openmetadata_testutils.helpers.docker import try_bind
from metadata.generated.schema.api.classification.createClassification import (
    CreateClassificationRequest,
)
from metadata.generated.schema.api.classification.createTag import CreateTagRequest
from metadata.generated.schema.entity.classification.classification import (
    Classification,
    ConflictResolution,
)
from metadata.generated.schema.entity.classification.tag import Tag
from metadata.generated.schema.type.piiEntity import PIIEntity
from metadata.generated.schema.type.predefinedRecognizer import Name
from metadata.generated.schema.type.recognizer import Recognizer
from metadata.ingestion.ometa.ometa_api import OpenMetadata


@pytest.fixture(scope="module")
def postgres_container():
    """Start a PostgreSQL container with the test database."""
    init_file = os.path.join(os.path.dirname(__file__), "init.sql")  # noqa: PTH118, PTH120
    container = PostgresContainer("postgres:15", dbname="test_db").with_volume_mapping(
        init_file, "/docker-entrypoint-initdb.d/init.sql"
    )
    with try_bind(container, 5432, 5432) if not os.getenv("CI") else container as container:
        yield container


@pytest.fixture(scope="session")
def credit_card_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="credit_card_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.CreditCardRecognizer,
    )


@pytest.fixture(scope="session")
def aba_routing_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="aba_routing_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.AbaRoutingRecognizer,
    )


@pytest.fixture(scope="session")
def crypto_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="crypto_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.CryptoRecognizer,
    )


@pytest.fixture(scope="session")
def date_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="date_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.DateRecognizer,
    )


@pytest.fixture(scope="session")
def email_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="email_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.EmailRecognizer,
    )


@pytest.fixture(scope="session")
def iban_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="iban_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.IbanRecognizer,
    )


@pytest.fixture(scope="session")
def ip_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="ip_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.IpRecognizer,
    )


@pytest.fixture(scope="session")
def nhs_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="nhs_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.NhsRecognizer,
    )


@pytest.fixture(scope="session")
def medical_license_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="medical_license_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.MedicalLicenseRecognizer,
    )


@pytest.fixture(scope="session")
def phone_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="phone_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.PhoneRecognizer,
    )


@pytest.fixture(scope="session")
def sg_fin_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="sg_fin_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.SgFinRecognizer,
    )


@pytest.fixture(scope="session")
def url_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="url_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.UrlRecognizer,
    )


@pytest.fixture(scope="session")
def us_bank_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="us_bank_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.UsBankRecognizer,
    )


@pytest.fixture(scope="session")
def us_itin_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="us_itin_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.UsItinRecognizer,
    )


@pytest.fixture(scope="session")
def us_license_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="us_license_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.UsLicenseRecognizer,
    )


@pytest.fixture(scope="session")
def us_passport_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="us_passport_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.UsPassportRecognizer,
    )


@pytest.fixture(scope="session")
def us_ssn_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="us_ssn_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.UsSsnRecognizer,
    )


@pytest.fixture(scope="session")
def es_nif_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="es_nif_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.EsNifRecognizer,
    )


@pytest.fixture(scope="session")
def pii_spacy_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="spacy_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.SpacyRecognizer,
        recognizerConfig__supportedEntities=[
            PIIEntity.PERSON,
        ],
    )


@pytest.fixture(scope="session")
def non_pii_spacy_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="spacy_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.SpacyRecognizer,
        recognizerConfig__supportedEntities=[
            PIIEntity.LOCATION,
            PIIEntity.DATE_TIME,
        ],
    )


@pytest.fixture(scope="session")
def au_abn_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="au_abn_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.AuAbnRecognizer,
    )


@pytest.fixture(scope="session")
def au_acn_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="au_acn_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.AuAcnRecognizer,
    )


@pytest.fixture(scope="session")
def au_tfn_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="au_tfn_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.AuTfnRecognizer,
    )


@pytest.fixture(scope="session")
def au_medicare_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="au_medicare_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.AuMedicareRecognizer,
    )


@pytest.fixture(scope="session")
def it_driver_license_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="it_driver_license_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.ItDriverLicenseRecognizer,
    )


@pytest.fixture(scope="session")
def it_fiscal_code_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="it_fiscal_code_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.ItFiscalCodeRecognizer,
    )


@pytest.fixture(scope="session")
def it_vat_code_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="it_vat_code_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.ItVatCodeRecognizer,
    )


@pytest.fixture(scope="session")
def it_identity_card_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="it_identity_card_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.ItIdentityCardRecognizer,
    )


@pytest.fixture(scope="session")
def it_passport_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="it_passport_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.ItPassportRecognizer,
    )


@pytest.fixture(scope="session")
def in_pan_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="in_pan_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.InPanRecognizer,
    )


@pytest.fixture(scope="session")
def pl_pesel_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="pl_pesel_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.PlPeselRecognizer,
    )


@pytest.fixture(scope="session")
def in_aadhaar_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="in_aadhaar_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.InAadhaarRecognizer,
    )


@pytest.fixture(scope="session")
def in_vehicle_registration_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="in_vehicle_registration_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.InVehicleRegistrationRecognizer,
    )


@pytest.fixture(scope="session")
def sg_uen_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="sg_uen_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.SgUenRecognizer,
    )


@pytest.fixture(scope="session")
def in_voter_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="in_voter_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.InVoterRecognizer,
    )


@pytest.fixture(scope="session")
def in_passport_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="in_passport_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.InPassportRecognizer,
    )


@pytest.fixture(scope="session")
def fi_personal_identity_code_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="fi_personal_identity_code_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.FiPersonalIdentityCodeRecognizer,
    )


@pytest.fixture(scope="session")
def es_nie_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="es_nie_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.EsNieRecognizer,
    )


@pytest.fixture(scope="session")
def uk_nino_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="uk_nino_recognizer",
        recognizerConfig__type="predefined",
        recognizerConfig__name=Name.UkNinoRecognizer,
    )


@pytest.fixture(scope="session")
def person_column_name_recognizer() -> Recognizer:
    return RecognizerFactory.create(
        name="person_column_name_recognizer",
        recognizerConfig__type="pattern",
        recognizerConfig__patterns__0__regex=r"^.*(user|client|person|first|last|maiden|nick).*(name).*$",
        for_column_name=True,
    )


@pytest.fixture(scope="session")
def pii_classification(metadata: OpenMetadata[Classification, CreateClassificationRequest]) -> Classification:
    create_classification_request = CreateClassificationRequestFactory.create(
        fqn="PII",
        autoClassificationConfig__conflictResolution=ConflictResolution.highest_priority.value,
    )
    entity = metadata.create_or_update(create_classification_request)
    return entity  # noqa: RET504


@pytest.fixture(scope="session")
def sensitive_pii_tag(
    metadata: OpenMetadata[Tag, CreateTagRequest],
    pii_classification: Classification,
    credit_card_recognizer: Recognizer,
    aba_routing_recognizer: Recognizer,
    crypto_recognizer: Recognizer,
    email_recognizer: Recognizer,
    iban_recognizer: Recognizer,
    nhs_recognizer: Recognizer,
    medical_license_recognizer: Recognizer,
    sg_fin_recognizer: Recognizer,
    us_bank_recognizer: Recognizer,
    us_itin_recognizer: Recognizer,
    us_license_recognizer: Recognizer,
    us_passport_recognizer: Recognizer,
    us_ssn_recognizer: Recognizer,
    es_nif_recognizer: Recognizer,
    pii_spacy_recognizer: Recognizer,
    au_abn_recognizer: Recognizer,
    au_acn_recognizer: Recognizer,
    au_tfn_recognizer: Recognizer,
    au_medicare_recognizer: Recognizer,
    it_driver_license_recognizer: Recognizer,
    it_fiscal_code_recognizer: Recognizer,
    it_vat_code_recognizer: Recognizer,
    it_identity_card_recognizer: Recognizer,
    it_passport_recognizer: Recognizer,
    in_pan_recognizer: Recognizer,
    pl_pesel_recognizer: Recognizer,
    in_aadhaar_recognizer: Recognizer,
    sg_uen_recognizer: Recognizer,
    in_voter_recognizer: Recognizer,
    in_passport_recognizer: Recognizer,
    fi_personal_identity_code_recognizer: Recognizer,
    es_nie_recognizer: Recognizer,
    uk_nino_recognizer: Recognizer,
    person_column_name_recognizer: Recognizer,
) -> Tag:
    create_tag_request: CreateTagRequest = CreateTagRequestFactory.create(
        tag_name="Sensitive",
        tag_classification=pii_classification.fullyQualifiedName.root,
        autoClassificationPriority=100,
        recognizers=[
            credit_card_recognizer,
            aba_routing_recognizer,
            crypto_recognizer,
            email_recognizer,
            iban_recognizer,
            nhs_recognizer,
            medical_license_recognizer,
            sg_fin_recognizer,
            us_bank_recognizer,
            us_itin_recognizer,
            us_license_recognizer,
            us_passport_recognizer,
            us_ssn_recognizer,
            es_nif_recognizer,
            pii_spacy_recognizer,
            au_abn_recognizer,
            au_acn_recognizer,
            au_tfn_recognizer,
            au_medicare_recognizer,
            it_driver_license_recognizer,
            it_fiscal_code_recognizer,
            it_vat_code_recognizer,
            it_identity_card_recognizer,
            it_passport_recognizer,
            in_pan_recognizer,
            pl_pesel_recognizer,
            in_aadhaar_recognizer,
            sg_uen_recognizer,
            in_voter_recognizer,
            in_passport_recognizer,
            fi_personal_identity_code_recognizer,
            es_nie_recognizer,
            uk_nino_recognizer,
            person_column_name_recognizer,
        ],
    )
    return metadata.create_or_update(create_tag_request)


@pytest.fixture(scope="session")
def non_sensitive_pii_tag(
    metadata: OpenMetadata[Tag, CreateTagRequest],
    pii_classification: Classification,
    date_recognizer: Recognizer,
    phone_recognizer: Recognizer,
    non_pii_spacy_recognizer: Recognizer,
) -> Tag:
    create_tag_request: CreateTagRequest = CreateTagRequestFactory.create(
        tag_name="NonSensitive",
        tag_classification=pii_classification.fullyQualifiedName.root,
        autoClassificationPriority=80,
        recognizers=[
            date_recognizer,
            phone_recognizer,
            non_pii_spacy_recognizer,
        ],
    )
    return metadata.create_or_update(create_tag_request)