OpenMetadata

mirror of https://github.com/open-metadata/OpenMetadata synced 2026-05-24 09:39:11 +00:00

Adding a new resource to load tests

Add a new *.py file to test_resources/tasks. The file name does not matter, but by convention we use the resource name as defined in Java, without the Resource suffix, in snake_case, and ending in _tasks (e.g. TestCaseResource becomes test_case_tasks.py).

In your newly created file, you need to import at least the following:
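
The naming convention can be sketched as a small helper. This is only an illustration: `task_file_name` is a hypothetical function that does not exist in the repository, and it assumes the Java class name ends in `Resource`.

```python
import re


def task_file_name(java_resource: str) -> str:
    """Derive a task file name from a Java resource class name (hypothetical helper)."""
    # Drop the trailing "Resource", e.g. "TestCaseResource" -> "TestCase"
    base = java_resource.removesuffix("Resource")
    # Insert "_" before every capital letter except the first, then lowercase
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", base).lower()
    return f"{snake}_tasks.py"


print(task_file_name("TestCaseResource"))
```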

from locust import task, TaskSet

task is used as a decorator to define a task that runs as part of our load test. TaskSet is the base class our task set class inherits from.

Here is an example of a locust task definition. The integer argument in @task gives a specific weight to the task (i.e. a higher weight increases the probability that the task is run).

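from datetime import datetime, timedelta

from locust import TaskSet, task

# Base path inferred from the request names shown in this README
TEST_CASE_RESULT_RESOURCE_PATH = "/api/v1/dataQuality/testCases/testCaseResults"


class TestCaseResultTasks(TaskSet):
    """Test case result resource load test"""

    def _list_test_case_results(self, start_ts: int, end_ts: int, days_range: str):
        """List test case results for a given time range

        Args:
            start_ts (int): start timestamp in milliseconds
            end_ts (int): end timestamp in milliseconds
            days_range (str): label for the time range (e.g. "30_days"), used to group requests
        """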
        for test_case in self.test_cases:
            fqn = test_case.get("fullyQualifiedName")
            if fqn:
                self.client.get(
                    f"{TEST_CASE_RESULT_RESOURCE_PATH}/{fqn}",
                    params={ # type: ignore
                        "startTs": start_ts,
                        "endTs": end_ts,
                    },
                    auth=self.bearer,
                    name=f"{TEST_CASE_RESULT_RESOURCE_PATH}/[fqn]/{days_range}"
                )

    @task(3)
    def list_test_case_results_30_days(self):
        """List test case results for the last 30 days. Weighted 3"""
        now = datetime.now()
        last_30_days = int((now - timedelta(days=30)).timestamp() * 1000)
        self._list_test_case_results(last_30_days, int(now.timestamp() * 1000), "30_days")
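
To make the effect of weights concrete, here is a minimal sketch, independent of locust, of how weights translate into pick probabilities. It assumes tasks are picked at random in proportion to their weights. The 60-day task and its weight are hypothetical. Only the weight of 3 comes from the example above.

```python
from fractions import Fraction


def pick_probabilities(weights: dict[str, int]) -> dict[str, Fraction]:
    """Return each task's chance of being picked, proportional to its weight."""
    total = sum(weights.values())
    return {name: Fraction(weight, total) for name, weight in weights.items()}


weights = {
    "list_test_case_results_30_days": 3,  # @task(3) from the example above
    "list_test_case_results_60_days": 1,  # hypothetical task with weight 1
    "stop": 1,  # @task without an argument defaults to weight 1
}
probabilities = pick_probabilities(weights)
for name, probability in probabilities.items():
    print(f"{name}: {probability}")
```

With these weights, the 30-day task is picked three times as often as either of the other two.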

Notice how we use self.client.get to perform the request. This method is provided by locust's HttpSession. If the request needs to be authenticated, you can pass auth=self.bearer. You first need to define self.bearer, which you can do in locust's on_start hook.

from _openmetadata_testutils.helpers.login_user import login_user

class TestCaseResultTasks(TaskSet):
    """Test case result resource load test"""
    [...]

    def on_start(self):
        """Get a list of test cases to fetch results for"""
        self.bearer = login_user(self.client)
        resp = self.client.get(f"{TEST_CASE_RESOURCE_PATH}", params={"limit": 100}, auth=self.bearer)
        json = resp.json()
        self.test_cases = json.get("data", [])

IMPORTANT: You MUST define a stop(self) method in your TaskSet class, as shown below, so that control is given back to the parent user class.

class TestCaseResultTasks(TaskSet):
    """Test case result resource load test"""
    [...]

    @task
    def stop(self):
        self.interrupt()

If your request path contains a parameter (e.g. /api/v1/dataQuality/testCases/testCaseResults/{fqn}), you can name the request so that all requests sent to that path are grouped together, like this:

self.client.get(
    f"{TEST_CASE_RESULT_RESOURCE_PATH}/{fqn}",
    params={ # type: ignore
        "startTs": start_ts,
        "endTs": end_ts,
    },
    auth=self.bearer,
    name=f"{TEST_CASE_RESULT_RESOURCE_PATH}/[fqn]/{days_range}"
)

Notice the argument name=f"{TEST_CASE_RESULT_RESOURCE_PATH}/[fqn]/{days_range}". It defines the name under which the requests are grouped. Below is an example statistics summary, grouped by request name:

Type,Name,Request Count,Failure Count,Median Response Time,Average Response Time,Min Response Time,Max Response Time,Average Content Size,Requests/s,Failures/s,50%,66%,75%,80%,90%,95%,98%,99%,99.9%,99.99%,100%
GET,/api/v1/dataQuality/testCases/testCaseResults/[fqn]/60_days,3510,0,13,16.2354597524217,5.146791999997902,100.67633299999557,84567.57407407407,49.30531562959204,0.0,13,17,20,21,28,35,45,56,92,100,100
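
As an illustration of how such a summary can be read back, the sketch below parses the sample row above with Python's standard csv module and looks up one metric. `load_stats` is a hypothetical helper, not a function from this repository.

```python
import csv
import io

# Header and row copied verbatim from the sample summary above
STATS_CSV = """Type,Name,Request Count,Failure Count,Median Response Time,Average Response Time,Min Response Time,Max Response Time,Average Content Size,Requests/s,Failures/s,50%,66%,75%,80%,90%,95%,98%,99%,99.9%,99.99%,100%
GET,/api/v1/dataQuality/testCases/testCaseResults/[fqn]/60_days,3510,0,13,16.2354597524217,5.146791999997902,100.67633299999557,84567.57407407407,49.30531562959204,0.0,13,17,20,21,28,35,45,56,92,100,100
"""


def load_stats(csv_text: str) -> dict[tuple[str, str], dict[str, str]]:
    """Index the summary rows by (request type, request name)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["Type"], row["Name"]): row for row in reader}


stats = load_stats(STATS_CSV)
row = stats[("GET", "/api/v1/dataQuality/testCases/testCaseResults/[fqn]/60_days")]
p99 = float(row["99%"])  # 99th percentile response time in milliseconds
print(f"99% of requests completed within {p99} ms")
```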

As a final step, add the resources, metrics, and thresholds you want to test to test_resources/manifest.yaml.

/api/v1/dataQuality/testCases/testCaseResults/[fqn]/30_days:
  type: GET
  99%: 100

/api/v1/dataQuality/testCases/testCaseResults/[fqn]/60_days:
  type: GET
  99%: 100

This tests that the 99th percentile response time of our GET requests for the defined resources is below 100 milliseconds (0.1 seconds), i.e. that 99% of the requests complete in less than 100 ms.
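
The comparison can be sketched as follows. This is not the actual logic of test_load.py, which is not shown here. It assumes the manifest has already been parsed into a dict, that statistics are keyed by request type and name, and that a metric passes when its value is strictly below the threshold. `find_violations` and the 30-day value are hypothetical.

```python
# Manifest entries as they might look once parsed from YAML (copied from above)
manifest = {
    "/api/v1/dataQuality/testCases/testCaseResults/[fqn]/30_days": {"type": "GET", "99%": 100},
    "/api/v1/dataQuality/testCases/testCaseResults/[fqn]/60_days": {"type": "GET", "99%": 100},
}

# Measured statistics keyed by (type, name)
stats = {
    # 56 ms is the 99% value from the sample summary in this README
    ("GET", "/api/v1/dataQuality/testCases/testCaseResults/[fqn]/60_days"): {"99%": 56.0},
    # Made-up value, chosen only to show a failing threshold
    ("GET", "/api/v1/dataQuality/testCases/testCaseResults/[fqn]/30_days"): {"99%": 120.0},
}


def find_violations(manifest: dict, stats: dict) -> list[str]:
    """Return one message per metric that is missing or not below its threshold."""
    violations = []
    for name, spec in manifest.items():
        row = stats.get((spec["type"], name))
        if row is None:
            violations.append(f"{name}: no statistics recorded")
            continue
        for metric, threshold in spec.items():
            if metric == "type":
                continue  # "type" identifies the request, it is not a metric
            if not row[metric] < threshold:
                violations.append(f"{name}: {metric} is {row[metric]}, expected < {threshold}")
    return violations


violations = find_violations(manifest, stats)
for violation in violations:
    print(violation)
```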

Below is a list of all the metrics you can use:

Request Count
Failure Count
Median Response Time
Average Response Time
Min Response Time
Max Response Time
Average Content Size
Requests/s
Failures/s
50%
66%
75%
80%
90%
95%
98%
99%
99.9%
99.99%
100%