OpenMetadata/.github/workflows/py-tests.yml
IceS2 84ed278720
chore(ingestion): enable basedpyright across the codebase via baseline (#27755)
* chore(ingestion): enable basedpyright across the codebase via baseline

Removes the ~25 paths from `[tool.basedpyright] ignore` (which excluded
roughly 90% of the codebase from type checking) and grandfathers the
existing violations into a baseline file. New violations in any
previously-ignored file now fail CI.

Changes:
- ingestion/pyproject.toml: drop the entire `ignore = [...]` block
- ingestion/setup.py: bump `basedpyright~=1.14` to `~=1.39.0`
- ingestion/.basedpyright/baseline.json (new, ~13MB): captures the
  starting violation set (~18.8K errors + ~37.4K warnings) so the
  migration is behavior-preserving. Regenerate with
  `cd ingestion && basedpyright -p pyproject.toml --baselinefile
  .basedpyright/baseline.json --writebaseline`. basedpyright analysis
  has minor non-determinism (similar to ruff's), so re-running
  --writebaseline a few times converges the baseline.
- ingestion/noxfile.py: pass `--baselinefile .basedpyright/baseline.json`
  to the basedpyright invocation in the `static-checks` session so CI
  honors the grandfathering. CI already runs the session via
  `cd ingestion && nox --no-venv -s static-checks` (py-tests.yml).
- ingestion/Makefile: `make static-checks` now delegates to
  `nox -s static-checks` so local invocations match CI exactly. Also
  drops the dead Python 3.9 / OM_SKIP_SDK_PY39 branch (we require
  Python >=3.10 since the previous modernization PR).
- .gitignore: add `.serena/` (local language-server cache)

* chore(ingestion): add nox to the dev dependency set

The static-checks Makefile target and the py-tests CI job both delegate
to `nox -s static-checks`, but nox was being installed as a separate
side step (`pip install nox` in `install_dev_env`, `uv pip install nox`
in the test-environment composite action). Listing it in dev extras
means a plain `pip install ingestion[dev]` brings it in.

* chore(ingestion): pin basedpyright analysis to py3.10; CI runs once

Following the basedpyright + multi-Python-version research:

- ingestion/pyproject.toml: add `pythonVersion = "3.10"` to
  [tool.basedpyright] so type-checking always analyzes for the lowest
  supported Python version. Forward-incompatible code (tomllib usage,
  PEP 695 generics, etc.) is caught at type-check time regardless of
  which Python interpreter runs the checker.
- .github/workflows/py-tests.yml: gate the "Run Static Checks" step on
  `matrix.py-version == '3.10'`. With pythonVersion pinned, results are
  identical across the matrix; running once avoids redundant work and
  keeps the baseline file deterministic. Unit tests still run on the
  full 3.10/3.11/3.12 matrix to verify runtime compatibility.
- ingestion/.basedpyright/baseline.json: regenerated cleanly with the
  new pythonVersion config (~18.8K errors / ~37.3K warnings, similar
  scale to the previous baseline). Aligns with the canonical
  type-check-on-floor / test-on-matrix pattern used by Pydantic, CPython,
  and other major Python projects.

* chore(ingestion): pin basedpyright pythonPlatform to Linux + regen baseline

CI's previous run still surfaced ~9 issues (2 errors + 7 warnings) that
weren't in the baseline. Root cause: my local environment differs from
CI's in three ways that affect type inference — Python interpreter
(3.11 vs 3.10), platform (Darwin vs Linux), and pip-resolved package
versions (couchbase, avro, trino, sqlalchemy stubs all differ slightly).

This commit closes the platform gap and regenerates the baseline from a
fresh CI-equivalent environment:

- ingestion/pyproject.toml: add `pythonPlatform = "Linux"` to
  [tool.basedpyright] so type-checking uses the Linux subset of stdlib /
  third-party stubs regardless of where the analyzer runs.
- ingestion/.basedpyright/baseline.json: regenerated against a fresh
  Python 3.10 venv installed via `uv pip install ingestion[test]` (the
  same install path CI's setup-openmetadata-test-environment composite
  action uses). New scale: ~18.7K errors / ~37.5K warnings — same
  ballpark as the previous baseline, with column positions now matching
  CI's environment.

Local-developer note: when running `make static-checks` from a venv
that doesn't mirror CI exactly (e.g. macOS, Python 3.11, different
package versions), you may see drift errors. The supported workflow for
regenerating the baseline is to mirror CI:
  python3.10 -m venv /tmp/ci-mirror
  source /tmp/ci-mirror/bin/activate
  uv pip install --upgrade pip "setuptools<81"
  uv pip install --no-build-isolation "cx_Oracle>=8.3.0,<9"
  uv pip install -e "ingestion[test]"
  uv pip install "basedpyright~=1.39.0" nox
  cd ingestion && basedpyright -p pyproject.toml \
      --baselinefile .basedpyright/baseline.json --writebaseline

* chore(ingestion): drop pythonPlatform pin and regen baseline from CI-mirror

The previous attempt added `pythonPlatform = "Linux"` thinking it would
make the local-generated baseline match CI. It did the opposite — Linux
platform stubs activate additional conditional code paths that weren't
analyzed before, so CI saw 101 errors instead of the prior 2 errors.

Reverting:
- Drop `pythonPlatform = "Linux"` from [tool.basedpyright]. Without it,
  basedpyright analyzes for the host platform; on CI's ubuntu-latest
  runner that's Linux automatically, but type-stub coverage stays the
  same as before (matching the d9196dff6b baseline).
- Regenerate ingestion/.basedpyright/baseline.json against a fresh
  Python 3.10 venv installed via `uv pip install ingestion[test]`
  (mirroring CI's setup-openmetadata-test-environment composite action).
  ~18.8K errors / 37.7K warnings captured — same scale as the working
  d9196dff6b version.

Local-developer note: any baseline regeneration done on macOS will drift
from CI's Linux env (different transitive package versions, different
stubs). The supported local mirror procedure:
  python3.10 -m venv /tmp/ci-mirror
  source /tmp/ci-mirror/bin/activate
  uv pip install --upgrade pip "setuptools<81"
  uv pip install --no-build-isolation "cx_Oracle>=8.3.0,<9"
  uv pip install -e "ingestion[test]"
  uv pip install "basedpyright~=1.39.0" nox
  cd ingestion && basedpyright -p pyproject.toml \\
      --baselinefile .basedpyright/baseline.json --writebaseline

* chore(ingestion): regen baseline from full CI install (mac arm64 mirror)

Prior CI-mirror only installed [test], skipping [all] and the four
--no-deps SA pins (sqlalchemy-redshift/databricks/ibmi, pydoris-custom).
That left ~75 connector packages out of the analysis env, so basedpyright
couldn't resolve types from databricks.sqlalchemy, GE 0.18 Batch,
sklearn BaseEstimator, airflow SQLAlchemy models, pandas/numpy stubs,
etc. CI saw 129 errors absent from the baseline.

Regenerated against a fresh py3.10 venv that mirrors
.github/actions/setup-openmetadata-test-environment exactly:
  uv pip install ./ingestion[dev]
  make generate
  uv pip install "setuptools<81"
  uv pip install --no-build-isolation "cx_Oracle>=8.3.0,<9"
  uv pip install --no-deps sqlalchemy-redshift==0.8.14 \
                            sqlalchemy-databricks==0.2.0 \
                            sqlalchemy-ibmi==0.9.3 \
                            pydoris-custom==1.1.0
  uv pip install ./ingestion[all]
  uv pip install ./ingestion[test]
  uv pip install nox

First run: 128 errors, 272 warnings — within 1 error of CI's 129/272.
Wrote baseline with 56,100 entries across 1,035 files. Verify run with
the new baseline reports 0/0/0.

macOS arm64 vs Linux x86_64 wheel resolution may leave a small residual
(~3-7 errors per the d9196dff6b precedent). Re-run --writebaseline 2-3x
if any show up in CI.

* chore(ingestion): silence avro.py:95 basedpyright residual

CI's Linux fastavro stub returns Schema as `str | List[Any]`, while
the macOS arm64 wheel narrows to `str` — the only error not absorbed
by the regenerated baseline. Add a targeted pyright: ignore on the
parse_avro_schema call instead of broadening behavior.

* chore(ingestion): tolerate cross-platform pyright ignore drift

CI's `--baselinemode=lock` (default) requires the baseline to match
exactly — neither up nor down. Two related issues:

1. The avro.py noqa silenced not just the surfaced error but 10
   cascading entries at line 95 (sub-errors propagating from the
   unresolved `schema` arg type). Baseline went `down by 10` → lock
   violated → exit 3 even with `0 errors` reported. Regenerate baseline
   so the 10 stale entries are dropped.

2. The macOS arm64 fastavro stub doesn't surface that error in the
   first place, so basedpyright treats the noqa as
   `reportUnnecessaryTypeIgnoreComment` locally — causing the opposite
   lock mismatch on CI (a warning entry that doesn't exist there).
   Disable the rule so platform-specific residuals can land without
   flapping between local and CI.

* chore(ingestion): use --baselinemode=discard for cross-platform tolerance

CI's implicit default is `lock`, which fails on any baseline change in
either direction (errors going up *or* down) via console.error → exit 3.
That cannot accommodate macOS arm64 vs Linux x86_64 stub drift: a
baseline regenerated locally always carries some entries that don't fire
on CI (and vice versa).

`auto` would tolerate the drift but silently overwrites the baseline
file — unacceptable in CI, where unreviewed changes never get committed
back.

`discard` is the right balance:
  - New errors not in the baseline still fail the run (early-return path
    in BaselineHandler.write before the lock/discard branch).
  - Stale baseline entries (errors that no longer fire on the current
    platform) print an info message and exit 0.
  - The baseline file is never modified.
2026-04-27 17:15:44 +02:00

313 lines
11 KiB
YAML

# Copyright 2021 Collate
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: py-tests
on:
merge_group:
workflow_dispatch:
pull_request_target:
types: [labeled, opened, synchronize, reopened, ready_for_review]
permissions:
contents: read
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
env:
# matrix can't use 'env'. When updating it, update it for both jobs.
MAIN_PYTHON_VERSION: "3.10"
SONAR_OPTS: >-
-Dsonar.pullrequest.key=${{ github.event.pull_request.number }}
-Dsonar.pullrequest.branch=${{ github.event.pull_request.head.ref }}
-Dsonar.pullrequest.github.repository=OpenMetadata
-Dsonar.scm.revision=${{ github.event.pull_request.head.sha }}
-Dsonar.pullrequest.provider=github
jobs:
# Detect whether relevant paths changed. When no Python/service/schema files
# are modified the downstream jobs are skipped via their `if` condition.
# A job skipped by `if` reports as "Success", so required checks still pass.
# This replaces the old py-tests-skip.yml companion workflow.
changes:
name: Detect Changes
runs-on: ubuntu-latest
if: ${{ !github.event.pull_request.draft }}
outputs:
python: ${{ github.event_name == 'workflow_dispatch' && 'true' || steps.filter.outputs.python }}
steps:
- uses: dorny/paths-filter@v3
id: filter
if: ${{ github.event_name != 'workflow_dispatch' }}
with:
filters: |
python:
- 'ingestion/**'
- 'openmetadata-service/**'
- 'openmetadata-spec/src/main/resources/json/schema/**'
- 'pom.xml'
- 'Makefile'
py-unit-tests:
name: Unit Tests & Static Checks
needs: changes
if: ${{ needs.changes.outputs.python == 'true' }}
timeout-minutes: 60
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
py-version: ["3.10", "3.11", "3.12"]
steps:
- name: Wait for the labeler
uses: lewagon/wait-on-check-action@v1.3.4
if: ${{ github.event_name == 'pull_request_target' }}
with:
ref: ${{ github.event.pull_request.head.sha }}
check-name: Team Label
repo-token: ${{ secrets.GITHUB_TOKEN }}
wait-interval: 30
- name: Verify PR labels
uses: jesusvasquez333/verify-pr-label-action@v1.4.0
if: ${{ github.event_name == 'pull_request_target' }}
with:
github-token: "${{ secrets.GITHUB_TOKEN }}"
valid-labels: "safe to test"
pull-request-number: "${{ github.event.pull_request.number }}"
disable-reviews: true
- name: Checkout
uses: actions/checkout@v4
with:
ref: ${{ github.event_name == 'merge_group' && github.sha || github.event.pull_request.head.sha }}
- name: Setup Openmetadata Test Environment
uses: ./.github/actions/setup-openmetadata-test-environment
with:
python-version: ${{ matrix.py-version }}
install-server: 'false'
- name: Run Static Checks
# basedpyright is configured with `pythonVersion = "3.10"` (the lowest
# supported version) so type-checking results are identical across the
# 3.10/3.11/3.12 matrix. Run on the lowest version only to avoid
# redundant work and keep the baseline file deterministic.
if: matrix.py-version == '3.10'
run: |
source env/bin/activate
cd ingestion
nox --no-venv -s static-checks
shell: bash
- name: Run Unit Tests
run: |
source env/bin/activate
cd ingestion
nox --no-venv -s unit-tests
shell: bash
- name: Upload coverage artifact
if: ${{ matrix.py-version == env.MAIN_PYTHON_VERSION && !cancelled() }}
uses: actions/upload-artifact@v4
with:
name: coverage-unit
path: ingestion/.coverage
include-hidden-files: true
py-integration-tests:
name: "Integration Tests (${{ matrix.shard.name }}, ${{ matrix.py-version }})"
needs: changes
if: ${{ needs.changes.outputs.python == 'true' }}
timeout-minutes: 180
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
py-version: ["3.10", "3.11", "3.12"]
shard:
- name: "shard-1"
nox-args: >-
tests/integration/ometa
tests/integration/postgres
tests/integration/mysql
tests/integration/profiler
tests/integration/data_quality
- name: "shard-2"
nox-args: >-
--ignore=tests/integration/ometa
--ignore=tests/integration/postgres
--ignore=tests/integration/mysql
--ignore=tests/integration/profiler
--ignore=tests/integration/data_quality
steps:
- name: Free Disk Space (Ubuntu)
uses: jlumbroso/free-disk-space@main
with:
tool-cache: false
android: true
dotnet: true
haskell: true
large-packages: false
swap-storage: true
docker-images: false
- name: Wait for the labeler
uses: lewagon/wait-on-check-action@v1.3.4
if: ${{ github.event_name == 'pull_request_target' }}
with:
ref: ${{ github.event.pull_request.head.sha }}
check-name: Team Label
repo-token: ${{ secrets.GITHUB_TOKEN }}
wait-interval: 90
- name: Verify PR labels
uses: jesusvasquez333/verify-pr-label-action@v1.4.0
if: ${{ github.event_name == 'pull_request_target' }}
with:
github-token: "${{ secrets.GITHUB_TOKEN }}"
valid-labels: "safe to test"
pull-request-number: "${{ github.event.pull_request.number }}"
disable-reviews: true # To not auto approve changes
- name: Checkout
uses: actions/checkout@v4
with:
ref: ${{ github.event_name == 'merge_group' && github.sha || github.event.pull_request.head.sha }}
- name: Setup Openmetadata Test Environment
uses: ./.github/actions/setup-openmetadata-test-environment
with:
python-version: ${{ matrix.py-version}}
args: "-m no-ui"
ingestion_dependency: "mysql,elasticsearch,sample-data"
- name: Run Integration Tests
run: |
source env/bin/activate
cd ingestion
nox --no-venv -s integration-tests -- --standalone --durations=5 ${{ matrix.shard.nox-args }}
env:
TESTCONTAINERS_RYUK_DISABLED: true
shell: bash
- name: Upload coverage artifact
if: ${{ matrix.py-version == env.MAIN_PYTHON_VERSION && !cancelled() }}
uses: actions/upload-artifact@v4
with:
name: coverage-integration-${{ matrix.shard.name }}
path: ingestion/.coverage
include-hidden-files: true
- name: Clean Up
run: |
cd ./docker/development
docker compose down --remove-orphans
sudo rm -rf ${PWD}/docker-volume
# Single required-check gate for branch protection.
# Skipped (= "Success") when all test jobs pass or are legitimately skipped.
# Runs and exits 1 only when a test job fails or is cancelled.
# Set "py-tests / py-tests-status" as the sole required check for this workflow.
py-tests-status:
name: py-tests-status
needs: [changes, py-unit-tests, py-integration-tests]
if: ${{ failure() || cancelled() }}
runs-on: ubuntu-latest
steps:
- run: exit 1
py-combine-coverage:
needs: [changes, py-unit-tests, py-integration-tests]
if: ${{ needs.changes.outputs.python == 'true' && !cancelled() }}
runs-on: ubuntu-latest
timeout-minutes: 10
steps:
- name: Checkout
uses: actions/checkout@v4
with:
ref: ${{ github.event_name == 'merge_group' && github.sha || github.event.pull_request.head.sha }}
fetch-depth: 0
filter: blob:none
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: ${{ env.MAIN_PYTHON_VERSION }}
- name: Install uv
run: pip install uv
shell: bash
- name: Install coverage
run: |
python3 -m venv env
source env/bin/activate
uv pip install "coverage[toml]" nox
shell: bash
- name: Download coverage artifacts
uses: actions/download-artifact@v4
with:
pattern: coverage-*
path: ingestion/coverage-data/
- name: Prepare coverage files
run: |
cd ingestion
[ -f coverage-data/coverage-unit/.coverage ] && mv coverage-data/coverage-unit/.coverage .coverage.unit
for dir in coverage-data/coverage-integration-*/; do
shard=$(basename "$dir" | sed 's/coverage-integration-//')
[ -f "$dir/.coverage" ] && mv "$dir/.coverage" ".coverage.integration-$shard"
done
shell: bash
- name: Combine coverage
run: |
source env/bin/activate
cd ingestion
nox --no-venv -s combine-coverage
shell: bash
- name: Remove pom.xml
run: rm pom.xml
shell: bash
# we have to pass these args values since we are working with the 'pull_request_target' trigger
- name: Push Results in PR to Sonar
id: push-to-sonar
if: ${{ github.event_name == 'pull_request_target'}}
continue-on-error: true
uses: SonarSource/sonarqube-scan-action@v7
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
SONAR_TOKEN: ${{ secrets.INGESTION_SONAR_SECRET }}
with:
projectBaseDir: ingestion/
args: ${{ env.SONAR_OPTS }}
# next two steps are for retrying "Push Results in PR to Sonar" step in case it fails
- name: Wait to retry 'Push Results in PR to Sonar'
if: ${{ github.event_name == 'pull_request_target' && steps.push-to-sonar.outcome != 'success' }}
run: sleep 20s
shell: bash
- name: Retry 'Push Results in PR to Sonar'
uses: SonarSource/sonarqube-scan-action@v7
if: ${{ github.event_name == 'pull_request_target' && steps.push-to-sonar.outcome != 'success' }}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
SONAR_TOKEN: ${{ secrets.INGESTION_SONAR_SECRET }}
with:
projectBaseDir: ingestion/
args: ${{ env.SONAR_OPTS }}