* chore(ingestion): enable basedpyright across the codebase via baseline
Removes the ~25 paths from `[tool.basedpyright] ignore` (which excluded
roughly 90% of the codebase from type checking) and grandfathers the
existing violations into a baseline file. New violations in any
previously-ignored file now fail CI.
Changes:
- ingestion/pyproject.toml: drop the entire `ignore = [...]` block
- ingestion/setup.py: bump `basedpyright~=1.14` to `~=1.39.0`
- ingestion/.basedpyright/baseline.json (new, ~13MB): captures the
starting violation set (~18.8K errors + ~37.4K warnings) so the
migration is behavior-preserving. Regenerate with
`cd ingestion && basedpyright -p pyproject.toml --baselinefile
.basedpyright/baseline.json --writebaseline`. basedpyright analysis
has minor non-determinism (similar to ruff's), so re-running
--writebaseline a few times converges the baseline.
- ingestion/noxfile.py: pass `--baselinefile .basedpyright/baseline.json`
to the basedpyright invocation in the `static-checks` session so CI
honors the grandfathering. CI already runs the session via
`cd ingestion && nox --no-venv -s static-checks` (py-tests.yml).
- ingestion/Makefile: `make static-checks` now delegates to
`nox -s static-checks` so local invocations match CI exactly. Also
drops the dead Python 3.9 / OM_SKIP_SDK_PY39 branch (we require
Python >=3.10 since the previous modernization PR).
- .gitignore: add `.serena/` (local language-server cache)
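For reference, a minimal sketch of how the baseline-aware invocation could be assembled before being handed to nox. The helper name `basedpyright_args` is hypothetical and not part of this change; only the flags (`-p`, `--baselinefile`, `--writebaseline`) come from the commands quoted above.

```python
from pathlib import Path

# Baseline location relative to ingestion/, as described in this commit.
BASELINE = Path(".basedpyright") / "baseline.json"


def basedpyright_args(baseline: Path = BASELINE, write: bool = False) -> list[str]:
    """Build the argv for a baseline-aware basedpyright run (hypothetical helper)."""
    args = [
        "basedpyright",
        "-p",
        "pyproject.toml",
        "--baselinefile",
        baseline.as_posix(),
    ]
    if write:
        # Regeneration mode: rewrite the baseline instead of only checking against it.
        args.append("--writebaseline")
    return args


# In a noxfile this would be used roughly as: session.run(*basedpyright_args())
print(" ".join(basedpyright_args()))
```

Keeping the argument list in one place means the check and regenerate commands cannot drift apart; whether the real noxfile is structured this way is an assumption.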
* chore(ingestion): add nox to the dev dependency set
The static-checks Makefile target and the py-tests CI job both delegate
to `nox -s static-checks`, but nox was being installed as a separate
side step (`pip install nox` in `install_dev_env`, `uv pip install nox`
in the test-environment composite action). Listing it in dev extras
means a plain `pip install ingestion[dev]` brings it in.
* chore(ingestion): pin basedpyright analysis to py3.10; CI runs once
Following the basedpyright + multi-Python-version research:
- ingestion/pyproject.toml: add `pythonVersion = "3.10"` to
[tool.basedpyright] so type-checking always analyzes for the lowest
supported Python version. Forward-incompatible code (tomllib usage,
PEP 695 generics, etc.) is caught at type-check time regardless of
which Python interpreter runs the checker.
- .github/workflows/py-tests.yml: gate the "Run Static Checks" step on
`matrix.py-version == '3.10'`. With pythonVersion pinned, results are
identical across the matrix; running once avoids redundant work and
keeps the baseline file deterministic. Unit tests still run on the
full 3.10/3.11/3.12 matrix to verify runtime compatibility.
- ingestion/.basedpyright/baseline.json: regenerated cleanly with the
new pythonVersion config (~18.8K errors / ~37.3K warnings, similar
scale to the previous baseline). Aligns with the canonical
type-check-on-floor / test-on-matrix pattern used by Pydantic, CPython,
and other major Python projects.
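As an illustration of what the pin catches, here is a small sketch of version-guarded code. The function `toml_backend` and the `tomli` fallback name are assumptions for the example, not code from this repository. With `pythonVersion = "3.10"`, an unguarded `import tomllib` should be reported as unresolved, while the guarded form below is expected to check cleanly because the analyzer evaluates `sys.version_info` comparisons statically.

```python
import sys


def toml_backend() -> str:
    """Return the name of the TOML parser this interpreter would use."""
    if sys.version_info >= (3, 11):
        # tomllib only exists in the stdlib from 3.11 onward; the guard keeps
        # this import invisible to an analysis pinned to 3.10.
        import tomllib

        return tomllib.__name__
    # Assumption: on 3.10 a third-party backport such as tomli would be used.
    return "tomli"


print(toml_backend())
```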
* chore(ingestion): pin basedpyright pythonPlatform to Linux + regen baseline
CI's previous run still surfaced ~9 issues (2 errors + 7 warnings) that
weren't in the baseline. Root cause: my local environment differs from
CI's in three ways that affect type inference — Python interpreter
(3.11 vs 3.10), platform (Darwin vs Linux), and pip-resolved package
versions (couchbase, avro, trino, sqlalchemy stubs all differ slightly).
This commit closes the platform gap and regenerates the baseline from a
fresh CI-equivalent environment:
- ingestion/pyproject.toml: add `pythonPlatform = "Linux"` to
[tool.basedpyright] so type-checking uses the Linux subset of stdlib /
third-party stubs regardless of where the analyzer runs.
- ingestion/.basedpyright/baseline.json: regenerated against a fresh
Python 3.10 venv installed via `uv pip install ingestion[test]` (the
same install path CI's setup-openmetadata-test-environment composite
action uses). New scale: ~18.7K errors / ~37.5K warnings — same
ballpark as the previous baseline, with column positions now matching
CI's environment.
Local-developer note: when running `make static-checks` from a venv
that doesn't mirror CI exactly (e.g. macOS, Python 3.11, different
package versions), you may see drift errors. The supported workflow for
regenerating the baseline is to mirror CI:
python3.10 -m venv /tmp/ci-mirror
source /tmp/ci-mirror/bin/activate
uv pip install --upgrade pip "setuptools<81"
uv pip install --no-build-isolation "cx_Oracle>=8.3.0,<9"
uv pip install -e "ingestion[test]"
uv pip install "basedpyright~=1.39.0" nox
cd ingestion && basedpyright -p pyproject.toml \
--baselinefile .basedpyright/baseline.json --writebaseline
* chore(ingestion): drop pythonPlatform pin and regen baseline from CI-mirror
The previous attempt added `pythonPlatform = "Linux"` thinking it would
make the local-generated baseline match CI. It did the opposite — Linux
platform stubs activate additional conditional code paths that weren't
analyzed before, so CI saw 101 errors instead of the prior 2 errors.
Reverting:
- Drop `pythonPlatform = "Linux"` from [tool.basedpyright]. Without it,
basedpyright analyzes for the host platform; on CI's ubuntu-latest
runner that's Linux automatically, but type-stub coverage stays the
same as before (matching the d9196dff6b baseline).
- Regenerate ingestion/.basedpyright/baseline.json against a fresh
Python 3.10 venv installed via `uv pip install ingestion[test]`
(mirroring CI's setup-openmetadata-test-environment composite action).
~18.8K errors / 37.7K warnings captured — same scale as the working
d9196dff6b version.
Local-developer note: any baseline regeneration done on macOS will drift
from CI's Linux env (different transitive package versions, different
stubs). The supported local mirror procedure:
python3.10 -m venv /tmp/ci-mirror
source /tmp/ci-mirror/bin/activate
uv pip install --upgrade pip "setuptools<81"
uv pip install --no-build-isolation "cx_Oracle>=8.3.0,<9"
uv pip install -e "ingestion[test]"
uv pip install "basedpyright~=1.39.0" nox
cd ingestion && basedpyright -p pyproject.toml \
--baselinefile .basedpyright/baseline.json --writebaseline
* chore(ingestion): regen baseline from full CI install (mac arm64 mirror)
Prior CI-mirror only installed [test], skipping [all] and the four
--no-deps SA pins (sqlalchemy-redshift/databricks/ibmi, pydoris-custom).
That left ~75 connector packages out of the analysis env, so basedpyright
couldn't resolve types from databricks.sqlalchemy, GE 0.18 Batch,
sklearn BaseEstimator, airflow SQLAlchemy models, pandas/numpy stubs,
etc. CI saw 129 errors absent from the baseline.
Regenerated against a fresh py3.10 venv that mirrors
.github/actions/setup-openmetadata-test-environment exactly:
uv pip install ./ingestion[dev]
make generate
uv pip install "setuptools<81"
uv pip install --no-build-isolation "cx_Oracle>=8.3.0,<9"
uv pip install --no-deps sqlalchemy-redshift==0.8.14 \
sqlalchemy-databricks==0.2.0 \
sqlalchemy-ibmi==0.9.3 \
pydoris-custom==1.1.0
uv pip install ./ingestion[all]
uv pip install ./ingestion[test]
uv pip install nox
First run: 128 errors, 272 warnings — within 1 error of CI's 129/272.
Wrote baseline with 56,100 entries across 1,035 files. Verify run with
the new baseline reports 0/0/0.
macOS arm64 vs Linux x86_64 wheel resolution may leave a small residual
(~3-7 errors per the d9196dff6b precedent). Re-run --writebaseline 2-3x
if any show up in CI.
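To sanity-check numbers like the entry and file counts above, a sketch along these lines could summarize a baseline. It assumes the baseline JSON has a top-level `files` object mapping each path to a list of entries that carry a `code` field; that layout is an assumption here and should be confirmed against the actual file before relying on it. The `summarize` helper is hypothetical.

```python
from collections import Counter
from typing import Any


def summarize(baseline: dict[str, Any]) -> tuple[int, int, Counter]:
    """Return (total entries, file count, entries per rule) for a baseline dict.

    Assumed layout: {"files": {"<path>": [{"code": "<rule>", ...}, ...]}}
    """
    files = baseline.get("files", {})
    by_rule = Counter(
        entry.get("code", "unknown")
        for entries in files.values()
        for entry in entries
    )
    return sum(by_rule.values()), len(files), by_rule


# Tiny in-memory sample; a real run would json.load() the baseline file instead.
sample = {
    "files": {
        "./a.py": [{"code": "reportArgumentType"}, {"code": "reportUnknownMemberType"}],
        "./b.py": [{"code": "reportArgumentType"}],
    }
}
total, file_count, by_rule = summarize(sample)
print(total, file_count, by_rule.most_common(1))
```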
* chore(ingestion): silence avro.py:95 basedpyright residual
CI's Linux fastavro stub returns Schema as `str | List[Any]`, while
the macOS arm64 wheel narrows to `str` — the only error not absorbed
by the regenerated baseline. Add a targeted pyright: ignore on the
parse_avro_schema call instead of broadening behavior.
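A sketch of what a targeted ignore looks like in general. The two functions are hypothetical stand-ins, and `reportArgumentType` is an assumed rule name; the rule actually reported at avro.py:95 is not stated in this message.

```python
from typing import Any, List, Union


def read_schema(raw: str) -> Union[str, List[Any]]:
    """Hypothetical stand-in for a stub that declares a wide return type."""
    return raw


def parse_schema(schema: str) -> int:
    """Hypothetical parser that only accepts str."""
    return len(schema)


# The ignore is scoped to one rule on one line, so other diagnostics on the
# same line are still reported and runtime behavior is unchanged.
size = parse_schema(read_schema('{"type": "string"}'))  # pyright: ignore[reportArgumentType]
print(size)
```

Scoping the comment to a single rule keeps the suppression narrow, which is the point of preferring it over changing the call itself.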
* chore(ingestion): tolerate cross-platform pyright ignore drift
CI's `--baselinemode=lock` (default) requires the baseline to match
exactly — neither up nor down. Two related issues:
1. The avro.py `pyright: ignore` comment silenced not just the surfaced
error but 10 cascading entries at line 95 (sub-errors propagating from
the unresolved `schema` arg type). Baseline went `down by 10` → lock
violated → exit 3 even with `0 errors` reported. Regenerate baseline
so the 10 stale entries are dropped.
2. The macOS arm64 fastavro stub doesn't surface that error in the
first place, so basedpyright flags the ignore comment as
`reportUnnecessaryTypeIgnoreComment` locally, causing the opposite
lock mismatch on CI (a warning entry that doesn't exist there).
Disable the rule so platform-specific residuals can land without
flapping between local and CI.
* chore(ingestion): use --baselinemode=discard for cross-platform tolerance
CI's implicit default is `lock`, which fails on any baseline change in
either direction (errors going up *or* down) via console.error → exit 3.
That cannot accommodate macOS arm64 vs Linux x86_64 stub drift: a
baseline regenerated locally always carries some entries that don't fire
on CI (and vice versa).
`auto` would tolerate the drift but silently overwrites the baseline
file — unacceptable in CI, where unreviewed changes never get committed
back.
`discard` is the right balance:
- New errors not in the baseline still fail the run (early-return path
in BaselineHandler.write before the lock/discard branch).
- Stale baseline entries (errors that no longer fire on the current
platform) print an info message and exit 0.
- The baseline file is never modified.
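The behavior of the three modes as described above can be modeled roughly as follows. This is a simplified model of the description in this message, not basedpyright's implementation; `Outcome` and `evaluate` are hypothetical names, and diagnostics are reduced to plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    passes: bool             # does the run exit successfully?
    rewrites_baseline: bool  # is the baseline file modified?


def evaluate(mode: str, baseline: set[str], current: set[str]) -> Outcome:
    """Model lock/auto/discard given baselined and currently firing diagnostics."""
    new = current - baseline    # diagnostics not grandfathered
    stale = baseline - current  # baselined diagnostics that no longer fire
    if new:
        # New errors fail the run before any mode-specific handling.
        return Outcome(passes=False, rewrites_baseline=False)
    if not stale:
        return Outcome(passes=True, rewrites_baseline=False)
    if mode == "lock":
        return Outcome(passes=False, rewrites_baseline=False)
    if mode == "discard":
        return Outcome(passes=True, rewrites_baseline=False)
    if mode == "auto":
        return Outcome(passes=True, rewrites_baseline=True)
    raise ValueError(f"unknown baseline mode: {mode}")


# Stale entry "b" (fires locally, not on CI) under each mode:
for mode in ("lock", "auto", "discard"):
    print(mode, evaluate(mode, {"a", "b"}, {"a"}))
```

Only `discard` both passes on stale entries and leaves the file untouched, which is why it suits a CI run that never commits changes back.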
* chore: added merge_group for github merge queue
* chore: remove unnecessary merge_group on team labeler
* fix: added gates for merge queue and pull request events
* ci: reduce checkout history footprint in PR workflows
Optimize actions/checkout usage to avoid downloading the full repo blob
history on every PR run. The repo is large, so cloning everything just
to run tests wastes minutes of CI time per job.
- py-operator-build-test.yml: drop fetch-depth: 0 (no history needed)
- openmetadata-service-unit-tests.yml: drop fetch-depth: 0 (Sonar is
explicitly skipped via -Dsonar.skip=true); shallow-fetch PR base ref
- airflow-apis-tests.yml, py-tests.yml, yarn-coverage.yml: add
filter: blob:none to Sonar jobs so commits/trees remain available
for blame while blobs are fetched lazily on demand
- ui-checkstyle.yml: add filter: blob:none to all jobs that rely on
tj-actions/changed-files (needs commit/tree metadata, not blobs)
* ci: drop fetch-depth: 0 from jobs that don't walk history
Follow-up audit after the initial pass. Four jobs were still declaring
fetch-depth: 0 (plus filter: blob:none in two cases) without actually
needing any history beyond HEAD.
ui-checkstyle.yml
- i18n-sync: runs 'yarn i18n' then 'git status --porcelain'. git status
compares the working tree to HEAD; no history walk. Default depth 1
is sufficient.
- app-docs: same pattern with 'yarn generate:app-docs'.
py-sonarcloud-nightly.yml
- py-unit-tests: only uploads a coverage artifact, no Sonar invocation.
- py-integration-tests: same.
- py-combine-coverage: does run SonarSource/sonarqube-scan-action, so
it genuinely needs the commit graph — added filter: blob:none for
parity with the PR Sonar jobs.
* ci: remove unused 'Fetch PR base branch' step from service unit tests
Copilot review flagged that the step was using --depth=1 while the main
checkout is also shallow, which would break any merge-base operation.
On investigation, nothing downstream actually uses the base ref: the
only command that runs after the checkout is 'mvn ... -Dsonar.skip=true',
which has no git dependency. The step was preserved defensively in the
previous commit, but it's dead code — cleanest fix is to delete it.
* fix(ci): replace py-tests skip workflow with job-level path filtering and gate jobs
Replace the dual-workflow (real + skip) pattern with a single-workflow
approach using dorny/paths-filter for change detection and job-level
`if` conditions. A job skipped by `if` reports as "Success" for
required checks, eliminating the need for companion skip workflows.
Add inverse-gate status jobs (`py-tests-status`, `py-tests-postgres-status`)
that only run on failure/cancellation. These are the only jobs that need
to be set as required checks in branch protection — one per workflow
instead of one per matrix expansion.
How the gate works:
- All tests pass or skipped → gate is skipped → reports "Success"
- Any test fails → gate runs → exits 1 → blocks merge
Changes:
- py-tests.yml: remove `paths:` filter, add `changes` detection job,
gate test jobs on its output, add `py-tests-status` gate job
- py-tests-postgres.yml: same approach, add `py-tests-postgres-status`
- Delete py-tests-skip.yml (no longer needed)
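The inverse-gate logic can be sketched as a small model. It assumes each upstream job ends in one of `success`, `skipped`, `failure`, or `cancelled`; the function names and the sample job names are hypothetical and do not correspond to workflow code.

```python
# Results that should cause the gate job to run (and therefore fail).
FAILING = {"failure", "cancelled"}


def gate_runs(results: dict[str, str]) -> bool:
    """The gate job only runs when some upstream job failed or was cancelled."""
    return any(result in FAILING for result in results.values())


def required_check(results: dict[str, str]) -> str:
    """Status the required check reports to branch protection."""
    # When the gate runs it exits 1; when it is skipped, the skipped job
    # is reported as a success for required checks.
    return "failure" if gate_runs(results) else "success"


print(required_check({"unit (3.10)": "success", "unit (3.11)": "skipped"}))
print(required_check({"unit (3.10)": "success", "unit (3.11)": "failure"}))
```

Because the gate collapses every matrix leg into one status, branch protection needs only that single required check per workflow.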
* fix(ci): rename postgres gate job to py-tests-status for consistency
The workflow name already provides the context (py-tests-postgres),
so the gate job should just be py-tests-status like in the mysql workflow.
* chore(sqlalchemy): migrate to sql 2.0
* chore: add --no-deps to python ci setup
* chore: fix failing unit tests
* chore(sqlalchemy): migrate to sql 2.0
* chore(sqlalchemy): address flagged bugs from CI
* increase unit test timeout
* fix: failing CI
* fix connection bug
* fix CI failures
* Add Java 17 support
* Change Test HTTP client provider
* Create Tests HTTP Client once
* fix(CI): Update CI to use jdk 17 and dockerfiles as well
---------
Co-authored-by: Akash-Jain <Akash.J@deuexsolutions.com>
* Adding different docker-compose files for openmetadata and ingestion
* Added two different env files for mysql and postgres
* Updated the docker file path
* Updated the path of docker folder structure
* Fix docker
* Updating the PR with necessary changes required
---------
Co-authored-by: Vijay <vijay.l@deuexsolutions.com>
Co-authored-by: Akash-Jain <Akash.J@deuexsolutions.com>
* Updated the command for ubuntu dependencies
---------
Co-authored-by: Vijay <vijay.l@deuexsolutions.com>