OpenMetadata/ingestion/tests/integration/auto_classification
Eugenio 6ac135dc7e
Fixes 21329: exclude temporal table period columns from autoClassification sampling (#27960)
* fix(azuresql): exclude temporal table period columns from sampling

Query sys.columns for generated_always_type to detect SYSTEM_TIME period
columns (ValidFrom/ValidTo) and skip them in both schema reflection
(mssql/utils.py) and sample data fetching (AzureSQLSampler). Also moves
the catalog round-trip inside the `if columns` guard to avoid the query
when column filtering is not in use.
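
A minimal sketch of the detection step described above, not the code from this commit. It assumes catalog rows arrive as `(name, generated_always_type)` pairs and that names are compared case-insensitively. The query text and both helper names are hypothetical. In `sys.columns`, `generated_always_type` is 1 for `AS_ROW_START` and 2 for `AS_ROW_END`.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical catalog query; the actual statement in the commit may differ.
PERIOD_COLUMNS_QUERY = """
SELECT c.name, c.generated_always_type
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(?)
"""

# sys.columns.generated_always_type values that mark SYSTEM_TIME period columns.
AS_ROW_START = 1
AS_ROW_END = 2


def period_column_names(rows: Iterable[Tuple[str, int]]) -> Set[str]:
    """Return the names of columns flagged as row-start or row-end."""
    return {
        name
        for name, generated_type in rows
        if generated_type in (AS_ROW_START, AS_ROW_END)
    }


def sampleable_columns(
    columns: Iterable[str], rows: Iterable[Tuple[str, int]]
) -> List[str]:
    """Drop period columns from a column list, preserving the original order.

    Assumption: names are matched case-insensitively.
    """
    period = {name.lower() for name in period_column_names(rows)}
    return [column for column in columns if column.lower() not in period]
```

Under these assumptions, catalog rows for `Id`, `Name`, `ValidFrom` (type 1) and `ValidTo` (type 2) leave only `Id` and `Name` available for sampling.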

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(azuresql): add unit tests for temporal column exclusion

Adds sampler unit tests covering period-column filtering and NOT_COMPUTE_PYODBC
exclusion. Adds a PII processor test case for temporal tables using single
first-names to avoid non-deterministic NER matches. Corrects customers_sensitive
expected tags to include address→PII.NonSensitive, which the classifier now
correctly detects.

* test(azuresql): add full workflow integration test for temporal tables

Replaces the isolated sampler unit test with an end-to-end integration test
that registers the AzureSQL service, creates a system-versioned table, runs
MetadataWorkflow then AutoClassificationWorkflow, and asserts that sample
data excludes ValidFrom/ValidTo. The module docstring includes the SQL
permission prerequisites and a troubleshooting guide. Teardown is controlled
by the AZURE_SQL_CLEANUP env var (default: true).
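
One way the teardown switch might be read, as a sketch only. The helper name `cleanup_enabled` and the set of accepted "false" spellings are assumptions; the commit message states only the variable name and its default of true.

```python
import os
from typing import Mapping, Optional

# Assumption: these spellings disable cleanup; the real test may parse differently.
_FALSE_VALUES = {"0", "false", "no", "off"}


def cleanup_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True unless AZURE_SQL_CLEANUP is set to a false-like value."""
    source = os.environ if env is None else env
    raw = source.get("AZURE_SQL_CLEANUP", "true")
    return raw.strip().lower() not in _FALSE_VALUES
```

Passing the environment as a mapping keeps the helper easy to exercise without touching the real process environment.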

* Fix `spacy<3.8` for `ingestion/[dev]`

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 11:45:40 +05:30
containers Containers: batch container data-model column tag retrieval to avoid subtree fan-out (#27836) 2026-04-30 20:55:55 -07:00
databases Fixes 21329: exclude temporal table period columns from autoClassification sampling (#27960) 2026-05-11 11:45:40 +05:30
__init__.py Allow multiple classifications in TagProcessor (#24545) 2025-12-10 07:26:12 -08:00