mirror of
https://github.com/open-metadata/OpenMetadata
synced 2026-05-24 09:39:11 +00:00
* Oh boy, factory-boy

  Created a bunch of `factory-boy` factories that make it easy to create mock test data

* Update `try_bind` docker utility to ease debugging

* Resolve conflicts between `Classification` tags

* Refactor `TagClassifier` into another entity

  This is so:
  1. We're not tied to the `ColumnClassifier` interface, which forced returning `Mapping[T, float]` (unnecessary since we're returning `List[ScoredTag]`)
  2. The tag analyzer uses the same `recognizer_factories` registry we used for `PIIProcessor`
  3. Create a separate service that abstracts using `TagScorer` and `TagAnalyzer` to return `TagScore`s (makes testing upstream code easier)

* Interface to retrieve available `Tag`s and `Classification`s

* Refactor `TagProcessor` to support multi-classification

  - Depends on `ClassificationManagerInterface` to retrieve `Tag`s and `Classification`s
  - Uses a callable dependency to score tags for a column
  - Accepts a classification filter parameter
  - Leverages `ConflictResolver` to resolve conflicts between tags of the same `Classification`

* Add an integration test for the `TagProcessor`

* Ensure `PII` classification is configured with migrations

  # Conflicts:
  # bootstrap/sql/migrations/native/1.11.1/mysql/postDataMigrationSQLScript.sql
  # bootstrap/sql/migrations/native/1.11.1/postgres/postDataMigrationSQLScript.sql

* Move `FakeClassificationManager` to `_openmetadata_testutils`

  This is because importing from `tests` breaks in the CI when running pytest from the root of the repo

* Fix broken mutually exclusive classifications

  The implementation did not take previous tags into account when resolving conflicts. As a result, running the classifier twice for a classification with a mutually exclusive configuration would break the exclusivity
25 lines
No EOL
707 B
SQL
UPDATE test_definition
SET json = JSON_SET(
    json,
    '$.supportedServices',
    JSON_ARRAY('Snowflake', 'BigQuery', 'Athena', 'Redshift', 'Postgres', 'MySQL', 'Mssql', 'Oracle', 'Trino', 'SapHana')
)
WHERE name = 'tableDiff'
  AND (
    JSON_EXTRACT(json, '$.supportedServices') IS NULL
    OR JSON_LENGTH(JSON_EXTRACT(json, '$.supportedServices')) = 0
  );

UPDATE
  classification
SET
  json = JSON_SET(
    json,
    '$.autoClassificationConfig',
    CAST(
      '{"enabled": true, "conflictResolution": "highest_priority", "minimumConfidence": 0.6, "requireExplicitMatch": true}'
      AS JSON
    )
  )
WHERE
  JSON_VALUE(json, '$.name' RETURNING CHAR) = 'PII';