OpenMetadata/ingestion/tests/unit
Mayur Singal 9921dc1389
Fixes #28245: ingest valueless Databricks/Unity Catalog tags (#28294)
* Fixes #28245: ingest valueless Databricks/Unity Catalog tags

Databricks/Unity Catalog exposes system-generated (and some user-defined)
tags as (tag_name, tag_value=null). The connectors mapped tag_name ->
Classification and tag_value -> Tag, so an empty tag_value was either
skipped (Unity Catalog) or coerced to a "NONE" sentinel (Databricks).

When tag_value is empty, fall back to a dedicated per-connector
classification (DATABRICKS_TAGS / UNITY_CATALOG_TAGS) and use tag_name
verbatim as the tag under it (no dot-splitting). Valued tags are
unchanged: classification = tag_name, tag = tag_value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address review: harden valueless-tag mapping

- Treat whitespace-only tag_value as valueless (strip-based check) so it
  falls back to the *_TAGS classification instead of being silently
  dropped downstream by get_ometa_tag_and_classification.
- Skip rows with empty/None tag_name in the Databricks connector, for
  parity with Unity Catalog, so an empty classification name is never
  sent to the API.
- Add tests for whitespace-only tag_value (both connectors) and the
  empty tag_name skip (Databricks).
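
The mapping rules described in this commit message can be sketched as follows. This is an illustrative reconstruction only, not the connector code: the function name `map_tag` and its signature are hypothetical, and the two constants simply reuse the classification names mentioned above.

```python
from typing import Optional, Tuple

# Classification names as given in the commit message; where they live in
# the real connectors is not shown here, so these are stand-ins.
DATABRICKS_TAGS = "DATABRICKS_TAGS"
UNITY_CATALOG_TAGS = "UNITY_CATALOG_TAGS"


def map_tag(
    tag_name: Optional[str],
    tag_value: Optional[str],
    fallback_classification: str,
) -> Optional[Tuple[str, str]]:
    """Return (classification, tag), or None when the row should be skipped.

    Hypothetical helper: a sketch of the rules in the commit message.
    """
    # Rows with an empty/None tag_name are skipped, so an empty
    # classification name is never sent to the API.
    if not tag_name:
        return None

    # A None, empty, or whitespace-only tag_value counts as valueless:
    # use the per-connector classification and keep tag_name verbatim
    # as the tag (no dot-splitting).
    if tag_value is None or not tag_value.strip():
        return fallback_classification, tag_name

    # Valued tags are unchanged: classification = tag_name, tag = tag_value.
    return tag_name, tag_value
```

For instance, `map_tag("pii", "email", DATABRICKS_TAGS)` keeps the valued mapping, while `map_tag("system.Certified", None, UNITY_CATALOG_TAGS)` falls back to the `UNITY_CATALOG_TAGS` classification with the dotted name left intact.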

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 19:41:03 +05:30
airflow make compressed dag ingestion available (#27984) 2026-05-18 11:58:00 +05:30
bulksink chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
clients chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
connections ISSUE #20036 - sqlalchemy 2.0 migration (#26031) 2026-03-02 13:07:47 -08:00
data_quality/validations chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
diagnostics ingestion: runtime diagnostics subsystem (#28161) 2026-05-20 16:38:09 -07:00
domain feat(ingestion): introduce TagRegistry domain layer (#27991) 2026-05-11 17:56:31 +02:00
great_expectations chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
lineage chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
metadata fix(ometa): use requests instead of httpx for SSE transport (#28293) 2026-05-20 14:14:07 +02:00
models Fix: Empty entity name/special name fixes in powerbi (#28092) 2026-05-14 21:41:55 +00:00
observability refactor(sampler): collapse SamplerInterface to a single typed config object (#28147) 2026-05-19 14:56:02 +02:00
pii Fixes 21329: exclude temporal table period columns from autoClassification sampling (#27960) 2026-05-11 11:45:40 +05:30
readers chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
resources Fixes #27950: [Datalake] JSON columns incorrectly typed as STRING for empty dict values (#27951) 2026-05-11 18:02:06 +05:30
sampler refactor(sampler): collapse SamplerInterface to a single typed config object (#28147) 2026-05-19 14:56:02 +02:00
sdk Fixes #4003: bulk + async restore for large entity hierarchies (#27997) 2026-05-20 17:57:40 -07:00
source chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
topology Fixes #28245: ingest valueless Databricks/Unity Catalog tags (#28294) 2026-05-21 19:41:03 +05:30
utils fix(logging): synchronous shutdown captures full streamable log tail (#28273) 2026-05-20 11:28:52 +02:00
workflow chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
__init__.py
conftest.py fix(logging): synchronous shutdown captures full streamable log tail (#28273) 2026-05-20 11:28:52 +02:00
test_avro_parser.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_azure_credentials.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_build_connection_url.py ISSUE #20036 - sqlalchemy 2.0 migration (#26031) 2026-03-02 13:07:47 -08:00
test_column_type_parser.csv Datalake: Add manifest file support, fix profiler metrics, add array and json column type support (#13017) 2023-09-13 15:15:49 +05:30
test_column_type_parser.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_config.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_connection_builders.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_credentials.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_data_insight_chart_imports.py refactor(schema): extract chart Function/KPIDetails into chartFunctions.json (#28049) 2026-05-12 14:28:12 +02:00
test_datatypes.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_db_utils.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_dbt.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_dbt_http_config.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_dbt_ingest.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_entity_link.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_exit_handler.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_filter_pattern.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_fqn.py fix(ingestion): isolate per-entity failures so one bad table doesn't break a schema (#28060) 2026-05-13 11:22:31 +05:30
test_handle_partitions.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_helpers.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_importer.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_incremental_extraction.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_json_schema_parser.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_lineage_empty_result.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_logger.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_mf4_reader.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_ometa_client_resilience.py fix(ometa): resilient transport — keepalive, retry, typed RestTransportError (#28256) 2026-05-19 15:46:25 +02:00
test_ometa_endpoints.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_ometa_http_adapter.py fix(ometa): resilient transport — keepalive, retry, typed RestTransportError (#28256) 2026-05-19 15:46:25 +02:00
test_ometa_mlmodel.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_ometa_restore.py Fixes #4003: bulk + async restore for large entity hierarchies (#27997) 2026-05-20 17:57:40 -07:00
test_ometa_to_dataframe.parquet
test_ometa_to_dataframe.py ISSUE #3032 (#27912) 2026-05-07 09:01:18 -07:00
test_ometa_utils.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_owner_config.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_owner_utils.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_parser_connection_class.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_parser_connection_fallback.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_parser_connection_module.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_partition.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_path_pattern.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_powerbi_filter_query.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_powerbi_table_measures.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_protobuf_parser.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_pydantic_v2.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_query_parser.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_root_model_defaults.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_scaffold.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_sink_barrier.py fix(powerbi): flush sink buffer before lineage resolution (#28308) 2026-05-21 14:54:35 +02:00
test_sink_buffer_on_flush_failure.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_sink_deduplication.py MINOR: Python E2E Test Fixes (#24821) 2025-12-15 08:07:05 +05:30
test_sink_empty_tag_validation.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_source_connection.py Migrate Databricks from sqlalchemy-databricks to databricks-sqlalchemy (#26896) 2026-05-04 18:53:24 +05:30
test_source_parsing.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_source_url.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_ssl_manager.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_status.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_time_utils.py MINOR: Change ingestion licence header (#20549) 2025-04-03 10:39:47 +05:30
test_topology_runner_restore.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_trino_complex_types.py Fix #20689: Trino Column validation errors for highly complex fields (#22421) 2025-07-28 11:11:44 +05:30
test_trino_connection_ssl_verify.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_ttl_cache.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_usage_filter.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_usage_log.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_user_agent.py fix(ometa): use requests instead of httpx for SSE transport (#28293) 2026-05-20 14:14:07 +02:00
test_version.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
test_workflow_parse.py chore(ingestion): migrate to ruff for format + isort + unused-import (#27739) 2026-04-27 10:05:28 +02:00
test_workflow_parse_example_config.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00