OpenMetadata/ingestion
Mayur Singal 9921dc1389
Fixes #28245: ingest valueless Databricks/Unity Catalog tags (#28294)
* Fixes #28245: ingest valueless Databricks/Unity Catalog tags

Databricks/Unity Catalog exposes system-generated (and some user-defined)
tags as (tag_name, tag_value=null). The connectors mapped tag_name ->
Classification and tag_value -> Tag, so an empty tag_value was either
skipped (Unity Catalog) or coerced to a "NONE" sentinel (Databricks).

When tag_value is empty, fall back to a dedicated per-connector
classification (DATABRICKS_TAGS / UNITY_CATALOG_TAGS) and use tag_name
verbatim as the tag under it (no dot-splitting). Valued tags are
unchanged: classification = tag_name, tag = tag_value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address review: harden valueless-tag mapping

- Treat whitespace-only tag_value as valueless (strip-based check) so it
  falls back to the *_TAGS classification instead of being silently
  dropped downstream by get_ometa_tag_and_classification.
- Skip rows with empty/None tag_name in the Databricks connector, for
  parity with Unity Catalog, so an empty classification name is never
  sent to the API.
- Add tests for whitespace-only tag_value (both connectors) and the
  empty tag_name skip (Databricks).
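
A minimal sketch of the mapping described above, not the connectors' actual code. The function name `map_tag`, its signature, and the two constant names are hypothetical; only the classification names DATABRICKS_TAGS and UNITY_CATALOG_TAGS come from this commit message.

```python
from typing import Optional, Tuple

# Hypothetical constants; the commit message only names the classifications.
DATABRICKS_FALLBACK = "DATABRICKS_TAGS"
UNITY_CATALOG_FALLBACK = "UNITY_CATALOG_TAGS"


def map_tag(
    tag_name: Optional[str],
    tag_value: Optional[str],
    fallback_classification: str,
) -> Optional[Tuple[str, str]]:
    """Return (classification, tag) for one source row, or None to skip it."""
    # Rows with an empty/None tag_name are skipped so that an empty
    # classification name is never sent to the API.
    if not tag_name:
        return None

    # A None, empty, or whitespace-only tag_value counts as valueless:
    # fall back to the per-connector classification and use tag_name
    # verbatim as the tag (no dot-splitting).
    if tag_value is None or not tag_value.strip():
        return fallback_classification, tag_name

    # Valued tags are unchanged: classification = tag_name, tag = tag_value.
    return tag_name, tag_value
```

Under these assumptions, `map_tag("system.certified", None, UNITY_CATALOG_FALLBACK)` keeps the dotted name intact as the tag under UNITY_CATALOG_TAGS, while `map_tag("pii", "email", DATABRICKS_FALLBACK)` still maps to classification `pii` and tag `email`.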

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 19:41:03 +05:30
.basedpyright ingestion: runtime diagnostics subsystem (#28161) 2026-05-20 16:38:09 -07:00
docs/design ingestion: runtime diagnostics subsystem (#28161) 2026-05-20 16:38:09 -07:00
examples chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
operators fix(docker): mirror IBM iAccess driver on collate CDN (#28097) 2026-05-14 10:47:02 +02:00
pipelines Openlineage: Added Kinesis Support #24752 (#26050) 2026-02-26 14:20:46 +05:30
src Fixes #28245: ingest valueless Databricks/Unity Catalog tags (#28294) 2026-05-21 19:41:03 +05:30
stubs Refactor(ingestion): introduce ClassifiableEntityAdapter to eliminate scattered isinstance checks (#27716) 2026-05-13 09:37:58 +02:00
tests Fixes #28245: ingest valueless Databricks/Unity Catalog tags (#28294) 2026-05-21 19:41:03 +05:30
__init__.py ci/nox-setup-testing (#21377) 2025-05-27 10:56:52 +02:00
airflow-constraints-3.1.7.txt Address Transitive vulnerabilities (#28169) 2026-05-16 00:02:49 -07:00
airflow-constraints-3.2.1.txt fix(security): upgrade Apache Airflow to 3.2.1 (#28101) 2026-05-21 17:18:55 +05:30
Dockerfile fix(security): upgrade Apache Airflow to 3.2.1 (#28101) 2026-05-21 17:18:55 +05:30
Dockerfile.ci fix(security): upgrade Apache Airflow to 3.2.1 (#28101) 2026-05-21 17:18:55 +05:30
ingestion_dependency.sh Fix #23096: Add Airflow 3.x support (#24338) 2025-11-21 12:28:28 +01:00
LICENSE Docs - Ingestion License (#17893) 2024-09-17 08:58:53 -07:00
Makefile fix(security): upgrade Apache Airflow to 3.2.1 (#28101) 2026-05-21 17:18:55 +05:30
noxfile.py chore(ingestion): drop pylint, expand ruff (#27774) 2026-04-28 07:21:59 +02:00
pyproject.toml Refactor(ingestion): introduce ClassifiableEntityAdapter to eliminate scattered isinstance checks (#27716) 2026-05-13 09:37:58 +02:00
README.md Refactor: remove doc changes from OM repo (#22019) 2025-08-20 14:28:48 +05:30
setup.py fix(ingestion): cap kubernetes client below 36.0.0 (#28331) 2026-05-21 14:45:26 +02:00
sonar-project.properties MINOR: Fix sonar coverage (#25276) 2026-01-16 08:35:00 +01:00

This guide will help you set up the Ingestion framework and connectors.

Python version 3.9+

OpenMetadata Ingestion is a simple framework for building connectors and ingesting metadata from various systems through the OpenMetadata APIs. It can be used within an orchestration framework (e.g. Apache Airflow) to ingest metadata.

Prerequisites

  • Python >= 3.9.x

Docs

Please refer to the documentation here: https://docs.open-metadata.org/connectors

TopologyRunner

All the Ingestion Workflows run through the TopologyRunner.

The flow is depicted in the images below.

TopologyRunner Standard Flow

[Image: TopologyRunner Standard Flow]

TopologyRunner Multithread Flow

[Image: TopologyRunner Multithread Flow]