mirror of https://github.com/open-metadata/OpenMetadata synced 2026-05-24 09:39:11 +00:00

Perf Tests

This directory contains locally runnable performance benchmarks for OpenMetadata.

Lineage Benchmark

Use benchmark_lineage.py to:

discover assets that have lineage across multiple entity types
benchmark graph lineage APIs
benchmark Impact Analysis table APIs
benchmark Impact Analysis column-mode APIs for tables
optionally capture Docker container stats snapshots before and after the run

The script uses only Python 3 standard library modules and writes a JSON report, a Markdown summary, and CSV outputs under perf-tests/results/.

Prerequisites

Python 3.9+
A running OpenMetadata instance
A valid JWT or personal access token
Optional: Docker CLI if you want container stats snapshots

Recommended Local Docker Resources

For larger lineage graphs, increase local Docker memory and CPU before running the benchmark. The exact values depend on the data volume, but a higher-memory setup helps keep the Elasticsearch and OpenMetadata JVMs from being throttled during larger Impact Analysis runs.

For local Docker runs, the development Docker Compose setup honors both of the following environment variables:

OPENMETADATA_HEAP_OPTS
ES_JAVA_OPTS

Example:

export OPENMETADATA_HEAP_OPTS='-Xmx4G -Xms4G'
export ES_JAVA_OPTS='-Xms4g -Xmx4g'
./docker/run_local_docker.sh -m ui -d mysql -s false -i false -r true

Basic Usage

OPENMETADATA_JWT_TOKEN="<token>" \
./perf-tests/benchmark_lineage.py \
  --base-url http://localhost:8585 \
  --warmup-runs 1 \
  --measured-runs 5

Useful Options

./perf-tests/benchmark_lineage.py --help

Common options:

--search-indexes table,topic,dashboard,pipeline,mlmodel,container,searchIndex,dashboardDataModel,storedProcedure,apiEndpoint,metric,chart
--benchmark-depth 2
--impact-page-size 100
--max-assets-per-type 10
--entities-file perf-tests/my-assets.json
--discovery-only
--docker-containers openmetadata_server,openmetadata_elasticsearch

Example: Benchmark Specific Assets

Create a JSON file with explicit assets:

[
  { "fqn": "sample_data.ecommerce_db.shopify.orders", "entityType": "table" },
  { "fqn": "sample_kafka.shopify.order_topic", "entityType": "topic" }
]

Then run:

OPENMETADATA_JWT_TOKEN="<token>" \
./perf-tests/benchmark_lineage.py \
  --base-url http://localhost:8585 \
  --entities-file perf-tests/my-assets.json
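
For longer asset lists, the file can also be generated rather than written by hand. The following is a minimal sketch, assuming only the two keys shown above (fqn and entityType); the helper name write_entities_file is hypothetical and not part of benchmark_lineage.py.

```python
import json
from pathlib import Path


def write_entities_file(path, assets):
    """Write (fqn, entityType) pairs as a JSON list in the shape shown above.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    entries = [
        {"fqn": fqn, "entityType": entity_type}
        for fqn, entity_type in assets
    ]
    out = Path(path)
    # Create the parent directory if it does not exist yet.
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(entries, indent=2) + "\n", encoding="utf-8")
    return entries
```

For example, write_entities_file("perf-tests/my-assets.json", [("sample_data.ecommerce_db.shopify.orders", "table"), ("sample_kafka.shopify.order_topic", "topic")]) reproduces the file above.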

Outputs

Each run creates a timestamped directory under perf-tests/results/, including:

assets.json: discovered or supplied assets and lineage counts
results.json: raw per-scenario benchmark results
summary.md: human-readable report
scenario_summary.csv: rollup per scenario
asset_results.csv: rollup per asset and scenario
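
To inspect the most recent run without opening files by hand, something like the sketch below may help. It makes no assumption about the CSV column names (it returns whatever header the script wrote) and picks the newest directory by modification time rather than by parsing the timestamp in the directory name, whose exact format is not documented here. Both function names are hypothetical.

```python
import csv
from pathlib import Path


def latest_run_dir(results_root, marker="scenario_summary.csv"):
    """Return the most recently modified run directory that contains `marker`.

    Directories without the marker file (for example seeder output
    directories) are skipped. Returns None when nothing matches.
    """
    root = Path(results_root)
    if not root.is_dir():
        return None
    run_dirs = [p for p in root.iterdir() if p.is_dir() and (p / marker).is_file()]
    return max(run_dirs, key=lambda p: p.stat().st_mtime, default=None)


def read_csv_rows(csv_path):
    """Read a CSV file into a list of dicts keyed by its own header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

A typical call would be latest_run_dir("perf-tests/results") followed by read_csv_rows(run_dir / "scenario_summary.csv").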

Notes

The script does not create lineage data. It benchmarks whatever lineage is already present in the target environment.
Impact Analysis column-mode benchmarks are only executed for table assets.
getPaginationInfo is used during discovery to identify assets that actually have lineage.

Synthetic Lineage Seeding For Live Docker

Use seed_lineage_topology.py to create a synthetic table-lineage graph directly in a running OpenMetadata instance. This is the recommended path when you want to benchmark the branch already deployed in local Docker instead of the heavier Testcontainers-based integration benchmark.

The seeder creates:

a synthetic Postgres service, database, and schema
one root table
depth * width downstream tables
column lineage on every edge
one classification tag and one glossary term on the benchmark column
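
As a quick sizing check, the table count follows directly from the list above: one root plus depth * width downstream tables. The sketch below only does that arithmetic; the function name is hypothetical, and it deliberately does not estimate the number of edges, since the seeder's exact wiring is not described here.

```python
def seeded_table_count(depth, width):
    """Total tables the seeder creates: one root plus depth * width downstream."""
    if depth < 0 or width < 0:
        raise ValueError("depth and width must be non-negative")
    return 1 + depth * width


# The 12x120 topology: 12 * 120 = 1440 downstream tables plus the root.
print(seeded_table_count(12, 120))  # 1441
```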

Example: Seed the 12x120 Topology

OPENMETADATA_JWT_TOKEN="<token>" \
./perf-tests/seed_lineage_topology.py \
  --base-url http://localhost:8585 \
  --depth 12 \
  --width 120 \
  --output-dir perf-tests/results/seed-depth12-width120

Outputs:

manifest.json: root asset manifest compatible with benchmark_lineage.py
topology.json: created entity details plus the glossary term FQN

Example: Benchmark the Seeded 12x120 Topology

OPENMETADATA_JWT_TOKEN="<token>" \
./perf-tests/benchmark_lineage.py \
  --base-url http://localhost:8585 \
  --entities-file perf-tests/results/seed-depth12-width120/manifest.json \
  --benchmark-depth 13 \
  --impact-page-size 100 \
  --warmup-runs 1 \
  --measured-runs 5 \
  --docker-containers openmetadata_server,openmetadata_elasticsearch,openmetadata_mysql

For these seeded topologies, pass --benchmark-depth as the seeded depth plus one (13 for the depth-12 example above) when you want getPaginationInfo to include the deepest downstream layer from the root. The current pagination endpoint on the live stack requires one extra requested depth to surface the full seeded depth.
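
When scripting several seeded runs, the depth-plus-one rule is easy to forget. The sketch below builds the argument list for benchmark_lineage.py from the seeded depth, using only flags shown in this README; the function name and its defaults are assumptions for illustration.

```python
def benchmark_args(base_url, manifest_path, seeded_depth,
                   page_size=100, warmup_runs=1, measured_runs=5):
    """Build an argv list for benchmark_lineage.py against a seeded topology.

    --benchmark-depth is set to seeded_depth + 1 so that the deepest
    downstream layer is included, as described above.
    """
    if seeded_depth < 1:
        raise ValueError("seeded_depth must be at least 1")
    return [
        "./perf-tests/benchmark_lineage.py",
        "--base-url", base_url,
        "--entities-file", manifest_path,
        "--benchmark-depth", str(seeded_depth + 1),
        "--impact-page-size", str(page_size),
        "--warmup-runs", str(warmup_runs),
        "--measured-runs", str(measured_runs),
    ]
```

The resulting list can be passed to subprocess.run with OPENMETADATA_JWT_TOKEN set in the environment, for example subprocess.run(args, env={**os.environ, "OPENMETADATA_JWT_TOKEN": token}, check=True).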

For targeted filter runs, reuse the same seeded manifest and pass:

--query-filter for structural or node-level table filtering
--column-filter for Impact Analysis column filtering

Synthetic Scale Benchmark

For controlled deep or wide Impact Analysis topologies, use the manual integration benchmark:

LineageImpactAnalysisBenchmarkIT.java

This benchmark provisions its own MySQL, Elasticsearch, and OpenMetadata test environment with Testcontainers, creates synthetic table lineage, and logs latency plus duplicate-count observations for:

table view without filters
table view with a structural filter
table view with a node-level filter
column view with a name filter
column view with a tag and glossary filter

Run a Single Scenario

The benchmark supports selecting scenarios with system properties:

mvn -pl openmetadata-integration-tests -P mysql-elasticsearch \
  -Dit.test=LineageImpactAnalysisBenchmarkIT \
  '-Djunit.jupiter.conditions.deactivate=*' \
  -Dlineage.benchmark.scenarios=depth12-width120 \
  -Dlineage.benchmark.warmupRuns=1 \
  -Dlineage.benchmark.measuredRuns=3 \
  -DfailIfNoTests=false \
  verify

Available scenario names:

depth12-width120
depth12-width240
depth12-width600
depth24-width120
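
The scenario names encode their own dimensions. A small sketch for pulling the depth and width out of a name, for example to label results, is shown below. It assumes only the depthN-widthM naming pattern visible in the list above; the function name is hypothetical, and nothing here inspects the Java benchmark itself.

```python
import re

# Pattern inferred from the scenario names listed above.
_SCENARIO_PATTERN = re.compile(r"^depth(\d+)-width(\d+)$")


def parse_scenario(name):
    """Return (depth, width) for a name such as 'depth12-width120'."""
    match = _SCENARIO_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized scenario name: {name!r}")
    return int(match.group(1)), int(match.group(2))


for scenario in ("depth12-width120", "depth12-width240",
                 "depth12-width600", "depth24-width120"):
    depth, width = parse_scenario(scenario)
    print(f"{scenario}: depth={depth}, width={width}")
```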

This path is heavier than the Python benchmark because it creates the topology before measuring it. Increase Docker memory and CPU before running the larger scenarios.