Perf Tests
This directory contains locally runnable performance benchmarks for OpenMetadata.
Lineage Benchmark
Use benchmark_lineage.py to:
- discover assets that have lineage across multiple entity types
- benchmark graph lineage APIs
- benchmark Impact Analysis table APIs
- benchmark Impact Analysis column-mode APIs for tables
- optionally capture Docker container stats snapshots before and after the run
The script uses only Python 3 standard library modules and writes a JSON report,
a Markdown summary, and CSV outputs under perf-tests/results/.
Prerequisites
- Python 3.9+
- A running OpenMetadata instance
- A valid JWT or personal access token
- Optional: Docker CLI if you want container stats snapshots
Recommended Local Docker Resources
For larger lineage graphs, increase local Docker memory and CPU before running the benchmark. The exact values depend on the data volume, but a higher-memory setup helps avoid Elasticsearch and OpenMetadata JVM throttling during larger Impact Analysis runs.
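For local Docker runs, the development Docker Compose setup honors both of these environment variables:
- OPENMETADATA_HEAP_OPTS (JVM heap for the OpenMetadata server)
- ES_JAVA_OPTS (JVM options for Elasticsearch)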
Example:
export OPENMETADATA_HEAP_OPTS='-Xmx4G -Xms4G'
export ES_JAVA_OPTS='-Xms4g -Xmx4g'
./docker/run_local_docker.sh -m ui -d mysql -s false -i false -r true
Basic Usage
OPENMETADATA_JWT_TOKEN="<token>" \
./perf-tests/benchmark_lineage.py \
--base-url http://localhost:8585 \
--warmup-runs 1 \
--measured-runs 5
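The --warmup-runs and --measured-runs flags follow the usual benchmarking pattern: warmup calls are executed but not timed, so JVM and cache warm-up does not skew the measured samples. The sketch below illustrates that pattern with a generic callable. It is not taken from benchmark_lineage.py, and the reported statistics (median and nearest-rank p95) are an assumption about what is useful, not the script's actual report format.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def time_scenario(call: Callable[[], object], warmup_runs: int, measured_runs: int) -> Dict[str, float]:
    """Call `call` warmup_runs times untimed, then measured_runs times timed (in ms)."""
    if measured_runs < 1:
        raise ValueError("measured_runs must be at least 1")
    for _ in range(warmup_runs):
        call()  # warm caches and the JVM; these timings are discarded
    samples: List[float] = []
    for _ in range(measured_runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    rank = max(1, math.ceil(0.95 * len(samples)))  # nearest-rank p95
    return {
        "runs": float(len(samples)),
        "min_ms": samples[0],
        "median_ms": statistics.median(samples),
        "p95_ms": samples[rank - 1],
        "max_ms": samples[-1],
    }
```

For example, time_scenario(lambda: urllib.request.urlopen(req).read(), 1, 5) mirrors --warmup-runs 1 --measured-runs 5, where req is a request you have built yourself for a lineage endpoint.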
Useful Options
./perf-tests/benchmark_lineage.py --help
Common options:
--search-indexes table,topic,dashboard,pipeline,mlmodel,container,searchIndex,dashboardDataModel,storedProcedure,apiEndpoint,metric,chart
--benchmark-depth 2
--impact-page-size 100
--max-assets-per-type 10
--entities-file perf-tests/my-assets.json
--discovery-only
--docker-containers openmetadata-server,elasticsearch
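The --docker-containers option captures container stats snapshots before and after a run. One way to take such a snapshot with only the standard library is to call docker stats --no-stream with a JSON Go template, which prints one JSON object per container. This is a sketch; how benchmark_lineage.py actually collects and stores its snapshots may differ, and the container names in the test are taken from the README's own examples.

```python
import json
import subprocess
from typing import Dict, List


def docker_stats_command(containers: List[str]) -> List[str]:
    # --no-stream prints a single snapshot instead of a live view;
    # the "{{json .}}" template emits one JSON object per container per line.
    return ["docker", "stats", "--no-stream", "--format", "{{json .}}", *containers]


def parse_docker_stats(output: str) -> Dict[str, dict]:
    """Map container name -> raw stats fields (CPUPerc, MemUsage, ...)."""
    stats: Dict[str, dict] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        stats[row.get("Name", row.get("Container", "?"))] = row
    return stats


def snapshot(containers: List[str]) -> Dict[str, dict]:
    """Run docker stats once and parse its output (requires the Docker CLI)."""
    result = subprocess.run(
        docker_stats_command(containers), capture_output=True, text=True, check=True
    )
    return parse_docker_stats(result.stdout)
```

Calling snapshot() once before and once after the measured runs gives a rough before/after comparison of CPU and memory usage.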
Example: Benchmark Specific Assets
Create a JSON file with explicit assets:
[
{ "fqn": "sample_data.ecommerce_db.shopify.orders", "entityType": "table" },
{ "fqn": "sample_kafka.shopify.order_topic", "entityType": "topic" }
]
Then run:
OPENMETADATA_JWT_TOKEN="<token>" \
./perf-tests/benchmark_lineage.py \
--base-url http://localhost:8585 \
--entities-file perf-tests/my-assets.json
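When the asset list is long, the entities file can be generated instead of written by hand. The hypothetical helper below writes the same shape as the JSON example above (a list of objects with fqn and entityType keys), rejects incomplete entries, and drops duplicates so each asset is benchmarked once. Where the (fqn, entityType) pairs come from is up to you.

```python
import json
from pathlib import Path
from typing import Iterable, Tuple


def write_entities_file(path: str, assets: Iterable[Tuple[str, str]]) -> int:
    """Write [{"fqn": ..., "entityType": ...}, ...] to `path`; return the entry count."""
    entries = []
    seen = set()
    for fqn, entity_type in assets:
        if not fqn or not entity_type:
            raise ValueError(f"incomplete asset: {(fqn, entity_type)!r}")
        if (fqn, entity_type) in seen:
            continue  # skip duplicates so each asset is benchmarked once
        seen.add((fqn, entity_type))
        entries.append({"fqn": fqn, "entityType": entity_type})
    Path(path).write_text(json.dumps(entries, indent=2) + "\n", encoding="utf-8")
    return len(entries)
```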
Outputs
Each run creates a timestamped directory under perf-tests/results/, including:
- assets.json: discovered or supplied assets and their lineage counts
- results.json: raw per-scenario benchmark results
- summary.md: human-readable report
- scenario_summary.csv: rollup per scenario
- asset_results.csv: rollup per asset and scenario
Notes
- The script does not create lineage data. It benchmarks whatever lineage is already present in the target environment.
- Impact Analysis column-mode benchmarks are only executed for table assets.
- getPaginationInfo is used during discovery to identify assets that actually have lineage.
Synthetic Lineage Seeding For Live Docker
Use seed_lineage_topology.py to create a synthetic table-lineage graph directly in a running OpenMetadata instance. This is the recommended path when you want to benchmark the branch already deployed in local Docker instead of the heavier Testcontainers-based integration benchmark.
The seeder creates:
- a synthetic Postgres service, database, and schema
- one root table
- depth * width downstream tables
- column lineage on every edge
- one classification tag and one glossary term on the benchmark column
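The list above gives the totals but not how tables are wired between layers. The sketch below assumes one simple wiring (the root fans out to width tables at level 1, followed by width parallel chains) purely to make the arithmetic concrete: the 12x120 topology has 12 * 120 = 1440 downstream tables plus the root, and the deepest table is 12 hops from the root. The table names (d3_t7 and so on) are hypothetical, and seed_lineage_topology.py may connect layers differently.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def layered_topology(depth: int, width: int) -> Tuple[List[str], List[Edge]]:
    """Hypothetical wiring: root -> `width` tables at level 1, then `width` parallel chains."""
    if depth < 1 or width < 1:
        raise ValueError("depth and width must be positive")
    nodes = ["root"]
    edges: List[Edge] = []
    for level in range(1, depth + 1):
        for i in range(width):
            name = f"d{level}_t{i}"
            nodes.append(name)
            upstream = "root" if level == 1 else f"d{level - 1}_t{i}"
            edges.append((upstream, name))
    return nodes, edges


def downstream_depths(edges: List[Edge], root: str = "root") -> Dict[str, int]:
    """Breadth-first search from the root; returns the hop count of each reachable node."""
    children: Dict[str, List[str]] = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)
    depths = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in depths:
                depths[child] = depths[node] + 1
                queue.append(child)
    return depths
```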
Example: Seed the 12x120 Topology
OPENMETADATA_JWT_TOKEN="<token>" \
./perf-tests/seed_lineage_topology.py \
--base-url http://localhost:8585 \
--depth 12 \
--width 120 \
--output-dir perf-tests/results/seed-depth12-width120
Outputs:
- manifest.json: root asset manifest compatible with benchmark_lineage.py
- topology.json: created entity details plus the glossary term FQN
Example: Benchmark the Seeded 12x120 Topology
OPENMETADATA_JWT_TOKEN="<token>" \
./perf-tests/benchmark_lineage.py \
--base-url http://localhost:8585 \
--entities-file perf-tests/results/seed-depth12-width120/manifest.json \
--benchmark-depth 13 \
--impact-page-size 100 \
--warmup-runs 1 \
--measured-runs 5 \
--docker-containers openmetadata_server,openmetadata_elasticsearch,openmetadata_mysql
For these seeded topologies, set --benchmark-depth to depth + 1 (13 for the 12x120 topology) so that getPaginationInfo includes the deepest downstream layer from the root. On the live stack, the pagination endpoint currently needs one extra level of requested depth to surface the full seeded depth.
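When benchmarking several seeded topologies, the depth + 1 rule is easy to get wrong by hand. The hypothetical helper below rebuilds the benchmark command from the 12x120 example for any depth and width. The manifest path assumes you seeded with --output-dir perf-tests/results/seed-depth{depth}-width{width}, following the naming used in that example; adjust it if you chose a different directory.

```python
from typing import List


def seeded_benchmark_args(depth: int, width: int, base_url: str = "http://localhost:8585") -> List[str]:
    """Build the benchmark_lineage.py argv for a topology seeded with the given depth and width."""
    manifest = f"perf-tests/results/seed-depth{depth}-width{width}/manifest.json"
    return [
        "./perf-tests/benchmark_lineage.py",
        "--base-url", base_url,
        "--entities-file", manifest,
        # one extra level so getPaginationInfo reaches the deepest seeded layer
        "--benchmark-depth", str(depth + 1),
        "--impact-page-size", "100",
        "--warmup-runs", "1",
        "--measured-runs", "5",
    ]
```

The resulting list can be passed directly to subprocess.run (with OPENMETADATA_JWT_TOKEN set in its env), or printed as a shell command with shlex.join.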
For targeted filter runs, reuse the same seeded manifest and pass:
- --query-filter for structural or node-level table filtering
- --column-filter for Impact Analysis column filtering
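Filter values are typically JSON, which is awkward to quote on a shell command line. The sketch below assumes --query-filter accepts an Elasticsearch-style bool query serialized as JSON; that format, and the field names used in the test (tier.tagFQN, owners.name), are assumptions, so check them against the script's --help output and your search index mapping. It builds OR-within-a-field, AND-across-fields semantics and quotes the result safely with shlex.quote.

```python
import json
import shlex
from typing import Dict, List


def term_query_filter(terms: Dict[str, List[str]]) -> str:
    """Serialize a bool query: any value may match within a field, every field must match."""
    must = []
    for field, values in terms.items():
        should = [{"term": {field: value}} for value in values]
        must.append({"bool": {"should": should, "minimum_should_match": 1}})
    return json.dumps({"query": {"bool": {"must": must}}}, separators=(",", ":"))


def query_filter_flag(terms: Dict[str, List[str]]) -> str:
    """Return a shell-safe `--query-filter '<json>'` fragment."""
    return "--query-filter " + shlex.quote(term_query_filter(terms))
```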
Synthetic Scale Benchmark
For controlled deep or wide Impact Analysis topologies, use the manual integration benchmark:
LineageImpactAnalysisBenchmarkIT.java
This benchmark provisions its own MySQL, Elasticsearch, and OpenMetadata test environment with Testcontainers, creates synthetic table lineage, and logs latency plus duplicate-count observations for:
- table view without filters
- table view with a structural filter
- table view with a node-level filter
- column view with a name filter
- column view with a tag and glossary filter
Run a Single Scenario
The benchmark supports selecting scenarios with system properties:
mvn -pl openmetadata-integration-tests -P mysql-elasticsearch \
-Dit.test=LineageImpactAnalysisBenchmarkIT \
'-Djunit.jupiter.conditions.deactivate=*' \
-Dlineage.benchmark.scenarios=depth12-width120 \
-Dlineage.benchmark.warmupRuns=1 \
-Dlineage.benchmark.measuredRuns=3 \
-DfailIfNoTests=false \
verify
Available scenario names:
- depth12-width120
- depth12-width240
- depth12-width600
- depth24-width120
This path is heavier than the Python benchmark because it creates the topology before measuring it. Increase Docker memory and CPU before running the larger scenarios.
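The scenario names appear to encode a depth and a width. Assuming they follow the same depth * width convention as the seeder's downstream tables (an assumption; the integration test's topology builder is not described here), the hypothetical helpers below parse a name and estimate its size, which helps decide how much Docker headroom a scenario needs before passing it to -Dlineage.benchmark.scenarios.

```python
import re
from typing import Tuple

SCENARIO_NAME = re.compile(r"^depth(\d+)-width(\d+)$")


def parse_scenario(name: str) -> Tuple[int, int]:
    """Split a name like 'depth12-width120' into (depth, width)."""
    match = SCENARIO_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized scenario name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def downstream_table_count(name: str) -> int:
    """Downstream tables under the assumed depth * width convention."""
    depth, width = parse_scenario(name)
    return depth * width
```

Under that assumption, depth12-width600 is the largest listed scenario at 7200 downstream tables, while depth12-width240 and depth24-width120 are the same size (2880) but differ in shape.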