mirror of https://github.com/open-metadata/OpenMetadata synced 2026-05-24 09:39:11 +00:00

OpenMetadata Integration Tests

This module contains SDK-based integration tests that run against a real OpenMetadata server using Testcontainers. Tests execute in parallel and are isolated using TestNamespace.

Quick Start

# Run all tests with MySQL + Elasticsearch (default)
mvn test -pl :openmetadata-integration-tests

# Run with PostgreSQL + OpenSearch
mvn test -pl :openmetadata-integration-tests -Ppostgres-opensearch

# Run a specific test
mvn test -pl :openmetadata-integration-tests -Dtest="TableResourceIT"

Available Profiles

| Profile | Database | Search Engine |
| --- | --- | --- |
| `mysql-elasticsearch` (default) | MySQL 8.3.0 | Elasticsearch 8.11.4 |
| `postgres-opensearch` | PostgreSQL 15 | OpenSearch 2.19.0 |
| `postgres-elasticsearch` | PostgreSQL 15 | Elasticsearch 8.11.4 |
| `mysql-opensearch` | MySQL 8.3.0 | OpenSearch 2.19.0 |

Writing a New Integration Test

1. Create the Test Class

Extend BaseEntityIT for entity CRUD tests or create a standalone test class:

package org.openmetadata.it.tests;

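import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertNotNull;

import java.util.List;
import org.junit.jupiter.api.Test;
import org.openmetadata.it.util.SdkClients;
import org.openmetadata.it.util.TestNamespace;
import org.openmetadata.schema.api.data.CreateTable;
import org.openmetadata.schema.entity.data.Table;
import org.openmetadata.schema.type.Column;
import org.openmetadata.schema.type.ColumnDataType;
/* SharedEntities must be imported as well; its package is not shown in this README. */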

public class MyFeatureIT extends BaseEntityIT<Table, CreateTable> {

  @Override
  protected String getEntityType() {
    return "table";
  }

  @Override
  protected Table createEntity(CreateTable request) {
    return SdkClients.adminClient().tables().create(request);
  }

  @Override
  protected CreateTable createRequest(String name, TestNamespace ns) {
    return new CreateTable()
        .withName(name)
        .withDatabaseSchema(SharedEntities.getSchema().getFullyQualifiedName())
        .withColumns(List.of(new Column().withName("id").withDataType(ColumnDataType.INT)));
  }

  @Test
  void myCustomTest(TestNamespace ns) throws Exception {
    CreateTable request = createRequest(ns.prefix("myTable"), ns);
    Table table = createEntity(request);

    assertNotNull(table.getId());
    assertEquals(request.getName(), table.getName());
  }
}

2. Key Concepts

TestNamespace

Every test method receives a TestNamespace parameter that provides unique prefixes for entity names:

@Test
void myTest(TestNamespace ns) {
  String uniqueName = ns.prefix("myEntity");  // e.g., "abc123_myEntity"
}

This ensures tests don't conflict when running in parallel.
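
To make the isolation idea concrete, here is a minimal sketch of how a per-test prefix keeps names apart. `DemoNamespace` is a hypothetical stand-in written for this example, not the real `TestNamespace`; the real class may derive its prefix differently (the random token below is an assumption).

```java
import java.util.UUID;

/**
 * Hypothetical stand-in for TestNamespace, for illustration only.
 * Assumption: a short random token per instance is enough to keep names unique.
 */
final class DemoNamespace {
  private final String id;

  DemoNamespace() {
    /* First 8 hex characters of a random UUID act as the per-test token. */
    this.id = UUID.randomUUID().toString().substring(0, 8);
  }

  /** Returns the base name with this namespace's token in front of it. */
  String prefix(String name) {
    return id + "_" + name;
  }

  public static void main(String[] args) {
    DemoNamespace first = new DemoNamespace();
    DemoNamespace second = new DemoNamespace();
    /* Same base name, two different entity names. */
    System.out.println(first.prefix("myEntity"));
    System.out.println(second.prefix("myEntity"));
  }
}
```

Two tests that both ask for "myEntity" end up with different entity names, so they can create entities concurrently against the same server without colliding.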

SdkClients

Get pre-configured SDK clients for different users:

OpenMetadataClient adminClient = SdkClients.adminClient();
OpenMetadataClient user1Client = SdkClients.user1Client();
OpenMetadataClient botClient = SdkClients.botClient();

SharedEntities

Access pre-created entities for tests:

DatabaseService service = SharedEntities.getService();
Database database = SharedEntities.getDatabase();
DatabaseSchema schema = SharedEntities.getSchema();
User adminUser = SharedEntities.getAdminUser();

3. BaseEntityIT Features

When extending BaseEntityIT, you get these tests automatically:

| Test | Description |
| --- | --- |
| `post_entityCreate_200` | Create entity successfully |
| `get_entity_200_OK` | Get entity by ID |
| `get_entityByName_200` | Get entity by FQN |
| `get_entityNotFound_404` | Get non-existent entity |
| `put_entityCreate_200` | Create via PUT |
| `patch_entityAttributes_200` | Patch entity attributes |
| `delete_entityAsAdmin_200` | Delete entity |
| `get_entityListWithPagination_200` | List with pagination |
| `test_sdkCRUDOperations` | Full CRUD via SDK |

... and 30+ more

4. Controlling Test Behavior

Use flags to customize which inherited tests run:

public class MyEntityIT extends BaseEntityIT<MyEntity, CreateMyEntity> {
  {
    supportsPatch = true;          // Enable PATCH tests
    supportsTags = true;           // Enable tag tests
    supportsOwner = true;          // Enable owner tests
    supportsSearchIndex = true;    // Enable search tests
    supportsDomains = true;        // Enable domain tests
  }
}
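
The flag pattern above relies on a Java instance initializer block, which runs after the base class has set its defaults. The sketch below shows the mechanics with plain classes. `DemoBaseIT`, `DemoTagOnlyIT`, the default values, and `enabledGroups()` are all hypothetical; how the real `BaseEntityIT` consumes its flags is not shown in this README.

```java
import java.util.ArrayList;
import java.util.List;

final class FlagDemo {
  public static void main(String[] args) {
    /* Prints [crud, tags]: patch stays off because the subclass never enabled it. */
    System.out.println(new DemoTagOnlyIT().enabledGroups());
  }
}

/** Hypothetical base class; flag names mirror the README, defaults are assumed. */
abstract class DemoBaseIT {
  protected boolean supportsPatch = false;
  protected boolean supportsTags = false;

  /** Returns the inherited test groups that would run for this subclass. */
  List<String> enabledGroups() {
    List<String> groups = new ArrayList<>();
    groups.add("crud");
    if (supportsPatch) {
      groups.add("patch");
    }
    if (supportsTags) {
      groups.add("tags");
    }
    return groups;
  }
}

final class DemoTagOnlyIT extends DemoBaseIT {
  {
    /* Instance initializer: runs after the base-class field initializers. */
    supportsTags = true;
  }
}
```

Because the initializer runs after the superclass fields are set, the subclass value wins without needing a constructor.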

5. Best Practices

Use TestNamespace.prefix() for all entity names to ensure uniqueness
Don't clean up entities - TestNamespace isolation handles this
Use specific imports - No wildcard imports (import static ....*)
Keep tests independent - Don't rely on order of execution
Use Awaitility for async operations - Not Thread.sleep()

Awaitility.await()
    .atMost(Duration.ofSeconds(30))
    .pollInterval(Duration.ofMillis(500))
    .until(() -> someCondition());

Avoid single-line comments - Write self-documenting code
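
On the Awaitility recommendation: the advantage over a fixed `Thread.sleep()` is that polling returns as soon as the condition holds and fails with a clear timeout otherwise. The following is a plain-JDK sketch of that idea, not Awaitility's implementation; `PollingDemo` and `waitUntil` are hypothetical names used only here, and real tests should keep using Awaitility.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

final class PollingDemo {
  /**
   * Checks the condition repeatedly until it holds or the time budget is spent.
   * Returns true if the condition held, false on timeout or interruption.
   */
  static boolean waitUntil(BooleanSupplier condition, Duration atMost, Duration pollInterval) {
    long deadline = System.nanoTime() + atMost.toNanos();
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollInterval.toMillis());
      } catch (InterruptedException e) {
        /* Preserve the interrupt flag and stop waiting. */
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    AtomicInteger checks = new AtomicInteger();
    /* The condition becomes true on the third check, long before the 2 s limit. */
    boolean ready =
        waitUntil(() -> checks.incrementAndGet() >= 3, Duration.ofSeconds(2), Duration.ofMillis(10));
    System.out.println(ready + " after " + checks.get() + " checks");
  }
}
```

A fixed sleep would either wait the full budget every time or be too short on a slow CI runner; polling adapts to whichever case occurs.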

Project Structure

openmetadata-integration-tests/
├── src/test/java/org/openmetadata/it/
│   ├── auth/           # JWT token generation
│   ├── env/            # Test infrastructure (TestSuiteBootstrap)
│   ├── factories/      # Entity factory classes
│   ├── tests/          # Integration test classes
│   └── util/           # Utilities (SdkClients, TestNamespace)
└── src/test/resources/
    ├── openmetadata-secure-test.yaml  # Test config
    └── *.der                          # JWT keys

Test Infrastructure

Tests use TestSuiteBootstrap (a JUnit LauncherSessionListener) that:

Starts database container (MySQL or PostgreSQL)
Starts search container (Elasticsearch or OpenSearch)
Starts Fuseki SPARQL container (for RDF tests)
Starts the OpenMetadata application
Initializes SharedEntities

All containers are started once per test run and shared across all tests.
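
The "started once and shared" behaviour can be pictured as an idempotent start guard. The sketch below models it with strings instead of real containers. `SharedInfraDemo` is hypothetical, and treating the list above as the startup order is an assumption; the real `TestSuiteBootstrap` is a JUnit `LauncherSessionListener` and its internals are not shown here.

```java
import java.util.ArrayList;
import java.util.List;

final class SharedInfraDemo {
  private static final List<String> STARTED = new ArrayList<>();
  private static boolean initialized = false;

  /** Starts everything on the first call; later calls reuse what is already running. */
  static synchronized void ensureStarted() {
    if (initialized) {
      return;
    }
    /* Assumed order, following the list in this README. */
    STARTED.add("database");
    STARTED.add("search");
    STARTED.add("fuseki");
    STARTED.add("openmetadata");
    STARTED.add("shared-entities");
    initialized = true;
  }

  /** Returns an immutable snapshot of the services started so far. */
  static synchronized List<String> startedServices() {
    return List.copyOf(STARTED);
  }

  public static void main(String[] args) {
    ensureStarted();
    /* The second call is a no-op, so nothing is started twice. */
    ensureStarted();
    System.out.println(startedServices());
  }
}
```

Since every test shares the same server and database, name isolation through TestNamespace is what keeps parallel tests from interfering with each other.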

Running in CI

GitHub workflows run these tests on every PR:

integration-tests-mysql-elasticsearch.yml - MySQL + Elasticsearch
integration-tests-postgres-opensearch.yml - PostgreSQL + OpenSearch

Tests require the "safe to test" label on PRs.