OpenMetadata/docs/csv-relation-types-plan.md
Sriharsha Chintalapani 6d99ba2dc0
Glossary relations (#25886)
2026-03-18 10:51:03 +05:30


CSV Import/Export Enhancement for Glossary Term Relations

Problem Statement

Currently, the glossary CSV import/export only captures related term FQNs without the relation type:

  • Export: Only exports FQNs like Glossary.Term1;Glossary.Term2
  • Import: Hardcodes all relations to "relatedTo"

This causes data loss when:

  1. A term has synonym, broader, narrower, or custom relation types
  2. CSV is exported and re-imported - all relation types become "relatedTo"

Proposed Solution

New CSV Format

Format: relationType:termFQN pairs separated by semicolons

Examples:

# New format with relation types
relatedTerms
synonym:Finance.Revenue;broader:Finance.Income;narrower:Finance.Net Revenue

# Backward compatible - no prefix defaults to "relatedTo"
relatedTerms
Finance.Revenue;Finance.Income

# Mixed format (new and legacy)
relatedTerms
synonym:Finance.Revenue;Finance.Income;broader:Finance.Gross Income

Parsing Rules

  1. If a value contains : and the part before the first : is a valid relation type → use that relation type; the remainder is the term FQN
  2. If there is no : or the prefix is not a valid relation type → treat the whole value as the term FQN and default to "relatedTo"
  3. Valid relation types are the defaults listed below plus any custom types defined in glossaryTermRelationSettings
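
The rules above can be sketched in isolation as follows. This is a minimal illustration rather than the OpenMetadata implementation: the class RelationEntryParser and the record ParsedRelation are hypothetical names, the set of valid types is passed in by the caller, and the handling of an empty prefix (":Glossary.Term") follows the edge case listed later in this plan.

```java
import java.util.Set;

/** Illustrative only: applies the parsing rules to a single CSV entry. */
class RelationEntryParser {
  static final String DEFAULT_TYPE = "relatedTo";

  /** Hypothetical holder for one parsed entry. */
  record ParsedRelation(String relationType, String termFqn) {}

  static ParsedRelation parse(String entry, Set<String> validTypes) {
    String value = entry.trim();
    int colon = value.indexOf(':');
    if (colon == 0) {
      // Empty prefix: drop the leading colon and keep the default type.
      return new ParsedRelation(DEFAULT_TYPE, value.substring(1).trim());
    }
    if (colon > 0) {
      String prefix = value.substring(0, colon).trim();
      if (validTypes.contains(prefix)) {
        // Rule 1: known prefix, the remainder is the term FQN.
        return new ParsedRelation(prefix, value.substring(colon + 1).trim());
      }
    }
    // Rule 2: no colon or unknown prefix, the whole value is the FQN.
    return new ParsedRelation(DEFAULT_TYPE, value);
  }

  public static void main(String[] args) {
    Set<String> types = Set.of("relatedTo", "synonym", "broader", "narrower");
    System.out.println(parse("synonym:Finance.Revenue", types));
    System.out.println(parse("Finance.Income", types));
    System.out.println(parse("Database:Schema.Term", types));
  }
}
```

Only the first colon is considered, so an FQN that itself contains a colon is preserved whenever its leading segment is not a recognized relation type.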

Default Relation Types

| Relation Type | Description |
|---|---|
| relatedTo | Generic related term (default) |
| synonym | Equivalent term |
| broader | More general term |
| narrower | More specific term |
| antonym | Opposite meaning |
| partOf | Component of |
| hasPart | Contains |

Implementation Plan

Phase 1: Backend Changes

1.1 CsvUtil.java - Export Enhancement

File: openmetadata-service/src/main/java/org/openmetadata/csv/CsvUtil.java

Current (lines 253-263):

public static List<String> addTermRelations(
    List<String> csvRecord, List<TermRelation> termRelations) {
  csvRecord.add(
      nullOrEmpty(termRelations)
          ? null
          : termRelations.stream()
              .map(tr -> tr.getTerm().getFullyQualifiedName())
              .sorted()
              .collect(Collectors.joining(FIELD_SEPARATOR)));
  return csvRecord;
}

New:

public static List<String> addTermRelations(
    List<String> csvRecord, List<TermRelation> termRelations) {
  csvRecord.add(
      nullOrEmpty(termRelations)
          ? null
          : termRelations.stream()
              .map(tr -> {
                String relationType = tr.getRelationType();
                String fqn = tr.getTerm().getFullyQualifiedName();
                // Only include relation type prefix if not the default "relatedTo"
                if (relationType != null && !relationType.equals("relatedTo")) {
                  return relationType + ":" + fqn;
                }
                return fqn;
              })
              .sorted()
              .collect(Collectors.joining(FIELD_SEPARATOR)));
  return csvRecord;
}
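
To make the resulting field value concrete, here is a self-contained sketch of the same formatting logic. It assumes a hypothetical record Relation in place of TermRelation and hardcodes ";" as the separator, so it approximates the method above rather than reproducing it.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative only: formats relations the way the new addTermRelations() would. */
class RelationExportSketch {
  /** Hypothetical stand-in for TermRelation: a relation type and a term FQN. */
  record Relation(String relationType, String termFqn) {}

  static String toCsvField(List<Relation> relations) {
    if (relations == null || relations.isEmpty()) {
      return null;
    }
    return relations.stream()
        // Prefix only non-default relation types; "relatedTo" stays a bare FQN.
        .map(r -> r.relationType() == null || "relatedTo".equals(r.relationType())
            ? r.termFqn()
            : r.relationType() + ":" + r.termFqn())
        // Sorting happens after prefixing, so it orders the prefixed strings.
        .sorted()
        .collect(Collectors.joining(";"));
  }

  public static void main(String[] args) {
    List<Relation> relations = List.of(
        new Relation("synonym", "Finance.Revenue"),
        new Relation("relatedTo", "Finance.Income"),
        new Relation("broader", "Finance.Gross Income"));
    // Prints: Finance.Income;broader:Finance.Gross Income;synonym:Finance.Revenue
    System.out.println(toCsvField(relations));
  }
}
```

Because sorting is applied to the prefixed strings, entries are grouped by relation type rather than ordered by FQN alone, and bare FQNs that start with an uppercase letter sort ahead of the lowercase prefixes.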

1.2 GlossaryRepository.java - Import Enhancement

File: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/GlossaryRepository.java

Current (lines 315-327):

private List<TermRelation> getTermRelationsFromCsv(
    CSVPrinter printer, CSVRecord csvRecord, int fieldNumber) throws IOException {
  List<EntityReference> entityRefs =
      getEntityReferences(printer, csvRecord, fieldNumber, GLOSSARY_TERM);
  if (entityRefs == null) {
    return null;
  }
  List<TermRelation> termRelations = new ArrayList<>();
  for (EntityReference ref : entityRefs) {
    termRelations.add(new TermRelation().withTerm(ref).withRelationType("relatedTo"));
  }
  return termRelations;
}

New:

private static final Set<String> VALID_RELATION_TYPES = Set.of(
    "relatedTo", "synonym", "broader", "narrower", "antonym", "partOf", "hasPart"
);

private List<TermRelation> getTermRelationsFromCsv(
    CSVPrinter printer, CSVRecord csvRecord, int fieldNumber) throws IOException {
  String fieldValue = csvRecord.get(fieldNumber);
  if (nullOrEmpty(fieldValue)) {
    return null;
  }

  List<TermRelation> termRelations = new ArrayList<>();
  String[] entries = fieldValue.split(FIELD_SEPARATOR);

  for (String entry : entries) {
    String relationType = "relatedTo"; // Default
    String termFqn = entry.trim();
    if (termFqn.isEmpty()) {
      continue; // Skip blank entries, e.g. from a doubled separator
    }

    // Check for relationType:fqn format
    int colonIndex = termFqn.indexOf(':');
    if (colonIndex == 0) {
      // Empty relation type (":Glossary.Term"): drop the colon, keep the default
      termFqn = termFqn.substring(1).trim();
    } else if (colonIndex > 0) {
      String prefix = termFqn.substring(0, colonIndex).trim();
      String suffix = termFqn.substring(colonIndex + 1).trim();

      // Validate if prefix is a known relation type
      if (VALID_RELATION_TYPES.contains(prefix) || isCustomRelationType(prefix)) {
        relationType = prefix;
        termFqn = suffix;
      }
      // If prefix is not a valid relation type, treat entire string as FQN
      // (handles FQNs that contain colons like "Database:Schema.Table")
    }

    EntityReference termRef = getEntityReference(printer, csvRecord, GLOSSARY_TERM, termFqn);
    if (termRef != null) {
      termRelations.add(new TermRelation().withTerm(termRef).withRelationType(relationType));
    }
  }

  return termRelations.isEmpty() ? null : termRelations;
}

private boolean isCustomRelationType(String relationType) {
  // Check against glossaryTermRelationSettings for custom relation types
  try {
    // Fetch from settings cache or use default list
    return false; // Implement based on settings lookup
  } catch (Exception e) {
    return false;
  }
}
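
The stub above leaves the settings lookup open. One possible shape is sketched below under explicit assumptions: RelationTypeRegistry is a hypothetical class, and the Supplier stands in for whatever code reads custom type names from glossaryTermRelationSettings, since this plan does not specify that API.

```java
import java.util.Set;
import java.util.function.Supplier;

/** Illustrative only: validates relation types against defaults plus custom types. */
class RelationTypeRegistry {
  private static final Set<String> DEFAULT_TYPES =
      Set.of("relatedTo", "synonym", "broader", "narrower", "antonym", "partOf", "hasPart");

  // Hypothetical hook: supplies custom relation type names from settings.
  private final Supplier<Set<String>> customTypes;

  RelationTypeRegistry(Supplier<Set<String>> customTypes) {
    this.customTypes = customTypes;
  }

  boolean isValid(String relationType) {
    if (relationType == null || relationType.isBlank()) {
      return false;
    }
    if (DEFAULT_TYPES.contains(relationType)) {
      return true;
    }
    try {
      Set<String> custom = customTypes.get();
      return custom != null && custom.contains(relationType);
    } catch (RuntimeException e) {
      // Settings lookup failed: only the default types are accepted.
      return false;
    }
  }

  public static void main(String[] args) {
    RelationTypeRegistry registry = new RelationTypeRegistry(() -> Set.of("derivedFrom"));
    System.out.println(registry.isValid("synonym"));
    System.out.println(registry.isValid("derivedFrom"));
    System.out.println(registry.isValid("Database"));
  }
}
```

Injecting the lookup keeps the parser testable without a running settings store, and a failed lookup degrades to the default types instead of aborting the import.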

1.3 Documentation Update

File: openmetadata-service/src/main/resources/json/data/glossary/glossaryCsvDocumentation.json

Update the relatedTerms field documentation:

{
  "name": "relatedTerms",
  "required": false,
  "description": "Related glossary terms with optional relation types. Format: 'relationType:FQN' or just 'FQN'. Multiple values separated by ';'. Valid relation types: relatedTo (default), synonym, broader, narrower, antonym, partOf, hasPart. Example: 'synonym:Glossary.Term1;broader:Glossary.Term2;Glossary.Term3'",
  "examples": [
    "Glossary.Term1;Glossary.Term2",
    "synonym:Glossary.Term1;broader:Glossary.Term2",
    "synonym:Glossary.Revenue;Glossary.Income;narrower:Glossary.Net Revenue"
  ]
}

Phase 2: Testing

2.1 Unit Tests

File: openmetadata-service/src/test/java/org/openmetadata/csv/CsvUtilTest.java

@Test
void testAddTermRelationsWithRelationType() {
  // Test that relation types are included in export
}

@Test
void testAddTermRelationsDefaultRelationType() {
  // Test that "relatedTo" terms don't include prefix
}

2.2 Integration Tests

File: openmetadata-service/src/test/java/org/openmetadata/service/resources/glossary/GlossaryTermResourceTest.java

@Test
void testGlossaryTermCsvImportWithRelationTypes() {
  // Test importing CSV with relation type prefixes
}

@Test
void testGlossaryTermCsvExportWithRelationTypes() {
  // Test exporting terms with various relation types
}

@Test
void testGlossaryTermCsvBackwardCompatibility() {
  // Test importing old format CSV (no relation types)
}

@Test
void testGlossaryTermCsvRoundTripWithRelationTypes() {
  // Test that export -> import preserves relation types
}

Phase 3: Edge Cases

  1. FQN contains colon: Handle cases like Database:Schema.Term by validating the prefix against known relation types
  2. Invalid relation type: If prefix is not a valid relation type, treat entire string as FQN with default relatedTo
  3. Empty relation type: ":Glossary.Term" should default to relatedTo
  4. Custom relation types: Check against glossaryTermRelationSettings for user-defined relation types

Backward Compatibility

| CSV Format | Import Behavior |
|---|---|
| Glossary.Term1;Glossary.Term2 | All relations → relatedTo |
| synonym:Glossary.Term1;Glossary.Term2 | First → synonym, Second → relatedTo |
| synonym:Glossary.Term1;broader:Glossary.Term2 | Preserves both relation types |

Files to Modify

| File | Change |
|---|---|
| CsvUtil.java | Update addTermRelations() to include relation type prefix |
| GlossaryRepository.java | Update getTermRelationsFromCsv() to parse relation types |
| glossaryCsvDocumentation.json | Update field documentation and examples |
| GlossaryTermResourceTest.java | Add integration tests for the new format |
| CsvUtilTest.java | Add unit tests for relation type export in addTermRelations() |

Migration Notes

  • No database migration needed: The database already stores relation types correctly
  • Existing CSVs: Will continue to work (all imported as relatedTo)
  • New exports: Will include relation type prefixes for non-default relations

Summary

This enhancement:

  1. Preserves relation types during CSV export/import
  2. Maintains backward compatibility with existing CSVs
  3. Defaults to relatedTo when no relation type specified
  4. Follows existing OpenMetadata CSV patterns (type:value)
  5. Supports custom relation types via settings