# CSV Import/Export Enhancement for Glossary Term Relations ## Problem Statement Currently, the glossary CSV import/export only captures related term FQNs without the relation type: - **Export**: Only exports FQNs like `Glossary.Term1;Glossary.Term2` - **Import**: Hardcodes all relations to `"relatedTo"` This causes data loss when: 1. A term has `synonym`, `broader`, `narrower`, or custom relation types 2. CSV is exported and re-imported - all relation types become `"relatedTo"` ## Proposed Solution ### New CSV Format **Format**: `relationType:termFQN` pairs separated by semicolons **Examples**: ```csv # New format with relation types relatedTerms synonym:Finance.Revenue;broader:Finance.Income;narrower:Finance.Net Revenue # Backward compatible - no prefix defaults to "relatedTo" relatedTerms Finance.Revenue;Finance.Income # Mixed format (new and legacy) relatedTerms synonym:Finance.Revenue;Finance.Income;broader:Finance.Gross Income ``` ### Parsing Rules 1. If a value contains `:` and the part before `:` is a valid relation type → use that relation type 2. If no `:` or the prefix is not a valid relation type → default to `"relatedTo"` 3. Valid relation types are determined by checking `glossaryTermRelationSettings` or using defaults ### Default Relation Types | Relation Type | Description | |---------------|-------------| | `relatedTo` | Generic related term (default) | | `synonym` | Equivalent term | | `broader` | More general term | | `narrower` | More specific term | | `antonym` | Opposite meaning | | `partOf` | Component of | | `hasPart` | Contains | ## Implementation Plan ### Phase 1: Backend Changes #### 1.1 CsvUtil.java - Export Enhancement **File**: `openmetadata-service/src/main/java/org/openmetadata/csv/CsvUtil.java` **Current** (line 253-263): ```java public static List addTermRelations( List csvRecord, List termRelations) { csvRecord.add( nullOrEmpty(termRelations) ? null : termRelations.stream() .map(tr -> tr.getTerm().getFullyQualifiedName()) .sorted() .collect(Collectors.joining(FIELD_SEPARATOR))); return csvRecord; } ``` **New**: ```java public static List addTermRelations( List csvRecord, List termRelations) { csvRecord.add( nullOrEmpty(termRelations) ? null : termRelations.stream() .map(tr -> { String relationType = tr.getRelationType(); String fqn = tr.getTerm().getFullyQualifiedName(); // Only include relation type prefix if not the default "relatedTo" if (relationType != null && !relationType.equals("relatedTo")) { return relationType + ":" + fqn; } return fqn; }) .sorted() .collect(Collectors.joining(FIELD_SEPARATOR))); return csvRecord; } ``` #### 1.2 GlossaryRepository.java - Import Enhancement **File**: `openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/GlossaryRepository.java` **Current** (line 315-327): ```java private List getTermRelationsFromCsv( CSVPrinter printer, CSVRecord csvRecord, int fieldNumber) throws IOException { List entityRefs = getEntityReferences(printer, csvRecord, fieldNumber, GLOSSARY_TERM); if (entityRefs == null) { return null; } List termRelations = new ArrayList<>(); for (EntityReference ref : entityRefs) { termRelations.add(new TermRelation().withTerm(ref).withRelationType("relatedTo")); } return termRelations; } ``` **New**: ```java private static final Set VALID_RELATION_TYPES = Set.of( "relatedTo", "synonym", "broader", "narrower", "antonym", "partOf", "hasPart" ); private List getTermRelationsFromCsv( CSVPrinter printer, CSVRecord csvRecord, int fieldNumber) throws IOException { String fieldValue = csvRecord.get(fieldNumber); if (nullOrEmpty(fieldValue)) { return null; } List termRelations = new ArrayList<>(); String[] entries = fieldValue.split(FIELD_SEPARATOR); for (String entry : entries) { String relationType = "relatedTo"; // Default String termFqn = entry.trim(); // Check for relationType:fqn format int colonIndex = entry.indexOf(':'); if (colonIndex > 0) { String prefix = entry.substring(0, colonIndex).trim(); String suffix = entry.substring(colonIndex + 1).trim(); // Validate if prefix is a known relation type if (VALID_RELATION_TYPES.contains(prefix) || isCustomRelationType(prefix)) { relationType = prefix; termFqn = suffix; } // If prefix is not a valid relation type, treat entire string as FQN // (handles FQNs that contain colons like "Database:Schema.Table") } EntityReference termRef = getEntityReference(printer, csvRecord, GLOSSARY_TERM, termFqn); if (termRef != null) { termRelations.add(new TermRelation().withTerm(termRef).withRelationType(relationType)); } } return termRelations.isEmpty() ? null : termRelations; } private boolean isCustomRelationType(String relationType) { // Check against glossaryTermRelationSettings for custom relation types try { // Fetch from settings cache or use default list return false; // Implement based on settings lookup } catch (Exception e) { return false; } } ``` #### 1.3 Documentation Update **File**: `openmetadata-service/src/main/resources/json/data/glossary/glossaryCsvDocumentation.json` Update the `relatedTerms` field documentation: ```json { "name": "relatedTerms", "required": false, "description": "Related glossary terms with optional relation types. Format: 'relationType:FQN' or just 'FQN'. Multiple values separated by ';'. Valid relation types: relatedTo (default), synonym, broader, narrower, antonym, partOf, hasPart. Example: 'synonym:Glossary.Term1;broader:Glossary.Term2;Glossary.Term3'", "examples": [ "Glossary.Term1;Glossary.Term2", "synonym:Glossary.Term1;broader:Glossary.Term2", "synonym:Glossary.Revenue;Glossary.Income;narrower:Glossary.Net Revenue" ] } ``` ### Phase 2: Testing #### 2.1 Unit Tests **File**: `openmetadata-service/src/test/java/org/openmetadata/csv/CsvUtilTest.java` ```java @Test void testAddTermRelationsWithRelationType() { // Test that relation types are included in export } @Test void testAddTermRelationsDefaultRelationType() { // Test that "relatedTo" terms don't include prefix } ``` #### 2.2 Integration Tests **File**: `openmetadata-service/src/test/java/org/openmetadata/service/resources/glossary/GlossaryTermResourceTest.java` ```java @Test void testGlossaryTermCsvImportWithRelationTypes() { // Test importing CSV with relation type prefixes } @Test void testGlossaryTermCsvExportWithRelationTypes() { // Test exporting terms with various relation types } @Test void testGlossaryTermCsvBackwardCompatibility() { // Test importing old format CSV (no relation types) } @Test void testGlossaryTermCsvRoundTripWithRelationTypes() { // Test that export -> import preserves relation types } ``` ### Phase 3: Edge Cases 1. **FQN contains colon**: Handle cases like `Database:Schema.Term` by validating the prefix against known relation types 2. **Invalid relation type**: If prefix is not a valid relation type, treat entire string as FQN with default `relatedTo` 3. **Empty relation type**: `":Glossary.Term"` should default to `relatedTo` 4. **Custom relation types**: Check against `glossaryTermRelationSettings` for user-defined relation types ### Backward Compatibility | CSV Format | Import Behavior | |------------|----------------| | `Glossary.Term1;Glossary.Term2` | All relations → `relatedTo` | | `synonym:Glossary.Term1;Glossary.Term2` | First → `synonym`, Second → `relatedTo` | | `synonym:Glossary.Term1;broader:Glossary.Term2` | Preserves both relation types | ### Files to Modify | File | Change | |------|--------| | `CsvUtil.java` | Update `addTermRelations()` to include relation type prefix | | `GlossaryRepository.java` | Update `getTermRelationsFromCsv()` to parse relation types | | `glossaryCsvDocumentation.json` | Update field documentation and examples | | `GlossaryTermResourceTest.java` | Add tests for new format | | `CsvUtilTest.java` | Add unit tests for parsing | ### Migration Notes - **No database migration needed**: The database already stores relation types correctly - **Existing CSVs**: Will continue to work (all imported as `relatedTo`) - **New exports**: Will include relation type prefixes for non-default relations ## Summary This enhancement: 1. ✅ Preserves relation types during CSV export/import 2. ✅ Maintains backward compatibility with existing CSVs 3. ✅ Defaults to `relatedTo` when no relation type specified 4. ✅ Follows existing OpenMetadata CSV patterns (`type:value`) 5. ✅ Supports custom relation types via settings