mirror of
https://github.com/open-metadata/OpenMetadata
synced 2026-05-24 09:39:11 +00:00
276 lines
8.8 KiB
Markdown
276 lines
8.8 KiB
Markdown
|
|
# CSV Import/Export Enhancement for Glossary Term Relations
|
||
|
|
|
||
|
|
## Problem Statement
|
||
|
|
|
||
|
|
Currently, the glossary CSV import/export only captures related term FQNs without the relation type:
|
||
|
|
- **Export**: Only exports FQNs like `Glossary.Term1;Glossary.Term2`
|
||
|
|
- **Import**: Hardcodes all relations to `"relatedTo"`
|
||
|
|
|
||
|
|
This causes data loss when:
|
||
|
|
1. A term has `synonym`, `broader`, `narrower`, or custom relation types
|
||
|
|
2. CSV is exported and re-imported - all relation types become `"relatedTo"`
|
||
|
|
|
||
|
|
## Proposed Solution
|
||
|
|
|
||
|
|
### New CSV Format
|
||
|
|
|
||
|
|
**Format**: `relationType:termFQN` pairs separated by semicolons
|
||
|
|
|
||
|
|
**Examples**:
|
||
|
|
```csv
|
||
|
|
# New format with relation types
|
||
|
|
relatedTerms
|
||
|
|
synonym:Finance.Revenue;broader:Finance.Income;narrower:Finance.Net Revenue
|
||
|
|
|
||
|
|
# Backward compatible - no prefix defaults to "relatedTo"
|
||
|
|
relatedTerms
|
||
|
|
Finance.Revenue;Finance.Income
|
||
|
|
|
||
|
|
# Mixed format (new and legacy)
|
||
|
|
relatedTerms
|
||
|
|
synonym:Finance.Revenue;Finance.Income;broader:Finance.Gross Income
|
||
|
|
```
|
||
|
|
|
||
|
|
### Parsing Rules
|
||
|
|
|
||
|
|
1. If a value contains `:` and the part before `:` is a valid relation type → use that relation type
|
||
|
|
2. If no `:` or the prefix is not a valid relation type → default to `"relatedTo"`
|
||
|
|
3. Valid relation types are determined by checking `glossaryTermRelationSettings` or using defaults
|
||
|
|
|
||
|
|
### Default Relation Types
|
||
|
|
|
||
|
|
| Relation Type | Description |
|
||
|
|
|---------------|-------------|
|
||
|
|
| `relatedTo` | Generic related term (default) |
|
||
|
|
| `synonym` | Equivalent term |
|
||
|
|
| `broader` | More general term |
|
||
|
|
| `narrower` | More specific term |
|
||
|
|
| `antonym` | Opposite meaning |
|
||
|
|
| `partOf` | Component of |
|
||
|
|
| `hasPart` | Contains |
|
||
|
|
|
||
|
|
## Implementation Plan
|
||
|
|
|
||
|
|
### Phase 1: Backend Changes
|
||
|
|
|
||
|
|
#### 1.1 CsvUtil.java - Export Enhancement
|
||
|
|
|
||
|
|
**File**: `openmetadata-service/src/main/java/org/openmetadata/csv/CsvUtil.java`
|
||
|
|
|
||
|
|
**Current** (line 253-263):
|
||
|
|
```java
|
||
|
|
public static List<String> addTermRelations(
|
||
|
|
List<String> csvRecord, List<TermRelation> termRelations) {
|
||
|
|
csvRecord.add(
|
||
|
|
nullOrEmpty(termRelations)
|
||
|
|
? null
|
||
|
|
: termRelations.stream()
|
||
|
|
.map(tr -> tr.getTerm().getFullyQualifiedName())
|
||
|
|
.sorted()
|
||
|
|
.collect(Collectors.joining(FIELD_SEPARATOR)));
|
||
|
|
return csvRecord;
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**New**:
|
||
|
|
```java
|
||
|
|
public static List<String> addTermRelations(
|
||
|
|
List<String> csvRecord, List<TermRelation> termRelations) {
|
||
|
|
csvRecord.add(
|
||
|
|
nullOrEmpty(termRelations)
|
||
|
|
? null
|
||
|
|
: termRelations.stream()
|
||
|
|
.map(tr -> {
|
||
|
|
String relationType = tr.getRelationType();
|
||
|
|
String fqn = tr.getTerm().getFullyQualifiedName();
|
||
|
|
// Only include relation type prefix if not the default "relatedTo"
|
||
|
|
if (relationType != null && !relationType.equals("relatedTo")) {
|
||
|
|
return relationType + ":" + fqn;
|
||
|
|
}
|
||
|
|
return fqn;
|
||
|
|
})
|
||
|
|
.sorted()
|
||
|
|
.collect(Collectors.joining(FIELD_SEPARATOR)));
|
||
|
|
return csvRecord;
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
#### 1.2 GlossaryRepository.java - Import Enhancement
|
||
|
|
|
||
|
|
**File**: `openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/GlossaryRepository.java`
|
||
|
|
|
||
|
|
**Current** (line 315-327):
|
||
|
|
```java
|
||
|
|
private List<TermRelation> getTermRelationsFromCsv(
|
||
|
|
CSVPrinter printer, CSVRecord csvRecord, int fieldNumber) throws IOException {
|
||
|
|
List<EntityReference> entityRefs =
|
||
|
|
getEntityReferences(printer, csvRecord, fieldNumber, GLOSSARY_TERM);
|
||
|
|
if (entityRefs == null) {
|
||
|
|
return null;
|
||
|
|
}
|
||
|
|
List<TermRelation> termRelations = new ArrayList<>();
|
||
|
|
for (EntityReference ref : entityRefs) {
|
||
|
|
termRelations.add(new TermRelation().withTerm(ref).withRelationType("relatedTo"));
|
||
|
|
}
|
||
|
|
return termRelations;
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**New**:
|
||
|
|
```java
|
||
|
|
private static final Set<String> VALID_RELATION_TYPES = Set.of(
|
||
|
|
"relatedTo", "synonym", "broader", "narrower", "antonym", "partOf", "hasPart"
|
||
|
|
);
|
||
|
|
|
||
|
|
private List<TermRelation> getTermRelationsFromCsv(
|
||
|
|
CSVPrinter printer, CSVRecord csvRecord, int fieldNumber) throws IOException {
|
||
|
|
String fieldValue = csvRecord.get(fieldNumber);
|
||
|
|
if (nullOrEmpty(fieldValue)) {
|
||
|
|
return null;
|
||
|
|
}
|
||
|
|
|
||
|
|
List<TermRelation> termRelations = new ArrayList<>();
|
||
|
|
String[] entries = fieldValue.split(FIELD_SEPARATOR);
|
||
|
|
|
||
|
|
for (String entry : entries) {
|
||
|
|
String relationType = "relatedTo"; // Default
|
||
|
|
String termFqn = entry.trim();
|
||
|
|
|
||
|
|
// Check for relationType:fqn format
|
||
|
|
int colonIndex = entry.indexOf(':');
|
||
|
|
if (colonIndex > 0) {
|
||
|
|
String prefix = entry.substring(0, colonIndex).trim();
|
||
|
|
String suffix = entry.substring(colonIndex + 1).trim();
|
||
|
|
|
||
|
|
// Validate if prefix is a known relation type
|
||
|
|
if (VALID_RELATION_TYPES.contains(prefix) || isCustomRelationType(prefix)) {
|
||
|
|
relationType = prefix;
|
||
|
|
termFqn = suffix;
|
||
|
|
}
|
||
|
|
// If prefix is not a valid relation type, treat entire string as FQN
|
||
|
|
// (handles FQNs that contain colons like "Database:Schema.Table")
|
||
|
|
}
|
||
|
|
|
||
|
|
EntityReference termRef = getEntityReference(printer, csvRecord, GLOSSARY_TERM, termFqn);
|
||
|
|
if (termRef != null) {
|
||
|
|
termRelations.add(new TermRelation().withTerm(termRef).withRelationType(relationType));
|
||
|
|
}
|
||
|
|
}
|
||
|
|
|
||
|
|
return termRelations.isEmpty() ? null : termRelations;
|
||
|
|
}
|
||
|
|
|
||
|
|
private boolean isCustomRelationType(String relationType) {
|
||
|
|
// Check against glossaryTermRelationSettings for custom relation types
|
||
|
|
try {
|
||
|
|
// Fetch from settings cache or use default list
|
||
|
|
return false; // Implement based on settings lookup
|
||
|
|
} catch (Exception e) {
|
||
|
|
return false;
|
||
|
|
}
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
#### 1.3 Documentation Update
|
||
|
|
|
||
|
|
**File**: `openmetadata-service/src/main/resources/json/data/glossary/glossaryCsvDocumentation.json`
|
||
|
|
|
||
|
|
Update the `relatedTerms` field documentation:
|
||
|
|
|
||
|
|
```json
|
||
|
|
{
|
||
|
|
"name": "relatedTerms",
|
||
|
|
"required": false,
|
||
|
|
"description": "Related glossary terms with optional relation types. Format: 'relationType:FQN' or just 'FQN'. Multiple values separated by ';'. Valid relation types: relatedTo (default), synonym, broader, narrower, antonym, partOf, hasPart. Example: 'synonym:Glossary.Term1;broader:Glossary.Term2;Glossary.Term3'",
|
||
|
|
"examples": [
|
||
|
|
"Glossary.Term1;Glossary.Term2",
|
||
|
|
"synonym:Glossary.Term1;broader:Glossary.Term2",
|
||
|
|
"synonym:Glossary.Revenue;Glossary.Income;narrower:Glossary.Net Revenue"
|
||
|
|
]
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
### Phase 2: Testing
|
||
|
|
|
||
|
|
#### 2.1 Unit Tests
|
||
|
|
|
||
|
|
**File**: `openmetadata-service/src/test/java/org/openmetadata/csv/CsvUtilTest.java`
|
||
|
|
|
||
|
|
```java
|
||
|
|
@Test
|
||
|
|
void testAddTermRelationsWithRelationType() {
|
||
|
|
// Test that relation types are included in export
|
||
|
|
}
|
||
|
|
|
||
|
|
@Test
|
||
|
|
void testAddTermRelationsDefaultRelationType() {
|
||
|
|
// Test that "relatedTo" terms don't include prefix
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
#### 2.2 Integration Tests
|
||
|
|
|
||
|
|
**File**: `openmetadata-service/src/test/java/org/openmetadata/service/resources/glossary/GlossaryTermResourceTest.java`
|
||
|
|
|
||
|
|
```java
|
||
|
|
@Test
|
||
|
|
void testGlossaryTermCsvImportWithRelationTypes() {
|
||
|
|
// Test importing CSV with relation type prefixes
|
||
|
|
}
|
||
|
|
|
||
|
|
@Test
|
||
|
|
void testGlossaryTermCsvExportWithRelationTypes() {
|
||
|
|
// Test exporting terms with various relation types
|
||
|
|
}
|
||
|
|
|
||
|
|
@Test
|
||
|
|
void testGlossaryTermCsvBackwardCompatibility() {
|
||
|
|
// Test importing old format CSV (no relation types)
|
||
|
|
}
|
||
|
|
|
||
|
|
@Test
|
||
|
|
void testGlossaryTermCsvRoundTripWithRelationTypes() {
|
||
|
|
// Test that export -> import preserves relation types
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
### Phase 3: Edge Cases
|
||
|
|
|
||
|
|
1. **FQN contains colon**: Handle cases like `Database:Schema.Term` by validating the prefix against known relation types
|
||
|
|
2. **Invalid relation type**: If prefix is not a valid relation type, treat entire string as FQN with default `relatedTo`
|
||
|
|
3. **Empty relation type**: `":Glossary.Term"` should default to `relatedTo`
|
||
|
|
4. **Custom relation types**: Check against `glossaryTermRelationSettings` for user-defined relation types
|
||
|
|
|
||
|
|
### Backward Compatibility
|
||
|
|
|
||
|
|
| CSV Format | Import Behavior |
|
||
|
|
|------------|----------------|
|
||
|
|
| `Glossary.Term1;Glossary.Term2` | All relations → `relatedTo` |
|
||
|
|
| `synonym:Glossary.Term1;Glossary.Term2` | First → `synonym`, Second → `relatedTo` |
|
||
|
|
| `synonym:Glossary.Term1;broader:Glossary.Term2` | Preserves both relation types |
|
||
|
|
|
||
|
|
### Files to Modify
|
||
|
|
|
||
|
|
| File | Change |
|
||
|
|
|------|--------|
|
||
|
|
| `CsvUtil.java` | Update `addTermRelations()` to include relation type prefix |
|
||
|
|
| `GlossaryRepository.java` | Update `getTermRelationsFromCsv()` to parse relation types |
|
||
|
|
| `glossaryCsvDocumentation.json` | Update field documentation and examples |
|
||
|
|
| `GlossaryTermResourceTest.java` | Add tests for new format |
|
||
|
|
| `CsvUtilTest.java` | Add unit tests for parsing |
|
||
|
|
|
||
|
|
### Migration Notes
|
||
|
|
|
||
|
|
- **No database migration needed**: The database already stores relation types correctly
|
||
|
|
- **Existing CSVs**: Will continue to work (all imported as `relatedTo`)
|
||
|
|
- **New exports**: Will include relation type prefixes for non-default relations
|
||
|
|
|
||
|
|
## Summary
|
||
|
|
|
||
|
|
This enhancement:
|
||
|
|
1. ✅ Preserves relation types during CSV export/import
|
||
|
|
2. ✅ Maintains backward compatibility with existing CSVs
|
||
|
|
3. ✅ Defaults to `relatedTo` when no relation type specified
|
||
|
|
4. ✅ Follows existing OpenMetadata CSV patterns (`type:value`)
|
||
|
|
5. ✅ Supports custom relation types via settings
|