OpenMetadata

mirror of https://github.com/open-metadata/OpenMetadata synced 2026-05-24 09:39:11 +00:00

Author	SHA1	Message	Date
Eugenio	483461a003	Add migrations to ensure PII are really enabled (#27921 ) This is especially needed for instances that had already upgraded to 1.12.0 onwards; those instances skipped the migration cherry-picked in 1.12.6	2026-05-08 15:39:29 +00:00
sonika-shah	52548550e8	fix migration: update legacy relatedTerms in glossaryTerm version history after the glossary term relation changes (#27770 ) * fix: strip stale relatedTerms from glossary term version snapshots Extends PR #26586. That fix cleaned glossary_term_entity but not the version snapshots in entity_extension, so GET /versions/{v} still 500s on any pre-1.13 term whose relatedTerms had legacy shape: UnrecognizedPropertyException: Unrecognized field "id" (class TermRelation, has only "term" and "relationType") Predicate matches only legacy snapshots: first item has bare `id` (EntityReference) instead of `term` (TermRelation). Skips correctly-shaped snapshots written on 1.13+. Stripping is safe: relatedTerms is loaded from entity_relationship at read time post-#25886. * v1130: transform legacy relatedTerms in version snapshots instead of stripping Replace the SQL UPDATE that stripped relatedTerms from entity_extension version snapshots with a Java migration that wraps each legacy EntityReference[] item as TermRelation[] (term + relationType="relatedTo"). Version reads deserialize entity_extension JSON directly without rehydrating from entity_relationship, so a strip would lose history per version. The transform preserves it. Designed for tables with millions of rows: keyset paginated by PK (id, extension), batched updates, idempotent on re-run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(mysql): remove leftover entity_extension strip in v1130 post-migration The previous edit added the comment pointer above the legacy UPDATE entity_extension SET json = JSON_REMOVE(... '$.relatedTerms') block without removing it. On MySQL that SQL would have stripped relatedTerms from version snapshots BEFORE the Java transform runs, defeating the migration and losing related-term history. Postgres was already correct. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 03:35:45 +00:00
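The keyset-paginated, idempotent transform described in commit 52548550e8 can be sketched roughly as follows. This is a minimal Python illustration, not the Java migration itself: the in-memory `rows` list stands in for the `entity_extension` table, and the helper names (`is_legacy`, `wrap_legacy`, `keyset_batches`, `migrate`) are hypothetical.

```python
import json


def is_legacy(related_terms):
    # Legacy snapshots hold EntityReference items: a bare "id" and no "term".
    if not related_terms:
        return False
    first = related_terms[0]
    return isinstance(first, dict) and "id" in first and "term" not in first


def wrap_legacy(related_terms):
    # Wrap each EntityReference as a TermRelation with the default relation type.
    return [{"term": ref, "relationType": "relatedTo"} for ref in related_terms]


def keyset_batches(rows, batch_size):
    # Emulates: WHERE (id, extension) > (:last_id, :last_ext)
    #           ORDER BY id, extension LIMIT :batch_size
    ordered = sorted(rows, key=lambda r: (r["id"], r["extension"]))
    last_key = None
    while True:
        batch = [
            r for r in ordered
            if last_key is None or (r["id"], r["extension"]) > last_key
        ][:batch_size]
        if not batch:
            return
        yield batch
        last_key = (batch[-1]["id"], batch[-1]["extension"])


def migrate(rows, batch_size=2):
    # Returns the number of rows rewritten; already-migrated rows are skipped,
    # so a second run changes nothing.
    updated = 0
    for batch in keyset_batches(rows, batch_size):
        for row in batch:
            snapshot = json.loads(row["json"])
            related = snapshot.get("relatedTerms")
            if is_legacy(related):
                snapshot["relatedTerms"] = wrap_legacy(related)
                row["json"] = json.dumps(snapshot)
                updated += 1
    return updated
```

Because the predicate only matches items with a bare `id`, re-running `migrate` on transformed rows is a no-op, which is the idempotency property the commit message calls out.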
Eugenio	88c44502ae	feat: Add auto-classification support for storage service containers (#26495 ) * Add schema support for container auto-classification Extend container entity schema to support sample data storage, enabling PII detection and classification workflows on storage service containers. Changes: - Add sampleData field to container.json for storing sample data - Create storageServiceAutoClassificationPipeline.json schema defining configuration for storage service auto-classification pipelines - Update workflow.json to include StorageServiceAutoClassificationPipeline as a supported pipeline type This provides the schema foundation for running auto-classification workflows on S3, GCS, and other storage service containers. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Add backend support for container sample data and classification Implement Java backend functionality to handle sample data ingestion, storage, and PII masking for container entities. Changes: - ContainerRepository: Add sample data retrieval and storage operations - EntityRepository: Extend sample data support to container entities - ContainerResource: Add REST endpoint for container sample data ingestion - PIIMasker: Extend PII masking to support container entities This enables the backend to process and store sample data from storage service containers and apply PII masking rules during data retrieval. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Extend classifiable entity types to include containers Add Container to the ClassifiableEntityType union, enabling PII detection and auto-classification workflows to process storage service containers alongside database tables. Changes: - Update ClassifiableEntityType from Table-only to Union[Table, Container] - Import Container entity type - Update module docstring to reflect current support This type extension allows the PII processor to handle both database tables and storage containers uniformly. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Add container sample data ingestion to OpenMetadata API Implement container-specific API mixin for sample data operations and integrate it into the main OpenMetadata client. Changes: - Add OMetaContainerMixin with ingest_container_sample_data method - Handle binary data encoding (base64) and serialization errors - Register mixin in OpenMetadata class hierarchy - Mirror table sample data ingestion patterns for consistency This provides the Python API layer for ingesting sample data from storage service containers into OpenMetadata. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Implement storage service samplers for S3 and GCS Add sampler implementations for storage services to extract sample data from structured containers (Parquet, CSV) for auto-classification. Changes: - Create base StorageSamplerInterface for storage service sampling - Implement S3Sampler for AWS S3 containers with structured file support - Implement GCSSampler for Google Cloud Storage containers - Support column extraction and data sampling for structured formats - Handle dataModel-based column definitions from containers Storage samplers read container metadata, fetch file contents, and generate sample datasets for downstream PII detection. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Update PII processor to support container entities Extend the base PII processor to handle both Table and Container entities with unified column extraction logic. Changes: - Add _get_entity_columns helper to extract columns from Table or Container - Handle Container entities with optional dataModel.columns structure - Improve column matching with safe fallback for missing columns - Use generic entity reference in error reporting - Add early return when entity has no columns to process This enables PII detection to run on storage containers the same way it processes database tables. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Add storage service support to sampler processor Extend the sampler processor to handle both database and storage service entities with appropriate sampler class selection. 
Changes: - Detect service type from source config (Database vs Storage) - Import StorageServiceAutoClassificationPipeline - Handle both Table and Container entity types in _run method - Add column validation for Container entities (via dataModel.columns) - Create storage-specific sampler interfaces for S3 and GCS - Update sampler_interface to support Container entities - Improve error messages with entity type context The processor now dynamically selects database or storage samplers based on the pipeline configuration type. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Add storage fetcher strategy for container classification Implement fetcher strategy pattern for storage services to retrieve containers for auto-classification workflows. Changes: - Add StorageFetcherStrategy to handle storage service entity fetching - Update EntityFetcher to select appropriate strategy based on service type - Support both DatabaseService and StorageService in strategy selection - Import StorageService type for service detection - Improve error messages with specific service type information The fetcher now dynamically creates database or storage-specific strategies to retrieve entities based on pipeline configuration. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Register auto-classification pipeline in storage service specs Add AutoClassification pipeline support to S3 and GCS storage service specifications, enabling UI and workflow registration. Changes: - Add AutoClassification to S3ServiceSpec supported pipelines - Add AutoClassification to GCSServiceSpec supported pipelines - Import StorageServiceAutoClassificationPipeline in both specs This registers the auto-classification workflow type for storage services in the ingestion framework's service registry. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Add container support to metadata sink and patch operations Extend metadata sink and patch mixin to handle container entities, enabling sample data ingestion and tag updates for containers. Changes: - Add Container to MetadataRestSink entity type handling - Implement container sample data ingestion in sink._run - Add Container to PatchMixin tag operations - Import Container entity type in both modules This completes the metadata ingestion pipeline by allowing the sink to persist sample data and classification tags for container entities. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Update classification workflow for storage service support Extend the auto-classification workflow to handle both database and storage service pipelines with unified step orchestration. Changes: - Import StorageServiceAutoClassificationPipeline - Add type checking for both Database and Storage pipeline configs - Remove unnecessary cast, use direct type checks - Add validation warning for unsupported config types - Preserve enableAutoClassification flag behavior for both types The workflow now supports running PII detection and classification on both database tables and storage containers based on config type. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Add unit tests for container classification components Add test coverage for container-specific fetcher and sampler components. 
Changes: - Add test_container_fetcher.py for StorageFetcherStrategy tests - Add test_container_sampler_processor.py for container sampler tests Tests validate: - Storage service fetcher strategy selection and instantiation - Container sampler processor initialization and execution - Proper handling of Container entities vs Table entities 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Reorganize integration tests by entity type Restructure auto-classification integration tests into separate directories for databases and containers to improve organization. Changes: - Move database classification tests to databases/ subdirectory - Move conftest.py, init.sql, and test_tag_processor.py into databases/ - Container tests already organized in containers/ subdirectory - Remove old flat test structure This organization makes it clearer which tests target database entities vs storage container entities in classification workflows. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Properly retrieve sample data * Update generated TypeScript types * Apply Gitar bot * Fix tests * feat: Add supportsProfiler to storage connection schemas Add supportsProfiler field to storage connection schemas (S3, GCS, ADLS, Custom Storage) to enable auto-classification pipeline support for storage services. This aligns with the backend changes in PR #26495 that added container auto-classification functionality. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * feat: Add UI support for storage service auto-classification - Update IngestionWorkflowUtils to route storage services to storage-specific auto-classification schema - Modify getSupportedPipelineTypes to filter pipeline types based on service category (storage services only show AutoClassification, not Profiler) - Update AddIngestionButton to pass serviceCategory parameter - Add unit test to verify storage services only get AutoClassification option This enables users to configure and run auto-classification agents on storage services (S3, GCS, ADLS) for PII detection on containers. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: Add BucketArn field to S3BucketResponse model AWS S3 API now returns a BucketArn field in list_buckets() responses. Add this optional field to prevent Pydantic extra_forbidden validation errors. Error: BucketArn Extra inputs are not permitted [type=extra_forbidden] 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: Add Container permissions to AutoClassificationBotPolicy Add Container entity permissions to AutoClassificationBotPolicy to allow the autoClassification-bot to apply tags and sample data to storage containers. Previously, the bot only had permissions for Table entities, causing permission denied errors when running auto-classification on storage services. 
Changes: - Add Container rule with EditAll and ViewAll operations to policy seed data - Create migrations for MySQL and PostgreSQL to update existing installations Error fixed: Principal: CatalogPrincipal{name='autoclassification-bot'} operations [EditTags] not allowed 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Update generated TypeScript types * fix: Add fallback for storage service type detection in sampler Add fallback logic to detect storage services by source type name when the pipeline config type check fails. This handles cases where the Airflow environment might not have the updated schema/package with StorageServiceAutoClassificationPipeline. Changes: - Add fallback detection for s3, gcs, azuredatalake, customstorage - Add debug logging for service type detection - Preserve primary instanceof check for proper type detection This fixes the "No module named 'metadata.ingestion.source.database.gcs'" error when running storage auto-classification pipelines. * Guide to support new entities in classification agent * docs: Update auto-classification guide with debugging learnings Add critical troubleshooting information discovered during container classification debugging: 1. storeSampleData defaults to false - Sample data NOT ingested unless explicitly enabled - Document why this is by design (avoid large datasets) - Add troubleshooting steps to verify flag is set 2. Service type detection fallback pattern - Explain why fallback is needed (Airflow package caching) - Show complete implementation with source type lists - Add debug logging pattern 3. Troubleshooting section - Sample data not appearing: check storeSampleData, database, logs - Module import errors: service type detection issues - PII tags not applied: config and data issues 4. 
Common pitfalls additions - Emphasize storeSampleData default value - Service type detection in cached environments These updates reflect real debugging scenarios and will help future developers avoid the same issues. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Apply gitar bot suggestions * Fix suggestions, linting, and SonarCloud issues * More gitar bot suggestions * Fix compile error * Fix linting * Fix broken tests * Fix unorganized import * Improve config parsing This is so that we correctly discover polymorphic properties of `source` when the config does not provide enough fields for Pydantic to correctly discriminate between models (e.g., confusing database source config with storage source config) * Gitar bot comment * Fix s3 source test * Apply comments from reviews * Extract candidate column logic in samplers * Fix tests * Fix container customization test --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2026-04-24 06:29:16 -07:00
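The service type detection fallback mentioned in commit 88c44502ae (primary type check on the pipeline config, then a fallback on the source type name) might look roughly like the sketch below. The class here is only a stand-in for the generated `StorageServiceAutoClassificationPipeline` model, and the function name `is_storage_service` is hypothetical; only the four source type names come from the commit message.

```python
import logging

logger = logging.getLogger(__name__)

# Source type names listed in the commit message for the fallback path.
STORAGE_SOURCE_TYPES = {"s3", "gcs", "azuredatalake", "customstorage"}


class StorageServiceAutoClassificationPipeline:
    """Stand-in for the generated config model (hypothetical in this sketch)."""


def is_storage_service(source_config, source_type):
    # Primary check: the parsed pipeline config has the storage-specific type.
    if isinstance(source_config, StorageServiceAutoClassificationPipeline):
        logger.debug("Storage service detected via pipeline config type")
        return True
    # Fallback: the environment may have parsed the config with an older schema,
    # so fall back to matching on the source type name.
    is_storage = (source_type or "").lower() in STORAGE_SOURCE_TYPES
    logger.debug("Fallback detection for %r -> storage=%s", source_type, is_storage)
    return is_storage
```

The fallback keeps the type check as the primary signal and only consults the name list when that check fails, matching the order described in the commit.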
sonika-shah	077982c348	Move ontology/glossary relation migration from 1.14.0 back to 1.13.0 (#27431 ) * Move ontology/glossary relation migration from 1.14.0 back to 1.13.0 Ontology feature will ship in 1.13.0, not 1.14.0. Move the glossary term relation migrations (relationType backfill, settings insert, stale relatedTerms strip, conceptMappings backfill) back to the 1.13.0 postDataMigrationSQLScript for both MySQL and PostgreSQL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Restore empty 1.14.0 SQL migration files for Java migration framework The V114 MigrationUtil.java package requires the 1.14.0 migration directory to exist with SQL files for the migration to be picked up. Keep them as empty files (matching convention of other versions with no post-data SQL). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add schemaChanges.sql and comment all 1.14.0 SQL migration files Add both schemaChanges.sql and postDataMigrationSQLScript.sql for mysql and postgres with a comment explaining the directory is required for the V114 Java migrations to be picked up by the migration framework. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Fix missing trailing newline in postgres postDataMigrationSQLScript Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * address feedback --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Karan Hotchandani <33024356+karanh37@users.noreply.github.com>	2026-04-16 16:45:01 +00:00
Rajdeep Singh	5e1416447f	fix(sampler): Respect randomizedSample flag at 100% percentage sampling (#26966 ) * fix(sampler): respect randomizedSample flag at 100% percentage sampling When profileSample is 100% with PERCENTAGE type, the sampler short-circuits and returns the raw dataset without any randomization, even when randomizedSample is True (the default). Split the combined condition so: - No profileSample set -> return raw dataset (no sampling configured) - 100% PERCENTAGE + randomizedSample=False -> return raw dataset (optimization) - 100% PERCENTAGE + randomizedSample=True -> go through normal sampling path which applies RandomNumFn/df.sample for proper row shuffling Fixes #21304 * Address review: use 'is False' for Optional[bool] and add unit tests - Fix randomizedSample check from 'not' to 'is False' in both SQASampler and DatalakeSampler to correctly handle None (Optional[bool] default=True) - Add unit tests verifying 100% PERCENTAGE behavior for randomizedSample values True, False, and None * Add ORDER BY on random column in fetch_sample_data for true randomization The get_dataset() fix ensures 100% PERCENTAGE + randomizedSample routes through get_sample_query() which produces a CTE with a random column. Now fetch_sample_data() detects that column and applies ORDER BY before LIMIT, so each call returns a different subset of rows. Also add real-DB integration tests using SQLite for the 100% PERCENTAGE edge case (True, False, None). 
* Address review: remove stale comment, unused import, add return assertions * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Address review: move ORDER BY to get_sample_query, clean up fetch_sample_data - Move ORDER BY rnd.c.random into get_sample_query() PERCENTAGE branch, gated on randomizedSample is not False (mirrors ABSOLUTE branch pattern) - Revert fetch_sample_data() to original: remove ds_columns variable, random_column detection, and ORDER BY logic (ordering now handled in CTE) - Remove duplicate assertions in DatalakeSampler100Pct tests * Address review: None defaults to False for randomizedSample Per TeddyCr's feedback, randomization is computationally heavy and should not be the default. Changed from 'is False'/'is not False' to truthiness checks so None (unset) behaves the same as False. Only explicit randomizedSample=True triggers ORDER BY and skips the 100% fast path. This is consistent with the ABSOLUTE branch which already uses truthiness checks. * Fix integration test: None should skip sample_query (matches truthiness semantics) * fix(tests): update BigQuery view sampling expected queries with ORDER BY BigQuery views fall through to SQASampler.get_sample_query() which now adds ORDER BY rnd.random when randomizedSample is enabled. Update the expected SQL strings in test_sampling_for_views and test_sampling_view_with_partition to match. * refactor: use explicit is False for randomizedSample checks Address review comments: SampleConfig.randomizedSample defaults to True, so only an explicit False should disable randomization. Using is False / is not False instead of truthiness ensures None follows the model default (enabled) rather than being incorrectly treated as disabled. 
* ci: re-trigger checks after SIGSEGV flake * refactor: only explicit True randomizes, add non-determinism tests * test: increase non-determinism iterations to reduce flakiness * chore: added randomize as false * fix: align randomizedSample defaults with schema (false) * fix: remove ORDER BY from BigQuery test expectations BigQuery sampling tests create SampleConfig without setting randomizedSample, which now defaults to False. Since ORDER BY is only added when randomizedSample is True, the expected query strings should not include ORDER BY. Also fix inaccurate docstring in test_sample.py. * test: increase non-determinism test iterations to reduce flakiness Increase fetch_sample_data loop from 10 to 20 iterations to further reduce the theoretical probability of a false failure in the randomized ordering test. --------- Co-authored-by: Teddy <teddy.crepineau@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-04-14 10:28:54 -07:00
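The decision that commit 5e1416447f converges on (only an explicit `randomizedSample=True` skips the 100% fast path) can be summarised in a small sketch. The function name `returns_raw_dataset` and the plain-string sample type are assumptions made for illustration; the real samplers work with their own config objects.

```python
from typing import Optional


def returns_raw_dataset(
    profile_sample: Optional[float],
    sample_type: str,
    randomized_sample: Optional[bool],
) -> bool:
    # No profileSample configured: nothing to sample, return the raw dataset.
    if profile_sample is None:
        return True
    # 100% PERCENTAGE is a fast path unless randomization was explicitly requested.
    is_full_percentage = sample_type == "PERCENTAGE" and profile_sample == 100
    return is_full_percentage and randomized_sample is not True
```

Under this reading, `None` and `False` behave the same at 100%, and only `True` sends the request through the normal sampling path.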
sonika-shah	733921f510	Fix: align glossary term relation type colors with design system (#27142 ) * Fix: align glossary term relation type colors with design system System-defined relation types (relatedTo, synonym, antonym, etc.) were initialized with old Ant Design palette colors (#1890ff, #722ed1, …) while the frontend RELATION_META constants had been updated to the new design system colors (#1570ef, #b42318, …). Because renderColorBadge used record.color (from the backend) unconditionally, the stale Ant Design colors were always displayed instead of the intended ones. - Frontend: renderColorBadge now treats RELATION_META as authoritative for system-defined types so the correct design-system color is always shown, regardless of what color value is stored in the backend. - Backend (SettingsCache.java): default colors updated for new installs. - DB migration (2.0.0): postDataMigrationSQLScript added for MySQL and PostgreSQL to update colors in existing deployments without touching user-added custom relation types. - Tests: unit tests for renderColorBadge color-resolution logic; integration test asserting all ten system-defined types return the expected hex values from the API. Fixes #openmetadata/OpenMetadata * Remove dev-only MySQL 2.0.0 migration script * Remove dev-only PostgreSQL 2.0.0 migration script * Fix: align glossary term relation settings colors and remove duplicate 1.13.0 migration; Remove glossary term relation migrations mistakenly re-added in 1.13.0 and update relation type colors in the 1.14.0 migration INSERT to use design system tokens instead of old Ant Design colors. 
* fix lint * add more test * address feedback * fix prettier formatting in test file Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * remove GlossaryTermRelationSettings test file from branch Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 11:03:35 +00:00
Suman Maharana	a06b7e74cc	Chore: Remove iceberg standalone connector (#26365 ) * Chore: Remove iceberg standalone connector * add migration scripts * Update generated TypeScript types * py_format * address comments * Addressed changes * add tests * migrate to custom database * fix tests * fix tests * fix migrations * hard delete existing ingestion pipelines for iceberg * Update generated TypeScript types * Delete openmetadata-ui/src/main/resources/ui/src/generated/entity/services/ingestionPipelines/ingestionPipeline.ts * Delete openmetadata-ui/src/main/resources/ui/src/generated/entity/automations/workflow.ts * Delete openmetadata-ui/src/main/resources/ui/src/generated/api/automations/createWorkflow.ts * Delete openmetadata-ui/src/main/resources/ui/src/generated/api/services/ingestionPipelines/createIngestionPipeline.ts * Delete openmetadata-ui/src/main/resources/ui/src/generated/api/services/createDatabaseService.ts * Delete openmetadata-ui/src/main/resources/ui/src/generated/entity/automations/testServiceConnection.ts * Update generated TypeScript types * Update bootstrap/sql/migrations/native/1.13.0/mysql/postDataMigrationSQLScript.sql Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-04-02 14:55:23 +00:00
Ram Narayan Balaji	10cf2f9ea0	Move ontology/glossary relation migration from 1.13.0 to 1.14.0 (#26755 ) The glossary term relation migration (relationType backfill, default glossaryTermRelationSettings insert, relatedTerms cleanup, conceptMappings backfill) was accidentally placed in the 1.13.0 migration scripts. This commit moves it to the correct 1.14.0 slot, restoring 1.13.0 to its original content (computeMetrics profiler pipeline cleanup only). Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-25 14:53:10 +05:30
sonika-shah	aff1343643	fix: strip stale relatedTerms from glossary_term_entity JSON to fix 500 on listAfter (#26586 ) * fix: strip stale relatedTerms from glossary_term_entity JSON to fix 500 on listAfter Pre-1.13.0, relatedTerms was stored as EntityReference[] directly in the glossary_term_entity JSON column. PR #25886 changed relatedTerms to TermRelation[] and moved storage to the entity_relationship table, but missed adding a migration to clean up the old EntityReference data still present in existing rows. When listAfter() deserializes the entity JSON, Jackson fails with: UnrecognizedPropertyException: Unrecognized field "id" (class TermRelation) The existing migration already backfilled entity_relationship rows with relationType="relatedTo", so stripping relatedTerms from entity JSON is safe: the data is already in entity_relationship and will be loaded from there. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Ram Narayan Balaji <81347100+yan-3005@users.noreply.github.com>	2026-03-20 10:29:26 +05:30
Sriharsha Chintalapani	6d99ba2dc0	Glossary relations (#25886 ) * Glossary Term Relations * Add GlossaryTerm Relations * Add GlossaryTerm Relations, Add custom relations, ontology explorer * Add Translations * Update generated TypeScript types * Address comments * Address comments * Address comments * Update generated TypeScript types * Update yarn.lock after merging cytoscape dependencies from glossary_relations * fix zoom in and out functionality and added missing translate keys * fix test * Remove unwanted changes * nit * nit * nit * Remove conflict test * nit * fix test * Add test for ontology explorer * New yarn lock and 2.0.0 schema changes missed during merge conflicts * Revamped glossary term relation settings * Refactor code * Addressed comments * nit * Update generated TypeScript types * Java Checkstyle and Yarn lock * Update generated TypeScript types * fix unit test * Remove 2.0.0 migration folders placed at wrong loc * Merge main * fix navigation to relation graph in glossary * fix ontology explorer spec * Added filter support in the data mode * Fix glossary term relation CI failures ### Canonical Relation Storage (GlossaryTermRepository) * Introduced `computeCanonicalRelationType()` to normalize relation direction using UUID ordering (lower UUID is always treated as "from") * Prevents duplicate and inconsistent relation rows when created from either side * Updated `setTermRelations()` and `addRelation()` to store canonical relation types * Fixed `setFields()` read logic: * Invert relation type for `fromRecords` (entity is the TO side) * Keep `toRecords` unchanged * Updated `deleteBidirectionalRelatedTo()` to match canonical storage format * Added `RequestEntityCache.invalidate()` after relation mutations to ensure consistency ### Lazy RDF Resource Initialization * Added `RdfRepository.getInstanceOrNull()` for null-safe access without throwing * Refactored `RdfResource` constructor to avoid eager `RdfRepository.getInstance()` call * Enabled resource 
registration even when Fuseki is not initialized * Introduced lazy getters: * `getRdfRepository()` * `getSemanticSearchEngine()` * Updated all endpoints to guard with null checks before `isEnabled()` * Return `503 Service Unavailable` when RDF is not ready ### Graceful Test Degradation (Fuseki-dependent tests) * Added `TestSuiteBootstrap.isFusekiEnabled()` to detect Fuseki availability * `GlossaryOntologyExportIT`: * Falls back to Testcontainers-based local Fuseki when bootstrap Fuseki is unavailable * `GlossaryTermRelationIT`: * Skipped via `assumeTrue` when Fuseki is unavailable * `MetricResourceIT`: * Skips RDF-specific tests when Fuseki is unavailable * fix package conflicts * nit * Fix merge conflicts, Python test, RDF reliability, and VectorDocBuilder tests - Fix Python test_patch_glossary_term_related_terms to use TermRelation instead of EntityReferenceList (schema changed relatedTerms type) - Rewrite VectorDocBuilder tests for current buildEmbeddingFields API - Improve JenaFusekiStorage retry logic to retry on all HTTP errors - Increase Fuseki tmpfs size to prevent disk space exhaustion in tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix pycheck * Address all 8 PR review findings 1. Add authorization check on getTermRelationGraph endpoint 2. Add null guard on getBaseUri() to prevent NPE 3. Add React key prop on RelatedTermTagButton in map renders 4. Mark RdfResource lazy-init fields as volatile for thread safety 5. Replace exception messages with generic errors in API responses 6. Unify DEFAULT_RELATION_TYPES between CSV and repository (10 types) 7. Add jitter backoff to deadlock retry in CollectionDAO 8. 
Replace N+1 queries in prefetchGraphTerms with batch fetch Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Fix Fuseki tmpfs exhaustion and GlossaryTermRelationIT double init - Remove tmpfs size limit on Fuseki container to prevent disk exhaustion - Guard RdfUpdater.initialize() in GlossaryTermRelationIT to skip if already initialized by bootstrap Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Fix duplicate edges, null term NPE, and silent exception in graph builder - Deduplicate edges in buildGraph() using edgesSeen set - Skip TermRelation entries with null term references to prevent NPE - Add warning log when glossary term relation settings fail to load Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Fix cardinality count after canonical swap and double-checked locking - getRelationCount now matches inverse relation type for fromRecords where the term is the target, fixing cardinality bypass after bidirectional UUID canonicalization - Use double-checked locking in RdfResource.getSemanticSearchEngine() to prevent duplicate instance creation under concurrency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: anuj-kumary <anujf0510@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Ram Narayan Balaji <ramnarayanb3005@gmail.com> Co-authored-by: Ram Narayan Balaji <81347100+yan-3005@users.noreply.github.com>	2026-03-18 10:51:03 +05:30
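The canonical relation storage described in commit 6d99ba2dc0 (the lower UUID is always stored as "from") can be illustrated with a short sketch. This is not the `computeCanonicalRelationType()` implementation: the `INVERSE` table, its `broader`/`narrower` entries, and the rule that the type is inverted when the endpoints are swapped are assumptions made for the example, and Python's `uuid.UUID` ordering may differ from the ordering used in the Java code.

```python
import uuid

# Hypothetical inverse pairs; types missing from the table are treated as symmetric.
INVERSE = {"broader": "narrower", "narrower": "broader"}


def canonical_relation(from_id: uuid.UUID, to_id: uuid.UUID, relation_type: str):
    # Already canonical: the lower UUID is on the "from" side.
    if from_id <= to_id:
        return from_id, to_id, relation_type
    # Swap the endpoints and invert the type so the stored row means the same thing.
    return to_id, from_id, INVERSE.get(relation_type, relation_type)
```

With this normalization, creating the relation from either term yields the same stored row, which is how duplicate and inconsistent rows are avoided.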
Teddy	40bf82f604	Minor move 20 migrations (#26236 ) * FIX - Redshift converter (#26229) (cherry picked from commit `ce8e1e5b5b`) * chore: move 2.0 migration to 1.13.0 --------- Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>	2026-03-05 08:11:15 -08:00

11 commits