7 commits

Author SHA1 Message Date
Rajdeep Singh
5e1416447f
fix(sampler): Respect randomizedSample flag at 100% percentage sampling (#26966)
* fix(sampler): respect randomizedSample flag at 100% percentage sampling

When profileSample is 100% with PERCENTAGE type, the sampler
short-circuits and returns the raw dataset without any randomization,
even when randomizedSample is True (the default).

Split the combined condition so:
- No profileSample set -> return raw dataset (no sampling configured)
- 100% PERCENTAGE + randomizedSample=False -> return raw dataset (optimization)
- 100% PERCENTAGE + randomizedSample=True -> go through normal sampling path
  which applies RandomNumFn/df.sample for proper row shuffling

Fixes #21304

* Address review: use 'is False' for Optional[bool] and add unit tests

- Fix randomizedSample check from 'not' to 'is False' in both SQASampler
  and DatalakeSampler to correctly handle None (Optional[bool] default=True)
- Add unit tests verifying 100% PERCENTAGE behavior for randomizedSample
  values True, False, and None

* Add ORDER BY on random column in fetch_sample_data for true randomization

The get_dataset() fix ensures 100% PERCENTAGE + randomizedSample routes
through get_sample_query() which produces a CTE with a random column.
Now fetch_sample_data() detects that column and applies ORDER BY before
LIMIT, so each call returns a different subset of rows.
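
A minimal sketch of why ordering by the random column before LIMIT matters, using Python's built-in sqlite3 rather than the project's SQLAlchemy sampler. The table `events`, the alias `rnd_val`, and the helper `fetch_sample` are hypothetical names chosen for illustration; only the idea (a CTE carrying a random column, ordered before LIMIT) comes from the commit message above.

```python
import sqlite3


def fetch_sample(conn: sqlite3.Connection, limit: int, randomized: bool) -> list:
    """Return up to `limit` ids from the hypothetical `events` table."""
    query = (
        "WITH rnd AS (SELECT id, ABS(RANDOM() % 100) AS rnd_val FROM events) "
        "SELECT id FROM rnd"
    )
    if randomized:
        # Without this ORDER BY, LIMIT tends to return the same leading rows
        # on every call; ordering by the random column varies the subset.
        query += " ORDER BY rnd_val"
    query += " LIMIT ?"
    return [row[0] for row in conn.execute(query, (limit,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO events (id) VALUES (?)", [(i,) for i in range(1, 51)])

sample = fetch_sample(conn, limit=5, randomized=True)
```

Each call with `randomized=True` may return a different set of five ids, so no fixed output is shown here.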

Also add real-DB integration tests using SQLite for the 100% PERCENTAGE
edge case (True, False, None).

* Address review: remove stale comment, unused import, add return assertions

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Address review: move ORDER BY to get_sample_query, clean up fetch_sample_data

- Move ORDER BY rnd.c.random into get_sample_query() PERCENTAGE branch,
  gated on randomizedSample is not False (mirrors ABSOLUTE branch pattern)
- Revert fetch_sample_data() to original: remove ds_columns variable,
  random_column detection, and ORDER BY logic (ordering now handled in CTE)
- Remove duplicate assertions in DatalakeSampler100Pct tests

* Address review: None defaults to False for randomizedSample

Per TeddyCr's feedback, randomization is computationally heavy and
should not be the default. Changed from 'is False'/'is not False' to
truthiness checks so None (unset) behaves the same as False.

Only explicit randomizedSample=True triggers ORDER BY and skips the
100% fast path. This is consistent with the ABSOLUTE branch which
already uses truthiness checks.
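
The fast-path rule described in this step can be sketched as a small predicate. This is an illustration under stated assumptions, not the sampler's code: `takes_fast_path` and its parameter names are hypothetical, and the sample type is reduced to a boolean.

```python
from typing import Optional


def takes_fast_path(
    profile_sample: Optional[float],
    is_percentage: bool,
    randomized_sample: Optional[bool],
) -> bool:
    """True when the raw dataset can be returned without a sample query."""
    if profile_sample is None:
        # No sampling configured at all.
        return True
    if is_percentage and profile_sample == 100 and not randomized_sample:
        # Full table and no shuffle requested; None behaves like False here.
        return True
    return False
```

With these semantics only `randomized_sample=True` forces the 100% case through the normal sampling path.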

* Fix integration test: None should skip sample_query (matches truthiness semantics)

* fix(tests): update BigQuery view sampling expected queries with ORDER BY

BigQuery views fall through to SQASampler.get_sample_query() which now
adds ORDER BY rnd.random when randomizedSample is enabled. Update the
expected SQL strings in test_sampling_for_views and
test_sampling_view_with_partition to match.

* refactor: use explicit is False for randomizedSample checks

Address review comments: SampleConfig.randomizedSample defaults to True,
so only an explicit False should disable randomization. Using is False
/ is not False instead of truthiness ensures None follows the model
default (enabled) rather than being incorrectly treated as disabled.
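
The difference between the two styles of check only shows up for None. A short sketch, with hypothetical function names, makes the contrast explicit:

```python
from typing import Optional


def randomizes_with_truthiness(flag: Optional[bool]) -> bool:
    # Truthiness: None and False both disable randomization.
    return bool(flag)


def randomizes_with_identity(flag: Optional[bool]) -> bool:
    # Identity check: only an explicit False disables randomization,
    # so None follows an "enabled" default.
    return flag is not False
```

Both functions agree for True and False; they disagree only when the flag is unset.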

* ci: re-trigger checks after SIGSEGV flake

* refactor: only explicit True randomizes, add non-determinism tests

* test: increase non-determinism iterations to reduce flakiness

* chore: added randomize as false

* fix: align randomizedSample defaults with schema (false)

* fix: remove ORDER BY from BigQuery test expectations

BigQuery sampling tests create SampleConfig without setting
randomizedSample, which now defaults to False. Since ORDER BY
is only added when randomizedSample is True, the expected query
strings should not include ORDER BY.

Also fix inaccurate docstring in test_sample.py.

* test: increase non-determinism test iterations to reduce flakiness

Increase fetch_sample_data loop from 10 to 20 iterations to further
reduce the theoretical probability of a false failure in the
randomized ordering test.
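
A rough way to reason about that probability, assuming the test fails only when every iteration returns the same ordering and that each fetch is an independent, uniformly random permutation of `n_rows` distinct rows. Both assumptions are mine, as is the helper name; the real test may differ.

```python
from fractions import Fraction
from math import factorial


def false_failure_probability(n_rows: int, iterations: int) -> Fraction:
    """Chance that all `iterations` independent uniform shuffles coincide."""
    # The first ordering is free; each later one must match it exactly.
    return Fraction(1, factorial(n_rows)) ** (iterations - 1)
```

Under these assumptions, going from 10 to 20 iterations multiplies the false-failure probability by a further factor of (1/n!)^10.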

---------

Co-authored-by: Teddy <teddy.crepineau@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-14 10:28:54 -07:00
sonika-shah
733921f510
Fix: align glossary term relation type colors with design system (#27142)
* Fix: align glossary term relation type colors with design system

System-defined relation types (relatedTo, synonym, antonym, etc.) were
initialized with old Ant Design palette colors (#1890ff, #722ed1, …) while
the frontend RELATION_META constants had been updated to the new design
system colors (#1570ef, #b42318, …). Because renderColorBadge used
record.color (from the backend) unconditionally, the stale Ant Design
colors were always displayed instead of the intended ones.

- Frontend: renderColorBadge now treats RELATION_META as authoritative for
  system-defined types so the correct design-system color is always shown,
  regardless of what color value is stored in the backend.
- Backend (SettingsCache.java): default colors updated for new installs.
- DB migration (2.0.0): postDataMigrationSQLScript added for MySQL and
  PostgreSQL to update colors in existing deployments without touching
  user-added custom relation types.
- Tests: unit tests for renderColorBadge color-resolution logic; integration
  test asserting all ten system-defined types return the expected hex values
  from the API.
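
The color-resolution rule from the frontend bullet can be sketched language-neutrally in Python. The dictionary below is a hypothetical stand-in for `RELATION_META`, and the pairing of relation types to hex values is illustrative rather than copied from the project.

```python
from typing import Optional

# Hypothetical stand-in for the frontend RELATION_META constants.
# The type-to-color pairing is an assumption made for this example.
RELATION_META = {"relatedTo": "#1570ef", "synonym": "#b42318"}


def resolve_badge_color(relation_type: str, backend_color: Optional[str]) -> Optional[str]:
    """System-defined types use the constant; custom types keep the stored color."""
    meta_color = RELATION_META.get(relation_type)
    if meta_color is not None:
        return meta_color
    return backend_color
```

A stale stored value such as `#1890ff` is therefore ignored for system-defined types, while user-added types still show whatever the backend returns.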

Fixes #openmetadata/OpenMetadata

* Remove dev-only MySQL 2.0.0 migration script

* Remove dev-only PostgreSQL 2.0.0 migration script

* Fix: align glossary term relation settings colors and remove duplicate 1.13.0 migration

Remove glossary term relation migrations mistakenly re-added in 1.13.0 and
update relation type colors in the 1.14.0 migration INSERT to use design
system tokens instead of old Ant Design colors.

* fix lint

* add more tests

* address feedback

* fix prettier formatting in test file

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* remove GlossaryTermRelationSettings test file from branch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:03:35 +00:00
Suman Maharana
a06b7e74cc
Chore: Remove iceberg standalone connector (#26365)
* Chore: Remove iceberg standalone connector

* add migration scripts

* Update generated TypeScript types

* py_format

* address comments

* Addressed changes

* add tests

* migrate to custom database

* fix tests

* fix tests

* fix migrations

* hard delete existing ingestion pipelines for iceberg

* Update generated TypeScript types

* Delete openmetadata-ui/src/main/resources/ui/src/generated/entity/services/ingestionPipelines/ingestionPipeline.ts

* Delete openmetadata-ui/src/main/resources/ui/src/generated/entity/automations/workflow.ts

* Delete openmetadata-ui/src/main/resources/ui/src/generated/api/automations/createWorkflow.ts

* Delete openmetadata-ui/src/main/resources/ui/src/generated/api/services/ingestionPipelines/createIngestionPipeline.ts

* Delete openmetadata-ui/src/main/resources/ui/src/generated/api/services/createDatabaseService.ts

* Delete openmetadata-ui/src/main/resources/ui/src/generated/entity/automations/testServiceConnection.ts

* Update generated TypeScript types

* Update bootstrap/sql/migrations/native/1.13.0/mysql/postDataMigrationSQLScript.sql

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-02 14:55:23 +00:00
Ram Narayan Balaji
10cf2f9ea0
Move ontology/glossary relation migration from 1.13.0 to 1.14.0 (#26755)
The glossary term relation migration (relationType backfill, default
glossaryTermRelationSettings insert, relatedTerms cleanup, conceptMappings
backfill) was accidentally placed in the 1.13.0 migration scripts. This
commit moves it to the correct 1.14.0 slot, restoring 1.13.0 to its
original content (computeMetrics profiler pipeline cleanup only).

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 14:53:10 +05:30
sonika-shah
aff1343643
fix: strip stale relatedTerms from glossary_term_entity JSON to fix 500 on listAfter (#26586)
* fix: strip stale relatedTerms from glossary_term_entity JSON to fix 500 on listAfter

Pre-1.13.0, relatedTerms was stored as EntityReference[] directly in the
glossary_term_entity JSON column. PR #25886 changed relatedTerms to TermRelation[]
and moved storage to entity_relationship table, but missed adding a migration to
clean up the old EntityReference data still present in existing rows.

When listAfter() deserializes the entity JSON, Jackson fails with:
  UnrecognizedPropertyException: Unrecognized field "id" (class TermRelation)

The existing migration already backfilled entity_relationship rows with
relationType="relatedTo", so stripping relatedTerms from entity JSON is safe —
the data is already in entity_relationship and will be loaded from there.
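
The cleanup itself is done in SQL migration scripts, which are not shown here. As a hedged illustration of the transformation only, the sketch below removes the stale key from one row's JSON in Python; `strip_related_terms` and the sample payload are hypothetical.

```python
import json


def strip_related_terms(entity_json: str) -> str:
    """Drop the legacy relatedTerms array from a glossary term JSON document."""
    entity = json.loads(entity_json)
    # pop with a default keeps the function safe for rows already cleaned.
    entity.pop("relatedTerms", None)
    return json.dumps(entity)


# Hypothetical pre-1.13.0 row: relatedTerms holds EntityReference-like objects.
legacy_row = json.dumps(
    {"name": "Customer", "relatedTerms": [{"id": "abc", "type": "glossaryTerm"}]}
)
cleaned = json.loads(strip_related_terms(legacy_row))
```

After stripping, the document no longer carries the `id` field that the new `TermRelation` type does not recognize.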

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Ram Narayan Balaji <81347100+yan-3005@users.noreply.github.com>
2026-03-20 10:29:26 +05:30
Sriharsha Chintalapani
6d99ba2dc0
Glossary relations (#25886)
* Glossary Term Relations

* Add GlossaryTerm Relations

* Add GlossaryTerm Relations, add custom relations, ontology explorer

* Add Translations

* Update generated TypeScript types

* Address comments

* Address comments

* Address comments

* Update generated TypeScript types

* Update yarn.lock after merging cytoscape dependencies from glossary_relations

* fix zoom in and out functionality and added missing translate keys

* fix test

* Remove unwanted changes

* nit

* nit

* nit

* Remove conflict test

* nit

* fix test

* Add test for ontology explorer

* New yarn lock and 2.0.0 schema changes missed during merge conflicts

* Revamped glossary term relation settings

* Refactor code

* Addressed comments

* nit

* Update generated TypeScript types

* Java Checkstyle and Yarn lock

* Update generated TypeScript types

* fix unit test

* Remove 2.0.0 migration folders placed at wrong loc

* Merge main

* fix navigation to relation graph in glossary

* fix ontology explorer spec

* Added filter support in the data mode

* Fix glossary term relation CI failures

### Canonical Relation Storage (GlossaryTermRepository)

* Introduced `computeCanonicalRelationType()` to normalize relation direction
  using UUID ordering (lower UUID is always treated as "from")
* Prevents duplicate and inconsistent relation rows when created from either side
* Updated `setTermRelations()` and `addRelation()` to store canonical relation types
* Fixed `setFields()` read logic:

  * Invert relation type for `fromRecords` (entity is the TO side)
  * Keep `toRecords` unchanged
* Updated `deleteBidirectionalRelatedTo()` to match canonical storage format
* Added `RequestEntityCache.invalidate()` after relation mutations to ensure consistency
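
A minimal sketch of the canonicalization idea, assuming Python's built-in UUID ordering as the comparison and a hypothetical inverse table. The `broader`/`narrower` pair and the function names are illustrative only; the repository's actual `computeCanonicalRelationType()` may compare and invert differently.

```python
import uuid

# Hypothetical inverse pairs; types not listed are treated as symmetric.
INVERSE = {"broader": "narrower", "narrower": "broader"}


def invert(relation_type: str) -> str:
    return INVERSE.get(relation_type, relation_type)


def canonicalize(from_id: uuid.UUID, to_id: uuid.UUID, relation_type: str):
    """Return (from, to, type) with the lower UUID always on the "from" side."""
    if from_id <= to_id:
        return from_id, to_id, relation_type
    # Swapping the endpoints requires inverting the relation type as well.
    return to_id, from_id, invert(relation_type)


a, b = uuid.UUID(int=1), uuid.UUID(int=2)
row_from_a = canonicalize(a, b, "broader")
row_from_b = canonicalize(b, a, "narrower")
```

Because both calls yield the same tuple, creating the relation from either term produces a single stored row.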

### Lazy RDF Resource Initialization

* Added `RdfRepository.getInstanceOrNull()` for null-safe access without throwing
* Refactored `RdfResource` constructor to avoid eager `RdfRepository.getInstance()` call
* Enabled resource registration even when Fuseki is not initialized
* Introduced lazy getters:

  * `getRdfRepository()`
  * `getSemanticSearchEngine()`
* Updated all endpoints to guard with null checks before `isEnabled()`

  * Return `503 Service Unavailable` when RDF is not ready

### Graceful Test Degradation (Fuseki-dependent tests)

* Added `TestSuiteBootstrap.isFusekiEnabled()` to detect Fuseki availability
* `GlossaryOntologyExportIT`:

  * Falls back to Testcontainers-based local Fuseki when bootstrap Fuseki is unavailable
* `GlossaryTermRelationIT`:

  * Skipped via `assumeTrue` when Fuseki is unavailable
* `MetricResourceIT`:

  * Skips RDF-specific tests when Fuseki is unavailable

* fix package conflicts

* nit

* Fix merge conflicts, Python test, RDF reliability, and VectorDocBuilder tests

- Fix Python test_patch_glossary_term_related_terms to use TermRelation
  instead of EntityReferenceList (schema changed relatedTerms type)
- Rewrite VectorDocBuilder tests for current buildEmbeddingFields API
- Improve JenaFusekiStorage retry logic to retry on all HTTP errors
- Increase Fuseki tmpfs size to prevent disk space exhaustion in tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix pycheck

* Address all 8 PR review findings

1. Add authorization check on getTermRelationGraph endpoint
2. Add null guard on getBaseUri() to prevent NPE
3. Add React key prop on RelatedTermTagButton in map renders
4. Mark RdfResource lazy-init fields as volatile for thread safety
5. Replace exception messages with generic errors in API responses
6. Unify DEFAULT_RELATION_TYPES between CSV and repository (10 types)
7. Add jitter backoff to deadlock retry in CollectionDAO
8. Replace N+1 queries in prefetchGraphTerms with batch fetch
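
Item 7 names jitter backoff without giving details. One common shape, sketched here with hypothetical names and parameters (`DeadlockError`, `retry_with_jitter`, the base delay, and the jitter range are all assumptions), is exponential backoff scaled by a random factor:

```python
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a database deadlock error."""


def retry_with_jitter(operation, max_attempts=3, base_delay=0.05,
                      sleep=time.sleep, rng=random.random):
    """Retry `operation` on deadlock, sleeping a jittered delay between tries."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DeadlockError:
            if attempt == max_attempts:
                raise
            # Exponential backoff scaled by a random factor in [0.5, 1.5),
            # so concurrent retries are less likely to collide again.
            delay = base_delay * (2 ** (attempt - 1)) * (0.5 + rng())
            sleep(delay)
```

Passing `sleep` and `rng` as parameters keeps the helper easy to exercise in tests without real waiting.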

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix Fuseki tmpfs exhaustion and GlossaryTermRelationIT double init

- Remove tmpfs size limit on Fuseki container to prevent disk exhaustion
- Guard RdfUpdater.initialize() in GlossaryTermRelationIT to skip if
  already initialized by bootstrap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix duplicate edges, null term NPE, and silent exception in graph builder

- Deduplicate edges in buildGraph() using edgesSeen set
- Skip TermRelation entries with null term references to prevent NPE
- Add warning log when glossary term relation settings fail to load
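
The first two fixes can be illustrated together. This sketch assumes relations arrive as `(source, target, relation_type)` tuples and that an edge is identified by all three values; the function name and edge dictionary shape are hypothetical.

```python
def build_edges(relations):
    """Build a de-duplicated edge list, skipping entries with missing terms."""
    edges_seen = set()
    edges = []
    for source, target, relation_type in relations:
        if source is None or target is None:
            # A relation without a term reference cannot be drawn; skip it.
            continue
        key = (source, target, relation_type)
        if key in edges_seen:
            continue
        edges_seen.add(key)
        edges.append({"from": source, "to": target, "relationType": relation_type})
    return edges
```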

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix cardinality count after canonical swap and double-checked locking

- getRelationCount now matches inverse relation type for fromRecords
  where the term is the target, fixing cardinality bypass after
  bidirectional UUID canonicalization
- Use double-checked locking in RdfResource.getSemanticSearchEngine()
  to prevent duplicate instance creation under concurrency
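
For readers unfamiliar with the pattern, here is a generic double-checked locking sketch in Python. `LazyHolder` is a hypothetical class and not the `RdfResource` code; it only shows the two checks around a lock.

```python
import threading


class LazyHolder:
    """Create a value on first use, at most once, even under concurrency."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._instance = None

    def get(self):
        instance = self._instance
        if instance is None:  # first check, without taking the lock
            with self._lock:
                instance = self._instance
                if instance is None:  # second check, under the lock
                    instance = self._factory()
                    self._instance = instance
        return instance
```

The unlocked first check keeps the common path cheap once the value exists, while the second check stops two threads that raced past the first one from both running the factory.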

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: anuj-kumary <anujf0510@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Ram Narayan Balaji <ramnarayanb3005@gmail.com>
Co-authored-by: Ram Narayan Balaji <81347100+yan-3005@users.noreply.github.com>
2026-03-18 10:51:03 +05:30
Teddy
40bf82f604
Minor move 20 migrations (#26236)
* FIX - Redshift converter (#26229)

(cherry picked from commit ce8e1e5b5b)

* chore: move 2.0 migration to 1.13.0

---------

Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2026-03-05 08:11:15 -08:00
Renamed from bootstrap/sql/migrations/native/2.0.0/mysql/postDataMigrationSQLScript.sql