OpenMetadata/bootstrap/sql/migrations/native
Rajdeep Singh 5e1416447f
fix(sampler): Respect randomizedSample flag at 100% percentage sampling (#26966)
* fix(sampler): respect randomizedSample flag at 100% percentage sampling

When profileSample is 100% with PERCENTAGE type, the sampler
short-circuits and returns the raw dataset without any randomization,
even when randomizedSample is True (the default).

Split the combined condition so:
- No profileSample set -> return raw dataset (no sampling configured)
- 100% PERCENTAGE + randomizedSample=False -> return raw dataset (optimization)
- 100% PERCENTAGE + randomizedSample=True -> go through normal sampling path
  which applies RandomNumFn/df.sample for proper row shuffling
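
The three cases above can be sketched as a small decision function. This is
an illustration only, not the code in SQASampler or DatalakeSampler: the enum
and the function name are hypothetical stand-ins, and the flag is checked by
truthiness, the form a later commit in this PR settles on ("only explicit
True randomizes").

```python
from enum import Enum
from typing import Optional


class ProfileSampleType(Enum):
    """Hypothetical stand-in for the profiler's sample type enum."""

    PERCENTAGE = "PERCENTAGE"
    ROWS = "ROWS"


def returns_raw_dataset(
    profile_sample: Optional[float],
    sample_type: ProfileSampleType,
    randomized_sample: Optional[bool],
) -> bool:
    """Return True when the sampler can hand back the raw dataset."""
    # Case 1: no profileSample set, so no sampling is configured.
    if profile_sample is None:
        return True
    # Cases 2 and 3: at 100% PERCENTAGE the fast path applies only when
    # randomization was not requested.
    if sample_type is ProfileSampleType.PERCENTAGE and profile_sample == 100:
        return not randomized_sample
    # Any other configuration goes through the normal sampling path.
    return False
```

With this sketch, 100% PERCENTAGE with randomizedSample=True is the only one of
the three listed cases that does not return the raw dataset.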

Fixes #21304

* Address review: use 'is False' for Optional[bool] and add unit tests

- Fix randomizedSample check from 'not' to 'is False' in both SQASampler
  and DatalakeSampler to correctly handle None (Optional[bool] default=True)
- Add unit tests verifying 100% PERCENTAGE behavior for randomizedSample
  values True, False, and None

* Add ORDER BY on random column in fetch_sample_data for true randomization

The get_dataset() fix ensures 100% PERCENTAGE + randomizedSample routes
through get_sample_query() which produces a CTE with a random column.
Now fetch_sample_data() detects that column and applies ORDER BY before
LIMIT, so each call returns a different subset of rows.
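
The idea of ordering by the CTE's random column before applying LIMIT can be
sketched with the standard-library sqlite3 module. This is not the
SQLAlchemy query the sampler builds: the users table, the fetch_sample helper
and the SQL text are assumptions made up for the illustration.

```python
import sqlite3


def fetch_sample(conn: sqlite3.Connection, limit: int) -> list:
    """Return `limit` rows chosen by sorting on a per-row random value."""
    query = """
        WITH rnd AS (
            -- attach a random value to every row, as the sampling CTE does
            SELECT id, name, RANDOM() AS random
            FROM users
        )
        SELECT rnd.id, rnd.name
        FROM rnd
        ORDER BY rnd.random  -- without this, LIMIT keeps the first rows scanned
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 21)],
)
sample = fetch_sample(conn, 5)
```

Repeated calls to fetch_sample will usually return different subsets, because
the random values are regenerated on every execution.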

Also add real-DB integration tests using SQLite for the 100% PERCENTAGE
edge case (True, False, None).

* Address review: remove stale comment, unused import, add return assertions

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Address review: move ORDER BY to get_sample_query, clean up fetch_sample_data

- Move ORDER BY rnd.c.random into get_sample_query() PERCENTAGE branch,
  gated on randomizedSample is not False (mirrors ABSOLUTE branch pattern)
- Revert fetch_sample_data() to original: remove ds_columns variable,
  random_column detection, and ORDER BY logic (ordering now handled in CTE)
- Remove duplicate assertions in DatalakeSampler100Pct tests

* Address review: None defaults to False for randomizedSample

Per TeddyCr's feedback, randomization is computationally heavy and
should not be the default. Changed from 'is False'/'is not False' to
truthiness checks so None (unset) behaves the same as False.

Only explicit randomizedSample=True triggers ORDER BY and skips the
100% fast path. This is consistent with the ABSOLUTE branch which
already uses truthiness checks.
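
The two styles of check discussed in this PR differ only in how they treat
None. A minimal sketch, with hypothetical helper names that do not exist in
the codebase:

```python
from typing import Optional


def randomize_by_truthiness(flag: Optional[bool]) -> bool:
    """Truthiness check: only an explicit True enables randomization."""
    return bool(flag)


def randomize_unless_false(flag: Optional[bool]) -> bool:
    """'is not False' check: None is treated as enabled."""
    return flag is not False


# Map each possible flag value to (truthiness result, 'is not False' result).
comparison = {
    flag: (randomize_by_truthiness(flag), randomize_unless_false(flag))
    for flag in (True, False, None)
}
```

True and False give the same answer under both checks; None is disabled under
the truthiness check and enabled under the 'is not False' check.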

* Fix integration test: None should skip sample_query (matches truthiness semantics)

* fix(tests): update BigQuery view sampling expected queries with ORDER BY

BigQuery views fall through to SQASampler.get_sample_query() which now
adds ORDER BY rnd.random when randomizedSample is enabled. Update the
expected SQL strings in test_sampling_for_views and
test_sampling_view_with_partition to match.

* refactor: use explicit is False for randomizedSample checks

Address review comments: SampleConfig.randomizedSample defaults to True,
so only an explicit False should disable randomization. Using is False
/ is not False instead of truthiness ensures None follows the model
default (enabled) rather than being incorrectly treated as disabled.

* ci: re-trigger checks after SIGSEGV flake

* refactor: only explicit True randomizes, add non-determinism tests

* test: increase non-determinism iterations to reduce flakiness

* chore: added randomize as false

* fix: align randomizedSample defaults with schema (false)

* fix: remove ORDER BY from BigQuery test expectations

BigQuery sampling tests create SampleConfig without setting
randomizedSample, which now defaults to False. Since ORDER BY
is only added when randomizedSample is True, the expected query
strings should not include ORDER BY.

Also fix inaccurate docstring in test_sample.py.

* test: increase non-determinism test iterations to reduce flakiness

Increase fetch_sample_data loop from 10 to 20 iterations to further
reduce the theoretical probability of a false failure in the
randomized ordering test.
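
As a rough illustration of why more iterations help: assume (this is not
stated in the commit) that each call independently returns a uniformly random
ordering of the same rows, and that the test fails only if every call returns
the same ordering. The function name and the row count below are hypothetical.

```python
from math import factorial


def same_order_probability(row_count: int, iterations: int) -> float:
    """Chance that `iterations` independent uniform shuffles all match."""
    orderings = factorial(row_count)
    # The first call fixes an ordering; each later call must match it.
    return (1 / orderings) ** (iterations - 1)


# Compare the two loop sizes for a hypothetical 3-row sample.
p_10 = same_order_probability(3, 10)
p_20 = same_order_probability(3, 20)
```

Under these assumptions the false-failure probability shrinks geometrically
with the number of iterations, so doubling the loop squares an already small
number (up to one factor).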

---------

Co-authored-by: Teddy <teddy.crepineau@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-14 10:28:54 -07:00
1.1.0 Fix postgres migration files (#12923) 2023-08-18 14:54:43 +02:00
1.1.1 Fix postgres migration files (#12923) 2023-08-18 14:54:43 +02:00
1.1.2 Issue 8930 - Update profiler timestamp from seconds to milliseconds (#12948) 2023-08-25 08:47:16 +02:00
1.1.5 only add collation to hash columns (#13201) 2023-09-15 12:49:11 +05:30
1.1.6 Add 1.1.6 migrations dir (#13305) 2023-09-22 09:45:00 +02:00
1.1.7 Prep v1.1.7 migrations to address test cases & suites (#13345) 2023-09-27 11:49:21 +02:00
1.2.0 Migration Fixes (#16131) 2024-05-07 22:07:25 +05:30
1.2.1 fix: comment in sql migration (#13979) 2023-11-15 10:32:11 +01:00
1.2.3 Minor: Fix migration location for unity catalog (#14339) 2024-01-03 18:26:11 +05:30
1.2.4 Fix #13982: Fix userFQN encoding while creating mentions (#14496) 2023-12-25 17:28:13 -08:00
1.3.0 Migration Fixes (#16131) 2024-05-07 22:07:25 +05:30
1.3.1 fix: move migration to 1.3.1 (#15463) 2024-03-05 15:30:43 +01:00
1.3.2 Remove SQls from 1.3.2 (#15917) 2024-04-16 18:51:03 +05:30
1.3.3 Move migration for apps to 1.3.3 all together (#15944) 2024-04-18 14:26:05 +05:30
1.4.0 ISSUE #2681 - Add Missing test parameters in PSQL (#25323) 2026-01-16 12:09:15 +01:00
1.4.2 Fix Test Suite Filter (#16615) 2024-06-12 10:40:05 +05:30
1.4.4 Fix #16788: Optimize feed query performance issues introduced in 1.4.2 (#16862) 2024-07-01 19:58:47 -07:00
1.4.5 MINOR - Clean automations_workflow in 1.4.5 (#17006) 2024-07-12 13:54:46 +02:00
1.4.6 Move Migration to 1.4.6 (#17095) 2024-07-19 12:16:53 +05:30
1.4.7 Migrate NameHash (#17317) 2024-08-06 18:41:37 +05:30
1.5.0 Improve count/feed api performance for 1.5 (#17576) 2024-08-23 11:20:34 -07:00
1.5.6 [Search] Indexing Fixes (#18048) 2024-09-30 23:39:27 +05:30
1.5.7 migration: fix duplicate param key insertion (#20802) 2025-04-15 14:10:51 +02:00
1.5.9 MINOR - Move appName migration to 1.5.9 (#18435) 2024-10-28 16:29:56 +01:00
1.5.11 Fix Search Index Contention (#18605) 2024-11-12 20:36:23 +05:30
1.5.15 Domain Policy Update to be non-system (#19060) 2024-12-15 01:18:12 +05:30
1.6.0 Feat# Implementation of Custom Workflows (#23023) 2025-10-08 18:57:44 +05:30
1.6.2 Improvement #19065 : Support removing existing enumKeys (for enum type custom property) (#19054) 2025-01-07 19:25:59 -08:00
1.6.3 Cleanup App data (#19571) 2025-01-28 19:22:33 +05:30
1.6.7 MINOR: chore: added missing timestamp indexes for time series tables (#20373) 2025-03-24 07:43:07 +01:00
1.7.0 Add cleanup apps_extension_time_series (#20857) 2025-04-16 14:54:11 +05:30
1.7.1 Escape ? to causing issues in jdbi binding (#21381) 2025-05-23 17:13:45 +05:30
1.7.2 FIX - Automation Workflows should not be updated by the SM & cleanup migration (#21435) 2025-06-03 12:17:14 +02:00
1.7.4 Disabled bot creating activity feeds (#21773) 2025-06-14 19:21:00 +05:30
1.8.0 Add Data Contracts Specification and APIs (#21164) 2025-06-04 06:36:28 +02:00
1.8.1 Fix #20621: User Status Tracking in the System (#21911) 2025-07-02 14:37:36 -07:00
1.8.2 Fix #20145: Implemented Prefix For Dashboard Service (#21585) 2025-07-08 18:54:35 +02:00
1.8.4 MINOR - Add columns.description in search settings (#22299) 2025-07-15 09:21:57 +02:00
1.8.5 Added missing migration sql files [1.8.5 and 1.10.2] (#24399) 2025-11-18 08:02:35 +01:00
1.8.7 Feature: Security Service (#22450) 2025-07-31 06:38:21 +02:00
1.8.8 Feature: Security Service (#22450) 2025-07-31 06:38:21 +02:00
1.8.9 Feature: Security Service (#22450) 2025-07-31 06:38:21 +02:00
1.9.0 MINOR - Add Tests & fix migrations (#22714) 2025-08-03 15:19:54 +02:00
1.9.2 Add missing domain migrations for entity version history (#23032) 2025-08-21 14:33:37 +05:30
1.9.5 MINOR - Move migrations to 1.9.5 (#23095) 2025-08-28 09:23:21 +02:00
1.9.6 ISSUE #1534 - Profiler Refactor for Metadata Extraction Application (#23200) 2025-09-05 13:07:04 +02:00
1.9.9 Minor fix broken 1.9.8 migrations (#23487) 2025-09-22 13:13:25 +00:00
1.9.10 Fixes #23356: Databricks & UnityCatalog OAuth and Azure AD Auth (#23561) 2025-10-03 19:53:19 +05:30
1.9.11 add entityType.keyword aggregation in searchSettings.json (#23559) 2025-09-25 17:04:49 +05:30
1.10.0 Move migrations to 1.11.x (#24074) 2025-10-30 01:02:45 +05:30
1.10.2 Added missing migration sql files [1.8.5 and 1.10.2] (#24399) 2025-11-18 08:02:35 +01:00
1.10.3 MINOR: dbt migration fix (#23980) 2025-10-23 12:54:34 +02:00
1.10.4 chore: move dbt migration to 1.11 (#24076) 2025-11-03 08:46:47 +01:00
1.10.5 TRUNCATE Flowable History Tables in both 1.10.5 and 1.10.7 Migration (#24323) 2025-11-13 21:05:31 +00:00
1.10.6 Fixes #24132: Airbyte Cloud Support (#24261) 2025-11-11 16:24:09 +05:30
1.10.7 TRUNCATE Flowable History Tables in both 1.10.5 and 1.10.7 Migration (#24323) 2025-11-13 21:05:31 +00:00
1.10.8 Fix email configuration templates default value from 'collate' to 'openmetadata' (#24352) 2025-11-17 08:39:41 +01:00
1.11.0 Moved AI Application and LLM Model entities migrations to 1.12.0 (#25659) 2026-02-02 08:50:37 +01:00
1.11.1 chore: realign main migration with 1.11.1 branch (#24938) 2025-12-22 09:03:28 +01:00
1.11.2 Fix #24578: Datamodels not visible if . in service name (#24779) 2025-12-27 10:00:26 -08:00
1.11.4 Fix search percentile rank scoring (#24859) 2025-12-23 18:06:27 +00:00
1.11.5 Tagging explanation (#24817) 2026-01-08 17:02:40 +01:00
1.11.6 Fix: remove overrideLineage config from database service metadata pipeline (#25379) 2026-01-20 09:08:26 +05:30
1.11.8 Fixes #24546: Add sobjectNames field for multi-object selection in Salesforce connector (#24547) 2026-02-02 16:05:59 +01:00
1.11.9 Add bulk apis for pipeline status (#25731) 2026-02-10 18:14:06 +05:30
1.11.11 Fix-20713: Add support for metadata ingestion using local file in REST connector (#26036) 2026-02-23 21:50:26 +05:30
1.11.12 Fix #26178: Add support for IAM auth for redshift (#26179) 2026-03-02 21:57:28 +05:30
1.12.0 MINOR - Allow app definition to pass the impersonation rules for bots (#25909) 2026-02-17 19:52:56 +01:00
1.12.1 Continuous indexing to handle failures (#26111) 2026-03-18 16:23:04 +05:30
1.12.2 Fixes #26225: Add index and FORCE INDEX for listLastTestCaseResultsForTestSuite (MySQL) (#26235) 2026-03-06 07:55:41 -08:00
1.12.4 Move Migration to 1.12.4 from 1.12.3 (#26629) 2026-03-20 09:41:15 +00:00
1.12.5 Update indexing schedule (#27204) 2026-04-10 19:15:08 +05:30
1.13.0 fix(sampler): Respect randomizedSample flag at 100% percentage sampling (#26966) 2026-04-14 10:28:54 -07:00
1.14.0 Fix: align glossary term relation type colors with design system (#27142) 2026-04-13 11:03:35 +00:00
2.0.0 MCP services (#23623) 2026-04-01 22:15:20 +05:30