DataDesigner/packages/data-designer-engine/tests/engine
Nabin Mulepati bbcd7d3995
fix: harden resume checkpoint handling (#624)
* fix: harden resume checkpoint handling

Persist config identity in metadata, make checkpoints atomic, and reject unsafe resume states so interrupted runs do not mix incompatible or post-processed data.
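One way to picture "config identity in metadata" is a fingerprint check before resuming. The sketch below is illustrative only: the function names, the `config_fingerprint` metadata key, and the choice of SHA-256 over canonical JSON are assumptions, not the engine's actual implementation.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Hash a canonical JSON rendering so key order does not change the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def can_resume(metadata: dict, config: dict) -> bool:
    """Allow resume only when the stored fingerprint matches the current config.

    The "config_fingerprint" key is a hypothetical name for this sketch.
    """
    return metadata.get("config_fingerprint") == config_fingerprint(config)
```

With a check like this, a run whose config changed between invocations is refused instead of appending incompatible rows to the existing dataset.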

* fix: close resume edge cases

Let IF_POSSIBLE start fresh for resize configs and mark after-generation processing before mutation so interrupted processors cannot be resumed unsafely.

* refactor: drop dataset directory lock

Single-user CLI/notebook flows don't race on the artifact directory, and
the timestamped-directory fallback already handles the "ran it twice"
case. The lock added complexity (re-entrancy, stale cleanup, the
cached-property trap where IF_POSSIBLE→NEVER moves writes to a
timestamped directory while the lock stays pinned to the original) for
no real protection. Atomic metadata writes still cover the actual hazard
(crash mid-write).
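For readers unfamiliar with the pattern, an atomic metadata write is commonly done by writing to a temporary file in the same directory and renaming it over the target. This is a minimal sketch of that general technique; `write_json_atomically` is a hypothetical helper name and the engine's real code may differ.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomically(path: Path, payload: dict) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic replacement of the target path
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A crash before `os.replace` leaves the previous `metadata.json` untouched, which is the hazard the commit message says still needs covering.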

Also fix a pre-existing test bug in
test_initial_actual_num_records_uses_actual_parquet_rows_for_partial_row_group
where the mocked scheduler hit the partial-completion path with
unconfigured Mock attributes.

* fix: address Greptile review on resume edge cases

* Drop the unreachable ResumeMode.IF_POSSIBLE branch in
  _post_generation_processed_resume_result. By the time this helper
  runs, build() has normalised IF_POSSIBLE to ALWAYS or NEVER, so the
  guard now matches reality. Tighten the docstring to document the
  three outcomes (no-op return / fall through / raise).

* Split the post-processed extension/raise into two cases. When
  num_records < prior_target the user just asked for fewer records than
  already exist; the previous "would mix pre- and post-processor
  records" message only describes the extension case. Mirror the
  wording used by _load_resume_state and add a regression test.

* Remove the dead _find_completed_row_group_ids wrapper now that
  _build_async uses _find_completed_row_groups directly. Rename the
  related test to match.

* refactor: unify sync + async resume around filesystem-derived progress

Both engines now derive `num_completed_batches` and `actual_num_records`
from `parquet-files/batch_*.parquet` via `_recover_progress_from_disk`.
`metadata.json` keeps describing the run *configuration* (`buffer_size`,
`target_num_records`, `original_target_num_records`, config fingerprint),
while the filesystem is the source of truth for *progress*. This closes
the sync engine's race window between `move_partial_result_to_final_file_path`
and the metadata write that follows it, matching the crash-recovery
behaviour the async engine already had.

The sync engine additionally rejects non-contiguous batch IDs (a hole can
only mean external mutation or a directory written by an incompatible
engine); the async engine continues to tolerate gaps from out-of-order
completion via `allow_holes=True`.

Existing sync resume tests now seed parquet files alongside metadata,
and two new tests cover the unified behaviour: filesystem progress wins
when metadata lags, and sync rejects non-contiguous IDs.
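The filesystem-derived progress described above can be sketched roughly as follows. This is not the real `_recover_progress_from_disk`; the helper name `completed_batch_ids`, the assumption that batch IDs start at 0, and the use of `ValueError` are all illustrative. Counting rows would additionally require reading the parquet files, which this sketch omits.

```python
import re
from pathlib import Path

# Matches file names such as batch_00003.parquet; the zero padding is an assumption.
_BATCH_FILE = re.compile(r"^batch_(\d+)\.parquet$")


def completed_batch_ids(parquet_dir: Path, allow_holes: bool = False) -> list[int]:
    """Return the sorted batch IDs found on disk.

    With allow_holes=False (sync-style), IDs must be exactly 0..n-1.
    With allow_holes=True (async-style), gaps are tolerated.
    """
    ids = sorted(
        int(match.group(1))
        for path in parquet_dir.glob("batch_*.parquet")
        if (match := _BATCH_FILE.match(path.name))
    )
    if not allow_holes and ids != list(range(len(ids))):
        raise ValueError(f"non-contiguous batch IDs on disk: {ids}")
    return ids
```

Because the IDs come from the files themselves, a crash between moving a batch into place and updating `metadata.json` no longer loses that batch on resume.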

* docs: clarify DatasetCreationResults observability scope on resume

`load_dataset`, `count_records`, `load_analysis`, `export`, and `push_to_hub`
all read from the artifact directory, so they reflect the cumulative dataset
(original + resume rows). `task_traces`, model-usage logs, and telemetry
events are scoped to the current invocation only because the original run's
in-memory state is not persisted. Document this in the class docstring,
the architecture note, and the Fern resume guide.

* docs: explain DeprecationWarning re-raise in create()/preview()

Future readers were puzzled by the ``except DeprecationWarning: raise``
short-circuits before the generic generation-error wrappers. Add a
comment in ``create()`` (with a back-reference from ``preview()``) to
record that strict warning filters (``pytest.warns``,
``-W error::DeprecationWarning``) turn the engine's
``warnings.warn(..., DeprecationWarning)`` calls (most notably the
``allow_resize=True`` deprecation in ``_resolve_async_compatibility``)
into raised exceptions, and we want them to surface untouched instead of
being swallowed by ``DataDesignerGenerationError``.
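The ordering issue can be reproduced in isolation. In the sketch below, `GenerationError`, `_engine_step`, and this `create` are stand-ins invented for illustration; only the `except DeprecationWarning: raise` pattern itself comes from the commit message.

```python
import warnings


class GenerationError(Exception):
    """Hypothetical stand-in for DataDesignerGenerationError."""


def _engine_step() -> None:
    # Under an "error" filter this call raises DeprecationWarning instead of warning.
    warnings.warn("allow_resize=True is deprecated", DeprecationWarning, stacklevel=2)


def create() -> str:
    try:
        _engine_step()
    except DeprecationWarning:
        # Must come first: DeprecationWarning is an Exception subclass, so the
        # generic handler below would otherwise wrap and hide it.
        raise
    except Exception as exc:
        raise GenerationError("generation failed") from exc
    return "ok"
```

Running `create()` inside `warnings.catch_warnings()` with `warnings.simplefilter("error", DeprecationWarning)` surfaces the `DeprecationWarning` itself; without the first `except` clause the caller would see `GenerationError` instead.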

* fix: close after-generation crash window and tighten metadata typing on resume

Address review feedback on resume hardening:

* Run after-generation processors unconditionally on the on-disk dataset
  rather than gating on the generation return value. The previous gate
  silently skipped after-generation when resume saw every row group
  already on disk, leaving a crash window between the final parquet write
  and the ``post_generation_state="started"`` marker write: in that
  window the dataset is complete but after-generation never ran, and the
  on-disk parquet files are still clean. The "started" short-circuit
  still rejects the other direction (crashed mid-rewrite, ambiguous
  state), so resume only re-runs after-generation when it is safe to do
  so.

* Raise ``DatasetGenerationError`` (instead of letting a raw
  ``TypeError`` leak out of ``num_records < prior_target``) when a
  post-processed dataset's metadata is missing ``target_num_records``.
  Mirrors the wording used by ``_load_resume_state``.

* Document the new behaviour in ``architecture/dataset-builders.md`` and
  the Fern resume invariants.
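The resume decisions described in this commit can be summarised as a small decision function. This is a simplified sketch under stated assumptions: the function name, the return strings, the error messages, and the treatment of any marker value other than ``"started"`` as "already processed" are hypothetical; only ``post_generation_state="started"``, ``target_num_records``, and the ``DatasetGenerationError`` name come from the text above.

```python
class DatasetGenerationError(Exception):
    """Local stand-in for the engine's error type."""


def check_post_generation_resume(metadata: dict, num_records: int) -> str:
    """Decide what resume may do with a dataset whose generation is complete."""
    state = metadata.get("post_generation_state")
    if state is None:
        # No marker: parquet files are still clean, so after-generation can run.
        return "run_after_generation"
    if state == "started":
        # Crashed mid-rewrite: the on-disk state is ambiguous, so refuse.
        raise DatasetGenerationError("after-generation processing was interrupted")
    # Assumption: any other marker value means processing finished.
    prior_target = metadata.get("target_num_records")
    if prior_target is None:
        # Typed error instead of a TypeError from comparing int with None.
        raise DatasetGenerationError("metadata is missing target_num_records")
    if num_records < prior_target:
        raise DatasetGenerationError("requested fewer records than already exist")
    if num_records > prior_target:
        raise DatasetGenerationError("would mix pre- and post-processor records")
    return "nothing_to_do"
```

Checking for a missing `target_num_records` before the comparison is what turns the raw `TypeError` into a clear, typed failure.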

Tests:

* ``test_build_resume_complete_dataset_runs_after_generation_when_no_marker``
  covers the closed crash window via the public ``set_processor_runner``
  API.
* ``test_build_resume_post_generation_processed_missing_target_raises_clearly``
  covers the typed-error gap.
2026-05-11 11:44:46 -06:00
| Name | Last commit | Date |
| --- | --- | --- |
| analysis | chore: Improve CLI startup with lazy heavy import cleanup (#330) | 2026-02-18 16:24:15 -05:00 |
| column_generators | feat: let column configs declare all model aliases for the startup health check (#626) | 2026-05-11 11:33:50 -06:00 |
| dataset_builders | fix: harden resume checkpoint handling (#624) | 2026-05-11 11:44:46 -06:00 |
| mcp | refactor: Decouple ModelFacade from LiteLLM via ModelClient adapter (#373) | 2026-03-11 14:30:40 -06:00 |
| models | feat: make async engine the default execution path (#592) | 2026-05-04 16:22:13 -03:00 |
| processing | feat: add RunConfig jinja rendering engine (#557) | 2026-04-17 15:06:27 -04:00 |
| registry | chore: add publish script and update license headers (#253) | 2026-01-28 08:47:34 -05:00 |
| resources | fix: normalize rollout timestamps before deriving started_at/ended_at (#556) | 2026-05-07 14:13:10 -04:00 |
| sampling_gen | feat: add RunConfig jinja rendering engine (#557) | 2026-04-17 15:06:27 -04:00 |
| storage | fix: harden resume checkpoint handling (#624) | 2026-05-11 11:44:46 -06:00 |
| testing | fix(engine): validate processor plugin impls (#609) | 2026-05-06 14:31:12 -04:00 |
| validators | chore: Improve CLI startup with lazy heavy import cleanup (#330) | 2026-02-18 16:24:15 -05:00 |
| __init__.py | feat: Native Anthropic adapter with shared HTTP client infrastructure (#426) | 2026-03-19 11:18:40 -06:00 |
| conftest.py | feat: Refactor person data reading for client ddb connection control (#393) | 2026-03-19 09:34:57 -05:00 |
| test_compiler.py | refactor: slim package refactor into three subpackages (#240) | 2026-01-27 13:53:20 -05:00 |
| test_configurable_task.py | chore: Improve CLI startup with lazy heavy import cleanup (#330) | 2026-02-18 16:24:15 -05:00 |
| test_dataset_metadata.py | refactor: slim package refactor into three subpackages (#240) | 2026-01-27 13:53:20 -05:00 |
| test_engine_errors.py | chore: add publish script and update license headers (#253) | 2026-01-28 08:47:34 -05:00 |
| test_model_provider.py | feat(models): deprecate implicit default provider routing (#594) | 2026-05-05 13:39:12 -06:00 |
| test_secret_resolver.py | chore: add publish script and update license headers (#253) | 2026-01-28 08:47:34 -05:00 |
| test_validation.py | feat: add skip.when conditional column generation (#502) | 2026-04-15 09:31:50 -06:00 |