DataDesigner

mirror of https://github.com/NVIDIA-NeMo/DataDesigner synced 2026-05-24 09:48:29 +00:00

Author	SHA1	Message	Date
Nabin Mulepati	2a487cdc5c	feat: add dropped column preservation toggle (#691 ) * feat: add dropped column preservation toggle Closes #690 Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * fix: reject dropped column policy resume mismatch Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> --------- Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>	2026-05-21 13:19:20 -06:00
Eric W. Tramel	c0a4dcbb85	feat: implement async scheduling admission control (#661 ) Some checks are pending CI / End to end test (Python 3.11 on macos-latest) (push) Blocked by required conditions Details CI / Test Config (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Config (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Config (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Engine (Python 3.10 on macos-latest) (push) Blocked by required conditions Details CI / End to end test (Python 3.12 on macos-latest) (push) Blocked by required conditions Details CI / Test Engine (Python 3.11 on macos-latest) (push) Blocked by required conditions Details CI / Test Engine (Python 3.12 on macos-latest) (push) Blocked by required conditions Details CI / Test Engine (Python 3.13 on macos-latest) (push) Blocked by required conditions Details CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.10 on macos-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.11 on macos-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.12 on macos-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.13 on macos-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions Details CI / Coverage Check (Python 3.11) (push) Blocked by required conditions Details CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions Details	2026-05-20 20:58:05 -04:00
Nabin Mulepati	a83968f701	feat: preserve multimodal MCP tool results (#689 ) Some checks are pending CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.10 on macos-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.11 on macos-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.12 on macos-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.13 on macos-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions Details CI / End to end test (Python 3.10 on macos-latest) (push) Blocked by required conditions Details CI / End to end test (Python 3.11 on macos-latest) (push) Blocked by required conditions Details CI / End to end test (Python 3.13 on macos-latest) (push) Blocked by required conditions Details CI / End to end test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions Details CI / End to end test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions Details CI / End to end test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions Details CI / Coverage Check (Python 3.11) (push) Blocked by required conditions Details CI / End to end test (Python 3.12 on macos-latest) (push) Blocked by required conditions Details CI / End to end test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions Details CI / Lint and Format Check (push) Blocked by required conditions Details CI / Check License Headers (push) Blocked by required conditions Details CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions Details * feat: preserve multimodal MCP tool results Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * fix: gate MCP generic image payload detection Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * fix: validate MCP image payload blocks Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> --------- Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>	2026-05-20 14:49:22 -06:00
Nabin Mulepati	0860d62e23	fix: restore chat completion multi-choice support (#672 ) * fix chat completion multi-choice support Restore the chat completion n request field and preserve all returned choices in the canonical response while keeping response.message as the first choice. Add coverage for request forwarding, compatibility access, multi-choice parsing, and generate forwarding. Fixes #620 Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * strip n from generate requests Prevent generate and agenerate from forwarding multi-choice requests that they cannot expose, while keeping completion() multi-choice support intact. Add coverage for async parsing and Anthropic n exclusion. Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * strip configured n from generate requests Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * rename multiple choice completion flag Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * move choice sanitizer to private helpers Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * order private facade helpers Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> --------- Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>	2026-05-20 11:40:10 -06:00
Nabin Mulepati	bd0410bb05	fix(engine): actionable error when a Jinja field is missing/None/empty (#633 ) Some checks are pending CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.10 on macos-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.11 on macos-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.12 on macos-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.13 on macos-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions Details CI / Coverage Check (Python 3.11) (push) Blocked by required conditions Details CI / End to end test (Python 3.10 on macos-latest) (push) Blocked by required conditions Details CI / End to end test (Python 3.11 on macos-latest) (push) Blocked by required conditions Details CI / End to end test (Python 3.12 on macos-latest) (push) Blocked by required conditions Details CI / End to end test (Python 3.13 on macos-latest) (push) Blocked by required conditions Details CI / End to end test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions Details CI / End to end test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions Details CI / End to end test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions Details CI / End to end test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions Details CI / Lint and Format Check (push) Blocked by required conditions Details CI / Check License Headers (push) Blocked by required conditions Details CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions Details * fix(engine): actionable error when a Jinja field is missing/None/empty Empty-render and missing-attribute failures used to surface as the generic "User provided prompt generation template is invalid." either because `sanitize_user_exceptions` stripped the detail or because Jinja's raw `UndefinedError` leaked through. Both now raise a new `EmptyTemplateRenderError` carrying a row-level diagnostic that names the offending chain and includes copy-pasteable Jinja conditional and SkipConfig fix patterns. Closes #629. * fix(engine): address PR review feedback on EmptyTemplateRenderError Addresses the open review comments on #633: 1. (Greptile P1) Gate expression in the suggested remediation template was one accessor too deep when the root variable was entirely absent from the record, causing the suggested fix to itself raise UndefinedError. Fall back to gating on the root name alone when sample_name is not in record. 2. (andreatgretel) The AST walker reported loop-local names as missing culprits (e.g. ``person`` in ``{% for person in people %}...{% endfor %}``). Filter extracted chains through ``meta.find_undeclared_variables`` to defer to Jinja's canonical scope tracking. 3. (andreatgretel follow-up) Empty collections used as loop iterables (``items=[]``) fell through to the no-culprit fallback. Add a new ``_CULPRIT_EMPTY_COLLECTION`` classification so they're surfaced. 4. Minor: add ``from exception`` to ``safe_render``'s UndefinedError re-raise for traceback consistency with the native engine path, and add a note on the load-bearing exception ordering in ``sanitize_user_exceptions``.	2026-05-20 09:51:21 -06:00
Andre Manoel	6055290136	feat: add workflow chaining (#636 ) Some checks are pending CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run Details CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run Details CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run Details CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run Details CI / Coverage Check (Python 3.11) (push) Waiting to run Details CI / End to end test (Python 3.10 on macos-latest) (push) Waiting to run Details CI / End to end test (Python 3.11 on macos-latest) (push) Waiting to run Details CI / End to end test (Python 3.12 on macos-latest) (push) Waiting to run Details CI / End to end test (Python 3.13 on macos-latest) (push) Waiting to run Details CI / End to end test (Python 3.10 on ubuntu-latest) (push) Waiting to run Details CI / End to end test (Python 3.11 on ubuntu-latest) (push) Waiting to run Details CI / End to end test (Python 3.12 on ubuntu-latest) (push) Waiting to run Details CI / End to end test (Python 3.13 on ubuntu-latest) (push) Waiting to run Details CI / Lint and Format Check (push) Waiting to run Details CI / Check License Headers (push) Waiting to run Details CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions Details Publish Fern devnotes / deploy (push) Waiting to run Details CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run Details CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run Details CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions Details * feat: add workflow chaining * test: tidy workflow chaining coverage * fix: harden workflow chaining concurrency * docs: update workflow chaining plan * feat: add workflow stage postprocessors * feat: expose workflow stage outputs * fix: align workflow selected output export * fix: address workflow chaining review issues * fix: align workflow parquet export selection Signed-off-by: Andre Manoel <amanoel@nvidia.com> * fix: preserve generated columns in drop validation Signed-off-by: Andre Manoel <amanoel@nvidia.com> * fix: clarify workflow output processors Signed-off-by: Andre Manoel <amanoel@nvidia.com> * docs: add workflow chaining page * docs: align workflow chaining warning * fix: address workflow review nits --------- Signed-off-by: Andre Manoel <amanoel@nvidia.com>	2026-05-18 20:15:47 -03:00
Nabin Mulepati	71997624b3	feat: track reasoning token usage (#670 ) Some checks are pending CI / Test Engine (Python 3.12 on macos-latest) (push) Waiting to run Details CI / Test Engine (Python 3.13 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run Details CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run Details CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run Details CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run Details CI / Coverage Check (Python 3.11) (push) Waiting to run Details CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Config (Python 3.11 on macos-latest) (push) Waiting to run Details CI / Test Config (Python 3.12 on macos-latest) (push) Waiting to run Details CI / Test Config (Python 3.13 on macos-latest) (push) Waiting to run Details CI / Test Config (Python 3.10 on ubuntu-latest) (push) Waiting to run Details CI / Test Config (Python 3.11 on ubuntu-latest) (push) Waiting to run Details CI / Test Config (Python 3.12 on ubuntu-latest) (push) Waiting to run Details CI / Test Config (Python 3.13 on ubuntu-latest) (push) Waiting to run Details CI / Test Engine (Python 3.10 on macos-latest) (push) Waiting to run Details CI / Test Engine (Python 3.11 on macos-latest) (push) Waiting to run Details CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Waiting to run Details CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Waiting to run Details CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run Details CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run Details * feat: track reasoning token usage Capture provider-reported reasoning-token breakdowns alongside output tokens without changing output token totals. Carry the field through model usage aggregation and add coverage for parsing, facade tracking, and deltas. Refs #665 * fix: show reasoning tokens in usage summary Include reasoning token counts in the local model usage summary while preserving output and total token semantics. Telemetry remains unchanged. Refs #665 * fix: estimate missing reasoning token counts When providers return reasoning content without a numeric usage breakdown, estimate reasoning tokens from that content while preserving provider-reported output and total token counts. Refs #665 * fix: track reasoning token count source * fix: simplify reasoning token source * fix: omit unknown reasoning tokens from logs * refactor: clarify reasoning token count helpers * test: move token counting tests * fix: enforce reasoning token source * fix: address reasoning usage review	2026-05-18 12:15:31 -06:00
Eric W. Tramel	a4085c441a	feat: add AIMD startup ramp (#638 ) Some checks failed CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.10 on macos-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.11 on macos-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.12 on macos-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.13 on macos-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Has been cancelled Details CI / End to end test (Python 3.10 on macos-latest) (push) Has been cancelled Details CI / Coverage Check (Python 3.11) (push) Has been cancelled Details CI / End to end test (Python 3.11 on macos-latest) (push) Has been cancelled Details CI / End to end test (Python 3.12 on macos-latest) (push) Has been cancelled Details CI / End to end test (Python 3.13 on macos-latest) (push) Has been cancelled Details CI / End to end test (Python 3.10 on ubuntu-latest) (push) Has been cancelled Details CI / End to end test (Python 3.11 on ubuntu-latest) (push) Has been cancelled Details CI / End to end test (Python 3.12 on ubuntu-latest) (push) Has been cancelled Details CI / End to end test (Python 3.13 on ubuntu-latest) (push) Has been cancelled Details CI / Lint and Format Check (push) Has been cancelled Details CI / Check License Headers (push) Has been cancelled Details Publish devnotes / deploy (push) Has been cancelled Details Publish Fern devnotes / deploy (push) Has been cancelled Details CI / Test (Python 3.10 on macos-latest) (push) Has been cancelled Details CI / Test (Python 3.12 on macos-latest) (push) Has been cancelled Details CI / Test (Python 3.13 on macos-latest) (push) Has been cancelled Details CI / Test (Python 3.10 on ubuntu-latest) (push) Has been cancelled Details CI / Test (Python 3.12 on ubuntu-latest) (push) Has been cancelled Details CI / Test (Python 3.13 on ubuntu-latest) (push) Has been cancelled Details CI / Test (Python 3.11 on macos-latest) (push) Has been cancelled Details CI / Test (Python 3.11 on ubuntu-latest) (push) Has been cancelled Details	2026-05-13 16:25:03 -04:00
Eric W. Tramel	0fdea845ac	feat: add fair async task scheduling (#639 )	2026-05-13 13:47:45 -04:00
Nabin Mulepati	bbcd7d3995	fix: harden resume checkpoint handling (#624 ) Some checks are pending CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Engine (Python 3.10 on macos-latest) (push) Waiting to run Details CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Engine (Python 3.11 on macos-latest) (push) Waiting to run Details CI / Test Engine (Python 3.12 on macos-latest) (push) Waiting to run Details CI / Test Engine (Python 3.13 on macos-latest) (push) Waiting to run Details CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Waiting to run Details CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Waiting to run Details CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run Details CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run Details CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run Details CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run Details CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run Details CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run Details CI / Coverage Check (Python 3.11) (push) Waiting to run Details CI / End to end test (Python 3.11 on macos-latest) (push) Waiting to run Details CI / End to end test (Python 3.10 on macos-latest) (push) Waiting to run Details CI / End to end test (Python 3.12 on macos-latest) (push) Waiting to run Details CI / End to end test (Python 3.13 on macos-latest) (push) Waiting to run Details CI / End to end test (Python 3.10 on ubuntu-latest) (push) Waiting to run Details CI / End to end test (Python 3.11 on ubuntu-latest) (push) Waiting to run Details CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions Details * fix: harden resume checkpoint handling Persist config identity in metadata, make checkpoints atomic, and reject unsafe resume states so interrupted runs do not mix incompatible or post-processed data. * fix: close resume edge cases Let IF_POSSIBLE start fresh for resize configs and mark after-generation processing before mutation so interrupted processors cannot be resumed unsafely. * refactor: drop dataset directory lock Single-user CLI/notebook flows don't race on the artifact directory, and the timestamped-directory fallback already handles the "ran it twice" case. The lock added complexity (re-entrancy, stale cleanup, the cached-property trap where IF_POSSIBLE→NEVER moves writes to a timestamped directory while the lock stays pinned to the original) for no real protection. Atomic metadata writes still cover the actual hazard (crash mid-write). Also fix a pre-existing test bug in test_initial_actual_num_records_uses_actual_parquet_rows_for_partial_row_group where the mocked scheduler hit the partial-completion path with unconfigured Mock attributes. * fix: address Greptile review on resume edge cases * Drop the unreachable ResumeMode.IF_POSSIBLE branch in _post_generation_processed_resume_result. By the time this helper runs, build() has normalised IF_POSSIBLE to ALWAYS or NEVER, so the guard now matches reality. Tighten the docstring to document the three outcomes (no-op return / fall through / raise). * Split the post-processed extension/raise into two cases. When num_records < prior_target the user just asked for fewer records than already exist; the previous "would mix pre- and post-processor records" message only describes the extension case. Mirror the wording used by _load_resume_state and add a regression test. * Remove the dead _find_completed_row_group_ids wrapper now that _build_async uses _find_completed_row_groups directly. Rename the related test to match. * refactor: unify sync + async resume around filesystem-derived progress Both engines now derive `num_completed_batches` and `actual_num_records` from `parquet-files/batch_.parquet` via `_recover_progress_from_disk`. `metadata.json` keeps describing the run configuration* (`buffer_size`, `target_num_records`, `original_target_num_records`, config fingerprint), while the filesystem is the source of truth for progress. This closes the sync engine's race window between `move_partial_result_to_final_file_path` and the metadata write that follows it, matching the crash-recovery the async engine already had. The sync engine additionally rejects non-contiguous batch IDs (a hole can only mean external mutation or a directory written by an incompatible engine); the async engine continues to tolerate gaps from out-of-order completion via `allow_holes=True`. Existing sync resume tests now seed parquet files alongside metadata, and two new tests cover the unified behaviour: filesystem progress wins when metadata lags, and sync rejects non-contiguous IDs. * docs: clarify DatasetCreationResults observability scope on resume `load_dataset`, `count_records`, `load_analysis`, `export`, and `push_to_hub` all read from the artifact directory, so they reflect the cumulative dataset (original + resume rows). `task_traces`, model-usage logs, and telemetry events are scoped to the current invocation only because the original run's in-memory state is not persisted. Document this in the class docstring, the architecture note, and the Fern resume guide. * docs: explain DeprecationWarning re-raise in create()/preview() Future readers were puzzled by the ``except DeprecationWarning: raise`` short-circuits before the generic generation-error wrappers. Add a comment in ``create()`` (with a back-reference from ``preview()``) to record that strict warning filters (``pytest.warns``, ``-W error::DeprecationWarning``) turn the engine's ``warnings.warn(..., DeprecationWarning)`` calls — most notably the ``allow_resize=True`` deprecation in ``_resolve_async_compatibility`` — into raised exceptions, and we want them to surface untouched instead of being swallowed by ``DataDesignerGenerationError``. * fix: close after-generation crash window and tighten metadata typing on resume Address review feedback on resume hardening: * Run after-generation processors unconditionally on the on-disk dataset rather than gating on the generation return value. The previous gate silently skipped after-generation when resume saw every row group already on disk, leaving a crash window between the final parquet write and the ``post_generation_state="started"`` marker write: in that window the dataset is complete but after-generation never ran, and the on-disk parquet files are still clean. The "started" short-circuit still rejects the other direction (crashed mid-rewrite, ambiguous state), so resume only re-runs after-generation when it is safe to do so. * Raise ``DatasetGenerationError`` (instead of letting a raw ``TypeError`` leak out of ``num_records < prior_target``) when a post-processed dataset's metadata is missing ``target_num_records``. Mirrors the wording used by ``_load_resume_state``. * Document the new behaviour in ``architecture/dataset-builders.md`` and the Fern resume invariants. Tests: * ``test_build_resume_complete_dataset_runs_after_generation_when_no_marker`` covers the closed crash window via the public ``set_processor_runner`` API. * ``test_build_resume_post_generation_processed_missing_target_raises_clearly`` covers the typed-error gap.	2026-05-11 11:44:46 -06:00
Nabin Mulepati	4b93f5b245	feat: let column configs declare all model aliases for the startup health check (#626 ) * feat(engine): let column configs declare all model aliases for the startup health check Plugin column configs that depend on more than one model alias (generator + judge, critic, etc.) previously could not opt their secondary aliases into the standard startup health check, and configs without a `model_alias` field crashed the collection loop with AttributeError. Add `SingleColumnConfig.get_model_aliases()` as the single override hook the builder uses to enumerate aliases. The default returns the column's primary `model_alias` (if any), so built-in LLM, embedding, and image columns work unchanged. `CustomColumnConfig` overrides it to surface decorator-declared aliases, replacing the special-case `isinstance` branch in the builder. Plugin configs with multiple model fields override it to opt every endpoint into the health check. Fixes #606 Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * fix(config): forward empty model_alias to startup health check SingleColumnConfig.get_model_aliases() used `if alias` to filter, which also dropped empty-string aliases. Empty model_alias values are accepted by the config model and previously reached run_health_check, where they failed fast with "No model config with alias '' found!". Treating them as "no model endpoints" silently delayed that error to first generation. Use `alias is not None` so only a truly missing attribute skips the health check, and add a regression test that exercises an empty-string model_alias on a built-in config. Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> --------- Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>	2026-05-11 11:33:50 -06:00
Przemysław Boruta	810c681f7a	feat: resume interrupted dataset generation runs (sync + async engine) (#526 ) Some checks failed CI / Test Interface (Python 3.12 on macos-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.13 on macos-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Has been cancelled Details CI / Coverage Check (Python 3.11) (push) Has been cancelled Details CI / End to end test (Python 3.10 on macos-latest) (push) Has been cancelled Details CI / End to end test (Python 3.11 on macos-latest) (push) Has been cancelled Details CI / End to end test (Python 3.12 on macos-latest) (push) Has been cancelled Details CI / End to end test (Python 3.13 on macos-latest) (push) Has been cancelled Details CI / End to end test (Python 3.10 on ubuntu-latest) (push) Has been cancelled Details CI / End to end test (Python 3.11 on ubuntu-latest) (push) Has been cancelled Details CI / End to end test (Python 3.12 on ubuntu-latest) (push) Has been cancelled Details CI / End to end test (Python 3.13 on ubuntu-latest) (push) Has been cancelled Details CI / Lint and Format Check (push) Has been cancelled Details CI / Check License Headers (push) Has been cancelled Details CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Has been cancelled Details CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Has been cancelled Details CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.10 on macos-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.11 on macos-latest) (push) Has been cancelled Details CI / Test (Python 3.10 on macos-latest) (push) Has been cancelled Details CI / Test (Python 3.11 on macos-latest) (push) Has been cancelled Details CI / Test (Python 3.12 on macos-latest) (push) Has been cancelled Details CI / Test (Python 3.13 on macos-latest) (push) Has been cancelled Details CI / Test (Python 3.10 on ubuntu-latest) (push) Has been cancelled Details CI / Test (Python 3.11 on ubuntu-latest) (push) Has been cancelled Details CI / Test (Python 3.12 on ubuntu-latest) (push) Has been cancelled Details CI / Test (Python 3.13 on ubuntu-latest) (push) Has been cancelled Details * docs: add implementation plan for resume mechanism Fixes #525 * feat(storage): add resume flag and clear_partial_results() - ArtifactStorage gains a `resume: bool = False` field - resolved_dataset_name skips timestamp logic when resume=True, returning the existing dataset folder name as-is - Raises ArtifactStorageError on resume=True when the target folder is absent or empty (no data to resume from) - New clear_partial_results() removes in-flight partial results left over from an interrupted run Fixes #525 * feat(batch-manager): add start_batch param to start() DatasetBatchManager.start() now accepts: - start_batch: int = 0 — first batch index to process - initial_actual_num_records: int = 0 — records already on disk Both default to 0 so all existing call sites are unaffected. Fixes #525 * feat(builder): implement resume logic in DatasetBuilder - build() gains a resume: bool = False parameter - _load_resume_state() reads metadata.json and validates that num_records and buffer_size match the original run - _build_with_resume() skips completed batches, clears in-flight partial results, and continues from the first incomplete batch - Raises DatasetGenerationError with clear messages for: - missing metadata.json (interrupted before first batch completes) - num_records mismatch - buffer_size mismatch - DATA_DESIGNER_ASYNC_ENGINE=1 (not yet supported) - Logs a warning and returns early when dataset is already complete Fixes #525 * feat(interface): expose resume on DataDesigner.create() - create() gains resume: bool = False - _create_resource_provider() passes resume to ArtifactStorage - builder.build() receives the resume flag Fixes #525 * test: add tests for resume mechanism Covers: - ArtifactStorage.resolved_dataset_name with resume=True - ArtifactStorage.clear_partial_results() - DatasetBatchManager.start() with start_batch and initial_actual_num_records - DatasetBuilder.build(resume=True): missing metadata, num_records mismatch, buffer_size mismatch, already-complete detection Fixes #525 * feat(builder): extend resume to async engine (DATA_DESIGNER_ASYNC_ENGINE=1) - Add _find_completed_row_group_ids() to scan parquet-files/ for already-written row groups by parsing batch_.parquet filenames - _build_async() now accepts resume=True: loads metadata, finds completed row groups, clears partial results, and logs progress; returns early if all row groups are done - _prepare_async_run() accepts skip_row_groups, initial_actual_num_records, and initial_total_num_batches so the scheduler only processes remaining row groups and RowGroupBufferManager starts from the correct counts - RowGroupBufferManager.__init__ gains initial_actual_num_records and initial_total_num_batches params to seed the counters on resume - finalize_row_group closure now writes incremental metadata after each checkpoint so any run (resume or not) can be resumed if interrupted mid-way - Remove the guard that rejected resume=True with DATA_DESIGNER_ASYNC_ENGINE=1 - Add tests for all new paths fix(builder): skip after-generation processors when resume finds dataset already complete _build_with_resume and _build_async now return False when the dataset is already complete (early-return path), True otherwise. build() skips _processor_runner.run_after_generation() on False, preventing processors from calling shutil.rmtree and rewriting an already-finalized dataset. Fixes the issue raised in review: greptile P1 comment on PR #526. * fix(builder): use filesystem count for initial_total_num_batches on async resume Metadata can lag by one row group if a crash occurs between move_partial_result_to_final_file_path and write_metadata. Using len(completed_ids) from the filesystem scan instead of state.num_completed_batches ensures the final metadata reflects the actual number of parquet files present, not the potentially stale metadata count. * feat(results): add export() method and --output-format CLI flag Adds DatasetCreationResults.export(path, format=) supporting jsonl, csv, and parquet. The CLI create command gains --output-format / -f which writes dataset.<format> alongside the parquet batch files. * fix(builder): handle resume when metadata.json missing (interrupted before first batch) When a run is interrupted before any row group or batch completes, metadata.json is never written. Previously resume=True would raise DatasetGenerationError in this case. Now build() detects the missing file, logs an info message, clears any leftover partial results and falls back to a clean fresh run. This is the common scenario for small datasets (fewer records than buffer_size) where all records fit in a single row group. * docs(interface): fix resume docstring — async engine is supported * fix(builder): derive initial_actual_num_records from filesystem in async resume In the crash window (row group written to disk but write_metadata crashed before updating the file), both initial_total_num_batches and initial_actual_num_records now use the filesystem-discovered completed_ids as source of truth. Previously initial_actual_num_records was read from potentially stale metadata, causing actual_num_records in the final metadata to be undercounted by one row group. Also adds a test covering the partial-resume crash-window scenario. * feat(resume): replace resume: bool with ResumeMode enum (NEVER/ALWAYS/IF_POSSIBLE) - Introduces ResumeMode(StrEnum) in artifact_storage.py for use across all layers - Replaces resume: bool with resume: ResumeMode in DatasetBuilder.build(), DataDesigner.create(), ArtifactStorage, and _build_async() - Adds _check_resume_config_compatibility() using config fingerprints to support IF_POSSIBLE: falls back to a fresh run when config has changed since last run - Relaxes num_records validation from strict equality to num_records >= actual_num_records, allowing dataset extension on resume; buffer_size must still match exactly - Preserves exception chain with 'from exc' on FileNotFoundError in _load_resume_state - Exports ResumeMode from data_designer.interface for users to import - Adds skip_row_groups assertion test and IF_POSSIBLE storage behavior tests * fix(resume): invalidate resolved_dataset_name cache when IF_POSSIBLE downgrades to NEVER ArtifactStorage's Pydantic model validator accesses base_dataset_path at construction time, caching resolved_dataset_name under IF_POSSIBLE semantics before build() can set resume=NEVER. Pop the stale cache entry so the property re-resolves with the correct NEVER semantics (timestamped directory). Also fixes _check_resume_config_compatibility() to use artifact_path/dataset_name directly instead of base_dataset_path, and adds a regression test covering the cache-bypass scenario. * fix(builder): move partial-completion warning before return in _build_async * fix(builder): IF_POSSIBLE now starts fresh when no dataset directory exists _check_resume_config_compatibility returned True when config_path was absent, even when the dataset directory itself didn't exist. This caused IF_POSSIBLE to upgrade to ALWAYS, which then raised ArtifactStorageError on the first-ever run because ALWAYS requires an existing directory. Fix: return False early when the dataset directory is absent. Also sets actual_num_records on mock buffer managers in two async resume tests that started failing after the partial-completion warning block was made reachable. * fix(builder): use original target_num_records in async resume record count When extending a non-aligned run (e.g. original num_records=5, buffer_size=2), the last completed row group has 1 record, not buffer_size=2. Using new num_records in the formula would overcount: min(2, 7-22)=2 instead of min(2, 5-22)=1. Fix: capture state from _load_resume_state (previously discarded) and pass state.target_num_records into the sum formula. Added target_num_records field to _ResumeState, populated from metadata.json. Test: test_build_async_resume_initial_actual_num_records_uses_original_target * fix(builder): IF_POSSIBLE starts fresh on empty dataset directory Empty directory (crash between mkdir and first file write) was treated as compatible — _check_resume_config_compatibility returned True, IF_POSSIBLE upgraded to ALWAYS, which then raised ArtifactStorageError. Fix: treat empty directory the same as missing — return False from _check_resume_config_compatibility when any(dir.iterdir()) is False. Test: test_if_possible_starts_fresh_when_directory_is_empty * fix(builder): ALWAYS raises DatasetGenerationError on config fingerprint mismatch ResumeMode.ALWAYS was documented to raise when column/model config changed, but _check_resume_config_compatibility() was only called in the IF_POSSIBLE branch. A user resuming with ALWAYS after changing the config would silently mix records from two different configs. Fix: - Refactor _check_resume_config_compatibility() to return _ConfigCompatibility enum (COMPATIBLE / INCOMPATIBLE / NO_PRIOR_DATASET) instead of bool so callers can distinguish 'no prior run' from 'configs differ' - Call the check for both ALWAYS and IF_POSSIBLE before _write_builder_config() - ALWAYS + INCOMPATIBLE → DatasetGenerationError - IF_POSSIBLE + INCOMPATIBLE → silent fresh start (existing behaviour) - IF_POSSIBLE + NO_PRIOR_DATASET → silent fresh start (existing behaviour) Test: test_build_resume_always_raises_on_config_mismatch * fix(resume): address nabinchha review — drop export collision, add CLI flag, fix edge cases C1: drop commit `0bdf24ab` — remove export() / --output-format from this PR; that feature belongs to #540 which has a superior streaming implementation C2: add --resume / -r flag to data-designer create CLI, thread ResumeMode through GenerationController.run_create() into DataDesigner.create() C3: fix already-complete warning text — replace stale "Remove resume=True" with "Use resume=ResumeMode.NEVER" in _build_with_resume and _build_async C4: fix docstrings — ALWAYS does NOT raise when no checkpoint exists (silently restarts from scratch); clarify num_records >= actual semantics C5: sync artifact_storage.resume = NEVER when no-metadata fallback fires so both state holders agree after the downgrade C6: fix return_value=False → _ConfigCompatibility.INCOMPATIBLE in IF_POSSIBLE test; drop 3 direct _find_completed_row_group_ids tests (private API, covered by build()) W1: add logger.warning when builder_config.json is absent (silent COMPATIBLE was footgun) W2: narrow except Exception → (OSError, json.JSONDecodeError, ValidationError) W3: run make check-all-fix — ruff reformatted test_if_possible_starts_fresh_when_directory_is_empty * fix(builder): replace stdlib StrEnum with project compat shim for Python 3.10 * fix(builder): guard extension row groups in initial_actual_num_records formula on async resume When extending an async run (num_records > state.target_num_records) and a crash occurs after an extension row group is written to disk but before write_metadata, the formula `min(buffer_size, state.target_num_records - rg_id * buffer_size)` yields a negative value for any extension row group (rg_id * buffer_size >= target), making initial_actual_num_records silently undercount. The RowGroupBufferManager then starts at the wrong offset, and the final metadata reports an incorrect actual_num_records with a false partial-completion warning. Fix: use state.target_num_records for original row groups and num_records for extension row groups (guarded by rg_id * buffer_size < state.target_num_records). Covers the scenario with a new regression test. * fix(builder): pre-compute row-group list in _build_async to fix sizes on non-aligned extension resume The partitioning loop in _prepare_async_run decremented remaining by min(buffer_size, remaining) for every row group, including skipped ones. For a non-aligned original run (e.g. target=5, buffer_size=2, last group has 1 record), the loop deducted 2 for the skipped last group, leaving remaining one short. Extension row groups received smaller sizes than intended, so the generated dataset was silently short by the deficit and a false partial-completion warning fired. Fix: pre-compute the full row-group list with correct per-group sizes in _build_async where state.target_num_records is available, then pass it to _prepare_async_run as precomputed_row_groups (replacing the skip_row_groups param). Original groups use min(buffer_size, target - rgbs); extension groups use min(buffer_size, extension_records - ext_idxbs). Also updates the skip_row_groups test to assert on precomputed_row_groups and adds a regression test for the non-aligned extension case. * chore: remove stale implementation plan for #525 The plan described the initial resume: bool design which has since been replaced by the full ResumeMode enum (NEVER/ALWAYS/IF_POSSIBLE), async engine support, filesystem reconciliation, and config compatibility checks. The PR description is the authoritative record of what shipped. * fix(engine): fix false 'already complete' when extension fits in last group's slack original_target=5, buffer_size=2 produces 3 groups [2,2,1]. Extending to num_records=6: ceil(6/2)=3 equalled len(completed_ids)=3, triggering the already-complete branch on both the async and sync paths — returning the 5-record dataset silently. Fix (async): replace ceil(num_records/bs) with num_original_groups + ceil(extension_records/bs) so any extension always adds new groups beyond num_original_groups. Fix (sync): add num_records_list param to DatasetBatchManager.start() and pass the correct per-batch sizes in _build_with_resume, giving the batch manager the right total batch count (4 instead of 3 in the example). * fix(engine): raise error when num_records is below original target on resume Prevents negative extension_records in async path which silently truncated the dataset and corrupted metadata without triggering a partial-completion warning. * fix(storage): refresh MediaStorage path after IF_POSSIBLE → NEVER downgrade When build() detected an incompatible config and downgraded resume from IF_POSSIBLE to NEVER, _media_storage.base_path remained bound to the original directory while all other path properties resolved to the new timestamped directory — causing broken image references in image-column runs. * fix(engine): preserve original_target_num_records across extension resume writes After finalize_row_group successfully wrote incremental metadata during an extension run, target_num_records in metadata was updated to the extension target. A subsequent resume would read this as the original target, making _rg_size() incorrect for all row groups and silently corrupting actual_num_records. Stores original_target_num_records as an immutable field in metadata so the original group boundaries are always recoverable regardless of how many incremental writes have occurred. --------- Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>	2026-05-08 15:37:56 -06:00
Eric W. Tramel	8d4d59303d	fix: normalize rollout timestamps before deriving started_at/ended_at (#556 )	2026-05-07 14:13:10 -04:00
Johnny Greco	9214637a5b	fix(engine): validate processor plugin impls (#609 ) * fix(engine): validate processor plugin impls Add the processor implementation base to assert_valid_plugin so processor plugins are checked against Processor instead of only the generic config contract. Keep plugin type validation table-driven and raise explicit AssertionError messages so checks are not skipped under optimized Python. Signed-off-by: Johnny Greco <jogreco@nvidia.com> * test(engine): require plugin base map coverage --------- Signed-off-by: Johnny Greco <jogreco@nvidia.com>	2026-05-06 14:31:12 -04:00
Nabin Mulepati	f73da1975c	feat(models): deprecate implicit default provider routing (#594 ) Some checks failed CI / Test (Python 3.10 on macos-latest) (push) Has been cancelled Details CI / Test (Python 3.11 on macos-latest) (push) Has been cancelled Details CI / Test (Python 3.12 on macos-latest) (push) Has been cancelled Details CI / Test (Python 3.13 on macos-latest) (push) Has been cancelled Details CI / Test (Python 3.10 on ubuntu-latest) (push) Has been cancelled Details CI / Test (Python 3.11 on ubuntu-latest) (push) Has been cancelled Details CI / Test (Python 3.12 on ubuntu-latest) (push) Has been cancelled Details CI / Test (Python 3.13 on ubuntu-latest) (push) Has been cancelled Details CI / Test Engine (Python 3.13 on macos-latest) (push) Has been cancelled Details CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Has been cancelled Details CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Has been cancelled Details CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Has been cancelled Details CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.10 on macos-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.11 on macos-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.12 on macos-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.13 on macos-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Has been cancelled Details CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Has been cancelled Details CI / Coverage Check (Python 3.11) (push) Has been cancelled Details CI / End to end test (Python 3.10 on macos-latest) (push) Has been cancelled Details CI / End to end test (Python 3.11 on macos-latest) (push) Has been cancelled Details CI / End to end test (Python 3.12 on macos-latest) (push) Has been cancelled Details CI / End to end test (Python 3.13 on macos-latest) (push) Has been cancelled Details CI / End to end test (Python 3.10 on ubuntu-latest) (push) Has been cancelled Details CI / End to end test (Python 3.11 on ubuntu-latest) (push) Has been cancelled Details CI / End to end test (Python 3.12 on ubuntu-latest) (push) Has been cancelled Details CI / End to end test (Python 3.13 on ubuntu-latest) (push) Has been cancelled Details * feat(models): deprecate implicit default provider routing Emit DeprecationWarning whenever the legacy "implicit default provider" path is exercised: `ModelConfig.provider=None`, the registry-level `ModelProviderRegistry.default`, the YAML `default:` key in `~/.data-designer/model_providers.yaml`, and the CLI's "Change default provider" workflow. `resolve_model_provider_registry` skips passing `default=` in the single-provider case so the common construction path stays quiet. Multi-provider registries still pass `default` (per `check_implicit_default`) and warn accordingly. Update docs, the package README, and test fixtures to specify `provider=` explicitly on every `ModelConfig`. New tests cover each warning entry point and pin the post-deprecation happy paths. Refs #589 Made-with: Cursor * fix(models): address PR #594 review feedback Greptile P1: ProviderRepository.load emitted its DeprecationWarning inside a `try/except Exception` block. Under `filterwarnings("error", DeprecationWarning)` the warn would raise, the except would swallow it, and `load()` would silently return None (losing the registry). Move the warn outside the catch-all so the strict-warning path no longer drops valid configs. Greptile P2 / johnnygreco: `_warn_on_implicit_provider` and `_warn_on_explicit_default` use `stacklevel=2`, which lands inside pydantic v2's validator dispatch rather than at the user's `ModelConfig(...)` / `ModelProviderRegistry(...)` call. That broke both attribution (the source line was unhelpful) and Python's once-per-location dedup (every call collapsed to the same pydantic-internal key, suppressing all but the first warning). Introduce `data_designer.config.utils.warning_helpers.warn_at_caller`, which walks past the helper, validator, and any pydantic frames to find the user's call site and emits via `warnings.warn_explicit` with the user frame's `__warningregistry__`. Keeps attribution accurate and dedup keyed on the user's (filename, lineno). johnnygreco: align the `provider_repository.py` warning copy with the sibling site in `default_model_settings.py` ("specify provider= explicitly on each ModelConfig instead") so both YAML-default warning sites give the same migration instruction. The previous wording pointed users at "ModelConfig entries" inside `model_providers.yaml`, where ModelConfig entries don't actually live. johnnygreco: dedup the cascade in `DataDesigner.__init__`. With `model_providers=None` and a YAML `default:`, the user previously saw two DeprecationWarnings for the same root cause — `get_default_provider_name()` warns about the YAML key, then `resolve_model_provider_registry(...)` re-warns from `_warn_on_explicit_default`. Suppress the registry-level duplicate in the YAML-fallback branch via `warnings.catch_warnings()` so users see exactly one warning per user action. johnnygreco: tighten `_warn_on_explicit_default` to fire only when `default is not None`. Passing `default=None` explicitly is semantically equivalent to omitting it (caller is opting out of a registry-level default), and shouldn't trigger the deprecation nudge. johnnygreco: add a `model_validate({...})` regression test for `ModelConfig` so the deserialization path (legacy on-disk configs) is pinned alongside the construction path. Tests: - Update `test_load_exists` and `test_save` to omit `default=` so the roundtrip stops exercising the deprecated YAML-default path unguarded (Greptile note). - Wrap `test_resolve_model_provider_registry_with_explicit_default`, `test_get_provider`, and `test_init_user_supplied_providers_preserve_first_wins_over_yaml_default` in `pytest.warns` so the suite stays green under `-W error::DeprecationWarning` (Greptile note). - Add `test_explicit_default_none_does_not_emit_deprecation_warning` to pin the tightened predicate. - Add `test_init_yaml_default_emits_single_deprecation_warning` to pin the cascade-dedup behavior. Refs #589 Made-with: Cursor * fix(models): make deprecation warnings visible under default filters andreatgretel (PR #594): the YAML-default warning in `get_default_provider_name` and the registry-default warning emitted from inside DataDesigner helpers were attributing to data_designer library frames, not user code. Python's default filter chain includes `ignore::DeprecationWarning`, so library-attributed entries are silenced — meaning a normal `DataDesigner()` call with a YAML `default:` set showed nothing, and `resolve_model_provider_registry` warnings were similarly invisible. Two related changes: 1. `warn_at_caller`: extend the default skip-list from `("pydantic",)` to `("pydantic", "pydantic_core", "data_designer")` so the walk escapes both pydantic's validator-dispatch frames and data_designer helper frames before attributing. Also tighten the prefix predicate to exact-or-dotted-prefix matching (`name == p or name.startswith(p + ".")`) so e.g. `pydantic_helpers` is not falsely matched as part of `pydantic` (johnnygreco nit). Allow callers to pass a custom `skip_prefixes` for flexibility. Drop the "skip frame 0+1 unconditionally" guard now that prefix matching covers it. 2. `get_default_provider_name`: switch from `warnings.warn(stacklevel=2)` to `warn_at_caller`. The previous stacklevel pointed into `default_model_settings.py`, which is a library file → silenced under default filters. Verified the fix empirically with `python -W default`: warning is now attributed to the user's call site and rendered. johnnygreco (PR #594): add the missing `test_explicit_default_none_does_not_emit_deprecation_warning` regression for the `self.default is not None` predicate landed in the prior round. Tests: - New `test_warning_helpers.py` pins prefix-matching precision (rejects `pydantic_helpers` / `data_designer_other`), default skip-list contents, attribution past skip-prefix frames, and per-call-site dedup behavior. - `test_get_default_provider_name_warning_attributes_to_user_frame` pins andreatgretel's repro for the YAML-default site. - `test_explicit_default_warning_attributes_to_user_frame` pins the multi-frame case: construction goes through `resolve_model_provider_registry`, so the walk has to escape both pydantic and data_designer before landing on the test file. - `test_explicit_default_none_does_not_emit_deprecation_warning` pins johnnygreco's predicate-tightening regression. 3,124 tests pass (540 config + 1,923 engine + 653 interface; +10 net from this round). Refs #589 Made-with: Cursor * fix(models): apply warn_at_caller to remaining deprecation sites greptile-apps (PR #594, r3189904028): `ProviderRepository.load`'s YAML-default `DeprecationWarning` was using `warnings.warn(stacklevel=2)`, which attributes to whichever data_designer frame called `load()` — controllers, services, list/reset commands, agent introspection. Every real call path lands on `data_designer.cli.*`, which falls under Python's default `ignore::DeprecationWarning` filter and is silenced. Audit found two more sites with the same problem: - `DatasetBuilder._resolve_async_compatibility` (`allow_resize` / issue #552) — was using `stacklevel=4` to walk past `_resolve_async_compatibility -> build/build_preview -> interface -> user`. Brittle: any added frame (decorator, async wrapping, the `try/except DeprecationWarning: raise` boundary) shifts attribution silently. The existing test passed only because it used `simplefilter("always") + record=True`, which records warnings regardless of attribution. - `ProviderController._handle_change_default` — was using `stacklevel=2`, which lands on the menu dispatcher in the same controller module. `print_warning` already shows the message visually, but programmatic observers (`pytest.warns`, `filterwarnings("error", ...)`) saw a library-attributed entry that default filters silenced. All three migrated to `warn_at_caller` (the helper from `247fa30`) so attribution lands on the user's call site regardless of internal chain shape. `data_designer` is already in `DEFAULT_INTERNAL_PREFIXES`, so the walk escapes the entire library in one pass. Added attribution regression tests at each site asserting `warning.filename == __file__`. A future regression to `warnings.warn(stacklevel=N)` now fails CI instead of silently silencing the user-facing nudge: - `test_load_with_yaml_default_attributes_warning_to_caller` (test_provider_repository.py) - `test_resolve_async_compatibility` extended with the same assertion - `test_handle_change_default_emits_deprecation_warning` rewritten from `pytest.warns(...)` to a `catch_warnings(record=True)` block that filters for the message and asserts `filename == __file__` (`pytest.warns` does not check attribution, so the rewrite is required to actually catch the regression). 3,125 tests pass (548 config + 1,923 engine + 654 interface). Refs #589	2026-05-05 13:39:12 -06:00
Andre Manoel	61cdeefb17	feat: make async engine the default execution path (#592 ) Some checks failed CI / Test Config (Python 3.13 on ubuntu-latest) (push) Waiting to run Details CI / Test Engine (Python 3.10 on macos-latest) (push) Waiting to run Details CI / Test Engine (Python 3.11 on macos-latest) (push) Waiting to run Details CI / Test Engine (Python 3.12 on macos-latest) (push) Waiting to run Details CI / Test Engine (Python 3.13 on macos-latest) (push) Waiting to run Details CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Waiting to run Details CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Waiting to run Details CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run Details CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run Details CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run Details CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run Details CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run Details CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run Details CI / Coverage Check (Python 3.11) (push) Waiting to run Details CI / End to end test (Python 3.10 on macos-latest) (push) Waiting to run Details CI / End to end test (Python 3.11 on macos-latest) (push) Waiting to run Details CI / End to end test (Python 3.12 on macos-latest) (push) Waiting to run Details CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions Details Publish devnotes / deploy (push) Has been cancelled Details * feat: make async engine the default execution path The async engine has been hardening as opt-in for several releases. Make it the default and address the prerequisites flagged for the flip. Default flip - DATA_DESIGNER_ASYNC_ENGINE defaults to "1" at both consumption sites - Set DATA_DESIGNER_ASYNC_ENGINE=0 for one transitional release to opt out - allow_resize=True still falls back to sync with a DeprecationWarning Python 3.10 support - Replace asyncio.TaskGroup (3.11+) in async_concurrency.py with gather-with-explicit-cancel; semantics preserved because _run_task already swallows its own exceptions and uses _shutdown_event for sibling cancellation - Remove the sys.version_info < (3, 11) runtime guard - Remove the matching pytest skipif so the executor tests run on 3.10 too Derived timeouts (replaces two hardcoded 300s constants) - ThrottleManager.acquire_sync/async default to timeout=None (no deadline) instead of DEFAULT_ACQUIRE_TIMEOUT=300; HTTP request timeout already bounds actual work, queue waits scale with provider speed and AIMD - _AsyncBridgedModelFacade derives the sync->async bridge timeout from the model's inference_parameters.timeout and the call's max_correction_steps; one knob (per-model timeout) drives both deadlines, no new config surface - Add ModelFacade.request_timeout property so the bridge can read the effective timeout the client is configured with Root-cause surfacing - AsyncTaskScheduler captures the first non-retryable error and exposes it via first_non_retryable_error - Interface threads it through DataDesignerGenerationError when 0 records are produced without early-shutdown, so deterministic failures (e.g. bad seed sources) surface their original message instead of a wrapped FileNotFoundError on the parquet path Tests - New: throttle no-deadline default behavior (sync+async), parametrized derived bridge timeout, restored async_concurrency tests on 3.10 - Updated: test_dataset_builder.py uses an autouse fixture to pin its Mock-based tests to the sync engine they cover; existing bridge tests set facade.request_timeout for the new derivation Docs - Replace the stale LiteLLM security notice in README with a short async-default heads-up and link to the migration guide - Add docs/migration-async-default.md covering per-model timeouts, custom-column thread safety, mocking model calls, run outcomes, and the opt-out - Append a short Update section to the async-all-the-way-down dev note * test: extract _compute_bridge_timeout helper for direct testing The parametrized bridge-timeout test was patching ``concurrent.futures.Future.result`` to capture the timeout the bridge passed in. That reaches into stdlib internals (DEVELOPMENT.md "Mock at boundaries: Keep mocking shallow") and the ``ids=`` argument on the parametrize was missing. Extracts the formula into a module-level ``_compute_bridge_timeout`` helper. The test now calls the helper directly with no mocking, and the parametrize gets readable ids. Behavior is unchanged. * test(e2e): align demo plugins with async engine contracts The e2e demo plugins exercise plugin discovery and full DD lifecycle. Two of them were written against sync-engine semantics that the async engine restricts: - DemoColumnGeneratorImpl was a ColumnGeneratorFullColumn with no required_columns. The async engine routes ``no-upstream`` columns through the from-scratch path, which passes an empty DataFrame to generators that aren't FromScratchColumnGenerator subclasses. The generator then produces 0 rows and the scheduler raises ``update_batch received 0 values``. Switching the plugin to FromScratchColumnGenerator with generate_from_scratch(num_records) matches what the plugin actually does (produces a constant column without input) and works on both engines. - RegexFilterProcessor implemented process_before_batch with row-count changes. The async engine enforces row-count invariance in pre- and post-batch processor stages by design. Moving the filter to process_after_generation preserves the plugin's purpose (regex-based row filtering) at a stage that supports row-count changes on both engines. Test assertions check the final dataset, so the stage shift is transparent. Both changes are demo-plugin updates only; no production code change. * fix: address Codex review findings on async-default flip Three bugs and two test-quality concerns surfaced by an independent review of the prior commits. Each was real and worth fixing in the flip PR. Bug fixes - Sync-fallback path was creating async-only model clients. The default flip meant ``client_concurrency_mode = ASYNC`` for every default run, but the ``allow_resize=True`` path falls back to the sync engine — sync ``model.generate()`` calls then hit ``SyncClientUnavailableError``. The resolution decision now lives at the DataDesigner interface level via ``_resolve_client_concurrency_mode``: it considers both the env var and the config (allow_resize forces sync clients) and is passed explicitly to ``create_resource_provider``. Direct callers of the factory still get the env-var default. - Sync→async bridge timeout ignored the per-call ``timeout=`` override. A custom column calling ``model.generate(timeout=600)`` against a slow endpoint was being cancelled at the model-config default, not 600s. The bridge now prefers ``kwargs.get("timeout")`` over ``facade.request_timeout``. - Bridge timeout formula missed ``max_conversation_restarts``. One logical generation can do ``(1 + max_conversation_restarts) × (1 + max_correction_steps)`` HTTP requests; the formula now multiplies both, matching the worst-case attempt budget. Engine routing fix (also surfaced by failing e2e plugin tests) - ``_run_from_scratch`` else-branch passed an empty DataFrame to non-FromScratch generators classified as seeds (no upstream columns), so ``ColumnGeneratorFullColumn`` with no required_columns produced 0 rows for an ``rg_size``-row buffer. Now passes an ``rg_size``-row snapshot of the row-group buffer, mirroring the sync engine's FULL_COLUMN contract. - The earlier ``DemoColumnGeneratorImpl`` workaround (rewrite as ``FromScratchColumnGenerator``) is reverted; the engine fix subsumes it. The processor-plugin fix (``process_after_generation`` for the regex filter) stays — pre-batch row-count change is intentionally rejected by the async engine. Test improvements - Throttle no-deadline tests are parametrized over ``(timeout=0.0, raises)`` and ``(timeout=None, waits)``, pinning that ``None`` is genuinely distinct from any finite default. Sync and async counterparts mirror. - New regression tests for ``first_non_retryable_error`` surfacing covering both load-raises and load-returns-empty paths, asserting the original exception is chained via ``__cause__`` and that the typed ``DataDesignerEarlyShutdownError`` doesn't fire in this branch. - New parametrized regression test for ``_resolve_client_concurrency_mode`` covering all four (env × allow_resize) combinations. - New parametrized test for the per-call ``timeout=`` override flowing into the bridge timeout calculation. - Bridge formula tests extended with ``max_conversation_restarts`` cases. * test: trim redundant parametrize cases in async-default tests Three parametrize cases were duplicating coverage already provided by existing standalone tests: - ``test_acquire__timeout_branches`` parametrized over ``(0.0, raises)`` and ``(None, waits)``. The ``raises`` half duplicates ``test_acquire__raises_timeout_when_at_capacity``. Replaced with two focused ``..._default_no_deadline_waits_for_release`` tests covering only the no-deadline branch. - ``test_resolve_client_concurrency_mode_matches_engine_choice`` had four cases. The ``async-off + allow-resize`` case asserts ``SYNC`` because the env var alone forces it; the allow_resize input is moot. Dropped. - ``test_async_bridge_honors_per_call_timeout`` had three cases. The "override below floor" case cross-products the per-call override flow with the floor-clamping behavior already covered by ``test_compute_bridge_timeout``. Dropped. Net: -25 lines of test code with no loss of essential coverage. * docs: fold migration page into existing concept docs The standalone ``Migrating to the async default`` page didn't fit the existing docs style — present tense, behavior over comparisons, content in the natural concept home. Folding it in: - ``architecture-and-performance.md`` gets a new ``Async Engine`` section covering per-model timeouts, run outcomes (partial completion + ``DataDesignerEarlyShutdownError``), and the transitional opt-out. Three stale ``async engine is landing soon`` callouts updated to reflect the flip. - ``custom_columns.md`` gets two short notes: a thread-safety callout near Generation Strategies, and a mocking-with-spec note in Development Testing. - ``async-all-the-way-down.md`` Update section now points at the new arch-and-perf section. - README heads-up links to the same anchor. - ``migration-async-default.md`` removed; mkdocs.yml entry dropped. * docs: frame Execution Model as sync-engine specifics Small targeted edits to make the user-facing concept docs consistent with the post-flip state. No restructuring. - ``architecture-and-performance.md``: the ``Execution Model`` callout now opens with two engines, links to the new ``Async Engine`` section, and frames the existing column-at-a-time description as sync-engine semantics. The ``Step 2: Process columns sequentially`` paragraph notes the async engine relaxes this. The ``Key Concepts`` table differentiates per-engine for ``Batching`` and ``Sequential columns``; ``Parallel cells`` is the same on both. - ``processors.md``: added a warning callout about the async engine's row-count invariance in pre- and post-batch stages, with the guidance to use ``process_after_generation()`` for row-filtering or expansion. * fix: address review nits from PR #592 (Nabin) Four targeted fixes from the review. Worth-addressing (warning): - ``test_acquire_async_default_no_deadline_waits_for_release`` was spawning the release task without holding a strong reference. The loop's weak-ref bookkeeping could GC it before the inner ``await`` observes the release, producing a CI flake. Hold the task and ``await`` it in ``finally``. Take-it-or-leave-it (applied): - Root-cause error surfacing now includes the exception type name: ``f"🛑 {type(root_cause).__name__}: {root_cause}"`` so users see ``ValueError: ...`` instead of just the message string. The ``__cause__`` chain is preserved either way. - Drop the defensive ``getattr(c, "allow_resize", False)`` in ``_resolve_client_concurrency_mode`` — every member of ``ColumnConfigT`` inherits ``allow_resize: bool = False`` from ``SingleColumnConfig``. - One-line comment near the root-cause surfacing branch noting that ``actual_num_records == 0`` is async-only (sync runs leave it at ``-1``), so the branch is async-only by construction. Not addressed in this PR (filing as follow-ups): - ``SYNC_BRIDGE_TIMEOUT = 300`` still hardcoded in ``column_generators/generators/base.py:_run_coroutine_sync``. That bridge has no model-facade context to derive a timeout from, so the fix is a structural refactor outside this PR's scope. - First-error capture loses subsequent-error context. The "first wins" heuristic is documented; richer aggregation is a follow-up. * fix: drop SYNC_BRIDGE_TIMEOUT in _run_coroutine_sync This was the third hardcoded 300s timeout (Nabin flagged it on PR #592). The path is the generic sync→async bridge in ``ColumnGenerator.generate()``: when a subclass overrides only ``agenerate()``, the sync entry point runs the coroutine in a background thread. Same philosophy we applied to the throttle queue wait elsewhere in the PR: a defensive deadline on top of work that's already bounded by the HTTP timeout doesn't add safety, it just produces spurious failures on slow self-hosted endpoints. Drop the constant, the timeout exception handling, and the ``timed_out`` bookkeeping. ``pool.shutdown(wait=True)`` becomes the simple cleanup. Tests in ``test_async_generators.py`` exercise the happy path only and don't depend on the timeout firing. * Revert "fix: drop SYNC_BRIDGE_TIMEOUT in _run_coroutine_sync" This reverts commit `7a0b77d44c`. * docs+feat: deprecate the sync-engine opt-out path Nabin asked whether the docs should adopt explicit "deprecation" language on the opt-out path. Doing both: - Doc: ``architecture-and-performance.md``'s ``Opting out`` section now uses an ``!!! warning "Deprecated"`` admonition that names the env var as a deprecated escape hatch and notes the run-time warning. - Code: ``DataDesigner._resolve_client_concurrency_mode`` emits a ``DeprecationWarning`` when ``DATA_DESIGNER_ASYNC_ENGINE=0`` is detected. Same precedent as the existing ``allow_resize=True`` warning. Auto-fallback via ``allow_resize`` does not double-warn here; the builder layer emits its own warning later. - Test: parametrized regression now asserts ``pytest.warns(DeprecationWarning)`` on the opt-out branch and treats any warning on the async-on branches as a failure (``simplefilter("error")`` inside the ``catch_warnings`` block). * fix: emit logger.warning alongside DeprecationWarning on env-var opt-out Parity fix from Nabin's re-review of PR #592. The ``allow_resize=True`` auto-fallback path in ``_resolve_async_compatibility`` emits both a ``logger.warning("⚠️ ...")`` and a ``DeprecationWarning``. The new ``DATA_DESIGNER_ASYNC_ENGINE=0`` opt-out path was only emitting the ``DeprecationWarning``, leaving users who run with default warning filters silenced and inconsistent with the established precedent. Match the pattern: same message body, both signals, same stacklevel. * docs: breadcrumb explaining why SYNC_BRIDGE_TIMEOUT survives PR #592 Nabin's re-review pointed out that ``base.py`` is the lone place where the 300s pattern survives, while ``custom.py`` and ``throttle_manager.py`` both retired theirs. Without a comment, a future reader (or a lint sweep) could mistake this for an oversight and "consistency-fix" it the wrong way. Add a short note at the constant naming the two retired siblings, the reason this one stayed (no ``ModelFacade`` context to derive from), and the fact that it's tracked for a structural follow-up.	2026-05-04 16:22:13 -03:00
Andre Manoel	47c72b3d87	fix(async): pack of fixes for async engine under degraded providers (#585 ) Some checks are pending CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Waiting to run Details CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run Details CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run Details CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run Details CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run Details CI / Coverage Check (Python 3.11) (push) Waiting to run Details CI / End to end test (Python 3.11 on ubuntu-latest) (push) Waiting to run Details CI / End to end test (Python 3.12 on ubuntu-latest) (push) Waiting to run Details CI / End to end test (Python 3.13 on ubuntu-latest) (push) Waiting to run Details CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run Details CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run Details CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run Details CI / End to end test (Python 3.10 on macos-latest) (push) Waiting to run Details CI / End to end test (Python 3.11 on macos-latest) (push) Waiting to run Details CI / End to end test (Python 3.12 on macos-latest) (push) Waiting to run Details CI / End to end test (Python 3.13 on macos-latest) (push) Waiting to run Details CI / End to end test (Python 3.10 on ubuntu-latest) (push) Waiting to run Details CI / Lint and Format Check (push) Waiting to run Details CI / Check License Headers (push) Waiting to run Details CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions Details CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions Details CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions Details * fix(async): exclude all retryable errors from early-shutdown gate The gate previously only excluded `ModelRateLimitError`, leaving `ModelTimeoutError`, `ModelInternalServerError`, and `ModelAPIConnectionError` to count toward the sliding-window error rate. Under provider degradation these errors cluster in time (concurrent in-flight requests time out together), so 5/10 in a row is easy and trips the gate even when salvage could recover the rows. Refs #575. * feat(async): WARN log when provider showing degraded performance Diagnostic A/Bs against build.nvidia.com showed runs failing silently under provider degradation - no log indication that retryable errors were piling up until the early-shutdown gate fired (or, post-fix, until salvage exhaustion). Surfacing this earlier helps users distinguish "DataDesigner is broken" from "the upstream provider is slow today." Tracks a separate sliding window over retryable-vs-not for every task outcome (independent of the early-shutdown gate's window) and emits a throttled WARN when the rolling fraction crosses the threshold. Refs #575. * fix(async): salvage partial row groups on early shutdown Before: when the early-shutdown gate fired, any row group still in flight stayed in `_rg_states` un-checkpointed. The buffer manager later raised `FileNotFoundError` when the builder tried to read the finalized parquet. User-visible result: `0 records produced`. After: a new `_finalize_after_shutdown` step runs in `run()`'s finally block, after `_cancel_workers` has drained in-flight tasks (Codex caveat: in-flight `from_scratch`/`batch` tasks must not be allowed to write into a buffer that's being finalized). For each remaining row group it drops rows that aren't fully complete, then delegates to the existing `_checkpoint_completed_row_groups` so the buffer manager's zero-survivor handling (skip empty parquet, free buffer) kicks in unchanged. Also surfaces partial completion as a structured signal: scheduler exposes `early_shutdown: bool` and `partial_row_groups: tuple[int, ...]` properties so callers can detect partial completion programmatically rather than parsing log lines. Builder uses this to emit a more specific WARN distinguishing early shutdown from non-shutdown drops. Refs #575. * fix(throttle): reset consecutive_429s on non-rate-limit failure In `release_failure`, the cascade counter wasn't reset, so a sequence like 429 → 500 → 429 was treated as 2 consecutive 429s. The cascade counter feeds AIMD's reduce-once-per-cascade logic; the second 429 should start a fresh cascade and trigger another concurrency reduction, but currently doesn't. Standalone bug surfaced during #575 investigation; not on the failure path that drives the gate-trip outcome but worth fixing while we're in this code. * fix(custom): preserve retryability through CustomColumnGenerator wrap A real-workload run of #575 showed the early-shutdown gate still trips even with the gate-exclusion fix in place: the trigger is 10 timeouts inside Anonymizer's QA-repair custom columns, all wrapped in CustomColumnGenerationError (non-retryable) by the catch-all in CustomColumnGenerator. Two fixes here: 1. Re-raise RETRYABLE_MODEL_ERRORS unchanged before the wrap so the scheduler's _is_retryable correctly classifies them. 2. Surface _AsyncBridgedModelFacade timeouts as ModelTimeoutError instead of stdlib TimeoutError. Without this the sync bridge times out as the wrong exception type and is still classified non-retryable even after fix #1. Also moves _RETRYABLE_MODEL_ERRORS from async_scheduler to models/errors as the public RETRYABLE_MODEL_ERRORS tuple - both the scheduler and the wrap site need it, and models/errors is the appropriate home alongside the error class definitions. Refs #575. * feat(interface): typed DataDesignerEarlyShutdownError on zero-record runs When the async scheduler hits early shutdown and produces zero records, the buffer manager skips writing parquet (correctly), so ArtifactStorage.load_dataset_with_dropped_columns() raises FileNotFoundError. Previously this surfaced as a generic DataDesignerGenerationError wrapping the FileNotFoundError, which is ambiguous (could be missing files for any reason). This commit: - Adds DataDesignerEarlyShutdownError as a subclass of DataDesignerGenerationError so existing handlers still match while callers that want to react programmatically (retry on different alias, surface a degraded-provider message, etc.) can catch the specific type. - Plumbs the scheduler's structured signals (early_shutdown, partial_row_groups) up through the builder so they're available at data_designer.create() time without re-introspecting the scheduler. - create() raises the typed error in both failure modes (load fails or empty DataFrame returned) when builder.early_shutdown is True. Refs #575. * fix(async): emit first degraded-provider WARN regardless of clock state Initialize _last_degraded_warn_at to -inf so the first WARN is always emitted. The previous initialization to 0.0 suppressed the first WARN on fresh CI runners where time.monotonic() returns a small value (system boot uptime), making the throttle interval check (now - 0.0 < interval) true on the first attempt. * fix(async): address review findings on early-shutdown salvage PR Five real correctness issues caught in review of the original PR, plus a few smaller cleanups and test simplifications. Throttle - cascade reset (regression of existing AIMD invariant): release_failure() now resets consecutive_429s only when in_flight == 0. Resetting unconditionally broke "reduce once per cascade" when 429/500/429 arrived interleaved within a single in-flight burst - the second 429 was treated as a new cascade and the limit got halved twice for what was effectively one rate-limit event. Interface - typed-error gating: DataDesignerEarlyShutdownError now fires only when early_shutdown is true AND actual_num_records == 0. Without this, a partial-salvage run that fails to load for unrelated reasons (corrupt parquet, schema drift, disk hiccup) was misdiagnosed as "zero records produced," hiding the real cause. Async - WARN window scope: the degraded-provider warning was fed by every task outcome, including samplers and non-LLM customs. In realistic pipelines (one model column, several non-model columns) the rate stayed under threshold even when every model call was failing, silencing the WARN exactly when it mattered. Now gated on is_llm. Async/builder - signal preservation across raises: scheduler.early_shutdown and partial_row_groups are captured in a try/finally around future.result(), so a processor failure during the salvage path doesn't drop the structured signal. Both build() and build_preview() now reset per-run state at the start so reused builders don't leak prior-run flags. Async - dead code: dispatch_error capture in run() was unread (the post- finally check is unreachable on the exception path). Removed. Smaller cleanups: - early-shutdown WARN says "non-retryable error rate exceeded threshold" - bridge timeout WARN demoted to debug (ModelTimeoutError already surfaces it; the throttled degraded-provider WARN is the user-facing signal) - TODO note for threading degraded_warn_* through RunConfig - doc note in _finalize_after_shutdown clarifying that pre-batch processor isn't re-run on partial-salvage row groups Tests: - new regression tests for the cascade burst case, partial-salvage error gating, and LLM-only WARN window - direct unit test for _reset_run_state - dedup via _make_storage / _seed_plus_cell_setup helpers - WARN emission cases parametrized into a single test - shared parametrize lists hoisted to module-level constants - redundant cascade test dropped in favor of the more thorough drain variant; redundant healthy-baseline test folded into the zero-survivor test * chore(async): address Nabin's review comments Style cleanups, parametrization, docstring polish, and one consistency fix in the typed-error path. All non-blocking ("Ship it (with nits)"). interface/data_designer.py: - preview() now raises DataDesignerEarlyShutdownError when shutdown produced zero records (parity with create()), and also gates on actual_num_records == 0 so partial-salvage runs that fail to load don't get misdiagnosed - create()'s defensive empty-DF guard mirrors the load-failure guard with the same actual_num_records == 0 check async_scheduler.py: - _record_retryable_outcome docstring clarifies that the call site filters by is_llm; the function alone reads as if every outcome feeds the window dataset_builder.py: - moved _reset_run_state() down past the public methods to match the project's public-before-private convention test_custom.py: - flattened TestAsyncBridgedModelFacade class into module-level test functions (matches the rest of the file) - hoisted inline imports (asyncio, threading, patch, _AsyncBridgedModelFacade, SyncClientUnavailableError) to top of file - driven retryable-error parametrize off RETRYABLE_MODEL_ERRORS directly instead of the hand-rolled factory list, so new retryable types pick up coverage automatically - dropped the redundant "Sanity" block in test_async_bridge_timeout_raises_ model_timeout_error - pytest.raises already enforces the type, the duplicate block was running the same slow scenario twice test_async_scheduler.py: - parametrize over RETRYABLE_MODEL_ERRORS directly (same as above) test_data_designer.py: - added preview-path tests for the typed-error and partial-salvage fall-through cases - updated the existing empty-DF test to also patch actual_num_records=0 (otherwise the new gating in the empty-DF guard skips the typed error) * test(interface): consolidate create() error-dispatch tests into a matrix Five separate tests (two existing, three new from earlier in this PR) all probed the same dispatch logic in create(): "given a load outcome and a builder state, which error type should fire?" Pulled them into a single parametrized matrix indexed by (load_side_effect, early_shutdown, actual_num_records). Net result: 5 named tests → 1 parametrized test with 6 cells, and the previously-missing empty_df + shutdown + partial salvage cell is now covered. Test names retain readable IDs (load_fails_shutdown_zero_records etc.) so failures still pinpoint the exact case in pytest output.	2026-04-30 14:43:35 -03:00
Nabin Mulepati	05c2e8df2e	fix: normalize image_url blocks to OpenAI-compliant dict format (#577 ) * fix: normalize image_url blocks to OpenAI-compliant dict format (#576) ImageContext.get_contexts() produced bare-string and non-standard dict shapes for image_url content blocks, which broke the native OpenAI adapter (passes blocks through as-is) and only worked with Anthropic by accident via defensive handling in the translation layer. - Wrap all image_url values in {"url": ...} dict (OpenAI spec) - Remove non-standard "format" key from base64 dicts - Tighten Anthropic translate_image_url_block to require dict input Fixes #576 Made-with: Cursor * fix: reject malformed image_url blocks instead of silently dropping them translate_image_url_block now raises TypeError when image_url is not a dict. Since all image_url blocks are constructed internally, a bare string indicates an internal bug and should fail loudly. Made-with: Cursor * address review: tighten return type, add OpenAI + data-URI tests - Narrow _auto_resolve_context_value return type to dict[str, str] - Add OpenAI-client regression tests for image_url dict passthrough - Cover both bare-URL and bare-data-URI rejection in Anthropic tests Made-with: Cursor	2026-04-28 09:35:27 -06:00
Andre Manoel	bfa7a46c9a	chore: async engine readiness - blockers and polish before default (#553 ) * chore: async engine readiness blockers (#462) - Processor callback failures (pre-batch and post-batch) now raise DatasetGenerationError instead of silently dropping row groups - Early shutdown and all error paths drain in-flight workers via a finally block in AsyncTaskScheduler.run() - Pre-batch and post-batch processors that change row count in async mode raise immediately (strict_row_count guard) - Partial completion logs a warning when actual < target records - allow_resize=True auto-falls back to sync engine with a deprecation warning instead of raising, using a per-run _use_async flag - Preview path mirrors the trace check from the full build path; PreviewResults exposes task_traces Closes #462 * fix: address review findings for async engine readiness - Prevent double-wrapping of DatasetGenerationError in scheduler callbacks - Fix stacklevel in allow_resize DeprecationWarning to point at user code - Update stale comment to reflect fail-fast behavior - Rename misleading test and remove unused caplog fixture - Add zero-warnings assertion for happy-path case - Move warnings import to module level * fix: address review comments on async engine readiness - Extract _is_async_trace_enabled() helper to deduplicate trace check - Post-batch row-count guard now raises DatasetProcessingError (not DatasetGenerationError) so the scheduler wraps it with rg_id symmetrically with the pre-batch path - Add test_dropped_rows_reduce_actual_record_count for partial completion path * fix: address second-round review feedback on async engine readiness - DeprecationWarning no longer swallowed by interface error wrapper - Incomplete-RG log only fires on clean scheduler exits - Post-batch row-count guard moved into ProcessorRunner (strict_row_count) - Expose active_worker_count property on AsyncTaskScheduler - Drop unused monkeypatch fixture and pytest import * test: fold metadata-count test into dropped-rows test Remove test_write_metadata_records_actual_and_target_counts (poked _actual_num_records directly) and assert metadata counts in test_dropped_rows_reduce_actual_record_count instead, which exercises the same path through the public API.	2026-04-22 19:31:21 -03:00
Andre Manoel	f6128220da	fix: prevent sticky progress bar ghost lines from terminal wrapping (#565 ) * fix: prevent sticky progress bar ghost lines from terminal wrapping When a formatted bar line exceeded the terminal width, it wrapped to multiple physical lines but _drawn_lines only counted 1 per bar. Subsequent cursor-up clears missed the wrapped portions, leaving ghost lines that accumulated every redraw cycle. - Count physical lines in _redraw based on visible width vs terminal width - Cap rate at 9999.9 and eta at 999s to prevent stats field overflow - Remove max(10, ...) bar_width floor; degrade gracefully on narrow terminals - Add update_many() for batch updates with a single redraw cycle - Use update_many() in AsyncProgressReporter to reduce N redraws to 1 * test: strengthen wrapping and degradation tests per review feedback - Monkeypatch shutil.get_terminal_size to force narrow terminal in tests - Inject oversized lines via _format_bar patch to exercise physical line counting (ceiling division) code path - Assert output line width <= width-1 in graceful degradation test - Assert _drawn_lines == 1 in degradation mode (no false wrapping) * test: flatten test classes and replace private attribute assertions Address review feedback: use flat pytest functions per DEVELOPMENT.md conventions instead of class-based test suites. Move inline imports to module level. Replace most _drawn_lines and _bars assertions with public output proxies (counting CURSOR_UP_CLEAR sequences and checking rendered bar content). Keep _drawn_lines access only where no clean public proxy exists (multi-checkpoint add/remove test, zero-bars-remaining after log_final). * refactor: expose drawn_lines as public read-only property Replace _drawn_lines access in tests with a public property, consistent with the existing is_active pattern.	2026-04-22 13:19:12 -03:00
Przemysław Boruta	956b8cd30d	refactor: unify duplicate DAG construction (dag.py + ExecutionGraph) (#511 ) * refactor: unify DAG construction by moving topological sort into execution_graph.py Eliminates dag.py and its networkx dependency by moving topologically_sort_column_configs into execution_graph.py as a module-level function. Side-effect resolution is now O(1) via a side_effect_map dict (previously O(n²) linear scan). Kahn's algorithm is reused in-place rather than leaning on networkx.topological_sort. Closes #510 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test: relax non-deterministic ordering assertion in test_dag test_judge and test_code_and_depends_on_validation_reasoning_traces have no mutual dependency and reach in-degree 0 simultaneously in Kahn's algorithm. Set iteration order varies with PYTHONHASHSEED, making the strict list assertion flaky. Assert only the topological invariants. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(engine): document intentional skip.columns omission in topologically_sort_column_configs ExecutionGraph.create handles skip.when ordering edges in its own two-pass build; the pre-sort function only needs required_columns to produce a valid ColumnConfigT ordering for config compilation. * fix(engine): restore skip.columns edges in topologically_sort_column_configs The sync builder executes generators in compile-time sort order (from _column_configs, populated via this function), not ExecutionGraph order. Dropping skip.columns edges caused evaluate_skip_when to hit UndefinedError when the referenced column hadn't been generated yet, silently skipping rows. Also: - Refactor edge building into _add_edge() helper with a label parameter to distinguish "required" from "skip.when" edges in debug output - Rename test_dag.py -> test_topological_sort.py to match the new module location - Add from __future__ import annotations (required by AGENTS.md) - Add test_side_effect_column_ordering covering the side_effect_map.get() path - Add test_skip_when_column_ordering covering the skip.columns edge path * fix(tests): move SkipConfig import to module level in test_topological_sort * refactor(execution-graph): extract nested closures to module-level helpers, fix docs and test style - Extract `resolve`/`_add_edge` nested closures in `topologically_sort_column_configs` to module-level `_resolve_dag_column` and `_add_dag_edge` per STYLEGUIDE.md - Add self-edge guard in `_add_dag_edge` (consistent with `ExecutionGraph.create`) - Update `architecture/dataset-builders.md` to remove stale `dag.py`/NetworkX references - Fix import order in `test_topological_sort.py` (SkipConfig before column_configs) - Add `-> None` return annotations to legacy test functions * refactor(execution-graph): extract shared Kahn helper, inline resolve, fix module-level ordering - Extract `_kahns_topological_sort` shared helper used by both `ExecutionGraph.get_topological_order` and `topologically_sort_column_configs` - Inline `_resolve_dag_column` into `_add_dag_edge` (no other call sites) - Move private helpers after the public function (public-before-private per STYLEGUIDE) - Add docstring to `topologically_sort_column_configs` - Update architecture/dataset-builders.md to mention both sort sites for skip.when edges --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Andre Manoel <165937436+andreatgretel@users.noreply.github.com>	2026-04-22 10:13:46 -03:00
Eric W. Tramel	8be4ff787f	feat: add RunConfig jinja rendering engine (#557 )	2026-04-17 15:06:27 -04:00
Andre Manoel	a965bc1542	fix: bridge model.generate() to agenerate() for custom columns in async engine (#545 ) * feat: bridge model.generate() to agenerate() for custom columns in async engine Custom column generators that call model.generate() fail under the async engine because the sync HTTP client is unavailable. Add an _AsyncBridgedModelFacade proxy in _build_models_dict() that intercepts the sync-client RuntimeError and schedules agenerate() on the engine's persistent event loop via run_coroutine_threadsafe. Includes a deadlock guard for async custom columns running on the event loop. * refactor: wrap facades at sync call site, not in _build_models_dict Move _AsyncBridgedModelFacade wrapping from _build_models_dict() into _invoke_generator_function() so the async path gets raw facades. The bridge proxy is only needed for sync custom columns; async columns already have direct access to model.agenerate(). * fix: address review feedback - typed exception, timeout cleanup, kwargs test - Introduce SyncClientUnavailableError so the facade catches by type instead of matching error strings (review comment #1) - Add future.cancel() + logger.warning() on timeout to match the _run_coroutine_sync pattern in base.py (review comment #2) - Assert kwargs forwarding in the async bridge test (review comment #4) * fix: let SyncClientUnavailableError propagate through @catch_llm_exceptions The decorator catches all exceptions and wraps them into DataDesignerError, which prevented the async bridge proxy from ever seeing the original error. Add an early match case that re-raises SyncClientUnavailableError directly. * refactor: make SYNC_BRIDGE_TIMEOUT a public constant Drop the underscore prefix since the constant is exported and used across modules (base.py and custom.py).	2026-04-17 13:01:55 -03:00
Nabin Mulepati	a9af365e8e	feat: add skip.when conditional column generation (#502 ) * plan: add skip_when for conditional column generation (#479) Adds implementation plan for a `skip_when` field on `SingleColumnConfig` that enables conditional column generation. When the Jinja2 expression evaluates truthy, the cell is set to None and the generator is skipped. Skips auto-propagate through the DAG to downstream columns. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * plan: remove HopChain example from skip_when plan Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * plan: replace HopChain example with generic product review example Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * plan: add open questions on skip sentinel value and row filtering Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * plan: major revision — SkipConfig model, sync engine support, decouple propagation - Introduce SkipConfig(when, value) as nested model on SingleColumnConfig - Move propagate_skip to SingleColumnConfig as independent field, fixing bug where columns with no SkipConfig couldn't participate in propagation - Add full sync engine implementation (Steps 4a-4d) covering both _fan_out_with_threads and _run_full_column_generator dispatch paths - Add serialization boundary stripping for both DatasetBatchManager (sync) and RowGroupBufferManager (async) - Simplify architecture diagrams for readability - Update all references, design decisions, verification plan Made-with: Cursor * updates * plan: document get_required_columns for skip propagation - Explain why propagation must not use get_upstream_columns() once skip.when adds DAG edges; add _required_columns and get_required_columns() to the execution graph plan - Point async _run_cell at get_required_columns for parity with sync - Clarify DropSkippedRowsProcessorConfig vs stripping __skipped__ for DataFrames; tighten resolved-questions wording - Extend DAG/graph verification with gating_col regression case Refs #479 Made-with: Cursor * plan: centralize __skipped__ handling in skip_provenance - Document new skip_provenance.py (key constant, read/write/strip API) - Point sync builder, async scheduler, and batch buffers at shared helpers - Strip metadata before every DataFrame from buffer dicts, including FULL_COLUMN active subsets - Split §3 into skip_evaluator vs skip_provenance; extend verification Refs #479 Made-with: Cursor * plan: align doc title with SkipConfig / skip.when Drop legacy skip_when naming in headings and #362 cross-reference. Refs #479 Made-with: Cursor * plan: address review — delimiter validation, centralized error handling, caller-owns-deserialization - SkipConfig._validate_when_syntax now checks find_undeclared_variables is non-empty, rejecting expressions without {{ }} delimiters that would silently skip every row - evaluate_skip_when centralizes try/except so both sync and async engines get identical fail-safe behavior on eval errors - evaluate_skip_when takes a single pre-deserialized record; caller runs deserialize_json_values once and passes to both skip eval and generator (no double deserialization, no redundant parameter) - Update _should_skip_cell, async _run_cell, Files Modified table, and verification section accordingly Refs #479 Made-with: Cursor * plan: add get_side_effect_columns accessor to execution graph spec Document _side_effects_by_producer inverse map and get_side_effect_columns() accessor on ExecutionGraph, needed by _write_skip_to_record / apply_skip_to_record to clear __trace, __reasoning_content, etc. on skip. Added to both Step 2b metadata section and Files Modified table. The __skipped__ leak into active_df (greptile's other P1) was already fixed in `70463789` via strip_skip_metadata_from_records. Refs #479 Made-with: Cursor * add skip.when conditional column generation Introduce SkipConfig on SingleColumnConfig to gate column generation with a Jinja2 expression. Columns can be skipped by expression or by upstream propagation (propagate_skip flag). - SkipConfig: Pydantic model with config-time syntax/delimiter/variable validation and cached column extraction from the Jinja2 AST - skip_evaluator: runtime expression evaluation via NativeSandboxedEnvironment with fail-safe error handling (skip on expected failures) - skip_provenance: centralized __skipped__ record tracking shared by sync builder, async scheduler, and buffer managers - DAG/ExecutionGraph: skip.columns wired as dependency edges in both topological sort and static execution graph - Validation: validate_skip_references checks reference existence, sampler/seed scope, and allow_resize conflicts - Sync builder: cell-by-cell and full-column skip with merge-back - Async scheduler: cell and batch skip with live-buffer provenance Made-with: Cursor * fix review findings for skip.when implementation - Add skip evaluation to _fan_out_with_async (was missing, causing skipped rows to still be sent to the LLM) - Preserve __skipped__ provenance on non-skipped records after full-column generation so multi-hop propagation works - Use single live-buffer reference in _run_batch skip loop for consistency with _run_cell - Move Template import to TYPE_CHECKING and reorder import blocks - Replace O(n²) sum() with itertools.chain in dag.py - Add set_required_columns/set_propagate_skip/set_skip_config setters to ExecutionGraph for symmetry with existing API Made-with: Cursor * add conditional generation with skip recipe and refactor skip helpers Add a new recipe demonstrating skip.when patterns (expression gate, propagation, opt-out) with a customer support ticket pipeline. Also extract _should_skip_record in async_scheduler, remove the redundant propagate_skip param from should_skip_by_propagation, and pass a precomputed all_side_effects set through the DAG sort. Made-with: Cursor * updates * fixes * remove recipe > inject conditional gen into existing tutorial * regen colab notebooks * fix: handle missing execution graph in _column_can_skip Return False when the graph has not been initialized instead of raising, since skip logic cannot apply before generators are set up. Made-with: Cursor * parametrize some tests * public before private * slight refactor for readability * parametrize some tests * minor fixes * reanme internla skip tracker key name * clarify intent in comment * when skipped _run_cell should return skipped value even though the consumer doesn't currenlty care about it * remove inline import * minor refactor for clarity * fix: preserve skip metadata across replace_buffer and exclude allow_resize from skip branch Two bugs in the sequential engine's _run_full_column_generator: 1. replace_buffer(df.to_dict()) erased __internal_skipped_columns in three code paths (MultiColumnConfig, non-skip-aware, has_skipped=False fallthrough), breaking propagate_skip for downstream columns when an independent FULL_COLUMN generator ran between skip-setting and propagating columns. 2. _column_can_skip returned True for allow_resize=True columns via propagation, causing the skip-aware merge path to raise on the 1:1 row-count check for 1:N generators. - Add restore_skip_metadata helper to skip_tracker.py - Guard _column_can_skip against allow_resize=True columns - Refactor _run_full_column_generator into three focused methods - Remove dead allow_resize / _log_resize_if_changed from skip path - Remove redundant _require_graph() calls in skip helpers - Add single_column_config_by_name cached property - Add integration tests for both bugs and unit tests for the helper Made-with: Cursor * address review comments on skip.when PR (#502) - Extract shared skip decision logic (_should_skip_cell / _should_skip_record) into should_skip_column_for_record() in skip_evaluator.py so both sync and async engines call the same function (andreatgretel review comment) - Extend SkipConfig self-reference validation to cover side-effect columns (e.g. review__trace on the review column) — previously only checked self.name, now checks self.name \| self.side_effect_columns - Add async engine integration tests for skip paths: cell-by-cell with propagation and full-column batch skip (exercises _run_cell / _run_batch) - Fix test_allow_resize_column_not_blocked_by_upstream_skip to use default propagate_skip=True so it actually exercises the allow_resize guard - Move get_skipped_column_names from skip_tracker to skip_evaluator (sole production consumer) Made-with: Cursor * address cr feedback * Fix issue with full column generating messing up order of skipped rows * add skip conditional generation edge case tests - test_skip_evaluator: parametrized should_skip_column_for_record covering propagation, expression gates, short-circuiting, and disabled propagation - test_execution_graph: skip metadata accessors (get_skip_config, should_propagate_skip, get_required_columns, get_side_effect_columns, resolve_side_effect, skip.when DAG edges) - test_dataset_builder: chained transitive propagation (4 levels), two independent skip gates, custom skip.value, row count preservation Made-with: Cursor * fix: make expression jinja validator private Rename assert_expression_valid_jinja to _assert_expression_valid_jinja to match the private naming convention used by other model validators. Made-with: Cursor --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 09:31:50 -06:00
Eric W. Tramel	64f31bc581	feat: add generic and OpenRouter attribution headers (#542 )	2026-04-14 11:59:49 -04:00
Andre Manoel	533a94b78f	fix: async engine side-effect column propagation and collision resolution (#509 ) * fix: async engine side-effect column propagation and collision resolution ExecutionGraph.set_side_effect() now uses first-writer-wins instead of last-writer-wins, matching sync engine semantics where earlier consumers see the first producer's value. This prevents false DAGCircularDependencyError when multiple generators declare the same side-effect column at different pipeline stages. AsyncTaskScheduler now includes side-effect columns in _instance_to_columns so their values are written to the RowGroupBufferManager and available to downstream prompt templates. Fixes #508 * fix: separate side-effect columns from completion tracking in async scheduler Side-effect columns added to _instance_to_columns caused KeyError in CompletionTracker._validate_strategy() because they are not registered in the execution graph. Split into _instance_to_write_columns (buffer writes, includes side-effects) and _instance_to_columns (completion tracking, real columns only). * fix: warn on side-effect collision and clarify scheduler column maps Log a warning when multiple producers register the same side-effect column (first-writer-wins still applies). Rename _instance_to_columns and _instance_to_write_columns per review feedback for clarity. * fix: raise ConfigCompilationError on duplicate side-effect producers Replace first-writer-wins collision handling with a hard error. Each side-effect column must have exactly one producer; duplicates are a configuration issue to be fixed at the source. * fix: reject duplicate side-effect producers in sync DAG path Mirror the async path check: raise ConfigCompilationError when two custom columns declare the same side-effect column name during topological sort.	2026-04-13 20:16:06 -03:00
Johnny Greco	4a2813654a	fix: always return ISO-8601 from datetime postproc (#484 ) (#512 ) * fix: always return ISO-8601 from datetime postproc (#484) The DatetimeFormatMixin.postproc heuristics inferred output format from value distribution, silently stripping date/time components for small datasets or narrow date ranges. Replace with deterministic ISO-8601 output via vectorized strftime. Users who need custom formats can still set convert_to on the SamplerColumnConfig. * docs: update convert_to docstring and add DatetimeFormatMixin docstring The SamplerColumnConfig.convert_to docstring incorrectly stated that only "float", "int", or "str" are accepted. Datetime/timedelta samplers accept strftime format strings. Also document the ISO-8601 default. * test: add regression test for #484 via DataDesigner.preview API Captures the exact reproducer from the issue: a single-record datetime preview through the public DataDesigner.preview() interface must return a full ISO-8601 timestamp, not a bare year string. * test: trim redundant datetime tests, align reproducer with issue #484 - Remove postproc_same_day_records (subsumed by same_month + no_convert_to) - Remove postproc_always_parseable (subsumed by stdlib_fromisoformat) - Remove all_same_month integration test (subsumed by narrow_range_single_day) - Update single_record test to use unit="h" matching the issue reproducer * fix: address review nits — move datetime import to module scope, drop redundant isinstance	2026-04-09 12:50:40 -04:00
Johnny Greco	fdd5ebb5ef	feat: add Pi Coding Agent rollout seed source (#513 ) (#514 ) Add support for ingesting Pi Coding Agent session artifacts as an agent rollout seed source. Pi sessions are tree-structured JSONL files; the handler resolves the active conversation path by walking from the last entry back to the root via id/parentId links. Key points: - Tree-structured sessions with automatic active-path resolution - Entry-level types: model_change, compaction, branch_summary, custom_message, thinking_level_change - Message roles: user, assistant (inline ToolCall/ThinkingContent/ TextContent blocks), toolResult, bashExecution (synthesized as tool-call pairs), custom, compactionSummary, branchSummary - Extract shared normalize_message_content to utils.py (was duplicated in Hermes handler)	2026-04-09 12:07:47 -04:00
Nabin Mulepati	c27ad62a14	fix: use non-blocking dispatch to prevent pipeline starvation (#504 ) (#505 ) Replace blocking semaphore acquire in the dispatch loop with a non-blocking try_acquire that breaks out when the semaphore is full. This causes the outer loop to re-query the frontier, picking up newly-ready downstream tasks instead of draining a stale snapshot. Fixes #504 Made-with: Cursor	2026-04-08 13:50:26 -06:00
Eric W. Tramel	7891dd53cb	feat: add Hermes Agent rollout support (#500 )	2026-04-07 12:39:49 -04:00
Eric W. Tramel	58870bb83f	feat: add ATIF rollout ingestion (#495 )	2026-04-06 11:06:14 -04:00
Nabin Mulepati	d43ac1cb2e	test: add transport-wiring regression tests for #459 (#485 ) * test: add transport-wiring regression tests for #459 The existing test_client_limits_respect_max_parallel_requests only checks the limits property, which was always computed correctly even before the fix. Add mock-capture tests that patch HTTPTransport and AsyncHTTPTransport constructors, trigger lazy init via completion() / acompletion(), and assert the constructors received the correct limits. These tests fail on the pre-fix code (assert_called_once fails because the old code never explicitly constructed the transport). Made-with: Cursor * test: address review — parametrize over AnthropicClient, use helpers Parametrize all three #459 regression tests over both OpenAICompatibleClient and AnthropicClient (wiring lives in HttpModelClient, so both subclasses need coverage). Use the existing _make_openai_client / _make_anthropic_client helpers with **kwargs instead of constructing clients directly. Move transport patch constants up with the other module-level constants. Made-with: Cursor	2026-04-01 13:54:33 -06:00
Przemysław Boruta	97119863da	fix: respect max_parallel_requests in HTTP connection pool size (#460 ) * fix: respect max_parallel_requests in HTTP connection pool size Pass a pre-configured HTTPTransport/AsyncHTTPTransport with the correct limits into RetryTransport instead of letting it create its own pool with httpx defaults (100 connections). Previously, the limits calculated from max_parallel_requests were passed to httpx.Client(limits=...) which silently ignores them when a custom transport is provided. Fixes #459 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Przemysław <przemekboruta@interia.pl> * fix: document transport param and annotate private attr chains in tests Address Greptile P2 review comments on PR #460: - Add docstring entry for the new `transport` parameter in `create_retry_transport` explaining accepted types and None default - Add inline comments in pool-size regression tests explaining the private attribute chain (_sync/_async_transport → _pool → _max_connections) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Przemysław <przemekboruta@interia.pl> * refactor: address review feedback — public limits property, clean tests - Drop three tests from test_retry.py that reached into private attributes of third-party RetryTransport (_sync_transport, _async_transport); the end-to-end contract is covered by the pool-size regression test - Expose a public `limits` property on HttpModelClient so tests and diagnostic code can assert the pool configuration without walking private attribute chains across three libraries - Replace two private-chain pool assertions with a single `client.limits.max_connections == 600` check against the new property - Trim "inner" from the transport docstring entry (nabinchha suggestion) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Przemysław <przemekboruta@interia.pl> --------- Signed-off-by: Przemysław <przemekboruta@interia.pl> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>	2026-03-31 15:50:54 -06:00
Andre Manoel	c146af5146	chore: async engine follow-up - rename, preview, lifecycle, progress (#456 ) * chore: rename ColumnWiseDatasetBuilder, wire async preview, unify row-group lifecycle - Rename ColumnWiseDatasetBuilder to DatasetBuilder and column_wise_builder.py to dataset_builder.py, update all references - Extract _prepare_async_run() factory shared by build and preview paths - Add _build_async_preview() for async preview with no disk checkpoints - Replace on_row_group_complete/on_checkpoint_complete with single on_finalize_row_group callback; caller handles checkpointing - Add free_row_group() on RowGroupBufferManager for discard-without-write - Free fully-dropped row groups instead of finalizing them - Add consolidated AsyncProgressReporter for async generation logging Closes #437, closes #442, closes #444 * feat: add consolidated async progress reporting with row group context - Add AsyncProgressReporter: groups per-column progress into a single log block emitted at configurable intervals (default 5s) - Add quiet mode to ProgressTracker to suppress per-tracker logging when used with the consolidated reporter - Add ContextVar-based row group tagging (RG1, RG2, ...) for log messages emitted inside async tasks (samplers, expressions, seeds) - Add progress_interval to RunConfig for user-configurable reporting - Remove log_start_async from ProgressTracker (superseded by reporter) Closes #443 * fix async preview and progress reporting * feat: add opt-in sticky ANSI progress bars for generation Add RunConfig.progress_bar setting that replaces periodic log-line progress with sticky terminal bars that stay at the bottom while logs scroll above. Pure ANSI escape codes, no new dependencies. Disabled by default - existing log-based output unchanged. * docs: add missing progress_interval docstring in RunConfig * fix: update progress bar on every completion in async path Skip the time gate when the progress bar is active so the bar redraws on every record instead of every progress_interval seconds. * fix: resolve row-group semaphore deadlock when all tasks are deferred When all tasks for admitted row groups fail with transient errors, the row-group semaphore never releases, blocking admission of new row groups. Fix by salvaging stalled row groups inline - retrying deferred tasks immediately so row groups can checkpoint and free their semaphore slots. Also updates row group log format to (x/X) with leading zeros. * fix: eagerly salvage stalled row groups to avoid wasting semaphore slots Run inline salvage after every checkpoint pass instead of only when globally stalled. Row groups with 0 in-flight and only deferred tasks are salvaged immediately, freeing their semaphore slot for new work. * fix: address review findings from greptile and codex - Use `with` statement for progress bar context (safe __exit__ on error) - Check bar.is_active instead of bar is not None (non-TTY fallback) - Record failures (not skips) for tasks that exhaust salvage retries - Record skipped tasks when pre-batch filtering drops rows * fix: stable progress bar width and accurate failure counts - Pre-compute fixed stats width at bar creation to prevent bar resizing when failed count appears - Cap displayed completed at total to avoid >100% on retries - Exclude already-failed columns from skip recording to prevent double-counting in progress reporter * fix: address Nabin's review - exclude_columns, dead code, docstring - Add exclude_columns={task.column} on non-retryable batch/from_scratch drop path to prevent double-counting (same pattern as cell path) - Simplify salvage drop to per-task exclude (is_dropped guard handles multi-column case) - Remove dead _in_flight_for_rg method - Fix context.py docstring to match actual (x/X) format * fix: _drain_frontier exits before dispatching ready salvage tasks After salvage discards a cell task from dispatched (making it available in the frontier), _drain_frontier broke immediately because nothing was in-flight yet. The task and its downstream were never re-dispatched, leaving the row group incomplete. Fix: only break when both ready and in-flight are empty. * fix: salvage edge cases found by code review - _drain_frontier: only break when both ready and in-flight are empty - _salvage_rounds: re-mark sibling columns as dispatched after from_scratch retry to prevent duplicate dispatch - _salvage_stalled_row_groups: separate exhausted tasks from new drain failures to avoid treating non-stalled tasks as permanent - _checkpoint_completed_row_groups: clean up deferred tasks for checkpointed row groups - Early shutdown: salvage stalled row groups before exiting * fix: skip record_failure for already-dropped rows in salvage When multiple columns fail for the same row, the first drop records a skip for the other column. Without this guard, record_failure fires again for the second column, double-counting it.	2026-03-25 14:50:48 -03:00
Nabin Mulepati	1356408c6e	feat: remove litellm dependency and bridge path (#455 ) * adjust plan * feat: remove litellm dependency and bridge path (PR-7) - Delete litellm_bridge.py adapter, litellm_overrides.py, and their tests - Remove LiteLLM fallback branch and DATA_DESIGNER_MODEL_BACKEND env var from clients/factory.py; unknown provider_type now raises ValueError - Remove apply_litellm_patches() call from models/factory.py - Remove LiteLLM exception match arms and DownstreamLLMExceptionMessageParser from models/errors.py; port context window detail extraction to _extract_context_window_detail for native ProviderError path - Remove litellm from lazy_heavy_imports.py and pyproject.toml runtime deps - Remove flatten_extra_body parameter from TransportKwargs.from_request - Clean up LiteLLM references in docstrings, comments, and AGENTS.md - Add full ProviderErrorKind test coverage to test_model_errors.py - Update benchmark script to patch OpenAICompatibleClient instead of CustomRouter Made-with: Cursor * fix: forward tools to _fake_response in benchmark patch The old CustomRouter patch forwarded *kwargs (including tools) to _fake_response, but the new OpenAICompatibleClient patch only passed model and messages — silently disabling tool-call simulation in benchmark scenarios that exercise allow_tools. Made-with: Cursor fix: address PR-7 review feedback - Return ChatCompletionResponse from benchmark fakes instead of FakeResponse to match the native client contract (facade expects .message, not .choices[0].message) - Add ids= to parametrize block in test_model_errors.py for readability - Remove unnecessary try/except from _extract_context_window_detail; the `if marker in` guard is sufficient - Make context window marker match case-insensitive - Replace stale httpx.AsyncClient callout in async_concurrency.py docstring with generic "async-stateful resources" Made-with: Cursor	2026-03-24 11:41:00 -06:00
Nabin Mulepati	f24e88d0f8	feat: wire ThrottledModelClient and dual-semaphore scheduler (#449 ) * feat: constrain HttpModelClient to single concurrency mode Constrain each HttpModelClient instance to sync or async at construction time, eliminating dual-mode lifecycle complexity that caused transport leaks and cross-mode teardown bugs. - Add ClientConcurrencyMode StrEnum replacing Literal type alias - Add concurrency_mode constructor param with mode enforcement guards on _get_sync_client / _get_async_client - Simplify close()/aclose() to single-mode teardown (cross-mode calls are no-ops) - Thread client_concurrency_mode through factory chain from DATA_DESIGNER_ASYNC_ENGINE env var - Add ModelRegistry.arun_health_check() async mirror and wire async dispatch in ColumnWiseDatasetBuilder - Make ensure_async_engine_loop public (used cross-module) - Fix test helpers to derive concurrency mode from injected client - Add PR-5 architecture notes * fix: address review findings on HttpModelClient lifecycle - Fix transport double-close in close()/aclose() by delegating teardown to the httpx client when one exists (if/elif pattern); only close transport directly if no client was ever created - Reject mismatched client/mode injection in constructor (e.g. async_client on a sync-mode instance raises ValueError) - Add 5-minute wall-clock timeout to future.result() in async health check dispatch - Add constructor validation tests for both mismatch directions - Update PR-5 architecture notes Made-with: Cursor * fix: address PR-5 review feedback on HttpModelClient lifecycle - Type _transport as RetryTransport \| None, removing type: ignore suppressions in close()/aclose() - Make transport fully lazy (None by default) and accept optional transport constructor param so injected-client paths don't eagerly allocate an unused RetryTransport - Cancel future on TimeoutError in async health check dispatch so timed-out coroutines don't linger on the shared event loop - Set health check timeout to 180s (3 min) matching architecture notes - Rename _SYNC_CLIENT_CASES to _CLIENT_FACTORY_CASES since the list parametrizes both sync and async mode tests - Update architecture notes timeout from 180→300 back to 180 to match implementation Made-with: Cursor * add plan for pr-6 * update plan * feat: wire ThrottledModelClient and dual-semaphore scheduler for AIMD concurrency control Introduces per-request AIMD throttle feedback by wrapping every ModelClient with ThrottledModelClient (acquire/release around each HTTP call) and adding a dual-semaphore scheme to AsyncTaskScheduler that separates submission slots from LLM-wait slots via a one-way handoff. Key changes: - ThrottledModelClient: decorator that acquires/releases ThrottleManager permits around sync and async completion, embedding, and image calls, routing 429s to release_rate_limited for AIMD backoff. - RetryConfig: removes 429 from retryable_status_codes so rate-limit signals propagate to the AIMD feedback loop instead of being silently retried at the transport layer. - AsyncTaskScheduler: adds TrackingSemaphore, LLM-wait semaphore, and is_llm_bound-driven handoff; extracts _main_dispatch_loop and _salvage_rounds; tracks worker tasks for clean cancellation. - ColumnGenerator.is_llm_bound property on base, model-registry, and custom generators to classify which columns need the handoff. - ThrottleConfig (Pydantic model on RunConfig) for user-tunable AIMD knobs (reduce_factor, additive_increase, success_window, block_seconds). - ModelRegistry.get_aggregate_max_parallel_requests() to size the LLM-wait semaphore from registered model configs. - Renames throttle.py -> throttle_manager.py for clarity. Made-with: Cursor * fix: address PR-6 review feedback on concurrency safety - Shield _cancel_workers() with asyncio.shield in the CancelledError handler to prevent double-cancellation from leaking semaphore permits - Guard ThrottleManager release calls in _throttled_sync/_athrottled with try/except so a failing release doesn't mask the original error or permanently leak the in-flight permit counter - Add registration guard in ThrottleManager.try_acquire() that raises RuntimeError if called before register(), making the ordering invariant explicit rather than relying on temporal coupling - Add is_registered() public query method on ThrottleManager - Expand TrackingSemaphore docstring noting CPython _value dependency Made-with: Cursor * fix test * feat: AIMD throttle refinements — cascade dampening, ceiling stabilization, early shutdown exclusion Dampen aggressive cascade 429 reduction (only the first 429 in a burst reduces the limit; subsequent cascade 429s release permits without further reduction). Add rate_limit_ceiling with configurable overshoot to stabilize concurrency instead of sawtooth recovery. Exclude ModelRateLimitError from early shutdown error rate. Consolidate AIMD defaults into ThrottleConfig ClassVar constants as single source of truth, and update ThrottleManager to accept ThrottleConfig directly. Update architecture notes for PR-6. Made-with: Cursor * fix: address PR-6 review feedback on retry boundary, capacity polling, and test coverage - Keep 429 in transport retry list for sync-mode clients (no salvage queue); only strip for async-mode where AIMD + salvage handles it. Add strip_rate_limit_codes kwarg to create_retry_transport; wire through HttpModelClient based on concurrency mode. - Return CAPACITY_POLL_INTERVAL (50ms) instead of default_block_seconds (2s) from try_acquire when at capacity but not rate-limited, so callers poll responsively when a slot could free in milliseconds. - Document registration ordering invariant on create_model_client's throttle_manager param. - Add semaphore permit assertions to test_scheduler_llm_bound_one_way_handoff matching the non-LLM counterpart test. - Update architecture notes for all changes. Made-with: Cursor * add missing test * fix: consolidate ThrottleConfig and improve naming Move ceiling_overshoot from ThrottleManager constructor kwarg into ThrottleConfig (single source of truth for all AIMD tuning knobs). Rename block_seconds → cooldown_seconds and block_duration → cooldown_duration throughout for clarity. Remove DEFAULT_CEILING_OVERSHOOT module constant from throttle_manager.py. Made-with: Cursor * fix: annotate state capture invariant in acquire_sync/acquire_async Add inline comment explaining why the captured DomainThrottleState reference is safe to reuse in the finally block (objects are never replaced after creation). Made-with: Cursor	2026-03-24 09:03:11 -06:00
Nabin Mulepati	015d4f188f	feat: Constrain HttpModelClient to single concurrency mode... (#439 ) * feat: constrain HttpModelClient to single concurrency mode Constrain each HttpModelClient instance to sync or async at construction time, eliminating dual-mode lifecycle complexity that caused transport leaks and cross-mode teardown bugs. - Add ClientConcurrencyMode StrEnum replacing Literal type alias - Add concurrency_mode constructor param with mode enforcement guards on _get_sync_client / _get_async_client - Simplify close()/aclose() to single-mode teardown (cross-mode calls are no-ops) - Thread client_concurrency_mode through factory chain from DATA_DESIGNER_ASYNC_ENGINE env var - Add ModelRegistry.arun_health_check() async mirror and wire async dispatch in ColumnWiseDatasetBuilder - Make ensure_async_engine_loop public (used cross-module) - Fix test helpers to derive concurrency mode from injected client - Add PR-5 architecture notes * fix: address review findings on HttpModelClient lifecycle - Fix transport double-close in close()/aclose() by delegating teardown to the httpx client when one exists (if/elif pattern); only close transport directly if no client was ever created - Reject mismatched client/mode injection in constructor (e.g. async_client on a sync-mode instance raises ValueError) - Add 5-minute wall-clock timeout to future.result() in async health check dispatch - Add constructor validation tests for both mismatch directions - Update PR-5 architecture notes Made-with: Cursor * fix: address PR-5 review feedback on HttpModelClient lifecycle - Type _transport as RetryTransport \| None, removing type: ignore suppressions in close()/aclose() - Make transport fully lazy (None by default) and accept optional transport constructor param so injected-client paths don't eagerly allocate an unused RetryTransport - Cancel future on TimeoutError in async health check dispatch so timed-out coroutines don't linger on the shared event loop - Set health check timeout to 180s (3 min) matching architecture notes - Rename _SYNC_CLIENT_CASES to _CLIENT_FACTORY_CASES since the list parametrizes both sync and async mode tests - Update architecture notes timeout from 180→300 back to 180 to match implementation Made-with: Cursor * fix: address Greptile review findings on health check methods - Use bare `raise` instead of `raise e` in both run_health_check and arun_health_check to preserve original traceback frames - Add GenerationType.IMAGE test coverage for sync and async health checks (stub-image config + generate_image/agenerate_image patches) Made-with: Cursor * Update packages/data-designer-engine/tests/engine/models/test_model_registry.py	2026-03-20 12:23:03 -06:00
Andre Manoel	7e812630cf	feat: wire async task-queue scheduler into ColumnWiseDatasetBuilder (#429 ) * feat: wire async task-queue scheduler into ColumnWiseDatasetBuilder * chore: add async benchmark notebook and demo scripts * fix: address all PR review comments on async builder integration - Wire on_batch_complete through on_row_group_complete callback - Mark trailing slots as dropped in replace_dataframe when processor filters rows - Ensure checkpoint still runs when on_before_checkpoint raises - Gate non-seed task dispatch on pre-batch completion - Add public run_pre_batch_on_df to ProcessorRunner (replaces private _run_stage call) - Add public is_column_complete_for_rg to CompletionTracker (replaces private _completed access) - Type task_traces as list[TaskTrace] in results.py - Add async_trace docstring to RunConfig - Move module-level log into _build_async - Add replace_dataframe unit tests (same-size, dropped rows, fewer rows) - Assert on public outcomes in scheduler tests instead of private attributes - Parametrize allow_resize validation tests - Cache seed_cols before main loop - Remove redundant disable_early_shutdown from AsyncTaskScheduler * style: fix ruff format for lambda expression * fix: address open review issues on async scheduler - Flush completed row groups before breaking on early shutdown (data loss) - Change error rate check from >= to > so disable_early_shutdown sentinel (1.0) never triggers at 100% failure rate - Extract seeds-complete check into helper and call it in salvage rounds via _drain_frontier, with pre-batch gating, so pre-batch processor runs even when seed tasks succeed only after retry - Fix is_column_complete_for_rg to check _batch_complete first, then verify all non-dropped rows for CELL_BY_CELL columns - Replace O(\|in-flight\|) scan in _in_flight_for_rg with per-RG counter * fix: sync pre-batch row drops to CompletionTracker and restore stderr safely Pre-batch processors that filter rows marked them as dropped in RowGroupBufferManager but not in CompletionTracker, causing unnecessary LLM calls for rows that would be discarded at checkpoint time. Also wrap the benchmark warmup stderr redirect in try/finally so stderr is restored if _run_once raises. * fix: prune _admitted_rg_ids on row group checkpoint Prevents unbounded growth of the admission set across large runs. * chore: remove demo/async files from PR Dev-time benchmarks and manual test scripts - kept locally, not needed in the PR. * fix: wire disable_early_shutdown into AsyncTaskScheduler RunConfig.disable_early_shutdown was forwarded to the sync executor but silently ignored in the async path. Now passed through to the scheduler's _check_error_rate. * test: add e2e test for async engine concurrency Verifies the async scheduler dispatches independent LLM columns concurrently by checking for overlapping task trace intervals. Uses a wide DAG (sampler -> 2 parallel LLM columns) with 2 records. Requires NVIDIA_API_KEY. * fix: drop row group on on_before_checkpoint failure instead of writing unprocessed data Matches on_seeds_complete failure behavior and avoids silently checkpointing unfiltered rows when a post-batch processor fails. * fix: skip on_before_checkpoint when no POST_BATCH processors configured Avoids unnecessary DataFrame round-trip for every row group in the common case where no post-batch processors exist. * fix: address remaining review nits from nabinchha and greptile summary - Gate on_seeds_complete on PRE_BATCH processors (matches on_before_checkpoint pattern) - Cache seed_cols as instance attr instead of recomputing in _dispatch_seeds - Iterate list(self._active_rgs) snapshot in _run_seeds_complete_check - Add logger.debug to telemetry except block - Add design comment on on_before_checkpoint failure drop behavior - Rename row_group param to row_group_index in is_column_complete_for_rg - Document rg_id as current_batch_number equivalence - Use mock.patch.object in e2e test instead of direct mutation - Add max(0, ...) floor guard on _in_flight_counts decrement - Rename _ensure_async_engine_loop to public ensure_async_engine_loop - Move AsyncTaskScheduler import to module level in integration tests * fix: preserve async callback contract and e2e setup * fix: prune _seeds_dispatched_rgs and _pre_batch_done_rgs on checkpoint * refactor: consolidate per-RG state into _RowGroupState dataclass Replace 5 independent collections (_active_rgs, _admitted_rg_ids, _seeds_dispatched_rgs, _pre_batch_done_rgs, _in_flight_counts) with a single _rg_states dict keyed by row group ID. Cleanup is now a single `del` instead of N separate discards, eliminating the class of bugs where one collection is missed during row group teardown. * fix: skip checkpoint and callbacks when on_before_checkpoint fails When on_before_checkpoint raises and all rows are dropped, the code previously fell through to checkpoint_row_group and on_row_group_complete, sending a spurious progress notification for a batch with zero records. Now gates both on a `dropped` flag so they are skipped after failure. * fix: snapshot dropped rows before await in _run_batch and sync tracker on checkpoint failure Two fixes: - _run_batch: snapshot dropped rows before `await agenerate` so the row-count expectation matches batch_df. Concurrent tasks can drop rows during the await, causing a spurious ValueError that would drop the entire row group. Write-back now re-checks is_dropped to skip rows dropped mid-flight. - _checkpoint_completed_row_groups: add tracker.drop_row alongside buffer_manager.drop_row when on_before_checkpoint fails, keeping both in sync. * feat: sliding window error rate and out-of-order row group completion test Replace cumulative error counters with a deque-based sliding window so that early transient failures do not permanently inflate the error rate in long-running jobs. Add tests for the sliding window recovery path and for deterministic out-of-order row group checkpoint ordering. * fix: use real time delays in out-of-order completion test asyncio.sleep(0) interleaving is not deterministic across Python versions. Switch to asyncio.sleep(num_records * 0.02) so the smaller row group genuinely finishes seeds first regardless of event loop scheduling. * fix: prevent ZeroDivisionError when shutdown_error_window is 0 Change RunConfig.shutdown_error_window constraint from ge=0 to ge=1 so the sliding window denominator is never zero. * fix: address Greptile review nits in async_scheduler - Move del _rg_states inside try/finally so semaphore is always released - Add exc_info=True to pre-batch failure log for consistent tracebacks - Short-circuit _check_error_rate when _early_shutdown already set * fix: address Greptile summary findings - Remove duplicate async engine log in build() (kept in _build_async) - Guard _run_seeds_complete_check with has_pre_batch at both call sites - Change error rate comparison from > to >= to match sync path semantics	2026-03-20 13:05:09 -03:00
Eric W. Tramel	a0fb04ee07	feat: agent rollout trace ingestion (#399 )	2026-03-20 09:17:35 -04:00
Nabin Mulepati	a9a16ae61a	feat: Native Anthropic adapter with shared HTTP client infrastructure (#426 ) * plans for model facade overhaul * update plan * add review * address feedback + add more details after several self reviews * update plan doc * address nits * Add cannonical objects * self-review feedback + address * add LiteLLMRouter protocol to strongly type bridge router param Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * simplify some things * add a protol for http response like object * move HttpResponse * update PR-1 architecture notes for lifecycle and router protocol Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Address PR #359 feedback: exception wrapping, shared parsing, test improvements - Wrap all LiteLLM router calls in try/except to normalize raw exceptions into canonical ProviderError at the bridge boundary (blocking review item) - Extract reusable response-parsing helpers into clients/parsing.py for shared use across future native adapters - Add async image parsing path using httpx.AsyncClient to avoid blocking the event loop in agenerate_image - Add retry_after field to ProviderError for future retry engine support - Fix _to_int_or_none to parse numeric strings from providers - Create test conftest.py with shared mock_router/bridge_client fixtures - Parametrize duplicate image generation and error mapping tests - Add tests for exception wrapping across all bridge methods * Use contextlib to dry out some code * Address Greptile feedback: HTTP-date retry-after parsing, docstring clarity - Parse RFC 7231 HTTP-date strings in Retry-After header (used by Azure and Anthropic during rate-limiting) in addition to numeric delay-seconds - Clarify collect_non_none_optional_fields docstring explaining why f.default is None is the correct check for optional field forwarding - Add tests for HTTP-date and garbage Retry-After values * Address Greptile feedback: FastAPI detail parsing, comment fixes - Fix misleading comment about prompt field defaults in _IMAGE_EXCLUDE - Handle list-format detail arrays in _extract_structured_message for FastAPI/Pydantic validation errors - Document scope boundary for vision content in collect_raw_image_candidates * add PR-2 architecture notes for model facade overhaul * save progress on pr2 * small refactor * address feedback * Address greptile comment in pr1 * refactor ProviderError from dataclass to regular Exception - Replace @dataclass + __post_init__ with explicit __init__ that calls super().__init__ properly, avoiding brittle field-ordering dependency - Store cause via __cause__ only, removing the redundant .cause attr - Update match pattern in handle_llm_exceptions for non-dataclass type - Rename shadowed local `fields` to `optional_fields` in TransportKwargs * Address greptile feedback * PR feedback * track usage tracking in finally block for images * pr feedback * add native OpenAI adapter with retry and throttle infrastructure - Implement OpenAICompatibleClient using httpx with RetryTransport - Add ThrottleManager with AIMD concurrency control and structured logging - Route provider_type=openai to native adapter in client factory - Add extract_reasoning_content helper for vLLM field migration - Make ModelRegistry own ThrottleManager and RetryConfig explicitly - Support DATA_DESIGNER_MODEL_BACKEND=litellm_bridge env var override Made-with: Cursor * Self CR * fix claude slop * Updates after self-review. Simplify use of ThrottleManager in light of plan 346 scheduler * wrap facade close in try/catch * clean up stray params * fix: address review findings from model facade overhaul PR3 - Fix metadata bug: drop unknown kwargs instead of passing them to ChatCompletionRequest (which has no metadata field), preventing a runtime TypeError. - Lazy-init httpx clients: sync and async clients are created on first use instead of eagerly in the constructor, with constructor injection for testability. - Remove defensive getattr on httpx.Response.status_code (always present). - Add comment clarifying throttle manager wiring is deferred. - Refactor tests to use constructor injection instead of private attribute mutation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix stray inclusion of metadata * small regression fix * address more feedback * self review * Fixes * new test for aimd lifecycle * update plan docs * update plans with refs to prs * fix: cap acquire_sync/acquire_async sleep to remaining budget to prevent timeout overshoot Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test lay init * fix timeout for openaicompatibleadapter * remove unused attr * fix: address review findings from PR #402 - Guard reasoning_content fallback with isinstance(str) check to prevent non-string provider values from violating the return type contract - Normalize provider_type comparison to lowercase so mixed-case configs (e.g. "OpenAI") route to the native adapter instead of silently falling through to the bridge - Document register-before-acquire ordering invariant on ThrottleManager - Add TODO to remove 429 from retryable_status_codes once ThrottleManager is wired via AsyncTaskScheduler (plan 346) Made-with: Cursor * Address pr feedback * fix method order * feat: native Anthropic adapter with image block translation Implement AnthropicClient as a native httpx-based adapter for the Anthropic Messages API, following the same patterns as OpenAICompatibleClient. Route provider_type="anthropic" to the native adapter in the client factory, with unknown providers falling through to LiteLLMBridgeClient. Key adapter behaviors: - System message extraction to top-level `system` parameter - OpenAI image_url content blocks translated to Anthropic image/source format - tool_use / thinking content blocks normalized to canonical types - x-api-key auth (not Bearer), anthropic-version header - max_tokens defaults to 4096 when omitted - stop mapped to stop_sequences - Embeddings and image generation raise unsupported capability errors Made-with: Cursor * fix: exclude OpenAI-specific params from Anthropic payload and drop unnecessary re.DOTALL Expand _ANTHROPIC_EXCLUDE to filter response_format, frequency_penalty, presence_penalty, and seed from TransportKwargs forwarding. Remove the unnecessary re.DOTALL flag from _DATA_URI_RE since base64 data never contains newlines. Made-with: Cursor * Fix failing test * fix: address PR #402 review findings - Add POST to RetryTransport allowed_methods so retries actually fire for all adapter endpoints (chat, embeddings, images are all POST) - Guard close()/aclose() with _init_lock to prevent TOCTOU race with lazy client initialization - Detect "unsupported parameter" patterns in 400 responses so the native path returns UNSUPPORTED_PARAMS instead of generic BAD_REQUEST - Return None from coerce_message_content for image-only content lists instead of leaking the Python repr via str(content) - Restore per-request timeout forwarding for LiteLLMBridgeClient via _with_timeout helper (broken when "timeout" was added to _META_FIELDS) - Normalize ModelProvider.provider_type to lowercase via field_validator so consumers don't need per-site .lower() calls - Fix unclosed httpx.AsyncClient in lazy-init test Made-with: Cursor * updates to DRY out code between the two adapters * refactor code to introduce HttpModelClient * update tests * fix: address PR #426 review findings - Release _aclient in close() to prevent async connection pool leak when both sync and async clients were initialized - Drop malformed image_url blocks (missing image_url key) instead of forwarding them unchanged to the Anthropic API - Preserve image blocks in system messages by returning Anthropic block-list format when non-text content is present - Rename extract_system_text to extract_system_content and add merge_system_parts helper for mixed string/block system parts Made-with: Cursor * fix: improve error classification and surface provider messages for 400s - Handle /v1 in Anthropic endpoint gracefully to avoid path duplication - Add QUOTA_EXCEEDED provider error kind for credit/billing failures - Extend UNSUPPORTED_PARAMS detection for mutually exclusive params - Surface raw provider message in formatted errors for 400 status codes - Consolidate provider message helpers into single _attach_provider_message Made-with: Cursor * fix: address PR #426 review findings (round 2) - Fix close() double-close of shared transport by closing self._transport directly instead of accessing private aclient._transport (critical) - Add TODO for threading.Lock → asyncio.Lock split (plan-346) - Remove unused _model_id from HttpModelClient and all callers - Export AnthropicClient from adapters __init__.py - Filter empty text blocks in translate_tool_result_content join - Move mock helpers to conftest.py with consistent json.dumps text default - Add __init__.py files to enable absolute imports from test conftest - Add bridge env override test for anthropic provider - Add ConnectionError and non-JSON response tests for AnthropicClient - Assert secret_resolver.resolve called with correct key ref in factory tests Made-with: Cursor * Update license headers * Fix anthropic tool call flow * fix: add explicit UNSUPPORTED_CAPABILITY error mapping - Add ModelUnsupportedCapabilityError so unsupported operations (e.g. Anthropic embeddings/image-generation) surface a specific error instead of falling through to generic ModelAPIError - Forward the provider's operation-specific message in the cause - Add parametrized test case for the new error path	2026-03-19 11:18:40 -06:00
Mike Knepper	12baad776d	feat: Refactor person data reading for client ddb connection control (#393 )	2026-03-19 09:34:57 -05:00
Eric W. Tramel	c88508790f	test: trim FileSystemSeedReader follow-up coverage (#432 )	2026-03-19 09:47:55 -04:00
Andre Manoel	61fa0150f7	fix: support nested field access in schema transform templates (#435 ) * feat: support nested field access in schema transform templates Enable {{ result.quality.score }} style dot notation in schema transform Jinja2 templates, where result is a deserialized JSON column. Previously, _json_escape_record flattened all dict values to escaped JSON strings before Jinja2 saw them. This made the rendered output valid JSON but prevented nested access since Jinja2 only saw strings. The fix introduces TemplateValue, a wrapper that defers the choice between "drill into nested dict" and "render as escaped string" to template evaluation time. Jinja2 resolves dot notation via __getattr__ (returning a new TemplateValue for the nested value), and converts to string via __str__ (delegating to a caller-provided str_fn). This is necessary because plain dicts render as Python repr ({'key': 'val'}) which is invalid JSON - we need to control __str__ to produce properly escaped JSON, and that requires a wrapper object. Other Jinja2 consumers (prompt templates, expression columns) don't need this - Jinja2 natively supports dot access on plain dicts via getattr-to-getitem fallback, and plain str() is fine for text output. Schema transform is unique because its output must be valid JSON. * Address PR review comments - Fix boolean serialization: add bool check before str in _escape_value_for_json to produce JSON 'true'/'false' instead of Python 'True'/'False' - Add class-level _record_str_fn annotation to WithJinja2UserTemplateRendering - Rename skip_record_sanitization to _skip_record_sanitization (underscore prefix) to signal internal-only usage, and document it in safe_render docstring - Add defensive error handling in TemplateValue.__getitem__ and __iter__ - Promote test input data to parametrize column, removing brittle string scan * Address second round of PR review comments - Add __eq__ and __hash__ to TemplateValue so Jinja2 equality conditionals (e.g. {% if result.label == "excellent" %}) work - Add inline comment explaining deliberate double-encode in _escape_value_for_json for dict/list values - Default _record_str_fn to None at class level so accessing it before prepare_jinja2_template_renderer doesn't mask the real error * refactor: replace TemplateValue with Jinja2 finalize hook Use Jinja2's built-in finalize hook for value-to-string conversion and a getattr override for dict-key-priority lookup, eliminating the custom TemplateValue wrapper class entirely. * docs: add missing param docs to prepare_jinja2_template_renderer	2026-03-18 15:28:18 -03:00
Eric W. Tramel	3d33680b7d	feat: support 1-to-many FileSystemSeedReader hydration (#424 )	2026-03-17 20:56:19 -04:00
Nabin Mulepati	255e78da83	feat: Native OpenAI adapter with retry and AIMD throttle infrastructure (#402 ) * plans for model facade overhaul * update plan * add review * address feedback + add more details after several self reviews * update plan doc * address nits * Add cannonical objects * self-review feedback + address * add LiteLLMRouter protocol to strongly type bridge router param Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * simplify some things * add a protol for http response like object * move HttpResponse * update PR-1 architecture notes for lifecycle and router protocol Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Address PR #359 feedback: exception wrapping, shared parsing, test improvements - Wrap all LiteLLM router calls in try/except to normalize raw exceptions into canonical ProviderError at the bridge boundary (blocking review item) - Extract reusable response-parsing helpers into clients/parsing.py for shared use across future native adapters - Add async image parsing path using httpx.AsyncClient to avoid blocking the event loop in agenerate_image - Add retry_after field to ProviderError for future retry engine support - Fix _to_int_or_none to parse numeric strings from providers - Create test conftest.py with shared mock_router/bridge_client fixtures - Parametrize duplicate image generation and error mapping tests - Add tests for exception wrapping across all bridge methods * Use contextlib to dry out some code * Address Greptile feedback: HTTP-date retry-after parsing, docstring clarity - Parse RFC 7231 HTTP-date strings in Retry-After header (used by Azure and Anthropic during rate-limiting) in addition to numeric delay-seconds - Clarify collect_non_none_optional_fields docstring explaining why f.default is None is the correct check for optional field forwarding - Add tests for HTTP-date and garbage Retry-After values * Address Greptile feedback: FastAPI detail parsing, comment fixes - Fix misleading comment about prompt field defaults in _IMAGE_EXCLUDE - Handle list-format detail arrays in _extract_structured_message for FastAPI/Pydantic validation errors - Document scope boundary for vision content in collect_raw_image_candidates * add PR-2 architecture notes for model facade overhaul * save progress on pr2 * small refactor * address feedback * Address greptile comment in pr1 * refactor ProviderError from dataclass to regular Exception - Replace @dataclass + __post_init__ with explicit __init__ that calls super().__init__ properly, avoiding brittle field-ordering dependency - Store cause via __cause__ only, removing the redundant .cause attr - Update match pattern in handle_llm_exceptions for non-dataclass type - Rename shadowed local `fields` to `optional_fields` in TransportKwargs * Address greptile feedback * PR feedback * track usage tracking in finally block for images * pr feedback * add native OpenAI adapter with retry and throttle infrastructure - Implement OpenAICompatibleClient using httpx with RetryTransport - Add ThrottleManager with AIMD concurrency control and structured logging - Route provider_type=openai to native adapter in client factory - Add extract_reasoning_content helper for vLLM field migration - Make ModelRegistry own ThrottleManager and RetryConfig explicitly - Support DATA_DESIGNER_MODEL_BACKEND=litellm_bridge env var override Made-with: Cursor * Self CR * fix claude slop * Updates after self-review. Simplify use of ThrottleManager in light of plan 346 scheduler * wrap facade close in try/catch * clean up stray params * fix: address review findings from model facade overhaul PR3 - Fix metadata bug: drop unknown kwargs instead of passing them to ChatCompletionRequest (which has no metadata field), preventing a runtime TypeError. - Lazy-init httpx clients: sync and async clients are created on first use instead of eagerly in the constructor, with constructor injection for testability. - Remove defensive getattr on httpx.Response.status_code (always present). - Add comment clarifying throttle manager wiring is deferred. - Refactor tests to use constructor injection instead of private attribute mutation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix stray inclusion of metadata * small regression fix * address more feedback * self review * Fixes * new test for aimd lifecycle * update plan docs * update plans with refs to prs * fix: cap acquire_sync/acquire_async sleep to remaining budget to prevent timeout overshoot Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test lay init * fix timeout for openaicompatibleadapter * remove unused attr * fix: address review findings from PR #402 - Guard reasoning_content fallback with isinstance(str) check to prevent non-string provider values from violating the return type contract - Normalize provider_type comparison to lowercase so mixed-case configs (e.g. "OpenAI") route to the native adapter instead of silently falling through to the bridge - Document register-before-acquire ordering invariant on ThrottleManager - Add TODO to remove 429 from retryable_status_codes once ThrottleManager is wired via AsyncTaskScheduler (plan 346) Made-with: Cursor * Address pr feedback * fix method order * Fix failing test * fix: address PR #402 review findings - Add POST to RetryTransport allowed_methods so retries actually fire for all adapter endpoints (chat, embeddings, images are all POST) - Guard close()/aclose() with _init_lock to prevent TOCTOU race with lazy client initialization - Detect "unsupported parameter" patterns in 400 responses so the native path returns UNSUPPORTED_PARAMS instead of generic BAD_REQUEST - Return None from coerce_message_content for image-only content lists instead of leaking the Python repr via str(content) - Restore per-request timeout forwarding for LiteLLMBridgeClient via _with_timeout helper (broken when "timeout" was added to _META_FIELDS) - Normalize ModelProvider.provider_type to lowercase via field_validator so consumers don't need per-site .lower() calls - Fix unclosed httpx.AsyncClient in lazy-init test	2026-03-17 09:26:58 -06:00
Andre Manoel	0b324478f1	feat: add AsyncTaskScheduler and RowGroupBufferManager for async engine (#404 ) * feat: add AsyncTaskScheduler and RowGroupBufferManager for async engine * test: add retryable salvage and eager row-drop scheduler tests Add two missing test cases for AsyncTaskScheduler: - Transient 503 failure deferred to salvage round and recovered - Downstream column never dispatched for rows dropped by upstream failure Update plan checkboxes to reflect PR 3 completion status. * fix: address Greptile review findings on scheduler and buffer manager - Non-retryable batch/from_scratch failure now drops all rows in the row group so it reaches a terminal state and gets checkpointed - Salvage rounds re-dispatch from_scratch tasks directly instead of relying on the frontier (which never contains them) - Log error if scheduler exits with unfinished row groups - update_batch raises ValueError on size mismatch - Free _row_group_sizes entry on checkpoint - Add row-count mismatch warning in _run_batch writeback - Document intentional stateful lock ordering in _dispatch_seeds * refactor: remove dead _seed_frontier method from CompletionTracker * refactor: restore seed_frontier as public opt-in method on CompletionTracker Keep the tracker self-contained for static introspection (capacity planning, task enumeration) without requiring a scheduler. The scheduler does not call it - it manages root dispatch directly for stateful locks and multi-column dedup. * fix: complete salvage path - stateful locks, frontier drain, checkpoint sweep - Acquire stateful lock before re-dispatching from_scratch tasks in salvage (prevents RuntimeError on lock release) - Replace single-pass salvage drain with _drain_frontier() that re-checks the frontier after each task completes (ensures downstream tasks enqueued during retry are dispatched) - Add post-salvage checkpoint sweep via _checkpoint_completed_row_groups (row groups completed during salvage are now written to parquet) - Extract _checkpoint_completed_row_groups helper, reused by both the main loop and post-salvage sweep * fix: checkpoint per salvage round, simplify deferred list, fire on_complete for empty groups - Move _checkpoint_completed_row_groups inside the salvage loop so completed row groups are flushed promptly between rounds - Simplify _deferred from list[tuple[Task, int]] to list[Task] since the attempt count was unused (salvage_max_rounds bounds retries) - Fire on_complete(None) when all rows in a row group are dropped so callers always receive a completion signal * fix: address PR review - _is_retryable, semaphore safety, dispatched pruning Replace string-matching _is_retryable with isinstance checks against typed ModelXxxError exceptions. Add try/finally around checkpoint to protect _rg_semaphore.release() from deadlocks. Prune completed tasks from _dispatched to prevent unbounded growth. Re-register batch_alias in salvage dispatch. Replace assert with ValueError in _run_cell. Parameterize on_complete Callable signature. Add test for on_complete(None) when all rows dropped. * fix: guard against dispatching tasks for checkpointed row groups When a from_scratch task fails non-retryably and all rows are dropped, downstream full_column tasks become vacuously ready and enter the frontier. If dispatched in the same loop iteration that checkpoints the row group, the task's coroutine runs against a freed buffer (KeyError). The guard in _execute_task_inner skips such tasks early, and a `skipped` flag keeps them in _dispatched to prevent re-dispatch. Also addresses remaining review feedback: - agenerate({}) fallback now passes empty DataFrame - dict comprehensions simplified to dict(row_groups) - get_row return type annotated as dict[str, Any] - configs parameter fully typed in test helper - mock generators use self.config.name * fix: raise on batch size mismatch, isolate checkpoint failures - _run_batch: raise ValueError on row count mismatch instead of warning, matching the update_batch path and preventing silent data corruption. - _checkpoint_completed_row_groups: catch per-RG exceptions so one failed checkpoint doesn't block subsequent row groups or leak semaphore slots.	2026-03-16 21:31:17 -03:00
Eric W. Tramel	28c8345909	feat: add built-in filesystem seed readers (#421 )	2026-03-16 17:40:27 -04:00
Eric W. Tramel	5783435979	feat: Improve generation failure reporting for schema and timeout failures (#416 )	2026-03-13 18:38:01 -04:00
Johnny Greco	26a9cf23ac	feat: normalize validator and constraint discriminators (#414 ) * feat: normalize validator and constraint discriminators * docs: add docstring and comment to Constraint base class Address Greptile review feedback: - Add docstring to Constraint noting it should not be instantiated directly - Add comment explaining the rhs fallback behavior in the resolver * refactor: restore ABC on Constraint base class * refactor: add explicit None guard in constraint resolver * Fix legacy numeric sampler constraint detection * fix: address PR review feedback from nabinchha - Guard _can_coerce_to_float against inf/nan strings - Add -> None return type annotations to test functions - Add clarifying comments to ColumnConstraintT vs ColumnConstraintInputT - Add tests for tagged constraint round-trip and missing rhs validation	2026-03-13 17:34:23 -04:00
Nabin Mulepati	d6b8433e25	fix: preserve extra_body for LiteLLM to avoid UnsupportedParamsError (#409 ) (#412 ) * fix: preserve extra_body for LiteLLM to avoid UnsupportedParamsError (#409) TransportKwargs.from_request() flattened extra_body keys into top-level kwargs, causing LiteLLM to reject provider-specific params like reasoning_effort via its per-provider allowlist validation. Add a flatten_extra_body flag (default True for backward compat) so the LiteLLM bridge can opt out and preserve extra_body as a distinct kwarg that LiteLLM forwards without validation. Made-with: Cursor * fix: address PR #412 review comments Update stale docstrings in TransportKwargs and facade.py to reflect the new flatten_extra_body flag, and add an edge-case test for empty extra_body with flatten disabled.	2026-03-13 13:35:32 -06:00

1 2

86 commits