mirror of
https://github.com/NVIDIA-NeMo/DataDesigner
synced 2026-05-24 09:48:29 +00:00
86 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
2a487cdc5c
|
feat: add dropped column preservation toggle (#691)
* feat: add dropped column preservation toggle Closes #690 Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * fix: reject dropped column policy resume mismatch Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> --------- Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> |
||
|
|
c0a4dcbb85
|
feat: implement async scheduling admission control (#661)
Some checks are pending
CI / End to end test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test Config (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Config (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Config (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Coverage Check (Python 3.11) (push) Blocked by required conditions
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
|
||
|
|
a83968f701
|
feat: preserve multimodal MCP tool results (#689)
Some checks are pending
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Coverage Check (Python 3.11) (push) Blocked by required conditions
CI / End to end test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Lint and Format Check (push) Blocked by required conditions
CI / Check License Headers (push) Blocked by required conditions
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
* feat: preserve multimodal MCP tool results Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * fix: gate MCP generic image payload detection Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * fix: validate MCP image payload blocks Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> --------- Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> |
||
|
|
0860d62e23
|
fix: restore chat completion multi-choice support (#672)
* fix chat completion multi-choice support Restore the chat completion n request field and preserve all returned choices in the canonical response while keeping response.message as the first choice. Add coverage for request forwarding, compatibility access, multi-choice parsing, and generate forwarding. Fixes #620 Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * strip n from generate requests Prevent generate and agenerate from forwarding multi-choice requests that they cannot expose, while keeping completion() multi-choice support intact. Add coverage for async parsing and Anthropic n exclusion. Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * strip configured n from generate requests Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * rename multiple choice completion flag Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * move choice sanitizer to private helpers Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * order private facade helpers Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> --------- Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> |
||
|
|
bd0410bb05
|
fix(engine): actionable error when a Jinja field is missing/None/empty (#633)
Some checks are pending
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Coverage Check (Python 3.11) (push) Blocked by required conditions
CI / End to end test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Lint and Format Check (push) Blocked by required conditions
CI / Check License Headers (push) Blocked by required conditions
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
* fix(engine): actionable error when a Jinja field is missing/None/empty Empty-render and missing-attribute failures used to surface as the generic "User provided prompt generation template is invalid." either because `sanitize_user_exceptions` stripped the detail or because Jinja's raw `UndefinedError` leaked through. Both now raise a new `EmptyTemplateRenderError` carrying a row-level diagnostic that names the offending chain and includes copy-pasteable Jinja conditional and SkipConfig fix patterns. Closes #629. * fix(engine): address PR review feedback on EmptyTemplateRenderError Addresses the open review comments on #633: 1. (Greptile P1) Gate expression in the suggested remediation template was one accessor too deep when the root variable was entirely absent from the record, causing the suggested fix to itself raise UndefinedError. Fall back to gating on the root name alone when sample_name is not in record. 2. (andreatgretel) The AST walker reported loop-local names as missing culprits (e.g. ``person`` in ``{% for person in people %}...{% endfor %}``). Filter extracted chains through ``meta.find_undeclared_variables`` to defer to Jinja's canonical scope tracking. 3. (andreatgretel follow-up) Empty collections used as loop iterables (``items=[]``) fell through to the no-culprit fallback. Add a new ``_CULPRIT_EMPTY_COLLECTION`` classification so they're surfaced. 4. Minor: add ``from exception`` to ``safe_render``'s UndefinedError re-raise for traceback consistency with the native engine path, and add a note on the load-bearing exception ordering in ``sanitize_user_exceptions``. |
||
|
|
6055290136
|
feat: add workflow chaining (#636)
Some checks are pending
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Coverage Check (Python 3.11) (push) Waiting to run
CI / End to end test (Python 3.10 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.11 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.12 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.13 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Lint and Format Check (push) Waiting to run
CI / Check License Headers (push) Waiting to run
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
Publish Fern devnotes / deploy (push) Waiting to run
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
* feat: add workflow chaining * test: tidy workflow chaining coverage * fix: harden workflow chaining concurrency * docs: update workflow chaining plan * feat: add workflow stage postprocessors * feat: expose workflow stage outputs * fix: align workflow selected output export * fix: address workflow chaining review issues * fix: align workflow parquet export selection Signed-off-by: Andre Manoel <amanoel@nvidia.com> * fix: preserve generated columns in drop validation Signed-off-by: Andre Manoel <amanoel@nvidia.com> * fix: clarify workflow output processors Signed-off-by: Andre Manoel <amanoel@nvidia.com> * docs: add workflow chaining page * docs: align workflow chaining warning * fix: address workflow review nits --------- Signed-off-by: Andre Manoel <amanoel@nvidia.com> |
||
|
|
71997624b3
|
feat: track reasoning token usage (#670)
Some checks are pending
CI / Test Engine (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Coverage Check (Python 3.11) (push) Waiting to run
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Config (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Config (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Config (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Config (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Config (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Config (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Config (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run
* feat: track reasoning token usage Capture provider-reported reasoning-token breakdowns alongside output tokens without changing output token totals. Carry the field through model usage aggregation and add coverage for parsing, facade tracking, and deltas. Refs #665 * fix: show reasoning tokens in usage summary Include reasoning token counts in the local model usage summary while preserving output and total token semantics. Telemetry remains unchanged. Refs #665 * fix: estimate missing reasoning token counts When providers return reasoning content without a numeric usage breakdown, estimate reasoning tokens from that content while preserving provider-reported output and total token counts. Refs #665 * fix: track reasoning token count source * fix: simplify reasoning token source * fix: omit unknown reasoning tokens from logs * refactor: clarify reasoning token count helpers * test: move token counting tests * fix: enforce reasoning token source * fix: address reasoning usage review |
||
|
|
a4085c441a
|
feat: add AIMD startup ramp (#638)
Some checks failed
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.10 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.11 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.12 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.13 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.10 on macos-latest) (push) Has been cancelled
CI / Coverage Check (Python 3.11) (push) Has been cancelled
CI / End to end test (Python 3.11 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.12 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.13 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Lint and Format Check (push) Has been cancelled
CI / Check License Headers (push) Has been cancelled
Publish devnotes / deploy (push) Has been cancelled
Publish Fern devnotes / deploy (push) Has been cancelled
CI / Test (Python 3.10 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.12 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.13 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.11 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.11 on ubuntu-latest) (push) Has been cancelled
|
||
|
|
0fdea845ac
|
feat: add fair async task scheduling (#639) | ||
|
|
bbcd7d3995
|
fix: harden resume checkpoint handling (#624)
Some checks are pending
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Coverage Check (Python 3.11) (push) Waiting to run
CI / End to end test (Python 3.11 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.10 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.12 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.13 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
* fix: harden resume checkpoint handling Persist config identity in metadata, make checkpoints atomic, and reject unsafe resume states so interrupted runs do not mix incompatible or post-processed data. * fix: close resume edge cases Let IF_POSSIBLE start fresh for resize configs and mark after-generation processing before mutation so interrupted processors cannot be resumed unsafely. * refactor: drop dataset directory lock Single-user CLI/notebook flows don't race on the artifact directory, and the timestamped-directory fallback already handles the "ran it twice" case. The lock added complexity (re-entrancy, stale cleanup, the cached-property trap where IF_POSSIBLE→NEVER moves writes to a timestamped directory while the lock stays pinned to the original) for no real protection. Atomic metadata writes still cover the actual hazard (crash mid-write). Also fix a pre-existing test bug in test_initial_actual_num_records_uses_actual_parquet_rows_for_partial_row_group where the mocked scheduler hit the partial-completion path with unconfigured Mock attributes. * fix: address Greptile review on resume edge cases * Drop the unreachable ResumeMode.IF_POSSIBLE branch in _post_generation_processed_resume_result. By the time this helper runs, build() has normalised IF_POSSIBLE to ALWAYS or NEVER, so the guard now matches reality. Tighten the docstring to document the three outcomes (no-op return / fall through / raise). * Split the post-processed extension/raise into two cases. When num_records < prior_target the user just asked for fewer records than already exist; the previous "would mix pre- and post-processor records" message only describes the extension case. Mirror the wording used by _load_resume_state and add a regression test. * Remove the dead _find_completed_row_group_ids wrapper now that _build_async uses _find_completed_row_groups directly. Rename the related test to match. * refactor: unify sync + async resume around filesystem-derived progress Both engines now derive `num_completed_batches` and `actual_num_records` from `parquet-files/batch_*.parquet` via `_recover_progress_from_disk`. `metadata.json` keeps describing the run *configuration* (`buffer_size`, `target_num_records`, `original_target_num_records`, config fingerprint), while the filesystem is the source of truth for *progress*. This closes the sync engine's race window between `move_partial_result_to_final_file_path` and the metadata write that follows it, matching the crash-recovery the async engine already had. The sync engine additionally rejects non-contiguous batch IDs (a hole can only mean external mutation or a directory written by an incompatible engine); the async engine continues to tolerate gaps from out-of-order completion via `allow_holes=True`. Existing sync resume tests now seed parquet files alongside metadata, and two new tests cover the unified behaviour: filesystem progress wins when metadata lags, and sync rejects non-contiguous IDs. * docs: clarify DatasetCreationResults observability scope on resume `load_dataset`, `count_records`, `load_analysis`, `export`, and `push_to_hub` all read from the artifact directory, so they reflect the cumulative dataset (original + resume rows). `task_traces`, model-usage logs, and telemetry events are scoped to the current invocation only because the original run's in-memory state is not persisted. Document this in the class docstring, the architecture note, and the Fern resume guide. * docs: explain DeprecationWarning re-raise in create()/preview() Future readers were puzzled by the ``except DeprecationWarning: raise`` short-circuits before the generic generation-error wrappers. Add a comment in ``create()`` (with a back-reference from ``preview()``) to record that strict warning filters (``pytest.warns``, ``-W error::DeprecationWarning``) turn the engine's ``warnings.warn(..., DeprecationWarning)`` calls — most notably the ``allow_resize=True`` deprecation in ``_resolve_async_compatibility`` — into raised exceptions, and we want them to surface untouched instead of being swallowed by ``DataDesignerGenerationError``. * fix: close after-generation crash window and tighten metadata typing on resume Address review feedback on resume hardening: * Run after-generation processors unconditionally on the on-disk dataset rather than gating on the generation return value. The previous gate silently skipped after-generation when resume saw every row group already on disk, leaving a crash window between the final parquet write and the ``post_generation_state="started"`` marker write: in that window the dataset is complete but after-generation never ran, and the on-disk parquet files are still clean. The "started" short-circuit still rejects the other direction (crashed mid-rewrite, ambiguous state), so resume only re-runs after-generation when it is safe to do so. * Raise ``DatasetGenerationError`` (instead of letting a raw ``TypeError`` leak out of ``num_records < prior_target``) when a post-processed dataset's metadata is missing ``target_num_records``. Mirrors the wording used by ``_load_resume_state``. * Document the new behaviour in ``architecture/dataset-builders.md`` and the Fern resume invariants. Tests: * ``test_build_resume_complete_dataset_runs_after_generation_when_no_marker`` covers the closed crash window via the public ``set_processor_runner`` API. * ``test_build_resume_post_generation_processed_missing_target_raises_clearly`` covers the typed-error gap. |
||
|
|
4b93f5b245
|
feat: let column configs declare all model aliases for the startup health check (#626)
* feat(engine): let column configs declare all model aliases for the startup health check Plugin column configs that depend on more than one model alias (generator + judge, critic, etc.) previously could not opt their secondary aliases into the standard startup health check, and configs without a `model_alias` field crashed the collection loop with AttributeError. Add `SingleColumnConfig.get_model_aliases()` as the single override hook the builder uses to enumerate aliases. The default returns the column's primary `model_alias` (if any), so built-in LLM, embedding, and image columns work unchanged. `CustomColumnConfig` overrides it to surface decorator-declared aliases, replacing the special-case `isinstance` branch in the builder. Plugin configs with multiple model fields override it to opt every endpoint into the health check. Fixes #606 Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * fix(config): forward empty model_alias to startup health check SingleColumnConfig.get_model_aliases() used `if alias` to filter, which also dropped empty-string aliases. Empty model_alias values are accepted by the config model and previously reached run_health_check, where they failed fast with "No model config with alias '' found!". Treating them as "no model endpoints" silently delayed that error to first generation. Use `alias is not None` so only a truly missing attribute skips the health check, and add a regression test that exercises an empty-string model_alias on a built-in config. Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> --------- Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> |
||
|
|
810c681f7a
|
feat: resume interrupted dataset generation runs (sync + async engine) (#526)
Some checks failed
CI / Test Interface (Python 3.12 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.13 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Coverage Check (Python 3.11) (push) Has been cancelled
CI / End to end test (Python 3.10 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.11 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.12 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.13 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Lint and Format Check (push) Has been cancelled
CI / Check License Headers (push) Has been cancelled
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.10 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.11 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.10 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.11 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.12 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.13 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.13 on ubuntu-latest) (push) Has been cancelled
* docs: add implementation plan for resume mechanism
Fixes #525
* feat(storage): add resume flag and clear_partial_results()
- ArtifactStorage gains a `resume: bool = False` field
- resolved_dataset_name skips timestamp logic when resume=True,
returning the existing dataset folder name as-is
- Raises ArtifactStorageError on resume=True when the target folder
is absent or empty (no data to resume from)
- New clear_partial_results() removes in-flight partial results
left over from an interrupted run
Fixes #525
* feat(batch-manager): add start_batch param to start()
DatasetBatchManager.start() now accepts:
- start_batch: int = 0 — first batch index to process
- initial_actual_num_records: int = 0 — records already on disk
Both default to 0 so all existing call sites are unaffected.
Fixes #525
* feat(builder): implement resume logic in DatasetBuilder
- build() gains a resume: bool = False parameter
- _load_resume_state() reads metadata.json and validates that
num_records and buffer_size match the original run
- _build_with_resume() skips completed batches, clears in-flight
partial results, and continues from the first incomplete batch
- Raises DatasetGenerationError with clear messages for:
- missing metadata.json (interrupted before first batch completes)
- num_records mismatch
- buffer_size mismatch
- DATA_DESIGNER_ASYNC_ENGINE=1 (not yet supported)
- Logs a warning and returns early when dataset is already complete
Fixes #525
* feat(interface): expose resume on DataDesigner.create()
- create() gains resume: bool = False
- _create_resource_provider() passes resume to ArtifactStorage
- builder.build() receives the resume flag
Fixes #525
* test: add tests for resume mechanism
Covers:
- ArtifactStorage.resolved_dataset_name with resume=True
- ArtifactStorage.clear_partial_results()
- DatasetBatchManager.start() with start_batch and
initial_actual_num_records
- DatasetBuilder.build(resume=True): missing metadata, num_records
mismatch, buffer_size mismatch, already-complete detection
Fixes #525
* feat(builder): extend resume to async engine (DATA_DESIGNER_ASYNC_ENGINE=1)
- Add _find_completed_row_group_ids() to scan parquet-files/ for already-written
row groups by parsing batch_*.parquet filenames
- _build_async() now accepts resume=True: loads metadata, finds completed row groups,
clears partial results, and logs progress; returns early if all row groups are done
- _prepare_async_run() accepts skip_row_groups, initial_actual_num_records, and
initial_total_num_batches so the scheduler only processes remaining row groups
and RowGroupBufferManager starts from the correct counts
- RowGroupBufferManager.__init__ gains initial_actual_num_records and
initial_total_num_batches params to seed the counters on resume
- finalize_row_group closure now writes incremental metadata after each checkpoint
so any run (resume or not) can be resumed if interrupted mid-way
- Remove the guard that rejected resume=True with DATA_DESIGNER_ASYNC_ENGINE=1
- Add tests for all new paths
* fix(builder): skip after-generation processors when resume finds dataset already complete
_build_with_resume and _build_async now return False when the dataset is already
complete (early-return path), True otherwise. build() skips
_processor_runner.run_after_generation() on False, preventing processors from
calling shutil.rmtree and rewriting an already-finalized dataset.
Fixes the issue raised in review: greptile P1 comment on PR #526.
* fix(builder): use filesystem count for initial_total_num_batches on async resume
Metadata can lag by one row group if a crash occurs between
move_partial_result_to_final_file_path and write_metadata. Using
len(completed_ids) from the filesystem scan instead of
state.num_completed_batches ensures the final metadata reflects the
actual number of parquet files present, not the potentially stale
metadata count.
* feat(results): add export() method and --output-format CLI flag
Adds DatasetCreationResults.export(path, format=) supporting jsonl,
csv, and parquet. The CLI create command gains --output-format / -f
which writes dataset.<format> alongside the parquet batch files.
* fix(builder): handle resume when metadata.json missing (interrupted before first batch)
When a run is interrupted before any row group or batch completes, metadata.json
is never written. Previously resume=True would raise DatasetGenerationError in
this case. Now build() detects the missing file, logs an info message, clears
any leftover partial results and falls back to a clean fresh run.
This is the common scenario for small datasets (fewer records than buffer_size)
where all records fit in a single row group.
* docs(interface): fix resume docstring — async engine is supported
* fix(builder): derive initial_actual_num_records from filesystem in async resume
In the crash window (row group written to disk but write_metadata crashed before
updating the file), both initial_total_num_batches and initial_actual_num_records
now use the filesystem-discovered completed_ids as source of truth. Previously
initial_actual_num_records was read from potentially stale metadata, causing
actual_num_records in the final metadata to be undercounted by one row group.
Also adds a test covering the partial-resume crash-window scenario.
* feat(resume): replace resume: bool with ResumeMode enum (NEVER/ALWAYS/IF_POSSIBLE)
- Introduces ResumeMode(StrEnum) in artifact_storage.py for use across all layers
- Replaces resume: bool with resume: ResumeMode in DatasetBuilder.build(),
DataDesigner.create(), ArtifactStorage, and _build_async()
- Adds _check_resume_config_compatibility() using config fingerprints to support
IF_POSSIBLE: falls back to a fresh run when config has changed since last run
- Relaxes num_records validation from strict equality to num_records >= actual_num_records,
allowing dataset extension on resume; buffer_size must still match exactly
- Preserves exception chain with 'from exc' on FileNotFoundError in _load_resume_state
- Exports ResumeMode from data_designer.interface for users to import
- Adds skip_row_groups assertion test and IF_POSSIBLE storage behavior tests
* fix(resume): invalidate resolved_dataset_name cache when IF_POSSIBLE downgrades to NEVER
ArtifactStorage's Pydantic model validator accesses base_dataset_path at
construction time, caching resolved_dataset_name under IF_POSSIBLE semantics
before build() can set resume=NEVER. Pop the stale cache entry so the property
re-resolves with the correct NEVER semantics (timestamped directory).
Also fixes _check_resume_config_compatibility() to use artifact_path/dataset_name
directly instead of base_dataset_path, and adds a regression test covering the
cache-bypass scenario.
* fix(builder): move partial-completion warning before return in _build_async
* fix(builder): IF_POSSIBLE now starts fresh when no dataset directory exists
_check_resume_config_compatibility returned True when config_path was absent,
even when the dataset directory itself didn't exist. This caused IF_POSSIBLE to
upgrade to ALWAYS, which then raised ArtifactStorageError on the first-ever run
because ALWAYS requires an existing directory.
Fix: return False early when the dataset directory is absent. Also sets
actual_num_records on mock buffer managers in two async resume tests that
started failing after the partial-completion warning block was made reachable.
* fix(builder): use original target_num_records in async resume record count
When extending a non-aligned run (e.g. original num_records=5, buffer_size=2),
the last completed row group has 1 record, not buffer_size=2. Using new num_records
in the formula would overcount: min(2, 7-2*2)=2 instead of min(2, 5-2*2)=1.
Fix: capture state from _load_resume_state (previously discarded) and pass
state.target_num_records into the sum formula. Added target_num_records field to
_ResumeState, populated from metadata.json.
Test: test_build_async_resume_initial_actual_num_records_uses_original_target
* fix(builder): IF_POSSIBLE starts fresh on empty dataset directory
Empty directory (crash between mkdir and first file write) was treated as
compatible — _check_resume_config_compatibility returned True, IF_POSSIBLE
upgraded to ALWAYS, which then raised ArtifactStorageError.
Fix: treat empty directory the same as missing — return False from
_check_resume_config_compatibility when any(dir.iterdir()) is False.
Test: test_if_possible_starts_fresh_when_directory_is_empty
* fix(builder): ALWAYS raises DatasetGenerationError on config fingerprint mismatch
ResumeMode.ALWAYS was documented to raise when column/model config changed, but
_check_resume_config_compatibility() was only called in the IF_POSSIBLE branch.
A user resuming with ALWAYS after changing the config would silently mix records
from two different configs.
Fix:
- Refactor _check_resume_config_compatibility() to return _ConfigCompatibility
enum (COMPATIBLE / INCOMPATIBLE / NO_PRIOR_DATASET) instead of bool so callers
can distinguish 'no prior run' from 'configs differ'
- Call the check for both ALWAYS and IF_POSSIBLE before _write_builder_config()
- ALWAYS + INCOMPATIBLE → DatasetGenerationError
- IF_POSSIBLE + INCOMPATIBLE → silent fresh start (existing behaviour)
- IF_POSSIBLE + NO_PRIOR_DATASET → silent fresh start (existing behaviour)
Test: test_build_resume_always_raises_on_config_mismatch
* fix(resume): address nabinchha review — drop export collision, add CLI flag, fix edge cases
C1: drop commit
|
||
|
|
8d4d59303d
|
fix: normalize rollout timestamps before deriving started_at/ended_at (#556) | ||
|
|
9214637a5b
|
fix(engine): validate processor plugin impls (#609)
* fix(engine): validate processor plugin impls Add the processor implementation base to assert_valid_plugin so processor plugins are checked against Processor instead of only the generic config contract. Keep plugin type validation table-driven and raise explicit AssertionError messages so checks are not skipped under optimized Python. Signed-off-by: Johnny Greco <jogreco@nvidia.com> * test(engine): require plugin base map coverage --------- Signed-off-by: Johnny Greco <jogreco@nvidia.com> |
||
|
|
f73da1975c
|
feat(models): deprecate implicit default provider routing (#594)
Some checks failed
CI / Test (Python 3.10 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.11 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.12 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.13 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Test Engine (Python 3.13 on macos-latest) (push) Has been cancelled
CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.10 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.11 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.12 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.13 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Coverage Check (Python 3.11) (push) Has been cancelled
CI / End to end test (Python 3.10 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.11 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.12 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.13 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Has been cancelled
* feat(models): deprecate implicit default provider routing
Emit DeprecationWarning whenever the legacy "implicit default
provider" path is exercised: `ModelConfig.provider=None`, the
registry-level `ModelProviderRegistry.default`, the YAML
`default:` key in `~/.data-designer/model_providers.yaml`, and
the CLI's "Change default provider" workflow.
`resolve_model_provider_registry` skips passing `default=` in the
single-provider case so the common construction path stays quiet.
Multi-provider registries still pass `default` (per
`check_implicit_default`) and warn accordingly.
Update docs, the package README, and test fixtures to specify
`provider=` explicitly on every `ModelConfig`. New tests cover
each warning entry point and pin the post-deprecation happy paths.
Refs #589
Made-with: Cursor
* fix(models): address PR #594 review feedback
Greptile P1: ProviderRepository.load emitted its DeprecationWarning
inside a `try/except Exception` block. Under
`filterwarnings("error", DeprecationWarning)` the warn would raise,
the except would swallow it, and `load()` would silently return None
(losing the registry). Move the warn outside the catch-all so the
strict-warning path no longer drops valid configs.
Greptile P2 / johnnygreco: `_warn_on_implicit_provider` and
`_warn_on_explicit_default` use `stacklevel=2`, which lands inside
pydantic v2's validator dispatch rather than at the user's
`ModelConfig(...)` / `ModelProviderRegistry(...)` call. That broke
both attribution (the source line was unhelpful) and Python's
once-per-location dedup (every call collapsed to the same
pydantic-internal key, suppressing all but the first warning).
Introduce `data_designer.config.utils.warning_helpers.warn_at_caller`,
which walks past the helper, validator, and any pydantic frames to
find the user's call site and emits via `warnings.warn_explicit` with
the user frame's `__warningregistry__`. Keeps attribution accurate
and dedup keyed on the user's (filename, lineno).
johnnygreco: align the `provider_repository.py` warning copy with the
sibling site in `default_model_settings.py` ("specify provider=
explicitly on each ModelConfig instead") so both YAML-default warning
sites give the same migration instruction. The previous wording
pointed users at "ModelConfig entries" inside `model_providers.yaml`,
where ModelConfig entries don't actually live.
johnnygreco: dedup the cascade in `DataDesigner.__init__`. With
`model_providers=None` and a YAML `default:`, the user previously saw
two DeprecationWarnings for the same root cause —
`get_default_provider_name()` warns about the YAML key, then
`resolve_model_provider_registry(...)` re-warns from
`_warn_on_explicit_default`. Suppress the registry-level duplicate in
the YAML-fallback branch via `warnings.catch_warnings()` so users see
exactly one warning per user action.
johnnygreco: tighten `_warn_on_explicit_default` to fire only when
`default is not None`. Passing `default=None` explicitly is
semantically equivalent to omitting it (caller is opting *out* of a
registry-level default), and shouldn't trigger the deprecation
nudge.
johnnygreco: add a `model_validate({...})` regression test for
`ModelConfig` so the deserialization path (legacy on-disk configs)
is pinned alongside the construction path.
Tests:
- Update `test_load_exists` and `test_save` to omit `default=` so the
roundtrip stops exercising the deprecated YAML-default path
unguarded (Greptile note).
- Wrap `test_resolve_model_provider_registry_with_explicit_default`,
`test_get_provider`, and
`test_init_user_supplied_providers_preserve_first_wins_over_yaml_default`
in `pytest.warns` so the suite stays green under
`-W error::DeprecationWarning` (Greptile note).
- Add `test_explicit_default_none_does_not_emit_deprecation_warning`
to pin the tightened predicate.
- Add `test_init_yaml_default_emits_single_deprecation_warning` to
pin the cascade-dedup behavior.
Refs #589
Made-with: Cursor
* fix(models): make deprecation warnings visible under default filters
andreatgretel (PR #594): the YAML-default warning in
`get_default_provider_name` and the registry-default warning emitted
from inside DataDesigner helpers were attributing to data_designer
library frames, not user code. Python's default filter chain includes
`ignore::DeprecationWarning`, so library-attributed entries are
silenced — meaning a normal `DataDesigner()` call with a YAML
`default:` set showed nothing, and `resolve_model_provider_registry`
warnings were similarly invisible. Two related changes:
1. `warn_at_caller`: extend the default skip-list from `("pydantic",)`
to `("pydantic", "pydantic_core", "data_designer")` so the walk
escapes both pydantic's validator-dispatch frames and data_designer
helper frames before attributing. Also tighten the prefix predicate
to exact-or-dotted-prefix matching (`name == p or
name.startswith(p + ".")`) so e.g. `pydantic_helpers` is not
falsely matched as part of `pydantic` (johnnygreco nit). Allow
callers to pass a custom `skip_prefixes` for flexibility. Drop the
"skip frame 0+1 unconditionally" guard now that prefix matching
covers it.
2. `get_default_provider_name`: switch from
`warnings.warn(stacklevel=2)` to `warn_at_caller`. The previous
stacklevel pointed into `default_model_settings.py`, which is a
library file → silenced under default filters. Verified the fix
empirically with `python -W default`: warning is now attributed to
the user's call site and rendered.
johnnygreco (PR #594): add the missing
`test_explicit_default_none_does_not_emit_deprecation_warning`
regression for the `self.default is not None` predicate landed in
the prior round.
Tests:
- New `test_warning_helpers.py` pins prefix-matching precision
(rejects `pydantic_helpers` / `data_designer_other`), default
skip-list contents, attribution past skip-prefix frames, and
per-call-site dedup behavior.
- `test_get_default_provider_name_warning_attributes_to_user_frame`
pins andreatgretel's repro for the YAML-default site.
- `test_explicit_default_warning_attributes_to_user_frame` pins the
multi-frame case: construction goes through
`resolve_model_provider_registry`, so the walk has to escape both
pydantic and data_designer before landing on the test file.
- `test_explicit_default_none_does_not_emit_deprecation_warning`
pins johnnygreco's predicate-tightening regression.
3,124 tests pass (540 config + 1,923 engine + 653 interface; +10 net
from this round).
Refs #589
Made-with: Cursor
* fix(models): apply warn_at_caller to remaining deprecation sites
greptile-apps (PR #594, r3189904028): `ProviderRepository.load`'s
YAML-default `DeprecationWarning` was using `warnings.warn(stacklevel=2)`,
which attributes to whichever data_designer frame called `load()` —
controllers, services, list/reset commands, agent introspection. Every
real call path lands on `data_designer.cli.*`, which falls under
Python's default `ignore::DeprecationWarning` filter and is silenced.
Audit found two more sites with the same problem:
- `DatasetBuilder._resolve_async_compatibility` (`allow_resize` /
issue #552) — was using `stacklevel=4` to walk past
`_resolve_async_compatibility -> build/build_preview -> interface ->
user`. Brittle: any added frame (decorator, async wrapping, the
`try/except DeprecationWarning: raise` boundary) shifts attribution
silently. The existing test passed only because it used
`simplefilter("always") + record=True`, which records warnings
regardless of attribution.
- `ProviderController._handle_change_default` — was using
`stacklevel=2`, which lands on the menu dispatcher in the same
controller module. `print_warning` already shows the message
visually, but programmatic observers (`pytest.warns`,
`filterwarnings("error", ...)`) saw a library-attributed entry that
default filters silenced.
All three migrated to `warn_at_caller` (the helper from
|
||
|
|
61cdeefb17
|
feat: make async engine the default execution path (#592)
Some checks failed
CI / Test Config (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Coverage Check (Python 3.11) (push) Waiting to run
CI / End to end test (Python 3.10 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.11 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
Publish devnotes / deploy (push) Has been cancelled
* feat: make async engine the default execution path
The async engine has been hardening as opt-in for several releases. Make it
the default and address the prerequisites flagged for the flip.
Default flip
- DATA_DESIGNER_ASYNC_ENGINE defaults to "1" at both consumption sites
- Set DATA_DESIGNER_ASYNC_ENGINE=0 for one transitional release to opt out
- allow_resize=True still falls back to sync with a DeprecationWarning
Python 3.10 support
- Replace asyncio.TaskGroup (3.11+) in async_concurrency.py with
gather-with-explicit-cancel; semantics preserved because _run_task already
swallows its own exceptions and uses _shutdown_event for sibling cancellation
- Remove the sys.version_info < (3, 11) runtime guard
- Remove the matching pytest skipif so the executor tests run on 3.10 too
Derived timeouts (replaces two hardcoded 300s constants)
- ThrottleManager.acquire_sync/async default to timeout=None (no deadline)
instead of DEFAULT_ACQUIRE_TIMEOUT=300; HTTP request timeout already bounds
actual work, queue waits scale with provider speed and AIMD
- _AsyncBridgedModelFacade derives the sync->async bridge timeout from the
model's inference_parameters.timeout and the call's max_correction_steps;
one knob (per-model timeout) drives both deadlines, no new config surface
- Add ModelFacade.request_timeout property so the bridge can read the
effective timeout the client is configured with
Root-cause surfacing
- AsyncTaskScheduler captures the first non-retryable error and exposes it
via first_non_retryable_error
- Interface threads it through DataDesignerGenerationError when 0 records
are produced without early-shutdown, so deterministic failures (e.g. bad
seed sources) surface their original message instead of a wrapped
FileNotFoundError on the parquet path
Tests
- New: throttle no-deadline default behavior (sync+async), parametrized
derived bridge timeout, restored async_concurrency tests on 3.10
- Updated: test_dataset_builder.py uses an autouse fixture to pin its
Mock-based tests to the sync engine they cover; existing bridge tests
set facade.request_timeout for the new derivation
Docs
- Replace the stale LiteLLM security notice in README with a short
async-default heads-up and link to the migration guide
- Add docs/migration-async-default.md covering per-model timeouts,
custom-column thread safety, mocking model calls, run outcomes, and
the opt-out
- Append a short Update section to the async-all-the-way-down dev note
* test: extract _compute_bridge_timeout helper for direct testing
The parametrized bridge-timeout test was patching ``concurrent.futures.Future.result``
to capture the timeout the bridge passed in. That reaches into stdlib internals
(DEVELOPMENT.md "Mock at boundaries: Keep mocking shallow") and the ``ids=`` argument
on the parametrize was missing.
Extracts the formula into a module-level ``_compute_bridge_timeout`` helper. The test
now calls the helper directly with no mocking, and the parametrize gets readable ids.
Behavior is unchanged.
* test(e2e): align demo plugins with async engine contracts
The e2e demo plugins exercise plugin discovery and full DD lifecycle. Two
of them were written against sync-engine semantics that the async engine
restricts:
- DemoColumnGeneratorImpl was a ColumnGeneratorFullColumn with no
required_columns. The async engine routes ``no-upstream`` columns
through the from-scratch path, which passes an empty DataFrame to
generators that aren't FromScratchColumnGenerator subclasses. The
generator then produces 0 rows and the scheduler raises
``update_batch received 0 values``. Switching the plugin to
FromScratchColumnGenerator with generate_from_scratch(num_records)
matches what the plugin actually does (produces a constant column
without input) and works on both engines.
- RegexFilterProcessor implemented process_before_batch with row-count
changes. The async engine enforces row-count invariance in pre- and
post-batch processor stages by design. Moving the filter to
process_after_generation preserves the plugin's purpose (regex-based
row filtering) at a stage that supports row-count changes on both
engines. Test assertions check the final dataset, so the stage shift
is transparent.
Both changes are demo-plugin updates only; no production code change.
* fix: address Codex review findings on async-default flip
Three bugs and two test-quality concerns surfaced by an independent review of
the prior commits. Each was real and worth fixing in the flip PR.
Bug fixes
- Sync-fallback path was creating async-only model clients. The default flip
meant ``client_concurrency_mode = ASYNC`` for every default run, but the
``allow_resize=True`` path falls back to the sync engine — sync ``model.generate()``
calls then hit ``SyncClientUnavailableError``. The resolution decision now
lives at the DataDesigner interface level via
``_resolve_client_concurrency_mode``: it considers both the env var and the
config (allow_resize forces sync clients) and is passed explicitly to
``create_resource_provider``. Direct callers of the factory still get the
env-var default.
- Sync→async bridge timeout ignored the per-call ``timeout=`` override. A
custom column calling ``model.generate(timeout=600)`` against a slow endpoint
was being cancelled at the model-config default, not 600s. The bridge now
prefers ``kwargs.get("timeout")`` over ``facade.request_timeout``.
- Bridge timeout formula missed ``max_conversation_restarts``. One logical
generation can do ``(1 + max_conversation_restarts) × (1 + max_correction_steps)``
HTTP requests; the formula now multiplies both, matching the worst-case
attempt budget.
Engine routing fix (also surfaced by failing e2e plugin tests)
- ``_run_from_scratch`` else-branch passed an empty DataFrame to non-FromScratch
generators classified as seeds (no upstream columns), so ``ColumnGeneratorFullColumn``
with no required_columns produced 0 rows for an ``rg_size``-row buffer. Now
passes an ``rg_size``-row snapshot of the row-group buffer, mirroring the
sync engine's FULL_COLUMN contract.
- The earlier ``DemoColumnGeneratorImpl`` workaround (rewrite as ``FromScratchColumnGenerator``)
is reverted; the engine fix subsumes it. The processor-plugin fix
(``process_after_generation`` for the regex filter) stays — pre-batch
row-count change is intentionally rejected by the async engine.
Test improvements
- Throttle no-deadline tests are parametrized over ``(timeout=0.0, raises)``
and ``(timeout=None, waits)``, pinning that ``None`` is genuinely distinct
from any finite default. Sync and async counterparts mirror.
- New regression tests for ``first_non_retryable_error`` surfacing covering
both load-raises and load-returns-empty paths, asserting the original
exception is chained via ``__cause__`` and that the typed
``DataDesignerEarlyShutdownError`` doesn't fire in this branch.
- New parametrized regression test for ``_resolve_client_concurrency_mode``
covering all four (env × allow_resize) combinations.
- New parametrized test for the per-call ``timeout=`` override flowing into
the bridge timeout calculation.
- Bridge formula tests extended with ``max_conversation_restarts`` cases.
* test: trim redundant parametrize cases in async-default tests
Three parametrize cases were duplicating coverage already provided by
existing standalone tests:
- ``test_acquire_*_timeout_branches`` parametrized over ``(0.0, raises)``
and ``(None, waits)``. The ``raises`` half duplicates
``test_acquire_*_raises_timeout_when_at_capacity``. Replaced with two
focused ``..._default_no_deadline_waits_for_release`` tests covering
only the no-deadline branch.
- ``test_resolve_client_concurrency_mode_matches_engine_choice`` had four
cases. The ``async-off + allow-resize`` case asserts ``SYNC`` because the
env var alone forces it; the allow_resize input is moot. Dropped.
- ``test_async_bridge_honors_per_call_timeout`` had three cases. The
"override below floor" case cross-products the per-call override flow
with the floor-clamping behavior already covered by
``test_compute_bridge_timeout``. Dropped.
Net: -25 lines of test code with no loss of essential coverage.
* docs: fold migration page into existing concept docs
The standalone ``Migrating to the async default`` page didn't fit the
existing docs style — present tense, behavior over comparisons, content
in the natural concept home. Folding it in:
- ``architecture-and-performance.md`` gets a new ``Async Engine`` section
covering per-model timeouts, run outcomes (partial completion +
``DataDesignerEarlyShutdownError``), and the transitional opt-out.
Three stale ``async engine is landing soon`` callouts updated to
reflect the flip.
- ``custom_columns.md`` gets two short notes: a thread-safety callout
near Generation Strategies, and a mocking-with-spec note in
Development Testing.
- ``async-all-the-way-down.md`` Update section now points at the new
arch-and-perf section.
- README heads-up links to the same anchor.
- ``migration-async-default.md`` removed; mkdocs.yml entry dropped.
* docs: frame Execution Model as sync-engine specifics
Small targeted edits to make the user-facing concept docs consistent
with the post-flip state. No restructuring.
- ``architecture-and-performance.md``: the ``Execution Model`` callout
now opens with two engines, links to the new ``Async Engine`` section,
and frames the existing column-at-a-time description as sync-engine
semantics. The ``Step 2: Process columns sequentially`` paragraph notes
the async engine relaxes this. The ``Key Concepts`` table differentiates
per-engine for ``Batching`` and ``Sequential columns``; ``Parallel cells``
is the same on both.
- ``processors.md``: added a warning callout about the async engine's
row-count invariance in pre- and post-batch stages, with the guidance
to use ``process_after_generation()`` for row-filtering or expansion.
* fix: address review nits from PR #592 (Nabin)
Four targeted fixes from the review.
Worth-addressing (warning):
- ``test_acquire_async_default_no_deadline_waits_for_release`` was
spawning the release task without holding a strong reference. The
loop's weak-ref bookkeeping could GC it before the inner ``await``
observes the release, producing a CI flake. Hold the task and
``await`` it in ``finally``.
Take-it-or-leave-it (applied):
- Root-cause error surfacing now includes the exception type name:
``f"🛑 {type(root_cause).__name__}: {root_cause}"`` so users see
``ValueError: ...`` instead of just the message string. The
``__cause__`` chain is preserved either way.
- Drop the defensive ``getattr(c, "allow_resize", False)`` in
``_resolve_client_concurrency_mode`` — every member of
``ColumnConfigT`` inherits ``allow_resize: bool = False`` from
``SingleColumnConfig``.
- One-line comment near the root-cause surfacing branch noting that
``actual_num_records == 0`` is async-only (sync runs leave it at
``-1``), so the branch is async-only by construction.
Not addressed in this PR (filing as follow-ups):
- ``SYNC_BRIDGE_TIMEOUT = 300`` still hardcoded in
``column_generators/generators/base.py:_run_coroutine_sync``. That
bridge has no model-facade context to derive a timeout from, so the
fix is a structural refactor outside this PR's scope.
- First-error capture loses subsequent-error context. The "first wins"
heuristic is documented; richer aggregation is a follow-up.
* fix: drop SYNC_BRIDGE_TIMEOUT in _run_coroutine_sync
This was the third hardcoded 300s timeout (Nabin flagged it on PR #592).
The path is the generic sync→async bridge in ``ColumnGenerator.generate()``:
when a subclass overrides only ``agenerate()``, the sync entry point runs
the coroutine in a background thread.
Same philosophy we applied to the throttle queue wait elsewhere in the
PR: a defensive deadline on top of work that's already bounded by the
HTTP timeout doesn't add safety, it just produces spurious failures on
slow self-hosted endpoints. Drop the constant, the timeout exception
handling, and the ``timed_out`` bookkeeping. ``pool.shutdown(wait=True)``
becomes the simple cleanup.
Tests in ``test_async_generators.py`` exercise the happy path only and
don't depend on the timeout firing.
* Revert "fix: drop SYNC_BRIDGE_TIMEOUT in _run_coroutine_sync"
This reverts commit
|
||
|
|
47c72b3d87
|
fix(async): pack of fixes for async engine under degraded providers (#585)
Some checks are pending
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Coverage Check (Python 3.11) (push) Waiting to run
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.10 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.11 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.12 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.13 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Lint and Format Check (push) Waiting to run
CI / Check License Headers (push) Waiting to run
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
* fix(async): exclude all retryable errors from early-shutdown gate The gate previously only excluded `ModelRateLimitError`, leaving `ModelTimeoutError`, `ModelInternalServerError`, and `ModelAPIConnectionError` to count toward the sliding-window error rate. Under provider degradation these errors cluster in time (concurrent in-flight requests time out together), so 5/10 in a row is easy and trips the gate even when salvage could recover the rows. Refs #575. * feat(async): WARN log when provider showing degraded performance Diagnostic A/Bs against build.nvidia.com showed runs failing silently under provider degradation - no log indication that retryable errors were piling up until the early-shutdown gate fired (or, post-fix, until salvage exhaustion). Surfacing this earlier helps users distinguish "DataDesigner is broken" from "the upstream provider is slow today." Tracks a separate sliding window over retryable-vs-not for every task outcome (independent of the early-shutdown gate's window) and emits a throttled WARN when the rolling fraction crosses the threshold. Refs #575. * fix(async): salvage partial row groups on early shutdown Before: when the early-shutdown gate fired, any row group still in flight stayed in `_rg_states` un-checkpointed. The buffer manager later raised `FileNotFoundError` when the builder tried to read the finalized parquet. User-visible result: `0 records produced`. After: a new `_finalize_after_shutdown` step runs in `run()`'s finally block, after `_cancel_workers` has drained in-flight tasks (Codex caveat: in-flight `from_scratch`/`batch` tasks must not be allowed to write into a buffer that's being finalized). For each remaining row group it drops rows that aren't fully complete, then delegates to the existing `_checkpoint_completed_row_groups` so the buffer manager's zero-survivor handling (skip empty parquet, free buffer) kicks in unchanged. Also surfaces partial completion as a structured signal: scheduler exposes `early_shutdown: bool` and `partial_row_groups: tuple[int, ...]` properties so callers can detect partial completion programmatically rather than parsing log lines. Builder uses this to emit a more specific WARN distinguishing early shutdown from non-shutdown drops. Refs #575. * fix(throttle): reset consecutive_429s on non-rate-limit failure In `release_failure`, the cascade counter wasn't reset, so a sequence like 429 → 500 → 429 was treated as 2 consecutive 429s. The cascade counter feeds AIMD's reduce-once-per-cascade logic; the second 429 should start a fresh cascade and trigger another concurrency reduction, but currently doesn't. Standalone bug surfaced during #575 investigation; not on the failure path that drives the gate-trip outcome but worth fixing while we're in this code. * fix(custom): preserve retryability through CustomColumnGenerator wrap A real-workload run of #575 showed the early-shutdown gate still trips even with the gate-exclusion fix in place: the trigger is 10 timeouts inside Anonymizer's QA-repair custom columns, all wrapped in CustomColumnGenerationError (non-retryable) by the catch-all in CustomColumnGenerator. Two fixes here: 1. Re-raise RETRYABLE_MODEL_ERRORS unchanged before the wrap so the scheduler's _is_retryable correctly classifies them. 2. Surface _AsyncBridgedModelFacade timeouts as ModelTimeoutError instead of stdlib TimeoutError. Without this the sync bridge times out as the wrong exception type and is still classified non-retryable even after fix #1. Also moves _RETRYABLE_MODEL_ERRORS from async_scheduler to models/errors as the public RETRYABLE_MODEL_ERRORS tuple - both the scheduler and the wrap site need it, and models/errors is the appropriate home alongside the error class definitions. Refs #575. * feat(interface): typed DataDesignerEarlyShutdownError on zero-record runs When the async scheduler hits early shutdown and produces zero records, the buffer manager skips writing parquet (correctly), so ArtifactStorage.load_dataset_with_dropped_columns() raises FileNotFoundError. Previously this surfaced as a generic DataDesignerGenerationError wrapping the FileNotFoundError, which is ambiguous (could be missing files for any reason). This commit: - Adds DataDesignerEarlyShutdownError as a subclass of DataDesignerGenerationError so existing handlers still match while callers that want to react programmatically (retry on different alias, surface a degraded-provider message, etc.) can catch the specific type. - Plumbs the scheduler's structured signals (early_shutdown, partial_row_groups) up through the builder so they're available at data_designer.create() time without re-introspecting the scheduler. - create() raises the typed error in both failure modes (load fails or empty DataFrame returned) when builder.early_shutdown is True. Refs #575. * fix(async): emit first degraded-provider WARN regardless of clock state Initialize _last_degraded_warn_at to -inf so the first WARN is always emitted. The previous initialization to 0.0 suppressed the first WARN on fresh CI runners where time.monotonic() returns a small value (system boot uptime), making the throttle interval check (now - 0.0 < interval) true on the first attempt. * fix(async): address review findings on early-shutdown salvage PR Five real correctness issues caught in review of the original PR, plus a few smaller cleanups and test simplifications. Throttle - cascade reset (regression of existing AIMD invariant): release_failure() now resets consecutive_429s only when in_flight == 0. Resetting unconditionally broke "reduce once per cascade" when 429/500/429 arrived interleaved within a single in-flight burst - the second 429 was treated as a new cascade and the limit got halved twice for what was effectively one rate-limit event. Interface - typed-error gating: DataDesignerEarlyShutdownError now fires only when early_shutdown is true AND actual_num_records == 0. Without this, a partial-salvage run that fails to load for unrelated reasons (corrupt parquet, schema drift, disk hiccup) was misdiagnosed as "zero records produced," hiding the real cause. Async - WARN window scope: the degraded-provider warning was fed by every task outcome, including samplers and non-LLM customs. In realistic pipelines (one model column, several non-model columns) the rate stayed under threshold even when every model call was failing, silencing the WARN exactly when it mattered. Now gated on is_llm. Async/builder - signal preservation across raises: scheduler.early_shutdown and partial_row_groups are captured in a try/finally around future.result(), so a processor failure during the salvage path doesn't drop the structured signal. Both build() and build_preview() now reset per-run state at the start so reused builders don't leak prior-run flags. Async - dead code: dispatch_error capture in run() was unread (the post- finally check is unreachable on the exception path). Removed. Smaller cleanups: - early-shutdown WARN says "non-retryable error rate exceeded threshold" - bridge timeout WARN demoted to debug (ModelTimeoutError already surfaces it; the throttled degraded-provider WARN is the user-facing signal) - TODO note for threading degraded_warn_* through RunConfig - doc note in _finalize_after_shutdown clarifying that pre-batch processor isn't re-run on partial-salvage row groups Tests: - new regression tests for the cascade burst case, partial-salvage error gating, and LLM-only WARN window - direct unit test for _reset_run_state - dedup via _make_storage / _seed_plus_cell_setup helpers - WARN emission cases parametrized into a single test - shared parametrize lists hoisted to module-level constants - redundant cascade test dropped in favor of the more thorough drain variant; redundant healthy-baseline test folded into the zero-survivor test * chore(async): address Nabin's review comments Style cleanups, parametrization, docstring polish, and one consistency fix in the typed-error path. All non-blocking ("Ship it (with nits)"). interface/data_designer.py: - preview() now raises DataDesignerEarlyShutdownError when shutdown produced zero records (parity with create()), and also gates on actual_num_records == 0 so partial-salvage runs that fail to load don't get misdiagnosed - create()'s defensive empty-DF guard mirrors the load-failure guard with the same actual_num_records == 0 check async_scheduler.py: - _record_retryable_outcome docstring clarifies that the call site filters by is_llm; the function alone reads as if every outcome feeds the window dataset_builder.py: - moved _reset_run_state() down past the public methods to match the project's public-before-private convention test_custom.py: - flattened TestAsyncBridgedModelFacade class into module-level test functions (matches the rest of the file) - hoisted inline imports (asyncio, threading, patch, _AsyncBridgedModelFacade, SyncClientUnavailableError) to top of file - driven retryable-error parametrize off RETRYABLE_MODEL_ERRORS directly instead of the hand-rolled factory list, so new retryable types pick up coverage automatically - dropped the redundant "Sanity" block in test_async_bridge_timeout_raises_ model_timeout_error - pytest.raises already enforces the type, the duplicate block was running the same slow scenario twice test_async_scheduler.py: - parametrize over RETRYABLE_MODEL_ERRORS directly (same as above) test_data_designer.py: - added preview-path tests for the typed-error and partial-salvage fall-through cases - updated the existing empty-DF test to also patch actual_num_records=0 (otherwise the new gating in the empty-DF guard skips the typed error) * test(interface): consolidate create() error-dispatch tests into a matrix Five separate tests (two existing, three new from earlier in this PR) all probed the same dispatch logic in create(): "given a load outcome and a builder state, which error type should fire?" Pulled them into a single parametrized matrix indexed by (load_side_effect, early_shutdown, actual_num_records). Net result: 5 named tests → 1 parametrized test with 6 cells, and the previously-missing empty_df + shutdown + partial salvage cell is now covered. Test names retain readable IDs (load_fails_shutdown_zero_records etc.) so failures still pinpoint the exact case in pytest output. |
||
|
|
05c2e8df2e
|
fix: normalize image_url blocks to OpenAI-compliant dict format (#577)
* fix: normalize image_url blocks to OpenAI-compliant dict format (#576) ImageContext.get_contexts() produced bare-string and non-standard dict shapes for image_url content blocks, which broke the native OpenAI adapter (passes blocks through as-is) and only worked with Anthropic by accident via defensive handling in the translation layer. - Wrap all image_url values in {"url": ...} dict (OpenAI spec) - Remove non-standard "format" key from base64 dicts - Tighten Anthropic translate_image_url_block to require dict input Fixes #576 Made-with: Cursor * fix: reject malformed image_url blocks instead of silently dropping them translate_image_url_block now raises TypeError when image_url is not a dict. Since all image_url blocks are constructed internally, a bare string indicates an internal bug and should fail loudly. Made-with: Cursor * address review: tighten return type, add OpenAI + data-URI tests - Narrow _auto_resolve_context_value return type to dict[str, str] - Add OpenAI-client regression tests for image_url dict passthrough - Cover both bare-URL and bare-data-URI rejection in Anthropic tests Made-with: Cursor |
||
|
|
bfa7a46c9a
|
chore: async engine readiness - blockers and polish before default (#553)
* chore: async engine readiness blockers (#462) - Processor callback failures (pre-batch and post-batch) now raise DatasetGenerationError instead of silently dropping row groups - Early shutdown and all error paths drain in-flight workers via a finally block in AsyncTaskScheduler.run() - Pre-batch and post-batch processors that change row count in async mode raise immediately (strict_row_count guard) - Partial completion logs a warning when actual < target records - allow_resize=True auto-falls back to sync engine with a deprecation warning instead of raising, using a per-run _use_async flag - Preview path mirrors the trace check from the full build path; PreviewResults exposes task_traces Closes #462 * fix: address review findings for async engine readiness - Prevent double-wrapping of DatasetGenerationError in scheduler callbacks - Fix stacklevel in allow_resize DeprecationWarning to point at user code - Update stale comment to reflect fail-fast behavior - Rename misleading test and remove unused caplog fixture - Add zero-warnings assertion for happy-path case - Move warnings import to module level * fix: address review comments on async engine readiness - Extract _is_async_trace_enabled() helper to deduplicate trace check - Post-batch row-count guard now raises DatasetProcessingError (not DatasetGenerationError) so the scheduler wraps it with rg_id symmetrically with the pre-batch path - Add test_dropped_rows_reduce_actual_record_count for partial completion path * fix: address second-round review feedback on async engine readiness - DeprecationWarning no longer swallowed by interface error wrapper - Incomplete-RG log only fires on clean scheduler exits - Post-batch row-count guard moved into ProcessorRunner (strict_row_count) - Expose active_worker_count property on AsyncTaskScheduler - Drop unused monkeypatch fixture and pytest import * test: fold metadata-count test into dropped-rows test Remove test_write_metadata_records_actual_and_target_counts (poked _actual_num_records directly) and assert metadata counts in test_dropped_rows_reduce_actual_record_count instead, which exercises the same path through the public API. |
||
|
|
f6128220da
|
fix: prevent sticky progress bar ghost lines from terminal wrapping (#565)
* fix: prevent sticky progress bar ghost lines from terminal wrapping When a formatted bar line exceeded the terminal width, it wrapped to multiple physical lines but _drawn_lines only counted 1 per bar. Subsequent cursor-up clears missed the wrapped portions, leaving ghost lines that accumulated every redraw cycle. - Count physical lines in _redraw based on visible width vs terminal width - Cap rate at 9999.9 and eta at 999s to prevent stats field overflow - Remove max(10, ...) bar_width floor; degrade gracefully on narrow terminals - Add update_many() for batch updates with a single redraw cycle - Use update_many() in AsyncProgressReporter to reduce N redraws to 1 * test: strengthen wrapping and degradation tests per review feedback - Monkeypatch shutil.get_terminal_size to force narrow terminal in tests - Inject oversized lines via _format_bar patch to exercise physical line counting (ceiling division) code path - Assert output line width <= width-1 in graceful degradation test - Assert _drawn_lines == 1 in degradation mode (no false wrapping) * test: flatten test classes and replace private attribute assertions Address review feedback: use flat pytest functions per DEVELOPMENT.md conventions instead of class-based test suites. Move inline imports to module level. Replace most _drawn_lines and _bars assertions with public output proxies (counting CURSOR_UP_CLEAR sequences and checking rendered bar content). Keep _drawn_lines access only where no clean public proxy exists (multi-checkpoint add/remove test, zero-bars-remaining after log_final). * refactor: expose drawn_lines as public read-only property Replace _drawn_lines access in tests with a public property, consistent with the existing is_active pattern. |
||
|
|
956b8cd30d
|
refactor: unify duplicate DAG construction (dag.py + ExecutionGraph) (#511)
* refactor: unify DAG construction by moving topological sort into execution_graph.py Eliminates dag.py and its networkx dependency by moving topologically_sort_column_configs into execution_graph.py as a module-level function. Side-effect resolution is now O(1) via a side_effect_map dict (previously O(n²) linear scan). Kahn's algorithm is reused in-place rather than leaning on networkx.topological_sort. Closes #510 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test: relax non-deterministic ordering assertion in test_dag test_judge and test_code_and_depends_on_validation_reasoning_traces have no mutual dependency and reach in-degree 0 simultaneously in Kahn's algorithm. Set iteration order varies with PYTHONHASHSEED, making the strict list assertion flaky. Assert only the topological invariants. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(engine): document intentional skip.columns omission in topologically_sort_column_configs ExecutionGraph.create handles skip.when ordering edges in its own two-pass build; the pre-sort function only needs required_columns to produce a valid ColumnConfigT ordering for config compilation. * fix(engine): restore skip.columns edges in topologically_sort_column_configs The sync builder executes generators in compile-time sort order (from _column_configs, populated via this function), not ExecutionGraph order. Dropping skip.columns edges caused evaluate_skip_when to hit UndefinedError when the referenced column hadn't been generated yet, silently skipping rows. Also: - Refactor edge building into _add_edge() helper with a label parameter to distinguish "required" from "skip.when" edges in debug output - Rename test_dag.py -> test_topological_sort.py to match the new module location - Add from __future__ import annotations (required by AGENTS.md) - Add test_side_effect_column_ordering covering the side_effect_map.get() path - Add test_skip_when_column_ordering covering the skip.columns edge path * fix(tests): move SkipConfig import to module level in test_topological_sort * refactor(execution-graph): extract nested closures to module-level helpers, fix docs and test style - Extract `resolve`/`_add_edge` nested closures in `topologically_sort_column_configs` to module-level `_resolve_dag_column` and `_add_dag_edge` per STYLEGUIDE.md - Add self-edge guard in `_add_dag_edge` (consistent with `ExecutionGraph.create`) - Update `architecture/dataset-builders.md` to remove stale `dag.py`/NetworkX references - Fix import order in `test_topological_sort.py` (SkipConfig before column_configs) - Add `-> None` return annotations to legacy test functions * refactor(execution-graph): extract shared Kahn helper, inline resolve, fix module-level ordering - Extract `_kahns_topological_sort` shared helper used by both `ExecutionGraph.get_topological_order` and `topologically_sort_column_configs` - Inline `_resolve_dag_column` into `_add_dag_edge` (no other call sites) - Move private helpers after the public function (public-before-private per STYLEGUIDE) - Add docstring to `topologically_sort_column_configs` - Update architecture/dataset-builders.md to mention both sort sites for skip.when edges --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Andre Manoel <165937436+andreatgretel@users.noreply.github.com> |
||
|
|
8be4ff787f
|
feat: add RunConfig jinja rendering engine (#557) | ||
|
|
a965bc1542
|
fix: bridge model.generate() to agenerate() for custom columns in async engine (#545)
* feat: bridge model.generate() to agenerate() for custom columns in async engine Custom column generators that call model.generate() fail under the async engine because the sync HTTP client is unavailable. Add an _AsyncBridgedModelFacade proxy in _build_models_dict() that intercepts the sync-client RuntimeError and schedules agenerate() on the engine's persistent event loop via run_coroutine_threadsafe. Includes a deadlock guard for async custom columns running on the event loop. * refactor: wrap facades at sync call site, not in _build_models_dict Move _AsyncBridgedModelFacade wrapping from _build_models_dict() into _invoke_generator_function() so the async path gets raw facades. The bridge proxy is only needed for sync custom columns; async columns already have direct access to model.agenerate(). * fix: address review feedback - typed exception, timeout cleanup, kwargs test - Introduce SyncClientUnavailableError so the facade catches by type instead of matching error strings (review comment #1) - Add future.cancel() + logger.warning() on timeout to match the _run_coroutine_sync pattern in base.py (review comment #2) - Assert kwargs forwarding in the async bridge test (review comment #4) * fix: let SyncClientUnavailableError propagate through @catch_llm_exceptions The decorator catches all exceptions and wraps them into DataDesignerError, which prevented the async bridge proxy from ever seeing the original error. Add an early match case that re-raises SyncClientUnavailableError directly. * refactor: make SYNC_BRIDGE_TIMEOUT a public constant Drop the underscore prefix since the constant is exported and used across modules (base.py and custom.py). |
||
|
|
a9af365e8e
|
feat: add skip.when conditional column generation (#502)
* plan: add skip_when for conditional column generation (#479)
Adds implementation plan for a `skip_when` field on `SingleColumnConfig`
that enables conditional column generation. When the Jinja2 expression
evaluates truthy, the cell is set to None and the generator is skipped.
Skips auto-propagate through the DAG to downstream columns.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* plan: remove HopChain example from skip_when plan
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* plan: replace HopChain example with generic product review example
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* plan: add open questions on skip sentinel value and row filtering
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* plan: major revision — SkipConfig model, sync engine support, decouple propagation
- Introduce SkipConfig(when, value) as nested model on SingleColumnConfig
- Move propagate_skip to SingleColumnConfig as independent field, fixing
bug where columns with no SkipConfig couldn't participate in propagation
- Add full sync engine implementation (Steps 4a-4d) covering both
_fan_out_with_threads and _run_full_column_generator dispatch paths
- Add serialization boundary stripping for both DatasetBatchManager (sync)
and RowGroupBufferManager (async)
- Simplify architecture diagrams for readability
- Update all references, design decisions, verification plan
Made-with: Cursor
* updates
* plan: document get_required_columns for skip propagation
- Explain why propagation must not use get_upstream_columns() once
skip.when adds DAG edges; add _required_columns and
get_required_columns() to the execution graph plan
- Point async _run_cell at get_required_columns for parity with sync
- Clarify DropSkippedRowsProcessorConfig vs stripping __skipped__ for
DataFrames; tighten resolved-questions wording
- Extend DAG/graph verification with gating_col regression case
Refs #479
Made-with: Cursor
* plan: centralize __skipped__ handling in skip_provenance
- Document new skip_provenance.py (key constant, read/write/strip API)
- Point sync builder, async scheduler, and batch buffers at shared helpers
- Strip metadata before every DataFrame from buffer dicts, including
FULL_COLUMN active subsets
- Split §3 into skip_evaluator vs skip_provenance; extend verification
Refs #479
Made-with: Cursor
* plan: align doc title with SkipConfig / skip.when
Drop legacy skip_when naming in headings and #362 cross-reference.
Refs #479
Made-with: Cursor
* plan: address review — delimiter validation, centralized error handling, caller-owns-deserialization
- SkipConfig._validate_when_syntax now checks find_undeclared_variables
is non-empty, rejecting expressions without {{ }} delimiters that
would silently skip every row
- evaluate_skip_when centralizes try/except so both sync and async
engines get identical fail-safe behavior on eval errors
- evaluate_skip_when takes a single pre-deserialized record; caller
runs deserialize_json_values once and passes to both skip eval and
generator (no double deserialization, no redundant parameter)
- Update _should_skip_cell, async _run_cell, Files Modified table,
and verification section accordingly
Refs #479
Made-with: Cursor
* plan: add get_side_effect_columns accessor to execution graph spec
Document _side_effects_by_producer inverse map and
get_side_effect_columns() accessor on ExecutionGraph, needed by
_write_skip_to_record / apply_skip_to_record to clear __trace,
__reasoning_content, etc. on skip. Added to both Step 2b metadata
section and Files Modified table.
The __skipped__ leak into active_df (greptile's other P1) was already
fixed in
|
||
|
|
64f31bc581
|
feat: add generic and OpenRouter attribution headers (#542) | ||
|
|
533a94b78f
|
fix: async engine side-effect column propagation and collision resolution (#509)
* fix: async engine side-effect column propagation and collision resolution ExecutionGraph.set_side_effect() now uses first-writer-wins instead of last-writer-wins, matching sync engine semantics where earlier consumers see the first producer's value. This prevents false DAGCircularDependencyError when multiple generators declare the same side-effect column at different pipeline stages. AsyncTaskScheduler now includes side-effect columns in _instance_to_columns so their values are written to the RowGroupBufferManager and available to downstream prompt templates. Fixes #508 * fix: separate side-effect columns from completion tracking in async scheduler Side-effect columns added to _instance_to_columns caused KeyError in CompletionTracker._validate_strategy() because they are not registered in the execution graph. Split into _instance_to_write_columns (buffer writes, includes side-effects) and _instance_to_columns (completion tracking, real columns only). * fix: warn on side-effect collision and clarify scheduler column maps Log a warning when multiple producers register the same side-effect column (first-writer-wins still applies). Rename _instance_to_columns and _instance_to_write_columns per review feedback for clarity. * fix: raise ConfigCompilationError on duplicate side-effect producers Replace first-writer-wins collision handling with a hard error. Each side-effect column must have exactly one producer; duplicates are a configuration issue to be fixed at the source. * fix: reject duplicate side-effect producers in sync DAG path Mirror the async path check: raise ConfigCompilationError when two custom columns declare the same side-effect column name during topological sort. |
||
|
|
4a2813654a
|
fix: always return ISO-8601 from datetime postproc (#484) (#512)
* fix: always return ISO-8601 from datetime postproc (#484) The DatetimeFormatMixin.postproc heuristics inferred output format from value distribution, silently stripping date/time components for small datasets or narrow date ranges. Replace with deterministic ISO-8601 output via vectorized strftime. Users who need custom formats can still set convert_to on the SamplerColumnConfig. * docs: update convert_to docstring and add DatetimeFormatMixin docstring The SamplerColumnConfig.convert_to docstring incorrectly stated that only "float", "int", or "str" are accepted. Datetime/timedelta samplers accept strftime format strings. Also document the ISO-8601 default. * test: add regression test for #484 via DataDesigner.preview API Captures the exact reproducer from the issue: a single-record datetime preview through the public DataDesigner.preview() interface must return a full ISO-8601 timestamp, not a bare year string. * test: trim redundant datetime tests, align reproducer with issue #484 - Remove postproc_same_day_records (subsumed by same_month + no_convert_to) - Remove postproc_always_parseable (subsumed by stdlib_fromisoformat) - Remove all_same_month integration test (subsumed by narrow_range_single_day) - Update single_record test to use unit="h" matching the issue reproducer * fix: address review nits — move datetime import to module scope, drop redundant isinstance |
||
|
|
fdd5ebb5ef
|
feat: add Pi Coding Agent rollout seed source (#513) (#514)
Add support for ingesting Pi Coding Agent session artifacts as an agent rollout seed source. Pi sessions are tree-structured JSONL files; the handler resolves the active conversation path by walking from the last entry back to the root via id/parentId links. Key points: - Tree-structured sessions with automatic active-path resolution - Entry-level types: model_change, compaction, branch_summary, custom_message, thinking_level_change - Message roles: user, assistant (inline ToolCall/ThinkingContent/ TextContent blocks), toolResult, bashExecution (synthesized as tool-call pairs), custom, compactionSummary, branchSummary - Extract shared normalize_message_content to utils.py (was duplicated in Hermes handler) |
||
|
|
c27ad62a14
|
fix: use non-blocking dispatch to prevent pipeline starvation (#504) (#505)
Replace blocking semaphore acquire in the dispatch loop with a non-blocking try_acquire that breaks out when the semaphore is full. This causes the outer loop to re-query the frontier, picking up newly-ready downstream tasks instead of draining a stale snapshot. Fixes #504 Made-with: Cursor |
||
|
|
7891dd53cb
|
feat: add Hermes Agent rollout support (#500) | ||
|
|
58870bb83f
|
feat: add ATIF rollout ingestion (#495) | ||
|
|
d43ac1cb2e
|
test: add transport-wiring regression tests for #459 (#485)
* test: add transport-wiring regression tests for #459 The existing test_client_limits_respect_max_parallel_requests only checks the limits property, which was always computed correctly even before the fix. Add mock-capture tests that patch HTTPTransport and AsyncHTTPTransport constructors, trigger lazy init via completion() / acompletion(), and assert the constructors received the correct limits. These tests fail on the pre-fix code (assert_called_once fails because the old code never explicitly constructed the transport). Made-with: Cursor * test: address review — parametrize over AnthropicClient, use helpers Parametrize all three #459 regression tests over both OpenAICompatibleClient and AnthropicClient (wiring lives in HttpModelClient, so both subclasses need coverage). Use the existing _make_openai_client / _make_anthropic_client helpers with **kwargs instead of constructing clients directly. Move transport patch constants up with the other module-level constants. Made-with: Cursor |
||
|
|
97119863da
|
fix: respect max_parallel_requests in HTTP connection pool size (#460)
* fix: respect max_parallel_requests in HTTP connection pool size Pass a pre-configured HTTPTransport/AsyncHTTPTransport with the correct limits into RetryTransport instead of letting it create its own pool with httpx defaults (100 connections). Previously, the limits calculated from max_parallel_requests were passed to httpx.Client(limits=...) which silently ignores them when a custom transport is provided. Fixes #459 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Przemysław <przemekboruta@interia.pl> * fix: document transport param and annotate private attr chains in tests Address Greptile P2 review comments on PR #460: - Add docstring entry for the new `transport` parameter in `create_retry_transport` explaining accepted types and None default - Add inline comments in pool-size regression tests explaining the private attribute chain (_sync/_async_transport → _pool → _max_connections) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Przemysław <przemekboruta@interia.pl> * refactor: address review feedback — public limits property, clean tests - Drop three tests from test_retry.py that reached into private attributes of third-party RetryTransport (_sync_transport, _async_transport); the end-to-end contract is covered by the pool-size regression test - Expose a public `limits` property on HttpModelClient so tests and diagnostic code can assert the pool configuration without walking private attribute chains across three libraries - Replace two private-chain pool assertions with a single `client.limits.max_connections == 600` check against the new property - Trim "inner" from the transport docstring entry (nabinchha suggestion) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Przemysław <przemekboruta@interia.pl> --------- Signed-off-by: Przemysław <przemekboruta@interia.pl> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com> |
||
|
|
c146af5146
|
chore: async engine follow-up - rename, preview, lifecycle, progress (#456)
* chore: rename ColumnWiseDatasetBuilder, wire async preview, unify row-group lifecycle - Rename ColumnWiseDatasetBuilder to DatasetBuilder and column_wise_builder.py to dataset_builder.py, update all references - Extract _prepare_async_run() factory shared by build and preview paths - Add _build_async_preview() for async preview with no disk checkpoints - Replace on_row_group_complete/on_checkpoint_complete with single on_finalize_row_group callback; caller handles checkpointing - Add free_row_group() on RowGroupBufferManager for discard-without-write - Free fully-dropped row groups instead of finalizing them - Add consolidated AsyncProgressReporter for async generation logging Closes #437, closes #442, closes #444 * feat: add consolidated async progress reporting with row group context - Add AsyncProgressReporter: groups per-column progress into a single log block emitted at configurable intervals (default 5s) - Add quiet mode to ProgressTracker to suppress per-tracker logging when used with the consolidated reporter - Add ContextVar-based row group tagging (RG1, RG2, ...) for log messages emitted inside async tasks (samplers, expressions, seeds) - Add progress_interval to RunConfig for user-configurable reporting - Remove log_start_async from ProgressTracker (superseded by reporter) Closes #443 * fix async preview and progress reporting * feat: add opt-in sticky ANSI progress bars for generation Add RunConfig.progress_bar setting that replaces periodic log-line progress with sticky terminal bars that stay at the bottom while logs scroll above. Pure ANSI escape codes, no new dependencies. Disabled by default - existing log-based output unchanged. * docs: add missing progress_interval docstring in RunConfig * fix: update progress bar on every completion in async path Skip the time gate when the progress bar is active so the bar redraws on every record instead of every progress_interval seconds. * fix: resolve row-group semaphore deadlock when all tasks are deferred When all tasks for admitted row groups fail with transient errors, the row-group semaphore never releases, blocking admission of new row groups. Fix by salvaging stalled row groups inline - retrying deferred tasks immediately so row groups can checkpoint and free their semaphore slots. Also updates row group log format to (x/X) with leading zeros. * fix: eagerly salvage stalled row groups to avoid wasting semaphore slots Run inline salvage after every checkpoint pass instead of only when globally stalled. Row groups with 0 in-flight and only deferred tasks are salvaged immediately, freeing their semaphore slot for new work. * fix: address review findings from greptile and codex - Use `with` statement for progress bar context (safe __exit__ on error) - Check bar.is_active instead of bar is not None (non-TTY fallback) - Record failures (not skips) for tasks that exhaust salvage retries - Record skipped tasks when pre-batch filtering drops rows * fix: stable progress bar width and accurate failure counts - Pre-compute fixed stats width at bar creation to prevent bar resizing when failed count appears - Cap displayed completed at total to avoid >100% on retries - Exclude already-failed columns from skip recording to prevent double-counting in progress reporter * fix: address Nabin's review - exclude_columns, dead code, docstring - Add exclude_columns={task.column} on non-retryable batch/from_scratch drop path to prevent double-counting (same pattern as cell path) - Simplify salvage drop to per-task exclude (is_dropped guard handles multi-column case) - Remove dead _in_flight_for_rg method - Fix context.py docstring to match actual (x/X) format * fix: _drain_frontier exits before dispatching ready salvage tasks After salvage discards a cell task from dispatched (making it available in the frontier), _drain_frontier broke immediately because nothing was in-flight yet. The task and its downstream were never re-dispatched, leaving the row group incomplete. Fix: only break when both ready and in-flight are empty. * fix: salvage edge cases found by code review - _drain_frontier: only break when both ready and in-flight are empty - _salvage_rounds: re-mark sibling columns as dispatched after from_scratch retry to prevent duplicate dispatch - _salvage_stalled_row_groups: separate exhausted tasks from new drain failures to avoid treating non-stalled tasks as permanent - _checkpoint_completed_row_groups: clean up deferred tasks for checkpointed row groups - Early shutdown: salvage stalled row groups before exiting * fix: skip record_failure for already-dropped rows in salvage When multiple columns fail for the same row, the first drop records a skip for the other column. Without this guard, record_failure fires again for the second column, double-counting it. |
||
|
|
1356408c6e
|
feat: remove litellm dependency and bridge path (#455)
* adjust plan * feat: remove litellm dependency and bridge path (PR-7) - Delete litellm_bridge.py adapter, litellm_overrides.py, and their tests - Remove LiteLLM fallback branch and DATA_DESIGNER_MODEL_BACKEND env var from clients/factory.py; unknown provider_type now raises ValueError - Remove apply_litellm_patches() call from models/factory.py - Remove LiteLLM exception match arms and DownstreamLLMExceptionMessageParser from models/errors.py; port context window detail extraction to _extract_context_window_detail for native ProviderError path - Remove litellm from lazy_heavy_imports.py and pyproject.toml runtime deps - Remove flatten_extra_body parameter from TransportKwargs.from_request - Clean up LiteLLM references in docstrings, comments, and AGENTS.md - Add full ProviderErrorKind test coverage to test_model_errors.py - Update benchmark script to patch OpenAICompatibleClient instead of CustomRouter Made-with: Cursor * fix: forward tools to _fake_response in benchmark patch The old CustomRouter patch forwarded **kwargs (including tools) to _fake_response, but the new OpenAICompatibleClient patch only passed model and messages — silently disabling tool-call simulation in benchmark scenarios that exercise allow_tools. Made-with: Cursor * fix: address PR-7 review feedback - Return ChatCompletionResponse from benchmark fakes instead of FakeResponse to match the native client contract (facade expects .message, not .choices[0].message) - Add ids= to parametrize block in test_model_errors.py for readability - Remove unnecessary try/except from _extract_context_window_detail; the `if marker in` guard is sufficient - Make context window marker match case-insensitive - Replace stale httpx.AsyncClient callout in async_concurrency.py docstring with generic "async-stateful resources" Made-with: Cursor |
||
|
|
f24e88d0f8
|
feat: wire ThrottledModelClient and dual-semaphore scheduler (#449)
* feat: constrain HttpModelClient to single concurrency mode Constrain each HttpModelClient instance to sync or async at construction time, eliminating dual-mode lifecycle complexity that caused transport leaks and cross-mode teardown bugs. - Add ClientConcurrencyMode StrEnum replacing Literal type alias - Add concurrency_mode constructor param with mode enforcement guards on _get_sync_client / _get_async_client - Simplify close()/aclose() to single-mode teardown (cross-mode calls are no-ops) - Thread client_concurrency_mode through factory chain from DATA_DESIGNER_ASYNC_ENGINE env var - Add ModelRegistry.arun_health_check() async mirror and wire async dispatch in ColumnWiseDatasetBuilder - Make ensure_async_engine_loop public (used cross-module) - Fix test helpers to derive concurrency mode from injected client - Add PR-5 architecture notes * fix: address review findings on HttpModelClient lifecycle - Fix transport double-close in close()/aclose() by delegating teardown to the httpx client when one exists (if/elif pattern); only close transport directly if no client was ever created - Reject mismatched client/mode injection in constructor (e.g. async_client on a sync-mode instance raises ValueError) - Add 5-minute wall-clock timeout to future.result() in async health check dispatch - Add constructor validation tests for both mismatch directions - Update PR-5 architecture notes Made-with: Cursor * fix: address PR-5 review feedback on HttpModelClient lifecycle - Type _transport as RetryTransport | None, removing type: ignore suppressions in close()/aclose() - Make transport fully lazy (None by default) and accept optional transport constructor param so injected-client paths don't eagerly allocate an unused RetryTransport - Cancel future on TimeoutError in async health check dispatch so timed-out coroutines don't linger on the shared event loop - Set health check timeout to 180s (3 min) matching architecture notes - Rename _SYNC_CLIENT_CASES to _CLIENT_FACTORY_CASES since the list parametrizes both sync and async mode tests - Update architecture notes timeout from 180→300 back to 180 to match implementation Made-with: Cursor * add plan for pr-6 * update plan * feat: wire ThrottledModelClient and dual-semaphore scheduler for AIMD concurrency control Introduces per-request AIMD throttle feedback by wrapping every ModelClient with ThrottledModelClient (acquire/release around each HTTP call) and adding a dual-semaphore scheme to AsyncTaskScheduler that separates submission slots from LLM-wait slots via a one-way handoff. Key changes: - ThrottledModelClient: decorator that acquires/releases ThrottleManager permits around sync and async completion, embedding, and image calls, routing 429s to release_rate_limited for AIMD backoff. - RetryConfig: removes 429 from retryable_status_codes so rate-limit signals propagate to the AIMD feedback loop instead of being silently retried at the transport layer. - AsyncTaskScheduler: adds TrackingSemaphore, LLM-wait semaphore, and is_llm_bound-driven handoff; extracts _main_dispatch_loop and _salvage_rounds; tracks worker tasks for clean cancellation. - ColumnGenerator.is_llm_bound property on base, model-registry, and custom generators to classify which columns need the handoff. - ThrottleConfig (Pydantic model on RunConfig) for user-tunable AIMD knobs (reduce_factor, additive_increase, success_window, block_seconds). - ModelRegistry.get_aggregate_max_parallel_requests() to size the LLM-wait semaphore from registered model configs. - Renames throttle.py -> throttle_manager.py for clarity. Made-with: Cursor * fix: address PR-6 review feedback on concurrency safety - Shield _cancel_workers() with asyncio.shield in the CancelledError handler to prevent double-cancellation from leaking semaphore permits - Guard ThrottleManager release calls in _throttled_sync/_athrottled with try/except so a failing release doesn't mask the original error or permanently leak the in-flight permit counter - Add registration guard in ThrottleManager.try_acquire() that raises RuntimeError if called before register(), making the ordering invariant explicit rather than relying on temporal coupling - Add is_registered() public query method on ThrottleManager - Expand TrackingSemaphore docstring noting CPython _value dependency Made-with: Cursor * fix test * feat: AIMD throttle refinements — cascade dampening, ceiling stabilization, early shutdown exclusion Dampen aggressive cascade 429 reduction (only the first 429 in a burst reduces the limit; subsequent cascade 429s release permits without further reduction). Add rate_limit_ceiling with configurable overshoot to stabilize concurrency instead of sawtooth recovery. Exclude ModelRateLimitError from early shutdown error rate. Consolidate AIMD defaults into ThrottleConfig ClassVar constants as single source of truth, and update ThrottleManager to accept ThrottleConfig directly. Update architecture notes for PR-6. Made-with: Cursor * fix: address PR-6 review feedback on retry boundary, capacity polling, and test coverage - Keep 429 in transport retry list for sync-mode clients (no salvage queue); only strip for async-mode where AIMD + salvage handles it. Add strip_rate_limit_codes kwarg to create_retry_transport; wire through HttpModelClient based on concurrency mode. - Return CAPACITY_POLL_INTERVAL (50ms) instead of default_block_seconds (2s) from try_acquire when at capacity but not rate-limited, so callers poll responsively when a slot could free in milliseconds. - Document registration ordering invariant on create_model_client's throttle_manager param. - Add semaphore permit assertions to test_scheduler_llm_bound_one_way_handoff matching the non-LLM counterpart test. - Update architecture notes for all changes. Made-with: Cursor * add missing test * fix: consolidate ThrottleConfig and improve naming Move ceiling_overshoot from ThrottleManager constructor kwarg into ThrottleConfig (single source of truth for all AIMD tuning knobs). Rename block_seconds → cooldown_seconds and block_duration → cooldown_duration throughout for clarity. Remove DEFAULT_CEILING_OVERSHOOT module constant from throttle_manager.py. Made-with: Cursor * fix: annotate state capture invariant in acquire_sync/acquire_async Add inline comment explaining why the captured DomainThrottleState reference is safe to reuse in the finally block (objects are never replaced after creation). Made-with: Cursor |
||
|
|
015d4f188f
|
feat: Constrain HttpModelClient to single concurrency mode... (#439)
* feat: constrain HttpModelClient to single concurrency mode Constrain each HttpModelClient instance to sync or async at construction time, eliminating dual-mode lifecycle complexity that caused transport leaks and cross-mode teardown bugs. - Add ClientConcurrencyMode StrEnum replacing Literal type alias - Add concurrency_mode constructor param with mode enforcement guards on _get_sync_client / _get_async_client - Simplify close()/aclose() to single-mode teardown (cross-mode calls are no-ops) - Thread client_concurrency_mode through factory chain from DATA_DESIGNER_ASYNC_ENGINE env var - Add ModelRegistry.arun_health_check() async mirror and wire async dispatch in ColumnWiseDatasetBuilder - Make ensure_async_engine_loop public (used cross-module) - Fix test helpers to derive concurrency mode from injected client - Add PR-5 architecture notes * fix: address review findings on HttpModelClient lifecycle - Fix transport double-close in close()/aclose() by delegating teardown to the httpx client when one exists (if/elif pattern); only close transport directly if no client was ever created - Reject mismatched client/mode injection in constructor (e.g. async_client on a sync-mode instance raises ValueError) - Add 5-minute wall-clock timeout to future.result() in async health check dispatch - Add constructor validation tests for both mismatch directions - Update PR-5 architecture notes Made-with: Cursor * fix: address PR-5 review feedback on HttpModelClient lifecycle - Type _transport as RetryTransport | None, removing type: ignore suppressions in close()/aclose() - Make transport fully lazy (None by default) and accept optional transport constructor param so injected-client paths don't eagerly allocate an unused RetryTransport - Cancel future on TimeoutError in async health check dispatch so timed-out coroutines don't linger on the shared event loop - Set health check timeout to 180s (3 min) matching architecture notes - Rename _SYNC_CLIENT_CASES to _CLIENT_FACTORY_CASES since the list parametrizes both sync and async mode tests - Update architecture notes timeout from 180→300 back to 180 to match implementation Made-with: Cursor * fix: address Greptile review findings on health check methods - Use bare `raise` instead of `raise e` in both run_health_check and arun_health_check to preserve original traceback frames - Add GenerationType.IMAGE test coverage for sync and async health checks (stub-image config + generate_image/agenerate_image patches) Made-with: Cursor * Update packages/data-designer-engine/tests/engine/models/test_model_registry.py |
||
|
|
7e812630cf
|
feat: wire async task-queue scheduler into ColumnWiseDatasetBuilder (#429)
* feat: wire async task-queue scheduler into ColumnWiseDatasetBuilder * chore: add async benchmark notebook and demo scripts * fix: address all PR review comments on async builder integration - Wire on_batch_complete through on_row_group_complete callback - Mark trailing slots as dropped in replace_dataframe when processor filters rows - Ensure checkpoint still runs when on_before_checkpoint raises - Gate non-seed task dispatch on pre-batch completion - Add public run_pre_batch_on_df to ProcessorRunner (replaces private _run_stage call) - Add public is_column_complete_for_rg to CompletionTracker (replaces private _completed access) - Type task_traces as list[TaskTrace] in results.py - Add async_trace docstring to RunConfig - Move module-level log into _build_async - Add replace_dataframe unit tests (same-size, dropped rows, fewer rows) - Assert on public outcomes in scheduler tests instead of private attributes - Parametrize allow_resize validation tests - Cache seed_cols before main loop - Remove redundant disable_early_shutdown from AsyncTaskScheduler * style: fix ruff format for lambda expression * fix: address open review issues on async scheduler - Flush completed row groups before breaking on early shutdown (data loss) - Change error rate check from >= to > so disable_early_shutdown sentinel (1.0) never triggers at 100% failure rate - Extract seeds-complete check into helper and call it in salvage rounds via _drain_frontier, with pre-batch gating, so pre-batch processor runs even when seed tasks succeed only after retry - Fix is_column_complete_for_rg to check _batch_complete first, then verify all non-dropped rows for CELL_BY_CELL columns - Replace O(|in-flight|) scan in _in_flight_for_rg with per-RG counter * fix: sync pre-batch row drops to CompletionTracker and restore stderr safely Pre-batch processors that filter rows marked them as dropped in RowGroupBufferManager but not in CompletionTracker, causing unnecessary LLM calls for rows that would be discarded at checkpoint time. Also wrap the benchmark warmup stderr redirect in try/finally so stderr is restored if _run_once raises. * fix: prune _admitted_rg_ids on row group checkpoint Prevents unbounded growth of the admission set across large runs. * chore: remove demo/async files from PR Dev-time benchmarks and manual test scripts - kept locally, not needed in the PR. * fix: wire disable_early_shutdown into AsyncTaskScheduler RunConfig.disable_early_shutdown was forwarded to the sync executor but silently ignored in the async path. Now passed through to the scheduler's _check_error_rate. * test: add e2e test for async engine concurrency Verifies the async scheduler dispatches independent LLM columns concurrently by checking for overlapping task trace intervals. Uses a wide DAG (sampler -> 2 parallel LLM columns) with 2 records. Requires NVIDIA_API_KEY. * fix: drop row group on on_before_checkpoint failure instead of writing unprocessed data Matches on_seeds_complete failure behavior and avoids silently checkpointing unfiltered rows when a post-batch processor fails. * fix: skip on_before_checkpoint when no POST_BATCH processors configured Avoids unnecessary DataFrame round-trip for every row group in the common case where no post-batch processors exist. * fix: address remaining review nits from nabinchha and greptile summary - Gate on_seeds_complete on PRE_BATCH processors (matches on_before_checkpoint pattern) - Cache seed_cols as instance attr instead of recomputing in _dispatch_seeds - Iterate list(self._active_rgs) snapshot in _run_seeds_complete_check - Add logger.debug to telemetry except block - Add design comment on on_before_checkpoint failure drop behavior - Rename row_group param to row_group_index in is_column_complete_for_rg - Document rg_id as current_batch_number equivalence - Use mock.patch.object in e2e test instead of direct mutation - Add max(0, ...) floor guard on _in_flight_counts decrement - Rename _ensure_async_engine_loop to public ensure_async_engine_loop - Move AsyncTaskScheduler import to module level in integration tests * fix: preserve async callback contract and e2e setup * fix: prune _seeds_dispatched_rgs and _pre_batch_done_rgs on checkpoint * refactor: consolidate per-RG state into _RowGroupState dataclass Replace 5 independent collections (_active_rgs, _admitted_rg_ids, _seeds_dispatched_rgs, _pre_batch_done_rgs, _in_flight_counts) with a single _rg_states dict keyed by row group ID. Cleanup is now a single `del` instead of N separate discards, eliminating the class of bugs where one collection is missed during row group teardown. * fix: skip checkpoint and callbacks when on_before_checkpoint fails When on_before_checkpoint raises and all rows are dropped, the code previously fell through to checkpoint_row_group and on_row_group_complete, sending a spurious progress notification for a batch with zero records. Now gates both on a `dropped` flag so they are skipped after failure. * fix: snapshot dropped rows before await in _run_batch and sync tracker on checkpoint failure Two fixes: - _run_batch: snapshot dropped rows before `await agenerate` so the row-count expectation matches batch_df. Concurrent tasks can drop rows during the await, causing a spurious ValueError that would drop the entire row group. Write-back now re-checks is_dropped to skip rows dropped mid-flight. - _checkpoint_completed_row_groups: add tracker.drop_row alongside buffer_manager.drop_row when on_before_checkpoint fails, keeping both in sync. * feat: sliding window error rate and out-of-order row group completion test Replace cumulative error counters with a deque-based sliding window so that early transient failures do not permanently inflate the error rate in long-running jobs. Add tests for the sliding window recovery path and for deterministic out-of-order row group checkpoint ordering. * fix: use real time delays in out-of-order completion test asyncio.sleep(0) interleaving is not deterministic across Python versions. Switch to asyncio.sleep(num_records * 0.02) so the smaller row group genuinely finishes seeds first regardless of event loop scheduling. * fix: prevent ZeroDivisionError when shutdown_error_window is 0 Change RunConfig.shutdown_error_window constraint from ge=0 to ge=1 so the sliding window denominator is never zero. * fix: address Greptile review nits in async_scheduler - Move del _rg_states inside try/finally so semaphore is always released - Add exc_info=True to pre-batch failure log for consistent tracebacks - Short-circuit _check_error_rate when _early_shutdown already set * fix: address Greptile summary findings - Remove duplicate async engine log in build() (kept in _build_async) - Guard _run_seeds_complete_check with has_pre_batch at both call sites - Change error rate comparison from > to >= to match sync path semantics |
||
|
|
a0fb04ee07
|
feat: agent rollout trace ingestion (#399) | ||
|
|
a9a16ae61a
|
feat: Native Anthropic adapter with shared HTTP client infrastructure (#426)
* plans for model facade overhaul * update plan * add review * address feedback + add more details after several self reviews * update plan doc * address nits * Add cannonical objects * self-review feedback + address * add LiteLLMRouter protocol to strongly type bridge router param Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * simplify some things * add a protol for http response like object * move HttpResponse * update PR-1 architecture notes for lifecycle and router protocol Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Address PR #359 feedback: exception wrapping, shared parsing, test improvements - Wrap all LiteLLM router calls in try/except to normalize raw exceptions into canonical ProviderError at the bridge boundary (blocking review item) - Extract reusable response-parsing helpers into clients/parsing.py for shared use across future native adapters - Add async image parsing path using httpx.AsyncClient to avoid blocking the event loop in agenerate_image - Add retry_after field to ProviderError for future retry engine support - Fix _to_int_or_none to parse numeric strings from providers - Create test conftest.py with shared mock_router/bridge_client fixtures - Parametrize duplicate image generation and error mapping tests - Add tests for exception wrapping across all bridge methods * Use contextlib to dry out some code * Address Greptile feedback: HTTP-date retry-after parsing, docstring clarity - Parse RFC 7231 HTTP-date strings in Retry-After header (used by Azure and Anthropic during rate-limiting) in addition to numeric delay-seconds - Clarify collect_non_none_optional_fields docstring explaining why f.default is None is the correct check for optional field forwarding - Add tests for HTTP-date and garbage Retry-After values * Address Greptile feedback: FastAPI detail parsing, comment fixes - Fix misleading comment about prompt field defaults in _IMAGE_EXCLUDE - Handle list-format detail arrays in _extract_structured_message for FastAPI/Pydantic validation errors - Document scope boundary for vision content in collect_raw_image_candidates * add PR-2 architecture notes for model facade overhaul * save progress on pr2 * small refactor * address feedback * Address greptile comment in pr1 * refactor ProviderError from dataclass to regular Exception - Replace @dataclass + __post_init__ with explicit __init__ that calls super().__init__ properly, avoiding brittle field-ordering dependency - Store cause via __cause__ only, removing the redundant .cause attr - Update match pattern in handle_llm_exceptions for non-dataclass type - Rename shadowed local `fields` to `optional_fields` in TransportKwargs * Address greptile feedback * PR feedback * track usage tracking in finally block for images * pr feedback * add native OpenAI adapter with retry and throttle infrastructure - Implement OpenAICompatibleClient using httpx with RetryTransport - Add ThrottleManager with AIMD concurrency control and structured logging - Route provider_type=openai to native adapter in client factory - Add extract_reasoning_content helper for vLLM field migration - Make ModelRegistry own ThrottleManager and RetryConfig explicitly - Support DATA_DESIGNER_MODEL_BACKEND=litellm_bridge env var override Made-with: Cursor * Self CR * fix claude slop * Updates after self-review. Simplify use of ThrottleManager in light of plan 346 scheduler * wrap facade close in try/catch * clean up stray params * fix: address review findings from model facade overhaul PR3 - Fix metadata bug: drop unknown kwargs instead of passing them to ChatCompletionRequest (which has no metadata field), preventing a runtime TypeError. - Lazy-init httpx clients: sync and async clients are created on first use instead of eagerly in the constructor, with constructor injection for testability. - Remove defensive getattr on httpx.Response.status_code (always present). - Add comment clarifying throttle manager wiring is deferred. - Refactor tests to use constructor injection instead of private attribute mutation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix stray inclusion of metadata * small regression fix * address more feedback * self review * Fixes * new test for aimd lifecycle * update plan docs * update plans with refs to prs * fix: cap acquire_sync/acquire_async sleep to remaining budget to prevent timeout overshoot Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test lay init * fix timeout for openaicompatibleadapter * remove unused attr * fix: address review findings from PR #402 - Guard reasoning_content fallback with isinstance(str) check to prevent non-string provider values from violating the return type contract - Normalize provider_type comparison to lowercase so mixed-case configs (e.g. "OpenAI") route to the native adapter instead of silently falling through to the bridge - Document register-before-acquire ordering invariant on ThrottleManager - Add TODO to remove 429 from retryable_status_codes once ThrottleManager is wired via AsyncTaskScheduler (plan 346) Made-with: Cursor * Address pr feedback * fix method order * feat: native Anthropic adapter with image block translation Implement AnthropicClient as a native httpx-based adapter for the Anthropic Messages API, following the same patterns as OpenAICompatibleClient. Route provider_type="anthropic" to the native adapter in the client factory, with unknown providers falling through to LiteLLMBridgeClient. Key adapter behaviors: - System message extraction to top-level `system` parameter - OpenAI image_url content blocks translated to Anthropic image/source format - tool_use / thinking content blocks normalized to canonical types - x-api-key auth (not Bearer), anthropic-version header - max_tokens defaults to 4096 when omitted - stop mapped to stop_sequences - Embeddings and image generation raise unsupported capability errors Made-with: Cursor * fix: exclude OpenAI-specific params from Anthropic payload and drop unnecessary re.DOTALL Expand _ANTHROPIC_EXCLUDE to filter response_format, frequency_penalty, presence_penalty, and seed from TransportKwargs forwarding. Remove the unnecessary re.DOTALL flag from _DATA_URI_RE since base64 data never contains newlines. Made-with: Cursor * Fix failing test * fix: address PR #402 review findings - Add POST to RetryTransport allowed_methods so retries actually fire for all adapter endpoints (chat, embeddings, images are all POST) - Guard close()/aclose() with _init_lock to prevent TOCTOU race with lazy client initialization - Detect "unsupported parameter" patterns in 400 responses so the native path returns UNSUPPORTED_PARAMS instead of generic BAD_REQUEST - Return None from coerce_message_content for image-only content lists instead of leaking the Python repr via str(content) - Restore per-request timeout forwarding for LiteLLMBridgeClient via _with_timeout helper (broken when "timeout" was added to _META_FIELDS) - Normalize ModelProvider.provider_type to lowercase via field_validator so consumers don't need per-site .lower() calls - Fix unclosed httpx.AsyncClient in lazy-init test Made-with: Cursor * updates to DRY out code between the two adapters * refactor code to introduce HttpModelClient * update tests * fix: address PR #426 review findings - Release _aclient in close() to prevent async connection pool leak when both sync and async clients were initialized - Drop malformed image_url blocks (missing image_url key) instead of forwarding them unchanged to the Anthropic API - Preserve image blocks in system messages by returning Anthropic block-list format when non-text content is present - Rename extract_system_text to extract_system_content and add merge_system_parts helper for mixed string/block system parts Made-with: Cursor * fix: improve error classification and surface provider messages for 400s - Handle /v1 in Anthropic endpoint gracefully to avoid path duplication - Add QUOTA_EXCEEDED provider error kind for credit/billing failures - Extend UNSUPPORTED_PARAMS detection for mutually exclusive params - Surface raw provider message in formatted errors for 400 status codes - Consolidate provider message helpers into single _attach_provider_message Made-with: Cursor * fix: address PR #426 review findings (round 2) - Fix close() double-close of shared transport by closing self._transport directly instead of accessing private aclient._transport (critical) - Add TODO for threading.Lock → asyncio.Lock split (plan-346) - Remove unused _model_id from HttpModelClient and all callers - Export AnthropicClient from adapters __init__.py - Filter empty text blocks in translate_tool_result_content join - Move mock helpers to conftest.py with consistent json.dumps text default - Add __init__.py files to enable absolute imports from test conftest - Add bridge env override test for anthropic provider - Add ConnectionError and non-JSON response tests for AnthropicClient - Assert secret_resolver.resolve called with correct key ref in factory tests Made-with: Cursor * Update license headers * Fix anthropic tool call flow * fix: add explicit UNSUPPORTED_CAPABILITY error mapping - Add ModelUnsupportedCapabilityError so unsupported operations (e.g. Anthropic embeddings/image-generation) surface a specific error instead of falling through to generic ModelAPIError - Forward the provider's operation-specific message in the cause - Add parametrized test case for the new error path |
||
|
|
12baad776d
|
feat: Refactor person data reading for client ddb connection control (#393) | ||
|
|
c88508790f
|
test: trim FileSystemSeedReader follow-up coverage (#432) | ||
|
|
61fa0150f7
|
fix: support nested field access in schema transform templates (#435)
* feat: support nested field access in schema transform templates
Enable {{ result.quality.score }} style dot notation in schema
transform Jinja2 templates, where result is a deserialized JSON column.
Previously, _json_escape_record flattened all dict values to escaped
JSON strings before Jinja2 saw them. This made the rendered output
valid JSON but prevented nested access since Jinja2 only saw strings.
The fix introduces TemplateValue, a wrapper that defers the choice
between "drill into nested dict" and "render as escaped string" to
template evaluation time. Jinja2 resolves dot notation via __getattr__
(returning a new TemplateValue for the nested value), and converts to
string via __str__ (delegating to a caller-provided str_fn). This is
necessary because plain dicts render as Python repr ({'key': 'val'})
which is invalid JSON - we need to control __str__ to produce properly
escaped JSON, and that requires a wrapper object.
Other Jinja2 consumers (prompt templates, expression columns) don't
need this - Jinja2 natively supports dot access on plain dicts via
getattr-to-getitem fallback, and plain str() is fine for text output.
Schema transform is unique because its output must be valid JSON.
* Address PR review comments
- Fix boolean serialization: add bool check before str in _escape_value_for_json
to produce JSON 'true'/'false' instead of Python 'True'/'False'
- Add class-level _record_str_fn annotation to WithJinja2UserTemplateRendering
- Rename skip_record_sanitization to _skip_record_sanitization (underscore prefix)
to signal internal-only usage, and document it in safe_render docstring
- Add defensive error handling in TemplateValue.__getitem__ and __iter__
- Promote test input data to parametrize column, removing brittle string scan
* Address second round of PR review comments
- Add __eq__ and __hash__ to TemplateValue so Jinja2 equality
conditionals (e.g. {% if result.label == "excellent" %}) work
- Add inline comment explaining deliberate double-encode in
_escape_value_for_json for dict/list values
- Default _record_str_fn to None at class level so accessing it
before prepare_jinja2_template_renderer doesn't mask the real error
* refactor: replace TemplateValue with Jinja2 finalize hook
Use Jinja2's built-in finalize hook for value-to-string conversion
and a getattr override for dict-key-priority lookup, eliminating the
custom TemplateValue wrapper class entirely.
* docs: add missing param docs to prepare_jinja2_template_renderer
|
||
|
|
3d33680b7d
|
feat: support 1-to-many FileSystemSeedReader hydration (#424) | ||
|
|
255e78da83
|
feat: Native OpenAI adapter with retry and AIMD throttle infrastructure (#402)
* plans for model facade overhaul * update plan * add review * address feedback + add more details after several self reviews * update plan doc * address nits * Add cannonical objects * self-review feedback + address * add LiteLLMRouter protocol to strongly type bridge router param Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * simplify some things * add a protol for http response like object * move HttpResponse * update PR-1 architecture notes for lifecycle and router protocol Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Address PR #359 feedback: exception wrapping, shared parsing, test improvements - Wrap all LiteLLM router calls in try/except to normalize raw exceptions into canonical ProviderError at the bridge boundary (blocking review item) - Extract reusable response-parsing helpers into clients/parsing.py for shared use across future native adapters - Add async image parsing path using httpx.AsyncClient to avoid blocking the event loop in agenerate_image - Add retry_after field to ProviderError for future retry engine support - Fix _to_int_or_none to parse numeric strings from providers - Create test conftest.py with shared mock_router/bridge_client fixtures - Parametrize duplicate image generation and error mapping tests - Add tests for exception wrapping across all bridge methods * Use contextlib to dry out some code * Address Greptile feedback: HTTP-date retry-after parsing, docstring clarity - Parse RFC 7231 HTTP-date strings in Retry-After header (used by Azure and Anthropic during rate-limiting) in addition to numeric delay-seconds - Clarify collect_non_none_optional_fields docstring explaining why f.default is None is the correct check for optional field forwarding - Add tests for HTTP-date and garbage Retry-After values * Address Greptile feedback: FastAPI detail parsing, comment fixes - Fix misleading comment about prompt field defaults in _IMAGE_EXCLUDE - Handle list-format detail arrays in _extract_structured_message for FastAPI/Pydantic validation errors - Document scope boundary for vision content in collect_raw_image_candidates * add PR-2 architecture notes for model facade overhaul * save progress on pr2 * small refactor * address feedback * Address greptile comment in pr1 * refactor ProviderError from dataclass to regular Exception - Replace @dataclass + __post_init__ with explicit __init__ that calls super().__init__ properly, avoiding brittle field-ordering dependency - Store cause via __cause__ only, removing the redundant .cause attr - Update match pattern in handle_llm_exceptions for non-dataclass type - Rename shadowed local `fields` to `optional_fields` in TransportKwargs * Address greptile feedback * PR feedback * track usage tracking in finally block for images * pr feedback * add native OpenAI adapter with retry and throttle infrastructure - Implement OpenAICompatibleClient using httpx with RetryTransport - Add ThrottleManager with AIMD concurrency control and structured logging - Route provider_type=openai to native adapter in client factory - Add extract_reasoning_content helper for vLLM field migration - Make ModelRegistry own ThrottleManager and RetryConfig explicitly - Support DATA_DESIGNER_MODEL_BACKEND=litellm_bridge env var override Made-with: Cursor * Self CR * fix claude slop * Updates after self-review. Simplify use of ThrottleManager in light of plan 346 scheduler * wrap facade close in try/catch * clean up stray params * fix: address review findings from model facade overhaul PR3 - Fix metadata bug: drop unknown kwargs instead of passing them to ChatCompletionRequest (which has no metadata field), preventing a runtime TypeError. - Lazy-init httpx clients: sync and async clients are created on first use instead of eagerly in the constructor, with constructor injection for testability. - Remove defensive getattr on httpx.Response.status_code (always present). - Add comment clarifying throttle manager wiring is deferred. - Refactor tests to use constructor injection instead of private attribute mutation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix stray inclusion of metadata * small regression fix * address more feedback * self review * Fixes * new test for aimd lifecycle * update plan docs * update plans with refs to prs * fix: cap acquire_sync/acquire_async sleep to remaining budget to prevent timeout overshoot Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test lay init * fix timeout for openaicompatibleadapter * remove unused attr * fix: address review findings from PR #402 - Guard reasoning_content fallback with isinstance(str) check to prevent non-string provider values from violating the return type contract - Normalize provider_type comparison to lowercase so mixed-case configs (e.g. "OpenAI") route to the native adapter instead of silently falling through to the bridge - Document register-before-acquire ordering invariant on ThrottleManager - Add TODO to remove 429 from retryable_status_codes once ThrottleManager is wired via AsyncTaskScheduler (plan 346) Made-with: Cursor * Address pr feedback * fix method order * Fix failing test * fix: address PR #402 review findings - Add POST to RetryTransport allowed_methods so retries actually fire for all adapter endpoints (chat, embeddings, images are all POST) - Guard close()/aclose() with _init_lock to prevent TOCTOU race with lazy client initialization - Detect "unsupported parameter" patterns in 400 responses so the native path returns UNSUPPORTED_PARAMS instead of generic BAD_REQUEST - Return None from coerce_message_content for image-only content lists instead of leaking the Python repr via str(content) - Restore per-request timeout forwarding for LiteLLMBridgeClient via _with_timeout helper (broken when "timeout" was added to _META_FIELDS) - Normalize ModelProvider.provider_type to lowercase via field_validator so consumers don't need per-site .lower() calls - Fix unclosed httpx.AsyncClient in lazy-init test |
||
|
|
0b324478f1
|
feat: add AsyncTaskScheduler and RowGroupBufferManager for async engine (#404)
* feat: add AsyncTaskScheduler and RowGroupBufferManager for async engine
* test: add retryable salvage and eager row-drop scheduler tests
Add two missing test cases for AsyncTaskScheduler:
- Transient 503 failure deferred to salvage round and recovered
- Downstream column never dispatched for rows dropped by upstream failure
Update plan checkboxes to reflect PR 3 completion status.
* fix: address Greptile review findings on scheduler and buffer manager
- Non-retryable batch/from_scratch failure now drops all rows in the
row group so it reaches a terminal state and gets checkpointed
- Salvage rounds re-dispatch from_scratch tasks directly instead of
relying on the frontier (which never contains them)
- Log error if scheduler exits with unfinished row groups
- update_batch raises ValueError on size mismatch
- Free _row_group_sizes entry on checkpoint
- Add row-count mismatch warning in _run_batch writeback
- Document intentional stateful lock ordering in _dispatch_seeds
* refactor: remove dead _seed_frontier method from CompletionTracker
* refactor: restore seed_frontier as public opt-in method on CompletionTracker
Keep the tracker self-contained for static introspection (capacity
planning, task enumeration) without requiring a scheduler. The scheduler
does not call it - it manages root dispatch directly for stateful locks
and multi-column dedup.
* fix: complete salvage path - stateful locks, frontier drain, checkpoint sweep
- Acquire stateful lock before re-dispatching from_scratch tasks in
salvage (prevents RuntimeError on lock release)
- Replace single-pass salvage drain with _drain_frontier() that
re-checks the frontier after each task completes (ensures downstream
tasks enqueued during retry are dispatched)
- Add post-salvage checkpoint sweep via _checkpoint_completed_row_groups
(row groups completed during salvage are now written to parquet)
- Extract _checkpoint_completed_row_groups helper, reused by both the
main loop and post-salvage sweep
* fix: checkpoint per salvage round, simplify deferred list, fire on_complete for empty groups
- Move _checkpoint_completed_row_groups inside the salvage loop so
completed row groups are flushed promptly between rounds
- Simplify _deferred from list[tuple[Task, int]] to list[Task] since
the attempt count was unused (salvage_max_rounds bounds retries)
- Fire on_complete(None) when all rows in a row group are dropped so
callers always receive a completion signal
* fix: address PR review - _is_retryable, semaphore safety, dispatched pruning
Replace string-matching _is_retryable with isinstance checks against
typed ModelXxxError exceptions. Add try/finally around checkpoint to
protect _rg_semaphore.release() from deadlocks. Prune completed tasks
from _dispatched to prevent unbounded growth. Re-register batch_alias
in salvage dispatch. Replace assert with ValueError in _run_cell.
Parameterize on_complete Callable signature. Add test for
on_complete(None) when all rows dropped.
* fix: guard against dispatching tasks for checkpointed row groups
When a from_scratch task fails non-retryably and all rows are dropped,
downstream full_column tasks become vacuously ready and enter the
frontier. If dispatched in the same loop iteration that checkpoints
the row group, the task's coroutine runs against a freed buffer
(KeyError). The guard in _execute_task_inner skips such tasks early,
and a `skipped` flag keeps them in _dispatched to prevent re-dispatch.
Also addresses remaining review feedback:
- agenerate({}) fallback now passes empty DataFrame
- dict comprehensions simplified to dict(row_groups)
- get_row return type annotated as dict[str, Any]
- configs parameter fully typed in test helper
- mock generators use self.config.name
* fix: raise on batch size mismatch, isolate checkpoint failures
- _run_batch: raise ValueError on row count mismatch instead of warning,
matching the update_batch path and preventing silent data corruption.
- _checkpoint_completed_row_groups: catch per-RG exceptions so one failed
checkpoint doesn't block subsequent row groups or leak semaphore slots.
|
||
|
|
28c8345909
|
feat: add built-in filesystem seed readers (#421) | ||
|
|
5783435979
|
feat: Improve generation failure reporting for schema and timeout failures (#416) | ||
|
|
26a9cf23ac
|
feat: normalize validator and constraint discriminators (#414)
* feat: normalize validator and constraint discriminators * docs: add docstring and comment to Constraint base class Address Greptile review feedback: - Add docstring to Constraint noting it should not be instantiated directly - Add comment explaining the rhs fallback behavior in the resolver * refactor: restore ABC on Constraint base class * refactor: add explicit None guard in constraint resolver * Fix legacy numeric sampler constraint detection * fix: address PR review feedback from nabinchha - Guard _can_coerce_to_float against inf/nan strings - Add -> None return type annotations to test functions - Add clarifying comments to ColumnConstraintT vs ColumnConstraintInputT - Add tests for tagged constraint round-trip and missing rhs validation |
||
|
|
d6b8433e25
|
fix: preserve extra_body for LiteLLM to avoid UnsupportedParamsError (#409) (#412)
* fix: preserve extra_body for LiteLLM to avoid UnsupportedParamsError (#409) TransportKwargs.from_request() flattened extra_body keys into top-level kwargs, causing LiteLLM to reject provider-specific params like reasoning_effort via its per-provider allowlist validation. Add a flatten_extra_body flag (default True for backward compat) so the LiteLLM bridge can opt out and preserve extra_body as a distinct kwarg that LiteLLM forwards without validation. Made-with: Cursor * fix: address PR #412 review comments Update stale docstrings in TransportKwargs and facade.py to reflect the new flatten_extra_body flag, and add an edge-case test for empty extra_body with flatten disabled. |