Commit graph

86 commits

Author SHA1 Message Date
Nabin Mulepati
2a487cdc5c
feat: add dropped column preservation toggle (#691)
* feat: add dropped column preservation toggle

Closes #690

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>

* fix: reject dropped column policy resume mismatch

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>

---------

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>
2026-05-21 13:19:20 -06:00
Eric W. Tramel
c0a4dcbb85
feat: implement async scheduling admission control (#661)
Some checks are pending
CI / End to end test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test Config (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Config (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Config (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Coverage Check (Python 3.11) (push) Blocked by required conditions
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
2026-05-20 20:58:05 -04:00
Nabin Mulepati
a83968f701
feat: preserve multimodal MCP tool results (#689)
Some checks are pending
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Coverage Check (Python 3.11) (push) Blocked by required conditions
CI / End to end test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Lint and Format Check (push) Blocked by required conditions
CI / Check License Headers (push) Blocked by required conditions
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
* feat: preserve multimodal MCP tool results

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>

* fix: gate MCP generic image payload detection

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>

* fix: validate MCP image payload blocks

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>

---------

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>
2026-05-20 14:49:22 -06:00
Nabin Mulepati
0860d62e23
fix: restore chat completion multi-choice support (#672)
* fix chat completion multi-choice support

Restore the chat completion n request field and preserve all returned choices in the canonical response while keeping response.message as the first choice.

Add coverage for request forwarding, compatibility access, multi-choice parsing, and generate forwarding.

Fixes #620

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>

* strip n from generate requests

Prevent generate and agenerate from forwarding multi-choice requests that they cannot expose, while keeping completion() multi-choice support intact.

Add coverage for async parsing and Anthropic n exclusion.

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>

* strip configured n from generate requests

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>

* rename multiple choice completion flag

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>

* move choice sanitizer to private helpers

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>

* order private facade helpers

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>

---------

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>
2026-05-20 11:40:10 -06:00
Nabin Mulepati
bd0410bb05
fix(engine): actionable error when a Jinja field is missing/None/empty (#633)
Some checks are pending
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Coverage Check (Python 3.11) (push) Blocked by required conditions
CI / End to end test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Lint and Format Check (push) Blocked by required conditions
CI / Check License Headers (push) Blocked by required conditions
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
* fix(engine): actionable error when a Jinja field is missing/None/empty

Empty-render and missing-attribute failures used to surface as the
generic "User provided prompt generation template is invalid." either
because `sanitize_user_exceptions` stripped the detail or because
Jinja's raw `UndefinedError` leaked through. Both now raise a new
`EmptyTemplateRenderError` carrying a row-level diagnostic that names
the offending chain and includes copy-pasteable Jinja conditional and
SkipConfig fix patterns.

Closes #629.

* fix(engine): address PR review feedback on EmptyTemplateRenderError

Addresses the open review comments on #633:

1. (Greptile P1) Gate expression in the suggested remediation template
   was one accessor too deep when the root variable was entirely absent
   from the record, causing the suggested fix to itself raise
   UndefinedError. Fall back to gating on the root name alone when
   sample_name is not in record.

2. (andreatgretel) The AST walker reported loop-local names as missing
   culprits (e.g. ``person`` in ``{% for person in people %}...{% endfor %}``).
   Filter extracted chains through ``meta.find_undeclared_variables`` to
   defer to Jinja's canonical scope tracking.

3. (andreatgretel follow-up) Empty collections used as loop iterables
   (``items=[]``) fell through to the no-culprit fallback. Add a new
   ``_CULPRIT_EMPTY_COLLECTION`` classification so they're surfaced.

4. Minor: add ``from exception`` to ``safe_render``'s UndefinedError
   re-raise for traceback consistency with the native engine path, and
   add a note on the load-bearing exception ordering in
   ``sanitize_user_exceptions``.
2026-05-20 09:51:21 -06:00
Andre Manoel
6055290136
feat: add workflow chaining (#636)
Some checks are pending
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Coverage Check (Python 3.11) (push) Waiting to run
CI / End to end test (Python 3.10 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.11 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.12 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.13 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Lint and Format Check (push) Waiting to run
CI / Check License Headers (push) Waiting to run
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
Publish Fern devnotes / deploy (push) Waiting to run
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
* feat: add workflow chaining

* test: tidy workflow chaining coverage

* fix: harden workflow chaining concurrency

* docs: update workflow chaining plan

* feat: add workflow stage postprocessors

* feat: expose workflow stage outputs

* fix: align workflow selected output export

* fix: address workflow chaining review issues

* fix: align workflow parquet export selection

Signed-off-by: Andre Manoel <amanoel@nvidia.com>

* fix: preserve generated columns in drop validation

Signed-off-by: Andre Manoel <amanoel@nvidia.com>

* fix: clarify workflow output processors

Signed-off-by: Andre Manoel <amanoel@nvidia.com>

* docs: add workflow chaining page

* docs: align workflow chaining warning

* fix: address workflow review nits

---------

Signed-off-by: Andre Manoel <amanoel@nvidia.com>
2026-05-18 20:15:47 -03:00
Nabin Mulepati
71997624b3
feat: track reasoning token usage (#670)
Some checks are pending
CI / Test Engine (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Coverage Check (Python 3.11) (push) Waiting to run
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Config (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Config (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Config (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Config (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Config (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Config (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Config (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run
* feat: track reasoning token usage

Capture provider-reported reasoning-token breakdowns alongside output tokens without changing output token totals. Carry the field through model usage aggregation and add coverage for parsing, facade tracking, and deltas.

Refs #665

* fix: show reasoning tokens in usage summary

Include reasoning token counts in the local model usage summary while preserving output and total token semantics. Telemetry remains unchanged.

Refs #665

* fix: estimate missing reasoning token counts

When providers return reasoning content without a numeric usage breakdown, estimate reasoning tokens from that content while preserving provider-reported output and total token counts.

Refs #665

* fix: track reasoning token count source

* fix: simplify reasoning token source

* fix: omit unknown reasoning tokens from logs

* refactor: clarify reasoning token count helpers

* test: move token counting tests

* fix: enforce reasoning token source

* fix: address reasoning usage review
2026-05-18 12:15:31 -06:00
Eric W. Tramel
a4085c441a
feat: add AIMD startup ramp (#638)
Some checks failed
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.10 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.11 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.12 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.13 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.10 on macos-latest) (push) Has been cancelled
CI / Coverage Check (Python 3.11) (push) Has been cancelled
CI / End to end test (Python 3.11 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.12 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.13 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Lint and Format Check (push) Has been cancelled
CI / Check License Headers (push) Has been cancelled
Publish devnotes / deploy (push) Has been cancelled
Publish Fern devnotes / deploy (push) Has been cancelled
CI / Test (Python 3.10 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.12 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.13 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.11 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.11 on ubuntu-latest) (push) Has been cancelled
2026-05-13 16:25:03 -04:00
Eric W. Tramel
0fdea845ac
feat: add fair async task scheduling (#639) 2026-05-13 13:47:45 -04:00
Nabin Mulepati
bbcd7d3995
fix: harden resume checkpoint handling (#624)
Some checks are pending
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Coverage Check (Python 3.11) (push) Waiting to run
CI / End to end test (Python 3.11 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.10 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.12 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.13 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
* fix: harden resume checkpoint handling

Persist config identity in metadata, make checkpoints atomic, and reject unsafe resume states so interrupted runs do not mix incompatible or post-processed data.

* fix: close resume edge cases

Let IF_POSSIBLE start fresh for resize configs and mark after-generation processing before mutation so interrupted processors cannot be resumed unsafely.

* refactor: drop dataset directory lock

Single-user CLI/notebook flows don't race on the artifact directory, and
the timestamped-directory fallback already handles the "ran it twice"
case. The lock added complexity (re-entrancy, stale cleanup, the
cached-property trap where IF_POSSIBLE→NEVER moves writes to a
timestamped directory while the lock stays pinned to the original) for
no real protection. Atomic metadata writes still cover the actual hazard
(crash mid-write).

Also fix a pre-existing test bug in
test_initial_actual_num_records_uses_actual_parquet_rows_for_partial_row_group
where the mocked scheduler hit the partial-completion path with
unconfigured Mock attributes.

* fix: address Greptile review on resume edge cases

* Drop the unreachable ResumeMode.IF_POSSIBLE branch in
  _post_generation_processed_resume_result. By the time this helper
  runs, build() has normalised IF_POSSIBLE to ALWAYS or NEVER, so the
  guard now matches reality. Tighten the docstring to document the
  three outcomes (no-op return / fall through / raise).

* Split the post-processed extension/raise into two cases. When
  num_records < prior_target the user just asked for fewer records than
  already exist; the previous "would mix pre- and post-processor
  records" message only describes the extension case. Mirror the
  wording used by _load_resume_state and add a regression test.

* Remove the dead _find_completed_row_group_ids wrapper now that
  _build_async uses _find_completed_row_groups directly. Rename the
  related test to match.

* refactor: unify sync + async resume around filesystem-derived progress

Both engines now derive `num_completed_batches` and `actual_num_records`
from `parquet-files/batch_*.parquet` via `_recover_progress_from_disk`.
`metadata.json` keeps describing the run *configuration* (`buffer_size`,
`target_num_records`, `original_target_num_records`, config fingerprint),
while the filesystem is the source of truth for *progress*. This closes
the sync engine's race window between `move_partial_result_to_final_file_path`
and the metadata write that follows it, matching the crash-recovery the
async engine already had.

The sync engine additionally rejects non-contiguous batch IDs (a hole can
only mean external mutation or a directory written by an incompatible
engine); the async engine continues to tolerate gaps from out-of-order
completion via `allow_holes=True`.

Existing sync resume tests now seed parquet files alongside metadata,
and two new tests cover the unified behaviour: filesystem progress wins
when metadata lags, and sync rejects non-contiguous IDs.

* docs: clarify DatasetCreationResults observability scope on resume

`load_dataset`, `count_records`, `load_analysis`, `export`, and `push_to_hub`
all read from the artifact directory, so they reflect the cumulative dataset
(original + resume rows). `task_traces`, model-usage logs, and telemetry
events are scoped to the current invocation only because the original run's
in-memory state is not persisted. Document this in the class docstring,
the architecture note, and the Fern resume guide.

* docs: explain DeprecationWarning re-raise in create()/preview()

Future readers were puzzled by the ``except DeprecationWarning: raise``
short-circuits before the generic generation-error wrappers. Add a
comment in ``create()`` (with a back-reference from ``preview()``) to
record that strict warning filters (``pytest.warns``,
``-W error::DeprecationWarning``) turn the engine's
``warnings.warn(..., DeprecationWarning)`` calls — most notably the
``allow_resize=True`` deprecation in ``_resolve_async_compatibility`` —
into raised exceptions, and we want them to surface untouched instead of
being swallowed by ``DataDesignerGenerationError``.

* fix: close after-generation crash window and tighten metadata typing on resume

Address review feedback on resume hardening:

* Run after-generation processors unconditionally on the on-disk dataset
  rather than gating on the generation return value. The previous gate
  silently skipped after-generation when resume saw every row group
  already on disk, leaving a crash window between the final parquet write
  and the ``post_generation_state="started"`` marker write: in that
  window the dataset is complete but after-generation never ran, and the
  on-disk parquet files are still clean. The "started" short-circuit
  still rejects the other direction (crashed mid-rewrite, ambiguous
  state), so resume only re-runs after-generation when it is safe to do
  so.

* Raise ``DatasetGenerationError`` (instead of letting a raw
  ``TypeError`` leak out of ``num_records < prior_target``) when a
  post-processed dataset's metadata is missing ``target_num_records``.
  Mirrors the wording used by ``_load_resume_state``.

* Document the new behaviour in ``architecture/dataset-builders.md`` and
  the Fern resume invariants.

Tests:

* ``test_build_resume_complete_dataset_runs_after_generation_when_no_marker``
  covers the closed crash window via the public ``set_processor_runner``
  API.
* ``test_build_resume_post_generation_processed_missing_target_raises_clearly``
  covers the typed-error gap.
2026-05-11 11:44:46 -06:00
Nabin Mulepati
4b93f5b245
feat: let column configs declare all model aliases for the startup health check (#626)
* feat(engine): let column configs declare all model aliases for the startup health check

Plugin column configs that depend on more than one model alias (generator + judge,
critic, etc.) previously could not opt their secondary aliases into the standard
startup health check, and configs without a `model_alias` field crashed the
collection loop with AttributeError.

Add `SingleColumnConfig.get_model_aliases()` as the single override hook the
builder uses to enumerate aliases. The default returns the column's primary
`model_alias` (if any), so built-in LLM, embedding, and image columns work
unchanged. `CustomColumnConfig` overrides it to surface decorator-declared
aliases, replacing the special-case `isinstance` branch in the builder. Plugin
configs with multiple model fields override it to opt every endpoint into the
health check.

Fixes #606

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>

* fix(config): forward empty model_alias to startup health check

SingleColumnConfig.get_model_aliases() used `if alias` to filter, which
also dropped empty-string aliases. Empty model_alias values are accepted
by the config model and previously reached run_health_check, where they
failed fast with "No model config with alias '' found!". Treating them
as "no model endpoints" silently delayed that error to first generation.

Use `alias is not None` so only a truly missing attribute skips the
health check, and add a regression test that exercises an empty-string
model_alias on a built-in config.

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>

---------

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>
2026-05-11 11:33:50 -06:00
Przemysław Boruta
810c681f7a
feat: resume interrupted dataset generation runs (sync + async engine) (#526)
Some checks failed
CI / Test Interface (Python 3.12 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.13 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Coverage Check (Python 3.11) (push) Has been cancelled
CI / End to end test (Python 3.10 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.11 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.12 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.13 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Lint and Format Check (push) Has been cancelled
CI / Check License Headers (push) Has been cancelled
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.10 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.11 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.10 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.11 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.12 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.13 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.13 on ubuntu-latest) (push) Has been cancelled
* docs: add implementation plan for resume mechanism

Fixes #525

* feat(storage): add resume flag and clear_partial_results()

- ArtifactStorage gains a `resume: bool = False` field
- resolved_dataset_name skips timestamp logic when resume=True,
  returning the existing dataset folder name as-is
- Raises ArtifactStorageError on resume=True when the target folder
  is absent or empty (no data to resume from)
- New clear_partial_results() removes in-flight partial results
  left over from an interrupted run

Fixes #525

* feat(batch-manager): add start_batch param to start()

DatasetBatchManager.start() now accepts:
- start_batch: int = 0  — first batch index to process
- initial_actual_num_records: int = 0  — records already on disk

Both default to 0 so all existing call sites are unaffected.

Fixes #525

* feat(builder): implement resume logic in DatasetBuilder

- build() gains a resume: bool = False parameter
- _load_resume_state() reads metadata.json and validates that
  num_records and buffer_size match the original run
- _build_with_resume() skips completed batches, clears in-flight
  partial results, and continues from the first incomplete batch
- Raises DatasetGenerationError with clear messages for:
  - missing metadata.json (interrupted before first batch completes)
  - num_records mismatch
  - buffer_size mismatch
  - DATA_DESIGNER_ASYNC_ENGINE=1 (not yet supported)
- Logs a warning and returns early when dataset is already complete

Fixes #525

* feat(interface): expose resume on DataDesigner.create()

- create() gains resume: bool = False
- _create_resource_provider() passes resume to ArtifactStorage
- builder.build() receives the resume flag

Fixes #525

* test: add tests for resume mechanism

Covers:
- ArtifactStorage.resolved_dataset_name with resume=True
- ArtifactStorage.clear_partial_results()
- DatasetBatchManager.start() with start_batch and
  initial_actual_num_records
- DatasetBuilder.build(resume=True): missing metadata, num_records
  mismatch, buffer_size mismatch, already-complete detection

Fixes #525

* feat(builder): extend resume to async engine (DATA_DESIGNER_ASYNC_ENGINE=1)

- Add _find_completed_row_group_ids() to scan parquet-files/ for already-written
  row groups by parsing batch_*.parquet filenames
- _build_async() now accepts resume=True: loads metadata, finds completed row groups,
  clears partial results, and logs progress; returns early if all row groups are done
- _prepare_async_run() accepts skip_row_groups, initial_actual_num_records, and
  initial_total_num_batches so the scheduler only processes remaining row groups
  and RowGroupBufferManager starts from the correct counts
- RowGroupBufferManager.__init__ gains initial_actual_num_records and
  initial_total_num_batches params to seed the counters on resume
- finalize_row_group closure now writes incremental metadata after each checkpoint
  so any run (resume or not) can be resumed if interrupted mid-way
- Remove the guard that rejected resume=True with DATA_DESIGNER_ASYNC_ENGINE=1
- Add tests for all new paths

* fix(builder): skip after-generation processors when resume finds dataset already complete

_build_with_resume and _build_async now return False when the dataset is already
complete (early-return path), True otherwise. build() skips
_processor_runner.run_after_generation() on False, preventing processors from
calling shutil.rmtree and rewriting an already-finalized dataset.

Fixes the issue raised in review: greptile P1 comment on PR #526.

* fix(builder): use filesystem count for initial_total_num_batches on async resume

Metadata can lag by one row group if a crash occurs between
move_partial_result_to_final_file_path and write_metadata. Using
len(completed_ids) from the filesystem scan instead of
state.num_completed_batches ensures the final metadata reflects the
actual number of parquet files present, not the potentially stale
metadata count.

* feat(results): add export() method and --output-format CLI flag

Adds DatasetCreationResults.export(path, format=) supporting jsonl,
csv, and parquet. The CLI create command gains --output-format / -f
which writes dataset.<format> alongside the parquet batch files.

* fix(builder): handle resume when metadata.json missing (interrupted before first batch)

When a run is interrupted before any row group or batch completes, metadata.json
is never written. Previously resume=True would raise DatasetGenerationError in
this case. Now build() detects the missing file, logs an info message, clears
any leftover partial results and falls back to a clean fresh run.

This is the common scenario for small datasets (fewer records than buffer_size)
where all records fit in a single row group.

* docs(interface): fix resume docstring — async engine is supported

* fix(builder): derive initial_actual_num_records from filesystem in async resume

In the crash window (row group written to disk but write_metadata crashed before
updating the file), both initial_total_num_batches and initial_actual_num_records
now use the filesystem-discovered completed_ids as source of truth.  Previously
initial_actual_num_records was read from potentially stale metadata, causing
actual_num_records in the final metadata to be undercounted by one row group.

Also adds a test covering the partial-resume crash-window scenario.

* feat(resume): replace resume: bool with ResumeMode enum (NEVER/ALWAYS/IF_POSSIBLE)

- Introduces ResumeMode(StrEnum) in artifact_storage.py for use across all layers
- Replaces resume: bool with resume: ResumeMode in DatasetBuilder.build(),
  DataDesigner.create(), ArtifactStorage, and _build_async()
- Adds _check_resume_config_compatibility() using config fingerprints to support
  IF_POSSIBLE: falls back to a fresh run when config has changed since last run
- Relaxes num_records validation from strict equality to num_records >= actual_num_records,
  allowing dataset extension on resume; buffer_size must still match exactly
- Preserves exception chain with 'from exc' on FileNotFoundError in _load_resume_state
- Exports ResumeMode from data_designer.interface for users to import
- Adds skip_row_groups assertion test and IF_POSSIBLE storage behavior tests

* fix(resume): invalidate resolved_dataset_name cache when IF_POSSIBLE downgrades to NEVER

ArtifactStorage's Pydantic model validator accesses base_dataset_path at
construction time, caching resolved_dataset_name under IF_POSSIBLE semantics
before build() can set resume=NEVER. Pop the stale cache entry so the property
re-resolves with the correct NEVER semantics (timestamped directory).

Also fixes _check_resume_config_compatibility() to use artifact_path/dataset_name
directly instead of base_dataset_path, and adds a regression test covering the
cache-bypass scenario.

* fix(builder): move partial-completion warning before return in _build_async

* fix(builder): IF_POSSIBLE now starts fresh when no dataset directory exists

_check_resume_config_compatibility returned True when config_path was absent,
even when the dataset directory itself didn't exist. This caused IF_POSSIBLE to
upgrade to ALWAYS, which then raised ArtifactStorageError on the first-ever run
because ALWAYS requires an existing directory.

Fix: return False early when the dataset directory is absent. Also sets
actual_num_records on mock buffer managers in two async resume tests that
started failing after the partial-completion warning block was made reachable.

* fix(builder): use original target_num_records in async resume record count

When extending a non-aligned run (e.g. original num_records=5, buffer_size=2),
the last completed row group has 1 record, not buffer_size=2. Using new num_records
in the formula would overcount: min(2, 7-2*2)=2 instead of min(2, 5-2*2)=1.

Fix: capture state from _load_resume_state (previously discarded) and pass
state.target_num_records into the sum formula. Added target_num_records field to
_ResumeState, populated from metadata.json.

Test: test_build_async_resume_initial_actual_num_records_uses_original_target

* fix(builder): IF_POSSIBLE starts fresh on empty dataset directory

Empty directory (crash between mkdir and first file write) was treated as
compatible — _check_resume_config_compatibility returned True, IF_POSSIBLE
upgraded to ALWAYS, which then raised ArtifactStorageError.

Fix: treat empty directory the same as missing — return False from
_check_resume_config_compatibility when any(dir.iterdir()) is False.

Test: test_if_possible_starts_fresh_when_directory_is_empty

* fix(builder): ALWAYS raises DatasetGenerationError on config fingerprint mismatch

ResumeMode.ALWAYS was documented to raise when column/model config changed, but
_check_resume_config_compatibility() was only called in the IF_POSSIBLE branch.
A user resuming with ALWAYS after changing the config would silently mix records
from two different configs.

Fix:
- Refactor _check_resume_config_compatibility() to return _ConfigCompatibility
  enum (COMPATIBLE / INCOMPATIBLE / NO_PRIOR_DATASET) instead of bool so callers
  can distinguish 'no prior run' from 'configs differ'
- Call the check for both ALWAYS and IF_POSSIBLE before _write_builder_config()
- ALWAYS + INCOMPATIBLE → DatasetGenerationError
- IF_POSSIBLE + INCOMPATIBLE → silent fresh start (existing behaviour)
- IF_POSSIBLE + NO_PRIOR_DATASET → silent fresh start (existing behaviour)

Test: test_build_resume_always_raises_on_config_mismatch

* fix(resume): address nabinchha review — drop export collision, add CLI flag, fix edge cases

C1: drop commit 0bdf24ab — remove export() / --output-format from this PR; that feature
    belongs to #540 which has a superior streaming implementation
C2: add --resume / -r flag to data-designer create CLI, thread ResumeMode through
    GenerationController.run_create() into DataDesigner.create()
C3: fix already-complete warning text — replace stale "Remove resume=True" with
    "Use resume=ResumeMode.NEVER" in _build_with_resume and _build_async
C4: fix docstrings — ALWAYS does NOT raise when no checkpoint exists (silently
    restarts from scratch); clarify num_records >= actual semantics
C5: sync artifact_storage.resume = NEVER when no-metadata fallback fires so both
    state holders agree after the downgrade
C6: fix return_value=False → _ConfigCompatibility.INCOMPATIBLE in IF_POSSIBLE test;
    drop 3 direct _find_completed_row_group_ids tests (private API, covered by build())
W1: add logger.warning when builder_config.json is absent (silent COMPATIBLE was footgun)
W2: narrow except Exception → (OSError, json.JSONDecodeError, ValidationError)
W3: run make check-all-fix — ruff reformatted test_if_possible_starts_fresh_when_directory_is_empty

* fix(builder): replace stdlib StrEnum with project compat shim for Python 3.10

* fix(builder): guard extension row groups in initial_actual_num_records formula on async resume

When extending an async run (num_records > state.target_num_records) and a crash
occurs after an extension row group is written to disk but before write_metadata,
the formula `min(buffer_size, state.target_num_records - rg_id * buffer_size)` yields
a negative value for any extension row group (rg_id * buffer_size >= target), making
initial_actual_num_records silently undercount. The RowGroupBufferManager then starts
at the wrong offset, and the final metadata reports an incorrect actual_num_records
with a false partial-completion warning.

Fix: use state.target_num_records for original row groups and num_records for extension
row groups (guarded by rg_id * buffer_size < state.target_num_records). Covers the
scenario with a new regression test.

* fix(builder): pre-compute row-group list in _build_async to fix sizes on non-aligned extension resume

The partitioning loop in _prepare_async_run decremented remaining by
min(buffer_size, remaining) for every row group, including skipped ones.
For a non-aligned original run (e.g. target=5, buffer_size=2, last group
has 1 record), the loop deducted 2 for the skipped last group, leaving
remaining one short.  Extension row groups received smaller sizes than
intended, so the generated dataset was silently short by the deficit and
a false partial-completion warning fired.

Fix: pre-compute the full row-group list with correct per-group sizes in
_build_async where state.target_num_records is available, then pass it to
_prepare_async_run as precomputed_row_groups (replacing the skip_row_groups
param). Original groups use min(buffer_size, target - rg*bs); extension
groups use min(buffer_size, extension_records - ext_idx*bs).

Also updates the skip_row_groups test to assert on precomputed_row_groups
and adds a regression test for the non-aligned extension case.

* chore: remove stale implementation plan for #525

The plan described the initial resume: bool design which has since been
replaced by the full ResumeMode enum (NEVER/ALWAYS/IF_POSSIBLE), async
engine support, filesystem reconciliation, and config compatibility checks.
The PR description is the authoritative record of what shipped.

* fix(engine): fix false 'already complete' when extension fits in last group's slack

original_target=5, buffer_size=2 produces 3 groups [2,2,1]. Extending to
num_records=6: ceil(6/2)=3 equalled len(completed_ids)=3, triggering the
already-complete branch on both the async and sync paths — returning the
5-record dataset silently.

Fix (async): replace ceil(num_records/bs) with
  num_original_groups + ceil(extension_records/bs)
so any extension always adds new groups beyond num_original_groups.

Fix (sync): add num_records_list param to DatasetBatchManager.start() and
pass the correct per-batch sizes in _build_with_resume, giving the batch
manager the right total batch count (4 instead of 3 in the example).

* fix(engine): raise error when num_records is below original target on resume

Prevents negative extension_records in async path which silently truncated
the dataset and corrupted metadata without triggering a partial-completion warning.

* fix(storage): refresh MediaStorage path after IF_POSSIBLE → NEVER downgrade

When build() detected an incompatible config and downgraded resume from
IF_POSSIBLE to NEVER, _media_storage.base_path remained bound to the
original directory while all other path properties resolved to the new
timestamped directory — causing broken image references in image-column runs.

* fix(engine): preserve original_target_num_records across extension resume writes

After finalize_row_group successfully wrote incremental metadata during an
extension run, target_num_records in metadata was updated to the extension
target. A subsequent resume would read this as the original target, making
_rg_size() incorrect for all row groups and silently corrupting actual_num_records.

Stores original_target_num_records as an immutable field in metadata so the
original group boundaries are always recoverable regardless of how many
incremental writes have occurred.

---------

Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>
2026-05-08 15:37:56 -06:00
Eric W. Tramel
8d4d59303d
fix: normalize rollout timestamps before deriving started_at/ended_at (#556) 2026-05-07 14:13:10 -04:00
Johnny Greco
9214637a5b
fix(engine): validate processor plugin impls (#609)
* fix(engine): validate processor plugin impls

Add the processor implementation base to assert_valid_plugin so
processor plugins are checked against Processor instead of only the
generic config contract. Keep plugin type validation table-driven and
raise explicit AssertionError messages so checks are not skipped under
optimized Python.

Signed-off-by: Johnny Greco <jogreco@nvidia.com>

* test(engine): require plugin base map coverage

---------

Signed-off-by: Johnny Greco <jogreco@nvidia.com>
2026-05-06 14:31:12 -04:00
Nabin Mulepati
f73da1975c
feat(models): deprecate implicit default provider routing (#594)
Some checks failed
CI / Test (Python 3.10 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.11 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.12 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.13 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Test Engine (Python 3.13 on macos-latest) (push) Has been cancelled
CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.10 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.11 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.12 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.13 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Coverage Check (Python 3.11) (push) Has been cancelled
CI / End to end test (Python 3.10 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.11 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.12 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.13 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Has been cancelled
* feat(models): deprecate implicit default provider routing

Emit DeprecationWarning whenever the legacy "implicit default
provider" path is exercised: `ModelConfig.provider=None`, the
registry-level `ModelProviderRegistry.default`, the YAML
`default:` key in `~/.data-designer/model_providers.yaml`, and
the CLI's "Change default provider" workflow.

`resolve_model_provider_registry` skips passing `default=` in the
single-provider case so the common construction path stays quiet.
Multi-provider registries still pass `default` (per
`check_implicit_default`) and warn accordingly.

Update docs, the package README, and test fixtures to specify
`provider=` explicitly on every `ModelConfig`. New tests cover
each warning entry point and pin the post-deprecation happy paths.

Refs #589

Made-with: Cursor

* fix(models): address PR #594 review feedback

Greptile P1: ProviderRepository.load emitted its DeprecationWarning
inside a `try/except Exception` block. Under
`filterwarnings("error", DeprecationWarning)` the warn would raise,
the except would swallow it, and `load()` would silently return None
(losing the registry). Move the warn outside the catch-all so the
strict-warning path no longer drops valid configs.

Greptile P2 / johnnygreco: `_warn_on_implicit_provider` and
`_warn_on_explicit_default` use `stacklevel=2`, which lands inside
pydantic v2's validator dispatch rather than at the user's
`ModelConfig(...)` / `ModelProviderRegistry(...)` call. That broke
both attribution (the source line was unhelpful) and Python's
once-per-location dedup (every call collapsed to the same
pydantic-internal key, suppressing all but the first warning).
Introduce `data_designer.config.utils.warning_helpers.warn_at_caller`,
which walks past the helper, validator, and any pydantic frames to
find the user's call site and emits via `warnings.warn_explicit` with
the user frame's `__warningregistry__`. Keeps attribution accurate
and dedup keyed on the user's (filename, lineno).

johnnygreco: align the `provider_repository.py` warning copy with the
sibling site in `default_model_settings.py` ("specify provider=
explicitly on each ModelConfig instead") so both YAML-default warning
sites give the same migration instruction. The previous wording
pointed users at "ModelConfig entries" inside `model_providers.yaml`,
where ModelConfig entries don't actually live.

johnnygreco: dedup the cascade in `DataDesigner.__init__`. With
`model_providers=None` and a YAML `default:`, the user previously saw
two DeprecationWarnings for the same root cause —
`get_default_provider_name()` warns about the YAML key, then
`resolve_model_provider_registry(...)` re-warns from
`_warn_on_explicit_default`. Suppress the registry-level duplicate in
the YAML-fallback branch via `warnings.catch_warnings()` so users see
exactly one warning per user action.

johnnygreco: tighten `_warn_on_explicit_default` to fire only when
`default is not None`. Passing `default=None` explicitly is
semantically equivalent to omitting it (caller is opting *out* of a
registry-level default), and shouldn't trigger the deprecation
nudge.

johnnygreco: add a `model_validate({...})` regression test for
`ModelConfig` so the deserialization path (legacy on-disk configs)
is pinned alongside the construction path.

Tests:
- Update `test_load_exists` and `test_save` to omit `default=` so the
  roundtrip stops exercising the deprecated YAML-default path
  unguarded (Greptile note).
- Wrap `test_resolve_model_provider_registry_with_explicit_default`,
  `test_get_provider`, and
  `test_init_user_supplied_providers_preserve_first_wins_over_yaml_default`
  in `pytest.warns` so the suite stays green under
  `-W error::DeprecationWarning` (Greptile note).
- Add `test_explicit_default_none_does_not_emit_deprecation_warning`
  to pin the tightened predicate.
- Add `test_init_yaml_default_emits_single_deprecation_warning` to
  pin the cascade-dedup behavior.

Refs #589

Made-with: Cursor

* fix(models): make deprecation warnings visible under default filters

andreatgretel (PR #594): the YAML-default warning in
`get_default_provider_name` and the registry-default warning emitted
from inside DataDesigner helpers were attributing to data_designer
library frames, not user code. Python's default filter chain includes
`ignore::DeprecationWarning`, so library-attributed entries are
silenced — meaning a normal `DataDesigner()` call with a YAML
`default:` set showed nothing, and `resolve_model_provider_registry`
warnings were similarly invisible. Two related changes:

1. `warn_at_caller`: extend the default skip-list from `("pydantic",)`
   to `("pydantic", "pydantic_core", "data_designer")` so the walk
   escapes both pydantic's validator-dispatch frames and data_designer
   helper frames before attributing. Also tighten the prefix predicate
   to exact-or-dotted-prefix matching (`name == p or
   name.startswith(p + ".")`) so e.g. `pydantic_helpers` is not
   falsely matched as part of `pydantic` (johnnygreco nit). Allow
   callers to pass a custom `skip_prefixes` for flexibility. Drop the
   "skip frame 0+1 unconditionally" guard now that prefix matching
   covers it.

2. `get_default_provider_name`: switch from
   `warnings.warn(stacklevel=2)` to `warn_at_caller`. The previous
   stacklevel pointed into `default_model_settings.py`, which is a
   library file → silenced under default filters. Verified the fix
   empirically with `python -W default`: warning is now attributed to
   the user's call site and rendered.

johnnygreco (PR #594): add the missing
`test_explicit_default_none_does_not_emit_deprecation_warning`
regression for the `self.default is not None` predicate landed in
the prior round.

Tests:
- New `test_warning_helpers.py` pins prefix-matching precision
  (rejects `pydantic_helpers` / `data_designer_other`), default
  skip-list contents, attribution past skip-prefix frames, and
  per-call-site dedup behavior.
- `test_get_default_provider_name_warning_attributes_to_user_frame`
  pins andreatgretel's repro for the YAML-default site.
- `test_explicit_default_warning_attributes_to_user_frame` pins the
  multi-frame case: construction goes through
  `resolve_model_provider_registry`, so the walk has to escape both
  pydantic and data_designer before landing on the test file.
- `test_explicit_default_none_does_not_emit_deprecation_warning`
  pins johnnygreco's predicate-tightening regression.

3,124 tests pass (540 config + 1,923 engine + 653 interface; +10 net
from this round).

Refs #589

Made-with: Cursor

* fix(models): apply warn_at_caller to remaining deprecation sites

greptile-apps (PR #594, r3189904028): `ProviderRepository.load`'s
YAML-default `DeprecationWarning` was using `warnings.warn(stacklevel=2)`,
which attributes to whichever data_designer frame called `load()` —
controllers, services, list/reset commands, agent introspection. Every
real call path lands on `data_designer.cli.*`, which falls under
Python's default `ignore::DeprecationWarning` filter and is silenced.
Audit found two more sites with the same problem:

- `DatasetBuilder._resolve_async_compatibility` (`allow_resize` /
  issue #552) — was using `stacklevel=4` to walk past
  `_resolve_async_compatibility -> build/build_preview -> interface ->
  user`. Brittle: any added frame (decorator, async wrapping, the
  `try/except DeprecationWarning: raise` boundary) shifts attribution
  silently. The existing test passed only because it used
  `simplefilter("always") + record=True`, which records warnings
  regardless of attribution.
- `ProviderController._handle_change_default` — was using
  `stacklevel=2`, which lands on the menu dispatcher in the same
  controller module. `print_warning` already shows the message
  visually, but programmatic observers (`pytest.warns`,
  `filterwarnings("error", ...)`) saw a library-attributed entry that
  default filters silenced.

All three migrated to `warn_at_caller` (the helper from 247fa30) so
attribution lands on the user's call site regardless of internal
chain shape. `data_designer` is already in
`DEFAULT_INTERNAL_PREFIXES`, so the walk escapes the entire library
in one pass.

Added attribution regression tests at each site asserting
`warning.filename == __file__`. A future regression to
`warnings.warn(stacklevel=N)` now fails CI instead of silently
silencing the user-facing nudge:

- `test_load_with_yaml_default_attributes_warning_to_caller`
  (test_provider_repository.py)
- `test_resolve_async_compatibility` extended with the same assertion
- `test_handle_change_default_emits_deprecation_warning` rewritten
  from `pytest.warns(...)` to a `catch_warnings(record=True)` block
  that filters for the message and asserts `filename == __file__`
  (`pytest.warns` does not check attribution, so the rewrite is
  required to actually catch the regression).

3,125 tests pass (548 config + 1,923 engine + 654 interface).

Refs #589
2026-05-05 13:39:12 -06:00
Andre Manoel
61cdeefb17
feat: make async engine the default execution path (#592)
Some checks failed
CI / Test Config (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Coverage Check (Python 3.11) (push) Waiting to run
CI / End to end test (Python 3.10 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.11 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
Publish devnotes / deploy (push) Has been cancelled
* feat: make async engine the default execution path

The async engine has been hardening as opt-in for several releases. Make it
the default and address the prerequisites flagged for the flip.

Default flip
- DATA_DESIGNER_ASYNC_ENGINE defaults to "1" at both consumption sites
- Set DATA_DESIGNER_ASYNC_ENGINE=0 for one transitional release to opt out
- allow_resize=True still falls back to sync with a DeprecationWarning

Python 3.10 support
- Replace asyncio.TaskGroup (3.11+) in async_concurrency.py with
  gather-with-explicit-cancel; semantics preserved because _run_task already
  swallows its own exceptions and uses _shutdown_event for sibling cancellation
- Remove the sys.version_info < (3, 11) runtime guard
- Remove the matching pytest skipif so the executor tests run on 3.10 too

Derived timeouts (replaces two hardcoded 300s constants)
- ThrottleManager.acquire_sync/async default to timeout=None (no deadline)
  instead of DEFAULT_ACQUIRE_TIMEOUT=300; HTTP request timeout already bounds
  actual work, queue waits scale with provider speed and AIMD
- _AsyncBridgedModelFacade derives the sync->async bridge timeout from the
  model's inference_parameters.timeout and the call's max_correction_steps;
  one knob (per-model timeout) drives both deadlines, no new config surface
- Add ModelFacade.request_timeout property so the bridge can read the
  effective timeout the client is configured with

Root-cause surfacing
- AsyncTaskScheduler captures the first non-retryable error and exposes it
  via first_non_retryable_error
- Interface threads it through DataDesignerGenerationError when 0 records
  are produced without early-shutdown, so deterministic failures (e.g. bad
  seed sources) surface their original message instead of a wrapped
  FileNotFoundError on the parquet path

Tests
- New: throttle no-deadline default behavior (sync+async), parametrized
  derived bridge timeout, restored async_concurrency tests on 3.10
- Updated: test_dataset_builder.py uses an autouse fixture to pin its
  Mock-based tests to the sync engine they cover; existing bridge tests
  set facade.request_timeout for the new derivation

Docs
- Replace the stale LiteLLM security notice in README with a short
  async-default heads-up and link to the migration guide
- Add docs/migration-async-default.md covering per-model timeouts,
  custom-column thread safety, mocking model calls, run outcomes, and
  the opt-out
- Append a short Update section to the async-all-the-way-down dev note

* test: extract _compute_bridge_timeout helper for direct testing

The parametrized bridge-timeout test was patching ``concurrent.futures.Future.result``
to capture the timeout the bridge passed in. That reaches into stdlib internals
(DEVELOPMENT.md "Mock at boundaries: Keep mocking shallow") and the ``ids=`` argument
on the parametrize was missing.

Extracts the formula into a module-level ``_compute_bridge_timeout`` helper. The test
now calls the helper directly with no mocking, and the parametrize gets readable ids.
Behavior is unchanged.

* test(e2e): align demo plugins with async engine contracts

The e2e demo plugins exercise plugin discovery and full DD lifecycle. Two
of them were written against sync-engine semantics that the async engine
restricts:

- DemoColumnGeneratorImpl was a ColumnGeneratorFullColumn with no
  required_columns. The async engine routes ``no-upstream`` columns
  through the from-scratch path, which passes an empty DataFrame to
  generators that aren't FromScratchColumnGenerator subclasses. The
  generator then produces 0 rows and the scheduler raises
  ``update_batch received 0 values``. Switching the plugin to
  FromScratchColumnGenerator with generate_from_scratch(num_records)
  matches what the plugin actually does (produces a constant column
  without input) and works on both engines.

- RegexFilterProcessor implemented process_before_batch with row-count
  changes. The async engine enforces row-count invariance in pre- and
  post-batch processor stages by design. Moving the filter to
  process_after_generation preserves the plugin's purpose (regex-based
  row filtering) at a stage that supports row-count changes on both
  engines. Test assertions check the final dataset, so the stage shift
  is transparent.

Both changes are demo-plugin updates only; no production code change.

* fix: address Codex review findings on async-default flip

Three bugs and two test-quality concerns surfaced by an independent review of
the prior commits. Each was real and worth fixing in the flip PR.

Bug fixes
- Sync-fallback path was creating async-only model clients. The default flip
  meant ``client_concurrency_mode = ASYNC`` for every default run, but the
  ``allow_resize=True`` path falls back to the sync engine — sync ``model.generate()``
  calls then hit ``SyncClientUnavailableError``. The resolution decision now
  lives at the DataDesigner interface level via
  ``_resolve_client_concurrency_mode``: it considers both the env var and the
  config (allow_resize forces sync clients) and is passed explicitly to
  ``create_resource_provider``. Direct callers of the factory still get the
  env-var default.

- Sync→async bridge timeout ignored the per-call ``timeout=`` override. A
  custom column calling ``model.generate(timeout=600)`` against a slow endpoint
  was being cancelled at the model-config default, not 600s. The bridge now
  prefers ``kwargs.get("timeout")`` over ``facade.request_timeout``.

- Bridge timeout formula missed ``max_conversation_restarts``. One logical
  generation can do ``(1 + max_conversation_restarts) × (1 + max_correction_steps)``
  HTTP requests; the formula now multiplies both, matching the worst-case
  attempt budget.

Engine routing fix (also surfaced by failing e2e plugin tests)
- ``_run_from_scratch`` else-branch passed an empty DataFrame to non-FromScratch
  generators classified as seeds (no upstream columns), so ``ColumnGeneratorFullColumn``
  with no required_columns produced 0 rows for an ``rg_size``-row buffer. Now
  passes an ``rg_size``-row snapshot of the row-group buffer, mirroring the
  sync engine's FULL_COLUMN contract.
- The earlier ``DemoColumnGeneratorImpl`` workaround (rewrite as ``FromScratchColumnGenerator``)
  is reverted; the engine fix subsumes it. The processor-plugin fix
  (``process_after_generation`` for the regex filter) stays — pre-batch
  row-count change is intentionally rejected by the async engine.

Test improvements
- Throttle no-deadline tests are parametrized over ``(timeout=0.0, raises)``
  and ``(timeout=None, waits)``, pinning that ``None`` is genuinely distinct
  from any finite default. Sync and async counterparts mirror.
- New regression tests for ``first_non_retryable_error`` surfacing covering
  both load-raises and load-returns-empty paths, asserting the original
  exception is chained via ``__cause__`` and that the typed
  ``DataDesignerEarlyShutdownError`` doesn't fire in this branch.
- New parametrized regression test for ``_resolve_client_concurrency_mode``
  covering all four (env × allow_resize) combinations.
- New parametrized test for the per-call ``timeout=`` override flowing into
  the bridge timeout calculation.
- Bridge formula tests extended with ``max_conversation_restarts`` cases.

* test: trim redundant parametrize cases in async-default tests

Three parametrize cases were duplicating coverage already provided by
existing standalone tests:

- ``test_acquire_*_timeout_branches`` parametrized over ``(0.0, raises)``
  and ``(None, waits)``. The ``raises`` half duplicates
  ``test_acquire_*_raises_timeout_when_at_capacity``. Replaced with two
  focused ``..._default_no_deadline_waits_for_release`` tests covering
  only the no-deadline branch.

- ``test_resolve_client_concurrency_mode_matches_engine_choice`` had four
  cases. The ``async-off + allow-resize`` case asserts ``SYNC`` because the
  env var alone forces it; the allow_resize input is moot. Dropped.

- ``test_async_bridge_honors_per_call_timeout`` had three cases. The
  "override below floor" case cross-products the per-call override flow
  with the floor-clamping behavior already covered by
  ``test_compute_bridge_timeout``. Dropped.

Net: -25 lines of test code with no loss of essential coverage.

* docs: fold migration page into existing concept docs

The standalone ``Migrating to the async default`` page didn't fit the
existing docs style — present tense, behavior over comparisons, content
in the natural concept home. Folding it in:

- ``architecture-and-performance.md`` gets a new ``Async Engine`` section
  covering per-model timeouts, run outcomes (partial completion +
  ``DataDesignerEarlyShutdownError``), and the transitional opt-out.
  Three stale ``async engine is landing soon`` callouts updated to
  reflect the flip.
- ``custom_columns.md`` gets two short notes: a thread-safety callout
  near Generation Strategies, and a mocking-with-spec note in
  Development Testing.
- ``async-all-the-way-down.md`` Update section now points at the new
  arch-and-perf section.
- README heads-up links to the same anchor.
- ``migration-async-default.md`` removed; mkdocs.yml entry dropped.

* docs: frame Execution Model as sync-engine specifics

Small targeted edits to make the user-facing concept docs consistent
with the post-flip state. No restructuring.

- ``architecture-and-performance.md``: the ``Execution Model`` callout
  now opens with two engines, links to the new ``Async Engine`` section,
  and frames the existing column-at-a-time description as sync-engine
  semantics. The ``Step 2: Process columns sequentially`` paragraph notes
  the async engine relaxes this. The ``Key Concepts`` table differentiates
  per-engine for ``Batching`` and ``Sequential columns``; ``Parallel cells``
  is the same on both.
- ``processors.md``: added a warning callout about the async engine's
  row-count invariance in pre- and post-batch stages, with the guidance
  to use ``process_after_generation()`` for row-filtering or expansion.

* fix: address review nits from PR #592 (Nabin)

Four targeted fixes from the review.

Worth-addressing (warning):
- ``test_acquire_async_default_no_deadline_waits_for_release`` was
  spawning the release task without holding a strong reference. The
  loop's weak-ref bookkeeping could GC it before the inner ``await``
  observes the release, producing a CI flake. Hold the task and
  ``await`` it in ``finally``.

Take-it-or-leave-it (applied):
- Root-cause error surfacing now includes the exception type name:
  ``f"🛑 {type(root_cause).__name__}: {root_cause}"`` so users see
  ``ValueError: ...`` instead of just the message string. The
  ``__cause__`` chain is preserved either way.
- Drop the defensive ``getattr(c, "allow_resize", False)`` in
  ``_resolve_client_concurrency_mode`` — every member of
  ``ColumnConfigT`` inherits ``allow_resize: bool = False`` from
  ``SingleColumnConfig``.
- One-line comment near the root-cause surfacing branch noting that
  ``actual_num_records == 0`` is async-only (sync runs leave it at
  ``-1``), so the branch is async-only by construction.

Not addressed in this PR (filing as follow-ups):
- ``SYNC_BRIDGE_TIMEOUT = 300`` still hardcoded in
  ``column_generators/generators/base.py:_run_coroutine_sync``. That
  bridge has no model-facade context to derive a timeout from, so the
  fix is a structural refactor outside this PR's scope.
- First-error capture loses subsequent-error context. The "first wins"
  heuristic is documented; richer aggregation is a follow-up.

* fix: drop SYNC_BRIDGE_TIMEOUT in _run_coroutine_sync

This was the third hardcoded 300s timeout (Nabin flagged it on PR #592).
The path is the generic sync→async bridge in ``ColumnGenerator.generate()``:
when a subclass overrides only ``agenerate()``, the sync entry point runs
the coroutine in a background thread.

Same philosophy we applied to the throttle queue wait elsewhere in the
PR: a defensive deadline on top of work that's already bounded by the
HTTP timeout doesn't add safety, it just produces spurious failures on
slow self-hosted endpoints. Drop the constant, the timeout exception
handling, and the ``timed_out`` bookkeeping. ``pool.shutdown(wait=True)``
becomes the simple cleanup.

Tests in ``test_async_generators.py`` exercise the happy path only and
don't depend on the timeout firing.

* Revert "fix: drop SYNC_BRIDGE_TIMEOUT in _run_coroutine_sync"

This reverts commit 7a0b77d44c.

* docs+feat: deprecate the sync-engine opt-out path

Nabin asked whether the docs should adopt explicit "deprecation" language
on the opt-out path. Doing both:

- Doc: ``architecture-and-performance.md``'s ``Opting out`` section now
  uses an ``!!! warning "Deprecated"`` admonition that names the env var
  as a deprecated escape hatch and notes the run-time warning.
- Code: ``DataDesigner._resolve_client_concurrency_mode`` emits a
  ``DeprecationWarning`` when ``DATA_DESIGNER_ASYNC_ENGINE=0`` is detected.
  Same precedent as the existing ``allow_resize=True`` warning. Auto-fallback
  via ``allow_resize`` does not double-warn here; the builder layer emits
  its own warning later.
- Test: parametrized regression now asserts ``pytest.warns(DeprecationWarning)``
  on the opt-out branch and treats any warning on the async-on branches as
  a failure (``simplefilter("error")`` inside the ``catch_warnings`` block).

* fix: emit logger.warning alongside DeprecationWarning on env-var opt-out

Parity fix from Nabin's re-review of PR #592. The ``allow_resize=True``
auto-fallback path in ``_resolve_async_compatibility`` emits both a
``logger.warning("⚠️ ...")`` and a ``DeprecationWarning``. The new
``DATA_DESIGNER_ASYNC_ENGINE=0`` opt-out path was only emitting the
``DeprecationWarning``, leaving users who run with default warning
filters silenced and inconsistent with the established precedent.

Match the pattern: same message body, both signals, same stacklevel.

* docs: breadcrumb explaining why SYNC_BRIDGE_TIMEOUT survives PR #592

Nabin's re-review pointed out that ``base.py`` is the lone place where
the 300s pattern survives, while ``custom.py`` and ``throttle_manager.py``
both retired theirs. Without a comment, a future reader (or a lint sweep)
could mistake this for an oversight and "consistency-fix" it the wrong way.

Add a short note at the constant naming the two retired siblings, the
reason this one stayed (no ``ModelFacade`` context to derive from), and
the fact that it's tracked for a structural follow-up.
2026-05-04 16:22:13 -03:00
Andre Manoel
47c72b3d87
fix(async): pack of fixes for async engine under degraded providers (#585)
Some checks are pending
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Coverage Check (Python 3.11) (push) Waiting to run
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.10 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.11 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.12 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.13 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Lint and Format Check (push) Waiting to run
CI / Check License Headers (push) Waiting to run
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
* fix(async): exclude all retryable errors from early-shutdown gate

The gate previously only excluded `ModelRateLimitError`, leaving
`ModelTimeoutError`, `ModelInternalServerError`, and
`ModelAPIConnectionError` to count toward the sliding-window error
rate. Under provider degradation these errors cluster in time
(concurrent in-flight requests time out together), so 5/10 in a row
is easy and trips the gate even when salvage could recover the rows.

Refs #575.

* feat(async): WARN log when provider showing degraded performance

Diagnostic A/Bs against build.nvidia.com showed runs failing silently
under provider degradation - no log indication that retryable errors
were piling up until the early-shutdown gate fired (or, post-fix,
until salvage exhaustion). Surfacing this earlier helps users
distinguish "DataDesigner is broken" from "the upstream provider is
slow today."

Tracks a separate sliding window over retryable-vs-not for every task
outcome (independent of the early-shutdown gate's window) and emits a
throttled WARN when the rolling fraction crosses the threshold.

Refs #575.

* fix(async): salvage partial row groups on early shutdown

Before: when the early-shutdown gate fired, any row group still in
flight stayed in `_rg_states` un-checkpointed. The buffer manager
later raised `FileNotFoundError` when the builder tried to read the
finalized parquet. User-visible result: `0 records produced`.

After: a new `_finalize_after_shutdown` step runs in `run()`'s finally
block, after `_cancel_workers` has drained in-flight tasks (Codex
caveat: in-flight `from_scratch`/`batch` tasks must not be allowed to
write into a buffer that's being finalized). For each remaining row
group it drops rows that aren't fully complete, then delegates to the
existing `_checkpoint_completed_row_groups` so the buffer manager's
zero-survivor handling (skip empty parquet, free buffer) kicks in
unchanged.

Also surfaces partial completion as a structured signal: scheduler
exposes `early_shutdown: bool` and `partial_row_groups: tuple[int, ...]`
properties so callers can detect partial completion programmatically
rather than parsing log lines. Builder uses this to emit a more
specific WARN distinguishing early shutdown from non-shutdown drops.

Refs #575.

* fix(throttle): reset consecutive_429s on non-rate-limit failure

In `release_failure`, the cascade counter wasn't reset, so a sequence
like 429 → 500 → 429 was treated as 2 consecutive 429s. The cascade
counter feeds AIMD's reduce-once-per-cascade logic; the second 429
should start a fresh cascade and trigger another concurrency reduction,
but currently doesn't.

Standalone bug surfaced during #575 investigation; not on the failure
path that drives the gate-trip outcome but worth fixing while we're
in this code.

* fix(custom): preserve retryability through CustomColumnGenerator wrap

A real-workload run of #575 showed the early-shutdown gate still trips
even with the gate-exclusion fix in place: the trigger is 10 timeouts
inside Anonymizer's QA-repair custom columns, all wrapped in
CustomColumnGenerationError (non-retryable) by the catch-all in
CustomColumnGenerator.

Two fixes here:

1. Re-raise RETRYABLE_MODEL_ERRORS unchanged before the wrap so the
   scheduler's _is_retryable correctly classifies them.

2. Surface _AsyncBridgedModelFacade timeouts as ModelTimeoutError
   instead of stdlib TimeoutError. Without this the sync bridge times
   out as the wrong exception type and is still classified non-retryable
   even after fix #1.

Also moves _RETRYABLE_MODEL_ERRORS from async_scheduler to
models/errors as the public RETRYABLE_MODEL_ERRORS tuple - both the
scheduler and the wrap site need it, and models/errors is the
appropriate home alongside the error class definitions.

Refs #575.

* feat(interface): typed DataDesignerEarlyShutdownError on zero-record runs

When the async scheduler hits early shutdown and produces zero
records, the buffer manager skips writing parquet (correctly), so
ArtifactStorage.load_dataset_with_dropped_columns() raises
FileNotFoundError. Previously this surfaced as a generic
DataDesignerGenerationError wrapping the FileNotFoundError, which is
ambiguous (could be missing files for any reason).

This commit:

- Adds DataDesignerEarlyShutdownError as a subclass of
  DataDesignerGenerationError so existing handlers still match while
  callers that want to react programmatically (retry on different
  alias, surface a degraded-provider message, etc.) can catch the
  specific type.
- Plumbs the scheduler's structured signals (early_shutdown,
  partial_row_groups) up through the builder so they're available at
  data_designer.create() time without re-introspecting the scheduler.
- create() raises the typed error in both failure modes (load fails
  or empty DataFrame returned) when builder.early_shutdown is True.

Refs #575.

* fix(async): emit first degraded-provider WARN regardless of clock state

  Initialize _last_degraded_warn_at to -inf so the first WARN is always
  emitted. The previous initialization to 0.0 suppressed the first WARN on
  fresh CI runners where time.monotonic() returns a small value (system
  boot uptime), making the throttle interval check (now - 0.0 < interval)
  true on the first attempt.

* fix(async): address review findings on early-shutdown salvage PR

Five real correctness issues caught in review of the original PR, plus a
few smaller cleanups and test simplifications.

Throttle - cascade reset (regression of existing AIMD invariant):
release_failure() now resets consecutive_429s only when in_flight == 0.
Resetting unconditionally broke "reduce once per cascade" when 429/500/429
arrived interleaved within a single in-flight burst - the second 429 was
treated as a new cascade and the limit got halved twice for what was
effectively one rate-limit event.

Interface - typed-error gating: DataDesignerEarlyShutdownError now fires
only when early_shutdown is true AND actual_num_records == 0. Without
this, a partial-salvage run that fails to load for unrelated reasons
(corrupt parquet, schema drift, disk hiccup) was misdiagnosed as "zero
records produced," hiding the real cause.

Async - WARN window scope: the degraded-provider warning was fed by every
task outcome, including samplers and non-LLM customs. In realistic
pipelines (one model column, several non-model columns) the rate stayed
under threshold even when every model call was failing, silencing the
WARN exactly when it mattered. Now gated on is_llm.

Async/builder - signal preservation across raises: scheduler.early_shutdown
and partial_row_groups are captured in a try/finally around future.result(),
so a processor failure during the salvage path doesn't drop the
structured signal. Both build() and build_preview() now reset per-run
state at the start so reused builders don't leak prior-run flags.

Async - dead code: dispatch_error capture in run() was unread (the post-
finally check is unreachable on the exception path). Removed.

Smaller cleanups:
- early-shutdown WARN says "non-retryable error rate exceeded threshold"
- bridge timeout WARN demoted to debug (ModelTimeoutError already surfaces
  it; the throttled degraded-provider WARN is the user-facing signal)
- TODO note for threading degraded_warn_* through RunConfig
- doc note in _finalize_after_shutdown clarifying that pre-batch processor
  isn't re-run on partial-salvage row groups

Tests:
- new regression tests for the cascade burst case, partial-salvage error
  gating, and LLM-only WARN window
- direct unit test for _reset_run_state
- dedup via _make_storage / _seed_plus_cell_setup helpers
- WARN emission cases parametrized into a single test
- shared parametrize lists hoisted to module-level constants
- redundant cascade test dropped in favor of the more thorough drain
  variant; redundant healthy-baseline test folded into the zero-survivor
  test

* chore(async): address Nabin's review comments

Style cleanups, parametrization, docstring polish, and one consistency
fix in the typed-error path. All non-blocking ("Ship it (with nits)").

interface/data_designer.py:
- preview() now raises DataDesignerEarlyShutdownError when shutdown
  produced zero records (parity with create()), and also gates on
  actual_num_records == 0 so partial-salvage runs that fail to load
  don't get misdiagnosed
- create()'s defensive empty-DF guard mirrors the load-failure guard
  with the same actual_num_records == 0 check

async_scheduler.py:
- _record_retryable_outcome docstring clarifies that the call site
  filters by is_llm; the function alone reads as if every outcome feeds
  the window

dataset_builder.py:
- moved _reset_run_state() down past the public methods to match the
  project's public-before-private convention

test_custom.py:
- flattened TestAsyncBridgedModelFacade class into module-level test
  functions (matches the rest of the file)
- hoisted inline imports (asyncio, threading, patch, _AsyncBridgedModelFacade,
  SyncClientUnavailableError) to top of file
- driven retryable-error parametrize off RETRYABLE_MODEL_ERRORS directly
  instead of the hand-rolled factory list, so new retryable types pick
  up coverage automatically
- dropped the redundant "Sanity" block in test_async_bridge_timeout_raises_
  model_timeout_error - pytest.raises already enforces the type, the
  duplicate block was running the same slow scenario twice

test_async_scheduler.py:
- parametrize over RETRYABLE_MODEL_ERRORS directly (same as above)

test_data_designer.py:
- added preview-path tests for the typed-error and partial-salvage
  fall-through cases
- updated the existing empty-DF test to also patch actual_num_records=0
  (otherwise the new gating in the empty-DF guard skips the typed error)

* test(interface): consolidate create() error-dispatch tests into a matrix

Five separate tests (two existing, three new from earlier in this PR)
all probed the same dispatch logic in create(): "given a load outcome
and a builder state, which error type should fire?" Pulled them into a
single parametrized matrix indexed by (load_side_effect, early_shutdown,
actual_num_records).

Net result: 5 named tests → 1 parametrized test with 6 cells, and the
previously-missing empty_df + shutdown + partial salvage cell is now
covered.

Test names retain readable IDs (load_fails_shutdown_zero_records etc.)
so failures still pinpoint the exact case in pytest output.
2026-04-30 14:43:35 -03:00
Nabin Mulepati
05c2e8df2e
fix: normalize image_url blocks to OpenAI-compliant dict format (#577)
* fix: normalize image_url blocks to OpenAI-compliant dict format (#576)

ImageContext.get_contexts() produced bare-string and non-standard dict
shapes for image_url content blocks, which broke the native OpenAI
adapter (passes blocks through as-is) and only worked with Anthropic
by accident via defensive handling in the translation layer.

- Wrap all image_url values in {"url": ...} dict (OpenAI spec)
- Remove non-standard "format" key from base64 dicts
- Tighten Anthropic translate_image_url_block to require dict input

Fixes #576

Made-with: Cursor

* fix: reject malformed image_url blocks instead of silently dropping them

translate_image_url_block now raises TypeError when image_url is not a
dict. Since all image_url blocks are constructed internally, a bare
string indicates an internal bug and should fail loudly.

Made-with: Cursor

* address review: tighten return type, add OpenAI + data-URI tests

- Narrow _auto_resolve_context_value return type to dict[str, str]
- Add OpenAI-client regression tests for image_url dict passthrough
- Cover both bare-URL and bare-data-URI rejection in Anthropic tests

Made-with: Cursor
2026-04-28 09:35:27 -06:00
Andre Manoel
bfa7a46c9a
chore: async engine readiness - blockers and polish before default (#553)
* chore: async engine readiness blockers (#462)

- Processor callback failures (pre-batch and post-batch) now raise
  DatasetGenerationError instead of silently dropping row groups
- Early shutdown and all error paths drain in-flight workers via a
  finally block in AsyncTaskScheduler.run()
- Pre-batch and post-batch processors that change row count in async
  mode raise immediately (strict_row_count guard)
- Partial completion logs a warning when actual < target records
- allow_resize=True auto-falls back to sync engine with a deprecation
  warning instead of raising, using a per-run _use_async flag
- Preview path mirrors the trace check from the full build path;
  PreviewResults exposes task_traces

Closes #462

* fix: address review findings for async engine readiness

- Prevent double-wrapping of DatasetGenerationError in scheduler callbacks
- Fix stacklevel in allow_resize DeprecationWarning to point at user code
- Update stale comment to reflect fail-fast behavior
- Rename misleading test and remove unused caplog fixture
- Add zero-warnings assertion for happy-path case
- Move warnings import to module level

* fix: address review comments on async engine readiness

- Extract _is_async_trace_enabled() helper to deduplicate trace check
- Post-batch row-count guard now raises DatasetProcessingError (not
  DatasetGenerationError) so the scheduler wraps it with rg_id
  symmetrically with the pre-batch path
- Add test_dropped_rows_reduce_actual_record_count for partial
  completion path

* fix: address second-round review feedback on async engine readiness

- DeprecationWarning no longer swallowed by interface error wrapper
- Incomplete-RG log only fires on clean scheduler exits
- Post-batch row-count guard moved into ProcessorRunner (strict_row_count)
- Expose active_worker_count property on AsyncTaskScheduler
- Drop unused monkeypatch fixture and pytest import

* test: fold metadata-count test into dropped-rows test

Remove test_write_metadata_records_actual_and_target_counts (poked
_actual_num_records directly) and assert metadata counts in
test_dropped_rows_reduce_actual_record_count instead, which exercises
the same path through the public API.
2026-04-22 19:31:21 -03:00
Andre Manoel
f6128220da
fix: prevent sticky progress bar ghost lines from terminal wrapping (#565)
* fix: prevent sticky progress bar ghost lines from terminal wrapping

When a formatted bar line exceeded the terminal width, it wrapped to
multiple physical lines but _drawn_lines only counted 1 per bar.
Subsequent cursor-up clears missed the wrapped portions, leaving ghost
lines that accumulated every redraw cycle.

- Count physical lines in _redraw based on visible width vs terminal width
- Cap rate at 9999.9 and eta at 999s to prevent stats field overflow
- Remove max(10, ...) bar_width floor; degrade gracefully on narrow terminals
- Add update_many() for batch updates with a single redraw cycle
- Use update_many() in AsyncProgressReporter to reduce N redraws to 1

* test: strengthen wrapping and degradation tests per review feedback

- Monkeypatch shutil.get_terminal_size to force narrow terminal in tests
- Inject oversized lines via _format_bar patch to exercise physical line
  counting (ceiling division) code path
- Assert output line width <= width-1 in graceful degradation test
- Assert _drawn_lines == 1 in degradation mode (no false wrapping)

* test: flatten test classes and replace private attribute assertions

Address review feedback: use flat pytest functions per DEVELOPMENT.md
conventions instead of class-based test suites. Move inline imports
to module level.

Replace most _drawn_lines and _bars assertions with public output
proxies (counting CURSOR_UP_CLEAR sequences and checking rendered
bar content). Keep _drawn_lines access only where no clean public
proxy exists (multi-checkpoint add/remove test, zero-bars-remaining
after log_final).

* refactor: expose drawn_lines as public read-only property

Replace _drawn_lines access in tests with a public property,
consistent with the existing is_active pattern.
2026-04-22 13:19:12 -03:00
Przemysław Boruta
956b8cd30d
refactor: unify duplicate DAG construction (dag.py + ExecutionGraph) (#511)
* refactor: unify DAG construction by moving topological sort into execution_graph.py

Eliminates dag.py and its networkx dependency by moving
topologically_sort_column_configs into execution_graph.py as a
module-level function. Side-effect resolution is now O(1) via a
side_effect_map dict (previously O(n²) linear scan). Kahn's algorithm
is reused in-place rather than leaning on networkx.topological_sort.

Closes #510

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: relax non-deterministic ordering assertion in test_dag

test_judge and test_code_and_depends_on_validation_reasoning_traces have
no mutual dependency and reach in-degree 0 simultaneously in Kahn's
algorithm. Set iteration order varies with PYTHONHASHSEED, making the
strict list assertion flaky. Assert only the topological invariants.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(engine): document intentional skip.columns omission in topologically_sort_column_configs

ExecutionGraph.create handles skip.when ordering edges in its own
two-pass build; the pre-sort function only needs required_columns
to produce a valid ColumnConfigT ordering for config compilation.

* fix(engine): restore skip.columns edges in topologically_sort_column_configs

The sync builder executes generators in compile-time sort order (from
_column_configs, populated via this function), not ExecutionGraph order.
Dropping skip.columns edges caused evaluate_skip_when to hit UndefinedError
when the referenced column hadn't been generated yet, silently skipping rows.

Also:
- Refactor edge building into _add_edge() helper with a label parameter to
  distinguish "required" from "skip.when" edges in debug output
- Rename test_dag.py -> test_topological_sort.py to match the new module location
- Add from __future__ import annotations (required by AGENTS.md)
- Add test_side_effect_column_ordering covering the side_effect_map.get() path
- Add test_skip_when_column_ordering covering the skip.columns edge path

* fix(tests): move SkipConfig import to module level in test_topological_sort

* refactor(execution-graph): extract nested closures to module-level helpers, fix docs and test style

- Extract `resolve`/`_add_edge` nested closures in `topologically_sort_column_configs`
  to module-level `_resolve_dag_column` and `_add_dag_edge` per STYLEGUIDE.md
- Add self-edge guard in `_add_dag_edge` (consistent with `ExecutionGraph.create`)
- Update `architecture/dataset-builders.md` to remove stale `dag.py`/NetworkX references
- Fix import order in `test_topological_sort.py` (SkipConfig before column_configs)
- Add `-> None` return annotations to legacy test functions

* refactor(execution-graph): extract shared Kahn helper, inline resolve, fix module-level ordering

- Extract `_kahns_topological_sort` shared helper used by both
  `ExecutionGraph.get_topological_order` and `topologically_sort_column_configs`
- Inline `_resolve_dag_column` into `_add_dag_edge` (no other call sites)
- Move private helpers after the public function (public-before-private per STYLEGUIDE)
- Add docstring to `topologically_sort_column_configs`
- Update architecture/dataset-builders.md to mention both sort sites for skip.when edges

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Andre Manoel <165937436+andreatgretel@users.noreply.github.com>
2026-04-22 10:13:46 -03:00
Eric W. Tramel
8be4ff787f
feat: add RunConfig jinja rendering engine (#557) 2026-04-17 15:06:27 -04:00
Andre Manoel
a965bc1542
fix: bridge model.generate() to agenerate() for custom columns in async engine (#545)
* feat: bridge model.generate() to agenerate() for custom columns in async engine

Custom column generators that call model.generate() fail under the async
engine because the sync HTTP client is unavailable. Add an
_AsyncBridgedModelFacade proxy in _build_models_dict() that intercepts the
sync-client RuntimeError and schedules agenerate() on the engine's persistent
event loop via run_coroutine_threadsafe. Includes a deadlock guard for async
custom columns running on the event loop.

* refactor: wrap facades at sync call site, not in _build_models_dict

Move _AsyncBridgedModelFacade wrapping from _build_models_dict() into
_invoke_generator_function() so the async path gets raw facades. The
bridge proxy is only needed for sync custom columns; async columns
already have direct access to model.agenerate().

* fix: address review feedback - typed exception, timeout cleanup, kwargs test

- Introduce SyncClientUnavailableError so the facade catches by type
  instead of matching error strings (review comment #1)
- Add future.cancel() + logger.warning() on timeout to match the
  _run_coroutine_sync pattern in base.py (review comment #2)
- Assert kwargs forwarding in the async bridge test (review comment #4)

* fix: let SyncClientUnavailableError propagate through @catch_llm_exceptions

The decorator catches all exceptions and wraps them into DataDesignerError,
which prevented the async bridge proxy from ever seeing the original error.
Add an early match case that re-raises SyncClientUnavailableError directly.

* refactor: make SYNC_BRIDGE_TIMEOUT a public constant

Drop the underscore prefix since the constant is exported and used
across modules (base.py and custom.py).
2026-04-17 13:01:55 -03:00
Nabin Mulepati
a9af365e8e
feat: add skip.when conditional column generation (#502)
* plan: add skip_when for conditional column generation (#479)

Adds implementation plan for a `skip_when` field on `SingleColumnConfig`
that enables conditional column generation. When the Jinja2 expression
evaluates truthy, the cell is set to None and the generator is skipped.
Skips auto-propagate through the DAG to downstream columns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* plan: remove HopChain example from skip_when plan

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* plan: replace HopChain example with generic product review example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* plan: add open questions on skip sentinel value and row filtering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* plan: major revision — SkipConfig model, sync engine support, decouple propagation

- Introduce SkipConfig(when, value) as nested model on SingleColumnConfig
- Move propagate_skip to SingleColumnConfig as independent field, fixing
  bug where columns with no SkipConfig couldn't participate in propagation
- Add full sync engine implementation (Steps 4a-4d) covering both
  _fan_out_with_threads and _run_full_column_generator dispatch paths
- Add serialization boundary stripping for both DatasetBatchManager (sync)
  and RowGroupBufferManager (async)
- Simplify architecture diagrams for readability
- Update all references, design decisions, verification plan

Made-with: Cursor

* updates

* plan: document get_required_columns for skip propagation

- Explain why propagation must not use get_upstream_columns() once
  skip.when adds DAG edges; add _required_columns and
  get_required_columns() to the execution graph plan
- Point async _run_cell at get_required_columns for parity with sync
- Clarify DropSkippedRowsProcessorConfig vs stripping __skipped__ for
  DataFrames; tighten resolved-questions wording
- Extend DAG/graph verification with gating_col regression case

Refs #479

Made-with: Cursor

* plan: centralize __skipped__ handling in skip_provenance

- Document new skip_provenance.py (key constant, read/write/strip API)
- Point sync builder, async scheduler, and batch buffers at shared helpers
- Strip metadata before every DataFrame from buffer dicts, including
  FULL_COLUMN active subsets
- Split §3 into skip_evaluator vs skip_provenance; extend verification

Refs #479

Made-with: Cursor

* plan: align doc title with SkipConfig / skip.when

Drop legacy skip_when naming in headings and #362 cross-reference.

Refs #479

Made-with: Cursor

* plan: address review — delimiter validation, centralized error handling, caller-owns-deserialization

- SkipConfig._validate_when_syntax now checks find_undeclared_variables
  is non-empty, rejecting expressions without {{ }} delimiters that
  would silently skip every row
- evaluate_skip_when centralizes try/except so both sync and async
  engines get identical fail-safe behavior on eval errors
- evaluate_skip_when takes a single pre-deserialized record; caller
  runs deserialize_json_values once and passes to both skip eval and
  generator (no double deserialization, no redundant parameter)
- Update _should_skip_cell, async _run_cell, Files Modified table,
  and verification section accordingly

Refs #479

Made-with: Cursor

* plan: add get_side_effect_columns accessor to execution graph spec

Document _side_effects_by_producer inverse map and
get_side_effect_columns() accessor on ExecutionGraph, needed by
_write_skip_to_record / apply_skip_to_record to clear __trace,
__reasoning_content, etc. on skip. Added to both Step 2b metadata
section and Files Modified table.

The __skipped__ leak into active_df (greptile's other P1) was already
fixed in 70463789 via strip_skip_metadata_from_records.

Refs #479

Made-with: Cursor

* add skip.when conditional column generation

Introduce SkipConfig on SingleColumnConfig to gate column generation
with a Jinja2 expression. Columns can be skipped by expression or by
upstream propagation (propagate_skip flag).

- SkipConfig: Pydantic model with config-time syntax/delimiter/variable
  validation and cached column extraction from the Jinja2 AST
- skip_evaluator: runtime expression evaluation via NativeSandboxedEnvironment
  with fail-safe error handling (skip on expected failures)
- skip_provenance: centralized __skipped__ record tracking shared by
  sync builder, async scheduler, and buffer managers
- DAG/ExecutionGraph: skip.columns wired as dependency edges in both
  topological sort and static execution graph
- Validation: validate_skip_references checks reference existence,
  sampler/seed scope, and allow_resize conflicts
- Sync builder: cell-by-cell and full-column skip with merge-back
- Async scheduler: cell and batch skip with live-buffer provenance

Made-with: Cursor

* fix review findings for skip.when implementation

- Add skip evaluation to _fan_out_with_async (was missing, causing
  skipped rows to still be sent to the LLM)
- Preserve __skipped__ provenance on non-skipped records after
  full-column generation so multi-hop propagation works
- Use single live-buffer reference in _run_batch skip loop for
  consistency with _run_cell
- Move Template import to TYPE_CHECKING and reorder import blocks
- Replace O(n²) sum() with itertools.chain in dag.py
- Add set_required_columns/set_propagate_skip/set_skip_config
  setters to ExecutionGraph for symmetry with existing API

Made-with: Cursor

* add conditional generation with skip recipe and refactor skip helpers

Add a new recipe demonstrating skip.when patterns (expression gate,
propagation, opt-out) with a customer support ticket pipeline.

Also extract _should_skip_record in async_scheduler, remove the
redundant propagate_skip param from should_skip_by_propagation, and
pass a precomputed all_side_effects set through the DAG sort.

Made-with: Cursor

* updates

* fixes

* remove recipe > inject conditional gen into existing tutorial

* regen colab notebooks

* fix: handle missing execution graph in _column_can_skip

Return False when the graph has not been initialized instead of raising,
since skip logic cannot apply before generators are set up.

Made-with: Cursor

* parametrize some tests

* public before private

* slight refactor for readability

* parametrize some tests

* minor fixes

* reanme internla skip tracker key name

* clarify intent in comment

* when skipped _run_cell should return skipped value even though the consumer doesn't currenlty care about it

* remove inline import

* minor refactor for clarity

* fix: preserve skip metadata across replace_buffer and exclude allow_resize from skip branch

Two bugs in the sequential engine's _run_full_column_generator:

1. replace_buffer(df.to_dict()) erased __internal_skipped_columns in
   three code paths (MultiColumnConfig, non-skip-aware, has_skipped=False
   fallthrough), breaking propagate_skip for downstream columns when an
   independent FULL_COLUMN generator ran between skip-setting and
   propagating columns.

2. _column_can_skip returned True for allow_resize=True columns via
   propagation, causing the skip-aware merge path to raise on the 1:1
   row-count check for 1:N generators.

- Add restore_skip_metadata helper to skip_tracker.py
- Guard _column_can_skip against allow_resize=True columns
- Refactor _run_full_column_generator into three focused methods
- Remove dead allow_resize / _log_resize_if_changed from skip path
- Remove redundant _require_graph() calls in skip helpers
- Add single_column_config_by_name cached property
- Add integration tests for both bugs and unit tests for the helper

Made-with: Cursor

* address review comments on skip.when PR (#502)

- Extract shared skip decision logic (_should_skip_cell / _should_skip_record)
  into should_skip_column_for_record() in skip_evaluator.py so both sync and
  async engines call the same function (andreatgretel review comment)
- Extend SkipConfig self-reference validation to cover side-effect columns
  (e.g. review__trace on the review column) — previously only checked
  self.name, now checks self.name | self.side_effect_columns
- Add async engine integration tests for skip paths: cell-by-cell with
  propagation and full-column batch skip (exercises _run_cell / _run_batch)
- Fix test_allow_resize_column_not_blocked_by_upstream_skip to use default
  propagate_skip=True so it actually exercises the allow_resize guard
- Move get_skipped_column_names from skip_tracker to skip_evaluator (sole
  production consumer)

Made-with: Cursor

* address cr feedback

* Fix issue with full column  generating messing up order of skipped rows

* add skip conditional generation edge case tests

- test_skip_evaluator: parametrized should_skip_column_for_record covering
  propagation, expression gates, short-circuiting, and disabled propagation
- test_execution_graph: skip metadata accessors (get_skip_config,
  should_propagate_skip, get_required_columns, get_side_effect_columns,
  resolve_side_effect, skip.when DAG edges)
- test_dataset_builder: chained transitive propagation (4 levels),
  two independent skip gates, custom skip.value, row count preservation

Made-with: Cursor

* fix: make expression jinja validator private

Rename assert_expression_valid_jinja to _assert_expression_valid_jinja
to match the private naming convention used by other model validators.

Made-with: Cursor

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 09:31:50 -06:00
Eric W. Tramel
64f31bc581
feat: add generic and OpenRouter attribution headers (#542) 2026-04-14 11:59:49 -04:00
Andre Manoel
533a94b78f
fix: async engine side-effect column propagation and collision resolution (#509)
* fix: async engine side-effect column propagation and collision resolution

ExecutionGraph.set_side_effect() now uses first-writer-wins instead of
last-writer-wins, matching sync engine semantics where earlier consumers
see the first producer's value. This prevents false DAGCircularDependencyError
when multiple generators declare the same side-effect column at different
pipeline stages.

AsyncTaskScheduler now includes side-effect columns in _instance_to_columns
so their values are written to the RowGroupBufferManager and available to
downstream prompt templates.

Fixes #508

* fix: separate side-effect columns from completion tracking in async scheduler

Side-effect columns added to _instance_to_columns caused KeyError in
CompletionTracker._validate_strategy() because they are not registered
in the execution graph. Split into _instance_to_write_columns (buffer
writes, includes side-effects) and _instance_to_columns (completion
tracking, real columns only).

* fix: warn on side-effect collision and clarify scheduler column maps

Log a warning when multiple producers register the same side-effect
column (first-writer-wins still applies). Rename _instance_to_columns
and _instance_to_write_columns per review feedback for clarity.

* fix: raise ConfigCompilationError on duplicate side-effect producers

Replace first-writer-wins collision handling with a hard error.
Each side-effect column must have exactly one producer; duplicates
are a configuration issue to be fixed at the source.

* fix: reject duplicate side-effect producers in sync DAG path

Mirror the async path check: raise ConfigCompilationError when two
custom columns declare the same side-effect column name during
topological sort.
2026-04-13 20:16:06 -03:00
Johnny Greco
4a2813654a
fix: always return ISO-8601 from datetime postproc (#484) (#512)
* fix: always return ISO-8601 from datetime postproc (#484)

The DatetimeFormatMixin.postproc heuristics inferred output format from
value distribution, silently stripping date/time components for small
datasets or narrow date ranges. Replace with deterministic ISO-8601
output via vectorized strftime. Users who need custom formats can still
set convert_to on the SamplerColumnConfig.

* docs: update convert_to docstring and add DatetimeFormatMixin docstring

The SamplerColumnConfig.convert_to docstring incorrectly stated that
only "float", "int", or "str" are accepted. Datetime/timedelta samplers
accept strftime format strings. Also document the ISO-8601 default.

* test: add regression test for #484 via DataDesigner.preview API

Captures the exact reproducer from the issue: a single-record datetime
preview through the public DataDesigner.preview() interface must return
a full ISO-8601 timestamp, not a bare year string.

* test: trim redundant datetime tests, align reproducer with issue #484

- Remove postproc_same_day_records (subsumed by same_month + no_convert_to)
- Remove postproc_always_parseable (subsumed by stdlib_fromisoformat)
- Remove all_same_month integration test (subsumed by narrow_range_single_day)
- Update single_record test to use unit="h" matching the issue reproducer

* fix: address review nits — move datetime import to module scope, drop redundant isinstance
2026-04-09 12:50:40 -04:00
Johnny Greco
fdd5ebb5ef
feat: add Pi Coding Agent rollout seed source (#513) (#514)
Add support for ingesting Pi Coding Agent session artifacts as an agent
rollout seed source. Pi sessions are tree-structured JSONL files; the
handler resolves the active conversation path by walking from the last
entry back to the root via id/parentId links.

Key points:
- Tree-structured sessions with automatic active-path resolution
- Entry-level types: model_change, compaction, branch_summary,
  custom_message, thinking_level_change
- Message roles: user, assistant (inline ToolCall/ThinkingContent/
  TextContent blocks), toolResult, bashExecution (synthesized as
  tool-call pairs), custom, compactionSummary, branchSummary
- Extract shared normalize_message_content to utils.py (was duplicated
  in Hermes handler)
2026-04-09 12:07:47 -04:00
Nabin Mulepati
c27ad62a14
fix: use non-blocking dispatch to prevent pipeline starvation (#504) (#505)
Replace blocking semaphore acquire in the dispatch loop with a
non-blocking try_acquire that breaks out when the semaphore is full.
This causes the outer loop to re-query the frontier, picking up
newly-ready downstream tasks instead of draining a stale snapshot.

Fixes #504

Made-with: Cursor
2026-04-08 13:50:26 -06:00
Eric W. Tramel
7891dd53cb
feat: add Hermes Agent rollout support (#500) 2026-04-07 12:39:49 -04:00
Eric W. Tramel
58870bb83f
feat: add ATIF rollout ingestion (#495) 2026-04-06 11:06:14 -04:00
Nabin Mulepati
d43ac1cb2e
test: add transport-wiring regression tests for #459 (#485)
* test: add transport-wiring regression tests for #459

The existing test_client_limits_respect_max_parallel_requests only
checks the limits property, which was always computed correctly even
before the fix.  Add mock-capture tests that patch HTTPTransport and
AsyncHTTPTransport constructors, trigger lazy init via completion() /
acompletion(), and assert the constructors received the correct limits.
These tests fail on the pre-fix code (assert_called_once fails because
the old code never explicitly constructed the transport).

Made-with: Cursor

* test: address review — parametrize over AnthropicClient, use helpers

Parametrize all three #459 regression tests over both
OpenAICompatibleClient and AnthropicClient (wiring lives in
HttpModelClient, so both subclasses need coverage). Use the existing
_make_openai_client / _make_anthropic_client helpers with **kwargs
instead of constructing clients directly. Move transport patch
constants up with the other module-level constants.

Made-with: Cursor
2026-04-01 13:54:33 -06:00
Przemysław Boruta
97119863da
fix: respect max_parallel_requests in HTTP connection pool size (#460)
* fix: respect max_parallel_requests in HTTP connection pool size

Pass a pre-configured HTTPTransport/AsyncHTTPTransport with the correct
limits into RetryTransport instead of letting it create its own pool
with httpx defaults (100 connections). Previously, the limits calculated
from max_parallel_requests were passed to httpx.Client(limits=...) which
silently ignores them when a custom transport is provided.

Fixes #459

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Przemysław <przemekboruta@interia.pl>

* fix: document transport param and annotate private attr chains in tests

Address Greptile P2 review comments on PR #460:
- Add docstring entry for the new `transport` parameter in
  `create_retry_transport` explaining accepted types and None default
- Add inline comments in pool-size regression tests explaining the
  private attribute chain (_sync/_async_transport → _pool → _max_connections)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Przemysław <przemekboruta@interia.pl>

* refactor: address review feedback — public limits property, clean tests

- Drop three tests from test_retry.py that reached into private
  attributes of third-party RetryTransport (_sync_transport,
  _async_transport); the end-to-end contract is covered by the
  pool-size regression test
- Expose a public `limits` property on HttpModelClient so tests and
  diagnostic code can assert the pool configuration without walking
  private attribute chains across three libraries
- Replace two private-chain pool assertions with a single
  `client.limits.max_connections == 600` check against the new property
- Trim "inner" from the transport docstring entry (nabinchha suggestion)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Przemysław <przemekboruta@interia.pl>

---------

Signed-off-by: Przemysław <przemekboruta@interia.pl>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>
2026-03-31 15:50:54 -06:00
Andre Manoel
c146af5146
chore: async engine follow-up - rename, preview, lifecycle, progress (#456)
* chore: rename ColumnWiseDatasetBuilder, wire async preview, unify row-group lifecycle

- Rename ColumnWiseDatasetBuilder to DatasetBuilder and
  column_wise_builder.py to dataset_builder.py, update all references
- Extract _prepare_async_run() factory shared by build and preview paths
- Add _build_async_preview() for async preview with no disk checkpoints
- Replace on_row_group_complete/on_checkpoint_complete with single
  on_finalize_row_group callback; caller handles checkpointing
- Add free_row_group() on RowGroupBufferManager for discard-without-write
- Free fully-dropped row groups instead of finalizing them
- Add consolidated AsyncProgressReporter for async generation logging

Closes #437, closes #442, closes #444

* feat: add consolidated async progress reporting with row group context

- Add AsyncProgressReporter: groups per-column progress into a single
  log block emitted at configurable intervals (default 5s)
- Add quiet mode to ProgressTracker to suppress per-tracker logging
  when used with the consolidated reporter
- Add ContextVar-based row group tagging (RG1, RG2, ...) for log
  messages emitted inside async tasks (samplers, expressions, seeds)
- Add progress_interval to RunConfig for user-configurable reporting
- Remove log_start_async from ProgressTracker (superseded by reporter)

Closes #443

* fix async preview and progress reporting

* feat: add opt-in sticky ANSI progress bars for generation

Add RunConfig.progress_bar setting that replaces periodic log-line
progress with sticky terminal bars that stay at the bottom while
logs scroll above. Pure ANSI escape codes, no new dependencies.

Disabled by default - existing log-based output unchanged.

* docs: add missing progress_interval docstring in RunConfig

* fix: update progress bar on every completion in async path

Skip the time gate when the progress bar is active so the bar
redraws on every record instead of every progress_interval seconds.

* fix: resolve row-group semaphore deadlock when all tasks are deferred

When all tasks for admitted row groups fail with transient errors,
the row-group semaphore never releases, blocking admission of new
row groups. Fix by salvaging stalled row groups inline - retrying
deferred tasks immediately so row groups can checkpoint and free
their semaphore slots.

Also updates row group log format to (x/X) with leading zeros.

* fix: eagerly salvage stalled row groups to avoid wasting semaphore slots

Run inline salvage after every checkpoint pass instead of only when
globally stalled. Row groups with 0 in-flight and only deferred tasks
are salvaged immediately, freeing their semaphore slot for new work.

* fix: address review findings from greptile and codex

- Use `with` statement for progress bar context (safe __exit__ on error)
- Check bar.is_active instead of bar is not None (non-TTY fallback)
- Record failures (not skips) for tasks that exhaust salvage retries
- Record skipped tasks when pre-batch filtering drops rows

* fix: stable progress bar width and accurate failure counts

- Pre-compute fixed stats width at bar creation to prevent bar
  resizing when failed count appears
- Cap displayed completed at total to avoid >100% on retries
- Exclude already-failed columns from skip recording to prevent
  double-counting in progress reporter

* fix: address Nabin's review - exclude_columns, dead code, docstring

- Add exclude_columns={task.column} on non-retryable batch/from_scratch
  drop path to prevent double-counting (same pattern as cell path)
- Simplify salvage drop to per-task exclude (is_dropped guard handles
  multi-column case)
- Remove dead _in_flight_for_rg method
- Fix context.py docstring to match actual (x/X) format

* fix: _drain_frontier exits before dispatching ready salvage tasks

After salvage discards a cell task from dispatched (making it
available in the frontier), _drain_frontier broke immediately
because nothing was in-flight yet. The task and its downstream
were never re-dispatched, leaving the row group incomplete.

Fix: only break when both ready and in-flight are empty.

* fix: salvage edge cases found by code review

- _drain_frontier: only break when both ready and in-flight are empty
- _salvage_rounds: re-mark sibling columns as dispatched after
  from_scratch retry to prevent duplicate dispatch
- _salvage_stalled_row_groups: separate exhausted tasks from new
  drain failures to avoid treating non-stalled tasks as permanent
- _checkpoint_completed_row_groups: clean up deferred tasks for
  checkpointed row groups
- Early shutdown: salvage stalled row groups before exiting

* fix: skip record_failure for already-dropped rows in salvage

When multiple columns fail for the same row, the first drop records
a skip for the other column. Without this guard, record_failure
fires again for the second column, double-counting it.
2026-03-25 14:50:48 -03:00
Nabin Mulepati
1356408c6e
feat: remove litellm dependency and bridge path (#455)
* adjust plan

* feat: remove litellm dependency and bridge path (PR-7)

- Delete litellm_bridge.py adapter, litellm_overrides.py, and their tests
- Remove LiteLLM fallback branch and DATA_DESIGNER_MODEL_BACKEND env var
  from clients/factory.py; unknown provider_type now raises ValueError
- Remove apply_litellm_patches() call from models/factory.py
- Remove LiteLLM exception match arms and DownstreamLLMExceptionMessageParser
  from models/errors.py; port context window detail extraction to
  _extract_context_window_detail for native ProviderError path
- Remove litellm from lazy_heavy_imports.py and pyproject.toml runtime deps
- Remove flatten_extra_body parameter from TransportKwargs.from_request
- Clean up LiteLLM references in docstrings, comments, and AGENTS.md
- Add full ProviderErrorKind test coverage to test_model_errors.py
- Update benchmark script to patch OpenAICompatibleClient instead of
  CustomRouter

Made-with: Cursor

* fix: forward tools to _fake_response in benchmark patch

The old CustomRouter patch forwarded **kwargs (including tools) to
_fake_response, but the new OpenAICompatibleClient patch only passed
model and messages — silently disabling tool-call simulation in
benchmark scenarios that exercise allow_tools.

Made-with: Cursor

* fix: address PR-7 review feedback

- Return ChatCompletionResponse from benchmark fakes instead of
  FakeResponse to match the native client contract (facade expects
  .message, not .choices[0].message)
- Add ids= to parametrize block in test_model_errors.py for readability
- Remove unnecessary try/except from _extract_context_window_detail;
  the `if marker in` guard is sufficient
- Make context window marker match case-insensitive
- Replace stale httpx.AsyncClient callout in async_concurrency.py
  docstring with generic "async-stateful resources"

Made-with: Cursor
2026-03-24 11:41:00 -06:00
Nabin Mulepati
f24e88d0f8
feat: wire ThrottledModelClient and dual-semaphore scheduler (#449)
* feat: constrain HttpModelClient to single concurrency mode

Constrain each HttpModelClient instance to sync or async at
construction time, eliminating dual-mode lifecycle complexity
that caused transport leaks and cross-mode teardown bugs.

- Add ClientConcurrencyMode StrEnum replacing Literal type alias
- Add concurrency_mode constructor param with mode enforcement
  guards on _get_sync_client / _get_async_client
- Simplify close()/aclose() to single-mode teardown (cross-mode
  calls are no-ops)
- Thread client_concurrency_mode through factory chain from
  DATA_DESIGNER_ASYNC_ENGINE env var
- Add ModelRegistry.arun_health_check() async mirror and wire
  async dispatch in ColumnWiseDatasetBuilder
- Make ensure_async_engine_loop public (used cross-module)
- Fix test helpers to derive concurrency mode from injected client
- Add PR-5 architecture notes

* fix: address review findings on HttpModelClient lifecycle

- Fix transport double-close in close()/aclose() by delegating
  teardown to the httpx client when one exists (if/elif pattern);
  only close transport directly if no client was ever created
- Reject mismatched client/mode injection in constructor (e.g.
  async_client on a sync-mode instance raises ValueError)
- Add 5-minute wall-clock timeout to future.result() in async
  health check dispatch
- Add constructor validation tests for both mismatch directions
- Update PR-5 architecture notes

Made-with: Cursor

* fix: address PR-5 review feedback on HttpModelClient lifecycle

- Type _transport as RetryTransport | None, removing type: ignore
  suppressions in close()/aclose()
- Make transport fully lazy (None by default) and accept optional
  transport constructor param so injected-client paths don't
  eagerly allocate an unused RetryTransport
- Cancel future on TimeoutError in async health check dispatch so
  timed-out coroutines don't linger on the shared event loop
- Set health check timeout to 180s (3 min) matching architecture
  notes
- Rename _SYNC_CLIENT_CASES to _CLIENT_FACTORY_CASES since the
  list parametrizes both sync and async mode tests
- Update architecture notes timeout from 180→300 back to 180 to
  match implementation

Made-with: Cursor

* add plan for pr-6

* update plan

* feat: wire ThrottledModelClient and dual-semaphore scheduler for AIMD concurrency control

Introduces per-request AIMD throttle feedback by wrapping every
ModelClient with ThrottledModelClient (acquire/release around each HTTP
call) and adding a dual-semaphore scheme to AsyncTaskScheduler that
separates submission slots from LLM-wait slots via a one-way handoff.

Key changes:
- ThrottledModelClient: decorator that acquires/releases ThrottleManager
  permits around sync and async completion, embedding, and image calls,
  routing 429s to release_rate_limited for AIMD backoff.
- RetryConfig: removes 429 from retryable_status_codes so rate-limit
  signals propagate to the AIMD feedback loop instead of being silently
  retried at the transport layer.
- AsyncTaskScheduler: adds TrackingSemaphore, LLM-wait semaphore, and
  is_llm_bound-driven handoff; extracts _main_dispatch_loop and
  _salvage_rounds; tracks worker tasks for clean cancellation.
- ColumnGenerator.is_llm_bound property on base, model-registry, and
  custom generators to classify which columns need the handoff.
- ThrottleConfig (Pydantic model on RunConfig) for user-tunable AIMD
  knobs (reduce_factor, additive_increase, success_window, block_seconds).
- ModelRegistry.get_aggregate_max_parallel_requests() to size the
  LLM-wait semaphore from registered model configs.
- Renames throttle.py -> throttle_manager.py for clarity.

Made-with: Cursor

* fix: address PR-6 review feedback on concurrency safety

- Shield _cancel_workers() with asyncio.shield in the CancelledError
  handler to prevent double-cancellation from leaking semaphore permits
- Guard ThrottleManager release calls in _throttled_sync/_athrottled
  with try/except so a failing release doesn't mask the original error
  or permanently leak the in-flight permit counter
- Add registration guard in ThrottleManager.try_acquire() that raises
  RuntimeError if called before register(), making the ordering
  invariant explicit rather than relying on temporal coupling
- Add is_registered() public query method on ThrottleManager
- Expand TrackingSemaphore docstring noting CPython _value dependency

Made-with: Cursor

* fix test

* feat: AIMD throttle refinements — cascade dampening, ceiling stabilization, early shutdown exclusion

Dampen aggressive cascade 429 reduction (only the first 429 in a burst
reduces the limit; subsequent cascade 429s release permits without
further reduction). Add rate_limit_ceiling with configurable overshoot
to stabilize concurrency instead of sawtooth recovery. Exclude
ModelRateLimitError from early shutdown error rate. Consolidate AIMD
defaults into ThrottleConfig ClassVar constants as single source of
truth, and update ThrottleManager to accept ThrottleConfig directly.
Update architecture notes for PR-6.

Made-with: Cursor

* fix: address PR-6 review feedback on retry boundary, capacity polling, and test coverage

- Keep 429 in transport retry list for sync-mode clients (no salvage
  queue); only strip for async-mode where AIMD + salvage handles it.
  Add strip_rate_limit_codes kwarg to create_retry_transport; wire
  through HttpModelClient based on concurrency mode.
- Return CAPACITY_POLL_INTERVAL (50ms) instead of default_block_seconds
  (2s) from try_acquire when at capacity but not rate-limited, so
  callers poll responsively when a slot could free in milliseconds.
- Document registration ordering invariant on create_model_client's
  throttle_manager param.
- Add semaphore permit assertions to test_scheduler_llm_bound_one_way_handoff
  matching the non-LLM counterpart test.
- Update architecture notes for all changes.

Made-with: Cursor

* add missing test

* fix: consolidate ThrottleConfig and improve naming

Move ceiling_overshoot from ThrottleManager constructor kwarg into
ThrottleConfig (single source of truth for all AIMD tuning knobs).
Rename block_seconds → cooldown_seconds and block_duration →
cooldown_duration throughout for clarity. Remove DEFAULT_CEILING_OVERSHOOT
module constant from throttle_manager.py.

Made-with: Cursor

* fix: annotate state capture invariant in acquire_sync/acquire_async

Add inline comment explaining why the captured DomainThrottleState
reference is safe to reuse in the finally block (objects are never
replaced after creation).

Made-with: Cursor
2026-03-24 09:03:11 -06:00
Nabin Mulepati
015d4f188f
feat: Constrain HttpModelClient to single concurrency mode... (#439)
* feat: constrain HttpModelClient to single concurrency mode

Constrain each HttpModelClient instance to sync or async at
construction time, eliminating dual-mode lifecycle complexity
that caused transport leaks and cross-mode teardown bugs.

- Add ClientConcurrencyMode StrEnum replacing Literal type alias
- Add concurrency_mode constructor param with mode enforcement
  guards on _get_sync_client / _get_async_client
- Simplify close()/aclose() to single-mode teardown (cross-mode
  calls are no-ops)
- Thread client_concurrency_mode through factory chain from
  DATA_DESIGNER_ASYNC_ENGINE env var
- Add ModelRegistry.arun_health_check() async mirror and wire
  async dispatch in ColumnWiseDatasetBuilder
- Make ensure_async_engine_loop public (used cross-module)
- Fix test helpers to derive concurrency mode from injected client
- Add PR-5 architecture notes

* fix: address review findings on HttpModelClient lifecycle

- Fix transport double-close in close()/aclose() by delegating
  teardown to the httpx client when one exists (if/elif pattern);
  only close transport directly if no client was ever created
- Reject mismatched client/mode injection in constructor (e.g.
  async_client on a sync-mode instance raises ValueError)
- Add 5-minute wall-clock timeout to future.result() in async
  health check dispatch
- Add constructor validation tests for both mismatch directions
- Update PR-5 architecture notes

Made-with: Cursor

* fix: address PR-5 review feedback on HttpModelClient lifecycle

- Type _transport as RetryTransport | None, removing type: ignore
  suppressions in close()/aclose()
- Make transport fully lazy (None by default) and accept optional
  transport constructor param so injected-client paths don't
  eagerly allocate an unused RetryTransport
- Cancel future on TimeoutError in async health check dispatch so
  timed-out coroutines don't linger on the shared event loop
- Set health check timeout to 180s (3 min) matching architecture
  notes
- Rename _SYNC_CLIENT_CASES to _CLIENT_FACTORY_CASES since the
  list parametrizes both sync and async mode tests
- Update architecture notes timeout from 180→300 back to 180 to
  match implementation

Made-with: Cursor

* fix: address Greptile review findings on health check methods

- Use bare `raise` instead of `raise e` in both run_health_check and
  arun_health_check to preserve original traceback frames
- Add GenerationType.IMAGE test coverage for sync and async health
  checks (stub-image config + generate_image/agenerate_image patches)

Made-with: Cursor

* Update packages/data-designer-engine/tests/engine/models/test_model_registry.py
2026-03-20 12:23:03 -06:00
Andre Manoel
7e812630cf
feat: wire async task-queue scheduler into ColumnWiseDatasetBuilder (#429)
* feat: wire async task-queue scheduler into ColumnWiseDatasetBuilder

* chore: add async benchmark notebook and demo scripts

* fix: address all PR review comments on async builder integration

- Wire on_batch_complete through on_row_group_complete callback
- Mark trailing slots as dropped in replace_dataframe when processor filters rows
- Ensure checkpoint still runs when on_before_checkpoint raises
- Gate non-seed task dispatch on pre-batch completion
- Add public run_pre_batch_on_df to ProcessorRunner (replaces private _run_stage call)
- Add public is_column_complete_for_rg to CompletionTracker (replaces private _completed access)
- Type task_traces as list[TaskTrace] in results.py
- Add async_trace docstring to RunConfig
- Move module-level log into _build_async
- Add replace_dataframe unit tests (same-size, dropped rows, fewer rows)
- Assert on public outcomes in scheduler tests instead of private attributes
- Parametrize allow_resize validation tests
- Cache seed_cols before main loop
- Remove redundant disable_early_shutdown from AsyncTaskScheduler

* style: fix ruff format for lambda expression

* fix: address open review issues on async scheduler

- Flush completed row groups before breaking on early shutdown (data loss)
- Change error rate check from >= to > so disable_early_shutdown sentinel
  (1.0) never triggers at 100% failure rate
- Extract seeds-complete check into helper and call it in salvage rounds
  via _drain_frontier, with pre-batch gating, so pre-batch processor runs
  even when seed tasks succeed only after retry
- Fix is_column_complete_for_rg to check _batch_complete first, then verify
  all non-dropped rows for CELL_BY_CELL columns
- Replace O(|in-flight|) scan in _in_flight_for_rg with per-RG counter

* fix: sync pre-batch row drops to CompletionTracker and restore stderr safely

Pre-batch processors that filter rows marked them as dropped in
RowGroupBufferManager but not in CompletionTracker, causing unnecessary
LLM calls for rows that would be discarded at checkpoint time.

Also wrap the benchmark warmup stderr redirect in try/finally so stderr
is restored if _run_once raises.

* fix: prune _admitted_rg_ids on row group checkpoint

Prevents unbounded growth of the admission set across large runs.

* chore: remove demo/async files from PR

Dev-time benchmarks and manual test scripts - kept locally, not needed in the PR.

* fix: wire disable_early_shutdown into AsyncTaskScheduler

RunConfig.disable_early_shutdown was forwarded to the sync executor
but silently ignored in the async path. Now passed through to the
scheduler's _check_error_rate.

* test: add e2e test for async engine concurrency

Verifies the async scheduler dispatches independent LLM columns
concurrently by checking for overlapping task trace intervals.
Uses a wide DAG (sampler -> 2 parallel LLM columns) with 2 records.
Requires NVIDIA_API_KEY.

* fix: drop row group on on_before_checkpoint failure instead of writing unprocessed data

Matches on_seeds_complete failure behavior and avoids silently
checkpointing unfiltered rows when a post-batch processor fails.

* fix: skip on_before_checkpoint when no POST_BATCH processors configured

Avoids unnecessary DataFrame round-trip for every row group in the
common case where no post-batch processors exist.

* fix: address remaining review nits from nabinchha and greptile summary

- Gate on_seeds_complete on PRE_BATCH processors (matches on_before_checkpoint pattern)
- Cache seed_cols as instance attr instead of recomputing in _dispatch_seeds
- Iterate list(self._active_rgs) snapshot in _run_seeds_complete_check
- Add logger.debug to telemetry except block
- Add design comment on on_before_checkpoint failure drop behavior
- Rename row_group param to row_group_index in is_column_complete_for_rg
- Document rg_id as current_batch_number equivalence
- Use mock.patch.object in e2e test instead of direct mutation
- Add max(0, ...) floor guard on _in_flight_counts decrement
- Rename _ensure_async_engine_loop to public ensure_async_engine_loop
- Move AsyncTaskScheduler import to module level in integration tests

* fix: preserve async callback contract and e2e setup

* fix: prune _seeds_dispatched_rgs and _pre_batch_done_rgs on checkpoint

* refactor: consolidate per-RG state into _RowGroupState dataclass

Replace 5 independent collections (_active_rgs, _admitted_rg_ids,
_seeds_dispatched_rgs, _pre_batch_done_rgs, _in_flight_counts) with a
single _rg_states dict keyed by row group ID. Cleanup is now a single
`del` instead of N separate discards, eliminating the class of bugs
where one collection is missed during row group teardown.

* fix: skip checkpoint and callbacks when on_before_checkpoint fails

When on_before_checkpoint raises and all rows are dropped, the code
previously fell through to checkpoint_row_group and on_row_group_complete,
sending a spurious progress notification for a batch with zero records.
Now gates both on a `dropped` flag so they are skipped after failure.

* fix: snapshot dropped rows before await in _run_batch and sync tracker on checkpoint failure

Two fixes:
- _run_batch: snapshot dropped rows before `await agenerate` so the
  row-count expectation matches batch_df. Concurrent tasks can drop rows
  during the await, causing a spurious ValueError that would drop the
  entire row group. Write-back now re-checks is_dropped to skip rows
  dropped mid-flight.
- _checkpoint_completed_row_groups: add tracker.drop_row alongside
  buffer_manager.drop_row when on_before_checkpoint fails, keeping
  both in sync.

* feat: sliding window error rate and out-of-order row group completion test

Replace cumulative error counters with a deque-based sliding window so
that early transient failures do not permanently inflate the error rate
in long-running jobs. Add tests for the sliding window recovery path
and for deterministic out-of-order row group checkpoint ordering.

* fix: use real time delays in out-of-order completion test

asyncio.sleep(0) interleaving is not deterministic across Python
versions. Switch to asyncio.sleep(num_records * 0.02) so the smaller
row group genuinely finishes seeds first regardless of event loop
scheduling.

* fix: prevent ZeroDivisionError when shutdown_error_window is 0

Change RunConfig.shutdown_error_window constraint from ge=0 to ge=1
so the sliding window denominator is never zero.

* fix: address Greptile review nits in async_scheduler

- Move del _rg_states inside try/finally so semaphore is always released
- Add exc_info=True to pre-batch failure log for consistent tracebacks
- Short-circuit _check_error_rate when _early_shutdown already set

* fix: address Greptile summary findings

- Remove duplicate async engine log in build() (kept in _build_async)
- Guard _run_seeds_complete_check with has_pre_batch at both call sites
- Change error rate comparison from > to >= to match sync path semantics
2026-03-20 13:05:09 -03:00
Eric W. Tramel
a0fb04ee07
feat: agent rollout trace ingestion (#399) 2026-03-20 09:17:35 -04:00
Nabin Mulepati
a9a16ae61a
feat: Native Anthropic adapter with shared HTTP client infrastructure (#426)
* plans for model facade overhaul

* update plan

* add review

* address feedback + add more details after several self reviews

* update plan doc

* address nits

* Add cannonical objects

* self-review feedback + address

* add LiteLLMRouter protocol to strongly type bridge router param

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* simplify some things

* add a protol for http response like object

* move HttpResponse

* update PR-1 architecture notes for lifecycle and router protocol

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Address PR #359 feedback: exception wrapping, shared parsing, test improvements

- Wrap all LiteLLM router calls in try/except to normalize raw exceptions
  into canonical ProviderError at the bridge boundary (blocking review item)
- Extract reusable response-parsing helpers into clients/parsing.py for
  shared use across future native adapters
- Add async image parsing path using httpx.AsyncClient to avoid blocking
  the event loop in agenerate_image
- Add retry_after field to ProviderError for future retry engine support
- Fix _to_int_or_none to parse numeric strings from providers
- Create test conftest.py with shared mock_router/bridge_client fixtures
- Parametrize duplicate image generation and error mapping tests
- Add tests for exception wrapping across all bridge methods

* Use contextlib to dry out some code

* Address Greptile feedback: HTTP-date retry-after parsing, docstring clarity

- Parse RFC 7231 HTTP-date strings in Retry-After header (used by
  Azure and Anthropic during rate-limiting) in addition to numeric
  delay-seconds
- Clarify collect_non_none_optional_fields docstring explaining why
  f.default is None is the correct check for optional field forwarding
- Add tests for HTTP-date and garbage Retry-After values

* Address Greptile feedback: FastAPI detail parsing, comment fixes

- Fix misleading comment about prompt field defaults in _IMAGE_EXCLUDE
- Handle list-format detail arrays in _extract_structured_message for
  FastAPI/Pydantic validation errors
- Document scope boundary for vision content in collect_raw_image_candidates

* add PR-2 architecture notes for model facade overhaul

* save progress on pr2

* small refactor

* address feedback

* Address greptile comment in pr1

* refactor ProviderError from dataclass to regular Exception

- Replace @dataclass + __post_init__ with explicit __init__ that calls
  super().__init__ properly, avoiding brittle field-ordering dependency
- Store cause via __cause__ only, removing the redundant .cause attr
- Update match pattern in handle_llm_exceptions for non-dataclass type
- Rename shadowed local `fields` to `optional_fields` in TransportKwargs

* Address greptile feedback

* PR feedback

* track usage tracking in finally block for images

* pr feedback

* add native OpenAI adapter with retry and throttle infrastructure

- Implement OpenAICompatibleClient using httpx with RetryTransport
- Add ThrottleManager with AIMD concurrency control and structured logging
- Route provider_type=openai to native adapter in client factory
- Add extract_reasoning_content helper for vLLM field migration
- Make ModelRegistry own ThrottleManager and RetryConfig explicitly
- Support DATA_DESIGNER_MODEL_BACKEND=litellm_bridge env var override

Made-with: Cursor

* Self CR

* fix claude slop

* Updates after self-review. Simplify use of ThrottleManager in light of plan 346 scheduler

* wrap facade close in try/catch

* clean up stray params

* fix: address review findings from model facade overhaul PR3

- Fix metadata bug: drop unknown kwargs instead of passing them to
  ChatCompletionRequest (which has no metadata field), preventing a
  runtime TypeError.
- Lazy-init httpx clients: sync and async clients are created on first
  use instead of eagerly in the constructor, with constructor injection
  for testability.
- Remove defensive getattr on httpx.Response.status_code (always present).
- Add comment clarifying throttle manager wiring is deferred.
- Refactor tests to use constructor injection instead of private attribute
  mutation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix stray inclusion of metadata

* small regression fix

* address more feedback

* self review

* Fixes

* new test for aimd lifecycle

* update plan docs

* update plans with refs to prs

* fix: cap acquire_sync/acquire_async sleep to remaining budget to prevent timeout overshoot

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test lay init

* fix timeout for openaicompatibleadapter

* remove unused attr

* fix: address review findings from PR #402

- Guard reasoning_content fallback with isinstance(str) check to prevent
  non-string provider values from violating the return type contract
- Normalize provider_type comparison to lowercase so mixed-case configs
  (e.g. "OpenAI") route to the native adapter instead of silently falling
  through to the bridge
- Document register-before-acquire ordering invariant on ThrottleManager
- Add TODO to remove 429 from retryable_status_codes once ThrottleManager
  is wired via AsyncTaskScheduler (plan 346)

Made-with: Cursor

* Address pr feedback

* fix method order

* feat: native Anthropic adapter with image block translation

Implement AnthropicClient as a native httpx-based adapter for the
Anthropic Messages API, following the same patterns as
OpenAICompatibleClient. Route provider_type="anthropic" to the native
adapter in the client factory, with unknown providers falling through
to LiteLLMBridgeClient.

Key adapter behaviors:
- System message extraction to top-level `system` parameter
- OpenAI image_url content blocks translated to Anthropic image/source format
- tool_use / thinking content blocks normalized to canonical types
- x-api-key auth (not Bearer), anthropic-version header
- max_tokens defaults to 4096 when omitted
- stop mapped to stop_sequences
- Embeddings and image generation raise unsupported capability errors

Made-with: Cursor

* fix: exclude OpenAI-specific params from Anthropic payload and drop unnecessary re.DOTALL

Expand _ANTHROPIC_EXCLUDE to filter response_format, frequency_penalty,
presence_penalty, and seed from TransportKwargs forwarding. Remove the
unnecessary re.DOTALL flag from _DATA_URI_RE since base64 data never
contains newlines.

Made-with: Cursor

* Fix failing test

* fix: address PR #402 review findings

- Add POST to RetryTransport allowed_methods so retries actually fire
  for all adapter endpoints (chat, embeddings, images are all POST)
- Guard close()/aclose() with _init_lock to prevent TOCTOU race with
  lazy client initialization
- Detect "unsupported parameter" patterns in 400 responses so the
  native path returns UNSUPPORTED_PARAMS instead of generic BAD_REQUEST
- Return None from coerce_message_content for image-only content lists
  instead of leaking the Python repr via str(content)
- Restore per-request timeout forwarding for LiteLLMBridgeClient via
  _with_timeout helper (broken when "timeout" was added to _META_FIELDS)
- Normalize ModelProvider.provider_type to lowercase via field_validator
  so consumers don't need per-site .lower() calls
- Fix unclosed httpx.AsyncClient in lazy-init test

Made-with: Cursor

* updates to DRY out code between the two adapters

* refactor code to introduce HttpModelClient

* update tests

* fix: address PR #426 review findings

- Release _aclient in close() to prevent async connection pool leak
  when both sync and async clients were initialized
- Drop malformed image_url blocks (missing image_url key) instead of
  forwarding them unchanged to the Anthropic API
- Preserve image blocks in system messages by returning Anthropic
  block-list format when non-text content is present
- Rename extract_system_text to extract_system_content and add
  merge_system_parts helper for mixed string/block system parts

Made-with: Cursor

* fix: improve error classification and surface provider messages for 400s

- Handle /v1 in Anthropic endpoint gracefully to avoid path duplication
- Add QUOTA_EXCEEDED provider error kind for credit/billing failures
- Extend UNSUPPORTED_PARAMS detection for mutually exclusive params
- Surface raw provider message in formatted errors for 400 status codes
- Consolidate provider message helpers into single _attach_provider_message

Made-with: Cursor

* fix: address PR #426 review findings (round 2)

- Fix close() double-close of shared transport by closing self._transport
  directly instead of accessing private aclient._transport (critical)
- Add TODO for threading.Lock → asyncio.Lock split (plan-346)
- Remove unused _model_id from HttpModelClient and all callers
- Export AnthropicClient from adapters __init__.py
- Filter empty text blocks in translate_tool_result_content join
- Move mock helpers to conftest.py with consistent json.dumps text default
- Add __init__.py files to enable absolute imports from test conftest
- Add bridge env override test for anthropic provider
- Add ConnectionError and non-JSON response tests for AnthropicClient
- Assert secret_resolver.resolve called with correct key ref in factory tests

Made-with: Cursor

* Update license headers

* Fix anthropic tool call flow

* fix: add explicit UNSUPPORTED_CAPABILITY error mapping

- Add ModelUnsupportedCapabilityError so unsupported operations
  (e.g. Anthropic embeddings/image-generation) surface a specific
  error instead of falling through to generic ModelAPIError
- Forward the provider's operation-specific message in the cause
- Add parametrized test case for the new error path
2026-03-19 11:18:40 -06:00
Mike Knepper
12baad776d
feat: Refactor person data reading for client ddb connection control (#393) 2026-03-19 09:34:57 -05:00
Eric W. Tramel
c88508790f
test: trim FileSystemSeedReader follow-up coverage (#432) 2026-03-19 09:47:55 -04:00
Andre Manoel
61fa0150f7
fix: support nested field access in schema transform templates (#435)
* feat: support nested field access in schema transform templates

Enable {{ result.quality.score }} style dot notation in schema
transform Jinja2 templates, where result is a deserialized JSON column.

Previously, _json_escape_record flattened all dict values to escaped
JSON strings before Jinja2 saw them. This made the rendered output
valid JSON but prevented nested access since Jinja2 only saw strings.

The fix introduces TemplateValue, a wrapper that defers the choice
between "drill into nested dict" and "render as escaped string" to
template evaluation time. Jinja2 resolves dot notation via __getattr__
(returning a new TemplateValue for the nested value), and converts to
string via __str__ (delegating to a caller-provided str_fn). This is
necessary because plain dicts render as Python repr ({'key': 'val'})
which is invalid JSON - we need to control __str__ to produce properly
escaped JSON, and that requires a wrapper object.

Other Jinja2 consumers (prompt templates, expression columns) don't
need this - Jinja2 natively supports dot access on plain dicts via
getattr-to-getitem fallback, and plain str() is fine for text output.
Schema transform is unique because its output must be valid JSON.

* Address PR review comments

- Fix boolean serialization: add bool check before str in _escape_value_for_json
  to produce JSON 'true'/'false' instead of Python 'True'/'False'
- Add class-level _record_str_fn annotation to WithJinja2UserTemplateRendering
- Rename skip_record_sanitization to _skip_record_sanitization (underscore prefix)
  to signal internal-only usage, and document it in safe_render docstring
- Add defensive error handling in TemplateValue.__getitem__ and __iter__
- Promote test input data to parametrize column, removing brittle string scan

* Address second round of PR review comments

- Add __eq__ and __hash__ to TemplateValue so Jinja2 equality
  conditionals (e.g. {% if result.label == "excellent" %}) work
- Add inline comment explaining deliberate double-encode in
  _escape_value_for_json for dict/list values
- Default _record_str_fn to None at class level so accessing it
  before prepare_jinja2_template_renderer doesn't mask the real error

* refactor: replace TemplateValue with Jinja2 finalize hook

Use Jinja2's built-in finalize hook for value-to-string conversion
and a getattr override for dict-key-priority lookup, eliminating the
custom TemplateValue wrapper class entirely.

* docs: add missing param docs to prepare_jinja2_template_renderer
2026-03-18 15:28:18 -03:00
Eric W. Tramel
3d33680b7d
feat: support 1-to-many FileSystemSeedReader hydration (#424) 2026-03-17 20:56:19 -04:00
Nabin Mulepati
255e78da83
feat: Native OpenAI adapter with retry and AIMD throttle infrastructure (#402)
* plans for model facade overhaul

* update plan

* add review

* address feedback + add more details after several self reviews

* update plan doc

* address nits

* Add cannonical objects

* self-review feedback + address

* add LiteLLMRouter protocol to strongly type bridge router param

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* simplify some things

* add a protol for http response like object

* move HttpResponse

* update PR-1 architecture notes for lifecycle and router protocol

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Address PR #359 feedback: exception wrapping, shared parsing, test improvements

- Wrap all LiteLLM router calls in try/except to normalize raw exceptions
  into canonical ProviderError at the bridge boundary (blocking review item)
- Extract reusable response-parsing helpers into clients/parsing.py for
  shared use across future native adapters
- Add async image parsing path using httpx.AsyncClient to avoid blocking
  the event loop in agenerate_image
- Add retry_after field to ProviderError for future retry engine support
- Fix _to_int_or_none to parse numeric strings from providers
- Create test conftest.py with shared mock_router/bridge_client fixtures
- Parametrize duplicate image generation and error mapping tests
- Add tests for exception wrapping across all bridge methods

* Use contextlib to dry out some code

* Address Greptile feedback: HTTP-date retry-after parsing, docstring clarity

- Parse RFC 7231 HTTP-date strings in Retry-After header (used by
  Azure and Anthropic during rate-limiting) in addition to numeric
  delay-seconds
- Clarify collect_non_none_optional_fields docstring explaining why
  f.default is None is the correct check for optional field forwarding
- Add tests for HTTP-date and garbage Retry-After values

* Address Greptile feedback: FastAPI detail parsing, comment fixes

- Fix misleading comment about prompt field defaults in _IMAGE_EXCLUDE
- Handle list-format detail arrays in _extract_structured_message for
  FastAPI/Pydantic validation errors
- Document scope boundary for vision content in collect_raw_image_candidates

* add PR-2 architecture notes for model facade overhaul

* save progress on pr2

* small refactor

* address feedback

* Address greptile comment in pr1

* refactor ProviderError from dataclass to regular Exception

- Replace @dataclass + __post_init__ with explicit __init__ that calls
  super().__init__ properly, avoiding brittle field-ordering dependency
- Store cause via __cause__ only, removing the redundant .cause attr
- Update match pattern in handle_llm_exceptions for non-dataclass type
- Rename shadowed local `fields` to `optional_fields` in TransportKwargs

* Address greptile feedback

* PR feedback

* track usage tracking in finally block for images

* pr feedback

* add native OpenAI adapter with retry and throttle infrastructure

- Implement OpenAICompatibleClient using httpx with RetryTransport
- Add ThrottleManager with AIMD concurrency control and structured logging
- Route provider_type=openai to native adapter in client factory
- Add extract_reasoning_content helper for vLLM field migration
- Make ModelRegistry own ThrottleManager and RetryConfig explicitly
- Support DATA_DESIGNER_MODEL_BACKEND=litellm_bridge env var override

Made-with: Cursor

* Self CR

* fix claude slop

* Updates after self-review. Simplify use of ThrottleManager in light of plan 346 scheduler

* wrap facade close in try/catch

* clean up stray params

* fix: address review findings from model facade overhaul PR3

- Fix metadata bug: drop unknown kwargs instead of passing them to
  ChatCompletionRequest (which has no metadata field), preventing a
  runtime TypeError.
- Lazy-init httpx clients: sync and async clients are created on first
  use instead of eagerly in the constructor, with constructor injection
  for testability.
- Remove defensive getattr on httpx.Response.status_code (always present).
- Add comment clarifying throttle manager wiring is deferred.
- Refactor tests to use constructor injection instead of private attribute
  mutation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix stray inclusion of metadata

* small regression fix

* address more feedback

* self review

* Fixes

* new test for aimd lifecycle

* update plan docs

* update plans with refs to prs

* fix: cap acquire_sync/acquire_async sleep to remaining budget to prevent timeout overshoot

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test lay init

* fix timeout for openaicompatibleadapter

* remove unused attr

* fix: address review findings from PR #402

- Guard reasoning_content fallback with isinstance(str) check to prevent
  non-string provider values from violating the return type contract
- Normalize provider_type comparison to lowercase so mixed-case configs
  (e.g. "OpenAI") route to the native adapter instead of silently falling
  through to the bridge
- Document register-before-acquire ordering invariant on ThrottleManager
- Add TODO to remove 429 from retryable_status_codes once ThrottleManager
  is wired via AsyncTaskScheduler (plan 346)

Made-with: Cursor

* Address pr feedback

* fix method order

* Fix failing test

* fix: address PR #402 review findings

- Add POST to RetryTransport allowed_methods so retries actually fire
  for all adapter endpoints (chat, embeddings, images are all POST)
- Guard close()/aclose() with _init_lock to prevent TOCTOU race with
  lazy client initialization
- Detect "unsupported parameter" patterns in 400 responses so the
  native path returns UNSUPPORTED_PARAMS instead of generic BAD_REQUEST
- Return None from coerce_message_content for image-only content lists
  instead of leaking the Python repr via str(content)
- Restore per-request timeout forwarding for LiteLLMBridgeClient via
  _with_timeout helper (broken when "timeout" was added to _META_FIELDS)
- Normalize ModelProvider.provider_type to lowercase via field_validator
  so consumers don't need per-site .lower() calls
- Fix unclosed httpx.AsyncClient in lazy-init test
2026-03-17 09:26:58 -06:00
Andre Manoel
0b324478f1
feat: add AsyncTaskScheduler and RowGroupBufferManager for async engine (#404)
* feat: add AsyncTaskScheduler and RowGroupBufferManager for async engine

* test: add retryable salvage and eager row-drop scheduler tests

Add two missing test cases for AsyncTaskScheduler:
- Transient 503 failure deferred to salvage round and recovered
- Downstream column never dispatched for rows dropped by upstream failure

Update plan checkboxes to reflect PR 3 completion status.

* fix: address Greptile review findings on scheduler and buffer manager

- Non-retryable batch/from_scratch failure now drops all rows in the
  row group so it reaches a terminal state and gets checkpointed
- Salvage rounds re-dispatch from_scratch tasks directly instead of
  relying on the frontier (which never contains them)
- Log error if scheduler exits with unfinished row groups
- update_batch raises ValueError on size mismatch
- Free _row_group_sizes entry on checkpoint
- Add row-count mismatch warning in _run_batch writeback
- Document intentional stateful lock ordering in _dispatch_seeds

* refactor: remove dead _seed_frontier method from CompletionTracker

* refactor: restore seed_frontier as public opt-in method on CompletionTracker

Keep the tracker self-contained for static introspection (capacity
planning, task enumeration) without requiring a scheduler. The scheduler
does not call it - it manages root dispatch directly for stateful locks
and multi-column dedup.

* fix: complete salvage path - stateful locks, frontier drain, checkpoint sweep

- Acquire stateful lock before re-dispatching from_scratch tasks in
  salvage (prevents RuntimeError on lock release)
- Replace single-pass salvage drain with _drain_frontier() that
  re-checks the frontier after each task completes (ensures downstream
  tasks enqueued during retry are dispatched)
- Add post-salvage checkpoint sweep via _checkpoint_completed_row_groups
  (row groups completed during salvage are now written to parquet)
- Extract _checkpoint_completed_row_groups helper, reused by both the
  main loop and post-salvage sweep

* fix: checkpoint per salvage round, simplify deferred list, fire on_complete for empty groups

- Move _checkpoint_completed_row_groups inside the salvage loop so
  completed row groups are flushed promptly between rounds
- Simplify _deferred from list[tuple[Task, int]] to list[Task] since
  the attempt count was unused (salvage_max_rounds bounds retries)
- Fire on_complete(None) when all rows in a row group are dropped so
  callers always receive a completion signal

* fix: address PR review - _is_retryable, semaphore safety, dispatched pruning

Replace string-matching _is_retryable with isinstance checks against
typed ModelXxxError exceptions. Add try/finally around checkpoint to
protect _rg_semaphore.release() from deadlocks. Prune completed tasks
from _dispatched to prevent unbounded growth. Re-register batch_alias
in salvage dispatch. Replace assert with ValueError in _run_cell.
Parameterize on_complete Callable signature. Add test for
on_complete(None) when all rows dropped.

* fix: guard against dispatching tasks for checkpointed row groups

When a from_scratch task fails non-retryably and all rows are dropped,
downstream full_column tasks become vacuously ready and enter the
frontier. If dispatched in the same loop iteration that checkpoints
the row group, the task's coroutine runs against a freed buffer
(KeyError). The guard in _execute_task_inner skips such tasks early,
and a `skipped` flag keeps them in _dispatched to prevent re-dispatch.

Also addresses remaining review feedback:
- agenerate({}) fallback now passes empty DataFrame
- dict comprehensions simplified to dict(row_groups)
- get_row return type annotated as dict[str, Any]
- configs parameter fully typed in test helper
- mock generators use self.config.name

* fix: raise on batch size mismatch, isolate checkpoint failures

- _run_batch: raise ValueError on row count mismatch instead of warning,
  matching the update_batch path and preventing silent data corruption.
- _checkpoint_completed_row_groups: catch per-RG exceptions so one failed
  checkpoint doesn't block subsequent row groups or leak semaphore slots.
2026-03-16 21:31:17 -03:00
Eric W. Tramel
28c8345909
feat: add built-in filesystem seed readers (#421) 2026-03-16 17:40:27 -04:00
Eric W. Tramel
5783435979
feat: Improve generation failure reporting for schema and timeout failures (#416) 2026-03-13 18:38:01 -04:00
Johnny Greco
26a9cf23ac
feat: normalize validator and constraint discriminators (#414)
* feat: normalize validator and constraint discriminators

* docs: add docstring and comment to Constraint base class

Address Greptile review feedback:
- Add docstring to Constraint noting it should not be instantiated directly
- Add comment explaining the rhs fallback behavior in the resolver

* refactor: restore ABC on Constraint base class

* refactor: add explicit None guard in constraint resolver

* Fix legacy numeric sampler constraint detection

* fix: address PR review feedback from nabinchha

- Guard _can_coerce_to_float against inf/nan strings
- Add -> None return type annotations to test functions
- Add clarifying comments to ColumnConstraintT vs ColumnConstraintInputT
- Add tests for tagged constraint round-trip and missing rhs validation
2026-03-13 17:34:23 -04:00
Nabin Mulepati
d6b8433e25
fix: preserve extra_body for LiteLLM to avoid UnsupportedParamsError (#409) (#412)
* fix: preserve extra_body for LiteLLM to avoid UnsupportedParamsError (#409)

TransportKwargs.from_request() flattened extra_body keys into top-level
kwargs, causing LiteLLM to reject provider-specific params like
reasoning_effort via its per-provider allowlist validation.

Add a flatten_extra_body flag (default True for backward compat) so the
LiteLLM bridge can opt out and preserve extra_body as a distinct kwarg
that LiteLLM forwards without validation.

Made-with: Cursor

* fix: address PR #412 review comments

Update stale docstrings in TransportKwargs and facade.py to reflect the
new flatten_extra_body flag, and add an edge-case test for empty
extra_body with flatten disabled.
2026-03-13 13:35:32 -06:00