Commit graph

26 commits

Author SHA1 Message Date
Eric W. Tramel
61125a02d0
fix: align progress panel metrics
Render the progress legend as a stable table with live token-rate columns, attribute model usage to active generation columns across async bridge boundaries, and cancel the async scheduler cleanly on KeyboardInterrupt.

Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
2026-05-21 13:37:53 -04:00
Nabin Mulepati
bd0410bb05
fix(engine): actionable error when a Jinja field is missing/None/empty (#633)
* fix(engine): actionable error when a Jinja field is missing/None/empty

Empty-render and missing-attribute failures used to surface as the
generic "User provided prompt generation template is invalid." either
because `sanitize_user_exceptions` stripped the detail or because
Jinja's raw `UndefinedError` leaked through. Both now raise a new
`EmptyTemplateRenderError` carrying a row-level diagnostic that names
the offending chain and includes copy-pasteable Jinja conditional and
SkipConfig fix patterns.

Closes #629.
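The diagnostic described above has to say *why* a chain rendered empty. Below is a minimal sketch of that classification step. It assumes a plain nested-dict record and uses a hypothetical helper name (`classify_chain`); it is not the engine's implementation and does not use Jinja itself.

```python
from typing import Any, Mapping, Optional


def classify_chain(record: Mapping[str, Any], chain: str) -> Optional[str]:
    """Explain why a dotted chain such as "person.name" would render empty.

    Hypothetical helper. Returns "missing", "none", "empty",
    "empty_collection", or None when the value looks renderable.
    """
    value: Any = record
    for part in chain.split("."):
        if value is None:
            return "none"  # an intermediate link in the chain is None
        if not isinstance(value, Mapping) or part not in value:
            return "missing"  # the key is absent at this step of the chain
        value = value[part]
    if value is None:
        return "none"
    if isinstance(value, str) and value == "":
        return "empty"
    if isinstance(value, (list, tuple, set, dict)) and len(value) == 0:
        return "empty_collection"  # e.g. an empty loop iterable
    return None
```

Returning a short classification rather than raising keeps the walker reusable: the caller can decide how to phrase the row-level message and which fix pattern to suggest.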

* fix(engine): address PR review feedback on EmptyTemplateRenderError

Addresses the open review comments on #633:

1. (Greptile P1) The gate expression in the suggested remediation template
   was one accessor too deep when the root variable was entirely absent
   from the record, so the suggested fix itself raised
   UndefinedError. Fall back to gating on the root name alone when
   sample_name is not in the record.

2. (andreatgretel) The AST walker reported loop-local names as missing
   culprits (e.g. ``person`` in ``{% for person in people %}...{% endfor %}``).
   Filter extracted chains through ``meta.find_undeclared_variables`` to
   defer to Jinja's canonical scope tracking.

3. (andreatgretel follow-up) Empty collections used as loop iterables
   (``items=[]``) fell through to the no-culprit fallback. Add a new
   ``_CULPRIT_EMPTY_COLLECTION`` classification so they're surfaced.

4. Minor: add ``from exception`` to ``safe_render``'s UndefinedError
   re-raise for traceback consistency with the native engine path, and
   add a note on the load-bearing exception ordering in
   ``sanitize_user_exceptions``.
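Item 1's fallback can be sketched as follows. This assumes the normal gate is the parent of the leaf accessor (an assumption; the message only says the old gate was "one accessor too deep"), and `suggest_gate` is a hypothetical name. In Jinja, accessing an attribute on an undefined root raises, while a missing key on a defined dict is merely falsy, which is why the root-only gate is the safe fallback.

```python
def suggest_gate(chain: str, record: dict) -> str:
    """Build a copy-pasteable Jinja conditional for a dotted chain (hypothetical)."""
    parts = chain.split(".")
    root = parts[0]
    if root not in record or len(parts) == 1:
        # `root.attr` would itself raise UndefinedError when root is absent,
        # so gate on the root name alone.
        gate = root
    else:
        # Assumed normal case: gate on the parent of the leaf accessor.
        gate = ".".join(parts[:-1])
    return "{% if " + gate + " %}{{ " + chain + " }}{% endif %}"
```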
2026-05-20 09:51:21 -06:00
Nabin Mulepati
4b93f5b245
feat: let column configs declare all model aliases for the startup health check (#626)
* feat(engine): let column configs declare all model aliases for the startup health check

Plugin column configs that depend on more than one model alias (generator + judge,
critic, etc.) previously could not opt their secondary aliases into the standard
startup health check, and configs without a `model_alias` field crashed the
collection loop with AttributeError.

Add `SingleColumnConfig.get_model_aliases()` as the single override hook the
builder uses to enumerate aliases. The default returns the column's primary
`model_alias` (if any), so built-in LLM, embedding, and image columns work
unchanged. `CustomColumnConfig` overrides it to surface decorator-declared
aliases, replacing the special-case `isinstance` branch in the builder. Plugin
configs with multiple model fields override it to opt every endpoint into the
health check.

Fixes #606

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>

* fix(config): forward empty model_alias to startup health check

SingleColumnConfig.get_model_aliases() used `if alias` to filter, which
also dropped empty-string aliases. Empty model_alias values are accepted
by the config model and previously reached run_health_check, where they
failed fast with "No model config with alias '' found!". Treating them
as "no model endpoints" silently delayed that error to first generation.

Use `alias is not None` so only a truly missing attribute skips the
health check, and add a regression test that exercises an empty-string
model_alias on a built-in config.
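A minimal sketch of the override-hook pattern described in this PR, using hypothetical stand-in classes rather than the real `SingleColumnConfig` hierarchy. It shows the default hook (primary alias only, tolerant of configs with no `model_alias` field), a plugin-style multi-alias override, and why `is not None` forwards an empty-string alias to the health check.

```python
from dataclasses import dataclass
from typing import List, Optional


class ColumnConfigBaseSketch:
    """Hypothetical stand-in for the default get_model_aliases() hook."""

    def get_model_aliases(self) -> List[str]:
        alias = getattr(self, "model_alias", None)
        # `is not None`, not truthiness: an empty-string alias must still reach
        # the health check so it fails fast instead of at first generation.
        return [alias] if alias is not None else []


@dataclass
class LLMColumnSketch(ColumnConfigBaseSketch):
    name: str
    model_alias: Optional[str] = None


@dataclass
class SamplerColumnSketch(ColumnConfigBaseSketch):
    name: str  # no model_alias field at all: the default hook returns []


@dataclass
class JudgedColumnSketch(ColumnConfigBaseSketch):
    name: str
    generator_alias: str
    judge_alias: str

    def get_model_aliases(self) -> List[str]:
        # Plugin-style override: opt every endpoint into the health check.
        return [self.generator_alias, self.judge_alias]


def collect_health_check_aliases(configs) -> List[str]:
    """Deduplicate aliases in first-seen order (hypothetical builder step)."""
    aliases: List[str] = []
    for config in configs:
        for alias in config.get_model_aliases():
            if alias not in aliases:
                aliases.append(alias)
    return aliases
```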

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>

---------

Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>
2026-05-11 11:33:50 -06:00
Andre Manoel
61cdeefb17
feat: make async engine the default execution path (#592)
* feat: make async engine the default execution path

The async engine has been hardening as opt-in for several releases. Make it
the default and address the prerequisites flagged for the flip.

Default flip
- DATA_DESIGNER_ASYNC_ENGINE defaults to "1" at both consumption sites
- Set DATA_DESIGNER_ASYNC_ENGINE=0 for one transitional release to opt out
- allow_resize=True still falls back to sync with a DeprecationWarning

Python 3.10 support
- Replace asyncio.TaskGroup (3.11+) in async_concurrency.py with
  gather-with-explicit-cancel; semantics preserved because _run_task already
  swallows its own exceptions and uses _shutdown_event for sibling cancellation
- Remove the sys.version_info < (3, 11) runtime guard
- Remove the matching pytest skipif so the executor tests run on 3.10 too
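For readers unfamiliar with the Python 3.10 replacement for `asyncio.TaskGroup`, here is a generic sketch of gather-with-explicit-cancel. `gather_with_cancel` is a hypothetical name; the engine's own loop also relies on `_run_task` swallowing exceptions and on `_shutdown_event`, which this sketch does not model.

```python
import asyncio
from typing import Any, Awaitable, Iterable, List


async def gather_with_cancel(aws: Iterable[Awaitable[Any]]) -> List[Any]:
    """Run awaitables concurrently; on the first failure, cancel the rest."""
    tasks = [asyncio.ensure_future(aw) for aw in aws]
    try:
        return await asyncio.gather(*tasks)
    except BaseException:
        # gather() does not cancel siblings when one task raises, so do it here.
        for task in tasks:
            if not task.done():
                task.cancel()
        # Let the cancellations settle so no task outlives this call.
        await asyncio.gather(*tasks, return_exceptions=True)
        raise
```

Unlike `TaskGroup`, this propagates only the first exception rather than an `ExceptionGroup`, which is acceptable when the tasks already handle their own errors.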

Derived timeouts (replaces two hardcoded 300s constants)
- ThrottleManager.acquire_sync/async default to timeout=None (no deadline)
  instead of DEFAULT_ACQUIRE_TIMEOUT=300; the HTTP request timeout already
  bounds actual work, while queue waits scale with provider speed and AIMD
- _AsyncBridgedModelFacade derives the sync->async bridge timeout from the
  model's inference_parameters.timeout and the call's max_correction_steps;
  one knob (the per-model timeout) drives both deadlines, with no new config surface
- Add ModelFacade.request_timeout property so the bridge can read the
  effective timeout the client is configured with
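A minimal sketch of the "no deadline by default" acquire semantics, using an `asyncio.Semaphore` as a stand-in for the throttle. `ThrottleSketch` is hypothetical and is not the `ThrottleManager` API; it only illustrates that `asyncio.wait_for(..., timeout=None)` waits indefinitely, while a finite timeout raises at capacity.

```python
import asyncio
from typing import Optional


class ThrottleSketch:
    """Hypothetical concurrency gate; not the ThrottleManager API."""

    def __init__(self, limit: int) -> None:
        self._sem = asyncio.Semaphore(limit)

    async def acquire_async(self, timeout: Optional[float] = None) -> None:
        # timeout=None: wait_for imposes no deadline, so queue waits stretch
        # with provider speed; the HTTP request timeout still bounds real work.
        await asyncio.wait_for(self._sem.acquire(), timeout=timeout)

    def release(self) -> None:
        self._sem.release()
```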

Root-cause surfacing
- AsyncTaskScheduler captures the first non-retryable error and exposes it
  via first_non_retryable_error
- Interface threads it through DataDesignerGenerationError when 0 records
  are produced without early-shutdown, so deterministic failures (e.g. bad
  seed sources) surface their original message instead of a wrapped
  FileNotFoundError on the parquet path

Tests
- New: throttle no-deadline default behavior (sync+async), parametrized
  derived bridge timeout, restored async_concurrency tests on 3.10
- Updated: test_dataset_builder.py uses an autouse fixture to pin its
  Mock-based tests to the sync engine they cover; existing bridge tests
  set facade.request_timeout for the new derivation

Docs
- Replace the stale LiteLLM security notice in README with a short
  async-default heads-up and link to the migration guide
- Add docs/migration-async-default.md covering per-model timeouts,
  custom-column thread safety, mocking model calls, run outcomes, and
  the opt-out
- Append a short Update section to the async-all-the-way-down dev note

* test: extract _compute_bridge_timeout helper for direct testing

The parametrized bridge-timeout test was patching ``concurrent.futures.Future.result``
to capture the timeout the bridge passed in. That reaches into stdlib internals
(DEVELOPMENT.md "Mock at boundaries: Keep mocking shallow") and the ``ids=`` argument
on the parametrize was missing.

Extracts the formula into a module-level ``_compute_bridge_timeout`` helper. The test
now calls the helper directly with no mocking, and the parametrize gets readable ids.
Behavior is unchanged.

* test(e2e): align demo plugins with async engine contracts

The e2e demo plugins exercise plugin discovery and full DD lifecycle. Two
of them were written against sync-engine semantics that the async engine
restricts:

- DemoColumnGeneratorImpl was a ColumnGeneratorFullColumn with no
  required_columns. The async engine routes ``no-upstream`` columns
  through the from-scratch path, which passes an empty DataFrame to
  generators that aren't FromScratchColumnGenerator subclasses. The
  generator then produces 0 rows and the scheduler raises
  ``update_batch received 0 values``. Switching the plugin to
  FromScratchColumnGenerator with generate_from_scratch(num_records)
  matches what the plugin actually does (produces a constant column
  without input) and works on both engines.

- RegexFilterProcessor implemented process_before_batch with row-count
  changes. The async engine enforces row-count invariance in pre- and
  post-batch processor stages by design. Moving the filter to
  process_after_generation preserves the plugin's purpose (regex-based
  row filtering) at a stage that supports row-count changes on both
  engines. Test assertions check the final dataset, so the stage shift
  is transparent.

Both changes are demo-plugin updates only; no production code change.

* fix: address Codex review findings on async-default flip

Three bugs and two test-quality concerns surfaced by an independent review of
the prior commits. Each was real and worth fixing in the flip PR.

Bug fixes
- Sync-fallback path was creating async-only model clients. The default flip
  meant ``client_concurrency_mode = ASYNC`` for every default run, but the
  ``allow_resize=True`` path falls back to the sync engine — sync ``model.generate()``
  calls then hit ``SyncClientUnavailableError``. The resolution decision now
  lives at the DataDesigner interface level via
  ``_resolve_client_concurrency_mode``: it considers both the env var and the
  config (allow_resize forces sync clients) and is passed explicitly to
  ``create_resource_provider``. Direct callers of the factory still get the
  env-var default.

- Sync→async bridge timeout ignored the per-call ``timeout=`` override. A
  custom column calling ``model.generate(timeout=600)`` against a slow endpoint
  was being cancelled at the model-config default, not 600s. The bridge now
  prefers ``kwargs.get("timeout")`` over ``facade.request_timeout``.

- Bridge timeout formula missed ``max_conversation_restarts``. One logical
  generation can do ``(1 + max_conversation_restarts) × (1 + max_correction_steps)``
  HTTP requests; the formula now multiplies both, matching the worst-case
  attempt budget.
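The worst-case budget described in these bullets can be sketched as a pure function. The name, signature, and floor value are assumptions for illustration (the PR mentions a `_compute_bridge_timeout` helper and floor clamping, but not their exact form).

```python
from typing import Optional

BRIDGE_TIMEOUT_FLOOR_S = 30.0  # hypothetical floor value


def compute_bridge_timeout(
    request_timeout: float,
    max_correction_steps: int,
    max_conversation_restarts: int = 0,
    per_call_timeout: Optional[float] = None,
    floor: float = BRIDGE_TIMEOUT_FLOOR_S,
) -> float:
    """Upper bound for one bridged generation (hypothetical sketch)."""
    # A per-call `timeout=` override wins over the model-config default.
    per_request = per_call_timeout if per_call_timeout is not None else request_timeout
    # Worst case: every restart runs every correction step, one HTTP request each.
    attempts = (1 + max_conversation_restarts) * (1 + max_correction_steps)
    return max(floor, per_request * attempts)
```

For example, a 60 s request timeout with 2 correction steps and 1 restart allows up to 2 × 3 = 6 requests, so 360 s.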

Engine routing fix (also surfaced by failing e2e plugin tests)
- ``_run_from_scratch`` else-branch passed an empty DataFrame to non-FromScratch
  generators classified as seeds (no upstream columns), so ``ColumnGeneratorFullColumn``
  with no required_columns produced 0 rows for an ``rg_size``-row buffer. Now
  passes an ``rg_size``-row snapshot of the row-group buffer, mirroring the
  sync engine's FULL_COLUMN contract.
- The earlier ``DemoColumnGeneratorImpl`` workaround (rewrite as ``FromScratchColumnGenerator``)
  is reverted; the engine fix subsumes it. The processor-plugin fix
  (``process_after_generation`` for the regex filter) stays — pre-batch
  row-count change is intentionally rejected by the async engine.

Test improvements
- Throttle no-deadline tests are parametrized over ``(timeout=0.0, raises)``
  and ``(timeout=None, waits)``, pinning that ``None`` is genuinely distinct
  from any finite default. Sync and async counterparts mirror.
- New regression tests for ``first_non_retryable_error`` surfacing covering
  both load-raises and load-returns-empty paths, asserting the original
  exception is chained via ``__cause__`` and that the typed
  ``DataDesignerEarlyShutdownError`` doesn't fire in this branch.
- New parametrized regression test for ``_resolve_client_concurrency_mode``
  covering all four (env × allow_resize) combinations.
- New parametrized test for the per-call ``timeout=`` override flowing into
  the bridge timeout calculation.
- Bridge formula tests extended with ``max_conversation_restarts`` cases.

* test: trim redundant parametrize cases in async-default tests

Three parametrize cases were duplicating coverage already provided by
existing standalone tests:

- ``test_acquire_*_timeout_branches`` parametrized over ``(0.0, raises)``
  and ``(None, waits)``. The ``raises`` half duplicates
  ``test_acquire_*_raises_timeout_when_at_capacity``. Replaced with two
  focused ``..._default_no_deadline_waits_for_release`` tests covering
  only the no-deadline branch.

- ``test_resolve_client_concurrency_mode_matches_engine_choice`` had four
  cases. The ``async-off + allow-resize`` case asserts ``SYNC`` because the
  env var alone forces it; the allow_resize input is moot. Dropped.

- ``test_async_bridge_honors_per_call_timeout`` had three cases. The
  "override below floor" case cross-products the per-call override flow
  with the floor-clamping behavior already covered by
  ``test_compute_bridge_timeout``. Dropped.

Net: -25 lines of test code with no loss of essential coverage.

* docs: fold migration page into existing concept docs

The standalone ``Migrating to the async default`` page didn't fit the
existing docs style — present tense, behavior over comparisons, content
in the natural concept home. Folding it in:

- ``architecture-and-performance.md`` gets a new ``Async Engine`` section
  covering per-model timeouts, run outcomes (partial completion +
  ``DataDesignerEarlyShutdownError``), and the transitional opt-out.
  Three stale ``async engine is landing soon`` callouts updated to
  reflect the flip.
- ``custom_columns.md`` gets two short notes: a thread-safety callout
  near Generation Strategies, and a mocking-with-spec note in
  Development Testing.
- ``async-all-the-way-down.md`` Update section now points at the new
  arch-and-perf section.
- README heads-up links to the same anchor.
- ``migration-async-default.md`` removed; mkdocs.yml entry dropped.

* docs: frame Execution Model as sync-engine specifics

Small targeted edits to make the user-facing concept docs consistent
with the post-flip state. No restructuring.

- ``architecture-and-performance.md``: the ``Execution Model`` callout
  now opens with two engines, links to the new ``Async Engine`` section,
  and frames the existing column-at-a-time description as sync-engine
  semantics. The ``Step 2: Process columns sequentially`` paragraph notes
  the async engine relaxes this. The ``Key Concepts`` table differentiates
  per-engine for ``Batching`` and ``Sequential columns``; ``Parallel cells``
  is the same on both.
- ``processors.md``: added a warning callout about the async engine's
  row-count invariance in pre- and post-batch stages, with the guidance
  to use ``process_after_generation()`` for row-filtering or expansion.

* fix: address review nits from PR #592 (Nabin)

Four targeted fixes from the review.

Worth-addressing (warning):
- ``test_acquire_async_default_no_deadline_waits_for_release`` was
  spawning the release task without holding a strong reference. The
  loop's weak-ref bookkeeping could GC it before the inner ``await``
  observes the release, producing a CI flake. Hold the task and
  ``await`` it in ``finally``.

Take-it-or-leave-it (applied):
- Root-cause error surfacing now includes the exception type name:
  ``f"🛑 {type(root_cause).__name__}: {root_cause}"`` so users see
  ``ValueError: ...`` instead of just the message string. The
  ``__cause__`` chain is preserved either way.
- Drop the defensive ``getattr(c, "allow_resize", False)`` in
  ``_resolve_client_concurrency_mode`` — every member of
  ``ColumnConfigT`` inherits ``allow_resize: bool = False`` from
  ``SingleColumnConfig``.
- One-line comment near the root-cause surfacing branch noting that
  ``actual_num_records == 0`` is async-only (sync runs leave it at
  ``-1``), so the branch is async-only by construction.
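A minimal sketch of "first non-retryable error wins" plus the root-cause message format above. All class and function names are hypothetical stand-ins, and the retryable tuple only approximates the real `RETRYABLE_MODEL_ERRORS`.

```python
from typing import Optional

# Hypothetical stand-in for the RETRYABLE_MODEL_ERRORS tuple.
RETRYABLE_SKETCH = (TimeoutError, ConnectionError)


class GenerationErrorSketch(Exception):
    """Hypothetical stand-in for DataDesignerGenerationError."""


class FirstErrorTrackerSketch:
    def __init__(self) -> None:
        self.first_non_retryable_error: Optional[BaseException] = None

    def record_failure(self, exc: BaseException) -> None:
        if isinstance(exc, RETRYABLE_SKETCH):
            return  # retryable errors are never reported as the root cause
        if self.first_non_retryable_error is None:
            self.first_non_retryable_error = exc  # "first wins"


def surface_root_cause(tracker: FirstErrorTrackerSketch, actual_num_records: int) -> None:
    root_cause = tracker.first_non_retryable_error
    # actual_num_records == 0 only occurs on the async path (sync leaves -1).
    if actual_num_records == 0 and root_cause is not None:
        raise GenerationErrorSketch(
            f"🛑 {type(root_cause).__name__}: {root_cause}"
        ) from root_cause
```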

Not addressed in this PR (filing as follow-ups):
- ``SYNC_BRIDGE_TIMEOUT = 300`` still hardcoded in
  ``column_generators/generators/base.py:_run_coroutine_sync``. That
  bridge has no model-facade context to derive a timeout from, so the
  fix is a structural refactor outside this PR's scope.
- First-error capture loses subsequent-error context. The "first wins"
  heuristic is documented; richer aggregation is a follow-up.

* fix: drop SYNC_BRIDGE_TIMEOUT in _run_coroutine_sync

This was the third hardcoded 300s timeout (Nabin flagged it on PR #592).
The path is the generic sync→async bridge in ``ColumnGenerator.generate()``:
when a subclass overrides only ``agenerate()``, the sync entry point runs
the coroutine in a background thread.

Same philosophy we applied to the throttle queue wait elsewhere in the
PR: a defensive deadline on top of work that's already bounded by the
HTTP timeout doesn't add safety, it just produces spurious failures on
slow self-hosted endpoints. Drop the constant, the timeout exception
handling, and the ``timed_out`` bookkeeping. ``pool.shutdown(wait=True)``
becomes the simple cleanup.

Tests in ``test_async_generators.py`` exercise the happy path only and
don't depend on the timeout firing.

* Revert "fix: drop SYNC_BRIDGE_TIMEOUT in _run_coroutine_sync"

This reverts commit 7a0b77d44c.

* docs+feat: deprecate the sync-engine opt-out path

Nabin asked whether the docs should adopt explicit "deprecation" language
for the opt-out path. This commit adds it to both the docs and the code:

- Doc: ``architecture-and-performance.md``'s ``Opting out`` section now
  uses an ``!!! warning "Deprecated"`` admonition that names the env var
  as a deprecated escape hatch and notes the run-time warning.
- Code: ``DataDesigner._resolve_client_concurrency_mode`` emits a
  ``DeprecationWarning`` when ``DATA_DESIGNER_ASYNC_ENGINE=0`` is detected.
  Same precedent as the existing ``allow_resize=True`` warning. Auto-fallback
  via ``allow_resize`` does not double-warn here; the builder layer emits
  its own warning later.
- Test: parametrized regression now asserts ``pytest.warns(DeprecationWarning)``
  on the opt-out branch and treats any warning on the async-on branches as
  a failure (``simplefilter("error")`` inside the ``catch_warnings`` block).

* fix: emit logger.warning alongside DeprecationWarning on env-var opt-out

Parity fix from Nabin's re-review of PR #592. The ``allow_resize=True``
auto-fallback path in ``_resolve_async_compatibility`` emits both a
``logger.warning("⚠️ ...")`` and a ``DeprecationWarning``. The new
``DATA_DESIGNER_ASYNC_ENGINE=0`` opt-out path was only emitting the
``DeprecationWarning``, which default warning filters hide from most users,
and was inconsistent with the established precedent.

Match the pattern: same message body, both signals, same stacklevel.
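A minimal sketch of the resolution and dual-signal warning described in the last few commits. The enum, function signature, and message text are placeholders, not the project's API. Treating only `"0"` as the opt-out is an assumption based on the env-var values mentioned above.

```python
import logging
import os
import warnings
from enum import Enum
from typing import Iterable, Mapping

logger = logging.getLogger(__name__)


class ClientConcurrencyMode(Enum):
    SYNC = "sync"
    ASYNC = "async"


def resolve_client_concurrency_mode(
    column_configs: Iterable, environ: Mapping[str, str] = os.environ
) -> ClientConcurrencyMode:
    # The async engine is the default; "0" is the deprecated opt-out.
    if environ.get("DATA_DESIGNER_ASYNC_ENGINE", "1") == "0":
        message = "DATA_DESIGNER_ASYNC_ENGINE=0 selects the deprecated sync engine."  # placeholder text
        logger.warning("⚠️ %s", message)  # visible with default logging setup
        warnings.warn(message, DeprecationWarning, stacklevel=2)  # visible to -W and pytest
        return ClientConcurrencyMode.SYNC
    # allow_resize forces sync clients; the builder emits its own warning later.
    if any(config.allow_resize for config in column_configs):
        return ClientConcurrencyMode.SYNC
    return ClientConcurrencyMode.ASYNC
```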

* docs: breadcrumb explaining why SYNC_BRIDGE_TIMEOUT survives PR #592

Nabin's re-review pointed out that ``base.py`` is the lone place where
the 300s pattern survives, while ``custom.py`` and ``throttle_manager.py``
both retired theirs. Without a comment, a future reader (or a lint sweep)
could mistake this for an oversight and "consistency-fix" it the wrong way.

Add a short note at the constant naming the two retired siblings, the
reason this one stayed (no ``ModelFacade`` context to derive from), and
the fact that it's tracked for a structural follow-up.
2026-05-04 16:22:13 -03:00
Andre Manoel
47c72b3d87
fix(async): pack of fixes for async engine under degraded providers (#585)
* fix(async): exclude all retryable errors from early-shutdown gate

The gate previously only excluded `ModelRateLimitError`, leaving
`ModelTimeoutError`, `ModelInternalServerError`, and
`ModelAPIConnectionError` to count toward the sliding-window error
rate. Under provider degradation these errors cluster in time
(concurrent in-flight requests time out together), so 5 errors out of
10 in the window is easy to reach and trips the gate even when salvage
could recover the rows.

Refs #575.
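A minimal sketch of a sliding-window gate that ignores retryable errors. The window size and threshold mirror the "5/10" figure above. Skipping retryable outcomes entirely, rather than counting them as successes, is an assumption, and all names are hypothetical.

```python
from collections import deque
from typing import Optional

# Hypothetical stand-in for RETRYABLE_MODEL_ERRORS.
RETRYABLE_SKETCH = (TimeoutError, ConnectionError)


class EarlyShutdownGateSketch:
    def __init__(self, window: int = 10, threshold: float = 0.5) -> None:
        self._outcomes: deque = deque(maxlen=window)
        self._threshold = threshold

    def record(self, error: Optional[BaseException] = None) -> bool:
        """Record one task outcome; return True when the gate should trip."""
        if isinstance(error, RETRYABLE_SKETCH):
            return False  # retryable errors never count toward the error rate
        self._outcomes.append(error is not None)
        if len(self._outcomes) < self._outcomes.maxlen:
            return False  # not enough samples yet
        return sum(self._outcomes) / len(self._outcomes) >= self._threshold
```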

* feat(async): WARN log when provider showing degraded performance

Diagnostic A/Bs against build.nvidia.com showed runs failing silently
under provider degradation - no log indication that retryable errors
were piling up until the early-shutdown gate fired (or, post-fix,
until salvage exhaustion). Surfacing this earlier helps users
distinguish "DataDesigner is broken" from "the upstream provider is
slow today."

Tracks a separate sliding window over retryable-vs-not for every task
outcome (independent of the early-shutdown gate's window) and emits a
throttled WARN when the rolling fraction crosses the threshold.

Refs #575.

* fix(async): salvage partial row groups on early shutdown

Before: when the early-shutdown gate fired, any row group still in
flight stayed in `_rg_states` un-checkpointed. The buffer manager
later raised `FileNotFoundError` when the builder tried to read the
finalized parquet. User-visible result: `0 records produced`.

After: a new `_finalize_after_shutdown` step runs in `run()`'s finally
block, after `_cancel_workers` has drained in-flight tasks (Codex
caveat: in-flight `from_scratch`/`batch` tasks must not be allowed to
write into a buffer that's being finalized). For each remaining row
group it drops rows that aren't fully complete, then delegates to the
existing `_checkpoint_completed_row_groups` so the buffer manager's
zero-survivor handling (skip empty parquet, free buffer) kicks in
unchanged.

Also surfaces partial completion as a structured signal: scheduler
exposes `early_shutdown: bool` and `partial_row_groups: tuple[int, ...]`
properties so callers can detect partial completion programmatically
rather than parsing log lines. Builder uses this to emit a more
specific WARN distinguishing early shutdown from non-shutdown drops.

Refs #575.
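The salvage step can be sketched over plain in-memory row groups. This assumes a row is "fully complete" when every required column key is present. `finalize_after_shutdown` here is a hypothetical simplification that returns what would be checkpointed instead of writing parquet.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def finalize_after_shutdown(
    row_groups: Dict[int, List[Row]], required_columns: List[str]
) -> Tuple[Dict[int, List[Row]], Tuple[int, ...]]:
    """Keep only fully complete rows; skip row groups with zero survivors."""
    to_checkpoint: Dict[int, List[Row]] = {}
    partial: List[int] = []
    for rg_id in sorted(row_groups):
        rows = row_groups[rg_id]
        complete = [row for row in rows if all(col in row for col in required_columns)]
        if len(complete) < len(rows):
            partial.append(rg_id)  # exposed like partial_row_groups
        if complete:
            to_checkpoint[rg_id] = complete
        # Zero survivors: nothing is written, mirroring "skip empty parquet".
    return to_checkpoint, tuple(partial)
```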

* fix(throttle): reset consecutive_429s on non-rate-limit failure

In `release_failure`, the cascade counter wasn't reset, so a sequence
like 429 → 500 → 429 was treated as 2 consecutive 429s. The cascade
counter feeds AIMD's reduce-once-per-cascade logic; the second 429
should start a fresh cascade and trigger another concurrency reduction,
but currently doesn't.

Standalone bug surfaced during #575 investigation; not on the failure
path that drives the gate-trip outcome but worth fixing while we're
in this code.

* fix(custom): preserve retryability through CustomColumnGenerator wrap

A real-workload run of #575 showed the early-shutdown gate still trips
even with the gate-exclusion fix in place: the trigger is 10 timeouts
inside Anonymizer's QA-repair custom columns, all wrapped in
CustomColumnGenerationError (non-retryable) by the catch-all in
CustomColumnGenerator.

Two fixes here:

1. Re-raise RETRYABLE_MODEL_ERRORS unchanged before the wrap so the
   scheduler's _is_retryable correctly classifies them.

2. Surface _AsyncBridgedModelFacade timeouts as ModelTimeoutError
   instead of stdlib TimeoutError. Without this the sync bridge times
   out as the wrong exception type and is still classified non-retryable
   even after fix #1.

Also moves _RETRYABLE_MODEL_ERRORS from async_scheduler to
models/errors as the public RETRYABLE_MODEL_ERRORS tuple - both the
scheduler and the wrap site need it, and models/errors is the
appropriate home alongside the error class definitions.

Refs #575.
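Both fixes can be sketched together with hypothetical stand-in exception classes: the bridge converts a `concurrent.futures` timeout into a model timeout, and the custom-column wrapper re-raises retryable errors before its catch-all.

```python
import concurrent.futures


class ModelTimeoutErrorSketch(Exception):
    """Hypothetical stand-in for ModelTimeoutError."""


class CustomColumnGenerationErrorSketch(Exception):
    """Hypothetical stand-in for CustomColumnGenerationError (non-retryable)."""


RETRYABLE_MODEL_ERRORS_SKETCH = (ModelTimeoutErrorSketch,)


def bridge_result(future: concurrent.futures.Future, timeout: float):
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError as exc:
        # Surface as a model timeout so the scheduler classifies it as retryable.
        raise ModelTimeoutErrorSketch(f"bridged call exceeded {timeout}s") from exc


def run_custom_column(fn, row):
    try:
        return fn(row)
    except RETRYABLE_MODEL_ERRORS_SKETCH:
        raise  # keep retryability: re-raise unchanged before the catch-all wrap
    except Exception as exc:
        raise CustomColumnGenerationErrorSketch(str(exc)) from exc
```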

* feat(interface): typed DataDesignerEarlyShutdownError on zero-record runs

When the async scheduler hits early shutdown and produces zero
records, the buffer manager skips writing parquet (correctly), so
ArtifactStorage.load_dataset_with_dropped_columns() raises
FileNotFoundError. Previously this surfaced as a generic
DataDesignerGenerationError wrapping the FileNotFoundError, which is
ambiguous (could be missing files for any reason).

This commit:

- Adds DataDesignerEarlyShutdownError as a subclass of
  DataDesignerGenerationError so existing handlers still match while
  callers that want to react programmatically (retry on different
  alias, surface a degraded-provider message, etc.) can catch the
  specific type.
- Plumbs the scheduler's structured signals (early_shutdown,
  partial_row_groups) up through the builder so they're available at
  data_designer.create() time without re-introspecting the scheduler.
- create() raises the typed error in both failure modes (load fails
  or empty DataFrame returned) when builder.early_shutdown is True.

Refs #575.
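A minimal sketch of the typed-error gating. It also includes the later refinement in this PR that the typed error fires only when `actual_num_records == 0`. Class and function names are hypothetical, and a list of dicts stands in for the loaded DataFrame.

```python
from typing import Callable, List


class DataDesignerGenerationErrorSketch(Exception):
    """Hypothetical stand-in for DataDesignerGenerationError."""


class DataDesignerEarlyShutdownErrorSketch(DataDesignerGenerationErrorSketch):
    """Subclass, so existing generation-error handlers still match."""


def load_dataset_or_raise(
    load: Callable[[], List[dict]], early_shutdown: bool, actual_num_records: int
) -> List[dict]:
    zero_record_shutdown = early_shutdown and actual_num_records == 0
    try:
        dataset = load()
    except Exception as exc:
        if zero_record_shutdown:
            raise DataDesignerEarlyShutdownErrorSketch("early shutdown produced zero records") from exc
        # Partial-salvage runs that fail to load for other reasons keep the real cause.
        raise DataDesignerGenerationErrorSketch(f"failed to load dataset: {exc}") from exc
    if not dataset and zero_record_shutdown:
        raise DataDesignerEarlyShutdownErrorSketch("early shutdown produced zero records")
    return dataset
```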

* fix(async): emit first degraded-provider WARN regardless of clock state

Initialize _last_degraded_warn_at to -inf so the first WARN is always
emitted. The previous initialization to 0.0 suppressed the first WARN on
fresh CI runners where time.monotonic() returns a small value (system
boot uptime), making the throttle interval check (now - 0.0 < interval)
true on the first attempt.
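A minimal sketch of the degraded-provider monitor with a throttled WARN and the `-inf` initialization. It also applies the LLM-only window from the later review fix. Window size, threshold, interval, and all names are hypothetical; the clock is injectable so the fresh-runner case (a small `monotonic()` value) can be reproduced.

```python
import logging
import math
import time
from collections import deque
from typing import Callable

logger = logging.getLogger(__name__)


class DegradedProviderMonitorSketch:
    def __init__(self, window: int = 20, threshold: float = 0.5, interval_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._outcomes: deque = deque(maxlen=window)
        self._threshold = threshold
        self._interval_s = interval_s
        self._clock = clock
        # -inf, not 0.0: on a fresh runner monotonic() can be smaller than
        # interval_s, which would suppress the very first WARN.
        self._last_warn_at = -math.inf

    def record(self, retryable_failure: bool, is_llm: bool) -> bool:
        """Record one task outcome; return True if a WARN was emitted."""
        if not is_llm:
            return False  # non-model tasks would dilute the failure rate
        self._outcomes.append(retryable_failure)
        if len(self._outcomes) < self._outcomes.maxlen:
            return False
        rate = sum(self._outcomes) / len(self._outcomes)
        now = self._clock()
        if rate < self._threshold or now - self._last_warn_at < self._interval_s:
            return False
        self._last_warn_at = now
        logger.warning("Provider showing degraded performance: %.0f%% retryable failures", rate * 100)
        return True
```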

* fix(async): address review findings on early-shutdown salvage PR

Five real correctness issues caught in review of the original PR, plus a
few smaller cleanups and test simplifications.

Throttle - cascade reset (regression of existing AIMD invariant):
release_failure() now resets consecutive_429s only when in_flight == 0.
Resetting unconditionally broke "reduce once per cascade" when 429/500/429
arrived interleaved within a single in-flight burst - the second 429 was
treated as a new cascade and the limit got halved twice for what was
effectively one rate-limit event.
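The cascade rule above can be sketched with a toy AIMD counter. It covers only the multiplicative decrease and the "reset only when `in_flight == 0`" condition. Additive increase and the real throttle's state are omitted, and all names are hypothetical.

```python
class AIMDCascadeSketch:
    def __init__(self, limit: int = 16, min_limit: int = 1) -> None:
        self.limit = limit
        self.min_limit = min_limit
        self.in_flight = 0
        self.consecutive_429s = 0

    def acquire(self) -> None:
        self.in_flight += 1

    def release_rate_limited(self) -> None:
        self.in_flight -= 1
        if self.consecutive_429s == 0:
            # First 429 of a cascade: multiplicative decrease, applied once.
            self.limit = max(self.min_limit, self.limit // 2)
        self.consecutive_429s += 1

    def release_failure(self) -> None:
        self.in_flight -= 1
        # End the cascade only once the burst has drained; resetting mid-burst
        # would halve the limit twice for one 429/500/429 burst.
        if self.in_flight == 0:
            self.consecutive_429s = 0
```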

Interface - typed-error gating: DataDesignerEarlyShutdownError now fires
only when early_shutdown is true AND actual_num_records == 0. Without
this, a partial-salvage run that fails to load for unrelated reasons
(corrupt parquet, schema drift, disk hiccup) was misdiagnosed as "zero
records produced," hiding the real cause.

Async - WARN window scope: the degraded-provider warning was fed by every
task outcome, including samplers and non-LLM customs. In realistic
pipelines (one model column, several non-model columns) the rate stayed
under threshold even when every model call was failing, silencing the
WARN exactly when it mattered. Now gated on is_llm.
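A sketch of why the `is_llm` gate matters. It assumes a fixed-size sliding window and a rate threshold. The class, its methods, and the numbers are hypothetical.

```python
from collections import deque


class DegradedProviderWindow:
    """Hypothetical sliding window feeding the degraded-provider WARN."""

    def __init__(self, size: int = 100, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self._failures: deque = deque(maxlen=size)  # True = retryable model error

    def record(self, failed: bool, is_llm: bool) -> None:
        # Samplers and non-LLM customs never hit the provider; counting their
        # successes would dilute the rate and hide a fully failing model column.
        if is_llm:
            self._failures.append(failed)

    def failure_rate(self) -> float:
        return sum(self._failures) / len(self._failures) if self._failures else 0.0

    def is_degraded(self) -> bool:
        return self.failure_rate() >= self.threshold


window = DegradedProviderWindow(size=50, threshold=0.5)
for _ in range(5):
    window.record(failed=True, is_llm=True)  # every model call fails
    for _ in range(3):
        window.record(failed=False, is_llm=False)  # sampler / custom tasks succeed
```

Without the gate, the same pipeline would report 5 failures out of 20 outcomes (25%) and never cross the threshold.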

Async/builder - signal preservation across raises: scheduler.early_shutdown
and partial_row_groups are captured in a try/finally around future.result(),
so a processor failure during the salvage path doesn't drop the
structured signal. Both build() and build_preview() now reset per-run
state at the start so reused builders don't leak prior-run flags.
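A sketch of the try/finally capture, assuming the builder reads two attributes off the scheduler after the run. `SchedulerSignals` and `BuilderSketch` are hypothetical; `_reset_run_state`, `early_shutdown`, and `partial_row_groups` are the names used in the message.

```python
from concurrent.futures import Future
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SchedulerSignals:
    """Hypothetical view of the scheduler's structured shutdown signals."""

    early_shutdown: bool = False
    partial_row_groups: list = field(default_factory=list)


class BuilderSketch:
    def __init__(self) -> None:
        self._reset_run_state()

    def _reset_run_state(self) -> None:
        # Start every run clean so a reused builder does not report stale flags.
        self.early_shutdown = False
        self.partial_row_groups = []

    def build(self, scheduler: SchedulerSignals, future: "Future[Any]") -> Any:
        self._reset_run_state()
        try:
            return future.result()
        finally:
            # Runs even when result() re-raises, e.g. a processor failed while
            # salvaging partial row groups.
            self.early_shutdown = scheduler.early_shutdown
            self.partial_row_groups = list(scheduler.partial_row_groups)
```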

Async - dead code: dispatch_error capture in run() was unread (the post-
finally check is unreachable on the exception path). Removed.

Smaller cleanups:
- early-shutdown WARN says "non-retryable error rate exceeded threshold"
- bridge timeout WARN demoted to debug (ModelTimeoutError already surfaces
  it; the throttled degraded-provider WARN is the user-facing signal)
- TODO note for threading degraded_warn_* through RunConfig
- doc note in _finalize_after_shutdown clarifying that pre-batch processor
  isn't re-run on partial-salvage row groups

Tests:
- new regression tests for the cascade burst case, partial-salvage error
  gating, and LLM-only WARN window
- direct unit test for _reset_run_state
- dedup via _make_storage / _seed_plus_cell_setup helpers
- WARN emission cases parametrized into a single test
- shared parametrize lists hoisted to module-level constants
- redundant cascade test dropped in favor of the more thorough drain
  variant; redundant healthy-baseline test folded into the zero-survivor
  test

* chore(async): address Nabin's review comments

Style cleanups, parametrization, docstring polish, and one consistency
fix in the typed-error path. All non-blocking ("Ship it (with nits)").

interface/data_designer.py:
- preview() now raises DataDesignerEarlyShutdownError when shutdown
  produced zero records (parity with create()), and also gates on
  actual_num_records == 0 so partial-salvage runs that fail to load
  don't get misdiagnosed
- create()'s defensive empty-DF guard mirrors the load-failure guard
  with the same actual_num_records == 0 check

async_scheduler.py:
- _record_retryable_outcome docstring clarifies that the call site
  filters by is_llm; the function alone reads as if every outcome feeds
  the window

dataset_builder.py:
- moved _reset_run_state() down past the public methods to match the
  project's public-before-private convention

test_custom.py:
- flattened TestAsyncBridgedModelFacade class into module-level test
  functions (matches the rest of the file)
- hoisted inline imports (asyncio, threading, patch, _AsyncBridgedModelFacade,
  SyncClientUnavailableError) to top of file
- driven retryable-error parametrize off RETRYABLE_MODEL_ERRORS directly
  instead of the hand-rolled factory list, so new retryable types pick
  up coverage automatically
- dropped the redundant "Sanity" block in
  test_async_bridge_timeout_raises_model_timeout_error; pytest.raises
  already enforces the type, and the duplicate block was running the same
  slow scenario twice

test_async_scheduler.py:
- parametrize over RETRYABLE_MODEL_ERRORS directly (same as above)

test_data_designer.py:
- added preview-path tests for the typed-error and partial-salvage
  fall-through cases
- updated the existing empty-DF test to also patch actual_num_records=0
  (otherwise the new gating in the empty-DF guard skips the typed error)

* test(interface): consolidate create() error-dispatch tests into a matrix

Five separate tests (two existing, three new from earlier in this PR)
all probed the same dispatch logic in create(): "given a load outcome
and a builder state, which error type should fire?" Pulled them into a
single parametrized matrix indexed by (load_side_effect, early_shutdown,
actual_num_records).

Net result: 5 named tests → 1 parametrized test with 6 cells, and the
previously-missing empty_df + shutdown + partial salvage cell is now
covered.

Test names retain readable IDs (load_fails_shutdown_zero_records etc.)
so failures still pinpoint the exact case in pytest output.
2026-04-30 14:43:35 -03:00
Nabin Mulepati
05c2e8df2e
fix: normalize image_url blocks to OpenAI-compliant dict format (#577)
* fix: normalize image_url blocks to OpenAI-compliant dict format (#576)

ImageContext.get_contexts() produced bare-string and non-standard dict
shapes for image_url content blocks, which broke the native OpenAI
adapter (passes blocks through as-is) and only worked with Anthropic
by accident via defensive handling in the translation layer.

- Wrap all image_url values in {"url": ...} dict (OpenAI spec)
- Remove non-standard "format" key from base64 dicts
- Tighten Anthropic translate_image_url_block to require dict input
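For reference, the OpenAI chat content-part shape is `{"type": "image_url", "image_url": {"url": ...}}`, where base64 images travel as data URIs. The hypothetical helper below shows that target shape. It is not the `ImageContext.get_contexts()` implementation.

```python
from typing import Any


def image_url_block(source: str, *, is_base64: bool = False, mime_type: str = "image/png") -> dict[str, Any]:
    """Hypothetical helper producing an OpenAI-style image_url content block."""
    # Base64 payloads become data URIs; no extra "format" key is added.
    url = f"data:{mime_type};base64,{source}" if is_base64 else source
    # image_url is always a {"url": ...} dict, never a bare string.
    return {"type": "image_url", "image_url": {"url": url}}


remote = image_url_block("https://example.com/cat.png")
inline = image_url_block("aGVsbG8=", is_base64=True, mime_type="image/jpeg")
```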

Fixes #576

Made-with: Cursor

* fix: reject malformed image_url blocks instead of silently dropping them

translate_image_url_block now raises TypeError when image_url is not a
dict. Since all image_url blocks are constructed internally, a bare
string indicates an internal bug and should fail loudly.
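A sketch of the stricter translation. It assumes the Anthropic Messages API image block shapes as we understand them: a `source` of type `base64` with `media_type`/`data`, or of type `url`. The function name comes from the message, but the body is illustrative and not the library's code.

```python
import re
from typing import Any

_DATA_URI = re.compile(r"^data:(?P<media_type>[\w.+-]+/[\w.+-]+);base64,(?P<data>.+)$", re.DOTALL)


def translate_image_url_block(block: dict[str, Any]) -> dict[str, Any]:
    """Sketch: map an OpenAI image_url block to an Anthropic-style image block."""
    image_url = block.get("image_url")
    if not isinstance(image_url, dict):
        # Blocks are built internally, so a bare string means an upstream bug: fail loudly.
        raise TypeError(f"image_url must be a dict with a 'url' key, got {type(image_url).__name__}")
    url = image_url["url"]
    match = _DATA_URI.match(url)
    if match:
        source = {"type": "base64", "media_type": match["media_type"], "data": match["data"]}
    else:
        source = {"type": "url", "url": url}
    return {"type": "image", "source": source}
```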

Made-with: Cursor

* address review: tighten return type, add OpenAI + data-URI tests

- Narrow _auto_resolve_context_value return type to dict[str, str]
- Add OpenAI-client regression tests for image_url dict passthrough
- Cover both bare-URL and bare-data-URI rejection in Anthropic tests

Made-with: Cursor
2026-04-28 09:35:27 -06:00
Eric W. Tramel
8be4ff787f
feat: add RunConfig jinja rendering engine (#557)
2026-04-17 15:06:27 -04:00
Andre Manoel
a965bc1542
fix: bridge model.generate() to agenerate() for custom columns in async engine (#545)
* feat: bridge model.generate() to agenerate() for custom columns in async engine

Custom column generators that call model.generate() fail under the async
engine because the sync HTTP client is unavailable. Add an
_AsyncBridgedModelFacade proxy in _build_models_dict() that intercepts the
sync-client RuntimeError and schedules agenerate() on the engine's persistent
event loop via run_coroutine_threadsafe. Includes a deadlock guard for async
custom columns running on the event loop.
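A minimal sketch of the bridge pattern. It assumes the engine's loop runs forever in its own thread. `run_on_engine_loop` is hypothetical; `asyncio.run_coroutine_threadsafe` is the standard-library call named in the message. The deadlock guard refuses to block when called from the loop's own thread.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_on_engine_loop(coro: Coroutine[Any, Any, T], loop: asyncio.AbstractEventLoop, timeout: float) -> T:
    """Hypothetical sync-to-async bridge onto a loop running in another thread."""
    try:
        running = asyncio.get_running_loop()
    except RuntimeError:  # no loop in this thread: the normal sync-worker case
        running = None
    if running is loop:
        # Blocking here would wait on the very loop that must run the coroutine.
        coro.close()
        raise RuntimeError("sync bridge called on the engine loop; use agenerate() instead")
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError as exc:
        future.cancel()  # do not leave the coroutine running after giving up
        raise TimeoutError(f"bridged call exceeded {timeout}s") from exc
```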

* refactor: wrap facades at sync call site, not in _build_models_dict

Move _AsyncBridgedModelFacade wrapping from _build_models_dict() into
_invoke_generator_function() so the async path gets raw facades. The
bridge proxy is only needed for sync custom columns; async columns
already have direct access to model.agenerate().

* fix: address review feedback - typed exception, timeout cleanup, kwargs test

- Introduce SyncClientUnavailableError so the facade catches by type
  instead of matching error strings (review comment #1)
- Add future.cancel() + logger.warning() on timeout to match the
  _run_coroutine_sync pattern in base.py (review comment #2)
- Assert kwargs forwarding in the async bridge test (review comment #4)

* fix: let SyncClientUnavailableError propagate through @catch_llm_exceptions

The decorator catches all exceptions and wraps them into DataDesignerError,
which prevented the async bridge proxy from ever seeing the original error.
Add an early match case that re-raises SyncClientUnavailableError directly.

* refactor: make SYNC_BRIDGE_TIMEOUT a public constant

Drop the underscore prefix since the constant is exported and used
across modules (base.py and custom.py).
2026-04-17 13:01:55 -03:00
Eric W. Tramel
28c8345909
feat: add built-in filesystem seed readers (#421)
2026-03-16 17:40:27 -04:00
Andre Manoel
8fff7c07fe
feat: add async generator migration with symmetric bridging and statefulness (#378)
* feat: add async generator migration with symmetric bridging and statefulness

- Symmetric generate/agenerate bridging in base ColumnGenerator
- is_stateful property; SeedDatasetColumnGenerator declares True
- Async wrappers for FromScratchColumnGenerator and ColumnGeneratorFullColumn
- Native async paths for ImageCellGenerator and EmbeddingCellGenerator
- CustomColumnGenerator.agenerate with full validation parity
- Extract _postprocess_result for shared sync/async output validation

* fix: avoid blocking caller on sync bridge timeout

Use explicit pool lifecycle instead of context manager so that
a TimeoutError releases the caller immediately via
shutdown(wait=False) rather than blocking on pool.__exit__.

* fix: widen agenerate type signature to match generate

Add @overload declarations so the base agenerate accepts both
dict and pd.DataFrame, mirroring the existing generate pattern.
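A sketch of the overload pattern, including the default async path that runs the sync method on a defensive copy. The class is hypothetical. It assumes `asyncio.to_thread` as the bridge, and pandas is imported only under `TYPE_CHECKING` with quoted annotations, so the sketch runs without it.

```python
import asyncio
from typing import TYPE_CHECKING, Any, Union, overload

if TYPE_CHECKING:  # pandas is only needed for type checking here
    import pandas as pd


class ColumnGeneratorSketch:
    """Hypothetical base class with symmetric sync/async entry points."""

    @overload
    def generate(self, data: dict[str, Any]) -> dict[str, Any]: ...
    @overload
    def generate(self, data: "pd.DataFrame") -> "pd.DataFrame": ...
    def generate(self, data: Union[dict[str, Any], "pd.DataFrame"]) -> Union[dict[str, Any], "pd.DataFrame"]:
        return data  # a real generator would add its column here

    @overload
    async def agenerate(self, data: dict[str, Any]) -> dict[str, Any]: ...
    @overload
    async def agenerate(self, data: "pd.DataFrame") -> "pd.DataFrame": ...
    async def agenerate(self, data: Union[dict[str, Any], "pd.DataFrame"]) -> Union[dict[str, Any], "pd.DataFrame"]:
        # Default async path: run the sync generate() in a worker thread on a
        # defensive copy, so the caller's object is never mutated.
        return await asyncio.to_thread(self.generate, data.copy())
```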

* fix: ensure pool shutdown on sync bridge success path

The else clause after return was unreachable, leaking the
ThreadPoolExecutor on every successful call. Capture the result
first, shut down the pool, then return.

* fix: use try/finally for pool shutdown in sync bridge

Ensures ThreadPoolExecutor is shut down on all exit paths,
including non-TimeoutError exceptions from the coroutine.
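The pool lifecycle from the last two fixes can be sketched as follows. This assumes the bridge runs the coroutine with `asyncio.run` on a single worker thread, and `run_coroutine_blocking` is a hypothetical name. The context-manager form is avoided because `ThreadPoolExecutor.__exit__` calls `shutdown(wait=True)`.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_coroutine_blocking(coro: Coroutine[Any, Any, T], timeout: float) -> T:
    """Hypothetical sync bridge: run `coro` on a fresh loop in a worker thread."""
    # Explicit lifecycle instead of `with ThreadPoolExecutor()`: the context
    # manager's __exit__ waits for the worker, which would block the caller
    # until a timed-out coroutine finished anyway.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(asyncio.run, coro)
        try:
            return future.result(timeout=timeout)
        except concurrent.futures.TimeoutError as exc:
            # Re-raise as the builtin, which differs from the futures one on Python 3.10.
            raise TimeoutError(f"coroutine did not finish within {timeout}s") from exc
    finally:
        # Every exit path (success, coroutine error, timeout) shuts the pool down
        # without waiting on the worker thread.
        pool.shutdown(wait=False)
```

One trade-off: after a timeout the worker thread keeps running the coroutine in the background until it finishes. Only the caller is released.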

* refactor: extract shared validation in ImageCellGenerator

Move duplicated input validation and prompt rendering into
_prepare_image_inputs, shared by generate and agenerate.

* refactor: extract shared input prep in EmbeddingCellGenerator

* address PR review feedback

- add _is_overridden helper for symmetric generate/agenerate guards
- move defensive .copy() into base agenerate, remove subclass overrides
- re-raise as builtin TimeoutError for Python 3.10 compat
- rename is_stateful to is_order_dependent with improved docstring
- replace brittle .fget test with object.__new__
- add async tests for ImageCellGenerator and EmbeddingCellGenerator
2026-03-11 14:20:09 -03:00
Andre Manoel
88989d1854
fix: replace removed DuckDB record_batch() with to_arrow_reader() (#380)
DuckDB 1.5.0 (released 2026-03-09) removed the record_batch() method
from the Python Relation API, breaking SeedDatasetColumnGenerator.

Migrate to the stable to_arrow_reader() API and bump the minimum
DuckDB version to >=1.5.0.

Fixes #379
2026-03-09 14:04:34 -03:00
Nabin Mulepati
8f7a72094a
feat: auto-detect ImageContext format for image-to-image generation (#342)
* updates to support image->image

* update notebooks

* regen colab notebooks

* simplify tests
2026-02-20 15:54:42 -05:00
Andre Manoel
70dc48884e
feat: add allow_resize for 1:N and N:1 generation patterns (#286)
* feat: add allow_resize for 1:N and N:1 generation patterns

Adds support for generators that produce a different number of records
than the input (expansion or retraction). This addresses GitHub issue #265.

Changes:
- Add `allow_resize` parameter to `update_records()` in DatasetBatchManager
- Add `allow_resize` field to CustomColumnConfig
- Add validation requiring FULL_COLUMN strategy when allow_resize=True
- Track and report actual_num_records in metadata (may differ from target)
- Add logging when batch size changes
- Add example_allow_resize.py demonstrating the feature
- Add comprehensive tests

* docs: add allow_resize to custom columns documentation

* refactor: consolidate buffer API and elevate allow_resize to base config

- Merge update_records and replace_buffer into a single replace_buffer
  method with allow_resize parameter on DatasetBatchManager
- Move allow_resize field from CustomColumnConfig to SingleColumnConfig
  so plugins inherit it without needing a mixin
- Align example and logging with final CustomColumn API
- Parametrize resize tests and extract shared stub in test_columns
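The consolidated `replace_buffer(..., allow_resize=...)` contract can be sketched with a hypothetical in-memory buffer. This is not `DatasetBatchManager`. A size change is rejected unless the caller opts in.

```python
from typing import Any


class BatchBufferSketch:
    """Hypothetical stand-in for a batch manager's in-memory record buffer."""

    def __init__(self, records: list[dict[str, Any]]) -> None:
        self.records = records

    def replace_buffer(self, new_records: list[dict[str, Any]], *, allow_resize: bool = False) -> None:
        old, new = len(self.records), len(new_records)
        if new != old and not allow_resize:
            raise ValueError(
                f"generator returned {new} records for a batch of {old}; "
                "set allow_resize=True on the column config for 1:N or N:1 generators"
            )
        self.records = new_records


buffer = BatchBufferSketch([{"doc": "a"}, {"doc": "b"}])
# 1:N expansion: three chunks per input record.
buffer.replace_buffer([{"doc": r["doc"], "chunk": i} for r in buffer.records for i in range(3)], allow_resize=True)
```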

* test: add chained resize and multi-batch integration tests

- Add expand->retract->expand chaining test (single batch)
- Add multi-batch resize test verifying combined parquet output
- Update example to chain expand/retract/expand with preview+build
- Use 💥/✂️ emojis for resize logging (expand/retract)

* extend allow_resize to cell-by-cell (return dict or list[dict])

- Config: allow allow_resize with CELL_BY_CELL; relax validator
- Custom generator: accept dict | list[dict] when cell_by_cell + allow_resize;
  validate per row via _validate_cell_output
- Builder: collect results by index when cell allow_resize, flatten and
  replace_buffer; add _log_resize_if_changed and _column_display_name
- Docs: ALL_CAPS for strategies, simplify allow_resize table text
- Tests: parametrized preview and multibatch; factories with n param;
  _RESIZE_SPECS with inline factory calls; ids ordered like specs
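For the cell-by-cell case, the flattening step might look like the hypothetical helper below. It assumes results are collected in row order. A dict keeps the row, a list expands it, and an empty list drops it.

```python
from typing import Any, Union

CellResult = Union[dict[str, Any], list[dict[str, Any]]]


def flatten_cell_results(results: list[CellResult]) -> list[dict[str, Any]]:
    """Hypothetical: flatten per-cell outputs when allow_resize is on."""
    rows: list[dict[str, Any]] = []
    for index, result in enumerate(results):
        if isinstance(result, dict):  # 1:1, same as without allow_resize
            rows.append(result)
        elif isinstance(result, list):  # 1:N, or 1:0 for an empty list (row dropped)
            rows.extend(result)
        else:
            raise TypeError(f"cell {index} returned {type(result).__name__}; expected dict or list[dict]")
    return rows
```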

* reorder allow_resize specs and add edge-case tests

- Rename specs: full_x3, cell_x2, cell_plus_full_chain; add cell_filter_odd,
  cell_drop_all to _RESIZE_SPECS
- Stubs before specs: _resize_full_keep_first, _resize_cell_expand,
  _resize_cell_filter_odd, _resize_cell_drop_all; drop cell factories
- Remove FULL/CELL constants; use GenerationStrategy.* in _RESIZE_SPECS
- Preview/multibatch parametrize: _preview and _multibatch ids; two full_x3
  multibatch cases (5_2, 4_2) first
- Handle all-batches-skipped in multibatch test (empty df when path missing)
- test_custom: add test_cell_by_cell_allow_resize_return_list_single (1:1 via list)

* tidy allow_resize: drop validator, shared stub, explicit flag

- Remove validate_allow_resize_requires_full_column from CustomColumnConfig
- Rename StubColumnConfigWithoutEmoji to StubColumnConfig in test_columns
- Pass allow_resize=False in _write_processed_batch replace_buffer call

* fix: add missing f prefix to error message in custom.py

* docs(plugins): add section on setting allow_resize=True for resize plugins

* fix: address PR review comments on allow_resize

- Replace getattr with direct attribute access where config is always
  SingleColumnConfig (custom.py, cell-by-cell path in builder)
- Keep getattr in _run_full_column_generator which also handles
  multi-column configs without allow_resize
- Restructure allow_resize validation branching in CustomColumnGenerator
- Fix error message wording: "key" -> "column"

* fix: remove duplicate tool_alias log, fix test docstring

- Remove tool_alias log from _setup_fan_out (callers already log it)
- Fix docstring: CELL_BY_CELL -> FULL_COLUMN in resize test factory

* fix: avoid duplicate undeclared-column warning in _validate_output

Inline the strip instead of delegating to _validate_cell_output,
which would log the same warning a second time.

* fix: use lazy.pd instead of pd for runtime pandas usage in tests

The pd import is under TYPE_CHECKING, so runtime calls need lazy.pd.
2026-02-18 18:39:31 -03:00
Johnny Greco
1439bbea7e
chore: Improve CLI startup with lazy heavy import cleanup (#330)
* perf: defer heavy imports to improve CLI startup time

Move expensive imports (engine, models, controllers) out of the module-level import path so that data-designer --help and other non-generation commands no longer pay the full startup cost.

Key changes:
- Defer controller imports to inside command functions
- Remove eager re-export chains from CLI package __init__ files
- Move default-settings bootstrap into load_config_builder() and DataDesigner.__init__() instead of running at import time
- Add lazy __getattr__ exports in interface/__init__.py
- Replace module-level tokenizer init with cached lazy getter
- Fix ModelProvider import to use config layer instead of engine
- Update test mock paths to match new import locations

Reduces CLI import-time from ~1.67s to ~0.46s.

* perf: defer pandas/numpy in io_helpers and add config_list benchmark

- Replace eager `from lazy_heavy_imports import pd, np` in io_helpers
  with module-level __getattr__ (for backwards-compatible external
  access / test mocks) and function-level imports in the 3 functions
  that actually use them (read_parquet_dataset, smart_load_dataframe,
  _convert_to_serializable). Importing io_helpers no longer triggers
  pandas/numpy loading.
- Defer heavy imports in list and reset CLI commands into function
  bodies to avoid loading repositories, Rich, and prompt_toolkit at
  module import time.
- Add `config_list` (data-designer config list) measurement to the
  CLI startup benchmark with isolated cold measurement in a separate
  venv and a --skip-config-list-check flag.
- Update test mock paths to match new import locations.

* Refine lazy import usage and TYPE_CHECKING cleanup

* Run license header updater on PR-touched files

* fix: update sqlfluff mock target for lazy imports in test_sql

* perf: cache globals() in lazy __getattr__ to avoid repeated lookups

Add globals() caching and explanatory comment to all three lazy
__getattr__ implementations (lazy_heavy_imports, config/__init__,
interface/__init__) so subsequent attribute accesses bypass __getattr__.

* perf: lazy CLI command loading and deferred heavy import evaluations

- Add LazyTyperGroup to defer command module loading until invocation, allowing module-level imports in all CLI command files

- Split DataFrameSeedSource into seed_source_dataframe.py to isolate pandas dependency from other seed source classes

- Move TypeVar/TypeAlias definitions (DataT, NumpyArray1dT, RadomStateT, EngineT) to TYPE_CHECKING blocks with runtime fallbacks

- Wrap module-level constants in lru_cache (phone_number parquet data, jsonschema validator) to defer I/O and heavy imports to first use

- Update test mock targets to patch at usage-site for module-level imports

* refactor: use direct pandas import in seed_source_dataframe

Drop lazy-loading for pandas in DataFrameSeedSource; use direct import
for simplicity.

* update lazy import pattern

* update tests to use lazy import namespace

Switch test modules to import data_designer.lazy_heavy_imports as lazy
and reference heavy libraries through that namespace. This keeps heavy
imports deferred during module import and aligns tests with the new
lazy-import usage pattern.

* tighten import perf test thresholds

Document recent baseline timings and lower the allowed average
import time and timeout so regressions are detected sooner.

* document pandas import requirement

Clarify that Pydantic needs DataFrame resolved at module load and
that keeping the direct import preserves IDE typing support.

* increase timeout time

* use lazy pandas imports in visualization tests

- replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports
- add TYPE_CHECKING pandas import and keep CLI controller imports sorted

* fix lazy pandas runtime usage and preview mocks

Switch sample-record handling to lazy pandas types so runtime paths no longer
depend on TYPE_CHECKING imports. Align preview controller tests to patch the
module-local DataDesigner symbol, preventing real engine invocation in save
results scenarios.
2026-02-18 16:24:15 -05:00
Nabin Mulepati
8e2fd3286f
feat: add image generation support with multi-modal context (#317)
2026-02-12 14:00:28 -07:00
Andre Manoel
f8b7c905e8
fix: include CUSTOM type in execution DAG and warn on generator errors (#324)
* fix: include CUSTOM type in execution DAG classification

Custom columns have required_columns and side_effect_columns
but were excluded from the DAG, causing incorrect execution
order when they depend on or are depended upon by other columns.
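Once custom columns contribute edges, their `required_columns` and `side_effect_columns` take part in the ordering. A sketch using the standard library's `graphlib.TopologicalSorter`. `ColumnSpec` and the column-type strings are hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class ColumnSpec:
    """Hypothetical, minimal view of a column config."""

    name: str
    column_type: str
    required_columns: list = field(default_factory=list)
    side_effect_columns: list = field(default_factory=list)


def execution_order(specs: list) -> list:
    # Map every column (including side-effect columns) to the config that produces it.
    producer = {}
    for spec in specs:
        producer[spec.name] = spec.name
        for side_effect in spec.side_effect_columns:
            producer[side_effect] = spec.name
    # Every column type, custom included, contributes its dependency edges.
    graph = {spec.name: {producer[dep] for dep in spec.required_columns} for spec in specs}
    return list(TopologicalSorter(graph).static_order())


specs = [
    ColumnSpec("review", "llm-text", required_columns=["summary_score"]),
    ColumnSpec("summary", "custom", required_columns=["topic"], side_effect_columns=["summary_score"]),
    ColumnSpec("topic", "sampler"),
]
order = execution_order(specs)
```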

Co-authored-by: Lipika Ramaswamy <lramaswamy@nvidia.com>

* add warning when custom generator function fails

Log a warning in cell-by-cell mode so users know the record
will be skipped. In full-column mode the error message is
already descriptive enough via the DatasetGenerationError chain.

Co-authored-by: Lipika Ramaswamy <lramaswamy@nvidia.com>

---------

Co-authored-by: Lipika Ramaswamy <lramaswamy@nvidia.com>
2026-02-11 17:21:33 -03:00
Johnny Greco
f74f25872c
chore: quiet tool call logs and add tool usage statistics (#293)
* add tool usage statistics tracking

- Add ToolUsageStats class with metrics for tool calls, turns, and
  statistical aggregates (mean/stddev per generation)
- Extend ModelUsageStats to include tool_usage tracking
- Update ModelFacade.generate() to track total tool calls and turns
- Update tests with tool_call_count method and new assertions

* silence noisy mcp logs

* log message updates

* add tools enabled info message

* exclude empty tool_usage from usage stats output

* add tool usage summary logging after column generation

- Track tool usage snapshots before/after column processing
- Log mean tool calls per generation for columns with tools enabled
- Add get_tool_usage_snapshot/get_tool_usage_delta methods to ModelRegistry
- Remove unused extra_info parameter from progress_tracker.log_start()
- Add comprehensive tests for ToolUsageStats

* pretty format model usage logs

* reuse stubs and fixtures

* add merge method to ToolUsageStats for accurate stats aggregation

The previous implementation used extend() to combine tool usage stats,
but extend() is designed for single-generation data. This caused
incorrect stddev calculations when merging stats from multiple sources.

- Add ToolUsageStats.merge() that properly combines sum-of-squares
- Update ModelUsageStats.extend() to use merge() for tool usage
- Add tests verifying stddev accuracy after merging
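Why merging sums of squares is exact: count, sum, and sum of squares are all additive, and mean and stddev can be recovered from them. This is a sketch with a hypothetical accumulator. It assumes population stddev, which the message does not specify.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical accumulator keeping just enough state to merge and get mean/stddev."""

    count: int = 0
    total: float = 0.0
    sum_of_squares: float = 0.0

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.sum_of_squares += value * value

    def merge(self, other: "RunningStats") -> "RunningStats":
        # All three fields are additive, so merging is exact. Feeding another
        # accumulator's totals into add() as one observation would corrupt sum_of_squares.
        return RunningStats(
            self.count + other.count,
            self.total + other.total,
            self.sum_of_squares + other.sum_of_squares,
        )

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else math.nan

    @property
    def stddev(self) -> float:
        # Population stddev via E[x^2] - E[x]^2, clamped against tiny negative rounding.
        if not self.count:
            return math.nan
        variance = self.sum_of_squares / self.count - self.mean ** 2
        return math.sqrt(max(variance, 0.0))
```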

* fix tool usage stats missing generations_with_tools count

When tracking tool usage after generation, the ToolUsageStats was
created without setting generations_with_tools, causing the usage
summary to report zeros for calls/gen and turns/gen metrics.

* fix tool usage delta objects returning incorrect stddev values

- Simplify facade API to use tool_usage.extend() directly
- Return NaN for stddev when sum of squares wasn't tracked
- Add docstring to get_tool_usage_delta explaining NaN behavior
- Add comprehensive tests for stddev variance calculation

* fix tool usage delta stddev by including sum of squares in deltas

Convert sum_of_squares_turns and sum_of_squares_calls from private
attributes to public fields, enabling them to be included in delta
calculations. This allows get_tool_usage_delta to return objects that
compute accurate stddev values instead of NaN.

* fix test to use get_tool_usage_snapshot for accurate stddev tracking

The test was manually constructing a ToolUsageStats snapshot without
sum_of_squares fields, causing stddev to be NaN. Now uses the proper
snapshot method that includes all fields needed for delta calculations.

* use nvidia-reasoning by default

* mean -> average in log message

* refactor log indentation to use centralized LOG_INDENT constant

- Add LOG_INDENT constant to logging.py for consistent indentation
- Replace hardcoded "  |-- " strings across all log statements
- Add tool alias and MCP provider info to pre-generation logs
- Improve model usage log format for better consistency
- Update tests to match new log formats

* simplify usage stats dict access in model registry

Remove defensive .get() calls and unnecessary type casts since
the usage statistics dictionary structure is now guaranteed.

* walrus baby

* simplify tool usage tracking and reduce log verbosity

- Remove mean/stddev calculations from ToolUsageStats in favor of simple
  counts and generation ratios
- Add total_generations field to track all tool-enabled generations
- Simplify registry log output to show generations ratio (with_tools/total)
- Remove per-column tool usage snapshot/delta logging from column builder
- Track tool usage for all tool-enabled generations, not just those with calls

* format inference parameters as multi-line log output

- Add get_formatted_params() method to BaseInferenceParams
- Add LOG_DOUBLE_INDENT constant for nested indentation
- Update log_pre_generation() to display each parameter on its own line

* update tests to use LOG_INDENT constants

Align test assertions with the centralized log indentation
constants introduced in the logging module refactor.

* two-space consistency
2026-02-05 10:14:02 -05:00
Andre Manoel
62bae42dc2
feat: Add CustomColumnGenerator for user-defined column generation (#254)
* first attempt

* iterating a bit

* some improvements + multiturn example

* adapting to new monorepo structure

* refining

* fixed test

* fixing license headers

* adding docs

* adding test for failed generation

* allowing strategy to be picked

* renaming argument

* lint

* remove recommendation

* renaming for consistency

* addressing comments pt1

* addressing comments pt2

* addressing comments pt3

* adding a mock for development

* addressing greptile comments

* revamping

* docs: streamline custom columns documentation

* docs: simplify CustomColumnConfig docstring

Remove verbose code example and detailed function signatures from
docstring to match the pattern of other config classes in the file.

* test: clean up custom column tests

- Remove tests for private _custom_column_metadata attribute
- Combine redundant generator creation tests
- Reuse stub_resource_provider and stub_model_facade fixtures

* test: consolidate custom column tests

Reduce from 26 to 11 tests while maintaining coverage:
- Combine redundant config/decorator/creation tests
- Use parametrized tests for error conditions
- Remove duplicate validation tests for full_column strategy
- Simplify section headers

* refactor: deduplicate CustomColumnGenerator logic

Merge cell-by-cell and full-column code paths:
- _generate_cell_by_cell + _generate_full_column -> _generate
- _validate_output_columns + _validate_output_columns_df -> _validate_output

* chore: merge example files into single notebook-style example.py

Combine example.py, example_multiturn.py, and example_benchmark_strategies.py
into a single file with #%% cell markers for Jupyter/VS Code notebook mode.

* addressing greptile comments

* refactor: reuse generate_text in generate_text_batch

* refactor: replace CustomColumnContext with models dict

- Remove CustomColumnContext class; users now receive models dict directly
- Add DataDesigner.get_models() for experimentation outside pipeline
- Make parser optional in ModelFacade.generate() (defaults to identity)
- Validate parameter names: row/df, generator_params, models
- Update examples, tests, and docs for new API

* fix: address PR review comments from Nabin and greptile

- Make decorator metadata public (custom_column_metadata)
- Simplify get_generation_strategy() to directly return config value
- Use !r formatting in error messages
- Use lazy imports pattern for pandas (TYPE_CHECKING + lazy_heavy_imports)
- Remove redundant error logging before re-raise
- Validate max 3 positional parameters
- Use GenerationStrategy enum in example instead of string

* fix: replace lambda with module-level identity function in facade

Use pickleable _identity function instead of lambda x: x for the
default parser argument, ensuring compatibility with multiprocessing.

* fix: restore inherited attributes in LLM column docstrings

Restores the "Inherited Attributes" sections that were unintentionally
removed from LLMCodeColumnConfig, LLMStructuredColumnConfig, and
LLMJudgeColumnConfig docstrings.

* docs: clarify model_aliases is required for LLM access

Updated documentation and docstrings to clarify that model_aliases
populates the models dict (not just health checks).

* fix: address PR review comments from nabinchha

- clarify model_aliases requirement in docs
- add note about model alias validation during health check
- combine two loops into one in _run_model_health_check_if_needed
- add signature validation at decoration time
- enforce decorated functions in CustomColumnConfig validator
- simplify generator to only validate strategy-specific first param

* fix: address remaining PR review comments

- remove example.py (development artifact)
- fix get_models return type to dict[str, ModelFacade]

* test: update tests for decoration-time validation

- expect ValidationError instead of InvalidConfigError for non-callable
- split param validation test into decoration-time and runtime tests
2026-02-03 19:23:39 -03:00
Eric W. Tramel
5430bcbe99
Remove debug_trace_override (#290)
2026-02-03 12:09:30 -05:00
Eric W. Tramel
532d21a8d7
feat: add extract_reasoning_content option to LLM columns (#285)
2026-02-03 10:25:24 -05:00
Eric W. Tramel
510761107b
feat: Add TraceType enum for granular trace control (#284)
2026-02-02 19:43:51 -05:00
Eric W. Tramel
7248b9fc8f
Update trace normalization to ChatML content blocks (#283)
2026-02-02 18:22:16 -05:00
Eric W. Tramel
e6e58e692e
feat: MCP (Model Context Protocol) tool calling integration for LLM columns (#248)
2026-02-02 09:41:58 -05:00
Johnny Greco
0d51539aa6
feat: add message trace support for LLM generation (#272)
Add support for capturing full conversation traces during LLM generation,
enabling debugging and fine-tuning dataset creation.

Changes:
- Add `with_trace` field to LLMTextColumnConfig for per-column trace control
- Add `debug_override_save_all_column_traces` to RunConfig for global trace
- Introduce ChatMessage dataclass for structured message representation
- Update ModelFacade.generate() to return full message trace
- Rename trace column postfix from `__reasoning_trace` to `__trace`
- Add comprehensive traces documentation

Traces capture system/user/assistant messages in order, enabling visibility
into the full generation conversation including correction retries.
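A sketch of what an ordered trace with a correction retry could look like when stored under the `__trace` postfix. The `ChatMessage` fields, `build_trace`, and the correction prompt are assumptions for illustration, not the library's definitions.

```python
from dataclasses import asdict, dataclass
from typing import Literal


@dataclass(frozen=True)
class ChatMessage:
    """Stand-in for the ChatMessage dataclass described above (fields assumed)."""

    role: Literal["system", "user", "assistant"]
    content: str


def build_trace(system_prompt: str, user_prompt: str, replies: list, correction: str) -> list:
    """Hypothetical: rebuild the ordered conversation, including correction retries."""
    trace = [ChatMessage("system", system_prompt), ChatMessage("user", user_prompt)]
    for attempt, reply in enumerate(replies):
        if attempt > 0:
            trace.append(ChatMessage("user", correction))  # retry prompt after a failed parse
        trace.append(ChatMessage("assistant", reply))
    return trace


trace = build_trace("Answer in JSON.", "Name a planet.", ["Mars", '{"planet": "Mars"}'], "That was not valid JSON.")
record = {"answer": '{"planet": "Mars"}', "answer__trace": [asdict(m) for m in trace]}
```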
2026-01-30 17:03:07 -05:00
Johnny Greco
c19f35639f
chore: add publish script and update license headers (#253)
2026-01-28 08:47:34 -05:00
Johnny Greco
ae0665fa16
refactor: slim package refactor into three subpackages (#240)
* remove old structure

* major shuffle

* streamline project configs

* update make commands

* updates to make commands

* remove essentials

* initialize logger in interface

* uv lock

* ignore notepad

* update workflows

* fix e2e project config

* generate colab notebooks

* resolve default model settings in interface

* fix build commands

* update perf import make command

* cleaning up some slop

* update recipes

* move conftest files to tests/

* update subpackage readmes

* streamline config_logging

* use exports

* update perf import usage pattern

* update for IDE behavior with ruff

* remove engine's fixtures file

* add note about lazy imports

* update dependencies

* update docs

* doc fixes

* uv lock

* updates to catch up with main

* clean up makefile

* remove package gitignores

* define deps only once

* isolate tests

* add test for protection rule

* create temp dirs for isolated tests

* catch up to main

* update headers

* re apply changes

* better result summaries for isolated tests

* move exports into top-level init

* fix client importlib version syntax

* catch up with main
2026-01-27 13:53:20 -05:00