mirror of
https://github.com/NVIDIA-NeMo/DataDesigner
synced 2026-05-24 09:48:29 +00:00
26 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
61125a02d0 |
fix: align progress panel metrics
Render the progress legend as a stable table with live token-rate columns, attribute model usage to active generation columns across async bridge boundaries, and cancel the async scheduler cleanly on KeyboardInterrupt. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com> |
||
|
|
bd0410bb05
|
fix(engine): actionable error when a Jinja field is missing/None/empty (#633)
Some checks are pending
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Coverage Check (Python 3.11) (push) Blocked by required conditions
CI / End to end test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Lint and Format Check (push) Blocked by required conditions
CI / Check License Headers (push) Blocked by required conditions
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
* fix(engine): actionable error when a Jinja field is missing/None/empty Empty-render and missing-attribute failures used to surface as the generic "User provided prompt generation template is invalid." either because `sanitize_user_exceptions` stripped the detail or because Jinja's raw `UndefinedError` leaked through. Both now raise a new `EmptyTemplateRenderError` carrying a row-level diagnostic that names the offending chain and includes copy-pasteable Jinja conditional and SkipConfig fix patterns. Closes #629. * fix(engine): address PR review feedback on EmptyTemplateRenderError Addresses the open review comments on #633: 1. (Greptile P1) Gate expression in the suggested remediation template was one accessor too deep when the root variable was entirely absent from the record, causing the suggested fix to itself raise UndefinedError. Fall back to gating on the root name alone when sample_name is not in record. 2. (andreatgretel) The AST walker reported loop-local names as missing culprits (e.g. ``person`` in ``{% for person in people %}...{% endfor %}``). Filter extracted chains through ``meta.find_undeclared_variables`` to defer to Jinja's canonical scope tracking. 3. (andreatgretel follow-up) Empty collections used as loop iterables (``items=[]``) fell through to the no-culprit fallback. Add a new ``_CULPRIT_EMPTY_COLLECTION`` classification so they're surfaced. 4. Minor: add ``from exception`` to ``safe_render``'s UndefinedError re-raise for traceback consistency with the native engine path, and add a note on the load-bearing exception ordering in ``sanitize_user_exceptions``. |
||
|
|
4b93f5b245
|
feat: let column configs declare all model aliases for the startup health check (#626)
* feat(engine): let column configs declare all model aliases for the startup health check Plugin column configs that depend on more than one model alias (generator + judge, critic, etc.) previously could not opt their secondary aliases into the standard startup health check, and configs without a `model_alias` field crashed the collection loop with AttributeError. Add `SingleColumnConfig.get_model_aliases()` as the single override hook the builder uses to enumerate aliases. The default returns the column's primary `model_alias` (if any), so built-in LLM, embedding, and image columns work unchanged. `CustomColumnConfig` overrides it to surface decorator-declared aliases, replacing the special-case `isinstance` branch in the builder. Plugin configs with multiple model fields override it to opt every endpoint into the health check. Fixes #606 Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * fix(config): forward empty model_alias to startup health check SingleColumnConfig.get_model_aliases() used `if alias` to filter, which also dropped empty-string aliases. Empty model_alias values are accepted by the config model and previously reached run_health_check, where they failed fast with "No model config with alias '' found!". Treating them as "no model endpoints" silently delayed that error to first generation. Use `alias is not None` so only a truly missing attribute skips the health check, and add a regression test that exercises an empty-string model_alias on a built-in config. Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> --------- Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> |
||
|
|
61cdeefb17
|
feat: make async engine the default execution path (#592)
Some checks failed
CI / Test Config (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Coverage Check (Python 3.11) (push) Waiting to run
CI / End to end test (Python 3.10 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.11 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
Publish devnotes / deploy (push) Has been cancelled
* feat: make async engine the default execution path
The async engine has been hardening as opt-in for several releases. Make it
the default and address the prerequisites flagged for the flip.
Default flip
- DATA_DESIGNER_ASYNC_ENGINE defaults to "1" at both consumption sites
- Set DATA_DESIGNER_ASYNC_ENGINE=0 for one transitional release to opt out
- allow_resize=True still falls back to sync with a DeprecationWarning
Python 3.10 support
- Replace asyncio.TaskGroup (3.11+) in async_concurrency.py with
gather-with-explicit-cancel; semantics preserved because _run_task already
swallows its own exceptions and uses _shutdown_event for sibling cancellation
- Remove the sys.version_info < (3, 11) runtime guard
- Remove the matching pytest skipif so the executor tests run on 3.10 too
Derived timeouts (replaces two hardcoded 300s constants)
- ThrottleManager.acquire_sync/async default to timeout=None (no deadline)
instead of DEFAULT_ACQUIRE_TIMEOUT=300; HTTP request timeout already bounds
actual work, queue waits scale with provider speed and AIMD
- _AsyncBridgedModelFacade derives the sync->async bridge timeout from the
model's inference_parameters.timeout and the call's max_correction_steps;
one knob (per-model timeout) drives both deadlines, no new config surface
- Add ModelFacade.request_timeout property so the bridge can read the
effective timeout the client is configured with
Root-cause surfacing
- AsyncTaskScheduler captures the first non-retryable error and exposes it
via first_non_retryable_error
- Interface threads it through DataDesignerGenerationError when 0 records
are produced without early-shutdown, so deterministic failures (e.g. bad
seed sources) surface their original message instead of a wrapped
FileNotFoundError on the parquet path
Tests
- New: throttle no-deadline default behavior (sync+async), parametrized
derived bridge timeout, restored async_concurrency tests on 3.10
- Updated: test_dataset_builder.py uses an autouse fixture to pin its
Mock-based tests to the sync engine they cover; existing bridge tests
set facade.request_timeout for the new derivation
Docs
- Replace the stale LiteLLM security notice in README with a short
async-default heads-up and link to the migration guide
- Add docs/migration-async-default.md covering per-model timeouts,
custom-column thread safety, mocking model calls, run outcomes, and
the opt-out
- Append a short Update section to the async-all-the-way-down dev note
* test: extract _compute_bridge_timeout helper for direct testing
The parametrized bridge-timeout test was patching ``concurrent.futures.Future.result``
to capture the timeout the bridge passed in. That reaches into stdlib internals
(DEVELOPMENT.md "Mock at boundaries: Keep mocking shallow") and the ``ids=`` argument
on the parametrize was missing.
Extracts the formula into a module-level ``_compute_bridge_timeout`` helper. The test
now calls the helper directly with no mocking, and the parametrize gets readable ids.
Behavior is unchanged.
* test(e2e): align demo plugins with async engine contracts
The e2e demo plugins exercise plugin discovery and full DD lifecycle. Two
of them were written against sync-engine semantics that the async engine
restricts:
- DemoColumnGeneratorImpl was a ColumnGeneratorFullColumn with no
required_columns. The async engine routes ``no-upstream`` columns
through the from-scratch path, which passes an empty DataFrame to
generators that aren't FromScratchColumnGenerator subclasses. The
generator then produces 0 rows and the scheduler raises
``update_batch received 0 values``. Switching the plugin to
FromScratchColumnGenerator with generate_from_scratch(num_records)
matches what the plugin actually does (produces a constant column
without input) and works on both engines.
- RegexFilterProcessor implemented process_before_batch with row-count
changes. The async engine enforces row-count invariance in pre- and
post-batch processor stages by design. Moving the filter to
process_after_generation preserves the plugin's purpose (regex-based
row filtering) at a stage that supports row-count changes on both
engines. Test assertions check the final dataset, so the stage shift
is transparent.
Both changes are demo-plugin updates only; no production code change.
* fix: address Codex review findings on async-default flip
Three bugs and two test-quality concerns surfaced by an independent review of
the prior commits. Each was real and worth fixing in the flip PR.
Bug fixes
- Sync-fallback path was creating async-only model clients. The default flip
meant ``client_concurrency_mode = ASYNC`` for every default run, but the
``allow_resize=True`` path falls back to the sync engine — sync ``model.generate()``
calls then hit ``SyncClientUnavailableError``. The resolution decision now
lives at the DataDesigner interface level via
``_resolve_client_concurrency_mode``: it considers both the env var and the
config (allow_resize forces sync clients) and is passed explicitly to
``create_resource_provider``. Direct callers of the factory still get the
env-var default.
- Sync→async bridge timeout ignored the per-call ``timeout=`` override. A
custom column calling ``model.generate(timeout=600)`` against a slow endpoint
was being cancelled at the model-config default, not 600s. The bridge now
prefers ``kwargs.get("timeout")`` over ``facade.request_timeout``.
- Bridge timeout formula missed ``max_conversation_restarts``. One logical
generation can do ``(1 + max_conversation_restarts) × (1 + max_correction_steps)``
HTTP requests; the formula now multiplies both, matching the worst-case
attempt budget.
Engine routing fix (also surfaced by failing e2e plugin tests)
- ``_run_from_scratch`` else-branch passed an empty DataFrame to non-FromScratch
generators classified as seeds (no upstream columns), so ``ColumnGeneratorFullColumn``
with no required_columns produced 0 rows for an ``rg_size``-row buffer. Now
passes an ``rg_size``-row snapshot of the row-group buffer, mirroring the
sync engine's FULL_COLUMN contract.
- The earlier ``DemoColumnGeneratorImpl`` workaround (rewrite as ``FromScratchColumnGenerator``)
is reverted; the engine fix subsumes it. The processor-plugin fix
(``process_after_generation`` for the regex filter) stays — pre-batch
row-count change is intentionally rejected by the async engine.
Test improvements
- Throttle no-deadline tests are parametrized over ``(timeout=0.0, raises)``
and ``(timeout=None, waits)``, pinning that ``None`` is genuinely distinct
from any finite default. Sync and async counterparts mirror.
- New regression tests for ``first_non_retryable_error`` surfacing covering
both load-raises and load-returns-empty paths, asserting the original
exception is chained via ``__cause__`` and that the typed
``DataDesignerEarlyShutdownError`` doesn't fire in this branch.
- New parametrized regression test for ``_resolve_client_concurrency_mode``
covering all four (env × allow_resize) combinations.
- New parametrized test for the per-call ``timeout=`` override flowing into
the bridge timeout calculation.
- Bridge formula tests extended with ``max_conversation_restarts`` cases.
* test: trim redundant parametrize cases in async-default tests
Three parametrize cases were duplicating coverage already provided by
existing standalone tests:
- ``test_acquire_*_timeout_branches`` parametrized over ``(0.0, raises)``
and ``(None, waits)``. The ``raises`` half duplicates
``test_acquire_*_raises_timeout_when_at_capacity``. Replaced with two
focused ``..._default_no_deadline_waits_for_release`` tests covering
only the no-deadline branch.
- ``test_resolve_client_concurrency_mode_matches_engine_choice`` had four
cases. The ``async-off + allow-resize`` case asserts ``SYNC`` because the
env var alone forces it; the allow_resize input is moot. Dropped.
- ``test_async_bridge_honors_per_call_timeout`` had three cases. The
"override below floor" case cross-products the per-call override flow
with the floor-clamping behavior already covered by
``test_compute_bridge_timeout``. Dropped.
Net: -25 lines of test code with no loss of essential coverage.
* docs: fold migration page into existing concept docs
The standalone ``Migrating to the async default`` page didn't fit the
existing docs style — present tense, behavior over comparisons, content
in the natural concept home. Folding it in:
- ``architecture-and-performance.md`` gets a new ``Async Engine`` section
covering per-model timeouts, run outcomes (partial completion +
``DataDesignerEarlyShutdownError``), and the transitional opt-out.
Three stale ``async engine is landing soon`` callouts updated to
reflect the flip.
- ``custom_columns.md`` gets two short notes: a thread-safety callout
near Generation Strategies, and a mocking-with-spec note in
Development Testing.
- ``async-all-the-way-down.md`` Update section now points at the new
arch-and-perf section.
- README heads-up links to the same anchor.
- ``migration-async-default.md`` removed; mkdocs.yml entry dropped.
* docs: frame Execution Model as sync-engine specifics
Small targeted edits to make the user-facing concept docs consistent
with the post-flip state. No restructuring.
- ``architecture-and-performance.md``: the ``Execution Model`` callout
now opens with two engines, links to the new ``Async Engine`` section,
and frames the existing column-at-a-time description as sync-engine
semantics. The ``Step 2: Process columns sequentially`` paragraph notes
the async engine relaxes this. The ``Key Concepts`` table differentiates
per-engine for ``Batching`` and ``Sequential columns``; ``Parallel cells``
is the same on both.
- ``processors.md``: added a warning callout about the async engine's
row-count invariance in pre- and post-batch stages, with the guidance
to use ``process_after_generation()`` for row-filtering or expansion.
* fix: address review nits from PR #592 (Nabin)
Four targeted fixes from the review.
Worth-addressing (warning):
- ``test_acquire_async_default_no_deadline_waits_for_release`` was
spawning the release task without holding a strong reference. The
loop's weak-ref bookkeeping could GC it before the inner ``await``
observes the release, producing a CI flake. Hold the task and
``await`` it in ``finally``.
Take-it-or-leave-it (applied):
- Root-cause error surfacing now includes the exception type name:
``f"🛑 {type(root_cause).__name__}: {root_cause}"`` so users see
``ValueError: ...`` instead of just the message string. The
``__cause__`` chain is preserved either way.
- Drop the defensive ``getattr(c, "allow_resize", False)`` in
``_resolve_client_concurrency_mode`` — every member of
``ColumnConfigT`` inherits ``allow_resize: bool = False`` from
``SingleColumnConfig``.
- One-line comment near the root-cause surfacing branch noting that
``actual_num_records == 0`` is async-only (sync runs leave it at
``-1``), so the branch is async-only by construction.
Not addressed in this PR (filing as follow-ups):
- ``SYNC_BRIDGE_TIMEOUT = 300`` still hardcoded in
``column_generators/generators/base.py:_run_coroutine_sync``. That
bridge has no model-facade context to derive a timeout from, so the
fix is a structural refactor outside this PR's scope.
- First-error capture loses subsequent-error context. The "first wins"
heuristic is documented; richer aggregation is a follow-up.
* fix: drop SYNC_BRIDGE_TIMEOUT in _run_coroutine_sync
This was the third hardcoded 300s timeout (Nabin flagged it on PR #592).
The path is the generic sync→async bridge in ``ColumnGenerator.generate()``:
when a subclass overrides only ``agenerate()``, the sync entry point runs
the coroutine in a background thread.
Same philosophy we applied to the throttle queue wait elsewhere in the
PR: a defensive deadline on top of work that's already bounded by the
HTTP timeout doesn't add safety, it just produces spurious failures on
slow self-hosted endpoints. Drop the constant, the timeout exception
handling, and the ``timed_out`` bookkeeping. ``pool.shutdown(wait=True)``
becomes the simple cleanup.
Tests in ``test_async_generators.py`` exercise the happy path only and
don't depend on the timeout firing.
* Revert "fix: drop SYNC_BRIDGE_TIMEOUT in _run_coroutine_sync"
This reverts commit
|
||
|
|
47c72b3d87
|
fix(async): pack of fixes for async engine under degraded providers (#585)
Some checks are pending
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Coverage Check (Python 3.11) (push) Waiting to run
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.10 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.11 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.12 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.13 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Lint and Format Check (push) Waiting to run
CI / Check License Headers (push) Waiting to run
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
* fix(async): exclude all retryable errors from early-shutdown gate The gate previously only excluded `ModelRateLimitError`, leaving `ModelTimeoutError`, `ModelInternalServerError`, and `ModelAPIConnectionError` to count toward the sliding-window error rate. Under provider degradation these errors cluster in time (concurrent in-flight requests time out together), so 5/10 in a row is easy and trips the gate even when salvage could recover the rows. Refs #575. * feat(async): WARN log when provider showing degraded performance Diagnostic A/Bs against build.nvidia.com showed runs failing silently under provider degradation - no log indication that retryable errors were piling up until the early-shutdown gate fired (or, post-fix, until salvage exhaustion). Surfacing this earlier helps users distinguish "DataDesigner is broken" from "the upstream provider is slow today." Tracks a separate sliding window over retryable-vs-not for every task outcome (independent of the early-shutdown gate's window) and emits a throttled WARN when the rolling fraction crosses the threshold. Refs #575. * fix(async): salvage partial row groups on early shutdown Before: when the early-shutdown gate fired, any row group still in flight stayed in `_rg_states` un-checkpointed. The buffer manager later raised `FileNotFoundError` when the builder tried to read the finalized parquet. User-visible result: `0 records produced`. After: a new `_finalize_after_shutdown` step runs in `run()`'s finally block, after `_cancel_workers` has drained in-flight tasks (Codex caveat: in-flight `from_scratch`/`batch` tasks must not be allowed to write into a buffer that's being finalized). For each remaining row group it drops rows that aren't fully complete, then delegates to the existing `_checkpoint_completed_row_groups` so the buffer manager's zero-survivor handling (skip empty parquet, free buffer) kicks in unchanged. Also surfaces partial completion as a structured signal: scheduler exposes `early_shutdown: bool` and `partial_row_groups: tuple[int, ...]` properties so callers can detect partial completion programmatically rather than parsing log lines. Builder uses this to emit a more specific WARN distinguishing early shutdown from non-shutdown drops. Refs #575. * fix(throttle): reset consecutive_429s on non-rate-limit failure In `release_failure`, the cascade counter wasn't reset, so a sequence like 429 → 500 → 429 was treated as 2 consecutive 429s. The cascade counter feeds AIMD's reduce-once-per-cascade logic; the second 429 should start a fresh cascade and trigger another concurrency reduction, but currently doesn't. Standalone bug surfaced during #575 investigation; not on the failure path that drives the gate-trip outcome but worth fixing while we're in this code. * fix(custom): preserve retryability through CustomColumnGenerator wrap A real-workload run of #575 showed the early-shutdown gate still trips even with the gate-exclusion fix in place: the trigger is 10 timeouts inside Anonymizer's QA-repair custom columns, all wrapped in CustomColumnGenerationError (non-retryable) by the catch-all in CustomColumnGenerator. Two fixes here: 1. Re-raise RETRYABLE_MODEL_ERRORS unchanged before the wrap so the scheduler's _is_retryable correctly classifies them. 2. Surface _AsyncBridgedModelFacade timeouts as ModelTimeoutError instead of stdlib TimeoutError. Without this the sync bridge times out as the wrong exception type and is still classified non-retryable even after fix #1. Also moves _RETRYABLE_MODEL_ERRORS from async_scheduler to models/errors as the public RETRYABLE_MODEL_ERRORS tuple - both the scheduler and the wrap site need it, and models/errors is the appropriate home alongside the error class definitions. Refs #575. * feat(interface): typed DataDesignerEarlyShutdownError on zero-record runs When the async scheduler hits early shutdown and produces zero records, the buffer manager skips writing parquet (correctly), so ArtifactStorage.load_dataset_with_dropped_columns() raises FileNotFoundError. Previously this surfaced as a generic DataDesignerGenerationError wrapping the FileNotFoundError, which is ambiguous (could be missing files for any reason). This commit: - Adds DataDesignerEarlyShutdownError as a subclass of DataDesignerGenerationError so existing handlers still match while callers that want to react programmatically (retry on different alias, surface a degraded-provider message, etc.) can catch the specific type. - Plumbs the scheduler's structured signals (early_shutdown, partial_row_groups) up through the builder so they're available at data_designer.create() time without re-introspecting the scheduler. - create() raises the typed error in both failure modes (load fails or empty DataFrame returned) when builder.early_shutdown is True. Refs #575. * fix(async): emit first degraded-provider WARN regardless of clock state Initialize _last_degraded_warn_at to -inf so the first WARN is always emitted. The previous initialization to 0.0 suppressed the first WARN on fresh CI runners where time.monotonic() returns a small value (system boot uptime), making the throttle interval check (now - 0.0 < interval) true on the first attempt. * fix(async): address review findings on early-shutdown salvage PR Five real correctness issues caught in review of the original PR, plus a few smaller cleanups and test simplifications. Throttle - cascade reset (regression of existing AIMD invariant): release_failure() now resets consecutive_429s only when in_flight == 0. Resetting unconditionally broke "reduce once per cascade" when 429/500/429 arrived interleaved within a single in-flight burst - the second 429 was treated as a new cascade and the limit got halved twice for what was effectively one rate-limit event. Interface - typed-error gating: DataDesignerEarlyShutdownError now fires only when early_shutdown is true AND actual_num_records == 0. Without this, a partial-salvage run that fails to load for unrelated reasons (corrupt parquet, schema drift, disk hiccup) was misdiagnosed as "zero records produced," hiding the real cause. Async - WARN window scope: the degraded-provider warning was fed by every task outcome, including samplers and non-LLM customs. In realistic pipelines (one model column, several non-model columns) the rate stayed under threshold even when every model call was failing, silencing the WARN exactly when it mattered. Now gated on is_llm. Async/builder - signal preservation across raises: scheduler.early_shutdown and partial_row_groups are captured in a try/finally around future.result(), so a processor failure during the salvage path doesn't drop the structured signal. Both build() and build_preview() now reset per-run state at the start so reused builders don't leak prior-run flags. Async - dead code: dispatch_error capture in run() was unread (the post- finally check is unreachable on the exception path). Removed. Smaller cleanups: - early-shutdown WARN says "non-retryable error rate exceeded threshold" - bridge timeout WARN demoted to debug (ModelTimeoutError already surfaces it; the throttled degraded-provider WARN is the user-facing signal) - TODO note for threading degraded_warn_* through RunConfig - doc note in _finalize_after_shutdown clarifying that pre-batch processor isn't re-run on partial-salvage row groups Tests: - new regression tests for the cascade burst case, partial-salvage error gating, and LLM-only WARN window - direct unit test for _reset_run_state - dedup via _make_storage / _seed_plus_cell_setup helpers - WARN emission cases parametrized into a single test - shared parametrize lists hoisted to module-level constants - redundant cascade test dropped in favor of the more thorough drain variant; redundant healthy-baseline test folded into the zero-survivor test * chore(async): address Nabin's review comments Style cleanups, parametrization, docstring polish, and one consistency fix in the typed-error path. All non-blocking ("Ship it (with nits)"). interface/data_designer.py: - preview() now raises DataDesignerEarlyShutdownError when shutdown produced zero records (parity with create()), and also gates on actual_num_records == 0 so partial-salvage runs that fail to load don't get misdiagnosed - create()'s defensive empty-DF guard mirrors the load-failure guard with the same actual_num_records == 0 check async_scheduler.py: - _record_retryable_outcome docstring clarifies that the call site filters by is_llm; the function alone reads as if every outcome feeds the window dataset_builder.py: - moved _reset_run_state() down past the public methods to match the project's public-before-private convention test_custom.py: - flattened TestAsyncBridgedModelFacade class into module-level test functions (matches the rest of the file) - hoisted inline imports (asyncio, threading, patch, _AsyncBridgedModelFacade, SyncClientUnavailableError) to top of file - driven retryable-error parametrize off RETRYABLE_MODEL_ERRORS directly instead of the hand-rolled factory list, so new retryable types pick up coverage automatically - dropped the redundant "Sanity" block in test_async_bridge_timeout_raises_ model_timeout_error - pytest.raises already enforces the type, the duplicate block was running the same slow scenario twice test_async_scheduler.py: - parametrize over RETRYABLE_MODEL_ERRORS directly (same as above) test_data_designer.py: - added preview-path tests for the typed-error and partial-salvage fall-through cases - updated the existing empty-DF test to also patch actual_num_records=0 (otherwise the new gating in the empty-DF guard skips the typed error) * test(interface): consolidate create() error-dispatch tests into a matrix Five separate tests (two existing, three new from earlier in this PR) all probed the same dispatch logic in create(): "given a load outcome and a builder state, which error type should fire?" Pulled them into a single parametrized matrix indexed by (load_side_effect, early_shutdown, actual_num_records). Net result: 5 named tests → 1 parametrized test with 6 cells, and the previously-missing empty_df + shutdown + partial salvage cell is now covered. Test names retain readable IDs (load_fails_shutdown_zero_records etc.) so failures still pinpoint the exact case in pytest output. |
||
|
|
05c2e8df2e
|
fix: normalize image_url blocks to OpenAI-compliant dict format (#577)
* fix: normalize image_url blocks to OpenAI-compliant dict format (#576) ImageContext.get_contexts() produced bare-string and non-standard dict shapes for image_url content blocks, which broke the native OpenAI adapter (passes blocks through as-is) and only worked with Anthropic by accident via defensive handling in the translation layer. - Wrap all image_url values in {"url": ...} dict (OpenAI spec) - Remove non-standard "format" key from base64 dicts - Tighten Anthropic translate_image_url_block to require dict input Fixes #576 Made-with: Cursor * fix: reject malformed image_url blocks instead of silently dropping them translate_image_url_block now raises TypeError when image_url is not a dict. Since all image_url blocks are constructed internally, a bare string indicates an internal bug and should fail loudly. Made-with: Cursor * address review: tighten return type, add OpenAI + data-URI tests - Narrow _auto_resolve_context_value return type to dict[str, str] - Add OpenAI-client regression tests for image_url dict passthrough - Cover both bare-URL and bare-data-URI rejection in Anthropic tests Made-with: Cursor |
||
|
|
8be4ff787f
|
feat: add RunConfig jinja rendering engine (#557) | ||
|
|
a965bc1542
|
fix: bridge model.generate() to agenerate() for custom columns in async engine (#545)
* feat: bridge model.generate() to agenerate() for custom columns in async engine Custom column generators that call model.generate() fail under the async engine because the sync HTTP client is unavailable. Add an _AsyncBridgedModelFacade proxy in _build_models_dict() that intercepts the sync-client RuntimeError and schedules agenerate() on the engine's persistent event loop via run_coroutine_threadsafe. Includes a deadlock guard for async custom columns running on the event loop. * refactor: wrap facades at sync call site, not in _build_models_dict Move _AsyncBridgedModelFacade wrapping from _build_models_dict() into _invoke_generator_function() so the async path gets raw facades. The bridge proxy is only needed for sync custom columns; async columns already have direct access to model.agenerate(). * fix: address review feedback - typed exception, timeout cleanup, kwargs test - Introduce SyncClientUnavailableError so the facade catches by type instead of matching error strings (review comment #1) - Add future.cancel() + logger.warning() on timeout to match the _run_coroutine_sync pattern in base.py (review comment #2) - Assert kwargs forwarding in the async bridge test (review comment #4) * fix: let SyncClientUnavailableError propagate through @catch_llm_exceptions The decorator catches all exceptions and wraps them into DataDesignerError, which prevented the async bridge proxy from ever seeing the original error. Add an early match case that re-raises SyncClientUnavailableError directly. * refactor: make SYNC_BRIDGE_TIMEOUT a public constant Drop the underscore prefix since the constant is exported and used across modules (base.py and custom.py). |
||
|
|
28c8345909
|
feat: add built-in filesystem seed readers (#421) | ||
|
|
8fff7c07fe
|
feat: add async generator migration with symmetric bridging and statefulness (#378)
* feat: add async generator migration with symmetric bridging and statefulness - Symmetric generate/agenerate bridging in base ColumnGenerator - is_stateful property; SeedDatasetColumnGenerator declares True - Async wrappers for FromScratchColumnGenerator and ColumnGeneratorFullColumn - Native async paths for ImageCellGenerator and EmbeddingCellGenerator - CustomColumnGenerator.agenerate with full validation parity - Extract _postprocess_result for shared sync/async output validation * fix: avoid blocking caller on sync bridge timeout Use explicit pool lifecycle instead of context manager so that a TimeoutError releases the caller immediately via shutdown(wait=False) rather than blocking on pool.__exit__. * fix: widen agenerate type signature to match generate Add @overload declarations so the base agenerate accepts both dict and pd.DataFrame, mirroring the existing generate pattern. * fix: ensure pool shutdown on sync bridge success path The else clause after return was unreachable, leaking the ThreadPoolExecutor on every successful call. Capture the result first, shut down the pool, then return. * fix: use try/finally for pool shutdown in sync bridge Ensures ThreadPoolExecutor is shut down on all exit paths, including non-TimeoutError exceptions from the coroutine. * refactor: extract shared validation in ImageCellGenerator Move duplicated input validation and prompt rendering into _prepare_image_inputs, shared by generate and agenerate. * refactor: extract shared input prep in EmbeddingCellGenerator * address PR review feedback - add _is_overridden helper for symmetric generate/agenerate guards - move defensive .copy() into base agenerate, remove subclass overrides - re-raise as builtin TimeoutError for Python 3.10 compat - rename is_stateful to is_order_dependent with improved docstring - replace brittle .fget test with object.__new__ - add async tests for ImageCellGenerator and EmbeddingCellGenerator |
||
|
|
88989d1854
|
fix: replace removed DuckDB record_batch() with to_arrow_reader() (#380)
DuckDB 1.5.0 (released 2026-03-09) removed the record_batch() method from the Python Relation API, breaking SeedDatasetColumnGenerator. Migrate to the stable to_arrow_reader() API and bump the minimum DuckDB version to >=1.5.0. Fixes #379 |
||
|
|
8f7a72094a
|
feat: auto-detect ImageContext format for image-to-image generation (#342)
* updates to support image->image * update notebooks * regen colab notebooks * simplify tests |
||
|
|
70dc48884e
|
feat: add allow_resize for 1:N and N:1 generation patterns (#286)
* feat: add allow_resize for 1:N and N:1 generation patterns Adds support for generators that produce a different number of records than the input (expansion or retraction). This addresses GitHub issue #265. Changes: - Add `allow_resize` parameter to `update_records()` in DatasetBatchManager - Add `allow_resize` field to CustomColumnConfig - Add validation requiring FULL_COLUMN strategy when allow_resize=True - Track and report actual_num_records in metadata (may differ from target) - Add logging when batch size changes - Add example_allow_resize.py demonstrating the feature - Add comprehensive tests * docs: add allow_resize to custom columns documentation * refactor: consolidate buffer API and elevate allow_resize to base config - Merge update_records and replace_buffer into a single replace_buffer method with allow_resize parameter on DatasetBatchManager - Move allow_resize field from CustomColumnConfig to SingleColumnConfig so plugins inherit it without needing a mixin - Align example and logging with final CustomColumn API - Parametrize resize tests and extract shared stub in test_columns * test: add chained resize and multi-batch integration tests - Add expand->retract->expand chaining test (single batch) - Add multi-batch resize test verifying combined parquet output - Update example to chain expand/retract/expand with preview+build - Use 💥/✂️ emojis for resize logging (expand/retract) * extend allow_resize to cell-by-cell (return dict or list[dict]) - Config: allow allow_resize with CELL_BY_CELL; relax validator - Custom generator: accept dict | list[dict] when cell_by_cell + allow_resize; validate per row via _validate_cell_output - Builder: collect results by index when cell allow_resize, flatten and replace_buffer; add _log_resize_if_changed and _column_display_name - Docs: ALL_CAPS for strategies, simplify allow_resize table text - Tests: parametrized preview and multibatch; factories with n param; _RESIZE_SPECS with inline factory calls; ids ordered like specs * reorder allow_resize specs and add edge-case tests - Rename specs: full_x3, cell_x2, cell_plus_full_chain; add cell_filter_odd, cell_drop_all to _RESIZE_SPECS - Stubs before specs: _resize_full_keep_first, _resize_cell_expand, _resize_cell_filter_odd, _resize_cell_drop_all; drop cell factories - Remove FULL/CELL constants; use GenerationStrategy.* in _RESIZE_SPECS - Preview/multibatch parametrize: _preview and _multibatch ids; two full_x3 multibatch cases (5_2, 4_2) first - Handle all-batches-skipped in multibatch test (empty df when path missing) - test_custom: add test_cell_by_cell_allow_resize_return_list_single (1:1 via list) * tidy allow_resize: drop validator, shared stub, explicit flag - Remove validate_allow_resize_requires_full_column from CustomColumnConfig - Rename StubColumnConfigWithoutEmoji to StubColumnConfig in test_columns - Pass allow_resize=False in _write_processed_batch replace_buffer call * fix: add missing f prefix to error message in custom.py * docs(plugins): add section on setting allow_resize=True for resize plugins * fix: address PR review comments on allow_resize - Replace getattr with direct attribute access where config is always SingleColumnConfig (custom.py, cell-by-cell path in builder) - Keep getattr in _run_full_column_generator which also handles multi-column configs without allow_resize - Restructure allow_resize validation branching in CustomColumnGenerator - Fix error message wording: "key" -> "column" * fix: remove duplicate tool_alias log, fix test docstring - Remove tool_alias log from _setup_fan_out (callers already log it) - Fix docstring: CELL_BY_CELL -> FULL_COLUMN in resize test factory * fix: avoid duplicate undeclared-column warning in _validate_output Inline the strip instead of delegating to _validate_cell_output, which would log the same warning a second time. * fix: use lazy.pd instead of pd for runtime pandas usage in tests The pd import is under TYPE_CHECKING, so runtime calls need lazy.pd. |
||
|
|
1439bbea7e
|
chore: Improve CLI startup with lazy heavy import cleanup (#330)
* perf: defer heavy imports to improve CLI startup time Move expensive imports (engine, models, controllers) out of the module-level import path so that data-designer --help and other non-generation commands no longer pay the full startup cost. Key changes: - Defer controller imports to inside command functions - Remove eager re-export chains from CLI package __init__ files - Move default-settings bootstrap into load_config_builder() and DataDesigner.__init__() instead of running at import time - Add lazy __getattr__ exports in interface/__init__.py - Replace module-level tokenizer init with cached lazy getter - Fix ModelProvider import to use config layer instead of engine - Update test mock paths to match new import locations Reduces CLI import-time from ~1.67s to ~0.46s. * perf: defer pandas/numpy in io_helpers and add config_list benchmark - Replace eager `from lazy_heavy_imports import pd, np` in io_helpers with module-level __getattr__ (for backwards-compatible external access / test mocks) and function-level imports in the 3 functions that actually use them (read_parquet_dataset, smart_load_dataframe, _convert_to_serializable). Importing io_helpers no longer triggers pandas/numpy loading. - Defer heavy imports in list and reset CLI commands into function bodies to avoid loading repositories, Rich, and prompt_toolkit at module import time. - Add `config_list` (data-designer config list) measurement to the CLI startup benchmark with isolated cold measurement in a separate venv and a --skip-config-list-check flag. - Update test mock paths to match new import locations. * Refine lazy import usage and TYPE_CHECKING cleanup * Run license header updater on PR-touched files * fix: update sqlfluff mock target for lazy imports in test_sql * perf: cache globals() in lazy __getattr__ to avoid repeated lookups Add globals() caching and explanatory comment to all three lazy __getattr__ implementations (lazy_heavy_imports, config/__init__, interface/__init__) so subsequent attribute accesses bypass __getattr__. * perf: lazy CLI command loading and deferred heavy import evaluations - Add LazyTyperGroup to defer command module loading until invocation, allowing module-level imports in all CLI command files - Split DataFrameSeedSource into seed_source_dataframe.py to isolate pandas dependency from other seed source classes - Move TypeVar/TypeAlias definitions (DataT, NumpyArray1dT, RadomStateT, EngineT) to TYPE_CHECKING blocks with runtime fallbacks - Wrap module-level constants in lru_cache (phone_number parquet data, jsonschema validator) to defer I/O and heavy imports to first use - Update test mock targets to patch at usage-site for module-level imports * refactor: use direct pandas import in seed_source_dataframe Drop lazy-loading for pandas in DataFrameSeedSource; use direct import for simplicity. * update lazy import pattern * update tests to use lazy import namespace Switch test modules to import data_designer.lazy_heavy_imports as lazy and reference heavy libraries through that namespace. This keeps heavy imports deferred during module import and aligns tests with the new lazy-import usage pattern. * tighten import perf test thresholds Document recent baseline timings and lower the allowed average import time and timeout so regressions are detected sooner. * document pandas import requirement Clarify that Pydantic needs DataFrame resolved at module load and that keeping the direct import preserves IDE typing support. * increase timeout time * use lazy pandas imports in visualization tests - replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports - add TYPE_CHECKING pandas import and keep CLI controller imports sorted * fix lazy pandas runtime usage and preview mocks Switch sample-record handling to lazy pandas types so runtime paths no longer depend on TYPE_CHECKING imports. Align preview controller tests to patch the module-local DataDesigner symbol, preventing real engine invocation in save results scenarios. |
||
|
|
8e2fd3286f
|
feat: add image generation support with multi-modal context (#317) | ||
|
|
f8b7c905e8
|
fix: include CUSTOM type in execution DAG and warn on generator errors (#324)
* fix: include CUSTOM type in execution DAG classification Custom columns have required_columns and side_effect_columns but were excluded from the DAG, causing incorrect execution order when they depend on or are depended upon by other columns. Co-authored-by: Lipika Ramaswamy <lramaswamy@nvidia.com> * add warning when custom generator function fails Log a warning in cell-by-cell mode so users know the record will be skipped. In full-column mode the error message is already descriptive enough via the DatasetGenerationError chain. Co-authored-by: Lipika Ramaswamy <lramaswamy@nvidia.com> --------- Co-authored-by: Lipika Ramaswamy <lramaswamy@nvidia.com> |
||
|
|
f74f25872c
|
chore: quiet tool call logs and add tool usage statistics (#293)
* add tool usage statistics tracking - Add ToolUsageStats class with metrics for tool calls, turns, and statistical aggregates (mean/stddev per generation) - Extend ModelUsageStats to include tool_usage tracking - Update ModelFacade.generate() to track total tool calls and turns - Update tests with tool_call_count method and new assertions * silence noisy mcp logs * log message updates * add tools enabled info message * exclude empty tool_usage from usage stats output * add tool usage summary logging after column generation - Track tool usage snapshots before/after column processing - Log mean tool calls per generation for columns with tools enabled - Add get_tool_usage_snapshot/get_tool_usage_delta methods to ModelRegistry - Remove unused extra_info parameter from progress_tracker.log_start() - Add comprehensive tests for ToolUsageStats * pretty format model usage logs * reuse stubs and fixtures * add merge method to ToolUsageStats for accurate stats aggregation The previous implementation used extend() to combine tool usage stats, but extend() is designed for single generation data. This caused incorrect stddev calculations when merging stats from multiple sources. - Add ToolUsageStats.merge() that properly combines sum-of-squares - Update ModelUsageStats.extend() to use merge() for tool usage - Add tests verifying stddev accuracy after merging * fix tool usage stats missing generations_with_tools count When tracking tool usage after generation, the ToolUsageStats was created without setting generations_with_tools, causing the usage summary to report zeros for calls/gen and turns/gen metrics. * fix tool usage delta objects returning incorrect stddev values - Simplify facade API to use tool_usage.extend() directly - Return NaN for stddev when sum of squares wasn't tracked - Add docstring to get_tool_usage_delta explaining NaN behavior - Add comprehensive tests for stddev variance calculation * fix tool usage delta stddev by including sum of squares in deltas Convert sum_of_squares_turns and sum_of_squares_calls from private attributes to public fields, enabling them to be included in delta calculations. This allows get_tool_usage_delta to return objects that compute accurate stddev values instead of NaN. * fix test to use get_tool_usage_snapshot for accurate stddev tracking The test was manually constructing a ToolUsageStats snapshot without sum_of_squares fields, causing stddev to be NaN. Now uses the proper snapshot method that includes all fields needed for delta calculations. * use nvidia-reasoning by default * mean -> average in log message * refactor log indentation to use centralized LOG_INDENT constant - Add LOG_INDENT constant to logging.py for consistent indentation - Replace hardcoded " |-- " strings across all log statements - Add tool alias and MCP provider info to pre-generation logs - Improve model usage log format for better consistency - Update tests to match new log formats * simplify usage stats dict access in model registry Remove defensive .get() calls and unnecessary type casts since the usage statistics dictionary structure is now guaranteed. * walrus baby * simplify tool usage tracking and reduce log verbosity - Remove mean/stddev calculations from ToolUsageStats in favor of simple counts and generation ratios - Add total_generations field to track all tool-enabled generations - Simplify registry log output to show generations ratio (with_tools/total) - Remove per-column tool usage snapshot/delta logging from column builder - Track tool usage for all tool-enabled generations, not just those with calls * format inference parameters as multi-line log output - Add get_formatted_params() method to BaseInferenceParams - Add LOG_DOUBLE_INDENT constant for nested indentation - Update log_pre_generation() to display each parameter on its own line * update tests to use LOG_INDENT constants Align test assertions with the centralized log indentation constants introduced in the logging module refactor. * two-space consistency |
||
|
|
62bae42dc2
|
feat: Add CustomColumnGenerator for user-defined column generation (#254)
* first attempt * iterating a bit * some improvements + multiturn example * adapting to new monorepo structure * refining * fixed test * fixing license headers * adding docs * adding test for failed generation * allowing strategy to be picked * renaming argument * lint * remove recommendation * renaming for consistency * addressing comments pt1 * addressing comments pt2 * addressing comments pt3 * adding a mock for development * addressing greptile comments * revamping * docs: streamline custom columns documentation * docs: simplify CustomColumnConfig docstring Remove verbose code example and detailed function signatures from docstring to match the pattern of other config classes in the file. * test: clean up custom column tests - Remove tests for private _custom_column_metadata attribute - Combine redundant generator creation tests - Reuse stub_resource_provider and stub_model_facade fixtures * test: consolidate custom column tests Reduce from 26 to 11 tests while maintaining coverage: - Combine redundant config/decorator/creation tests - Use parametrized tests for error conditions - Remove duplicate validation tests for full_column strategy - Simplify section headers * refactor: deduplicate CustomColumnGenerator logic Merge cell-by-cell and full-column code paths: - _generate_cell_by_cell + _generate_full_column -> _generate - _validate_output_columns + _validate_output_columns_df -> _validate_output * chore: merge example files into single notebook-style example.py Combine example.py, example_multiturn.py, and example_benchmark_strategies.py into a single file with #%% cell markers for Jupyter/VS Code notebook mode. * addressing greptile comments * refactor: reuse generate_text in generate_text_batch * refactor: replace CustomColumnContext with models dict - Remove CustomColumnContext class; users now receive models dict directly - Add DataDesigner.get_models() for experimentation outside pipeline - Make parser optional in ModelFacade.generate() (defaults to identity) - Validate parameter names: row/df, generator_params, models - Update examples, tests, and docs for new API * fix: address PR review comments from Nabin and greptile - Make decorator metadata public (custom_column_metadata) - Simplify get_generation_strategy() to directly return config value - Use !r formatting in error messages - Use lazy imports pattern for pandas (TYPE_CHECKING + lazy_heavy_imports) - Remove redundant error logging before re-raise - Validate max 3 positional parameters - Use GenerationStrategy enum in example instead of string * fix: replace lambda with module-level identity function in facade Use pickleable _identity function instead of lambda x: x for the default parser argument, ensuring compatibility with multiprocessing. * fix: restore inherited attributes in LLM column docstrings Restores the "Inherited Attributes" sections that were unintentionally removed from LLMCodeColumnConfig, LLMStructuredColumnConfig, and LLMJudgeColumnConfig docstrings. * docs: clarify model_aliases is required for LLM access Updated documentation and docstrings to clarify that model_aliases populates the models dict (not just health checks). * fix: address PR review comments from nabinchha - clarify model_aliases requirement in docs - add note about model alias validation during health check - combine two loops into one in _run_model_health_check_if_needed - add signature validation at decoration time - enforce decorated functions in CustomColumnConfig validator - simplify generator to only validate strategy-specific first param * fix: address remaining PR review comments - remove example.py (development artifact) - fix get_models return type to dict[str, ModelFacade] * test: update tests for decoration-time validation - expect ValidationError instead of InvalidConfigError for non-callable - split param validation test into decoration-time and runtime tests |
||
|
|
5430bcbe99
|
Remove debug_trace_override (#290)
|
||
|
|
532d21a8d7
|
feat: add extract_reasoning_content option to LLM columns (#285) | ||
|
|
510761107b
|
feat: Add TraceType enum for granular trace control (#284) | ||
|
|
7248b9fc8f
|
Update trace normalization to ChatML content blocks (#283) | ||
|
|
e6e58e692e
|
feat: MCP (Model Context Protocol) tool calling integration for LLM columns (#248) | ||
|
|
0d51539aa6
|
feat: add message trace support for LLM generation (#272)
Add support for capturing full conversation traces during LLM generation, enabling debugging and fine-tuning dataset creation. Changes: - Add `with_trace` field to LLMTextColumnConfig for per-column trace control - Add `debug_override_save_all_column_traces` to RunConfig for global trace - Introduce ChatMessage dataclass for structured message representation - Update ModelFacade.generate() to return full message trace - Rename trace column postfix from `__reasoning_trace` to `__trace` - Add comprehensive traces documentation Traces capture system/user/assistant messages in order, enabling visibility into the full generation conversation including correction retries. |
||
|
|
c19f35639f
|
chore: add publish script and update license headers (#253) | ||
|
|
ae0665fa16
|
refactor: slim package refactor into three subpackages (#240)
* remove old structure * major shuffle * streamline project configs * update make commands * updates to make commands * remove essentials * initialize logger in interface * uv lock * ignore notepad * update workflows * fix e2e project config * generate colab notebooks * resolve default model settings in interface * fix build commands * update perf import make command * cleaning up some slop * update recipes * move conftest files to tests/ * update subpackage readmes * streamline config_logging * use exports * update perf import usage pattern * update for IDE behavior with ruff * remove engine's fixtures file * add note to about lazy imports * update dependencies * update docs * doc fixes * uv lock * updates to catch up with main * clean up makefile * remove package gitignores * define deps only once * isolate tests * add test for protetion rule * create temp dirs for isolated tests * catch up to main * update headers * re apply changes * better result summaries for isolated tests * move exports into top-level init * fix client importlib version syntax * catch up with main |