mirror of
https://github.com/NVIDIA-NeMo/DataDesigner
synced 2026-05-24 09:48:29 +00:00
37 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
de1c2d7117 | feat: mark request feedback in throughput chart | ||
|
|
1cf84d53a7 |
fix: show token rates in progress demo
Emit synthetic token usage in the credential-free progress panel demo so the live token-rate columns are visible, and accept output-only provider usage as a progress event. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com> |
||
|
|
61125a02d0 |
fix: align progress panel metrics
Render the progress legend as a stable table with live token-rate columns, attribute model usage to active generation columns across async bridge boundaries, and cancel the async scheduler cleanly on KeyboardInterrupt. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com> |
||
|
|
c0a4dcbb85
|
feat: implement async scheduling admission control (#661)
Some checks are pending
CI / End to end test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test Config (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Config (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Config (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Coverage Check (Python 3.11) (push) Blocked by required conditions
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
|
||
|
|
a83968f701
|
feat: preserve multimodal MCP tool results (#689)
Some checks are pending
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Coverage Check (Python 3.11) (push) Blocked by required conditions
CI / End to end test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Lint and Format Check (push) Blocked by required conditions
CI / Check License Headers (push) Blocked by required conditions
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
* feat: preserve multimodal MCP tool results Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * fix: gate MCP generic image payload detection Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * fix: validate MCP image payload blocks Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> --------- Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> |
||
|
|
0860d62e23
|
fix: restore chat completion multi-choice support (#672)
* fix chat completion multi-choice support Restore the chat completion n request field and preserve all returned choices in the canonical response while keeping response.message as the first choice. Add coverage for request forwarding, compatibility access, multi-choice parsing, and generate forwarding. Fixes #620 Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * strip n from generate requests Prevent generate and agenerate from forwarding multi-choice requests that they cannot expose, while keeping completion() multi-choice support intact. Add coverage for async parsing and Anthropic n exclusion. Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * strip configured n from generate requests Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * rename multiple choice completion flag Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * move choice sanitizer to private helpers Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> * order private facade helpers Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> --------- Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com> |
||
|
|
71997624b3
|
feat: track reasoning token usage (#670)
Some checks are pending
CI / Test Engine (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Coverage Check (Python 3.11) (push) Waiting to run
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Config (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Config (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Config (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Config (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Config (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Config (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Config (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run
* feat: track reasoning token usage Capture provider-reported reasoning-token breakdowns alongside output tokens without changing output token totals. Carry the field through model usage aggregation and add coverage for parsing, facade tracking, and deltas. Refs #665 * fix: show reasoning tokens in usage summary Include reasoning token counts in the local model usage summary while preserving output and total token semantics. Telemetry remains unchanged. Refs #665 * fix: estimate missing reasoning token counts When providers return reasoning content without a numeric usage breakdown, estimate reasoning tokens from that content while preserving provider-reported output and total token counts. Refs #665 * fix: track reasoning token count source * fix: simplify reasoning token source * fix: omit unknown reasoning tokens from logs * refactor: clarify reasoning token count helpers * test: move token counting tests * fix: enforce reasoning token source * fix: address reasoning usage review |
||
|
|
a4085c441a
|
feat: add AIMD startup ramp (#638)
Some checks failed
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.10 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.11 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.12 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.13 on macos-latest) (push) Has been cancelled
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.10 on macos-latest) (push) Has been cancelled
CI / Coverage Check (Python 3.11) (push) Has been cancelled
CI / End to end test (Python 3.11 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.12 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.13 on macos-latest) (push) Has been cancelled
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Lint and Format Check (push) Has been cancelled
CI / Check License Headers (push) Has been cancelled
Publish devnotes / deploy (push) Has been cancelled
Publish Fern devnotes / deploy (push) Has been cancelled
CI / Test (Python 3.10 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.12 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.13 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.10 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.12 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.13 on ubuntu-latest) (push) Has been cancelled
CI / Test (Python 3.11 on macos-latest) (push) Has been cancelled
CI / Test (Python 3.11 on ubuntu-latest) (push) Has been cancelled
|
||
|
|
61cdeefb17
|
feat: make async engine the default execution path (#592)
Some checks failed
CI / Test Config (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Engine (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Coverage Check (Python 3.11) (push) Waiting to run
CI / End to end test (Python 3.10 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.11 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
Publish devnotes / deploy (push) Has been cancelled
* feat: make async engine the default execution path
The async engine has been hardening as opt-in for several releases. Make it
the default and address the prerequisites flagged for the flip.
Default flip
- DATA_DESIGNER_ASYNC_ENGINE defaults to "1" at both consumption sites
- Set DATA_DESIGNER_ASYNC_ENGINE=0 for one transitional release to opt out
- allow_resize=True still falls back to sync with a DeprecationWarning
Python 3.10 support
- Replace asyncio.TaskGroup (3.11+) in async_concurrency.py with
gather-with-explicit-cancel; semantics preserved because _run_task already
swallows its own exceptions and uses _shutdown_event for sibling cancellation
- Remove the sys.version_info < (3, 11) runtime guard
- Remove the matching pytest skipif so the executor tests run on 3.10 too
Derived timeouts (replaces two hardcoded 300s constants)
- ThrottleManager.acquire_sync/async default to timeout=None (no deadline)
instead of DEFAULT_ACQUIRE_TIMEOUT=300; HTTP request timeout already bounds
actual work, queue waits scale with provider speed and AIMD
- _AsyncBridgedModelFacade derives the sync->async bridge timeout from the
model's inference_parameters.timeout and the call's max_correction_steps;
one knob (per-model timeout) drives both deadlines, no new config surface
- Add ModelFacade.request_timeout property so the bridge can read the
effective timeout the client is configured with
Root-cause surfacing
- AsyncTaskScheduler captures the first non-retryable error and exposes it
via first_non_retryable_error
- Interface threads it through DataDesignerGenerationError when 0 records
are produced without early-shutdown, so deterministic failures (e.g. bad
seed sources) surface their original message instead of a wrapped
FileNotFoundError on the parquet path
Tests
- New: throttle no-deadline default behavior (sync+async), parametrized
derived bridge timeout, restored async_concurrency tests on 3.10
- Updated: test_dataset_builder.py uses an autouse fixture to pin its
Mock-based tests to the sync engine they cover; existing bridge tests
set facade.request_timeout for the new derivation
Docs
- Replace the stale LiteLLM security notice in README with a short
async-default heads-up and link to the migration guide
- Add docs/migration-async-default.md covering per-model timeouts,
custom-column thread safety, mocking model calls, run outcomes, and
the opt-out
- Append a short Update section to the async-all-the-way-down dev note
* test: extract _compute_bridge_timeout helper for direct testing
The parametrized bridge-timeout test was patching ``concurrent.futures.Future.result``
to capture the timeout the bridge passed in. That reaches into stdlib internals
(DEVELOPMENT.md "Mock at boundaries: Keep mocking shallow") and the ``ids=`` argument
on the parametrize was missing.
Extracts the formula into a module-level ``_compute_bridge_timeout`` helper. The test
now calls the helper directly with no mocking, and the parametrize gets readable ids.
Behavior is unchanged.
* test(e2e): align demo plugins with async engine contracts
The e2e demo plugins exercise plugin discovery and full DD lifecycle. Two
of them were written against sync-engine semantics that the async engine
restricts:
- DemoColumnGeneratorImpl was a ColumnGeneratorFullColumn with no
required_columns. The async engine routes ``no-upstream`` columns
through the from-scratch path, which passes an empty DataFrame to
generators that aren't FromScratchColumnGenerator subclasses. The
generator then produces 0 rows and the scheduler raises
``update_batch received 0 values``. Switching the plugin to
FromScratchColumnGenerator with generate_from_scratch(num_records)
matches what the plugin actually does (produces a constant column
without input) and works on both engines.
- RegexFilterProcessor implemented process_before_batch with row-count
changes. The async engine enforces row-count invariance in pre- and
post-batch processor stages by design. Moving the filter to
process_after_generation preserves the plugin's purpose (regex-based
row filtering) at a stage that supports row-count changes on both
engines. Test assertions check the final dataset, so the stage shift
is transparent.
Both changes are demo-plugin updates only; no production code change.
* fix: address Codex review findings on async-default flip
Three bugs and two test-quality concerns surfaced by an independent review of
the prior commits. Each was real and worth fixing in the flip PR.
Bug fixes
- Sync-fallback path was creating async-only model clients. The default flip
meant ``client_concurrency_mode = ASYNC`` for every default run, but the
``allow_resize=True`` path falls back to the sync engine — sync ``model.generate()``
calls then hit ``SyncClientUnavailableError``. The resolution decision now
lives at the DataDesigner interface level via
``_resolve_client_concurrency_mode``: it considers both the env var and the
config (allow_resize forces sync clients) and is passed explicitly to
``create_resource_provider``. Direct callers of the factory still get the
env-var default.
- Sync→async bridge timeout ignored the per-call ``timeout=`` override. A
custom column calling ``model.generate(timeout=600)`` against a slow endpoint
was being cancelled at the model-config default, not 600s. The bridge now
prefers ``kwargs.get("timeout")`` over ``facade.request_timeout``.
- Bridge timeout formula missed ``max_conversation_restarts``. One logical
generation can do ``(1 + max_conversation_restarts) × (1 + max_correction_steps)``
HTTP requests; the formula now multiplies both, matching the worst-case
attempt budget.
Engine routing fix (also surfaced by failing e2e plugin tests)
- ``_run_from_scratch`` else-branch passed an empty DataFrame to non-FromScratch
generators classified as seeds (no upstream columns), so ``ColumnGeneratorFullColumn``
with no required_columns produced 0 rows for an ``rg_size``-row buffer. Now
passes an ``rg_size``-row snapshot of the row-group buffer, mirroring the
sync engine's FULL_COLUMN contract.
- The earlier ``DemoColumnGeneratorImpl`` workaround (rewrite as ``FromScratchColumnGenerator``)
is reverted; the engine fix subsumes it. The processor-plugin fix
(``process_after_generation`` for the regex filter) stays — pre-batch
row-count change is intentionally rejected by the async engine.
Test improvements
- Throttle no-deadline tests are parametrized over ``(timeout=0.0, raises)``
and ``(timeout=None, waits)``, pinning that ``None`` is genuinely distinct
from any finite default. Sync and async counterparts mirror.
- New regression tests for ``first_non_retryable_error`` surfacing covering
both load-raises and load-returns-empty paths, asserting the original
exception is chained via ``__cause__`` and that the typed
``DataDesignerEarlyShutdownError`` doesn't fire in this branch.
- New parametrized regression test for ``_resolve_client_concurrency_mode``
covering all four (env × allow_resize) combinations.
- New parametrized test for the per-call ``timeout=`` override flowing into
the bridge timeout calculation.
- Bridge formula tests extended with ``max_conversation_restarts`` cases.
* test: trim redundant parametrize cases in async-default tests
Three parametrize cases were duplicating coverage already provided by
existing standalone tests:
- ``test_acquire_*_timeout_branches`` parametrized over ``(0.0, raises)``
and ``(None, waits)``. The ``raises`` half duplicates
``test_acquire_*_raises_timeout_when_at_capacity``. Replaced with two
focused ``..._default_no_deadline_waits_for_release`` tests covering
only the no-deadline branch.
- ``test_resolve_client_concurrency_mode_matches_engine_choice`` had four
cases. The ``async-off + allow-resize`` case asserts ``SYNC`` because the
env var alone forces it; the allow_resize input is moot. Dropped.
- ``test_async_bridge_honors_per_call_timeout`` had three cases. The
"override below floor" case cross-products the per-call override flow
with the floor-clamping behavior already covered by
``test_compute_bridge_timeout``. Dropped.
Net: -25 lines of test code with no loss of essential coverage.
* docs: fold migration page into existing concept docs
The standalone ``Migrating to the async default`` page didn't fit the
existing docs style — present tense, behavior over comparisons, content
in the natural concept home. Folding it in:
- ``architecture-and-performance.md`` gets a new ``Async Engine`` section
covering per-model timeouts, run outcomes (partial completion +
``DataDesignerEarlyShutdownError``), and the transitional opt-out.
Three stale ``async engine is landing soon`` callouts updated to
reflect the flip.
- ``custom_columns.md`` gets two short notes: a thread-safety callout
near Generation Strategies, and a mocking-with-spec note in
Development Testing.
- ``async-all-the-way-down.md`` Update section now points at the new
arch-and-perf section.
- README heads-up links to the same anchor.
- ``migration-async-default.md`` removed; mkdocs.yml entry dropped.
* docs: frame Execution Model as sync-engine specifics
Small targeted edits to make the user-facing concept docs consistent
with the post-flip state. No restructuring.
- ``architecture-and-performance.md``: the ``Execution Model`` callout
now opens with two engines, links to the new ``Async Engine`` section,
and frames the existing column-at-a-time description as sync-engine
semantics. The ``Step 2: Process columns sequentially`` paragraph notes
the async engine relaxes this. The ``Key Concepts`` table differentiates
per-engine for ``Batching`` and ``Sequential columns``; ``Parallel cells``
is the same on both.
- ``processors.md``: added a warning callout about the async engine's
row-count invariance in pre- and post-batch stages, with the guidance
to use ``process_after_generation()`` for row-filtering or expansion.
* fix: address review nits from PR #592 (Nabin)
Four targeted fixes from the review.
Worth-addressing (warning):
- ``test_acquire_async_default_no_deadline_waits_for_release`` was
spawning the release task without holding a strong reference. The
loop's weak-ref bookkeeping could GC it before the inner ``await``
observes the release, producing a CI flake. Hold the task and
``await`` it in ``finally``.
Take-it-or-leave-it (applied):
- Root-cause error surfacing now includes the exception type name:
``f"🛑 {type(root_cause).__name__}: {root_cause}"`` so users see
``ValueError: ...`` instead of just the message string. The
``__cause__`` chain is preserved either way.
- Drop the defensive ``getattr(c, "allow_resize", False)`` in
``_resolve_client_concurrency_mode`` — every member of
``ColumnConfigT`` inherits ``allow_resize: bool = False`` from
``SingleColumnConfig``.
- One-line comment near the root-cause surfacing branch noting that
``actual_num_records == 0`` is async-only (sync runs leave it at
``-1``), so the branch is async-only by construction.
Not addressed in this PR (filing as follow-ups):
- ``SYNC_BRIDGE_TIMEOUT = 300`` still hardcoded in
``column_generators/generators/base.py:_run_coroutine_sync``. That
bridge has no model-facade context to derive a timeout from, so the
fix is a structural refactor outside this PR's scope.
- First-error capture loses subsequent-error context. The "first wins"
heuristic is documented; richer aggregation is a follow-up.
* fix: drop SYNC_BRIDGE_TIMEOUT in _run_coroutine_sync
This was the third hardcoded 300s timeout (Nabin flagged it on PR #592).
The path is the generic sync→async bridge in ``ColumnGenerator.generate()``:
when a subclass overrides only ``agenerate()``, the sync entry point runs
the coroutine in a background thread.
Same philosophy we applied to the throttle queue wait elsewhere in the
PR: a defensive deadline on top of work that's already bounded by the
HTTP timeout doesn't add safety, it just produces spurious failures on
slow self-hosted endpoints. Drop the constant, the timeout exception
handling, and the ``timed_out`` bookkeeping. ``pool.shutdown(wait=True)``
becomes the simple cleanup.
Tests in ``test_async_generators.py`` exercise the happy path only and
don't depend on the timeout firing.
* Revert "fix: drop SYNC_BRIDGE_TIMEOUT in _run_coroutine_sync"
This reverts commit
|
||
|
|
47c72b3d87
|
fix(async): pack of fixes for async engine under degraded providers (#585)
Some checks are pending
CI / Test Engine (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / Test Engine (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.10 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on macos-latest) (push) Waiting to run
CI / Test Interface (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Coverage Check (Python 3.11) (push) Waiting to run
CI / End to end test (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.13 on ubuntu-latest) (push) Waiting to run
CI / Test (Python 3.11 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.10 on ubuntu-latest) (push) Blocked by required conditions
CI / Test Interface (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.11 on ubuntu-latest) (push) Waiting to run
CI / Test Interface (Python 3.12 on ubuntu-latest) (push) Waiting to run
CI / End to end test (Python 3.10 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.11 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.12 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.13 on macos-latest) (push) Waiting to run
CI / End to end test (Python 3.10 on ubuntu-latest) (push) Waiting to run
CI / Lint and Format Check (push) Waiting to run
CI / Check License Headers (push) Waiting to run
CI / Test (Python 3.10 on macos-latest) (push) Blocked by required conditions
CI / Test (Python 3.11 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.12 on ubuntu-latest) (push) Blocked by required conditions
CI / Test (Python 3.13 on ubuntu-latest) (push) Blocked by required conditions
* fix(async): exclude all retryable errors from early-shutdown gate The gate previously only excluded `ModelRateLimitError`, leaving `ModelTimeoutError`, `ModelInternalServerError`, and `ModelAPIConnectionError` to count toward the sliding-window error rate. Under provider degradation these errors cluster in time (concurrent in-flight requests time out together), so 5/10 in a row is easy and trips the gate even when salvage could recover the rows. Refs #575. * feat(async): WARN log when provider showing degraded performance Diagnostic A/Bs against build.nvidia.com showed runs failing silently under provider degradation - no log indication that retryable errors were piling up until the early-shutdown gate fired (or, post-fix, until salvage exhaustion). Surfacing this earlier helps users distinguish "DataDesigner is broken" from "the upstream provider is slow today." Tracks a separate sliding window over retryable-vs-not for every task outcome (independent of the early-shutdown gate's window) and emits a throttled WARN when the rolling fraction crosses the threshold. Refs #575. * fix(async): salvage partial row groups on early shutdown Before: when the early-shutdown gate fired, any row group still in flight stayed in `_rg_states` un-checkpointed. The buffer manager later raised `FileNotFoundError` when the builder tried to read the finalized parquet. User-visible result: `0 records produced`. After: a new `_finalize_after_shutdown` step runs in `run()`'s finally block, after `_cancel_workers` has drained in-flight tasks (Codex caveat: in-flight `from_scratch`/`batch` tasks must not be allowed to write into a buffer that's being finalized). For each remaining row group it drops rows that aren't fully complete, then delegates to the existing `_checkpoint_completed_row_groups` so the buffer manager's zero-survivor handling (skip empty parquet, free buffer) kicks in unchanged. Also surfaces partial completion as a structured signal: scheduler exposes `early_shutdown: bool` and `partial_row_groups: tuple[int, ...]` properties so callers can detect partial completion programmatically rather than parsing log lines. Builder uses this to emit a more specific WARN distinguishing early shutdown from non-shutdown drops. Refs #575. * fix(throttle): reset consecutive_429s on non-rate-limit failure In `release_failure`, the cascade counter wasn't reset, so a sequence like 429 → 500 → 429 was treated as 2 consecutive 429s. The cascade counter feeds AIMD's reduce-once-per-cascade logic; the second 429 should start a fresh cascade and trigger another concurrency reduction, but currently doesn't. Standalone bug surfaced during #575 investigation; not on the failure path that drives the gate-trip outcome but worth fixing while we're in this code. * fix(custom): preserve retryability through CustomColumnGenerator wrap A real-workload run of #575 showed the early-shutdown gate still trips even with the gate-exclusion fix in place: the trigger is 10 timeouts inside Anonymizer's QA-repair custom columns, all wrapped in CustomColumnGenerationError (non-retryable) by the catch-all in CustomColumnGenerator. Two fixes here: 1. Re-raise RETRYABLE_MODEL_ERRORS unchanged before the wrap so the scheduler's _is_retryable correctly classifies them. 2. Surface _AsyncBridgedModelFacade timeouts as ModelTimeoutError instead of stdlib TimeoutError. Without this the sync bridge times out as the wrong exception type and is still classified non-retryable even after fix #1. Also moves _RETRYABLE_MODEL_ERRORS from async_scheduler to models/errors as the public RETRYABLE_MODEL_ERRORS tuple - both the scheduler and the wrap site need it, and models/errors is the appropriate home alongside the error class definitions. Refs #575. * feat(interface): typed DataDesignerEarlyShutdownError on zero-record runs When the async scheduler hits early shutdown and produces zero records, the buffer manager skips writing parquet (correctly), so ArtifactStorage.load_dataset_with_dropped_columns() raises FileNotFoundError. Previously this surfaced as a generic DataDesignerGenerationError wrapping the FileNotFoundError, which is ambiguous (could be missing files for any reason). This commit: - Adds DataDesignerEarlyShutdownError as a subclass of DataDesignerGenerationError so existing handlers still match while callers that want to react programmatically (retry on different alias, surface a degraded-provider message, etc.) can catch the specific type. - Plumbs the scheduler's structured signals (early_shutdown, partial_row_groups) up through the builder so they're available at data_designer.create() time without re-introspecting the scheduler. - create() raises the typed error in both failure modes (load fails or empty DataFrame returned) when builder.early_shutdown is True. Refs #575. * fix(async): emit first degraded-provider WARN regardless of clock state Initialize _last_degraded_warn_at to -inf so the first WARN is always emitted. The previous initialization to 0.0 suppressed the first WARN on fresh CI runners where time.monotonic() returns a small value (system boot uptime), making the throttle interval check (now - 0.0 < interval) true on the first attempt. * fix(async): address review findings on early-shutdown salvage PR Five real correctness issues caught in review of the original PR, plus a few smaller cleanups and test simplifications. Throttle - cascade reset (regression of existing AIMD invariant): release_failure() now resets consecutive_429s only when in_flight == 0. Resetting unconditionally broke "reduce once per cascade" when 429/500/429 arrived interleaved within a single in-flight burst - the second 429 was treated as a new cascade and the limit got halved twice for what was effectively one rate-limit event. Interface - typed-error gating: DataDesignerEarlyShutdownError now fires only when early_shutdown is true AND actual_num_records == 0. Without this, a partial-salvage run that fails to load for unrelated reasons (corrupt parquet, schema drift, disk hiccup) was misdiagnosed as "zero records produced," hiding the real cause. Async - WARN window scope: the degraded-provider warning was fed by every task outcome, including samplers and non-LLM customs. In realistic pipelines (one model column, several non-model columns) the rate stayed under threshold even when every model call was failing, silencing the WARN exactly when it mattered. Now gated on is_llm. Async/builder - signal preservation across raises: scheduler.early_shutdown and partial_row_groups are captured in a try/finally around future.result(), so a processor failure during the salvage path doesn't drop the structured signal. Both build() and build_preview() now reset per-run state at the start so reused builders don't leak prior-run flags. Async - dead code: dispatch_error capture in run() was unread (the post- finally check is unreachable on the exception path). Removed. Smaller cleanups: - early-shutdown WARN says "non-retryable error rate exceeded threshold" - bridge timeout WARN demoted to debug (ModelTimeoutError already surfaces it; the throttled degraded-provider WARN is the user-facing signal) - TODO note for threading degraded_warn_* through RunConfig - doc note in _finalize_after_shutdown clarifying that pre-batch processor isn't re-run on partial-salvage row groups Tests: - new regression tests for the cascade burst case, partial-salvage error gating, and LLM-only WARN window - direct unit test for _reset_run_state - dedup via _make_storage / _seed_plus_cell_setup helpers - WARN emission cases parametrized into a single test - shared parametrize lists hoisted to module-level constants - redundant cascade test dropped in favor of the more thorough drain variant; redundant healthy-baseline test folded into the zero-survivor test * chore(async): address Nabin's review comments Style cleanups, parametrization, docstring polish, and one consistency fix in the typed-error path. All non-blocking ("Ship it (with nits)"). interface/data_designer.py: - preview() now raises DataDesignerEarlyShutdownError when shutdown produced zero records (parity with create()), and also gates on actual_num_records == 0 so partial-salvage runs that fail to load don't get misdiagnosed - create()'s defensive empty-DF guard mirrors the load-failure guard with the same actual_num_records == 0 check async_scheduler.py: - _record_retryable_outcome docstring clarifies that the call site filters by is_llm; the function alone reads as if every outcome feeds the window dataset_builder.py: - moved _reset_run_state() down past the public methods to match the project's public-before-private convention test_custom.py: - flattened TestAsyncBridgedModelFacade class into module-level test functions (matches the rest of the file) - hoisted inline imports (asyncio, threading, patch, _AsyncBridgedModelFacade, SyncClientUnavailableError) to top of file - driven retryable-error parametrize off RETRYABLE_MODEL_ERRORS directly instead of the hand-rolled factory list, so new retryable types pick up coverage automatically - dropped the redundant "Sanity" block in test_async_bridge_timeout_raises_ model_timeout_error - pytest.raises already enforces the type, the duplicate block was running the same slow scenario twice test_async_scheduler.py: - parametrize over RETRYABLE_MODEL_ERRORS directly (same as above) test_data_designer.py: - added preview-path tests for the typed-error and partial-salvage fall-through cases - updated the existing empty-DF test to also patch actual_num_records=0 (otherwise the new gating in the empty-DF guard skips the typed error) * test(interface): consolidate create() error-dispatch tests into a matrix Five separate tests (two existing, three new from earlier in this PR) all probed the same dispatch logic in create(): "given a load outcome and a builder state, which error type should fire?" Pulled them into a single parametrized matrix indexed by (load_side_effect, early_shutdown, actual_num_records). Net result: 5 named tests → 1 parametrized test with 6 cells, and the previously-missing empty_df + shutdown + partial salvage cell is now covered. Test names retain readable IDs (load_fails_shutdown_zero_records etc.) so failures still pinpoint the exact case in pytest output. |
||
|
|
05c2e8df2e
|
fix: normalize image_url blocks to OpenAI-compliant dict format (#577)
* fix: normalize image_url blocks to OpenAI-compliant dict format (#576) ImageContext.get_contexts() produced bare-string and non-standard dict shapes for image_url content blocks, which broke the native OpenAI adapter (passes blocks through as-is) and only worked with Anthropic by accident via defensive handling in the translation layer. - Wrap all image_url values in {"url": ...} dict (OpenAI spec) - Remove non-standard "format" key from base64 dicts - Tighten Anthropic translate_image_url_block to require dict input Fixes #576 Made-with: Cursor * fix: reject malformed image_url blocks instead of silently dropping them translate_image_url_block now raises TypeError when image_url is not a dict. Since all image_url blocks are constructed internally, a bare string indicates an internal bug and should fail loudly. Made-with: Cursor * address review: tighten return type, add OpenAI + data-URI tests - Narrow _auto_resolve_context_value return type to dict[str, str] - Add OpenAI-client regression tests for image_url dict passthrough - Cover both bare-URL and bare-data-URI rejection in Anthropic tests Made-with: Cursor |
||
|
|
bfa7a46c9a
|
chore: async engine readiness - blockers and polish before default (#553)
* chore: async engine readiness blockers (#462) - Processor callback failures (pre-batch and post-batch) now raise DatasetGenerationError instead of silently dropping row groups - Early shutdown and all error paths drain in-flight workers via a finally block in AsyncTaskScheduler.run() - Pre-batch and post-batch processors that change row count in async mode raise immediately (strict_row_count guard) - Partial completion logs a warning when actual < target records - allow_resize=True auto-falls back to sync engine with a deprecation warning instead of raising, using a per-run _use_async flag - Preview path mirrors the trace check from the full build path; PreviewResults exposes task_traces Closes #462 * fix: address review findings for async engine readiness - Prevent double-wrapping of DatasetGenerationError in scheduler callbacks - Fix stacklevel in allow_resize DeprecationWarning to point at user code - Update stale comment to reflect fail-fast behavior - Rename misleading test and remove unused caplog fixture - Add zero-warnings assertion for happy-path case - Move warnings import to module level * fix: address review comments on async engine readiness - Extract _is_async_trace_enabled() helper to deduplicate trace check - Post-batch row-count guard now raises DatasetProcessingError (not DatasetGenerationError) so the scheduler wraps it with rg_id symmetrically with the pre-batch path - Add test_dropped_rows_reduce_actual_record_count for partial completion path * fix: address second-round review feedback on async engine readiness - DeprecationWarning no longer swallowed by interface error wrapper - Incomplete-RG log only fires on clean scheduler exits - Post-batch row-count guard moved into ProcessorRunner (strict_row_count) - Expose active_worker_count property on AsyncTaskScheduler - Drop unused monkeypatch fixture and pytest import * test: fold metadata-count test into dropped-rows test Remove test_write_metadata_records_actual_and_target_counts (poked _actual_num_records directly) and assert metadata counts in test_dropped_rows_reduce_actual_record_count instead, which exercises the same path through the public API. |
||
|
|
64f31bc581
|
feat: add generic and OpenRouter attribution headers (#542) | ||
|
|
d43ac1cb2e
|
test: add transport-wiring regression tests for #459 (#485)
* test: add transport-wiring regression tests for #459 The existing test_client_limits_respect_max_parallel_requests only checks the limits property, which was always computed correctly even before the fix. Add mock-capture tests that patch HTTPTransport and AsyncHTTPTransport constructors, trigger lazy init via completion() / acompletion(), and assert the constructors received the correct limits. These tests fail on the pre-fix code (assert_called_once fails because the old code never explicitly constructed the transport). Made-with: Cursor * test: address review — parametrize over AnthropicClient, use helpers Parametrize all three #459 regression tests over both OpenAICompatibleClient and AnthropicClient (wiring lives in HttpModelClient, so both subclasses need coverage). Use the existing _make_openai_client / _make_anthropic_client helpers with **kwargs instead of constructing clients directly. Move transport patch constants up with the other module-level constants. Made-with: Cursor |
||
|
|
97119863da
|
fix: respect max_parallel_requests in HTTP connection pool size (#460)
* fix: respect max_parallel_requests in HTTP connection pool size Pass a pre-configured HTTPTransport/AsyncHTTPTransport with the correct limits into RetryTransport instead of letting it create its own pool with httpx defaults (100 connections). Previously, the limits calculated from max_parallel_requests were passed to httpx.Client(limits=...) which silently ignores them when a custom transport is provided. Fixes #459 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Przemysław <przemekboruta@interia.pl> * fix: document transport param and annotate private attr chains in tests Address Greptile P2 review comments on PR #460: - Add docstring entry for the new `transport` parameter in `create_retry_transport` explaining accepted types and None default - Add inline comments in pool-size regression tests explaining the private attribute chain (_sync/_async_transport → _pool → _max_connections) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Przemysław <przemekboruta@interia.pl> * refactor: address review feedback — public limits property, clean tests - Drop three tests from test_retry.py that reached into private attributes of third-party RetryTransport (_sync_transport, _async_transport); the end-to-end contract is covered by the pool-size regression test - Expose a public `limits` property on HttpModelClient so tests and diagnostic code can assert the pool configuration without walking private attribute chains across three libraries - Replace two private-chain pool assertions with a single `client.limits.max_connections == 600` check against the new property - Trim "inner" from the transport docstring entry (nabinchha suggestion) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Przemysław <przemekboruta@interia.pl> --------- Signed-off-by: Przemysław <przemekboruta@interia.pl> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com> |
||
|
|
c146af5146
|
chore: async engine follow-up - rename, preview, lifecycle, progress (#456)
* chore: rename ColumnWiseDatasetBuilder, wire async preview, unify row-group lifecycle - Rename ColumnWiseDatasetBuilder to DatasetBuilder and column_wise_builder.py to dataset_builder.py, update all references - Extract _prepare_async_run() factory shared by build and preview paths - Add _build_async_preview() for async preview with no disk checkpoints - Replace on_row_group_complete/on_checkpoint_complete with single on_finalize_row_group callback; caller handles checkpointing - Add free_row_group() on RowGroupBufferManager for discard-without-write - Free fully-dropped row groups instead of finalizing them - Add consolidated AsyncProgressReporter for async generation logging Closes #437, closes #442, closes #444 * feat: add consolidated async progress reporting with row group context - Add AsyncProgressReporter: groups per-column progress into a single log block emitted at configurable intervals (default 5s) - Add quiet mode to ProgressTracker to suppress per-tracker logging when used with the consolidated reporter - Add ContextVar-based row group tagging (RG1, RG2, ...) for log messages emitted inside async tasks (samplers, expressions, seeds) - Add progress_interval to RunConfig for user-configurable reporting - Remove log_start_async from ProgressTracker (superseded by reporter) Closes #443 * fix async preview and progress reporting * feat: add opt-in sticky ANSI progress bars for generation Add RunConfig.progress_bar setting that replaces periodic log-line progress with sticky terminal bars that stay at the bottom while logs scroll above. Pure ANSI escape codes, no new dependencies. Disabled by default - existing log-based output unchanged. * docs: add missing progress_interval docstring in RunConfig * fix: update progress bar on every completion in async path Skip the time gate when the progress bar is active so the bar redraws on every record instead of every progress_interval seconds. * fix: resolve row-group semaphore deadlock when all tasks are deferred When all tasks for admitted row groups fail with transient errors, the row-group semaphore never releases, blocking admission of new row groups. Fix by salvaging stalled row groups inline - retrying deferred tasks immediately so row groups can checkpoint and free their semaphore slots. Also updates row group log format to (x/X) with leading zeros. * fix: eagerly salvage stalled row groups to avoid wasting semaphore slots Run inline salvage after every checkpoint pass instead of only when globally stalled. Row groups with 0 in-flight and only deferred tasks are salvaged immediately, freeing their semaphore slot for new work. * fix: address review findings from greptile and codex - Use `with` statement for progress bar context (safe __exit__ on error) - Check bar.is_active instead of bar is not None (non-TTY fallback) - Record failures (not skips) for tasks that exhaust salvage retries - Record skipped tasks when pre-batch filtering drops rows * fix: stable progress bar width and accurate failure counts - Pre-compute fixed stats width at bar creation to prevent bar resizing when failed count appears - Cap displayed completed at total to avoid >100% on retries - Exclude already-failed columns from skip recording to prevent double-counting in progress reporter * fix: address Nabin's review - exclude_columns, dead code, docstring - Add exclude_columns={task.column} on non-retryable batch/from_scratch drop path to prevent double-counting (same pattern as cell path) - Simplify salvage drop to per-task exclude (is_dropped guard handles multi-column case) - Remove dead _in_flight_for_rg method - Fix context.py docstring to match actual (x/X) format * fix: _drain_frontier exits before dispatching ready salvage tasks After salvage discards a cell task from dispatched (making it available in the frontier), _drain_frontier broke immediately because nothing was in-flight yet. The task and its downstream were never re-dispatched, leaving the row group incomplete. Fix: only break when both ready and in-flight are empty. * fix: salvage edge cases found by code review - _drain_frontier: only break when both ready and in-flight are empty - _salvage_rounds: re-mark sibling columns as dispatched after from_scratch retry to prevent duplicate dispatch - _salvage_stalled_row_groups: separate exhausted tasks from new drain failures to avoid treating non-stalled tasks as permanent - _checkpoint_completed_row_groups: clean up deferred tasks for checkpointed row groups - Early shutdown: salvage stalled row groups before exiting * fix: skip record_failure for already-dropped rows in salvage When multiple columns fail for the same row, the first drop records a skip for the other column. Without this guard, record_failure fires again for the second column, double-counting it. |
||
|
|
1356408c6e
|
feat: remove litellm dependency and bridge path (#455)
* adjust plan * feat: remove litellm dependency and bridge path (PR-7) - Delete litellm_bridge.py adapter, litellm_overrides.py, and their tests - Remove LiteLLM fallback branch and DATA_DESIGNER_MODEL_BACKEND env var from clients/factory.py; unknown provider_type now raises ValueError - Remove apply_litellm_patches() call from models/factory.py - Remove LiteLLM exception match arms and DownstreamLLMExceptionMessageParser from models/errors.py; port context window detail extraction to _extract_context_window_detail for native ProviderError path - Remove litellm from lazy_heavy_imports.py and pyproject.toml runtime deps - Remove flatten_extra_body parameter from TransportKwargs.from_request - Clean up LiteLLM references in docstrings, comments, and AGENTS.md - Add full ProviderErrorKind test coverage to test_model_errors.py - Update benchmark script to patch OpenAICompatibleClient instead of CustomRouter Made-with: Cursor * fix: forward tools to _fake_response in benchmark patch The old CustomRouter patch forwarded **kwargs (including tools) to _fake_response, but the new OpenAICompatibleClient patch only passed model and messages — silently disabling tool-call simulation in benchmark scenarios that exercise allow_tools. Made-with: Cursor * fix: address PR-7 review feedback - Return ChatCompletionResponse from benchmark fakes instead of FakeResponse to match the native client contract (facade expects .message, not .choices[0].message) - Add ids= to parametrize block in test_model_errors.py for readability - Remove unnecessary try/except from _extract_context_window_detail; the `if marker in` guard is sufficient - Make context window marker match case-insensitive - Replace stale httpx.AsyncClient callout in async_concurrency.py docstring with generic "async-stateful resources" Made-with: Cursor |
||
|
|
f24e88d0f8
|
feat: wire ThrottledModelClient and dual-semaphore scheduler (#449)
* feat: constrain HttpModelClient to single concurrency mode Constrain each HttpModelClient instance to sync or async at construction time, eliminating dual-mode lifecycle complexity that caused transport leaks and cross-mode teardown bugs. - Add ClientConcurrencyMode StrEnum replacing Literal type alias - Add concurrency_mode constructor param with mode enforcement guards on _get_sync_client / _get_async_client - Simplify close()/aclose() to single-mode teardown (cross-mode calls are no-ops) - Thread client_concurrency_mode through factory chain from DATA_DESIGNER_ASYNC_ENGINE env var - Add ModelRegistry.arun_health_check() async mirror and wire async dispatch in ColumnWiseDatasetBuilder - Make ensure_async_engine_loop public (used cross-module) - Fix test helpers to derive concurrency mode from injected client - Add PR-5 architecture notes * fix: address review findings on HttpModelClient lifecycle - Fix transport double-close in close()/aclose() by delegating teardown to the httpx client when one exists (if/elif pattern); only close transport directly if no client was ever created - Reject mismatched client/mode injection in constructor (e.g. async_client on a sync-mode instance raises ValueError) - Add 5-minute wall-clock timeout to future.result() in async health check dispatch - Add constructor validation tests for both mismatch directions - Update PR-5 architecture notes Made-with: Cursor * fix: address PR-5 review feedback on HttpModelClient lifecycle - Type _transport as RetryTransport | None, removing type: ignore suppressions in close()/aclose() - Make transport fully lazy (None by default) and accept optional transport constructor param so injected-client paths don't eagerly allocate an unused RetryTransport - Cancel future on TimeoutError in async health check dispatch so timed-out coroutines don't linger on the shared event loop - Set health check timeout to 180s (3 min) matching architecture notes - Rename _SYNC_CLIENT_CASES to _CLIENT_FACTORY_CASES since the list parametrizes both sync and async mode tests - Update architecture notes timeout from 180→300 back to 180 to match implementation Made-with: Cursor * add plan for pr-6 * update plan * feat: wire ThrottledModelClient and dual-semaphore scheduler for AIMD concurrency control Introduces per-request AIMD throttle feedback by wrapping every ModelClient with ThrottledModelClient (acquire/release around each HTTP call) and adding a dual-semaphore scheme to AsyncTaskScheduler that separates submission slots from LLM-wait slots via a one-way handoff. Key changes: - ThrottledModelClient: decorator that acquires/releases ThrottleManager permits around sync and async completion, embedding, and image calls, routing 429s to release_rate_limited for AIMD backoff. - RetryConfig: removes 429 from retryable_status_codes so rate-limit signals propagate to the AIMD feedback loop instead of being silently retried at the transport layer. - AsyncTaskScheduler: adds TrackingSemaphore, LLM-wait semaphore, and is_llm_bound-driven handoff; extracts _main_dispatch_loop and _salvage_rounds; tracks worker tasks for clean cancellation. - ColumnGenerator.is_llm_bound property on base, model-registry, and custom generators to classify which columns need the handoff. - ThrottleConfig (Pydantic model on RunConfig) for user-tunable AIMD knobs (reduce_factor, additive_increase, success_window, block_seconds). - ModelRegistry.get_aggregate_max_parallel_requests() to size the LLM-wait semaphore from registered model configs. - Renames throttle.py -> throttle_manager.py for clarity. Made-with: Cursor * fix: address PR-6 review feedback on concurrency safety - Shield _cancel_workers() with asyncio.shield in the CancelledError handler to prevent double-cancellation from leaking semaphore permits - Guard ThrottleManager release calls in _throttled_sync/_athrottled with try/except so a failing release doesn't mask the original error or permanently leak the in-flight permit counter - Add registration guard in ThrottleManager.try_acquire() that raises RuntimeError if called before register(), making the ordering invariant explicit rather than relying on temporal coupling - Add is_registered() public query method on ThrottleManager - Expand TrackingSemaphore docstring noting CPython _value dependency Made-with: Cursor * fix test * feat: AIMD throttle refinements — cascade dampening, ceiling stabilization, early shutdown exclusion Dampen aggressive cascade 429 reduction (only the first 429 in a burst reduces the limit; subsequent cascade 429s release permits without further reduction). Add rate_limit_ceiling with configurable overshoot to stabilize concurrency instead of sawtooth recovery. Exclude ModelRateLimitError from early shutdown error rate. Consolidate AIMD defaults into ThrottleConfig ClassVar constants as single source of truth, and update ThrottleManager to accept ThrottleConfig directly. Update architecture notes for PR-6. Made-with: Cursor * fix: address PR-6 review feedback on retry boundary, capacity polling, and test coverage - Keep 429 in transport retry list for sync-mode clients (no salvage queue); only strip for async-mode where AIMD + salvage handles it. Add strip_rate_limit_codes kwarg to create_retry_transport; wire through HttpModelClient based on concurrency mode. - Return CAPACITY_POLL_INTERVAL (50ms) instead of default_block_seconds (2s) from try_acquire when at capacity but not rate-limited, so callers poll responsively when a slot could free in milliseconds. - Document registration ordering invariant on create_model_client's throttle_manager param. - Add semaphore permit assertions to test_scheduler_llm_bound_one_way_handoff matching the non-LLM counterpart test. - Update architecture notes for all changes. Made-with: Cursor * add missing test * fix: consolidate ThrottleConfig and improve naming Move ceiling_overshoot from ThrottleManager constructor kwarg into ThrottleConfig (single source of truth for all AIMD tuning knobs). Rename block_seconds → cooldown_seconds and block_duration → cooldown_duration throughout for clarity. Remove DEFAULT_CEILING_OVERSHOOT module constant from throttle_manager.py. Made-with: Cursor * fix: annotate state capture invariant in acquire_sync/acquire_async Add inline comment explaining why the captured DomainThrottleState reference is safe to reuse in the finally block (objects are never replaced after creation). Made-with: Cursor |
||
|
|
015d4f188f
|
feat: Constrain HttpModelClient to single concurrency mode... (#439)
* feat: constrain HttpModelClient to single concurrency mode Constrain each HttpModelClient instance to sync or async at construction time, eliminating dual-mode lifecycle complexity that caused transport leaks and cross-mode teardown bugs. - Add ClientConcurrencyMode StrEnum replacing Literal type alias - Add concurrency_mode constructor param with mode enforcement guards on _get_sync_client / _get_async_client - Simplify close()/aclose() to single-mode teardown (cross-mode calls are no-ops) - Thread client_concurrency_mode through factory chain from DATA_DESIGNER_ASYNC_ENGINE env var - Add ModelRegistry.arun_health_check() async mirror and wire async dispatch in ColumnWiseDatasetBuilder - Make ensure_async_engine_loop public (used cross-module) - Fix test helpers to derive concurrency mode from injected client - Add PR-5 architecture notes * fix: address review findings on HttpModelClient lifecycle - Fix transport double-close in close()/aclose() by delegating teardown to the httpx client when one exists (if/elif pattern); only close transport directly if no client was ever created - Reject mismatched client/mode injection in constructor (e.g. async_client on a sync-mode instance raises ValueError) - Add 5-minute wall-clock timeout to future.result() in async health check dispatch - Add constructor validation tests for both mismatch directions - Update PR-5 architecture notes Made-with: Cursor * fix: address PR-5 review feedback on HttpModelClient lifecycle - Type _transport as RetryTransport | None, removing type: ignore suppressions in close()/aclose() - Make transport fully lazy (None by default) and accept optional transport constructor param so injected-client paths don't eagerly allocate an unused RetryTransport - Cancel future on TimeoutError in async health check dispatch so timed-out coroutines don't linger on the shared event loop - Set health check timeout to 180s (3 min) matching architecture notes - Rename _SYNC_CLIENT_CASES to _CLIENT_FACTORY_CASES since the list parametrizes both sync and async mode tests - Update architecture notes timeout from 180→300 back to 180 to match implementation Made-with: Cursor * fix: address Greptile review findings on health check methods - Use bare `raise` instead of `raise e` in both run_health_check and arun_health_check to preserve original traceback frames - Add GenerationType.IMAGE test coverage for sync and async health checks (stub-image config + generate_image/agenerate_image patches) Made-with: Cursor * Update packages/data-designer-engine/tests/engine/models/test_model_registry.py |
||
|
|
a9a16ae61a
|
feat: Native Anthropic adapter with shared HTTP client infrastructure (#426)
* plans for model facade overhaul * update plan * add review * address feedback + add more details after several self reviews * update plan doc * address nits * Add cannonical objects * self-review feedback + address * add LiteLLMRouter protocol to strongly type bridge router param Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * simplify some things * add a protol for http response like object * move HttpResponse * update PR-1 architecture notes for lifecycle and router protocol Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Address PR #359 feedback: exception wrapping, shared parsing, test improvements - Wrap all LiteLLM router calls in try/except to normalize raw exceptions into canonical ProviderError at the bridge boundary (blocking review item) - Extract reusable response-parsing helpers into clients/parsing.py for shared use across future native adapters - Add async image parsing path using httpx.AsyncClient to avoid blocking the event loop in agenerate_image - Add retry_after field to ProviderError for future retry engine support - Fix _to_int_or_none to parse numeric strings from providers - Create test conftest.py with shared mock_router/bridge_client fixtures - Parametrize duplicate image generation and error mapping tests - Add tests for exception wrapping across all bridge methods * Use contextlib to dry out some code * Address Greptile feedback: HTTP-date retry-after parsing, docstring clarity - Parse RFC 7231 HTTP-date strings in Retry-After header (used by Azure and Anthropic during rate-limiting) in addition to numeric delay-seconds - Clarify collect_non_none_optional_fields docstring explaining why f.default is None is the correct check for optional field forwarding - Add tests for HTTP-date and garbage Retry-After values * Address Greptile feedback: FastAPI detail parsing, comment fixes - Fix misleading comment about prompt field defaults in _IMAGE_EXCLUDE - Handle list-format detail arrays in _extract_structured_message for FastAPI/Pydantic validation errors - Document scope boundary for vision content in collect_raw_image_candidates * add PR-2 architecture notes for model facade overhaul * save progress on pr2 * small refactor * address feedback * Address greptile comment in pr1 * refactor ProviderError from dataclass to regular Exception - Replace @dataclass + __post_init__ with explicit __init__ that calls super().__init__ properly, avoiding brittle field-ordering dependency - Store cause via __cause__ only, removing the redundant .cause attr - Update match pattern in handle_llm_exceptions for non-dataclass type - Rename shadowed local `fields` to `optional_fields` in TransportKwargs * Address greptile feedback * PR feedback * track usage tracking in finally block for images * pr feedback * add native OpenAI adapter with retry and throttle infrastructure - Implement OpenAICompatibleClient using httpx with RetryTransport - Add ThrottleManager with AIMD concurrency control and structured logging - Route provider_type=openai to native adapter in client factory - Add extract_reasoning_content helper for vLLM field migration - Make ModelRegistry own ThrottleManager and RetryConfig explicitly - Support DATA_DESIGNER_MODEL_BACKEND=litellm_bridge env var override Made-with: Cursor * Self CR * fix claude slop * Updates after self-review. Simplify use of ThrottleManager in light of plan 346 scheduler * wrap facade close in try/catch * clean up stray params * fix: address review findings from model facade overhaul PR3 - Fix metadata bug: drop unknown kwargs instead of passing them to ChatCompletionRequest (which has no metadata field), preventing a runtime TypeError. - Lazy-init httpx clients: sync and async clients are created on first use instead of eagerly in the constructor, with constructor injection for testability. - Remove defensive getattr on httpx.Response.status_code (always present). - Add comment clarifying throttle manager wiring is deferred. - Refactor tests to use constructor injection instead of private attribute mutation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix stray inclusion of metadata * small regression fix * address more feedback * self review * Fixes * new test for aimd lifecycle * update plan docs * update plans with refs to prs * fix: cap acquire_sync/acquire_async sleep to remaining budget to prevent timeout overshoot Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test lay init * fix timeout for openaicompatibleadapter * remove unused attr * fix: address review findings from PR #402 - Guard reasoning_content fallback with isinstance(str) check to prevent non-string provider values from violating the return type contract - Normalize provider_type comparison to lowercase so mixed-case configs (e.g. "OpenAI") route to the native adapter instead of silently falling through to the bridge - Document register-before-acquire ordering invariant on ThrottleManager - Add TODO to remove 429 from retryable_status_codes once ThrottleManager is wired via AsyncTaskScheduler (plan 346) Made-with: Cursor * Address pr feedback * fix method order * feat: native Anthropic adapter with image block translation Implement AnthropicClient as a native httpx-based adapter for the Anthropic Messages API, following the same patterns as OpenAICompatibleClient. Route provider_type="anthropic" to the native adapter in the client factory, with unknown providers falling through to LiteLLMBridgeClient. Key adapter behaviors: - System message extraction to top-level `system` parameter - OpenAI image_url content blocks translated to Anthropic image/source format - tool_use / thinking content blocks normalized to canonical types - x-api-key auth (not Bearer), anthropic-version header - max_tokens defaults to 4096 when omitted - stop mapped to stop_sequences - Embeddings and image generation raise unsupported capability errors Made-with: Cursor * fix: exclude OpenAI-specific params from Anthropic payload and drop unnecessary re.DOTALL Expand _ANTHROPIC_EXCLUDE to filter response_format, frequency_penalty, presence_penalty, and seed from TransportKwargs forwarding. Remove the unnecessary re.DOTALL flag from _DATA_URI_RE since base64 data never contains newlines. Made-with: Cursor * Fix failing test * fix: address PR #402 review findings - Add POST to RetryTransport allowed_methods so retries actually fire for all adapter endpoints (chat, embeddings, images are all POST) - Guard close()/aclose() with _init_lock to prevent TOCTOU race with lazy client initialization - Detect "unsupported parameter" patterns in 400 responses so the native path returns UNSUPPORTED_PARAMS instead of generic BAD_REQUEST - Return None from coerce_message_content for image-only content lists instead of leaking the Python repr via str(content) - Restore per-request timeout forwarding for LiteLLMBridgeClient via _with_timeout helper (broken when "timeout" was added to _META_FIELDS) - Normalize ModelProvider.provider_type to lowercase via field_validator so consumers don't need per-site .lower() calls - Fix unclosed httpx.AsyncClient in lazy-init test Made-with: Cursor * updates to DRY out code between the two adapters * refactor code to introduce HttpModelClient * update tests * fix: address PR #426 review findings - Release _aclient in close() to prevent async connection pool leak when both sync and async clients were initialized - Drop malformed image_url blocks (missing image_url key) instead of forwarding them unchanged to the Anthropic API - Preserve image blocks in system messages by returning Anthropic block-list format when non-text content is present - Rename extract_system_text to extract_system_content and add merge_system_parts helper for mixed string/block system parts Made-with: Cursor * fix: improve error classification and surface provider messages for 400s - Handle /v1 in Anthropic endpoint gracefully to avoid path duplication - Add QUOTA_EXCEEDED provider error kind for credit/billing failures - Extend UNSUPPORTED_PARAMS detection for mutually exclusive params - Surface raw provider message in formatted errors for 400 status codes - Consolidate provider message helpers into single _attach_provider_message Made-with: Cursor * fix: address PR #426 review findings (round 2) - Fix close() double-close of shared transport by closing self._transport directly instead of accessing private aclient._transport (critical) - Add TODO for threading.Lock → asyncio.Lock split (plan-346) - Remove unused _model_id from HttpModelClient and all callers - Export AnthropicClient from adapters __init__.py - Filter empty text blocks in translate_tool_result_content join - Move mock helpers to conftest.py with consistent json.dumps text default - Add __init__.py files to enable absolute imports from test conftest - Add bridge env override test for anthropic provider - Add ConnectionError and non-JSON response tests for AnthropicClient - Assert secret_resolver.resolve called with correct key ref in factory tests Made-with: Cursor * Update license headers * Fix anthropic tool call flow * fix: add explicit UNSUPPORTED_CAPABILITY error mapping - Add ModelUnsupportedCapabilityError so unsupported operations (e.g. Anthropic embeddings/image-generation) surface a specific error instead of falling through to generic ModelAPIError - Forward the provider's operation-specific message in the cause - Add parametrized test case for the new error path |
||
|
|
255e78da83
|
feat: Native OpenAI adapter with retry and AIMD throttle infrastructure (#402)
* plans for model facade overhaul * update plan * add review * address feedback + add more details after several self reviews * update plan doc * address nits * Add cannonical objects * self-review feedback + address * add LiteLLMRouter protocol to strongly type bridge router param Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * simplify some things * add a protol for http response like object * move HttpResponse * update PR-1 architecture notes for lifecycle and router protocol Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Address PR #359 feedback: exception wrapping, shared parsing, test improvements - Wrap all LiteLLM router calls in try/except to normalize raw exceptions into canonical ProviderError at the bridge boundary (blocking review item) - Extract reusable response-parsing helpers into clients/parsing.py for shared use across future native adapters - Add async image parsing path using httpx.AsyncClient to avoid blocking the event loop in agenerate_image - Add retry_after field to ProviderError for future retry engine support - Fix _to_int_or_none to parse numeric strings from providers - Create test conftest.py with shared mock_router/bridge_client fixtures - Parametrize duplicate image generation and error mapping tests - Add tests for exception wrapping across all bridge methods * Use contextlib to dry out some code * Address Greptile feedback: HTTP-date retry-after parsing, docstring clarity - Parse RFC 7231 HTTP-date strings in Retry-After header (used by Azure and Anthropic during rate-limiting) in addition to numeric delay-seconds - Clarify collect_non_none_optional_fields docstring explaining why f.default is None is the correct check for optional field forwarding - Add tests for HTTP-date and garbage Retry-After values * Address Greptile feedback: FastAPI detail parsing, comment fixes - Fix misleading comment about prompt field defaults in _IMAGE_EXCLUDE - Handle list-format detail arrays in _extract_structured_message for FastAPI/Pydantic validation errors - Document scope boundary for vision content in collect_raw_image_candidates * add PR-2 architecture notes for model facade overhaul * save progress on pr2 * small refactor * address feedback * Address greptile comment in pr1 * refactor ProviderError from dataclass to regular Exception - Replace @dataclass + __post_init__ with explicit __init__ that calls super().__init__ properly, avoiding brittle field-ordering dependency - Store cause via __cause__ only, removing the redundant .cause attr - Update match pattern in handle_llm_exceptions for non-dataclass type - Rename shadowed local `fields` to `optional_fields` in TransportKwargs * Address greptile feedback * PR feedback * track usage tracking in finally block for images * pr feedback * add native OpenAI adapter with retry and throttle infrastructure - Implement OpenAICompatibleClient using httpx with RetryTransport - Add ThrottleManager with AIMD concurrency control and structured logging - Route provider_type=openai to native adapter in client factory - Add extract_reasoning_content helper for vLLM field migration - Make ModelRegistry own ThrottleManager and RetryConfig explicitly - Support DATA_DESIGNER_MODEL_BACKEND=litellm_bridge env var override Made-with: Cursor * Self CR * fix claude slop * Updates after self-review. Simplify use of ThrottleManager in light of plan 346 scheduler * wrap facade close in try/catch * clean up stray params * fix: address review findings from model facade overhaul PR3 - Fix metadata bug: drop unknown kwargs instead of passing them to ChatCompletionRequest (which has no metadata field), preventing a runtime TypeError. - Lazy-init httpx clients: sync and async clients are created on first use instead of eagerly in the constructor, with constructor injection for testability. - Remove defensive getattr on httpx.Response.status_code (always present). - Add comment clarifying throttle manager wiring is deferred. - Refactor tests to use constructor injection instead of private attribute mutation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix stray inclusion of metadata * small regression fix * address more feedback * self review * Fixes * new test for aimd lifecycle * update plan docs * update plans with refs to prs * fix: cap acquire_sync/acquire_async sleep to remaining budget to prevent timeout overshoot Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test lay init * fix timeout for openaicompatibleadapter * remove unused attr * fix: address review findings from PR #402 - Guard reasoning_content fallback with isinstance(str) check to prevent non-string provider values from violating the return type contract - Normalize provider_type comparison to lowercase so mixed-case configs (e.g. "OpenAI") route to the native adapter instead of silently falling through to the bridge - Document register-before-acquire ordering invariant on ThrottleManager - Add TODO to remove 429 from retryable_status_codes once ThrottleManager is wired via AsyncTaskScheduler (plan 346) Made-with: Cursor * Address pr feedback * fix method order * Fix failing test * fix: address PR #402 review findings - Add POST to RetryTransport allowed_methods so retries actually fire for all adapter endpoints (chat, embeddings, images are all POST) - Guard close()/aclose() with _init_lock to prevent TOCTOU race with lazy client initialization - Detect "unsupported parameter" patterns in 400 responses so the native path returns UNSUPPORTED_PARAMS instead of generic BAD_REQUEST - Return None from coerce_message_content for image-only content lists instead of leaking the Python repr via str(content) - Restore per-request timeout forwarding for LiteLLMBridgeClient via _with_timeout helper (broken when "timeout" was added to _META_FIELDS) - Normalize ModelProvider.provider_type to lowercase via field_validator so consumers don't need per-site .lower() calls - Fix unclosed httpx.AsyncClient in lazy-init test |
||
|
|
5783435979
|
feat: Improve generation failure reporting for schema and timeout failures (#416) | ||
|
|
d6b8433e25
|
fix: preserve extra_body for LiteLLM to avoid UnsupportedParamsError (#409) (#412)
* fix: preserve extra_body for LiteLLM to avoid UnsupportedParamsError (#409) TransportKwargs.from_request() flattened extra_body keys into top-level kwargs, causing LiteLLM to reject provider-specific params like reasoning_effort via its per-provider allowlist validation. Add a flatten_extra_body flag (default True for backward compat) so the LiteLLM bridge can opt out and preserve extra_body as a distinct kwarg that LiteLLM forwards without validation. Made-with: Cursor * fix: address PR #412 review comments Update stale docstrings in TransportKwargs and facade.py to reflect the new flatten_extra_body flag, and add an edge-case test for empty extra_body with flatten disabled. |
||
|
|
ebd33227c3
|
refactor: Decouple ModelFacade from LiteLLM via ModelClient adapter (#373)
* plans for model facade overhaul * update plan * add review * address feedback + add more details after several self reviews * update plan doc * address nits * Add cannonical objects * self-review feedback + address * add LiteLLMRouter protocol to strongly type bridge router param Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * simplify some things * add a protol for http response like object * move HttpResponse * update PR-1 architecture notes for lifecycle and router protocol Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Address PR #359 feedback: exception wrapping, shared parsing, test improvements - Wrap all LiteLLM router calls in try/except to normalize raw exceptions into canonical ProviderError at the bridge boundary (blocking review item) - Extract reusable response-parsing helpers into clients/parsing.py for shared use across future native adapters - Add async image parsing path using httpx.AsyncClient to avoid blocking the event loop in agenerate_image - Add retry_after field to ProviderError for future retry engine support - Fix _to_int_or_none to parse numeric strings from providers - Create test conftest.py with shared mock_router/bridge_client fixtures - Parametrize duplicate image generation and error mapping tests - Add tests for exception wrapping across all bridge methods * Use contextlib to dry out some code * Address Greptile feedback: HTTP-date retry-after parsing, docstring clarity - Parse RFC 7231 HTTP-date strings in Retry-After header (used by Azure and Anthropic during rate-limiting) in addition to numeric delay-seconds - Clarify collect_non_none_optional_fields docstring explaining why f.default is None is the correct check for optional field forwarding - Add tests for HTTP-date and garbage Retry-After values * Address Greptile feedback: FastAPI detail parsing, comment fixes - Fix misleading comment about prompt field defaults in _IMAGE_EXCLUDE - Handle list-format detail arrays in _extract_structured_message for FastAPI/Pydantic validation errors - Document scope boundary for vision content in collect_raw_image_candidates * add PR-2 architecture notes for model facade overhaul * save progress on pr2 * small refactor * address feedback * Address greptile comment in pr1 * refactor ProviderError from dataclass to regular Exception - Replace @dataclass + __post_init__ with explicit __init__ that calls super().__init__ properly, avoiding brittle field-ordering dependency - Store cause via __cause__ only, removing the redundant .cause attr - Update match pattern in handle_llm_exceptions for non-dataclass type - Rename shadowed local `fields` to `optional_fields` in TransportKwargs * Address greptile feedback * PR feedback * track usage tracking in finally block for images * pr feedback * wrap facade close in try/catch * clean up stray params * fix stray inclusion of metadata * small regression fix * address more feedback --------- Co-authored-by: Johnny Greco <jogreco@nvidia.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
351d70173c
|
fix: patch litellm ImageURLListItem to make index field optional (#384) (#385)
* monkey patch litellm to make index optional * address greptile feedbaack * fix tests * fix: add model_rebuild calls for proper test isolation Rebuild the Pydantic Message model after restoring the TypedDict to unpatched state and in the finally block so cached schemas don't leak across tests. Also assert that Message construction fails without the patch to prove it is necessary. Made-with: Cursor * fix greptile's weak suggestion |
||
|
|
68a7f7120f
|
feat: canonical model client types, protocols, and LiteLLM bridge adapter (#359)
* plans for model facade overhaul * update plan * add review * address feedback + add more details after several self reviews * update plan doc * address nits * Add cannonical objects * self-review feedback + address * add LiteLLMRouter protocol to strongly type bridge router param Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * simplify some things * add a protol for http response like object * move HttpResponse * update PR-1 architecture notes for lifecycle and router protocol Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Address PR #359 feedback: exception wrapping, shared parsing, test improvements - Wrap all LiteLLM router calls in try/except to normalize raw exceptions into canonical ProviderError at the bridge boundary (blocking review item) - Extract reusable response-parsing helpers into clients/parsing.py for shared use across future native adapters - Add async image parsing path using httpx.AsyncClient to avoid blocking the event loop in agenerate_image - Add retry_after field to ProviderError for future retry engine support - Fix _to_int_or_none to parse numeric strings from providers - Create test conftest.py with shared mock_router/bridge_client fixtures - Parametrize duplicate image generation and error mapping tests - Add tests for exception wrapping across all bridge methods * Use contextlib to dry out some code * Address Greptile feedback: HTTP-date retry-after parsing, docstring clarity - Parse RFC 7231 HTTP-date strings in Retry-After header (used by Azure and Anthropic during rate-limiting) in addition to numeric delay-seconds - Clarify collect_non_none_optional_fields docstring explaining why f.default is None is the correct check for optional field forwarding - Add tests for HTTP-date and garbage Retry-After values * Address Greptile feedback: FastAPI detail parsing, comment fixes - Fix misleading comment about prompt field defaults in _IMAGE_EXCLUDE - Handle list-format detail arrays in _extract_structured_message for FastAPI/Pydantic validation errors - Document scope boundary for vision content in collect_raw_image_candidates * address feedback --------- Co-authored-by: Johnny Greco <jogreco@nvidia.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
1439bbea7e
|
chore: Improve CLI startup with lazy heavy import cleanup (#330)
* perf: defer heavy imports to improve CLI startup time Move expensive imports (engine, models, controllers) out of the module-level import path so that data-designer --help and other non-generation commands no longer pay the full startup cost. Key changes: - Defer controller imports to inside command functions - Remove eager re-export chains from CLI package __init__ files - Move default-settings bootstrap into load_config_builder() and DataDesigner.__init__() instead of running at import time - Add lazy __getattr__ exports in interface/__init__.py - Replace module-level tokenizer init with cached lazy getter - Fix ModelProvider import to use config layer instead of engine - Update test mock paths to match new import locations Reduces CLI import-time from ~1.67s to ~0.46s. * perf: defer pandas/numpy in io_helpers and add config_list benchmark - Replace eager `from lazy_heavy_imports import pd, np` in io_helpers with module-level __getattr__ (for backwards-compatible external access / test mocks) and function-level imports in the 3 functions that actually use them (read_parquet_dataset, smart_load_dataframe, _convert_to_serializable). Importing io_helpers no longer triggers pandas/numpy loading. - Defer heavy imports in list and reset CLI commands into function bodies to avoid loading repositories, Rich, and prompt_toolkit at module import time. - Add `config_list` (data-designer config list) measurement to the CLI startup benchmark with isolated cold measurement in a separate venv and a --skip-config-list-check flag. - Update test mock paths to match new import locations. * Refine lazy import usage and TYPE_CHECKING cleanup * Run license header updater on PR-touched files * fix: update sqlfluff mock target for lazy imports in test_sql * perf: cache globals() in lazy __getattr__ to avoid repeated lookups Add globals() caching and explanatory comment to all three lazy __getattr__ implementations (lazy_heavy_imports, config/__init__, interface/__init__) so subsequent attribute accesses bypass __getattr__. * perf: lazy CLI command loading and deferred heavy import evaluations - Add LazyTyperGroup to defer command module loading until invocation, allowing module-level imports in all CLI command files - Split DataFrameSeedSource into seed_source_dataframe.py to isolate pandas dependency from other seed source classes - Move TypeVar/TypeAlias definitions (DataT, NumpyArray1dT, RadomStateT, EngineT) to TYPE_CHECKING blocks with runtime fallbacks - Wrap module-level constants in lru_cache (phone_number parquet data, jsonschema validator) to defer I/O and heavy imports to first use - Update test mock targets to patch at usage-site for module-level imports * refactor: use direct pandas import in seed_source_dataframe Drop lazy-loading for pandas in DataFrameSeedSource; use direct import for simplicity. * update lazy import pattern * update tests to use lazy import namespace Switch test modules to import data_designer.lazy_heavy_imports as lazy and reference heavy libraries through that namespace. This keeps heavy imports deferred during module import and aligns tests with the new lazy-import usage pattern. * tighten import perf test thresholds Document recent baseline timings and lower the allowed average import time and timeout so regressions are detected sooner. * document pandas import requirement Clarify that Pydantic needs DataFrame resolved at module load and that keeping the direct import preserves IDE typing support. * increase timeout time * use lazy pandas imports in visualization tests - replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports - add TYPE_CHECKING pandas import and keep CLI controller imports sorted * fix lazy pandas runtime usage and preview mocks Switch sample-record handling to lazy pandas types so runtime paths no longer depend on TYPE_CHECKING imports. Align preview controller tests to patch the module-local DataDesigner symbol, preventing real engine invocation in save results scenarios. |
||
|
|
8a28232640
|
feat(engine): env-var switch for async-first models experiment (#280) | ||
|
|
8e2fd3286f
|
feat: add image generation support with multi-modal context (#317) | ||
|
|
631f1f970e
|
fix: trim LLM response content before parsing (#322)
Strip leading and trailing whitespace from final LLM response content so parsers receive normalized text. Add a parametrized unit test that covers multiple whitespace patterns. |
||
|
|
f74f25872c
|
chore: quiet tool call logs and add tool usage statistics (#293)
* add tool usage statistics tracking - Add ToolUsageStats class with metrics for tool calls, turns, and statistical aggregates (mean/stddev per generation) - Extend ModelUsageStats to include tool_usage tracking - Update ModelFacade.generate() to track total tool calls and turns - Update tests with tool_call_count method and new assertions * silence noisy mcp logs * log message updates * add tools enabled info message * exclude empty tool_usage from usage stats output * add tool usage summary logging after column generation - Track tool usage snapshots before/after column processing - Log mean tool calls per generation for columns with tools enabled - Add get_tool_usage_snapshot/get_tool_usage_delta methods to ModelRegistry - Remove unused extra_info parameter from progress_tracker.log_start() - Add comprehensive tests for ToolUsageStats * pretty format model usage logs * reuse stubs and fixtures * add merge method to ToolUsageStats for accurate stats aggregation The previous implementation used extend() to combine tool usage stats, but extend() is designed for single generation data. This caused incorrect stddev calculations when merging stats from multiple sources. - Add ToolUsageStats.merge() that properly combines sum-of-squares - Update ModelUsageStats.extend() to use merge() for tool usage - Add tests verifying stddev accuracy after merging * fix tool usage stats missing generations_with_tools count When tracking tool usage after generation, the ToolUsageStats was created without setting generations_with_tools, causing the usage summary to report zeros for calls/gen and turns/gen metrics. * fix tool usage delta objects returning incorrect stddev values - Simplify facade API to use tool_usage.extend() directly - Return NaN for stddev when sum of squares wasn't tracked - Add docstring to get_tool_usage_delta explaining NaN behavior - Add comprehensive tests for stddev variance calculation * fix tool usage delta stddev by including sum of squares in deltas Convert sum_of_squares_turns and sum_of_squares_calls from private attributes to public fields, enabling them to be included in delta calculations. This allows get_tool_usage_delta to return objects that compute accurate stddev values instead of NaN. * fix test to use get_tool_usage_snapshot for accurate stddev tracking The test was manually constructing a ToolUsageStats snapshot without sum_of_squares fields, causing stddev to be NaN. Now uses the proper snapshot method that includes all fields needed for delta calculations. * use nvidia-reasoning by default * mean -> average in log message * refactor log indentation to use centralized LOG_INDENT constant - Add LOG_INDENT constant to logging.py for consistent indentation - Replace hardcoded " |-- " strings across all log statements - Add tool alias and MCP provider info to pre-generation logs - Improve model usage log format for better consistency - Update tests to match new log formats * simplify usage stats dict access in model registry Remove defensive .get() calls and unnecessary type casts since the usage statistics dictionary structure is now guaranteed. * walrus baby * simplify tool usage tracking and reduce log verbosity - Remove mean/stddev calculations from ToolUsageStats in favor of simple counts and generation ratios - Add total_generations field to track all tool-enabled generations - Simplify registry log output to show generations ratio (with_tools/total) - Remove per-column tool usage snapshot/delta logging from column builder - Track tool usage for all tool-enabled generations, not just those with calls * format inference parameters as multi-line log output - Add get_formatted_params() method to BaseInferenceParams - Add LOG_DOUBLE_INDENT constant for nested indentation - Update log_pre_generation() to display each parameter on its own line * update tests to use LOG_INDENT constants Align test assertions with the centralized log indentation constants introduced in the logging module refactor. * two-space consistency |
||
|
|
e6e58e692e
|
feat: MCP (Model Context Protocol) tool calling integration for LLM columns (#248) | ||
|
|
0d51539aa6
|
feat: add message trace support for LLM generation (#272)
Add support for capturing full conversation traces during LLM generation, enabling debugging and fine-tuning dataset creation. Changes: - Add `with_trace` field to LLMTextColumnConfig for per-column trace control - Add `debug_override_save_all_column_traces` to RunConfig for global trace - Introduce ChatMessage dataclass for structured message representation - Update ModelFacade.generate() to return full message trace - Rename trace column postfix from `__reasoning_trace` to `__trace` - Add comprehensive traces documentation Traces capture system/user/assistant messages in order, enabling visibility into the full generation conversation including correction retries. |
||
|
|
3d86a38dad
|
feat: support multiple images per column in image context (#257)
* allow image context column to have multiple images * pack multimodal context at the front before user text messages * Fix edge case with numpy array |
||
|
|
b238d06880
|
feat: allow skipping health checks (#244) | ||
|
|
c19f35639f
|
chore: add publish script and update license headers (#253) | ||
|
|
ae0665fa16
|
refactor: slim package refactor into three subpackages (#240)
* remove old structure * major shuffle * streamline project configs * update make commands * updates to make commands * remove essentials * initialize logger in interface * uv lock * ignore notepad * update workflows * fix e2e project config * generate colab notebooks * resolve default model settings in interface * fix build commands * update perf import make command * cleaning up some slop * update recipes * move conftest files to tests/ * update subpackage readmes * streamline config_logging * use exports * update perf import usage pattern * update for IDE behavior with ruff * remove engine's fixtures file * add note to about lazy imports * update dependencies * update docs * doc fixes * uv lock * updates to catch up with main * clean up makefile * remove package gitignores * define deps only once * isolate tests * add test for protetion rule * create temp dirs for isolated tests * catch up to main * update headers * re apply changes * better result summaries for isolated tests * move exports into top-level init * fix client importlib version syntax * catch up with main |