Elgato_dark/DataDesigner

mirror of https://github.com/NVIDIA-NeMo/DataDesigner synced 2026-05-24 09:48:29 +00:00

Nabin Mulepati f73da1975c

feat(models): deprecate implicit default provider routing (#594)

* feat(models): deprecate implicit default provider routing

Emit DeprecationWarning whenever the legacy "implicit default
provider" path is exercised: `ModelConfig.provider=None`, the
registry-level `ModelProviderRegistry.default`, the YAML
`default:` key in `~/.data-designer/model_providers.yaml`, and
the CLI's "Change default provider" workflow.

`resolve_model_provider_registry` skips passing `default=` in the
single-provider case so the common construction path stays quiet.
Multi-provider registries still pass `default` (per
`check_implicit_default`) and warn accordingly.

Update docs, the package README, and test fixtures to specify
`provider=` explicitly on every `ModelConfig`. New tests cover
each warning entry point and pin the post-deprecation happy paths.

Refs #589

Made-with: Cursor

* fix(models): address PR #594 review feedback

Greptile P1: ProviderRepository.load emitted its DeprecationWarning
inside a `try/except Exception` block. Under
`filterwarnings("error", DeprecationWarning)` the warn would raise,
the except would swallow it, and `load()` would silently return None
(losing the registry). Move the warn outside the catch-all so the
strict-warning path no longer drops valid configs.
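
A minimal, self-contained sketch of the failure mode described above. It uses stand-in loaders rather than the real `ProviderRepository.load`; the function names and the registry dict are hypothetical. `DeprecationWarning` is a subclass of `Exception`, so once a filter escalates it to an error, a catch-all `except Exception` swallows it:

```python
import warnings


def load_swallowing():
    """Warn inside a catch-all: an escalated warning is caught and lost."""
    try:
        warnings.warn("YAML `default:` is deprecated", DeprecationWarning, stacklevel=2)
        return {"providers": ["nvidia"]}  # hypothetical parsed registry
    except Exception:
        return None


def load_fixed():
    """Parse inside the catch-all, then warn outside it."""
    try:
        registry = {"providers": ["nvidia"]}  # hypothetical parsed registry
    except Exception:
        return None
    warnings.warn("YAML `default:` is deprecated", DeprecationWarning, stacklevel=2)
    return registry


def run_strict(loader):
    """Call `loader` with DeprecationWarning escalated to an error."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            return loader()
        except DeprecationWarning:
            return "raised"
```

In this sketch `run_strict(load_swallowing)` returns `None` (the registry is silently lost), while `run_strict(load_fixed)` surfaces the warning to the caller as an exception.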

Greptile P2 / johnnygreco: `_warn_on_implicit_provider` and
`_warn_on_explicit_default` use `stacklevel=2`, which lands inside
pydantic v2's validator dispatch rather than at the user's
`ModelConfig(...)` / `ModelProviderRegistry(...)` call. That broke
both attribution (the source line was unhelpful) and Python's
once-per-location dedup (every call collapsed to the same
pydantic-internal key, suppressing all but the first warning).
Introduce `data_designer.config.utils.warning_helpers.warn_at_caller`,
which walks past the helper, validator, and any pydantic frames to
find the user's call site and emits via `warnings.warn_explicit` with
the user frame's `__warningregistry__`. Keeps attribution accurate
and dedup keyed on the user's (filename, lineno).
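
The frame walk can be sketched roughly as follows. This is not the actual `warn_at_caller` implementation, whose source is not shown here; the name `warn_at_caller_sketch`, the `skip_packages` parameter, and the top-level-package matching are assumptions made for illustration. It relies on `sys._getframe`, which is a CPython implementation detail:

```python
import sys
import warnings


def warn_at_caller_sketch(message, category=DeprecationWarning, skip_packages=("pydantic",)):
    """Emit `message` attributed to the first frame outside `skip_packages`."""
    frame = sys._getframe(1)  # start at whoever called this helper
    while frame.f_back is not None:
        module_name = frame.f_globals.get("__name__", "")
        # Compare on the top-level package so "pydantic.main" is skipped
        # but an unrelated "pydantic_helpers" module is not.
        if module_name.split(".")[0] not in skip_packages:
            break
        frame = frame.f_back
    warnings.warn_explicit(
        message,
        category,
        filename=frame.f_code.co_filename,
        lineno=frame.f_lineno,
        module=frame.f_globals.get("__name__", "<unknown>"),
        # Using the user frame's registry keeps once-per-location dedup
        # keyed on the user's call site rather than on a library frame.
        registry=frame.f_globals.setdefault("__warningregistry__", {}),
    )
```

Because the filename and line number come from the user's frame, `python -W default` would render the warning against the line that constructed the object rather than a validator inside the library.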

johnnygreco: align the `provider_repository.py` warning copy with the
sibling site in `default_model_settings.py` ("specify provider=
explicitly on each ModelConfig instead") so both YAML-default warning
sites give the same migration instruction. The previous wording
pointed users at "ModelConfig entries" inside `model_providers.yaml`,
where ModelConfig entries don't actually live.

johnnygreco: dedup the cascade in `DataDesigner.__init__`. With
`model_providers=None` and a YAML `default:`, the user previously saw
two DeprecationWarnings for the same root cause —
`get_default_provider_name()` warns about the YAML key, then
`resolve_model_provider_registry(...)` re-warns from
`_warn_on_explicit_default`. Suppress the registry-level duplicate in
the YAML-fallback branch via `warnings.catch_warnings()` so users see
exactly one warning per user action.
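
A rough sketch of that dedup pattern, with hypothetical stand-ins for `get_default_provider_name` and the registry constructor (none of the function names or dict shapes below come from the real code base):

```python
import warnings


def get_default_provider_name_sketch(yaml_config):
    """Return the YAML `default:` value, warning if it is set."""
    default = yaml_config.get("default")
    if default is not None:
        warnings.warn("YAML `default:` is deprecated", DeprecationWarning, stacklevel=2)
    return default


def build_registry_sketch(providers, default=None):
    """Build a registry dict, warning if a registry-level default is passed."""
    if default is not None:
        warnings.warn("registry-level default is deprecated", DeprecationWarning, stacklevel=2)
    return {"providers": providers, "default": default}


def init_from_yaml_sketch(yaml_config):
    """YAML-fallback branch: one root cause, so surface only one warning."""
    default = get_default_provider_name_sketch(yaml_config)  # warns once
    with warnings.catch_warnings():
        # Suppress the duplicate the registry would emit for the same key.
        warnings.simplefilter("ignore", DeprecationWarning)
        return build_registry_sketch(yaml_config["providers"], default=default)
```

Note that `warnings.catch_warnings()` mutates process-wide filter state, so this pattern is best kept to short, single-threaded sections.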

johnnygreco: tighten `_warn_on_explicit_default` to fire only when
`default is not None`. Passing `default=None` explicitly is
semantically equivalent to omitting it (caller is opting *out* of a
registry-level default), and shouldn't trigger the deprecation
nudge.

johnnygreco: add a `model_validate({...})` regression test for
`ModelConfig` so the deserialization path (legacy on-disk configs)
is pinned alongside the construction path.

Tests:
- Update `test_load_exists` and `test_save` to omit `default=` so the
  roundtrip stops exercising the deprecated YAML-default path
  unguarded (Greptile note).
- Wrap `test_resolve_model_provider_registry_with_explicit_default`,
  `test_get_provider`, and
  `test_init_user_supplied_providers_preserve_first_wins_over_yaml_default`
  in `pytest.warns` so the suite stays green under
  `-W error::DeprecationWarning` (Greptile note).
- Add `test_explicit_default_none_does_not_emit_deprecation_warning`
  to pin the tightened predicate.
- Add `test_init_yaml_default_emits_single_deprecation_warning` to
  pin the cascade-dedup behavior.

Refs #589

Made-with: Cursor

* fix(models): make deprecation warnings visible under default filters

andreatgretel (PR #594): the YAML-default warning in
`get_default_provider_name` and the registry-default warning emitted
from inside DataDesigner helpers were attributing to data_designer
library frames, not user code. Python's default filter chain includes
`ignore::DeprecationWarning`, so library-attributed entries are
silenced — meaning a normal `DataDesigner()` call with a YAML
`default:` set showed nothing, and `resolve_model_provider_registry`
warnings were similarly invisible. Two related changes:

1. `warn_at_caller`: extend the default skip-list from `("pydantic",)`
   to `("pydantic", "pydantic_core", "data_designer")` so the walk
   escapes both pydantic's validator-dispatch frames and data_designer
   helper frames before attributing. Also tighten the prefix predicate
   to exact-or-dotted-prefix matching (`name == p or
   name.startswith(p + ".")`) so e.g. `pydantic_helpers` is not
   falsely matched as part of `pydantic` (johnnygreco nit). Allow
   callers to pass a custom `skip_prefixes` for flexibility. Drop the
   "skip frame 0+1 unconditionally" guard now that prefix matching
   covers it.

2. `get_default_provider_name`: switch from
   `warnings.warn(stacklevel=2)` to `warn_at_caller`. The previous
   stacklevel pointed into `default_model_settings.py`, which is a
   library file → silenced under default filters. Verified the fix
   empirically with `python -W default`: warning is now attributed to
   the user's call site and rendered.
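
The exact-or-dotted-prefix predicate from item 1 can be illustrated in isolation. The function names below are hypothetical; only the predicate expression and the three skip-list entries are taken from the description above:

```python
# Skip-list entries as listed in item 1; the constant name is illustrative.
SKIP_PREFIXES_SKETCH = ("pydantic", "pydantic_core", "data_designer")


def naive_prefix_match(module_name, prefixes):
    """Plain startswith: also matches unrelated modules sharing a prefix."""
    return module_name.startswith(tuple(prefixes))


def exact_or_dotted_match(module_name, prefixes):
    """True only for a listed package itself or one of its submodules."""
    return any(module_name == p or module_name.startswith(p + ".") for p in prefixes)
```

With the naive version, a user module named `pydantic_helpers` would be treated as internal and skipped; the tightened version only skips `pydantic`, `pydantic.<anything>`, and the other listed packages.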

johnnygreco (PR #594): add the missing
`test_explicit_default_none_does_not_emit_deprecation_warning`
regression for the `self.default is not None` predicate landed in
the prior round.

Tests:
- New `test_warning_helpers.py` pins prefix-matching precision
  (rejects `pydantic_helpers` / `data_designer_other`), default
  skip-list contents, attribution past skip-prefix frames, and
  per-call-site dedup behavior.
- `test_get_default_provider_name_warning_attributes_to_user_frame`
  pins andreatgretel's repro for the YAML-default site.
- `test_explicit_default_warning_attributes_to_user_frame` pins the
  multi-frame case: construction goes through
  `resolve_model_provider_registry`, so the walk has to escape both
  pydantic and data_designer before landing on the test file.
- `test_explicit_default_none_does_not_emit_deprecation_warning`
  pins johnnygreco's predicate-tightening regression.

3,124 tests pass (548 config + 1,923 engine + 653 interface; +10 net
from this round).

Refs #589

Made-with: Cursor

* fix(models): apply warn_at_caller to remaining deprecation sites

greptile-apps (PR #594, r3189904028): `ProviderRepository.load`'s
YAML-default `DeprecationWarning` was using `warnings.warn(stacklevel=2)`,
which attributes to whichever data_designer frame called `load()` —
controllers, services, list/reset commands, agent introspection. Every
real call path lands on `data_designer.cli.*`, which falls under
Python's default `ignore::DeprecationWarning` filter and is silenced.
Audit found two more sites with the same problem:

- `DatasetBuilder._resolve_async_compatibility` (`allow_resize` /
  issue #552) — was using `stacklevel=4` to walk past
  `_resolve_async_compatibility -> build/build_preview -> interface ->
  user`. Brittle: any added frame (decorator, async wrapping, the
  `try/except DeprecationWarning: raise` boundary) shifts attribution
  silently. The existing test passed only because it used
  `simplefilter("always") + record=True`, which records warnings
  regardless of attribution.
- `ProviderController._handle_change_default` — was using
  `stacklevel=2`, which lands on the menu dispatcher in the same
  controller module. `print_warning` already shows the message
  visually, but programmatic observers (`pytest.warns`,
  `filterwarnings("error", ...)`) saw a library-attributed entry that
  default filters silenced.

All three migrated to `warn_at_caller` (the helper from 247fa30) so
attribution lands on the user's call site regardless of internal
chain shape. `data_designer` is already in
`DEFAULT_INTERNAL_PREFIXES`, so the walk escapes the entire library
in one pass.

Added attribution regression tests at each site asserting
`warning.filename == __file__`. A future regression to
`warnings.warn(stacklevel=N)` now fails CI instead of silently
silencing the user-facing nudge:

- `test_load_with_yaml_default_attributes_warning_to_caller`
  (test_provider_repository.py)
- `test_resolve_async_compatibility` extended with the same assertion
- `test_handle_change_default_emits_deprecation_warning` rewritten
  from `pytest.warns(...)` to a `catch_warnings(record=True)` block
  that filters for the message and asserts `filename == __file__`
  (`pytest.warns` does not check attribution, so the rewrite is
  required to actually catch the regression).
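
Why a message-only check is not enough can be shown with a small sketch. The "library" here is a fake module compiled under the filename `fake_library.py`; every name in it is hypothetical and stands in for the real call sites:

```python
import warnings

LIBRARY_SOURCE = '''
import warnings

def emit(stacklevel):
    warnings.warn("default provider is deprecated", DeprecationWarning, stacklevel=stacklevel)
'''


def load_fake_library():
    """Compile the fake library so its frames report 'fake_library.py'."""
    namespace = {"__name__": "fake_library", "__file__": "fake_library.py"}
    exec(compile(LIBRARY_SOURCE, "fake_library.py", "exec"), namespace)
    return namespace["emit"]


def recorded_filenames(emit, stacklevel):
    """Record matching warnings and return the files they are attributed to."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        emit(stacklevel)
    return [w.filename for w in caught if "deprecated" in str(w.message)]
```

Both `stacklevel=1` and `stacklevel=2` produce a warning with the same message and category, so a message-only assertion passes either way. Only the recorded filename differs: `fake_library.py` in the first case, the calling file in the second, which is what an assertion like `warning.filename == __file__` pins down.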

3,125 tests pass (548 config + 1,923 engine + 654 interface).

Refs #589

2026-05-05 13:39:12 -06:00

Model Providers

Model providers are external services that host and serve models. Data Designer uses the ModelProvider class to configure connections to these services.

Overview

A ModelProvider defines how Data Designer connects to a provider's API endpoint. When you create a ModelConfig, you reference a provider by name, and Data Designer uses that provider's settings to make API calls to the appropriate endpoint.

!!! warning "Deprecated: implicit default provider routing"
    Earlier versions of Data Designer let you omit `provider=` on `ModelConfig` and fall back to a registry-level default, including the `default:` key in `~/.data-designer/model_providers.yaml`. That implicit routing is deprecated and will be removed in a future release. Always reference a provider by name on every `ModelConfig`. A `DeprecationWarning` is now emitted when the legacy path is exercised. See issue #589.
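
To make the difference between explicit and implicit routing concrete, here is a minimal sketch built on stand-in types. `ProviderSketch` and `resolve_provider_sketch` are hypothetical and do not reflect Data Designer's internal lookup; the second endpoint is a made-up local URL:

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderSketch:
    """Hypothetical stand-in for ModelProvider (two fields only)."""

    name: str
    endpoint: str


def resolve_provider_sketch(provider_name, providers, default=None):
    """Look up a provider by name; warn when falling back to a default."""
    by_name = {p.name: p for p in providers}
    if provider_name is None:
        if default is None:
            raise ValueError("no provider named and no default configured")
        warnings.warn(
            "implicit default provider routing is deprecated; "
            "specify provider= explicitly",
            DeprecationWarning,
            stacklevel=2,
        )
        provider_name = default
    return by_name[provider_name]


PROVIDERS = [
    ProviderSketch("nvidia", "https://integrate.api.nvidia.com/v1"),
    ProviderSketch("local-vllm", "http://localhost:8000/v1"),  # assumed endpoint
]
```

Calling `resolve_provider_sketch("nvidia", PROVIDERS)` stays quiet, while `resolve_provider_sketch(None, PROVIDERS, default="nvidia")` returns the same provider but emits a `DeprecationWarning`, mirroring the migration the warning above asks for.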

ModelProvider Configuration

The ModelProvider class has the following fields:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | `str` | Yes | Unique identifier for the provider (e.g., `"nvidia"`, `"openai"`, `"openrouter"`) |
| `endpoint` | `str` | Yes | API endpoint URL (e.g., `"https://integrate.api.nvidia.com/v1"`) |
| `provider_type` | `str` | No | Provider type: `"openai"` (default) or `"anthropic"`. See Supported Provider Types below |
| `api_key` | `str` | No | API key or environment variable name (e.g., `"NVIDIA_API_KEY"`) |
| `extra_body` | `dict[str, Any]` | No | Additional parameters to include in the request body of all API requests to the provider. |
| `extra_headers` | `dict[str, str]` | No | Additional headers to include in all API requests to the provider. |

Supported Provider Types

Data Designer supports two provider types:

| Type | Description |
|------|-------------|
| `"openai"` | OpenAI-compatible chat completion API. This is the default and works with most providers, including NVIDIA NIM, vLLM, TGI, OpenRouter, Together AI, and OpenAI itself. |
| `"anthropic"` | Anthropic's native Messages API for Claude models. Use this when connecting directly to Anthropic's API. |

Most self-hosted and third-party endpoints expose an OpenAI-compatible API, so provider_type="openai" is the right choice in the majority of cases. Only use "anthropic" when connecting directly to Anthropic's API at https://api.anthropic.com.

Note: Previous versions of Data Designer supported additional provider types (e.g., "azure", "bedrock", "vertex_ai") via a LiteLLM bridge. These are no longer supported. If you were using one of these types, switch to provider_type="openai" and point the endpoint to an OpenAI-compatible proxy or gateway for that service.

API Key Configuration

The api_key field can be specified in two ways:

1. Environment variable name (recommended): Set `api_key` to the name of an environment variable (e.g., `"NVIDIA_API_KEY"`). Data Designer will automatically resolve it at runtime.
2. Plain-text value: Set `api_key` to the actual API key string. This is less secure and not recommended for production use.

```python
# Assumes `ModelProvider` has already been imported from Data Designer's config API.

# Method 1: Environment variable (recommended)
provider = ModelProvider(
    name="nvidia",
    endpoint="https://integrate.api.nvidia.com/v1",
    api_key="NVIDIA_API_KEY",  # Name of the environment variable; resolved at runtime
)

# Method 2: Direct value (not recommended)
provider = ModelProvider(
    name="nvidia",
    endpoint="https://integrate.api.nvidia.com/v1",
    api_key="nvapi-abc123...",  # Literal API key stored in plain text
)
```
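
How the two forms might be told apart at runtime can be sketched as follows. This is an assumption for illustration only: the actual resolution logic is not described here, and `resolve_api_key_sketch` is a hypothetical helper, not part of Data Designer's API.

```python
import os


def resolve_api_key_sketch(api_key, environ=None):
    """Treat `api_key` as an env var name if one is set, else as the literal key.

    Assumed behavior for illustration; not Data Designer's documented logic.
    """
    if api_key is None:
        return None
    env = os.environ if environ is None else environ
    # If a variable with this name exists, use its value; otherwise fall
    # back to treating the string itself as the key.
    return env.get(api_key, api_key)
```

Under this reading, `api_key="NVIDIA_API_KEY"` yields the variable's value when it is set, and a literal string passes through unchanged. A practical consequence is that a misspelled variable name would be sent as if it were the key, which is one more reason to verify the environment before running.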
