* feat(models): deprecate implicit default provider routing
Emit DeprecationWarning whenever the legacy "implicit default
provider" path is exercised: `ModelConfig.provider=None`, the
registry-level `ModelProviderRegistry.default`, the YAML
`default:` key in `~/.data-designer/model_providers.yaml`, and
the CLI's "Change default provider" workflow.
`resolve_model_provider_registry` skips passing `default=` in the
single-provider case so the common construction path stays quiet.
Multi-provider registries still pass `default` (per
`check_implicit_default`) and warn accordingly.
Update docs, the package README, and test fixtures to specify
`provider=` explicitly on every `ModelConfig`. New tests cover
each warning entry point and pin the post-deprecation happy paths.
Refs #589
Made-with: Cursor
* fix(models): address PR #594 review feedback
Greptile P1: ProviderRepository.load emitted its DeprecationWarning
inside a `try/except Exception` block. Under
`filterwarnings("error", DeprecationWarning)` the warn would raise,
the except would swallow it, and `load()` would silently return None
(losing the registry). Move the warn outside the catch-all so the
strict-warning path no longer drops valid configs.
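The failure mode is easy to reproduce with the standard library alone. The sketch below is illustrative, not the actual `ProviderRepository.load` code: `load_buggy`, `load_fixed`, and `run_strict` are hypothetical names, and the dictionary stands in for a parsed registry.

```python
import warnings


def load_buggy():
    """Warn inside the catch-all: a warning promoted to an error is swallowed."""
    try:
        warnings.warn("YAML `default:` is deprecated", DeprecationWarning, stacklevel=2)
        return {"providers": ["nvidia"]}
    except Exception:
        # DeprecationWarning subclasses Exception, so it lands here too.
        return None


def load_fixed():
    """Parse inside the catch-all, warn outside it."""
    try:
        registry = {"providers": ["nvidia"]}
    except Exception:
        return None
    warnings.warn("YAML `default:` is deprecated", DeprecationWarning, stacklevel=2)
    return registry


def run_strict(loader):
    """Call `loader` with DeprecationWarning promoted to an error."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            return ("returned", loader())
        except DeprecationWarning:
            return ("raised", None)
```

Under the strict filter, `load_buggy` silently returns `None`, while `load_fixed` lets the warning propagate to the caller as an exception instead of discarding the registry.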
Greptile P2 / johnnygreco: `_warn_on_implicit_provider` and
`_warn_on_explicit_default` used `stacklevel=2`, which landed inside
pydantic v2's validator dispatch rather than at the user's
`ModelConfig(...)` / `ModelProviderRegistry(...)` call. That broke
both attribution (the source line was unhelpful) and Python's
once-per-location dedup (every call collapsed to the same
pydantic-internal key, suppressing all but the first warning).
Introduce `data_designer.config.utils.warning_helpers.warn_at_caller`,
which walks past the helper, validator, and any pydantic frames to
find the user's call site and emits via `warnings.warn_explicit` with
the user frame's `__warningregistry__`. Keeps attribution accurate
and dedup keyed on the user's (filename, lineno).
johnnygreco: align the `provider_repository.py` warning copy with the
sibling site in `default_model_settings.py` ("specify provider=
explicitly on each ModelConfig instead") so both YAML-default warning
sites give the same migration instruction. The previous wording
pointed users at "ModelConfig entries" inside `model_providers.yaml`,
where ModelConfig entries don't actually live.
johnnygreco: dedup the cascade in `DataDesigner.__init__`. With
`model_providers=None` and a YAML `default:`, the user previously saw
two DeprecationWarnings for the same root cause —
`get_default_provider_name()` warns about the YAML key, then
`resolve_model_provider_registry(...)` re-warns from
`_warn_on_explicit_default`. Suppress the registry-level duplicate in
the YAML-fallback branch via `warnings.catch_warnings()` so users see
exactly one warning per user action.
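The suppression pattern can be sketched with stand-in functions. `get_default_name`, `build_registry`, and `init_from_yaml` are hypothetical and only mimic the shape of the cascade described above, not the `DataDesigner.__init__` code itself.

```python
import warnings


def get_default_name():
    """Stand-in for the YAML lookup: warns about the `default:` key."""
    warnings.warn("YAML `default:` key is deprecated", DeprecationWarning, stacklevel=2)
    return "nvidia"


def build_registry(default=None):
    """Stand-in for registry construction: warns only on a non-None default."""
    if default is not None:
        warnings.warn("registry-level default is deprecated", DeprecationWarning, stacklevel=2)
    return {"default": default}


def init_from_yaml():
    """Warn once for the YAML key, then silence the downstream duplicate."""
    name = get_default_name()
    with warnings.catch_warnings():
        # Scoped to this block; the caller's filters are restored on exit.
        warnings.simplefilter("ignore", DeprecationWarning)
        return build_registry(default=name)
```

`catch_warnings()` keeps the suppression local to the YAML-fallback branch, so a registry built with an explicit default elsewhere still warns.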
johnnygreco: tighten `_warn_on_explicit_default` to fire only when
`default is not None`. Passing `default=None` explicitly is
semantically equivalent to omitting it (caller is opting *out* of a
registry-level default), and shouldn't trigger the deprecation
nudge.
johnnygreco: add a `model_validate({...})` regression test for
`ModelConfig` so the deserialization path (legacy on-disk configs)
is pinned alongside the construction path.
Tests:
- Update `test_load_exists` and `test_save` to omit `default=` so the
roundtrip stops exercising the deprecated YAML-default path
unguarded (Greptile note).
- Wrap `test_resolve_model_provider_registry_with_explicit_default`,
`test_get_provider`, and
`test_init_user_supplied_providers_preserve_first_wins_over_yaml_default`
in `pytest.warns` so the suite stays green under
`-W error::DeprecationWarning` (Greptile note).
- Add `test_explicit_default_none_does_not_emit_deprecation_warning`
to pin the tightened predicate.
- Add `test_init_yaml_default_emits_single_deprecation_warning` to
pin the cascade-dedup behavior.
Refs #589
Made-with: Cursor
* fix(models): make deprecation warnings visible under default filters
andreatgretel (PR #594): the YAML-default warning in
`get_default_provider_name` and the registry-default warning emitted
from inside DataDesigner helpers were attributing to data_designer
library frames, not user code. Python's default filter chain includes
`ignore::DeprecationWarning`, so library-attributed entries are
silenced — meaning a normal `DataDesigner()` call with a YAML
`default:` set showed nothing, and `resolve_model_provider_registry`
warnings were similarly invisible. Two related changes:
1. `warn_at_caller`: extend the default skip-list from `("pydantic",)`
to `("pydantic", "pydantic_core", "data_designer")` so the walk
escapes both pydantic's validator-dispatch frames and data_designer
helper frames before attributing. Also tighten the prefix predicate
to exact-or-dotted-prefix matching (`name == p or
name.startswith(p + ".")`) so e.g. `pydantic_helpers` is not
falsely matched as part of `pydantic` (johnnygreco nit). Allow
callers to pass a custom `skip_prefixes` for flexibility. Drop the
"skip frame 0+1 unconditionally" guard now that prefix matching
covers it.
2. `get_default_provider_name`: switch from
`warnings.warn(stacklevel=2)` to `warn_at_caller`. The previous
stacklevel pointed into `default_model_settings.py`, which is a
library file → silenced under default filters. Verified the fix
empirically with `python -W default`: warning is now attributed to
the user's call site and rendered.
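To see why attribution decides visibility, the sketch below recreates only the two `DeprecationWarning` entries of CPython's default filter chain and emits a warning attributed to a given module name. `shown_under_default_filters` is a hypothetical helper for illustration; the filename and line number passed to `warn_explicit` are placeholders.

```python
import warnings


def shown_under_default_filters(module_name):
    """Return True if a DeprecationWarning attributed to `module_name` is displayed."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.resetwarnings()
        # filterwarnings() inserts at the front of the chain, so the
        # lower-priority entry goes in first: ignore::DeprecationWarning
        warnings.filterwarnings("ignore", category=DeprecationWarning)
        # ...followed by the higher-priority one: default::DeprecationWarning:__main__
        warnings.filterwarnings("default", category=DeprecationWarning, module="__main__")
        warnings.warn_explicit(
            "implicit default provider is deprecated",
            DeprecationWarning,
            filename="example.py",
            lineno=1,
            module=module_name,
            registry={},
        )
    return len(caught) == 1
```

A warning attributed to `__main__` (a user's script) passes the first filter, while one attributed to a module such as `data_designer.config.default_model_settings` falls through to the blanket `ignore` entry.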
johnnygreco (PR #594): add the missing
`test_explicit_default_none_does_not_emit_deprecation_warning`
regression for the `self.default is not None` predicate landed in
the prior round.
Tests:
- New `test_warning_helpers.py` pins prefix-matching precision
(rejects `pydantic_helpers` / `data_designer_other`), default
skip-list contents, attribution past skip-prefix frames, and
per-call-site dedup behavior.
- `test_get_default_provider_name_warning_attributes_to_user_frame`
pins andreatgretel's repro for the YAML-default site.
- `test_explicit_default_warning_attributes_to_user_frame` pins the
multi-frame case: construction goes through
`resolve_model_provider_registry`, so the walk has to escape both
pydantic and data_designer before landing on the test file.
- `test_explicit_default_none_does_not_emit_deprecation_warning`
pins johnnygreco's predicate-tightening regression.
3,124 tests pass (548 config + 1,923 engine + 653 interface; +10 net
from this round).
Refs #589
Made-with: Cursor
* fix(models): apply warn_at_caller to remaining deprecation sites
greptile-apps (PR #594, r3189904028): `ProviderRepository.load`'s
YAML-default `DeprecationWarning` was using `warnings.warn(stacklevel=2)`,
which attributes to whichever data_designer frame called `load()` —
controllers, services, list/reset commands, agent introspection. Every
real call path lands on `data_designer.cli.*`, which falls under
Python's default `ignore::DeprecationWarning` filter and is silenced.
Audit found two more sites with the same problem:
- `DatasetBuilder._resolve_async_compatibility` (`allow_resize` /
issue #552) — was using `stacklevel=4` to walk past
`_resolve_async_compatibility -> build/build_preview -> interface ->
user`. Brittle: any added frame (decorator, async wrapping, the
`try/except DeprecationWarning: raise` boundary) shifts attribution
silently. The existing test passed only because it used
`simplefilter("always") + record=True`, which records warnings
regardless of attribution.
- `ProviderController._handle_change_default` — was using
`stacklevel=2`, which lands on the menu dispatcher in the same
controller module. `print_warning` already shows the message
visually, but programmatic observers (`pytest.warns`,
`filterwarnings("error", ...)`) saw a library-attributed entry that
default filters silenced.
All three migrated to `warn_at_caller` (the helper from 247fa30) so
attribution lands on the user's call site regardless of internal
chain shape. `data_designer` is already in
`DEFAULT_INTERNAL_PREFIXES`, so the walk escapes the entire library
in one pass.
Added attribution regression tests at each site asserting
`warning.filename == __file__`. A future regression to
`warnings.warn(stacklevel=N)` now fails CI instead of silently
silencing the user-facing nudge:
- `test_load_with_yaml_default_attributes_warning_to_caller`
(test_provider_repository.py)
- `test_resolve_async_compatibility` extended with the same assertion
- `test_handle_change_default_emits_deprecation_warning` rewritten
from `pytest.warns(...)` to a `catch_warnings(record=True)` block
that filters for the message and asserts `filename == __file__`
(`pytest.warns` does not check attribution, so the rewrite is
required to actually catch the regression).
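The shape of such an attribution check can be sketched as follows. `deprecated_api` and `check_attribution` are hypothetical stand-ins; a real test module would normally compare against `__file__`, whereas this sketch reads the current frame's filename so it also runs outside a file.

```python
import sys
import warnings


def deprecated_api():
    """Hypothetical deprecated entry point; stacklevel=2 targets its caller."""
    warnings.warn("deprecated_api is deprecated", DeprecationWarning, stacklevel=2)


def check_attribution():
    """Record the warning and compare its filename to this frame's file."""
    this_file = sys._getframe().f_code.co_filename
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        deprecated_api()
    # Filter by message so unrelated warnings do not affect the check.
    matches = [w for w in caught if "deprecated_api" in str(w.message)]
    return len(matches) == 1 and matches[0].filename == this_file
```

Asserting on `filename` is what distinguishes this from a plain `pytest.warns(...)` block, which passes as long as a matching warning is raised from anywhere.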
3,125 tests pass (548 config + 1,923 engine + 654 interface).
Refs #589
Model Providers
Model providers are external services that host and serve models. Data Designer uses the ModelProvider class to configure connections to these services.
Overview
A ModelProvider defines how Data Designer connects to a provider's API endpoint. When you create a ModelConfig, you reference a provider by name, and Data Designer uses that provider's settings to make API calls to the appropriate endpoint.
!!! warning "Deprecated: implicit default provider routing"
Earlier versions of Data Designer let you omit provider= on ModelConfig and
fall back to a registry-level default — including the default: key in
~/.data-designer/model_providers.yaml. That implicit routing is deprecated
and will be removed in a future release. Always reference a provider by name on
every ModelConfig. A DeprecationWarning is now emitted when the legacy path
is exercised. See issue #589.
ModelProvider Configuration
The ModelProvider class has the following fields:
| Field | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Unique identifier for the provider (e.g., "nvidia", "openai", "openrouter") |
| endpoint | str | Yes | API endpoint URL (e.g., "https://integrate.api.nvidia.com/v1") |
| provider_type | str | No | Provider type: "openai" (default) or "anthropic". See Supported Provider Types below |
| api_key | str | No | API key or environment variable name (e.g., "NVIDIA_API_KEY") |
| extra_body | dict[str, Any] | No | Additional parameters to include in the request body of all API requests to the provider. |
| extra_headers | dict[str, str] | No | Additional headers to include in all API requests to the provider. |
Supported Provider Types
Data Designer supports two provider types:
| Type | Description |
|---|---|
| "openai" | OpenAI-compatible chat completion API. This is the default and works with most providers, including NVIDIA NIM, vLLM, TGI, OpenRouter, Together AI, and OpenAI itself. |
| "anthropic" | Anthropic's native Messages API for Claude models. Use this when connecting directly to Anthropic's API. |
Most self-hosted and third-party endpoints expose an OpenAI-compatible API, so provider_type="openai" is the right choice in the majority of cases. Only use "anthropic" when connecting directly to Anthropic's API at https://api.anthropic.com.
Note: Previous versions of Data Designer supported additional provider types (e.g., "azure", "bedrock", "vertex_ai") via a LiteLLM bridge. These are no longer supported. If you were using one of these types, switch to provider_type="openai" and point the endpoint to an OpenAI-compatible proxy or gateway for that service.
API Key Configuration
The api_key field can be specified in two ways:
- Environment variable name (recommended): Set api_key to the name of an environment variable (e.g., "NVIDIA_API_KEY"). Data Designer will automatically resolve it at runtime.
- Plain-text value: Set api_key to the actual API key string. This is less secure and not recommended for production use.
```python
# Method 1: Environment variable (recommended)
provider = ModelProvider(
    name="nvidia",
    endpoint="https://integrate.api.nvidia.com/v1",
    api_key="NVIDIA_API_KEY",  # Will be resolved from environment
)

# Method 2: Direct value (not recommended)
provider = ModelProvider(
    name="nvidia",
    endpoint="https://integrate.api.nvidia.com/v1",
    api_key="nvapi-abc123...",  # Direct API key
)
```
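One plausible way to picture the two methods is a lookup that treats the value as an environment variable name first and falls back to the literal string. The sketch below is an assumption for illustration only: `resolve_api_key` is a hypothetical function, and Data Designer's actual resolution logic may differ.

```python
import os


def resolve_api_key(api_key):
    """Return the named environment variable's value if set, else the literal.

    Hypothetical helper: not part of Data Designer's API.
    """
    if api_key is None:
        return None
    return os.environ.get(api_key, api_key)
```

With this reading, `api_key="NVIDIA_API_KEY"` yields whatever that variable holds at runtime, so the secret never needs to appear in a config file or notebook.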
See Also
- Model Configurations: Learn about configuring models
- Inference Parameters: Detailed guide to inference parameters and how to configure them
- Default Model Settings: Pre-configured providers and model settings included with Data Designer
- Custom Model Settings: Learn how to create custom providers and model configurations
- Configure Model Settings With the CLI: Use the CLI to manage providers and model settings
- Getting Started: Installation and basic usage example