* feat(models): deprecate implicit default provider routing
Emit DeprecationWarning whenever the legacy "implicit default
provider" path is exercised: `ModelConfig.provider=None`, the
registry-level `ModelProviderRegistry.default`, the YAML
`default:` key in `~/.data-designer/model_providers.yaml`, and
the CLI's "Change default provider" workflow.
`resolve_model_provider_registry` skips passing `default=` in the
single-provider case so the common construction path stays quiet.
Multi-provider registries still pass `default` (per
`check_implicit_default`) and warn accordingly.
Update docs, the package README, and test fixtures to specify
`provider=` explicitly on every `ModelConfig`. New tests cover
each warning entry point and pin the post-deprecation happy paths.
Refs #589
Made-with: Cursor
* fix(models): address PR #594 review feedback
Greptile P1: ProviderRepository.load emitted its DeprecationWarning
inside a `try/except Exception` block. Under
`filterwarnings("error", DeprecationWarning)` the warn would raise,
the except would swallow it, and `load()` would silently return None
(losing the registry). Move the warn outside the catch-all so the
strict-warning path no longer drops valid configs.
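The failure mode is easy to reproduce with the standard library alone. The sketch below is illustrative, not the actual `ProviderRepository.load` code: `load_swallowing` and `load_fixed` are hypothetical stand-ins, and the returned dict is a placeholder for the parsed registry.

```python
import warnings


def load_swallowing():
    """Warn inside the catch-all: under a strict filter the warning is raised
    as an exception, and the broad except clause swallows it."""
    try:
        warnings.warn("YAML `default:` is deprecated", DeprecationWarning, stacklevel=2)
        return {"providers": ["nvidia"]}
    except Exception:
        return None


def load_fixed():
    """Warn outside the catch-all so a strict filter surfaces the warning
    instead of silently discarding the parsed config."""
    try:
        config = {"providers": ["nvidia"]}
    except Exception:
        return None
    warnings.warn("YAML `default:` is deprecated", DeprecationWarning, stacklevel=2)
    return config


with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    swallowed = load_swallowing()  # the raised warning is caught and lost
    try:
        load_fixed()
        raised = False
    except DeprecationWarning:
        raised = True  # the strict filter now reaches the caller
```

Because `DeprecationWarning` derives from `Exception`, any `except Exception` around a `warnings.warn` call behaves this way once a filter promotes the warning to an error.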
Greptile P2 / johnnygreco: `_warn_on_implicit_provider` and
`_warn_on_explicit_default` used `stacklevel=2`, which landed inside
pydantic v2's validator dispatch rather than at the user's
`ModelConfig(...)` / `ModelProviderRegistry(...)` call. That broke
both attribution (the source line was unhelpful) and Python's
once-per-location dedup (every call collapsed to the same
pydantic-internal key, suppressing all but the first warning).
Introduce `data_designer.config.utils.warning_helpers.warn_at_caller`,
which walks past the helper, validator, and any pydantic frames to
find the user's call site and emits via `warnings.warn_explicit` with
the user frame's `__warningregistry__`. Keeps attribution accurate
and dedup keyed on the user's (filename, lineno).
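A minimal sketch of the frame-walking idea, assuming a simplified helper rather than the real `warn_at_caller` implementation: it compares each frame's top-level package against a skip-list and hands the first non-matching frame to `warnings.warn_explicit`. The `fakelib` module is a hypothetical stand-in for library code, built with `exec` only so the example stays in one file.

```python
import sys
import warnings


def warn_at_caller(message, category=DeprecationWarning, skip_prefixes=("pydantic",)):
    """Attribute `message` to the nearest frame outside `skip_prefixes`."""

    def is_internal(frame):
        # Compare the frame's top-level package against the skip-list.
        top_level = frame.f_globals.get("__name__", "").partition(".")[0]
        return top_level in skip_prefixes

    frame = sys._getframe(1)  # the function that called this helper
    while is_internal(frame) and frame.f_back is not None:
        frame = frame.f_back

    warnings.warn_explicit(
        message,
        category,
        filename=frame.f_code.co_filename,
        lineno=frame.f_lineno,
        module=frame.f_globals.get("__name__"),
        # Use the chosen frame's registry so dedup is keyed on its location.
        registry=frame.f_globals.setdefault("__warningregistry__", {}),
    )


# Stand-in "library" module (hypothetical name `fakelib`) that calls the helper.
fakelib = {"__name__": "fakelib.validators", "warn_at_caller": warn_at_caller}
source = (
    "def validate():\n"
    "    warn_at_caller('provider=None is deprecated', skip_prefixes=('fakelib',))\n"
)
exec(compile(source, "<fakelib>", "exec"), fakelib)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fakelib["validate"]()  # the user's call site
    user_file = sys._getframe().f_code.co_filename
```

The recorded warning carries the filename of the code that called `validate()`, not `<fakelib>`, which is the property the fixed-`stacklevel` approach could not guarantee.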
johnnygreco: align the `provider_repository.py` warning copy with the
sibling site in `default_model_settings.py` ("specify provider=
explicitly on each ModelConfig instead") so both YAML-default warning
sites give the same migration instruction. The previous wording
pointed users at "ModelConfig entries" inside `model_providers.yaml`,
where ModelConfig entries don't actually live.
johnnygreco: dedup the cascade in `DataDesigner.__init__`. With
`model_providers=None` and a YAML `default:`, the user previously saw
two DeprecationWarnings for the same root cause —
`get_default_provider_name()` warns about the YAML key, then
`resolve_model_provider_registry(...)` re-warns from
`_warn_on_explicit_default`. Suppress the registry-level duplicate in
the YAML-fallback branch via `warnings.catch_warnings()` so users see
exactly one warning per user action.
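The cascade dedup can be sketched as follows. The function bodies are hypothetical stand-ins that only mimic the two warning sites; `build_registry` and `init_from_yaml` are names invented for this example.

```python
import warnings


def get_default_provider_name():
    # Stand-in for the YAML lookup: warns about the deprecated `default:` key.
    warnings.warn("YAML `default:` is deprecated", DeprecationWarning, stacklevel=2)
    return "nvidia"


def build_registry(default):
    # Stand-in for registry construction: warns again for the same root cause.
    if default is not None:
        warnings.warn("registry-level `default` is deprecated", DeprecationWarning, stacklevel=2)
    return {"default": default}


def init_from_yaml():
    default = get_default_provider_name()  # keep this warning
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return build_registry(default)  # suppress the duplicate


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    registry = init_from_yaml()
```

Only the YAML warning is recorded. The `ignore` filter is scoped to the `with` block, so deprecations raised elsewhere are unaffected.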
johnnygreco: tighten `_warn_on_explicit_default` to fire only when
`default is not None`. Passing `default=None` explicitly is
semantically equivalent to omitting it (caller is opting *out* of a
registry-level default), and shouldn't trigger the deprecation
nudge.
johnnygreco: add a `model_validate({...})` regression test for
`ModelConfig` so the deserialization path (legacy on-disk configs)
is pinned alongside the construction path.
Tests:
- Update `test_load_exists` and `test_save` to omit `default=` so the
roundtrip stops exercising the deprecated YAML-default path
unguarded (Greptile note).
- Wrap `test_resolve_model_provider_registry_with_explicit_default`,
`test_get_provider`, and
`test_init_user_supplied_providers_preserve_first_wins_over_yaml_default`
in `pytest.warns` so the suite stays green under
`-W error::DeprecationWarning` (Greptile note).
- Add `test_explicit_default_none_does_not_emit_deprecation_warning`
to pin the tightened predicate.
- Add `test_init_yaml_default_emits_single_deprecation_warning` to
pin the cascade-dedup behavior.
Refs #589
Made-with: Cursor
* fix(models): make deprecation warnings visible under default filters
andreatgretel (PR #594): the YAML-default warning in
`get_default_provider_name` and the registry-default warning emitted
from inside DataDesigner helpers were attributing to data_designer
library frames, not user code. Python's default filter chain includes
`ignore::DeprecationWarning`, so library-attributed entries are
silenced — meaning a normal `DataDesigner()` call with a YAML
`default:` set showed nothing, and `resolve_model_provider_registry`
warnings were similarly invisible. Two related changes:
1. `warn_at_caller`: extend the default skip-list from `("pydantic",)`
to `("pydantic", "pydantic_core", "data_designer")` so the walk
escapes both pydantic's validator-dispatch frames and data_designer
helper frames before attributing. Also tighten the prefix predicate
to exact-or-dotted-prefix matching (`name == p or
name.startswith(p + ".")`) so e.g. `pydantic_helpers` is not
falsely matched as part of `pydantic` (johnnygreco nit). Allow
callers to pass a custom `skip_prefixes` for flexibility. Drop the
"skip frame 0+1 unconditionally" guard now that prefix matching
covers it.
2. `get_default_provider_name`: switch from
`warnings.warn(stacklevel=2)` to `warn_at_caller`. The previous
stacklevel pointed into `default_model_settings.py`, which is a
library file → silenced under default filters. Verified the fix
empirically with `python -W default`: warning is now attributed to
the user's call site and rendered.
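The prefix predicate from item 1 is small enough to show in full. This is a sketch: the function names `matches_prefix` and `naive_match` and the `INTERNAL` tuple are chosen for illustration, and only the predicate expression and the three prefixes come from the description above.

```python
INTERNAL = ("pydantic", "pydantic_core", "data_designer")


def matches_prefix(name, prefixes=INTERNAL):
    """Exact-or-dotted-prefix match: `pkg` and `pkg.sub` match, `pkg_other` does not."""
    return any(name == p or name.startswith(p + ".") for p in prefixes)


def naive_match(name, prefixes=INTERNAL):
    """Plain startswith, shown only to contrast the false positive."""
    return any(name.startswith(p) for p in prefixes)


checks = {
    name: (matches_prefix(name), naive_match(name))
    for name in ("pydantic", "pydantic.main", "pydantic_helpers", "data_designer_other")
}
```

With the stricter predicate, `pydantic_core` no longer matches via `pydantic`, which is presumably why it appears in the skip-list as its own entry.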
johnnygreco (PR #594): add the missing
`test_explicit_default_none_does_not_emit_deprecation_warning`
regression for the `self.default is not None` predicate landed in
the prior round.
Tests:
- New `test_warning_helpers.py` pins prefix-matching precision
(rejects `pydantic_helpers` / `data_designer_other`), default
skip-list contents, attribution past skip-prefix frames, and
per-call-site dedup behavior.
- `test_get_default_provider_name_warning_attributes_to_user_frame`
pins andreatgretel's repro for the YAML-default site.
- `test_explicit_default_warning_attributes_to_user_frame` pins the
multi-frame case: construction goes through
`resolve_model_provider_registry`, so the walk has to escape both
pydantic and data_designer before landing on the test file.
- `test_explicit_default_none_does_not_emit_deprecation_warning`
pins johnnygreco's predicate-tightening regression.
3,124 tests pass (548 config + 1,923 engine + 653 interface; +10 net
from this round).
Refs #589
Made-with: Cursor
* fix(models): apply warn_at_caller to remaining deprecation sites
greptile-apps (PR #594, r3189904028): `ProviderRepository.load`'s
YAML-default `DeprecationWarning` was using `warnings.warn(stacklevel=2)`,
which attributes to whichever data_designer frame called `load()` —
controllers, services, list/reset commands, agent introspection. Every
real call path lands on `data_designer.cli.*`, which falls under
Python's default `ignore::DeprecationWarning` filter and is silenced.
Audit found two more sites with the same problem:
- `DatasetBuilder._resolve_async_compatibility` (`allow_resize` /
issue #552) — was using `stacklevel=4` to walk past
`_resolve_async_compatibility -> build/build_preview -> interface ->
user`. Brittle: any added frame (decorator, async wrapping, the
`try/except DeprecationWarning: raise` boundary) shifts attribution
silently. The existing test passed only because it used
`simplefilter("always") + record=True`, which records warnings
regardless of attribution.
- `ProviderController._handle_change_default` — was using
`stacklevel=2`, which lands on the menu dispatcher in the same
controller module. `print_warning` already shows the message
visually, but programmatic observers (`pytest.warns`,
`filterwarnings("error", ...)`) saw a library-attributed entry that
default filters silenced.
All three migrated to `warn_at_caller` (the helper from 247fa30) so
attribution lands on the user's call site regardless of internal
chain shape. `data_designer` is already in
`DEFAULT_INTERNAL_PREFIXES`, so the walk escapes the entire library
in one pass.
Added attribution regression tests at each site asserting
`warning.filename == __file__`. A future regression to
`warnings.warn(stacklevel=N)` now fails CI instead of silently
silencing the user-facing nudge:
- `test_load_with_yaml_default_attributes_warning_to_caller`
(test_provider_repository.py)
- `test_resolve_async_compatibility` extended with the same assertion
- `test_handle_change_default_emits_deprecation_warning` rewritten
from `pytest.warns(...)` to a `catch_warnings(record=True)` block
that filters for the message and asserts `filename == __file__`
(`pytest.warns` does not check attribution, so the rewrite is
required to actually catch the regression).
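Why a count-only check misses the regression can be shown with a stand-in library. In this sketch `fakelib` is a hypothetical module compiled under the filename `<fakelib>`; its helper uses a fixed `stacklevel=2`, which still lands inside the library.

```python
import sys
import warnings

LIBRARY_SOURCE = '''
import warnings

def entry_point():
    _helper()

def _helper():
    # stacklevel=2 points at entry_point, which is still library code.
    warnings.warn("`default` is deprecated", DeprecationWarning, stacklevel=2)
'''
fakelib = {"__name__": "fakelib"}
exec(compile(LIBRARY_SOURCE, "<fakelib>", "exec"), fakelib)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # records every warning, whatever its attribution
    fakelib["entry_point"]()

this_file = sys._getframe().f_code.co_filename
recorded = len(caught) == 1  # a presence-only check passes
attributed_to_caller = caught[0].filename == this_file  # the attribution check does not
```

A test that only asserts the warning was emitted stays green here, while the `filename` assertion exposes that the entry is attributed to library code.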
3,125 tests pass (548 config + 1,923 engine + 654 interface).
Refs #589
Inference Parameters
Inference parameters control how models generate responses during synthetic data generation. Data Designer provides three types of inference parameters: `ChatCompletionInferenceParams` for text/code/structured generation, `EmbeddingInferenceParams` for embedding generation, and `ImageInferenceParams` for image generation.
Overview
When you create a `ModelConfig`, you can specify inference parameters to adjust model behavior. These parameters control aspects like randomness (`temperature`), diversity (`top_p`), response length (`max_tokens`), and more. Data Designer supports both static values and dynamic distribution-based sampling for certain parameters.
Chat Completion Inference Parameters
The `ChatCompletionInferenceParams` class controls how models generate text completions (for text, code, and structured data generation). Distribution-based sampling is available for `temperature` and `top_p`; the remaining fields take static values.
Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `temperature` | `float` or `Distribution` | No | Controls randomness in generation (0.0 to 2.0). Higher values = more creative/random |
| `top_p` | `float` or `Distribution` | No | Nucleus sampling parameter (0.0 to 1.0). Controls diversity by filtering low-probability tokens |
| `max_tokens` | `int` | No | Maximum number of tokens to generate in the response (≥ 1) |
| `max_parallel_requests` | `int` | No | Maximum concurrent API requests to this model (default: 4, ≥ 1). See Concurrency Control below. |
| `timeout` | `int` | No | API request timeout in seconds (≥ 1) |
| `extra_body` | `dict[str, Any]` | No | Additional parameters to include in the API request body |
!!! note "Default Values"
    If `temperature`, `top_p`, or `max_tokens` are not provided, the model provider's default values will be used. Different providers and models may have different defaults.
!!! tip "Controlling Reasoning Effort for Reasoning Models"
    For reasoning models like Nemotron 3 Super (`nvidia/nemotron-3-super-120b-a12b`) and GPT-OSS (`gpt-oss-20b`, `gpt-oss-120b`), you can control the reasoning effort using the `extra_body` parameter:

    ```python
    import data_designer.config as dd

    # High reasoning effort (more thorough, slower)
    inference_parameters = dd.ChatCompletionInferenceParams(
        extra_body={"reasoning_effort": "high"}
    )

    # Medium reasoning effort (balanced)
    inference_parameters = dd.ChatCompletionInferenceParams(
        extra_body={"reasoning_effort": "medium"}
    )

    # Low reasoning effort (faster, less thorough)
    inference_parameters = dd.ChatCompletionInferenceParams(
        extra_body={"reasoning_effort": "low"}
    )
    ```
Temperature and Top P Guidelines
- Temperature:
    - `0.0-0.3`: Highly deterministic, focused outputs (ideal for structured/reasoning tasks)
    - `0.4-0.7`: Balanced creativity and coherence (general purpose)
    - `0.8-1.0`: Creative, diverse outputs (ideal for creative writing)
    - `1.0+`: Highly random and experimental
- Top P:
    - `0.1-0.5`: Very focused, only most likely tokens
    - `0.6-0.9`: Balanced diversity
    - `0.95-1.0`: Maximum diversity, including less likely tokens
!!! tip "Adjusting Temperature and Top P Together"
    When tuning both parameters simultaneously, consider these combinations:

    - **For deterministic/structured outputs**: Low temperature (`0.0-0.3`) + moderate-to-high top_p (`0.8-0.95`)
        - The low temperature ensures focus, while top_p allows some token diversity
    - **For balanced generation**: Moderate temperature (`0.5-0.7`) + high top_p (`0.9-0.95`)
        - This is a good starting point for most use cases
    - **For creative outputs**: Higher temperature (`0.8-1.0`) + high top_p (`0.95-1.0`)
        - Both parameters work together to maximize diversity

    **Avoid**: Setting both very low (overly restrictive) or adjusting both dramatically at once. When experimenting, adjust one parameter at a time to understand its individual effect.
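To make the guidelines concrete, here is a generic sketch of how temperature scaling and nucleus (top-p) filtering are commonly applied to a token distribution. It is illustrative only: the logits are made up, `apply_temperature` and `nucleus_filter` are names chosen for this example, and individual providers may implement sampling differently.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits divided by the temperature (must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p,
    then renormalize over the kept tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
sharp = apply_temperature(logits, 0.2)  # low temperature concentrates mass
flat = apply_temperature(logits, 1.0)   # higher temperature spreads it out
focused = nucleus_filter(flat, 0.5)     # low top_p keeps few candidates
broad = nucleus_filter(flat, 0.95)      # high top_p keeps most candidates
```

Lowering the temperature sharpens the distribution before top-p trims its tail, which is why a low temperature paired with a fairly high top_p still yields focused output.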
Distribution-Based Inference Parameters
For temperature and top_p in ChatCompletionInferenceParams, you can specify distributions instead of fixed values. This allows Data Designer to sample different values for each generation request, introducing controlled variability into your synthetic data.
Uniform Distribution
Samples values uniformly between a low and high bound:
```python
import data_designer.config as dd

inference_params = dd.ChatCompletionInferenceParams(
    temperature=dd.UniformDistribution(
        params=dd.UniformDistributionParams(low=0.7, high=1.0)
    ),
)
```
Manual Distribution
Samples from a discrete set of values with optional weights:
```python
import data_designer.config as dd

# Equal probability for each value
inference_params = dd.ChatCompletionInferenceParams(
    temperature=dd.ManualDistribution(
        params=dd.ManualDistributionParams(values=[0.5, 0.7, 0.9])
    ),
)

# Weighted probabilities (normalized automatically)
inference_params = dd.ChatCompletionInferenceParams(
    top_p=dd.ManualDistribution(
        params=dd.ManualDistributionParams(
            values=[0.8, 0.9, 0.95],
            weights=[0.2, 0.5, 0.3],  # 20%, 50%, 30% probability
        )
    ),
)
```
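Conceptually, each generation request draws a fresh value from the configured distribution. The sketch below mimics that with the standard library; it is not Data Designer's sampling code, and `sample_uniform` and `sample_manual` are hypothetical helper names.

```python
import random


def sample_uniform(low, high, rng):
    """Draw one value uniformly between low and high."""
    return rng.uniform(low, high)


def sample_manual(values, weights, rng):
    """Draw one value; weights are relative, so they need not sum to 1."""
    return rng.choices(values, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only to make the example repeatable

# One temperature per simulated request.
temperatures = [sample_uniform(0.7, 1.0, rng) for _ in range(1_000)]

# Unnormalized weights 2:5:3 behave like 20%, 50%, 30%.
values, weights = [0.8, 0.9, 0.95], [2, 5, 3]
draws = [sample_manual(values, weights, rng) for _ in range(10_000)]
share = {v: draws.count(v) / len(draws) for v in values}
```

Over many requests the observed shares approach the normalized weights, so the dataset as a whole reflects the intended mix of settings.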
Concurrency Control
The max_parallel_requests parameter controls how many concurrent API calls Data Designer makes to a specific model. This directly impacts throughput and should be tuned to match your inference server's capacity.
!!! tip "Performance Tuning"
    For recommended values by deployment type (NVIDIA API Catalog, vLLM, OpenAI, NIMs) and detailed optimization strategies, see the Architecture & Performance guide.
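The effect of a concurrency cap can be illustrated with a semaphore. This is a generic sketch of the pattern, not a description of how Data Designer schedules requests; `run_with_limit` and `fake_request` are hypothetical, and `asyncio.sleep` stands in for the API call.

```python
import asyncio


async def run_with_limit(num_requests, max_parallel_requests):
    """Run simulated requests while never exceeding the concurrency cap."""
    semaphore = asyncio.Semaphore(max_parallel_requests)
    in_flight = 0
    peak = 0

    async def fake_request(i):
        nonlocal in_flight, peak
        async with semaphore:  # waits here once the cap is reached
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the API call
            in_flight -= 1
            return i

    results = await asyncio.gather(*(fake_request(i) for i in range(num_requests)))
    return results, peak


results, peak = asyncio.run(run_with_limit(num_requests=20, max_parallel_requests=4))
```

All 20 requests complete, but no more than 4 are in flight at once. Raising the cap increases throughput only as long as the inference server can absorb the extra load.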
Embedding Inference Parameters
The EmbeddingInferenceParams class controls how models generate embeddings. This is used when working with embedding models for tasks like semantic search or similarity analysis.
Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `encoding_format` | `Literal["float", "base64"]` | No | Format of the embedding encoding (default: "float") |
| `dimensions` | `int` | No | Number of dimensions for the embedding |
| `max_parallel_requests` | `int` | No | Maximum concurrent API requests (default: 4, ≥ 1) |
| `timeout` | `int` | No | API request timeout in seconds (≥ 1) |
| `extra_body` | `dict[str, Any]` | No | Additional parameters to include in the API request body |
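For context on `encoding_format`, the sketch below shows what a `"base64"` payload typically contains. It assumes the convention used by OpenAI-compatible endpoints (packed little-endian 32-bit floats); check your provider's documentation before relying on it. `encode_embedding` and `decode_embedding` are hypothetical helpers, not part of Data Designer.

```python
import base64
import struct


def encode_embedding(vector):
    """Pack floats as little-endian float32 and base64-encode the bytes."""
    raw = struct.pack(f"<{len(vector)}f", *vector)
    return base64.b64encode(raw).decode("ascii")


def decode_embedding(payload):
    """Reverse the encoding: base64 text back to a list of floats."""
    raw = base64.b64decode(payload)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


vector = [0.5, -1.25, 2.0]  # values exactly representable in float32
payload = encode_embedding(vector)
roundtrip = decode_embedding(payload)
```

The base64 form is more compact on the wire than a JSON array of decimal floats, at the cost of a decoding step on the client.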
Image Inference Parameters
The `ImageInferenceParams` class is used for image generation models, including both diffusion models (DALL·E, Stable Diffusion, Imagen) and autoregressive models (Gemini image, GPT image). Unlike the text generation parameters, image-specific options are passed entirely via `extra_body`, since they vary significantly between providers.
Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `max_parallel_requests` | `int` | No | Maximum concurrent API requests (default: 4, ≥ 1) |
| `timeout` | `int` | No | API request timeout in seconds (≥ 1) |
| `extra_body` | `dict[str, Any]` | No | Model-specific image options (size, quality, aspect ratio, etc.) |
Examples
```python
import data_designer.config as dd

# Autoregressive model (chat completions API, supports image context)
dd.ModelConfig(
    alias="image-model",
    model="black-forest-labs/flux.2-pro",
    provider="openrouter",
    inference_parameters=dd.ImageInferenceParams(
        extra_body={"height": 512, "width": 512}
    ),
)

# Diffusion model (e.g., DALL·E, Stable Diffusion)
dd.ModelConfig(
    alias="dalle",
    model="dall-e-3",
    provider="openai",
    inference_parameters=dd.ImageInferenceParams(
        extra_body={"size": "1024x1024", "quality": "hd"}
    ),
)
```
See Also
- Default Model Settings: Pre-configured model settings included with Data Designer
- Custom Model Settings: Learn how to create custom providers and model configurations
- Model Configurations: Learn about configuring model settings
- Model Providers: Learn about configuring model providers
- Architecture & Performance: Understanding separation of concerns and optimizing concurrency