# Using Models in Plugins
Model access belongs in column generator implementations, not config objects. Keep the config declarative by asking users for model aliases, then resolve those aliases at runtime through the model registry.
Do not construct model clients in plugin configs, read API keys in configs, or bypass Data Designer's model providers. The engine builds a `ResourceProvider` and exposes its model registry to every generator at:
```python
self.resource_provider.model_registry
```
## Access the registry
Use a model-aware column generator base whenever your plugin needs the registry:

| Need | Base class | Registry access |
|------|------------|-----------------|
| Primary model alias | `ColumnGeneratorWithModel` | Use `self.model`, `self.model_config`, and `self.inference_parameters`. |
| Multiple aliases or provider inspection | `ColumnGeneratorWithModelRegistry` | Use `self.get_model(alias)`, `self.get_model_config(alias)`, and `self.get_model_provider_name(alias)`. |

`ColumnGeneratorWithModel` is a convenience subclass of `ColumnGeneratorWithModelRegistry`. It expects the config to have a `model_alias` field and resolves that one alias for you. For independent model calls, return `GenerationStrategy.CELL_BY_CELL` so the runtime can fan out rows like the built-in LLM, embedding, and image generators. Use full-column generation only when your plugin intentionally calls a batched API for the whole DataFrame.
```python
from __future__ import annotations

from data_designer.config.column_configs import GenerationStrategy
from data_designer.engine.column_generators.generators.base import ColumnGeneratorWithModel
from data_designer.engine.models.parsers.errors import ParserException

from data_designer_sentiment_label.config import SentimentLabelColumnConfig


def parse_sentiment_label(response: str) -> str:
    label = response.strip().lower()
    if label not in {"positive", "neutral", "negative"}:
        raise ParserException("Expected exactly one of: positive, neutral, negative.", source=response)
    return label


class SentimentLabelColumnGenerator(ColumnGeneratorWithModel[SentimentLabelColumnConfig]):
    @staticmethod
    def get_generation_strategy() -> GenerationStrategy:
        return GenerationStrategy.CELL_BY_CELL

    async def agenerate(self, data: dict) -> dict:
        label, _ = await self.model.agenerate(
            prompt=f"Classify the sentiment of this text: {data[self.config.source_column]}",
            system_prompt="Return exactly one label: positive, neutral, or negative.",
            parser=parse_sentiment_label,
            max_correction_steps=self.resource_provider.run_config.max_conversation_correction_steps,
            max_conversation_restarts=self.resource_provider.run_config.max_conversation_restarts,
            purpose=f"running generation for column '{self.config.name}'",
        )
        data[self.config.name] = label
        return data
```
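To picture why `CELL_BY_CELL` suits independent model calls, the following is a minimal standalone sketch of row fan-out using only `asyncio`. It does not use Data Designer at all: `fake_model_call`, `generate_cell`, `fan_out`, and the `max_concurrency` parameter are hypothetical names invented for illustration, and the real runtime's scheduling may differ.

```python
import asyncio


async def fake_model_call(text: str) -> str:
    """Hypothetical stand-in for a model request; not a Data Designer API."""
    await asyncio.sleep(0.01)  # simulate network latency
    return "positive" if "love" in text.lower() else "neutral"


async def generate_cell(row: dict) -> dict:
    """Process one row on its own, the way a cell-by-cell generator would."""
    row = dict(row)  # copy so the caller's row is left untouched
    row["sentiment"] = await fake_model_call(row["text"])
    return row


async def fan_out(rows: list[dict], max_concurrency: int = 4) -> list[dict]:
    """Run independent rows concurrently, capped by a semaphore."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(row: dict) -> dict:
        async with semaphore:
            return await generate_cell(row)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(bounded(row) for row in rows))


rows = [{"text": "I love this"}, {"text": "It arrived on Tuesday"}]
results = asyncio.run(fan_out(rows))
```

Because each row is handled without reference to any other row, the calls can overlap freely. A generator that needs the whole DataFrame at once, for example to call a batched API, does not fit this shape.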
`SentimentLabelColumnConfig`, the config this generator is parameterized with, must include `model_alias: str` as a normal user-facing field:
```python
from __future__ import annotations

from typing import Literal

from data_designer.config.base import SingleColumnConfig


class SentimentLabelColumnConfig(SingleColumnConfig):
    column_type: Literal["sentiment-label"] = "sentiment-label"
    source_column: str
    model_alias: str

    @property
    def required_columns(self) -> list[str]:
        return [self.source_column]

    @property
    def side_effect_columns(self) -> list[str]:
        return []
```
Users set that alias from default model settings or from `DataDesignerConfigBuilder(model_configs=...)`.
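The split between a declarative alias in the config and runtime resolution can be sketched without the library. Everything in this block is a hypothetical stand-in: `FakeModelConfig` and `FakeModelRegistry` are not Data Designer classes, and the model name is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeModelConfig:
    """Hypothetical stand-in for a user-supplied model config."""

    alias: str
    model: str


class FakeModelRegistry:
    """Hypothetical registry that maps aliases to model configs."""

    def __init__(self, model_configs: list[FakeModelConfig]) -> None:
        self._by_alias = {cfg.alias: cfg for cfg in model_configs}

    def get_model_config(self, alias: str) -> FakeModelConfig:
        try:
            return self._by_alias[alias]
        except KeyError:
            raise KeyError(f"No model config with alias {alias!r} found") from None


# The user decides which model sits behind each alias.
registry = FakeModelRegistry(
    [FakeModelConfig(alias="sentiment", model="example-small-model")]
)

# The column config carries only the alias string, never a client or API key.
column_config = {"name": "label", "source_column": "review", "model_alias": "sentiment"}

# The generator resolves the alias when it runs.
resolved = registry.get_model_config(column_config["model_alias"])
```

Keeping only the alias in the config means the same plugin config can be reused against a different model by changing the alias mapping, with no change to the plugin itself.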
## Use multiple models
If your plugin uses multiple model aliases, inherit from `ColumnGeneratorWithModelRegistry` and resolve each alias explicitly with `self.get_model(...)`.
The startup model health check pings every alias your column declares. By default, `SingleColumnConfig.get_model_aliases()` returns the primary `model_alias` field, which covers single-model plugins for free. A config for this pattern might also define `judge_model_alias`, `critic_model_alias`, or another task-specific alias. Override `get_model_aliases()` to return every alias the column depends on so a typo, missing API key, or unreachable endpoint surfaces at startup instead of at first generation.
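As a rough illustration of why the override matters, here is a standalone sketch of how a startup check could enumerate aliases through `get_model_aliases()`. The classes and the `find_unknown_aliases` helper are hypothetical stand-ins, not the real `SingleColumnConfig` or the engine's health check, which also contacts each endpoint rather than only comparing names.

```python
from typing import Optional


class FakeColumnConfig:
    """Hypothetical stand-in for a column config; not the real base class."""

    def __init__(self, name: str, model_alias: Optional[str] = None) -> None:
        self.name = name
        self.model_alias = model_alias

    def get_model_aliases(self) -> list[str]:
        # Default behaviour: report only the primary alias, if there is one.
        return [] if self.model_alias is None else [self.model_alias]


class FakeJudgeColumnConfig(FakeColumnConfig):
    """Hypothetical multi-model config with a second, task-specific alias."""

    def __init__(self, name: str, model_alias: str, judge_model_alias: str) -> None:
        super().__init__(name, model_alias)
        self.judge_model_alias = judge_model_alias

    def get_model_aliases(self) -> list[str]:
        # Override: report every alias this column depends on.
        return [self.model_alias, self.judge_model_alias]


def find_unknown_aliases(columns: list, known_aliases: set) -> list[str]:
    """Return declared aliases with no registered model, in first-seen order."""
    unknown: list[str] = []
    for column in columns:
        for alias in column.get_model_aliases():
            if alias not in known_aliases and alias not in unknown:
                unknown.append(alias)
    return unknown


columns = [
    FakeColumnConfig("topic"),  # no model at all
    FakeColumnConfig("summary", model_alias="writer"),
    # "jugde" is a deliberate typo that the check should catch
    FakeJudgeColumnConfig("verdict", model_alias="writer", judge_model_alias="jugde"),
]
unknown = find_unknown_aliases(columns, known_aliases={"writer", "judge"})
```

Without the override in `FakeJudgeColumnConfig`, the misspelled judge alias would never be reported by this check and would only fail once generation reached that column.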
For example, a config with a generator alias and a judge alias opts both into the standard startup health check by listing them in `get_model_aliases()`:
```python
from __future__ import annotations

from typing import Literal

from data_designer.config.base import SingleColumnConfig


class PairwiseJudgeColumnConfig(SingleColumnConfig):
    column_type: Literal["pairwise-judge"] = "pairwise-judge"
    question_column: str
    model_alias: str
    judge_model_alias: str

    @property
    def required_columns(self) -> list[str]:
        return [self.question_column]

    @property
    def side_effect_columns(self) -> list[str]:
        return []

    def get_model_aliases(self) -> list[str]:
        return [self.model_alias, self.judge_model_alias]
```
```python
from __future__ import annotations

from data_designer.config.column_configs import GenerationStrategy
from data_designer.engine.column_generators.generators.base import ColumnGeneratorWithModelRegistry
from data_designer.engine.models.parsers.errors import ParserException
from data_designer_pairwise_judge.config import PairwiseJudgeColumnConfig


def parse_score(response: str) -> int:
    text = response.strip()
    if text not in {"1", "2", "3", "4", "5"}:
        # Raising ParserException lets the facade run correction turns and restarts.
        raise ParserException("Expected an integer score from 1 to 5.", source=response)
    return int(text)


class PairwiseJudgeColumnGenerator(ColumnGeneratorWithModelRegistry[PairwiseJudgeColumnConfig]):
    @staticmethod
    def get_generation_strategy() -> GenerationStrategy:
        return GenerationStrategy.CELL_BY_CELL

    async def agenerate(self, data: dict) -> dict:
        # Resolve both aliases from the model registry.
        generator_model = self.get_model(self.config.model_alias)
        judge_model = self.get_model(self.config.judge_model_alias)

        # Reuse the run-level retry budget for every model call.
        retry_kwargs = {
            "max_correction_steps": self.resource_provider.run_config.max_conversation_correction_steps,
            "max_conversation_restarts": self.resource_provider.run_config.max_conversation_restarts,
        }

        draft, _ = await generator_model.agenerate(
            prompt=f"Draft an answer for: {data[self.config.question_column]}",
            purpose=f"drafting an answer for column '{self.config.name}'",
            **retry_kwargs,
        )
        score, _ = await judge_model.agenerate(
            prompt=f"Score this answer from 1 to 5: {draft}",
            system_prompt="Return exactly one integer from 1 to 5.",
            parser=parse_score,
            purpose=f"judging an answer for column '{self.config.name}'",
            **retry_kwargs,
        )

        data[self.config.name] = {"draft": draft, "score": score}
        return data
```
If your config has no `model_alias` field at all (uncommon but valid), override `get_model_aliases()` to return the values of whichever fields name your model dependencies. The default implementation reads `model_alias` via `getattr` and returns an empty list when the field is absent, so it does not crash on such configs, but it also reports no aliases to the startup health check.
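A minimal sketch of that override is below. It uses a plain dataclass stand-in rather than the real `SingleColumnConfig`, and the names `SketchColumnConfig`, `DraftAndJudgeConfig`, `drafter_alias`, and `judge_alias` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SketchColumnConfig:
    """Stand-in for SingleColumnConfig (assumption, not the real class)."""

    name: str

    def get_model_aliases(self) -> list[str]:
        # Mirrors the default described above: read `model_alias` via getattr
        # and return an empty list when the field is absent.
        alias = getattr(self, "model_alias", None)
        return [alias] if alias is not None else []


@dataclass
class DraftAndJudgeConfig(SketchColumnConfig):
    """Hypothetical config with two model fields and no `model_alias`."""

    drafter_alias: str = "drafter"
    judge_alias: str = "judge"

    def get_model_aliases(self) -> list[str]:
        # Report every model dependency so each one can be health-checked.
        return [self.drafter_alias, self.judge_alias]
```

Without the override, `DraftAndJudgeConfig` would inherit the default and report no aliases at all.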
## What the registry returns
`get_model(...)` returns a `ModelFacade`. Call the facade based on the modality your plugin needs:
- Chat completion aliases use `model.generate(...)` or `await model.agenerate(...)` and return `(parsed_output, trace)`.
- Embedding aliases use `model.generate_text_embeddings(...)` or `await model.agenerate_text_embeddings(...)` and return `list[list[float]]`.
- Image aliases use `model.generate_image(...)` or `await model.agenerate_image(...)` and return `list[str]` of base64-encoded image data.
Choose a model alias whose `ModelConfig.inference_parameters.generation_type` matches the facade method you call. The facade merges the alias's configured inference parameters into each request.
Pass runtime context such as `prompt`, `system_prompt`, `parser`, `tool_alias`, `multi_modal_context`, `max_correction_steps`, `max_conversation_restarts`, and `purpose` at the call site. Parser functions should raise `ParserException` for invalid model responses; that is what allows `ModelFacade.generate(...)` and `ModelFacade.agenerate(...)` to run correction turns and conversation restarts.
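To illustrate why the parser should raise instead of returning a fallback value, here is a toy correction loop. It is a sketch only: `SketchParserError` stands in for `ParserException`, `generate_with_corrections` is a hypothetical helper, and canned replies replace real model calls. The actual `ModelFacade` internals are not shown on this page and may differ.

```python
from collections.abc import Callable, Iterator


class SketchParserError(Exception):
    """Stand-in for ParserException (assumption, not the real class)."""


def parse_score(response: str) -> int:
    text = response.strip()
    if text not in {"1", "2", "3", "4", "5"}:
        raise SketchParserError("Expected an integer score from 1 to 5.")
    return int(text)


def generate_with_corrections(
    replies: Iterator[str],
    parser: Callable[[str], int],
    max_correction_steps: int,
) -> tuple[int, int]:
    """Return (parsed_value, corrections_used); re-raise once the budget is spent."""
    for corrections_used in range(max_correction_steps + 1):
        reply = next(replies)  # a real loop would call the model here
        try:
            return parser(reply), corrections_used
        except SketchParserError:
            if corrections_used == max_correction_steps:
                raise
    raise ValueError("max_correction_steps must be non-negative")
```

A parser that silently returned a default score would give the loop nothing to react to, so the invalid reply would be stored as if it were valid.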
Prefer implementing `agenerate(...)` for model-backed plugins. The base `generate(...)` method can bridge to `agenerate(...)` for sync runs when the subclass only implements async generation. If your plugin has a sync-specific path, implement both `generate(...)` and `agenerate(...)`, as the built-in generators do.
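One common way to bridge a sync entry point to an async-only implementation is sketched below. `AsyncOnlyGenerator` is a hypothetical stand-in with no Data Designer base class, and the use of `asyncio.run` is an assumption for illustration; the bridging mechanism in the real base class is not shown here.

```python
import asyncio


class AsyncOnlyGenerator:
    """Hypothetical generator that only has real logic in agenerate()."""

    async def agenerate(self, data: dict) -> dict:
        await asyncio.sleep(0)  # placeholder for an awaited model call
        data["greeting"] = f"hello {data['name']}"
        return data

    def generate(self, data: dict) -> dict:
        # Sync bridge: run the coroutine to completion.
        # Assumes no event loop is already running in this thread.
        return asyncio.run(self.agenerate(data))
```

With this shape, the row logic lives in one place and the sync path stays a thin wrapper.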
## Health checks and scheduling
The model-aware bases mark the generator as LLM-bound, so the async scheduler treats the work like other model calls.
Plugin discovery treats column generator implementations that inherit from `ColumnGeneratorWithModelRegistry` as model-generated column types for startup model health checks. The standard health-check collection calls `SingleColumnConfig.get_model_aliases()` on each column config and pings every alias it returns. The default implementation returns the column's primary `model_alias` (or an empty list for configs without one); configs with multiple model fields should override it so the startup check exercises every endpoint they depend on.
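The collection step can be pictured roughly as follows. This is a sketch under stated assumptions: `collect_aliases` and the two config classes are hypothetical, and the order-preserving de-duplication is an illustrative choice, not something this page confirms about the real builder.

```python
from collections.abc import Iterable
from typing import Protocol


class HasModelAliases(Protocol):
    def get_model_aliases(self) -> list[str]: ...


class OneModelConfig:
    """Hypothetical config that depends on a single alias."""

    def get_model_aliases(self) -> list[str]:
        return ["text"]


class TwoModelConfig:
    """Hypothetical config that overrides the hook to report two aliases."""

    def get_model_aliases(self) -> list[str]:
        return ["text", "judge"]


def collect_aliases(column_configs: Iterable[HasModelAliases]) -> list[str]:
    """Gather every alias the configs report, keeping first-seen order."""
    seen: dict[str, None] = {}
    for config in column_configs:
        for alias in config.get_model_aliases():
            seen.setdefault(alias, None)
    return list(seen)
```

Each alias in the resulting list would then be pinged once at startup, which is why a config that omits a secondary alias leaves that endpoint unchecked until first use.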
## Built-in patterns
The built-in model-backed generators use these same hooks:
- `LLMTextCellGenerator`, `LLMCodeCellGenerator`, `LLMStructuredCellGenerator`, and `LLMJudgeCellGenerator` inherit through a chat-completion base that uses `ColumnGeneratorWithModel`. They render prompts from row data, call `self.model.generate(...)` or `self.model.agenerate(...)`, pass parsers into the `ModelFacade`, and store optional trace side-effect columns.
- `EmbeddingCellGenerator` uses `ColumnGeneratorWithModel` but calls the facade's embedding methods instead of chat completion.
- `ImageCellGenerator` uses `ColumnGeneratorWithModel`, renders a prompt, calls the facade's image methods, and writes generated media through the artifact storage supplied by the same `ResourceProvider`.
- `CustomColumnGenerator` is the inline-function counterpart: when users declare `model_aliases`, it builds a `models` dict from `resource_provider.model_registry`. Packaged plugins usually use `ColumnGeneratorWithModel` or `ColumnGeneratorWithModelRegistry` directly instead of recreating that dict.
See [Custom Model Settings](../concepts/models/custom-model-settings.md) for configuring model aliases.