--- date: 2026-05-05 authors: - jgreco - etramel --- # **Have It Your Way: Customizing Data Designer with Plugins**
A plugin framework for the custom pieces every real project ends up needing
{ .devnote-float-right .devnote-hide-in-index } Data Designer is built around a simple idea: describe the dataset you want, and let the framework handle execution. A config points to seed data, defines generated columns, picks models, and shapes the final records — no orchestration code required. [Data Designer plugins](../../plugins/overview.md) keep that promise when a project needs something custom. As of Data Designer [v0.6.0](https://github.com/NVIDIA-NeMo/DataDesigner/releases/tag/v0.6.0), plugins are out of experimental mode and stable. They are the supported path for turning reusable project-specific logic into normal Data Designer components. What does "something custom" actually look like? Picture a robotics team sitting on a pile of [Isaac Sim](https://developer.nvidia.com/isaac/sim)-generated warehouse runs, trying to turn robot poses, camera views, and event metadata into instruction data. With an internal simulation-log plugin, the user-facing part can still be this small: ```bash uv pip install data-designer-isaac-logs ``` ```python from data_designer_isaac_logs.config import IsaacRunSeedSource from data_designer_isaac_logs.config import WarehouseEventLabelColumnConfig from data_designer_isaac_logs.config import RobotSFTProcessor config_builder.with_seed_dataset( IsaacRunSeedSource( run_dir="s3://warehouse-sim/rare-events/", streams=("robot_pose", "overhead_rgb", "event_log"), max_events=10_000, ) ) config_builder.add_column( WarehouseEventLabelColumnConfig( name="safety_instruction", pose_column="robot_pose", event_log_column="event_log", ) ) config_builder.add_processor(RobotSFTProcessor(output_column="messages")) ``` That is the point of plugins: install a package, import its config classes, and keep the workflow declarative. The Isaac run reader, event labeler, and trainer-format processor own the project-specific parsing and trainer-facing shape. Data Designer still does the framework work, from component discovery and dependency ordering to model execution and output handling. --- ## **Customization Is the Normal Case** { .devnote-section-graphic } The mess usually starts innocently. A team defines a Data Designer config, then discovers that its seed data lives in an internal layout, its generated column needs a domain simulator, and its trainer expects a slightly different record shape. Someone writes a small reader beside the notebook. Someone patches a generator into a project folder. Someone adds a cleanup script after preview because the final export has one more organization-specific rule. Each choice is reasonable because every project brings a different corpus, policy model, domain vocabulary, or training stack. The problem is that the custom behavior now lives around Data Designer instead of inside the Data Designer workflow. It is harder to validate, harder to share, harder to version, and easier to lose. Plugins give that bespoke work a clean package boundary – a name, typed config, runtime implementation, entry point, and tests that travel together. Users still declare the dataset they want, but the local reader, domain generator, or trainer-format processor becomes a normal Data Designer component instead of another layer of glue. --- ## **Where Plugins Fit** The first plugin boundaries match the places where real projects most often need customization.📥 Seed reader plugins bring new source systems into Data Designer. Use them for databases, document stores, object stores, internal APIs, file collections, or corpus layouts that need custom hydration before generation can begin.
🧬 Column generator plugins create new column types. Use them when a value should be produced during generation and should participate in dependency ordering like any other column. This is the right place for simulators, domain libraries, retrieval-backed generation, deterministic rule systems, or custom model-backed generation.
🔧 Processor plugins transform records before or after generation. Use them for redaction, cleanup, deduplication, export views, organization-specific schemas, or training formats that should not be hidden inside prompts.