DataDesigner/architecture/cli.md
Johnny Greco d14c9b3ccc
feat(cli): add plugin catalog core (#618)
2026-05-13 12:26:58 -04:00


CLI

The CLI (data-designer) provides an interactive command-line interface for four kinds of work: configuring models, providers, MCP providers, and tools; downloading managed persona datasets; discovering, installing, and uninstalling plugin packages from catalogs; and running dataset generation. It uses a layered architecture for setup workflows and delegates generation to the public DataDesigner API.

Source: packages/data-designer/src/data_designer/cli/

Overview

The CLI is built on Typer with lazy command loading to keep startup fast. Config management and plugin catalog commands follow a command → controller → service → repository layering pattern. Generation commands bypass this stack and use the public DataDesigner class directly.

Key Components

Entry Point

data-designer is registered as a console script pointing to data_designer.cli.main:main. On startup:

  1. ensure_cli_default_model_settings() initializes default model/provider configs
  2. app() launches the Typer application
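
The startup order above can be sketched as follows. This is a minimal illustration, not the project's code: both functions are stand-ins with the same names as the real ones, and STARTUP_LOG is a hypothetical helper that exists only to make the ordering visible.

```python
# Hypothetical log used only to show the order in which startup steps run.
STARTUP_LOG: list[str] = []


def ensure_cli_default_model_settings() -> None:
    """Stand-in: the real function initializes default model/provider configs."""
    STARTUP_LOG.append("defaults-initialized")


def app() -> None:
    """Stand-in for the Typer application object."""
    STARTUP_LOG.append("app-launched")


def main() -> None:
    # Step 1: make sure default settings exist before any command runs.
    ensure_cli_default_model_settings()
    # Step 2: hand control to the command-line application.
    app()
```

Running the defaults step first means every command can assume that baseline model and provider settings already exist.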

Lazy Command Loading

create_lazy_typer_group and _LazyCommand stubs defer importing command modules until a command is actually invoked. This keeps data-designer --help fast — only the command names and descriptions are loaded eagerly; the full module (and its dependencies) loads on first use.
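
The deferred-import idea can be illustrated with a small stub built on the standard library. This is a sketch of the general technique only: LazyCommandSketch and its fields are hypothetical and are not the signatures of create_lazy_typer_group or _LazyCommand.

```python
import importlib
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class LazyCommandSketch:
    """Keeps a command's name and help text; imports its module on first call."""

    name: str
    help: str
    target: str  # "package.module:function", resolved lazily
    _resolved: Optional[Callable[..., Any]] = None

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self._resolved is None:
            # The import cost is paid here, on first invocation, not at startup.
            module_name, _, attr = self.target.partition(":")
            module = importlib.import_module(module_name)
            self._resolved = getattr(module, attr)
        return self._resolved(*args, **kwargs)


# A standard-library function stands in for a real command implementation.
capwords = LazyCommandSketch("capwords", "Capitalize each word", "string:capwords")
```

Listing commands only needs name and help, so a help screen can be rendered without importing any target module.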

Layering Pattern (Setup Workflows)

Config management commands (models, providers, MCP providers, tools) follow a consistent four-layer pattern:

| Layer | Role | Example |
|-------|------|---------|
| Command | Thin Typer entry, wires DATA_DESIGNER_HOME | models_command → ModelController(DATA_DESIGNER_HOME).run() |
| Controller | UX flow: menus, forms, success/error display | ModelController composes repos + services + ModelFormBuilder |
| Service | Domain rules: uniqueness, merge, delete-all | ModelService.add/update/delete over ModelRepository |
| Repository | File I/O for typed config registries | ModelRepository extends ConfigRepository[ModelConfigRegistry] |

Repositories: ModelRepository, ProviderRepository, MCPProviderRepository, and ToolRepository. PersonaRepository provides read-only locale metadata for managed persona dataset downloads.

Services mirror the repository domains with business logic (validation, conflict resolution).
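
To make the division of responsibilities concrete, here is a compressed sketch of the four layers. The class names carry a Sketch suffix because they are illustrative; the JSON file format, the models.json filename, and the method signatures are assumptions, not the project's actual storage or API.

```python
import json
import tempfile
from pathlib import Path


class JsonRepositorySketch:
    """Repository layer: file I/O only, no domain rules."""

    def __init__(self, home: Path, filename: str) -> None:
        self.path = home / filename

    def load(self) -> dict:
        return json.loads(self.path.read_text()) if self.path.exists() else {}

    def save(self, registry: dict) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(registry, indent=2))


class ModelServiceSketch:
    """Service layer: domain rules such as alias uniqueness."""

    def __init__(self, repo: JsonRepositorySketch) -> None:
        self.repo = repo

    def add(self, alias: str, config: dict) -> None:
        registry = self.repo.load()
        if alias in registry:
            raise ValueError(f"model alias '{alias}' already exists")
        registry[alias] = config
        self.repo.save(registry)


class ModelControllerSketch:
    """Controller layer: user-facing flow and success/error messages."""

    def __init__(self, home: Path) -> None:
        self.service = ModelServiceSketch(JsonRepositorySketch(home, "models.json"))

    def add_model(self, alias: str, config: dict) -> str:
        try:
            self.service.add(alias, config)
        except ValueError as exc:
            return f"Error: {exc}"
        return f"Added model '{alias}'"


def models_command(home: Path) -> ModelControllerSketch:
    """Command layer: a thin entry that only wires the home directory."""
    return ModelControllerSketch(home)
```

Because the service never touches the filesystem directly, it can be tested against a temporary directory or a fake repository.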

Plugin catalog commands use the same layering shape:

| Layer | Role | Example |
|-------|------|---------|
| Command | Thin Typer entry, wires DATA_DESIGNER_HOME and command options | plugin subcommands (list, search, info, install, uninstall, installed, catalog) → PluginCatalogController(DATA_DESIGNER_HOME) |
| Controller | UX flow: catalog tables, package metadata, compatibility display, install/uninstall confirmations | PluginCatalogController composes catalog + install services |
| Service | Domain rules: package listing, compatibility checks, uv/pip install and uninstall commands, runtime entry-point checks | PluginCatalogService, PluginInstallService |
| Repository | File/cache I/O for catalog aliases and catalog documents | PluginCatalogRepository |

The built-in nvidia catalog points at https://nvidia-nemo.github.io/DataDesignerPlugins/catalog/plugins.json. NVIDIA-NeMo/DataDesignerPlugins defines the catalog format. Each catalog entry is an installable package with docs, install metadata, compatibility constraints, and one or more runtime plugins. Users install and uninstall packages, not individual runtime plugins. Commands that take a package name also accept the package alias from the data-designer-{alias} package-name pattern; for example, data-designer-calculator can be addressed as calculator. If a user passes a runtime plugin name where a package is required, the CLI reports the package that owns that runtime plugin.
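
The name-resolution rules in the paragraph above can be sketched as a single lookup function. The function name, the dictionary-shaped catalog, and the calc-column runtime plugin name are hypothetical; only the data-designer-{alias} pattern and the calculator example come from this document.

```python
PACKAGE_PREFIX = "data-designer-"


def resolve_package_name(requested: str, catalog: dict[str, list[str]]) -> str:
    """Map a full package name or alias to a catalog package name.

    `catalog` maps each package name to the runtime plugin names it provides.
    """
    if requested in catalog:
        return requested  # already a full package name
    candidate = PACKAGE_PREFIX + requested
    if candidate in catalog:
        return candidate  # alias form, e.g. "calculator"
    for package, runtime_plugins in catalog.items():
        if requested in runtime_plugins:
            # A runtime plugin name was given where a package is required.
            raise LookupError(
                f"'{requested}' is a runtime plugin provided by package '{package}'"
            )
    raise LookupError(f"no package named '{requested}'")


# Hypothetical catalog contents for illustration.
example_catalog = {"data-designer-calculator": ["calc-column"]}
```

With this catalog, both calculator and data-designer-calculator resolve to the same package, while calc-column produces an error that names the owning package.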

Generation Commands

preview, create, and validate commands use GenerationController, which:

  1. Loads config via load_config_builder
  2. Calls DataDesigner.preview(), DataDesigner.create(), or DataDesigner.validate() directly
  3. Handles output display and error formatting

This keeps generation aligned with the public Python API — the CLI is a thin wrapper, not a separate code path.
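
A rough sketch of the thin-wrapper shape, under heavy assumptions: StubDataDesigner and the load_config_builder stand-in below only mimic the names mentioned above. Their signatures and return values are invented for illustration and may not match the real DataDesigner API.

```python
class StubDataDesigner:
    """Stand-in for the public DataDesigner class; real signatures may differ."""

    def preview(self, config_builder: str) -> str:
        return f"preview of {config_builder}"

    def create(self, config_builder: str) -> str:
        return f"dataset from {config_builder}"

    def validate(self, config_builder: str) -> str:
        return f"{config_builder} is valid"


def load_config_builder(config_source: str) -> str:
    """Stand-in loader; the real helper returns a config builder object."""
    return f"builder({config_source})"


class GenerationControllerSketch:
    """Loads the config, calls one public API method, formats the outcome."""

    def __init__(self, designer: StubDataDesigner) -> None:
        self.designer = designer

    def run(self, action: str, config_source: str) -> str:
        operations = {
            "preview": self.designer.preview,
            "create": self.designer.create,
            "validate": self.designer.validate,
        }
        if action not in operations:
            return f"Error: unknown action '{action}'"
        builder = load_config_builder(config_source)
        try:
            return str(operations[action](builder))
        except Exception as exc:  # error formatting stays in the controller
            return f"Error: {exc}"
```

The controller adds no generation logic of its own, which is what keeps CLI behavior and Python API behavior from drifting apart.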

UI Utilities

  • cli/ui.py — Rich console helpers for formatted output
  • cli/forms/ — interactive form builders for config creation/editing
  • cli/utils/config_loader.py — config file resolution and loading
  • sample_records_pager.py — paginated display of generated records

Data Flow

Config Management

User invokes command (e.g., `data-designer config models`)
  → Command function wires DATA_DESIGNER_HOME
  → Controller presents interactive menu
  → Service validates and applies changes
  → Repository reads/writes config files

Plugin Catalog Discovery

User invokes command (e.g., `data-designer plugin list`)
  → Command function wires DATA_DESIGNER_HOME and catalog options
  → PluginCatalogController resolves the catalog alias and chooses table or narrow-terminal layout
  → PluginCatalogService loads packages and filters out incompatible packages by default
  → PluginCatalogRepository reads local config and cached/remote catalog JSON
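
The default filtering step can be sketched as below. This is a simplification under stated assumptions: the CatalogPackage fields, the minimum-version-only rule, and all version numbers are hypothetical, and real catalogs may express compatibility with richer version specifiers.

```python
from dataclasses import dataclass


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as '3.11.4' (no pre-release tags)."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class CatalogPackage:
    name: str
    min_python: str
    min_data_designer: str


def list_packages(
    packages: list[CatalogPackage],
    python_version: str,
    data_designer_version: str,
    include_incompatible: bool = False,
) -> list[CatalogPackage]:
    """Return catalog packages, hiding incompatible ones unless asked not to."""
    if include_incompatible:
        return list(packages)
    current_python = parse_version(python_version)
    current_dd = parse_version(data_designer_version)
    return [
        package
        for package in packages
        if current_python >= parse_version(package.min_python)
        and current_dd >= parse_version(package.min_data_designer)
    ]


# Hypothetical catalog entries for illustration.
example_packages = [
    CatalogPackage("data-designer-calculator", "3.10.0", "0.5.0"),
    CatalogPackage("data-designer-future", "3.13.0", "9.0.0"),
]
```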

Plugin Install/Uninstall

User invokes command (e.g., `data-designer plugin install calculator`)
  → PluginCatalogController resolves the plugin package name or package alias
  → PluginCatalogService evaluates Python and Data Designer compatibility
  → PluginInstallService chooses uv or pip and builds the command.
    In active uv projects it uses `uv add` so the package is recorded in
    `pyproject.toml`; otherwise it installs into the current Python environment.
    Data Designer itself is already installed, so its packages are not reinstalled
    or replaced while installing plugin dependencies.
  → PluginInstallService verifies the package's runtime plugin entry points can load

User invokes command (e.g., `data-designer plugin uninstall calculator`)
  → PluginCatalogController resolves the plugin package name or package alias
  → PluginInstallService chooses uv or pip and builds the uninstall command.
    Active uv projects remove the dependency from project metadata and uninstall
    the package from the current environment.
  → PluginInstallService verifies the package's runtime plugin entry-point metadata is removed
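
The installer choice in both flows can be sketched as two pure functions that build argument lists. The function names and parameters are hypothetical, and the non-project uv branch assumes `uv pip install --python <interpreter>` as the way to target the current environment. The sketch does not model how the real service protects the installed Data Designer packages or checks the uv version.

```python
import sys
from typing import Optional


def build_install_command(
    requirement: str,
    *,
    uv_path: Optional[str],
    in_uv_project: bool,
    python: str = sys.executable,
) -> list[str]:
    """Choose uv or pip and return the install command as an argument list."""
    if uv_path and in_uv_project:
        # Records the dependency in pyproject.toml.
        return [uv_path, "add", requirement]
    if uv_path:
        # Installs into the environment of the given interpreter.
        return [uv_path, "pip", "install", "--python", python, requirement]
    # Fallback when uv is unavailable.
    return [python, "-m", "pip", "install", requirement]


def build_uninstall_command(
    package: str,
    *,
    uv_path: Optional[str],
    in_uv_project: bool,
    python: str = sys.executable,
) -> list[str]:
    """Mirror of the install choice for removal."""
    if uv_path and in_uv_project:
        return [uv_path, "remove", package]
    if uv_path:
        return [uv_path, "pip", "uninstall", "--python", python, package]
    return [python, "-m", "pip", "uninstall", "-y", package]
```

In practice `uv_path` could come from `shutil.which("uv")`. Returning a list rather than a shell string keeps the command easy to display in a dry run and safe to pass to `subprocess.run`.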

Generation

User invokes command (e.g., `data-designer create config.yaml`)
  → GenerationController loads config
  → DataDesigner.create() runs the full pipeline
  → Results displayed via Rich console

Design Decisions

  • Lazy command loading keeps data-designer --help responsive: command modules (and their heavy dependencies, such as the engine and model stacks) load only when a command is invoked, not at process startup.
  • Controller/service/repo for setup workflows, direct API for generation — config and plugin catalog workflows benefit from the layered pattern (testable services, swappable repositories). Generation doesn't need this indirection; it delegates to the same DataDesigner class that Python users call directly.
  • DATA_DESIGNER_HOME centralizes CLI-managed state (model configs, provider configs, MCP provider configs, tool configs, managed assets, plugin catalog aliases, and catalog caches) in a single directory, defaulting to ~/.data-designer/.
  • Package-first plugin catalogs match how users install plugins: one package can provide one or more runtime plugins, but install and uninstall commands always target the package.
  • Rich-based UI provides formatted tables, progress bars, and interactive prompts without requiring a web interface.
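
As an illustration of the DATA_DESIGNER_HOME decision, the sketch below resolves the state directory. It assumes, without confirmation from the source, that an environment variable of the same name can override the default; the function name is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_designer_home(environ: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory that holds CLI-managed state.

    Assumption: an environment variable named DATA_DESIGNER_HOME overrides
    the documented default of ~/.data-designer/.
    """
    env = os.environ if environ is None else environ
    override = env.get("DATA_DESIGNER_HOME")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".data-designer"
```

Passing the environment as a parameter keeps the function testable without mutating the process environment.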

Cross-References