Commit graph

16 commits

Author SHA1 Message Date
Johnny Greco
8b8d748446
docs: graduate plugins out of experimental mode (#603)
* chore: add __init__.py to engine namespace subpackages

Griffe (used by mkdocstrings) skips directories without __init__.py
when resolving module paths, which prevented the new plugins code
reference from rendering SeedReader, FileSystemSeedReader, and
Processor. Adding empty __init__.py files in engine/resources/,
engine/processing/, and engine/processing/processors/ aligns with
the convention already used in engine/mcp/, engine/models/, etc.

* docs: flesh out docstrings on plugin extension-point classes

Plugin authors now see meaningful descriptions for every field and
method on the bases rendered in the plugins code reference:

- Plugin and PluginType: class docstrings + Attributes tables for
  fields and enum members; fix typo in config_qualified_name field
  description.
- SingleColumnConfig: document allow_resize.
- ProcessorConfig: document processor_type discriminator.
- SeedSource: document seed_type discriminator.
- FileSystemSeedSource: add class docstring + Attributes table for
  path / file_pattern / recursive.
- ColumnGeneratorFullColumn and ColumnGeneratorCellByCell: add
  class docstrings explaining when to use each base, plus method
  docstrings on the abstract generate() implementations.

* docs: graduate plugins out of experimental mode

Restructures plugin documentation around the now-stable extension
points (column generator, seed reader, processor) and treats plugins
as a first-class story for customizing Data Designer.

- Add code_reference/plugins.md: single-stop reference for the Plugin
  object and the config + implementation base classes used by all
  three plugin types.
- Add code_reference/generators.md: column generator implementation
  base classes, separated from column configs.
- Surface SingleColumnConfig in code_reference/column_configs.md.
- Add plugins/implement.md ("Build Your Own"): per-type implementation
  instructions across column generators, seed readers, and processors.
- Add plugins/processor.md: complete processor plugin package example.
- Rewrite plugins/overview.md: open with why plugins exist, drop the
  internal-helpers note (PluginRegistry / PluginManager), and focus
  the guide on what plugin builders need.
- Refresh plugins/available.md (Catalog) and
  plugins/filesystem_seed_reader.md to match the new structure.
- Delete plugins/example.md (replaced by per-type guides).
- Reorder Code Reference nav alphabetically and add the new pages.
- Minor link / wording fixes in concepts/processors.md and
  concepts/deployment-options.md.

* docs: simplify plugin docs structure

Replace the overview's how-to walkthrough and the per-type plugin
guides with a single Build Your Own page that covers all three
plugin types side-by-side. Add a dedicated Using Models in Plugins
guide and a seed_readers code reference, and trim the overview down
to what the plugin types are, how to use one, and how discovery
works.

- Rename plugins/implement.md to plugins/build_your_own.md.
- Delete plugins/filesystem_seed_reader.md and plugins/processor.md
  (their content is now in build_your_own.md and the per-type code
  references).
- Add plugins/models.md for model-backed column generator authoring.
- Add code_reference/seed_readers.md for seed reader implementation
  base classes.
- Rewrite plugins/overview.md: shorter intro, type bullets link to
  the relevant code reference, drop the multi-step "How do you
  create plugins" walkthrough in favor of a single Build a Plugin
  pointer, tighten Discovery troubleshooting.
- Refresh plugins/available.md (Available Plugins): point to the
  DataDesignerPlugins catalog and explain how to request a community
  listing.
- Update cross-page links in concepts/processors.md,
  concepts/seed-datasets.md, recipes/plugin_development/markdown_seed_reader.md,
  code_reference/plugins.md, and code_reference/generators.md to
  match the new structure.
- Update mkdocs.yml nav: rename to Build Your Own, add Using Models,
  add seed_readers code reference.

* docs: scroll wide tables horizontally instead of wrapping

Code-heavy reference tables (plugin bases, column generators, etc.)
were wrapping aggressively on narrow viewports, breaking long
identifiers across multiple lines. Switch the table container to
horizontal overflow and prevent code cells from wrapping so
identifiers stay readable.

* docs: address PR #603 review feedback

- Add an Implementation base section to code_reference/processors.md
  rendering the engine-side Processor class. This justifies the
  engine/processing/__init__.py files added earlier and gives
  processor plugin authors an auto-rendered API reference, matching
  the pattern used by code_reference/generators.md and seed_readers.md.
- build_your_own.md: replace the placeholder "x" emoji on the
  IndexMultiplier example with the actual multiplication sign.
- build_your_own.md: drop the manual `re.compile + apply(lambda)`
  pattern in the regex-filter processor in favor of the idiomatic
  `Series.str.contains(..., regex=True)`.
- build_your_own.md: add a kernel-restart caveat after the editable
  install instructions — PluginRegistry caches discovery on first
  import, so notebooks need a fresh kernel to pick up freshly
  installed plugins.
- build_your_own.md: state explicitly what `assert_valid_plugin`
  checks (config base + plugin-type-appropriate impl base).
- code_reference/plugins.md: link out to the processors code
  reference alongside generators and seed_readers.

* docs: split code reference by package

* docs: add interface code reference

* docs: add code reference overviews

* docs: refine code reference pages

* docs: improve code reference tables

* docs: correct reference docstrings

* docs: embed plugin catalog table

* docs: note plugin discovery restart caveat

* docs: explain generator base class choice

* docs: mention async cell generator examples

* docs: clarify plugin model usage

* docs: clarify plugin model aliases

* docs: address plugin review feedback

* docs: update available plugins page
2026-05-06 18:12:44 -04:00
Eric W. Tramel
8be4ff787f
feat: add RunConfig jinja rendering engine (#557) 2026-04-17 15:06:27 -04:00
Nabin Mulepati
e4857f62fa
feat: add Streamable HTTP transport support for remote MCP providers (#358)
* feat: add Streamable HTTP transport support for remote MCP providers (#357)

Add `streamable_http` as a supported transport type for `MCPProvider`,
enabling connections to MCP servers that use the Streamable HTTP protocol
(e.g. Tavily remote endpoints). Previously only SSE transport was supported,
causing silent 5-minute timeouts when connecting to incompatible endpoints.

- Expand `MCPProvider.provider_type` to `Literal["sse", "streamable_http"]`
  (default remains `"sse"` for backwards compatibility)
- Route `streamable_http` providers through `streamablehttp_client` from
  the MCP SDK in `MCPIOService._get_or_create_session()`
- Handle variable-length context manager results from MCP transport clients
- Add `DataDesigner.list_mcp_tool_names()` for discovering available tools
- Update CLI form builder and controller to support the new transport option
- Add tests for streamable_http config, session creation, and form builder

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* updates

* simplify import

* address greptile comments

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 08:11:54 -07:00
Nabin Mulepati
d8d1e668b0
docs: add image generation documentation and image-to-image editing tutorial (#319) 2026-02-12 14:38:52 -07:00
Johnny Greco
87119a545b
refactor: move SingleColumnConfig to config.base module (#287)
* create top-level base file

* add note

* update license header

* move exportable config and move base to config module

* update references in docs

* do not include single column config in init

* add inverse import order e2e test
2026-02-03 14:04:04 -05:00
Eric W. Tramel
e6e58e692e
feat: MCP (Model Context Protocol) tool calling integration for LLM columns (#248) 2026-02-02 09:41:58 -05:00
Johnny Greco
0d51539aa6
feat: add message trace support for LLM generation (#272)
Add support for capturing full conversation traces during LLM generation,
enabling debugging and fine-tuning dataset creation.

Changes:
- Add `with_trace` field to LLMTextColumnConfig for per-column trace control
- Add `debug_override_save_all_column_traces` to RunConfig for global trace
- Introduce ChatMessage dataclass for structured message representation
- Update ModelFacade.generate() to return full message trace
- Rename trace column postfix from `__reasoning_trace` to `__trace`
- Add comprehensive traces documentation

Traces capture system/user/assistant messages in order, enabling visibility
into the full generation conversation including correction retries.
2026-01-30 17:03:07 -05:00
Johnny Greco
50fc50efc7
docs: Fix mkdocs syntax and update person sampling documentation (#249)
* remove colon

* update person sampling docs
2026-01-27 10:18:42 -05:00
Eric W. Tramel
613509f323
feat: Elevate non-LLM concurrency limits to RunConfig (#242) 2026-01-26 11:11:36 -05:00
Nabin Mulepati
01f8d887f8
chore: deprecate InferenceParameters (#183)
* deprecate InferenceParameters

* update docs and references
2026-01-08 10:43:02 -07:00
Andre Manoel
d50a8aef95
docs: add processors (#147)
* first draft

* adding to code reference as well

* docstrings

* addressing comments

* forgot opening line

* docstring too
2025-12-17 15:47:33 -03:00
Johnny Greco
48fdc8c838
docs: add initial plugin documentation (#107)
* add docstrings

* add analysis modules

* include toc for plugins section

* add plugin docs

* remove scope creep

* Update docs/plugins/example.md

Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>

* address feedback

---------

Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>
2025-12-11 16:05:11 -05:00
Nabin Mulepati
1de2262b94
docs: add models module to code reference (#101)
* Add example notebook showing how to use image contexts

* change 101 -> tutorial

* update _README.md with info on the new tutorial

* add reference in mkdocs.yml

* simplify vlm tutorial

* update num_records on tutorials. Update .gitignore

* update readme info

* add models module to code reference

* fix links to generated ipynb

* change vlm in example tutorial to llama4-scout
2025-12-05 10:41:43 -07:00
Johnny Greco
362ec51544
docs: sampler params code ref and more (#50)
* add sampler params code ref

* add persons section

* add person from faker sampler
2025-11-19 16:27:40 -05:00
Andre Manoel
01fbf4d848
docs: validators etc. (#45)
* got a little help from Claude, will still double check everything

* fixing, adding docstrings

* forgotten file + overview to tutorial

* minor

* applying suggestions

Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>
Co-authored-by: Johnny Greco <jogreco@nvidia.com>

* addressing comments pt1

* addressing comments pt2

* trying something out

* fix

* typo

* trying again

* rollback workflow, add download links

* minor

* adapting notebooks to use fakersampler

---------

Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>
Co-authored-by: Johnny Greco <jogreco@nvidia.com>
2025-11-19 17:39:10 -03:00
Johnny Greco
42b089e0f4
docs: establish doc templating, building, and strategy (#31)
* initial updates with jupyter tutorials and styling

* filling out some docs

* add blank index

* update docs workflow

* clean up style sheet
2025-11-12 17:04:50 -05:00