Markdown Section Seed Reader
Turn a directory of Markdown files into a seed dataset with one row per section. This recipe stays in the same single-file format as the other recipes: it creates sample files, defines an inline FileSystemSeedReader[DirectorySeedSource], and passes that reader to DataDesigner(seed_readers=[...]).
This keeps the example focused on the actual seed reader contract:
- implementing build_manifest(...)
- returning 1:N hydrated rows from hydrate_row(...)
- declaring output_columns for the hydrated schema
- keeping IndexRange selection manifest-based
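
To make the contract above concrete, here is a library-independent sketch of the two steps: a cheap manifest with one row per Markdown file, then a 1:N hydration that expands each manifest row into one row per section. It deliberately does not import DataDesigner, so the function signatures, the column names in OUTPUT_COLUMNS, and the heading-based splitting rule are all illustrative assumptions, not the library's API or the recipe's actual code.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical hydrated schema; the real recipe declares its own output_columns.
OUTPUT_COLUMNS = ["relative_path", "section_index", "section_header", "section_content"]

# ATX-style headings only ("# Title" through "###### Title").
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def build_manifest(root: Path) -> list[dict]:
    """Return one manifest row per Markdown file without reading file contents."""
    paths = sorted(root.rglob("*.md"))
    return [{"relative_path": p.relative_to(root).as_posix()} for p in paths]


def hydrate_row(root: Path, manifest_row: dict) -> list[dict]:
    """Expand one manifest row into one row per section (1:N)."""
    text = (root / manifest_row["relative_path"]).read_text(encoding="utf-8")
    sections: list[tuple[str, str]] = []
    header, body = None, []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            if header is not None:
                sections.append((header, "\n".join(body).strip()))
            header, body = match.group(1).strip(), []
        elif header is not None:
            # Text before the first heading is ignored in this sketch.
            body.append(line)
    if header is not None:
        sections.append((header, "\n".join(body).strip()))
    return [
        {
            "relative_path": manifest_row["relative_path"],
            "section_index": index,
            "section_header": section_header,
            "section_content": section_content,
        }
        for index, (section_header, section_content) in enumerate(sections)
    ]


def demo() -> list[dict]:
    """Create two sample files and hydrate every manifest row."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.md").write_text("# Intro\nHello\n\n## Setup\nInstall it\n", encoding="utf-8")
        (root / "b.md").write_text("# Notes\nOne section only\n", encoding="utf-8")
        rows: list[dict] = []
        for manifest_row in build_manifest(root):
            rows.extend(hydrate_row(root, manifest_row))
        return rows
```

Calling demo() yields three rows from a two-row manifest, which is the 1:N relationship the contract describes. The splitter is intentionally naive: it would treat a line starting with `#` inside a fenced code block as a heading, which a production reader would need to handle.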
Because the example reuses DirectorySeedSource, it does not register a brand-new seed_type. If you later want to package the same reader as an installable plugin, see FileSystemSeedReader Plugins.
Run the Recipe
Run the script directly:
uv run markdown_seed_reader.py
The script prints two previews:
- the full section dataset across all Markdown files
- a manifest-only selection using IndexRange(start=1, end=1) that still returns every section from the selected file
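
The second preview can be pictured with a small toy model: the range is applied to manifest rows (files) before hydration, so selecting one manifest row still fans out into all of that file's sections. The IndexRangeSketch class, the file names, and the section titles below are hypothetical stand-ins, and the inclusive treatment of end is an assumption inferred from start=1, end=1 selecting a single file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRangeSketch:
    """Hypothetical stand-in for IndexRange; both bounds assumed inclusive."""

    start: int
    end: int


# Toy manifest: one entry per file, mapped to the sections hydration would yield.
MANIFEST = {
    "guide.md": ["Overview", "Install"],
    "faq.md": ["Billing", "Limits", "Support"],
    "notes.md": ["Changelog"],
}


def select_then_hydrate(selection: IndexRangeSketch) -> list[tuple[str, str]]:
    """Apply the range to manifest rows first, then expand each selected file."""
    files = list(MANIFEST)
    selected = files[selection.start : selection.end + 1]
    return [(name, section) for name in selected for section in MANIFEST[name]]


rows = select_then_hydrate(IndexRangeSketch(start=1, end=1))
print(rows)
# [('faq.md', 'Billing'), ('faq.md', 'Limits'), ('faq.md', 'Support')]
```

If the range were applied to hydrated rows instead, index 1 would return only the single section "Install" from guide.md. Selecting on the manifest keeps each file's sections together.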
Download Code :octicons-download-24:{ .md-button download="markdown_seed_reader.py" }
--8<-- "assets/recipes/plugin_development/markdown_seed_reader.py"