Mirror of https://github.com/NVIDIA-NeMo/DataDesigner, synced 2026-05-24 09:48:29 +00:00.
Preserves tree from previous docs-website head: 5e47d33ea8. This branch is a CI-managed publish artifact like gh-pages; source provenance is tracked in commit messages rather than Git ancestry.
Markdown Section Seed Reader
Turn a directory of Markdown files into a seed dataset with one row per section. This recipe stays in the same single-file format as the other recipes: it creates sample files, defines an inline FileSystemSeedReader[DirectorySeedSource], and passes that reader to DataDesigner(seed_readers=[...]).
This keeps the example focused on the actual seed reader contract:
- implementing build_manifest(...)
- returning 1:N hydrated rows from hydrate_row(...)
- declaring output_columns for the hydrated schema
- keeping IndexRange selection manifest-based
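The two-stage shape behind that contract can be sketched in plain Python, without importing Data Designer. This is an illustration only: the function signatures, the column names in `OUTPUT_COLUMNS`, and the heading-based splitting rule are assumptions made for this sketch, not the library's API or the recipe's actual code. The manifest holds one cheap row per file, and hydration expands each manifest row into one row per section.

```python
import re
import tempfile
from pathlib import Path

# ATX headings only ("# Title" through "###### Title"). Simplification: a
# "# comment" line inside a fenced code block would also be treated as a heading.
HEADING = re.compile(r"^#{1,6}\s+(.+?)\s*$")

# Hypothetical schema for hydrated rows (stand-in for output_columns).
OUTPUT_COLUMNS = ["relative_path", "section_index", "heading", "body"]


def split_sections(text: str) -> list[tuple[str, str]]:
    """Split Markdown text into (heading, body) pairs.

    Text that appears before the first heading is dropped in this sketch.
    """
    sections: list[tuple[str, list[str]]] = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((match.group(1), []))
        elif sections:
            sections[-1][1].append(line)
    return [(heading, "\n".join(body).strip()) for heading, body in sections]


def build_manifest(root: Path) -> list[dict]:
    """Stage 1: one manifest row per Markdown file, in sorted order."""
    return [
        {"relative_path": path.relative_to(root).as_posix()}
        for path in sorted(root.rglob("*.md"))
    ]


def hydrate_row(root: Path, manifest_row: dict) -> list[dict]:
    """Stage 2: expand one manifest row into one row per section (1:N)."""
    relative_path = manifest_row["relative_path"]
    text = (root / relative_path).read_text(encoding="utf-8")
    return [
        {
            "relative_path": relative_path,
            "section_index": index,
            "heading": heading,
            "body": body,
        }
        for index, (heading, body) in enumerate(split_sections(text))
    ]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.md").write_text(
        "# Intro\nHello.\n\n## Setup\nInstall it.\n", encoding="utf-8"
    )
    (root / "b.md").write_text("# Notes\nOne section only.\n", encoding="utf-8")

    manifest = build_manifest(root)
    rows = [row for entry in manifest for row in hydrate_row(root, entry)]

for row in rows:
    print(row["relative_path"], row["section_index"], row["heading"])
```

Two files produce a two-row manifest but three hydrated rows, which is why the hydrated schema is declared separately from the manifest.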
Because the example reuses DirectorySeedSource, it does not register a brand-new seed_type. To package the same reader as an installable plugin, see Build Your Own.
Run the Recipe
Run the script directly:
uv run markdown_seed_reader.py
The script prints two previews:
- the full section dataset across all Markdown files
- a manifest-only selection using IndexRange(start=1, end=1) that still returns every section from the selected file
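A minimal sketch of why the second preview behaves that way, using in-memory stand-ins rather than the real reader. `IndexRangeSketch`, the file names, and the section names are hypothetical, and the zero-based, inclusive-end indexing is an assumption inferred from `start=1, end=1` selecting a single file. The range is applied to manifest rows (files) before hydration, so every section of a selected file survives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRangeSketch:
    """Hypothetical stand-in for IndexRange; both bounds assumed inclusive."""

    start: int
    end: int


# One manifest entry per file, plus the sections each file would hydrate to.
manifest = ["a.md", "b.md", "c.md"]
sections_by_file = {
    "a.md": ["Intro", "Setup"],
    "b.md": ["Usage", "Options", "FAQ"],
    "c.md": ["Notes"],
}


def select_then_hydrate(selection: IndexRangeSketch) -> list[tuple[str, str]]:
    """Slice the manifest first, then expand each selected file 1:N."""
    selected_files = manifest[selection.start : selection.end + 1]
    return [
        (file_name, section)
        for file_name in selected_files
        for section in sections_by_file[file_name]
    ]


rows = select_then_hydrate(IndexRangeSketch(start=1, end=1))
print(rows)
```

Under these assumptions the selection picks one manifest row and yields three hydrated rows. Selecting over hydrated rows instead would have returned a single section.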
Download Code :octicons-download-24:{ .md-button download="markdown_seed_reader.py" }
--8<-- "assets/recipes/plugin_development/markdown_seed_reader.py"