DataDesigner

mirror of https://github.com/NVIDIA-NeMo/DataDesigner synced 2026-05-24 09:48:29 +00:00


dhruvnathawani 7de879acfa docs: add Enterprise Text-to-SQL and Search Agent recipes (#395)	2026-03-11 11:19:58 -07:00
recipes	docs: add Enterprise Text-to-SQL and Search Agent recipes (#395)	2026-03-11 11:19:58 -07:00
palette-favicon.png	docs: establish doc templating, building, and strategy (#31)	2025-11-12 17:04:50 -05:00

docs: add Enterprise Text-to-SQL and Search Agent recipes (#395)

feat: add Nemotron Super Text-to-SQL and Search Agent recipes

Add two new recipes derived from the Nemotron Super post-training pipelines:

Nemotron Super Text-to-SQL:
- Five-stage pipeline: seeding, prompt generation, schema with distractors,
  dialect-specific SQL, validation + quality scoring
- 14 conditional samplers (10 industries, 50 topics, complexity-gated task
  types, data quality concepts, knowledge dependencies, 100 style combos)
- Dialect-specific prompts for SQLite, MySQL, and PostgreSQL
- 5 LLM judges (prompt, SQL, context, data quality, knowledge) with 15
  scoring dimensions and flat score extraction columns
- Per-dialect syntax validation via CodeValidatorParams

Nemotron Super Search Agent:
- Four-stage pipeline: Wikidata KG seed paths, two-stage riddle generation
  (draft + BrowseComp-style obfuscation), Tavily web search trajectories
  via MCP, structured JSON formatting
- Tavily hosted MCP endpoint (streamable_http) -- no local server or extra
  dependencies beyond data-designer
- Full tool-call trace capture (with_trace=ALL_MESSAGES) for SFT data
- Built-in demo seeds (3 Wikidata paths) for quick testing

Both recipes include ASCII pipeline diagrams, Nemotron Super context in
docstrings, dev note links in the markdown pages, and follow existing
recipe conventions (PEP 723 metadata,
--model-alias/--num-records/--artifact-path CLI args).
