* docs: add text-to-sql devnote
* add diagram, update content
* correct inconsistencies
* docs: address PR #349 feedback and add BIRD benchmark results
PR feedback fixes:
- Fix Window Functions contradiction: Key Takeaway #1 now uses
"Geospatial SQL" (Advanced) instead of "Window Functions" (Intermediate)
- Fix score-0 truthiness bug: use `is not none` instead of truthy check
in Jinja2 expression columns (inline example + production pipeline)
- Soften Code Sandbox language: "A natural next step would be..." instead
of "We are actively implementing..."
- Cut Gretel reference per mvansegbroeck: replaced with NVIDIA/Nemotron
team description
- Replace Qwen model references with Nemotron per mvansegbroeck: MODEL_NAME,
ASCII diagram labels, Pipeline Overview prose
- Rename sdg_qwen_235b.py -> sdg_ndd_text2sql.py per mvansegbroeck
- Fix Try It Yourself: use MODEL_ALIAS = "nvidia-text" with default
provider pattern (matches structured-outputs dev note), remove unused
explicit ModelConfig
- Remove placeholder dataset link (#), add "Dataset: Internal" note
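
For reviewers, the score-0 truthiness fix above can be illustrated with a plain-Python analogue of the Jinja2 change. This is a minimal sketch and not the recipe's actual code: the `judge_result` dict shape, the function names, and the `"n/a"` placeholder are hypothetical.

```python
from typing import Any, Dict, Union

# Hypothetical placeholder emitted when a judge returned no score at all.
MISSING = "n/a"


def score_truthy(judge_result: Dict[str, Any]) -> Union[int, str]:
    """Buggy pattern: a truthiness check also discards a legitimate score of 0."""
    score = judge_result.get("score")
    return score if score else MISSING


def score_explicit(judge_result: Dict[str, Any]) -> Union[int, str]:
    """Fixed pattern: only an absent score (None) is treated as missing.

    Mirrors a Jinja2 expression of the form
    `{{ x.score if x.score is not none else '' }}`.
    """
    score = judge_result.get("score")
    return score if score is not None else MISSING


if __name__ == "__main__":
    zero = {"score": 0}
    print(score_truthy(zero))    # the 0 is lost
    print(score_explicit(zero))  # the 0 is preserved
```

The explicit check matters here because 0 is a valid rubric score, so dropping it would bias any downstream filtering toward higher-scored records.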
New content:
- Add BIRD Benchmark Results section with bar chart (JPG), data table,
BIRD caveat paragraph, and Jocelyn Huang acknowledgement
(Nemotron Super EX: 26.77% -> 41.80%, +15 pts, beats GPT-OSS-120B)
- Replace "Looking Ahead: Code Sandbox" with broader "Next Steps":
Code Sandbox, RL on BIRD via NeMo Gym, schema representation, Spider 2.0
- Add Project Summary table at end of post
* docs: address second round of PR #349 feedback
- Fix "EHR Systems" -> "Electronic Health Records" in Key Takeaway #1
to match the exact taxonomy string in the code example (greptile)
- Add admonition clarifying code snippets are illustrative, not
runnable, with link to Enterprise Text-to-SQL Recipe (nabinchha)
- Add context before score extraction snippet referencing the five
LLMJudgeColumnConfig columns and linking to full recipe (nabinchha)
- Add companion file note and recipe link to production pipeline
details block for prompts.py, rubrics.py, text2sql_seed.json (nabinchha)
* docs: address round 2 PR #349 feedback, replace production block with recipe
- Fix "EHR Systems" -> "Electronic Health Records" in Key Takeaway #1
to match the exact taxonomy string in the code example (greptile)
- Add admonition clarifying inline code snippets are illustrative,
with link to runnable Enterprise Text-to-SQL Recipe (nabinchha)
- Add context before score extraction snippet referencing the five
LLMJudgeColumnConfig columns and linking to full recipe (nabinchha)
- Replace production pipeline <details> block (230 lines with phantom
imports from prompts.py, rubrics.py, text2sql_seed.json) with
snippet include of enterprise_text_to_sql.py recipe -- self-contained
and runnable, consistent with other merged dev notes (nabinchha)
* docs: polish Try It Yourself and Summary sections
- Wrap minimal inline example in collapsible <details> dropdown
- Rename "A Team Effort" section to "Summary"
- Remove redundant Scale/Dialects/Dataset line
* docs: add missing sql_dialect sampler to Step 1 code snippet
The Step 3/4 prompt templates reference {{ sql_dialect }} but the
Step 1 seeding code never defined it, leaving an unresolved Jinja2
variable for readers following along. Add the sql_dialect sampler
with a comment explaining the pipeline runs once per dialect.
* fix ASCII diagram
* docs: fix BIRD score framing and MySQL dialect wording
- Remove specific "60-70%" BIRD claim from intro to avoid contradiction
with the 41.80%/38.25% direct-generation results shown later (those
higher figures come from specialized systems with schema linking)
- Reword MySQL "forbids" to "prompts exclude" -- REGEXP_REPLACE and
CONVERT_TZ are valid MySQL functions; the pipeline excluded them for
portability, not because the dialect forbids them
* docs: move text-to-sql images to assets/ convention and update refs
* docs: address text-to-sql devnote review comments
- Add devnote to mkdocs nav after Async All the Way Down
- Swap Recursive CTEs to Advanced, CASE Expressions to Intermediate (matches recipe)
- Fix score extraction truthy check to use 'is not none' (preserves score-0 values)
- Drop REPLACE() vs regexp_replace from dialect takeaway (REPLACE is cross-dialect)
- Tighten prose: remove 'The key insight:', use actual BIRD number, trim X-not-Y
- Fix knowledge dependency count: 8 -> 9 concepts (3x3 in recipe)
---------
Signed-off-by: Yev Meyer <ymeyer@nvidia.com>
Co-authored-by: Yev Meyer <ymeyer@nvidia.com>
feat: add Nemotron Super Text-to-SQL and Search Agent recipes
Add two new recipes derived from the Nemotron Super post-training pipelines:
Nemotron Super Text-to-SQL:
- Five-stage pipeline: seeding, prompt generation, schema with distractors,
dialect-specific SQL, validation + quality scoring
- 14 conditional samplers (10 industries, 50 topics, complexity-gated task
types, data quality concepts, knowledge dependencies, 100 style combos)
- Dialect-specific prompts for SQLite, MySQL, and PostgreSQL
- 5 LLM judges (prompt, SQL, context, data quality, knowledge) with 15
scoring dimensions and flat score extraction columns
- Per-dialect syntax validation via CodeValidatorParams
Nemotron Super Search Agent:
- Four-stage pipeline: Wikidata KG seed paths, two-stage riddle generation
(draft + BrowseComp-style obfuscation), Tavily web search trajectories
via MCP, structured JSON formatting
- Tavily hosted MCP endpoint (streamable_http) -- no local server or extra
dependencies beyond data-designer
- Full tool-call trace capture (with_trace=ALL_MESSAGES) for SFT data
- Built-in demo seeds (3 Wikidata paths) for quick testing
Both recipes include ASCII pipeline diagrams, Nemotron Super context in
docstrings, dev note links in the markdown pages, and follow existing
recipe conventions (PEP 723 metadata, --model-alias/--num-records/
--artifact-path CLI args).
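
The recipe conventions named above (PEP 723 inline metadata plus the three CLI flags) can be sketched as follows. This is an illustrative skeleton only, not the contents of either recipe: the Python version bound, the default values, and the help strings are assumptions, and the `nvidia-text` default is borrowed from the dev note's `MODEL_ALIAS`.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "data-designer",
# ]
# ///
"""Hypothetical recipe skeleton showing the shared CLI surface."""
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the three flags the recipes share."""
    parser = argparse.ArgumentParser(description="Illustrative recipe entry point")
    # Default alias is an assumption taken from the dev note's MODEL_ALIAS.
    parser.add_argument("--model-alias", default="nvidia-text",
                        help="Model alias to generate with")
    # Default record count is an assumption chosen for quick smoke tests.
    parser.add_argument("--num-records", type=int, default=5,
                        help="Number of records to generate")
    parser.add_argument("--artifact-path", default=None,
                        help="Optional directory for generated artifacts")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"model_alias={args.model_alias} "
          f"num_records={args.num_records} "
          f"artifact_path={args.artifact_path}")
```

With the PEP 723 header in place, a PEP 723-aware runner can resolve `data-designer` without a separate requirements file, which keeps each recipe a single self-contained script.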