6 commits

Author SHA1 Message Date
Nabin Mulepati
cebfb0e967
docs: Added starter dev notes on push to hugging face hub (#355)
* Added starter dev notes on push to huggingface hub

* fix: move excerpt marker to intro and remove redundant markers

Move the single <!-- more --> to after the intro paragraph for a shorter
blog teaser and remove the 6 redundant markers throughout the post.

* Update docs/devnotes/posts/push-datasets-to-hugging-face-hub.md

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* docs: add HF ecosystem context to push-to-hub dev notes (#474)

* docs: add HF ecosystem context to push-to-hub dev notes

Add section on what datasets get on the Hub (Dataset Viewer, streaming,
Viewer API), link to Hub search for DataDesigner datasets, and note that
private datasets can be flipped to public.

* Update docs/devnotes/posts/push-datasets-to-hugging-face-hub.md

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix: remove doubled library: prefix in Hub search URL

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update date

* fix date for text-to-sql

* update hero images

* updates

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Daniel van Strien <davanstrien@users.noreply.github.com>
2026-04-16 11:29:33 -06:00
dhruvnathawani
1448f9cbda
docs: add text-to-sql dev note (#349)
* docs: add text-to-sql devnote

* add diagram, update content

* correct inconsistencies

* docs: address PR #349 feedback and add BIRD benchmark results
PR feedback fixes:
- Fix Window Functions contradiction: Key Takeaway #1 now uses
  "Geospatial SQL" (Advanced) instead of "Window Functions" (Intermediate)
- Fix score-0 truthiness bug: use `is not none` instead of truthy check
  in Jinja2 expression columns (inline example + production pipeline)
- Soften Code Sandbox language: "A natural next step would be..." instead
  of "We are actively implementing..."
- Cut Gretel reference per mvansegbroeck: replaced with NVIDIA/Nemotron
  team description
- Replace Qwen model references with Nemotron per mvansegbroeck: MODEL_NAME,
  ASCII diagram labels, Pipeline Overview prose
- Rename sdg_qwen_235b.py -> sdg_ndd_text2sql.py per mvansegbroeck
- Fix Try It Yourself: use MODEL_ALIAS = "nvidia-text" with default
  provider pattern (matches structured-outputs dev note), remove unused
  explicit ModelConfig
- Remove placeholder dataset link (#), add "Dataset: Internal" note
New content:
- Add BIRD Benchmark Results section with bar chart (JPG), data table,
  BIRD caveat paragraph, and Jocelyn Huang acknowledgement
  (Nemotron Super EX: 26.77% -> 41.80%, +15 pts, beats GPT-OSS-120B)
- Replace "Looking Ahead: Code Sandbox" with broader "Next Steps":
  Code Sandbox, RL on BIRD via NeMo Gym, schema representation, Spider 2.0
- Add Project Summary table at end of post
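
The score-0 truthiness fix listed above can be illustrated in plain Python, where an `is not None` check plays the same role as Jinja2's `is not none` test. This is only a minimal sketch: the function names and the fallback value are hypothetical and are not taken from the dev note or the recipe.

```python
from typing import Optional

FALLBACK = -1  # hypothetical sentinel meaning "no score available"


def extract_score_truthy(score: Optional[int]) -> int:
    # Buggy variant: a legitimate score of 0 is falsy, so it is
    # silently replaced by the fallback.
    return score if score else FALLBACK


def extract_score_explicit(score: Optional[int]) -> int:
    # Fixed variant: only a missing score (None) triggers the fallback.
    return score if score is not None else FALLBACK


if __name__ == "__main__":
    for raw in (0, 3, None):
        print(raw, extract_score_truthy(raw), extract_score_explicit(raw))
```

With a score of 0, the truthy variant returns the fallback while the explicit variant keeps the 0, which is the behavior the fix preserves.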

* docs: address second round of PR #349 feedback

- Fix "EHR Systems" -> "Electronic Health Records" in Key Takeaway #1
  to match the exact taxonomy string in the code example (greptile)
- Add admonition clarifying code snippets are illustrative, not
  runnable, with link to Enterprise Text-to-SQL Recipe (nabinchha)
- Add context before score extraction snippet referencing the five
  LLMJudgeColumnConfig columns and linking to full recipe (nabinchha)
- Add companion file note and recipe link to production pipeline
  details block for prompts.py, rubrics.py, text2sql_seed.json (nabinchha)

* docs: address round 2 PR #349 feedback, replace production block with recipe
- Fix "EHR Systems" -> "Electronic Health Records" in Key Takeaway #1
  to match the exact taxonomy string in the code example (greptile)
- Add admonition clarifying inline code snippets are illustrative,
  with link to runnable Enterprise Text-to-SQL Recipe (nabinchha)
- Add context before score extraction snippet referencing the five
  LLMJudgeColumnConfig columns and linking to full recipe (nabinchha)
- Replace production pipeline <details> block (230 lines with phantom
  imports from prompts.py, rubrics.py, text2sql_seed.json) with
  snippet include of enterprise_text_to_sql.py recipe — self-contained
  and runnable, consistent with other merged dev notes (nabinchha)

* docs: polish Try It Yourself and Summary sections
- Wrap minimal inline example in collapsible <details> dropdown
- Rename "A Team Effort" section to "Summary"
- Remove redundant Scale/Dialects/Dataset line

* docs: add missing sql_dialect sampler to Step 1 code snippet

The Step 3/4 prompt templates reference {{ sql_dialect }} but the
Step 1 seeding code never defined it, leaving an unresolved Jinja2
variable for readers following along. Add the sql_dialect sampler
with a comment explaining the pipeline runs once per dialect.
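
A check along the following lines could catch this kind of mismatch between prompt templates and seeded columns. It is a rough sketch using only the standard library, not part of Data Designer: the helper name, the example prompt, and the `industry` column are all hypothetical, and the regular expression only handles simple `{{ name }}` placeholders.

```python
import re

# Matches simple Jinja2-style placeholders such as {{ sql_dialect }}.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def unresolved_variables(templates, defined_columns):
    """Return placeholder names used in templates but not defined upstream."""
    referenced = set()
    for template in templates:
        referenced.update(PLACEHOLDER.findall(template))
    return sorted(referenced - set(defined_columns))


# Hypothetical prompt template and column names, for illustration only.
prompts = ["Write a {{ sql_dialect }} query for the {{ industry }} schema."]

print(unresolved_variables(prompts, ["industry"]))                 # ['sql_dialect']
print(unresolved_variables(prompts, ["industry", "sql_dialect"]))  # []
```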

* fix ascii diagram

* docs: fix BIRD score framing and MySQL dialect wording
- Remove specific "60-70%" BIRD claim from intro to avoid contradiction
  with the 41.80%/38.25% direct-generation results shown later (those
  higher figures come from specialized systems with schema linking)
- Reword MySQL "forbids" to "prompts exclude" -- REGEXP_REPLACE and
  CONVERT_TZ are valid MySQL functions; the pipeline excluded them for
  portability, not because the dialect forbids them

* docs: move text-to-sql images to assets/ convention and update refs

* docs: address text-to-sql devnote review comments

  - Add devnote to mkdocs nav after Async All the Way Down
  - Swap Recursive CTEs to Advanced, CASE Expressions to Intermediate (matches recipe)
  - Fix score extraction truthy check to use 'is not none' (preserves score-0 values)
  - Drop REPLACE() vs regexp_replace from dialect takeaway (REPLACE is cross-dialect)
  - Tighten prose: remove 'The key insight:', use actual BIRD number, trim X-not-Y
  - Fix knowledge dependency count: 8 -> 9 concepts (3x3 in recipe)

---------

Signed-off-by: Yev Meyer <ymeyer@nvidia.com>
Co-authored-by: Yev Meyer <ymeyer@nvidia.com>
2026-04-14 11:10:14 -07:00
Andre Manoel
0e90ea644b
docs: add async engine dev note (#490)
* fix: address review feedback on async engine dev note

- Fix wall-clock claim: 41% -> 22% to match benchmark table
- Fix dual-model speedup rounding: 1.7x -> 1.6x (10.0/6.1 = 1.64)
- Fix run_config API: use dd.set_run_config() instead of passing to create()
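
The rounding correction above can be checked with a few lines of arithmetic. This sketch assumes that 10.0 and 6.1 are the two timings being compared, as the commit message suggests; the variable names are illustrative.

```python
baseline_time = 10.0    # figure quoted in the commit message
dual_model_time = 6.1   # figure quoted in the commit message

speedup = baseline_time / dual_model_time

print(f"{speedup:.2f}x")  # 1.64x
print(f"{speedup:.1f}x")  # 1.6x, not 1.7x
```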

* docs: add async engine dev note

Add "Async All the Way Down" dev note covering the async task-queue
scheduler built across PRs #356, #378, #404, #429, #456. Includes
benchmark results, architecture diagrams, and DAG shape illustrations.

* feat: add docs preview workflow for PRs

Build MkDocs site on PRs that touch docs and deploy to Cloudflare
Pages. Each PR gets a browseable preview URL posted as a comment.
Notebook tutorials use placeholder stubs since they require API
keys to execute.

Requires CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID repo secrets.

* fix: update speedup chart alt text from 1.7x to 1.6x

* docs: improve timeline figure context and labeling

Add DAG subtitle to sync-vs-async timeline figure and bridge the
surrounding text to explain which workload shape is being shown.

* edits+additions to async-all-the-way-down dev notes

* clarify two semaphore dance

* remove dead link

* replace hero image

* docs: update scale figures with nginx-accurate data and adjust sizing

Regenerate scale-model-timeline and scale-boxplot from nginx access
logs (column_progress.csv, sync/summary.json) instead of buffered
execution logs. Optimize both PNGs to palette mode. Adjust figure
widths and update model timeline commentary.

* add link from owning-the-model-stack to async-dev-note

* docs: address review feedback on async blog post

- Tighten intro to a concise abstract, move pipeline narrative into
  "The Bottleneck Was Structural" section
- Remove multi-column generators / seed readers paragraph (TMI)
- Clarify sync engine ran columns sequentially within each batch

---------

Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>
2026-04-08 15:51:04 -03:00
Nabin Mulepati
f78c4e0cf7
Fix repeated header/footer on native-model-client-hero image (#492)
2026-04-06 09:45:27 -06:00
Nabin Mulepati
a1eb244321
docs: add native model client dev note (#465)
* add images

* re-ran slopguard

* update dev notes

* address greptile comments

* update example model name

* add info on throttlemanager

* address pr feedback

* Add link to model aliases

* address pr feedback

* update key resources

* update key resources

* crop image for better fit

* Fix max_parallel_requests

* refine concluding paragraph
2026-03-31 15:45:56 -06:00
Johnny Greco
0a7b9e0d6d
docs: Data Designer Got Skills dev note (#457)
* docs: add skeleton for "Data Designer Got Skills" dev note

* create assets folder and add blog directory name

* docs: add Claude Code plugin marketplace configuration

Register the repo as a Claude Code plugin marketplace so users can
install the data-designer skill via `/plugin marketplace add`.

* docs: write first draft of "Data Designer Got Skills" dev note

Full prose for all sections: intro with hero benchmark figure,
agents as first-class users, baseline trace walkthrough, CLI and
skill design, benchmark results (228 sessions), getting started
with marketplace and npx install paths, and what's next.

* docs: add error breakdown table and minor refinements

* docs: add sdg and data-designer keywords to plugin metadata

* docs: refine CLI framing, reduce em dashes, slop guard pass

* docs: fix grammar in dev note (serial comma, double-which clause)

* update hero image

* docs: swap hero image, move benchmark figure, minor wording tweaks

* docs: add narrative lead-in to skill trace summary

* docs: refine quality bullet, streamline getting started modes

* remove old image

slop-guard tweaks
2026-03-24 21:03:00 -04:00