Commit graph

30 commits

Author SHA1 Message Date
Andre Manoel
b6de38d894
docs: remove docs code reference (#674)
2026-05-21 18:29:18 -04:00
Andre Manoel
765fccfcb0
docs: fix Fern versioned publishing (#656)
* docs: fix Fern versioned plugin docs

* docs: guard Fern release version content

* docs: dedupe latest Fern release pages

* ci: require latest Fern nav on release

* docs: document Fern release prep

* ci: automate Fern release sync

* ci: publish Fern snapshots from docs branch

* docs: keep Fern archive on docs branch

* docs: harden Fern docs branch publishing

* ci: preview Fern docs from archive branch

* docs: include utility modules in Fern API reference

* ci: harden Fern devnotes publishing

* docs: keep Fern latest label stable

* docs: normalize Fern latest preview label

* docs: align Fern code reference nav

* docs: sync Fern code reference across versions

* docs: materialize Fern version pages

* ci: record Fern publish provenance

* docs: fix Fern generated API MDX

* docs: escape generated Fern API example

* ci: use stable Fern preview URL

* docs: flatten Fern API nav roots

* docs: use generated API overview pages

* ci: allow branch-dispatched Fern publish tests

* docs: update Fern CLI pin

* docs: dedupe release nav validation paths

* docs: address Fern review nits
2026-05-15 17:09:59 -03:00
Andre Manoel
46dc8b232a
docs: prepare Fern docs workflow (#622)
* docs: prepare fern generated artifacts

* docs: update fern migration artifacts

* docs: leave colab notebooks unchanged

* docs: add VLM recipe cards to Fern

* docs: trim Dev Notes sidebar

* docs: collapse older Dev Notes in sidebar

* docs: add Fern publishing workflows

* docs: gate Fern publishing on check

* docs: restrict hosted previews for fork PRs

* docs: clean Fern preview URL

* docs: cancel stale preview runs

* docs: clarify devnotes notebook reuse

* docs: clean older versions route

* docs: document Fern versioning conventions

* docs: add Fern release version guard

* docs: harden Fern release tag handling

* ci: let docs preview continue after fern failure

* ci: split docs preview deploy

* docs: clarify fern make commands

* ci: harden fern deploy workflows

* docs: render preview notebooks without outputs

* ci: keep docs preview deploy inline

* docs: align notebook code highlighting

* docs: show notebook snippet scrollbars

* docs: isolate fern preview check failures

* ci: align fern release docs behavior
2026-05-12 18:18:26 -03:00
Lawrence Lane
7b5854ca36
docs: migrate documentation from MkDocs to Fern (#581)
* docs: migrate documentation from MkDocs to Fern

Adds a Fern Docs build under fern/ alongside the existing mkdocs site.
Production target docs.nvidia.com/nemo/datadesigner with floating-latest
pointer (latest.yml symlink) at v0.5.8. Migrated all concept, recipe, plugin,
dev-note, and tutorial pages to MDX with NVIDIA theme and custom components
(Authors, MetricsTable, TrajectoryViewer, NotebookViewer, BadgeLinks).
Tutorial notebooks now render via NotebookViewer with captured outputs (text,
DataFrames, inline images) - new make targets generate-fern-notebooks and
generate-fern-notebooks-with-outputs drive the .py -> executed .ipynb -> Fern
JSON+TS pipeline, pinning docs to Python 3.13 to dodge pyarrow wheel issues
on 3.14. Python API reference is configured via Fern libraries: pointing at
data-designer-config; output is gitignored and regenerated locally with
'fern docs md generate'.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs: add datadesigner-docs agent skill

Captures the patterns established in the Fern migration so agents (and humans)
can maintain fern/ confidently. Modeled after NVIDIA-NeMo/Gym's
nemo-gym-docs SKILL.md, adapted for our floating-latest versioning,
notebook-with-outputs pipeline, dev-notes kit components, and the MDX gotchas
hit during migration (pymdown attr_list, --8<-- snippet syntax, frontmatter
authors-as-JSX-scope-variable, etc.). Routes triggers like "edit docs", "add
doc page", "regenerate notebooks", "update dev note", "add API reference" to
this skill.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs: address PR review for Fern migration

- Delete stale fern/versions/_nav_order.yml (references non-existent
  ./versions/latest/pages/ — paths were never updated when latest/ was
  renamed to v0.5.8/, no consumer found in docs.yml or v0.5.8.yml).
- Remove unused custom components: Tag.tsx, CustomCard.tsx, Include.tsx
  (had its own untested markdown parser), ExpandableCode.tsx (broken in
  Fern SSR runtime). Drop expandable-code.css from docs.yml. Authors,
  BadgeLinks, MetricsTable, NotebookViewer, TrajectoryViewer remain
  (each has at least one call site).
- BadgeLinks: remove DEFAULT_BADGES with placeholder URLs; make `badges`
  prop required so we can never accidentally ship 'your-org/your-repo'.
- NotebookViewer: document the XSS trust boundary on output cells of
  format: "html". Outputs flow .py source → jupytext --execute → committed
  *.ts (review boundary). Add an inline comment at the dangerouslySetInnerHTML
  call site pointing back to the trust-model section.
- README: add Windows caveat on the latest.yml symlink — Windows users need
  core.symlinks=true before clone or Fern will reject the version config.
- Makefile: tighten generate-fern-notebooks source probe from `ls .../*.ipynb`
  (which can return success on non-file errors) to `[ -f docs/notebooks/1-the-basics.ipynb ]`,
  matching the reviewer's suggestion.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs: address @aschilling-nv review on fern/docs.yml

Three suggestions from the Fern review, all matching Curator's docs.yml
conventions:

- instances[0].url: drop the https:// protocol prefix to match Curator's
  shape (e.g. nemo-curator.docs.buildwithfern.com/nemo/curator).
- logo.href: was '/'; now points at /nemo/datadesigner/getting-started/welcome
  (the actual landing page) so clicking the logo lands on real content
  instead of the bare basepath.
- experimental.basepath-aware: true — opts into Fern's basepath-aware
  routing so internal links don't double-prefix the /nemo/datadesigner
  segment.
- redirects: also fix /nemo/datadesigner/index.html → getting-started/welcome
  (was bouncing to /latest, which is just the version slug); add
  /getting-started → /getting-started/welcome to mirror Curator's
  /home → /home/welcome convention.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs: put dev notes overview timestamps on separate lines

Signed-off-by: Kirit93 <kthadaka@nvidia.com>
Made-with: Cursor

* docs: redesign dev-notes index with BlogCard component

Replaces the generic <CardGroup>/<Card> grid (same green icon × 10, date
glued to bottom of description) with a purpose-built BlogCard for the
dev-notes landing page.

Each card now has:
- Hero image (16:9, lazy-loaded, click-to-zoom via Fern's rmiz wrapper)
- ALL-CAPS date eyebrow as proper subtitle styling
- Title, 3-line clamped description
- Author byline at the bottom: avatar stack (overlapping) + first author
  name + "+N", pulling from the existing devnotes/.authors.yml registry
- Hover: NVIDIA-green border + subtle lift

Posts without a hero image fall back to a deterministic hash-based
gradient placeholder + monogram (DJB2 hash of href → HSL hue, with the
muddy-yellow band 40–90° remapped). Same post always gets the same look.
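
A minimal sketch of the deterministic placeholder hue described above, written in Python for illustration (the component itself is a .tsx file). The function names, the UTF-8 byte hashing, and the specific remap (shifting band hues forward by 60 degrees) are assumptions; the commit message only says the 40–90° band is remapped, not how.

```python
def djb2(text: str) -> int:
    """Classic DJB2 string hash (h = h * 33 + byte), truncated to 32 bits."""
    h = 5381
    for byte in text.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def placeholder_hue(href: str) -> int:
    """Map an href to a stable hue in [0, 360) that avoids the 40-90 degree band."""
    hue = djb2(href) % 360
    if 40 <= hue <= 90:
        # Assumed remap: shift band hues by 60 degrees so they land in 100..150.
        hue += 60
    return hue


def placeholder_gradient(href: str) -> str:
    """Build a CSS gradient string from the stable hue (colors are illustrative)."""
    hue = placeholder_hue(href)
    second = (hue + 30) % 360
    return f"linear-gradient(135deg, hsl({hue}, 60%, 45%), hsl({second}, 60%, 30%))"
```

Because the hue depends only on the href, the same post gets the same placeholder on every build.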

Notes:
- Image prop is React.ReactNode (not string) — pass <img> JSX from MDX
  so Fern's link rewriter can resolve the src to /_local/... in dev and
  /nemo/datadesigner/assets/... in prod. Raw string props bypass the
  rewriter and 404 in dev.
- Card href runs through a small withBasepath() helper since the <a>
  also bypasses Fern's link rewriter.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs: flush blog-card hero images to the top of the card

Fern's prose stylesheet applies a top margin to <img> tags, and the
click-to-zoom wrapper Fern injects around each image (<span data-rmiz>)
inherits that margin too. Result: a ~1rem gap between the card's top
edge and the hero image.

Reset margin/padding on the rmiz wrapper spans + the img itself inside
.blog-card__media so the image renders flush against the top border.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs: stop blog-card hero from opening Fern's click-to-zoom modal

When an <img> appears in MDX, Fern auto-wraps it with a click-to-zoom
shell (<span data-rmiz>...). On the dev-notes index that shell intercepts
clicks meant for the card's <a> wrapper, so clicking a hero opens a
lightbox AND tries to navigate.

Set pointer-events: none on the rmiz spans + img inside .blog-card__media
so clicks bubble straight to the parent <a> and the card behaves as a
single, predictable link target. Hover still works because pointer-events
on children doesn't block :hover on the ancestor <a>.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs: render notebook markdown at build time with markdown-it-py

Replaces NotebookViewer's hand-rolled JS markdown parser (the one with
the ^@BR^@ sentinel the reviewer flagged as fragile) with build-time
rendering in the converter.

ipynb-to-fern-json.py now uses markdown-it-py (CommonMark + tables +
strikethrough + raw HTML) to render each markdown cell's source into
source_html, mirroring how code cells already store Pygments-highlighted
source_html. NotebookViewer's markdown branch becomes a single
dangerouslySetInnerHTML on the pre-rendered HTML, with a plain-escape
fallback for old snapshots.

Removes the dead JS helpers (renderMarkdown, isSafeUrl, UL_CLASS,
OL_CLASS) — ~60 lines of brittle regex-based markdown parsing.

Fixes broken rendering of:
- Blockquotes (showed literal > characters before)
- Nested content inside blockquotes (e.g. blockquote with bullet list)
- Fenced code blocks
- Tables
- Multi-paragraph list items

Includes regenerated fern/components/notebooks/*.{json,ts} for all 6
tutorials.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs: rewrite recipes index + replace octicons download links with Fern Info callouts

The recipes/cards.mdx page was still in MkDocs Material format:
- <div class="grid cards" markdown> wrapper (no-op in MDX)
- :material-snake:, :material-database:, :material-tools:, etc. (rendered
  as literal text — Fern uses Font Awesome, not Material icons)
- !!! tip Prerequisite (mkdocs admonition syntax)
- [:material-book-open-page-variant: View Recipe] / [Download Code
  :octicons-download-24:] links with embedded icon shortcodes

Rewrite using Fern's native components: <CardGroup cols={2}> with <Card
title icon href> grouped by category (Code Generation, QA and Chat,
Trace Ingestion, MCP and Tool Use, Plugin Development). Each card has
one primary action (the recipe page); download lives on the recipe page
itself.

Replace the trailing "Download Code :octicons-download-24:" link on
every recipe page (and 2 dev notes) with a <Info title="Download Recipe">
callout pointing at the GitHub blob URL — matching PR #215's
convention. 12 occurrences across 12 files.

Also fixes 6 recipe pages whose frontmatter title was "Untitled"
(unfilled placeholder from auto-migration): text_to_python, basic_mcp,
pdf_qa, multi_turn_chat, product_info_qa, agent_rollout_distillation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs(fern): mirror main's content updates into v0.5.8 MDX pages

Forward-port the doc changes that landed in main since this branch was
cut, translating MkDocs admonition syntax to Fern components. Three
product changes drove the updates:

PR #594 — deprecate implicit default-provider routing:
- concepts/models/configure-model-settings-with-the-cli.mdx: deprecate
  "Change default provider" workflow + inline mark on `data-designer
  config list` output
- concepts/models/custom-model-settings.mdx: warning that `provider=`
  is now required on every ModelConfig
- concepts/models/default-model-settings.mdx: warning that the
  registry-level default-provider concept is deprecated
- concepts/models/model-providers.mdx: same warning at the top of the
  ModelProvider overview
- concepts/models/inference-parameters.mdx: add explicit `provider=
  "openai"` to the dalle ModelConfig example

PR #592 — async engine becomes the default:
- concepts/architecture-and-performance.mdx: rewrite Execution Model
  intro to mention both engines, qualify "How It Works" as sync-engine
  semantics, update Concurrency Formula and Throttle notes from "Sync
  engine caveat" to "Engine paths", and add a full new "## Async
  Engine" section (per-model timeouts, run outcomes / Early Shutdown,
  opt-out via DATA_DESIGNER_ASYNC_ENGINE=0). Add `provider="nvidia"`
  to the my-model example.
- concepts/custom_columns.mdx: note that sync `cell_by_cell`
  generators dispatch concurrently under the async engine; mock with
  `MagicMock(spec=ModelFacade)` so async methods are auto-detected.
- concepts/processors.mdx: warning that the async engine enforces
  row-count invariance in process_before/after_batch.
- devnotes/posts/async-all-the-way-down.mdx: append an "Update" callout
  noting the engine is now default, with a link to the Architecture
  page anchor.

All `!!! warning|note|tip "Title"` admonitions converted to Fern
<Warning|Note|Tip title="..."> components. Internal links to mkdocs
relative paths (`../../concepts/foo.md#anchor`) rewritten to canonical
Fern URLs (`/concepts/foo#anchor`).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs(fern): address @andreatgretel review comments

Four issues from Andre's review pass:

1. /devnotes 404 (index.mdx:23) — section slug is /dev-notes, page slug
   is /dev-notes/overview. Fix the link in the landing page so visitors
   actually reach the dev notes index.

2. TrajectoryViewer.tsx final-answer body shown as literal markdown
   (line 66) — the renderer uses dangerouslySetInnerHTML but
   example-marcia.ts shipped raw markdown (**bold**, \n\n breaks). Visible
   on the deep-research devnote where the trajectory is defaultOpen.
   Pre-render body to HTML in the fixture (matches the original hand-coded
   format pre-migration); document the convention in the ToolCall.body
   doc comment so future fixtures don't regress.

3. Tutorials 5/6 (image generation/editing) ship with 0 captured outputs
   because Flux runs through OpenRouter and OPENROUTER_API_KEY isn't set
   at build time. Cannot regenerate without the key, so add a <Note> at
   the top of each wrapper page pointing readers at the Colab link to
   execute the cells live and see the generated images. Maintainers with
   the key in their environment should re-run
   `make generate-fern-notebooks-with-outputs` before merge to capture
   the snapshots.

4. Legacy nvidia-nemo.github.io/DataDesigner/* URLs in MDX prose (8
   occurrences across 5 files) rewritten to canonical Fern paths so
   visitors don't get sent back to the legacy GitHub Pages site once
   docs.nvidia.com/nemo/datadesigner becomes the production URL:
   - The single deep link in data-designer-got-skills.mdx →
     /concepts/models/default-model-settings
   - All other "documentation home" links (CONTRIBUTING ×2,
     async-all-the-way-down ×2, owning-the-model-stack, design-principles
     ×2) → /getting-started/welcome (the canonical landing slug, matches
     logo.href in docs.yml)

   Notebook .py source URLs are tracked separately as part of the
   notebook-regen work.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs(fern): regenerate notebook snapshots with Flux outputs captured

Re-ran make generate-fern-notebooks-with-outputs with NVIDIA_API_KEY +
OPENROUTER_API_KEY set, now that we have a NVIDIA key with permission
on nemotron-3-nano-30b-a3b. All 6 tutorials regenerated; the two image
tutorials (5 and 6) which had been shipping with 0 outputs now have
captured Flux generations:

  1 the-basics:                12/15 outputs
  2 structured-outputs:        13/17 outputs
  3 seeding-with-a-dataset:    10/13 outputs
  4 providing-images:          13/17 outputs (1 image)
  5 generating-images:          8/10 outputs (2 images) ← was 0/12
  6 image-to-image-editing:     9/12 outputs (10 images) ← was 0/14

The two `<Note title="Run in Colab to see ...">` workarounds I added on
the 5/6 wrapper pages are no longer needed — outputs render inline now.
NotebookViewer's own "Run in Google Colab" banner is still rendered
from the wrapper's `colabUrl` prop, so the live-execute path stays one
click away.

Bumps the diff size noticeably (notebook 6 .ts is ~22MB of base64-
encoded PNGs from 10 edited images), but that's intentional — these
images are the proof points for what the Flux/MCP image-context
tutorials actually produce.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs(fern): unbreak SSR — shrink notebook image outputs + fix BlogCard React import

Two server-side render bugs surfaced when running `fern docs md generate &&
fern docs dev` (the static-preview path):

1. The 22 MB notebook 6 .ts module (full-resolution Flux PNGs from 10 edited
   images) tripped Fern's SSR module-evaluation step. Once that module
   failed to evaluate, the shared component bundle failed to load on every
   page, replacing each MDX body with `<span data-intent="error">Something
   went wrong!</span>` while the layout chrome continued to render.

   Fix in fern/scripts/ipynb-to-fern-json.py: after extracting an
   image/png output, pass it through Pillow to (a) downscale so the
   longest edge is at most 800 px, (b) re-encode as JPEG q=82 progressive
   (Flux outputs are photographic — JPEG compresses 5–10× better than PNG
   for this content). NotebookViewer's CellOutput interface gains a
   `mime` field so the data URL uses the actual encoded MIME type. Result:

       notebook 6: 22 MB → 4.6 MB
       notebook 5: 3.8 MB → 1.8 MB
       notebook 4: 514 KB → 116 KB
       (notebooks 1–3 unaffected — no image outputs)

2. fern/components/BlogCard.tsx referenced `React.ReactNode` twice without
   importing React. Other components in the kit use `import type
   { ReactNode } from "react"`; BlogCard was the outlier. Aligned the
   import style — even though this didn't end up being the trigger, leaving
   the dangling reference would have eventually caused a strict-mode SSR
   regression.
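
The image shrink from item 1 could look roughly like the sketch below. It assumes Pillow is installed; the function names are hypothetical and this is not the code in fern/scripts/ipynb-to-fern-json.py, only an illustration of "longest edge at most 800 px, JPEG q=82 progressive".

```python
import io


def fit_longest_edge(width: int, height: int, max_edge: int = 800) -> tuple[int, int]:
    """Return dimensions scaled so the longest edge is at most max_edge (never upscales)."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    scale = max_edge / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def shrink_png_to_jpeg(png_bytes: bytes, max_edge: int = 800, quality: int = 82) -> tuple[bytes, str]:
    """Downscale a PNG and re-encode it as progressive JPEG; returns (bytes, mime type)."""
    from PIL import Image  # Pillow, imported lazily so fit_longest_edge has no dependency

    with Image.open(io.BytesIO(png_bytes)) as img:
        rgb = img.convert("RGB")  # JPEG has no alpha channel
        rgb = rgb.resize(fit_longest_edge(*rgb.size, max_edge), Image.LANCZOS)
        buf = io.BytesIO()
        rgb.save(buf, format="JPEG", quality=quality, progressive=True)
    return buf.getvalue(), "image/jpeg"
```

Returning the MIME type alongside the bytes mirrors the `mime` field mentioned above, so the data URL can name the format that was actually encoded.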

Sweep test against http://localhost:3000/nemo/datadesigner/* — landing,
concepts, tutorials (including 5/6 image notebooks), dev notes, recipes,
and code-reference topic pages all render with their content; no error
spans.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs(fern): add MkDocs-shape redirects for legacy URLs

The legacy site at https://nvidia-nemo.github.io/DataDesigner/ used
MkDocs-Material conventions (mkdocstrings + blog plugin + mkdocs-jupyter
+ directory URLs). Several path segments and page slugs differ from
Fern's slugified-title routing — search-engine indexed links and
copy-pasted bookmarks land on 404 without redirects.

Adds 30+ specific redirect rules covering every renamed surface:

- Tutorials: /notebooks/<filename>/ -> /tutorials/<title-slug>
  (page-title slugs differ from .ipynb filenames; one rule per notebook
   plus a README -> overview alias).

- Recipes: /recipes/<snake_subsection>/<snake_page>/ ->
  /recipes/<kebab-subsection>/<kebab-page>. Per-page rules for each of
  the 10 recipes (page titles diverged from .py filenames — e.g.
  basic_mcp -> basic-mcp-tool-use, search_agent -> nemotron-super-search-agent),
  followed by subsection :rest* fallbacks.

- Concepts: /concepts/mcp/* -> /concepts/tool-use-mcp/* (subsection
  rename, with & dropped, not -and-). Per-page rules for safety-and-limits
  -> safety-limits and configure-mcp-cli -> cli-configuration where
  page titles diverged from filenames.

- Code Reference: /code_reference/<module>/ ->
  /code-reference/topic-overviews/<module>. Per-page rules for the six
  underscored modules (column_configs, config_builder, run_config,
  sampler_params, validator_params, data_designer_config) since Fern's
  page-slug rule kebabs underscores.

- Plugins: filesystem_seed_reader -> file-system-seed-reader-plugins
  (Fern inserts hyphens between CamelCase words). example -> example-plugin,
  available -> available-plugin-list (page-title slugs).

- Dev Notes: blog plugin's /devnotes/posts/<slug>/ -> /dev-notes/<slug>.
  Per-page rules for text-to-sql -> text-to-sql-for-nemotron-super and
  rqa -> rqa-dataset (post titles diverged from filenames).

- /devnotes -> /dev-notes/overview (section landing).
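
To illustrate the recipe rule above (per-page overrides first, then a snake_case to kebab-case fallback), here is a small Python model of the mapping. The real rules are redirect entries in the Fern configuration, not Python; the function name and the subsection names used in the usage note are hypothetical, while the two page overrides are the ones named in this message.

```python
# Page slugs whose titles diverged from the .py filenames (taken from the examples above).
PAGE_OVERRIDES = {
    "basic_mcp": "basic-mcp-tool-use",
    "search_agent": "nemotron-super-search-agent",
}


def redirect_recipe(legacy_path: str) -> str:
    """Map a legacy MkDocs recipe URL to its assumed Fern destination."""
    # Splitting and filtering empty parts also drops the MkDocs trailing slash.
    segments = [s for s in legacy_path.split("/") if s]
    if not segments or segments[0] != "recipes":
        raise ValueError(f"not a recipe path: {legacy_path!r}")
    rest = segments[1:]
    if rest:
        # Per-page rule wins over the generic fallback.
        rest[-1] = PAGE_OVERRIDES.get(rest[-1], rest[-1])
    # Fallback: underscores become hyphens in every remaining segment.
    return "/" + "/".join(["recipes"] + [s.replace("_", "-") for s in rest])
```

For example, `redirect_recipe("/recipes/code_generation/text_to_python/")` yields `/recipes/code-generation/text-to-python`, where `code_generation` is an assumed subsection name.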

MkDocs's directory-URL trailing-slash convention is handled natively by
Fern's runtime (both /foo and /foo/ return the same page), so no
explicit slash-strip rule is needed.

Smoke-tested all 34 legacy URLs against http://localhost:3000 — every
one resolves to a 200 page on the new structure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Kirit93 <kthadaka@nvidia.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Kirit93 <kthadaka@nvidia.com>
Co-authored-by: Andre Manoel <165937436+andreatgretel@users.noreply.github.com>
2026-05-07 14:12:58 -03:00
Andre Manoel
2564834a47
fix: cache notebook builds to avoid flaky upstream model failures (#370)
* fix: cache notebook builds to avoid failures from flaky upstream models

The build-notebooks CI executes all tutorial notebooks on every run.
When an upstream model (e.g. black-forest-labs/flux.2-pro) is down, the
entire docs build fails even if no notebooks changed.

Add per-notebook caching based on source file SHA-256 hashes. Unchanged
notebooks are served from cache, and only modified ones are re-executed.
On the first CI run (empty cache), the workflow seeds the cache from the
last successful build artifact.
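
A minimal sketch of the per-notebook hash check, assuming one `<name>.sha256` marker file per source in a cache directory. The helper names and the cache layout are assumptions; the workflow itself does this in CI shell steps.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_rebuild(source: Path, cache_dir: Path) -> bool:
    """True when no hash is cached for this source or the cached hash is stale."""
    marker = cache_dir / f"{source.name}.sha256"
    return not marker.exists() or marker.read_text().strip() != sha256_of(source)


def record_build(source: Path, cache_dir: Path) -> None:
    """Store the current source hash after a successful notebook execution."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / f"{source.name}.sha256").write_text(sha256_of(source))
```

Only sources for which `needs_rebuild` returns True would be re-executed; the rest are served from the cached output.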

Also add a minimal test script (test_flux_image_gen.py) to reproduce the
flux.2-pro health check failure locally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review comments on notebook caching

- Don't write .sha256 during seeding so changed notebooks are detected
- Rename TMPDIR to SEED_TMPDIR to avoid shadowing the POSIX env var
- Use portable sha256 helper (sha256sum with shasum fallback)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: only seed cache when truly empty, restore hash writing

Skip artifact seeding when a partial cache was restored (it already has
correct per-file hashes). Only seed + write current hashes when the
cache dir is completely empty (true bootstrapping).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restrict artifact seed lookup to main branch

Prevents seeding from feature branch runs that may have different
notebook sources.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add actions:read permission for artifact seeding

The seed step uses gh run list and gh run download which require
actions:read. Without it, these calls silently fail and the cold-start
cache bootstrapping never executes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: only use notebook cache when called from build-docs

Scheduled Monday runs and manual workflow_dispatch should execute all
notebooks to catch regressions (e.g. library changes that break a
notebook). Caching is only used via workflow_call (from build-docs)
where the goal is fast, resilient doc deployment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use jq // empty to avoid "null" string on empty run list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add use_cache input flag to notebook and docs workflows

Replace event_name-based cache logic with an explicit use_cache boolean
input. Defaults:
- build-notebooks: workflow_call=true, dispatch=false, schedule=false
- build-docs: dispatch=true (toggleable), release=false

This gives full control over caching from the GitHub Actions UI.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-05 12:30:14 -03:00
Johnny Greco
1439bbea7e
chore: Improve CLI startup with lazy heavy import cleanup (#330)
* perf: defer heavy imports to improve CLI startup time

Move expensive imports (engine, models, controllers) out of the module-level import path so that data-designer --help and other non-generation commands no longer pay the full startup cost.

Key changes:
- Defer controller imports to inside command functions
- Remove eager re-export chains from CLI package __init__ files
- Move default-settings bootstrap into load_config_builder() and DataDesigner.__init__() instead of running at import time
- Add lazy __getattr__ exports in interface/__init__.py
- Replace module-level tokenizer init with cached lazy getter
- Fix ModelProvider import to use config layer instead of engine
- Update test mock paths to match new import locations

Reduces CLI import-time from ~1.67s to ~0.46s.

* perf: defer pandas/numpy in io_helpers and add config_list benchmark

- Replace eager `from lazy_heavy_imports import pd, np` in io_helpers
  with module-level __getattr__ (for backwards-compatible external
  access / test mocks) and function-level imports in the 3 functions
  that actually use them (read_parquet_dataset, smart_load_dataframe,
  _convert_to_serializable). Importing io_helpers no longer triggers
  pandas/numpy loading.
- Defer heavy imports in list and reset CLI commands into function
  bodies to avoid loading repositories, Rich, and prompt_toolkit at
  module import time.
- Add `config_list` (data-designer config list) measurement to the
  CLI startup benchmark with isolated cold measurement in a separate
  venv and a --skip-config-list-check flag.
- Update test mock paths to match new import locations.

* Refine lazy import usage and TYPE_CHECKING cleanup

* Run license header updater on PR-touched files

* fix: update sqlfluff mock target for lazy imports in test_sql

* perf: cache globals() in lazy __getattr__ to avoid repeated lookups

Add globals() caching and explanatory comment to all three lazy
__getattr__ implementations (lazy_heavy_imports, config/__init__,
interface/__init__) so subsequent attribute accesses bypass __getattr__.
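
The caching idea can be sketched as follows. In a real module this is a module-level `def __getattr__(name)` that stores the result in `globals()`; to keep the example runnable as a single script, it builds a module object and writes into its `__dict__` instead. The `make_lazy_module` helper and the `json` stand-in are assumptions for illustration only.

```python
import importlib
import types


def make_lazy_module(name: str, lazy_map: dict[str, str]) -> types.ModuleType:
    """Create a module whose attributes are imported on first access, then cached."""
    module = types.ModuleType(name)

    def __getattr__(attr: str):
        # Only called when normal attribute lookup on the module fails.
        if attr not in lazy_map:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = importlib.import_module(lazy_map[attr])
        # Equivalent of `globals()[attr] = value` inside a real module:
        # later accesses find the attribute directly and skip __getattr__.
        module.__dict__[attr] = value
        return value

    module.__getattr__ = __getattr__
    return module
```

After the first access, the attribute lives in the module namespace, so the lookup cost of `__getattr__` is paid once per name.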

* perf: lazy CLI command loading and deferred heavy import evaluations

- Add LazyTyperGroup to defer command module loading until invocation, allowing module-level imports in all CLI command files

- Split DataFrameSeedSource into seed_source_dataframe.py to isolate pandas dependency from other seed source classes

- Move TypeVar/TypeAlias definitions (DataT, NumpyArray1dT, RadomStateT, EngineT) to TYPE_CHECKING blocks with runtime fallbacks

- Wrap module-level constants in lru_cache (phone_number parquet data, jsonschema validator) to defer I/O and heavy imports to first use

- Update test mock targets to patch at usage-site for module-level imports
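
As an illustration of the `lru_cache` bullet above, a module-level constant can be replaced by a cached getter so the work happens on first use rather than at import time. The `get_schema` name and its payload are hypothetical stand-ins for the parquet data and jsonschema validator mentioned in the list.

```python
import functools
import json


@functools.lru_cache(maxsize=1)
def get_schema() -> dict:
    """Build the 'constant' on first call; later calls return the same cached object."""
    # Stand-in for expensive I/O or a heavy import performed at first use.
    return json.loads('{"type": "object", "required": ["name"]}')
```

Callers switch from reading a constant to calling `get_schema()`; importing the module no longer triggers the load.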

* refactor: use direct pandas import in seed_source_dataframe

Drop lazy-loading for pandas in DataFrameSeedSource; use direct import
for simplicity.

* update lazy import pattern

* update tests to use lazy import namespace

Switch test modules to import data_designer.lazy_heavy_imports as lazy
and reference heavy libraries through that namespace. This keeps heavy
imports deferred during module import and aligns tests with the new
lazy-import usage pattern.

* tighten import perf test thresholds

Document recent baseline timings and lower the allowed average
import time and timeout so regressions are detected sooner.

* document pandas import requirement

Clarify that Pydantic needs DataFrame resolved at module load and
that keeping the direct import preserves IDE typing support.

* increase timeout time

* use lazy pandas imports in visualization tests

- replace direct pandas usage with lazy.pd in visualization tests to avoid eager imports
- add TYPE_CHECKING pandas import and keep CLI controller imports sorted

* fix lazy pandas runtime usage and preview mocks

Switch sample-record handling to lazy pandas types so runtime paths no longer
depend on TYPE_CHECKING imports. Align preview controller tests to patch the
module-local DataDesigner symbol, preventing real engine invocation in save
results scenarios.
2026-02-18 16:24:15 -05:00
Andre Manoel
58734d09f0
test: add provider health checks script and CI workflow (#301)
* test: add e2e health checks for default provider models

Add parametrized tests that verify model connectivity for all
default providers (nvidia, openai, openrouter). Tests check API
key availability and skip when not configured.
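
A rough sketch of the skip-when-unconfigured behavior, assuming each provider's key lives in an environment variable. NVIDIA_API_KEY and OPENROUTER_API_KEY appear elsewhere in this log; OPENAI_API_KEY and the function name are assumptions, and the actual connectivity call is omitted.

```python
import os
from typing import Mapping, Optional

# Provider name -> environment variable holding its API key (partly assumed, see above).
PROVIDER_KEYS = {
    "nvidia": "NVIDIA_API_KEY",
    "openai": "OPENAI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def plan_health_checks(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Decide per provider whether its health check should run or be skipped."""
    env = os.environ if env is None else env
    return {
        provider: "run" if env.get(var) else "skip"
        for provider, var in PROVIDER_KEYS.items()
    }
```

Passing `env` explicitly keeps the decision testable without touching the real environment.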

* chore: move health checks out of e2e tests

- Convert pytest test to standalone script at scripts/health_checks.py
- Add `make health-checks` target
- Add CI workflow (weekly + on release + manual dispatch)
- Remove test_health_checks.py from tests_e2e/

* chore: make health checks non-blocking in CI

* fix: print traceback to stdout to avoid interleaving

* chore: add all provider API keys to health checks CI

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: remove temporary push trigger from health checks

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-06 15:18:35 -03:00
Eric W. Tramel
e6e58e692e
feat: MCP (Model Context Protocol) tool calling integration for LLM columns (#248) 2026-02-02 09:41:58 -05:00
Andre Manoel
e46fbd0759
fix: automate README sync for data-designer package builds (#266)
* fix: uv sync or build requires copying README

* update header (script doesn't check it)

* changing path, ensuring proper checks
2026-01-29 13:10:26 -03:00
Johnny Greco
f6a2c57f20
this is annoying (#256) 2026-01-28 15:24:38 -05:00
Johnny Greco
c19f35639f
chore: add publish script and update license headers (#253) 2026-01-28 08:47:34 -05:00
Johnny Greco
ae0665fa16
refactor: slim package refactor into three subpackages (#240)
* remove old structure

* major shuffle

* streamline project configs

* update make commands

* updates to make commands

* remove essentials

* initialize logger in interface

* uv lock

* ignore notepad

* update workflows

* fix e2e project config

* generate colab notebooks

* resolve default model settings in interface

* fix build commands

* update perf import make command

* cleaning up some slop

* update recipes

* move conftest files to tests/

* update subpackage readmes

* streamline config_logging

* use exports

* update perf import usage pattern

* update for IDE behavior with ruff

* remove engine's fixtures file

* add note about lazy imports

* update dependencies

* update docs

* doc fixes

* uv lock

* updates to catch up with main

* clean up makefile

* remove package gitignores

* define deps only once

* isolate tests

* add test for protection rule

* create temp dirs for isolated tests

* catch up to main

* update headers

* reapply changes

* better result summaries for isolated tests

* move exports into top-level init

* fix client importlib version syntax

* catch up with main
2026-01-27 13:53:20 -05:00
Nabin Mulepati
7181db3eb7
chore: lazy 3rd party imports (#222) 2026-01-15 14:51:54 -07:00
Johnny Greco
367de1a063
rename (#214) 2026-01-14 15:26:46 -05:00
Johnny Greco
d962c86843
fix: update example runner command with notebooks dep group (#204)
* update dep groups; use in makefile

* add quotes to packages in pip command
2026-01-13 11:49:31 -05:00
Johnny Greco
910d22dfa0
chore: add make commands to run examples as e2e tests (#199)
* update makefile

* fix bug
2026-01-12 15:37:00 -05:00
Mike Knepper
2cfff52581
feat: Seed reader plugins (#191) 2026-01-09 13:50:47 -06:00
Mike Knepper
6bf7698bc2
refactor: Overhaul to seed datasets (#167) 2026-01-08 11:48:14 -06:00
Andre Manoel
7fa9a413ac
docs: add option to open notebook directly in Colab (#126) 2025-12-12 15:15:26 -03:00
Andre Manoel
5d4ad10b11
chore: moving notebooks to jupytext and cleaning up workflows (#91)
* adding basic jupytext structure

Co-authored-by: Johnny Greco <jogreco@nvidia.com>

* few fixes

* first test for ci

* adding error intentionally to check workflow behavior

* test calling from other workflows

* typo

* trying as job instead

* couple of fixes

* checking path

* trying to fix path

* wrapping up

---------

Co-authored-by: Johnny Greco <jogreco@nvidia.com>
2025-12-03 17:29:07 -03:00
Johnny Greco
d4f32456a9
docs: welcome and concepts/columns (#43)
* add mike

* meth -> method; mod -> module in TOC

* messing with dark/light mode default

* staging stuff

* remove code examples from docstrings

* writing

* add columns with style
2025-11-17 17:07:01 -05:00
Johnny Greco
62d80c0c33 missing quote 2025-11-03 16:08:00 -05:00
Andre Manoel
f55625096a adding group for docs dependencies 2025-11-03 13:48:41 -03:00
Johnny Greco
e9bb737d6c update makefile and add license to workflow 2025-10-30 17:11:29 -04:00
Johnny Greco
6d92fd708d exclude autogenerated _version.py file in lint/format checks 2025-10-29 15:51:09 -04:00
Johnny Greco
928de69fe5 add dev with notebooks dependency option 2025-10-28 14:42:44 -04:00
Johnny Greco
cde4f33ae4 add check headers option 2025-10-27 19:14:52 -04:00
Johnny Greco
37e2bd0741 makefile improvements 2025-10-27 18:38:30 -04:00
Johnny Greco
6d9836e2ee add and run pre-commit 2025-10-27 18:10:36 -04:00
Johnny Greco
7ed5e78741 initial port 2025-10-27 14:29:12 -04:00