DataDesigner

mirror of https://github.com/NVIDIA-NeMo/DataDesigner synced 2026-05-24 09:48:29 +00:00

Author	SHA1	Message	Date
Johnny Greco	1fa29ad940	Merge branch 'main' into andreatgretel/docs/remove-code-reference-docs	2026-05-21 15:02:43 -04:00
Andre Manoel	ff5277088d	fix(ci): trust generated Agentic CI PRs (#643 ) * fix(ci): trust generated agentic CI PRs Signed-off-by: Andre Manoel <amanoel@nvidia.com> * fix(ci): authorize generated PR checks Signed-off-by: Andre Manoel <amanoel@nvidia.com> * fix(ci): pin authorized agentic checks Signed-off-by: Andre Manoel <amanoel@nvidia.com> * fix(ci): narrow agentic CI trust * fix(ci): reject stale agentic authorizations * fix(ci): serialize agentic authorization --------- Signed-off-by: Andre Manoel <amanoel@nvidia.com>	2026-05-20 09:27:04 -03:00
Andre Manoel	08ccf3412d	docs: remove docs code reference	2026-05-18 21:34:15 +00:00
dependabot[bot]	387be6f07d	ci: bump the all-actions group across 1 directory with 2 updates (#664 ) Bumps the all-actions group with 2 updates in the / directory: [cloudflare/wrangler-action](https://github.com/cloudflare/wrangler-action) and [NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml](https://github.com/nvidia-nemo/fw-ci-templates). Updates `cloudflare/wrangler-action` from 3.15.0 to 4.0.0 - [Release notes](https://github.com/cloudflare/wrangler-action/releases) - [Changelog](https://github.com/cloudflare/wrangler-action/blob/main/CHANGELOG.md) - [Commits](`9acf94ace1...ebbaa15849`) Updates `NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml` from 1.1.0 to 1.2.0 - [Release notes](https://github.com/nvidia-nemo/fw-ci-templates/releases) - [Changelog](https://github.com/NVIDIA-NeMo/FW-CI-templates/blob/main/CHANGELOG.md) - [Commits](`2dee428461...e58924ea30`) --- updated-dependencies: - dependency-name: cloudflare/wrangler-action dependency-version: 4.0.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: all-actions - dependency-name: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml dependency-version: 1.2.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: all-actions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-18 11:45:27 -03:00
Andre Manoel	cd604a57a4	ci: fix Fern devnotes artifact lookup (#667 )	2026-05-15 17:45:51 -03:00
Andre Manoel	765fccfcb0	docs: fix Fern versioned publishing (#656 ) * docs: fix Fern versioned plugin docs * docs: guard Fern release version content * docs: dedupe latest Fern release pages * ci: require latest Fern nav on release * docs: document Fern release prep * ci: automate Fern release sync * ci: publish Fern snapshots from docs branch * docs: keep Fern archive on docs branch * docs: harden Fern docs branch publishing * ci: preview Fern docs from archive branch * docs: include utility modules in Fern API reference * ci: harden Fern devnotes publishing * docs: keep Fern latest label stable * docs: normalize Fern latest preview label * docs: align Fern code reference nav * docs: sync Fern code reference across versions * docs: materialize Fern version pages * ci: record Fern publish provenance * docs: fix Fern generated API MDX * docs: escape generated Fern API example * ci: use stable Fern preview URL * docs: flatten Fern API nav roots * docs: use generated API overview pages * ci: allow branch-dispatched Fern publish tests * docs: update Fern CLI pin * docs: dedupe release nav validation paths * docs: address Fern review nits	2026-05-15 17:09:59 -03:00
Andre Manoel	1d203b1dda	feat(agentic-ci): decision-ready triage and daily PR fixes (#600 ) * feat(agentic-ci): decision-ready triage and daily PR fixes Reorganize the weekly issue-triage report around recommended actions (close as resolved, close as duplicate, needs maintainer decision, ready for assignment, stuck PR, duplicate PRs, stale) so each flagged item carries action + evidence + rationale and can be resolved without opening it. Multi-comment split with i/N markers and orphan reconciliation when the report grows or shrinks. Flip the four daily audit suites with mechanical fix categories from read-only reports to opening one PR per run: - docs-and-references: broken-link, docstring-drift, arch-ref-rename - structure: missing-future, lazy-import - dependencies: transitive-gap, unused - code-quality: bare-except (draft until landing rate proven) test-health stays report-only (all candidates require inferring intent). The shared procedure - fix_backlog selection, finding-hash spec for stable cross-run identification, attempted_fixes lifecycle with two-strike escalation, allowlists, ranking, branch/PR conventions - lives in .agents/recipes/_fix-policy.md. Each suite recipe declares only its eligible categories, branch types, and test requirements. Workflow runs claude twice per suite (audit, then conditionally fix), each capped at the existing --max-turns 50. Fix call is gated on non-empty fix_backlog and skipped entirely for test-health.
* fix(agentic-ci): address review findings before merge - Map per-package test targets explicitly in _fix-policy.md (Makefile exposes test-config/test-engine/test-interface, not test-<package>). - Use github-actions[bot] noreply identity for commits the recipes produce. - Refresh fix_backlog.data when an id already exists so the fix phase cannot drive a PR from stale data after the underlying file changed. - Stop time-pruning closed/abandoned attempted_fixes entries — pruning before the two-strike threshold erases the history needed to escalate. Single-strike entries now age out only via the 200-entry cap. - Disambiguate bare-except findings within the same function by including a try-body hash in the finding id. - Audit grep for code-quality now matches both `except:` and `except BaseException:`, in parity with the fix eligibility. - Restrict transitive-gap fix eligibility to cases where a sibling package already declares the dep (avoids inventing version specifiers from scratch). - Issue-triage workflow handles multi-part reports in both the fallback post step and the job summary; recipe always writes numbered parts. * fix(agentic-ci): close residuals from review pass 2 - Replace remaining `make test-<package>` references with pointers to the mapping table; only the table itself uses that placeholder now. - Fix `gh api --paginate \| jq \| length` returning per-page counts: slurp with `jq -s 'add // 0'` to get a single total. - Compare posted-comment count to expected part count so a partial post (agent posted part 1 but not 2/3) triggers the fallback instead of being silently treated as success. - Add `shell: bash` to triage steps using `shopt`/`mapfile` so they're not at the mercy of the runner's default shell. - Disambiguate bare-except findings whose try-body hashes collide by adding a per-function ordinal to the canonical_key. - Tie the 200-entry attempted_fixes cap eviction to `attempts[0].at` (the schema has no `first_seen` field). 
* fix(agentic-ci): identity-based partial-post detection in triage fallback Replace the count-only POSTED_COUNT >= EXPECTED_PARTS check with an identity-based check that extracts every i/N marker seen in today-dated bot comments and verifies each expected i is present. A duplicate post of one part can no longer mask a missing other. * fix(agentic-ci): close remaining bot-review findings - Exempt two-strike attempted_fixes entries from the 200-entry cap eviction. Cap now evicts non-two-strike oldest-first by attempts[0].at; two-strike entries are silently-forgotten only in the pathological all-200-are-two-strike case (itself a signal). - Specify the attempted_fixes PR-marker reconciliation algorithm: scan open PR bodies for the `<!-- agentic-ci finding=<id> -->` marker and back-fill missing entries. - Tighten the daily workflow conditionals to gate on explicit step outcomes (steps.audit.outcome == 'success' rather than success()) so a future pre-audit gate cannot accidentally trip the fix step. * fix(agentic-ci): close Greptile pass-2 findings (timeout, re-verify wording) - Bump daily-suite job timeout from 20 to 40 minutes. The split into two sequential `claude --max-turns 50` invocations can saturate a 20-minute budget; a mid-fix SIGTERM would leave an orphaned branch and inconsistent runner-state. - Disambiguate the `_phase-fix.md` "do NOT re-scan" rule. It forbids rebuilding fix_backlog from scratch but does NOT override the per-candidate re-verification step required by _fix-policy.md step 4.1 (re-grep / re-read the specific file the candidate points at). Single-candidate re-verification is required; whole-codebase re-scanning is forbidden. * fix(agentic-ci): close Greptile pass-3 P1s in triage fallback - Guard `jq capture()` with a `test()` select. `capture()` errors on non-match instead of returning empty, which would truncate SEEN_PARTS if any unrelated today-dated bot comment lacks the triage marker (e.g. from a sibling workflow). 
Adding the test() guard ensures capture() only runs on bodies that already match. - Iterate the MISSING[] array when posting fallback parts, not the full PARTS[] array. Posting all parts when only some were missing was creating duplicate comments for the parts the agent already successfully posted. * fix(agentic-ci): close johnnygreco review-pass warnings Address the five Warnings from the 2026-05-07 review focused on the trust boundary for autonomous PR generation. Five workflow/policy adjustments shrink the surface where agent compliance is load-bearing: - Workflow-level scope gate. After the fix step, re-derive the diff against `origin/main` and validate against the per-suite path allowlist (regex mirrored from `_fix-policy.md`), the 50-LOC cap, and the 3-file cap. On violation, close the PR with `--delete-branch` and flip the `attempted_fixes` entry from `open` to `abandoned` so two-strike logic still sees the failure. The recipe alone could not bind the agent's path choices; the workflow now does. - Dependencies install-dev verification. For the dependencies suite only, re-run `make install-dev` after the scope gate so the agent's pyproject edit is exercised against the lockfile resolver. Closes the PR if `install-dev` fails — catches the failure mode where the per-package test target passed against the old cached lockfile. - Flip matrix-job `cancel-in-progress` from true to false. A cancellation between the agent's git push and `gh pr create` would leave an orphaned branch with no `attempted_fixes` record; reconciliation only covers PRs that were opened. Queueing a duplicate run is the lesser evil. `_fix-policy.md` Atomicity section now documents the trade-off. - Allow `/tmp/audit-{{suite}}.md` in `_phase-audit.md`'s "do not modify outside `{{memory_path}}/`" directive. A literal-minded agent could refuse to write the report file, which would break the job summary, artifact upload, and the fix phase's audit context. 
- Always upload the agent log artifact (was `if: failure()` only) and include `runner-state.json`. For autonomous mode, the most interesting failure is "the workflow succeeded but the PR was wrong"; the stream-json log is the only way to look back days later. Also takes johnnygreco's Suggestion 2: spell out in the policy doc that the `draft_until_proven` flip is the sole human-gated promotion step in the fix policy and must not be automated. Greptile and the github-actions auto-reviewer's findings were already closed in the prior pass-2/pass-3 commits; no action needed on those. * fix(agentic-ci): close Codex review-pass-2 findings on workflow gates Codex flagged five issues in the prior commit's scope/lockfile gates. This commit closes all five: - HIGH: Wrong-PR targeting. Both gates selected the last globally-open attempted_fixes entry, which could match a stale orphan from a prior crashed run rather than the PR opened by this run. Adds a pre-fix snapshot step that captures `(id, attempts-length)` pairs before the fix runs, and changes the post-fix selectors to require that the entry's attempts count grew during this run. - HIGH: Docstring-only enforcement gap on the docs-and-references suite. The .py path allowlist was at workflow level but the docstring-only caveat was still policy-only. Adds an AST-based check: for each .py file changed, parse the post-change tree, collect docstring line ranges (module/class/function), then verify every added line in the diff is either inside a docstring, a comment, or whitespace. Verified locally with both pass and fail fixtures. - MEDIUM: Diff-ref mismatch. Gates diffed `origin/main...HEAD` rather than `origin/main...origin/$BRANCH`, so a misbehaving agent that left HEAD pointing elsewhere would have validated the wrong tree. Now fetches `origin/$BRANCH` first and prefers that ref. Falls back to HEAD only if fetch fails (with a warning). - MEDIUM: FILE_COUNT bug. `grep -c '.' 
\|\| echo 0` produced "0\n0" on empty diff, breaking the downstream integer comparison. Replaces with `mapfile -t FILE_ARR` + `${#FILE_ARR[@]}`, which is correct for any input including empty. - LOW: Non-atomic JSON writes. The runner-state mutations could leave the file half-written if the workflow was cancelled mid-write. Switches both gates to the temp-file + os.replace pattern. Also: dependencies-lockfile gate now does an explicit `git checkout --detach origin/$BRANCH` before re-running install-dev, so verification runs against what was actually pushed rather than relying on local working-tree state. * fix(agentic-ci): gate fix + scope_gate steps on snapshot.outcome Greptile review on `872d5617` flagged that the fix step's custom `if:` expression bypasses GitHub Actions' implicit success() check. Without explicitly referencing steps.snapshot.outcome, a snapshot failure (corrupt runner-state, disk error) would let the fix step run anyway. The scope gate's `jq --slurpfile prior /tmp/prior-attempted-fixes.json` would then exit non-zero on the missing file, leave OPEN empty, and hit the "nothing to validate" early-exit — silently approving whatever the agent pushed. Adds steps.snapshot.outcome == 'success' to both the fix step's condition (the actual fix) and the scope_gate step's condition (belt-and-suspenders against future refactors). * fix(agentic-ci): harden daily fix gates Signed-off-by: Andre Manoel <amanoel@nvidia.com> * fix(agentic-ci): validate all grown fix attempts * fix(agentic-ci): harden post-fix gates --------- Signed-off-by: Andre Manoel <amanoel@nvidia.com>	2026-05-12 18:54:01 -03:00
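The docstring-only gate described in the commit above (parse the post-change tree, collect docstring line ranges for modules, classes, and functions, then verify that every added line is inside a docstring, a comment, or whitespace) could be sketched in Python roughly as follows. This is a minimal illustration, not the workflow's actual check: the names `docstring_line_ranges` and `added_lines_are_docs_only` are hypothetical, the set of added line numbers is assumed to have been extracted from the diff beforehand, and a line counts as a comment only if it starts with `#` after stripping.

```python
import ast


def docstring_line_ranges(source: str) -> list[tuple[int, int]]:
    """Return (first_line, last_line) for each module/class/function docstring."""
    tree = ast.parse(source)
    owners = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    ranges = []
    for node in ast.walk(tree):
        if not isinstance(node, owners) or not node.body:
            continue
        first = node.body[0]
        # A docstring is a bare string-constant expression in first position.
        if (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        ):
            ranges.append((first.lineno, first.end_lineno))
    return ranges


def added_lines_are_docs_only(source: str, added_lines: set[int]) -> bool:
    """Check that every added (1-indexed) line is docstring, comment, or blank."""
    ranges = docstring_line_ranges(source)
    lines = source.splitlines()
    for number in sorted(added_lines):
        text = lines[number - 1].strip()
        if not text or text.startswith("#"):
            continue  # whitespace-only line or full-line comment
        if any(start <= number <= end for start, end in ranges):
            continue  # inside a docstring
        return False  # an added line touches executable code
    return True


# Hypothetical post-change file content, used only for illustration.
EXAMPLE = '''def greet(name):
    """Return a greeting.

    Extra detail added by the fix.
    """
    # explanatory comment
    return "hi " + name
'''

print(added_lines_are_docs_only(EXAMPLE, {4, 6}))  # True: docstring + comment
print(added_lines_are_docs_only(EXAMPLE, {7}))     # False: the return statement
```

Working from line ranges of the parsed post-change tree, rather than pattern-matching on quotes in the diff, means multi-line docstrings are handled without guessing where a string starts or ends.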
Andre Manoel	46dc8b232a	docs: prepare Fern docs workflow (#622 ) * docs: prepare fern generated artifacts * docs: update fern migration artifacts * docs: leave colab notebooks unchanged * docs: add VLM recipe cards to Fern * docs: trim Dev Notes sidebar * docs: collapse older Dev Notes in sidebar * docs: add Fern publishing workflows * docs: gate Fern publishing on check * docs: restrict hosted previews for fork PRs * docs: clean Fern preview URL * docs: cancel stale preview runs * docs: clarify devnotes notebook reuse * docs: clean older versions route * docs: document Fern versioning conventions * docs: add Fern release version guard * docs: harden Fern release tag handling * ci: let docs preview continue after fern failure * ci: split docs preview deploy * docs: clarify fern make commands * ci: harden fern deploy workflows * docs: render preview notebooks without outputs * ci: keep docs preview deploy inline * docs: align notebook code highlighting * docs: show notebook snippet scrollbars * docs: isolate fern preview check failures * ci: align fern release docs behavior	2026-05-12 18:18:26 -03:00
dependabot[bot]	eb0b9d3226	ci: bump NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml (#621 ) Bumps the all-actions group with 1 update: [NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml](https://github.com/nvidia-nemo/fw-ci-templates). Updates `NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml` from 0.94.1 to 1.1.0 - [Release notes](https://github.com/nvidia-nemo/fw-ci-templates/releases) - [Changelog](https://github.com/NVIDIA-NeMo/FW-CI-templates/blob/main/CHANGELOG.md) - [Commits](`211c302d64...2dee428461`) --- updated-dependencies: - dependency-name: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml dependency-version: 1.1.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: all-actions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-08 17:10:38 -03:00
Andre Manoel	b502dd3575	ci: add graphify structural impact analysis to PR review and structure audit (#567 ) * ci: add graphify structural impact analysis to PR review and structure audit Add a graphify-based AST analysis tool that builds a directed graph of the codebase (~2s, no LLM calls) to detect architectural impact. Integrates into both the PR review workflow (pre-computed before claude runs) and the Wednesday structure audit (with week-over-week diff). PR review: extracts changed files against the full codebase graph, reports risk level (LOW/MEDIUM/HIGH), god nodes affected, import direction violations, and cross-package dependencies. Output saved to /tmp and read by the review agent. Structure audit: produces god node rankings, cross-package edge summary table, import violation detection, and graph diff against previous week's cached graph. Baselines saved for runner memory trend tracking. * fix: harden graphify integration - security, correctness, and CI weight - Fix KeyError: god_nodes() returns 'degree' not 'edges' (3 call sites) - Fix deduped vs raw violation count inconsistency in baselines.json - Security: run structural_impact.py from base-branch checkout so fork PRs cannot inject code that executes with GH_TOKEN in scope - Add --repo-root flag so the tool resolves package paths correctly when invoked from a different checkout directory - Replace make install-dev + .venv with lightweight /tmp/graphify-venv (only graphifyy needed, saves ~2min CI per PR review) - Add graphify-out/ to .gitignore (9MB graph cache is CI-only) * fix: pin graphifyy version and fix dedup truncation Pin graphifyy==0.4.23 in both CI workflows to prevent breaking changes from unpinned installs. Fix _dedup() label truncation at 30 chars that could merge distinct entities sharing a common prefix. 
* fix(ci): use array expansion for changed-files arg to handle special filenames Replace unquoted $CHANGED_PY word-split with mapfile + array expansion to prevent glob expansion and correctly handle filenames containing spaces or special characters. * fix: derive changed nodes from graph and improve MEDIUM risk reason Derive changed_node_ids from the already-built graph by matching source_file paths instead of running a separate extraction pass. Removes implicit dependency on graphify ID stability across independent extractions. Fix MEDIUM risk reason to reflect the actual trigger (cluster spread vs high-connectivity entity) instead of always reporting cluster count. * fix: address Codex review findings - security, edge coverage, dedup, stale artifacts Split the workflow step to isolate GH_TOKEN from graphifyy execution, preventing a compromised package release from exfiltrating write-scoped tokens. Scan both edge directions in _cross_package_edges so inbound dependents and violations where the changed node is the target are visible. Detect deleted files and report them as a risk signal. Include relation type in dedup key so distinct edge types between the same labels are not collapsed. Clean stale /tmp artifacts before running analysis to prevent reruns from reading old reports. * fix: address review feedback - type annotations, hoist imports, narrow except, isolate daily graphify - structural_impact.py: - replace bare _build_graph dict return with frozen _Analysis dataclass - add G: Any annotation on _cross_package_edges (STYLEGUIDE: all params typed) - hoist `from graphify.export import to_json` and `from networkx.readwrite import json_graph` to module top (no perf justification for deferred import) - narrow `except Exception` in graph-diff fallback to (JSONDecodeError, KeyError, TypeError, OSError) - agentic-ci-daily.yml: install graphifyy into /tmp/graphify-venv instead of the project .venv, matching agentic-ci-pr-review.yml. 
Keeps graphify's transitives (networkx) out of the project venv permanently. - structure/recipe.md: invoke the tool via /tmp/graphify-venv/bin/python to match the workflow change. * feat(ci): warn when changed files touch unknown packages A new package under packages/ that isn't in _PACKAGE_SUBDIRS is silently absent from the graph - the analyzer would falsely report LOW risk with 0 entities. Add a _Note line in the changed-files report when any changed or deleted file lives under packages/<unknown>/, so the failure mode the analyzer is supposed to surface isn't itself silent. _KNOWN_PACKAGE_DIRS is derived from _PACKAGE_SUBDIRS so future additions stay in sync without a second source of truth.	2026-05-05 14:47:52 -03:00
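The two dedup fixes mentioned in the commit above (stop truncating labels at 30 characters, and include the relation type in the dedup key) can be illustrated with a small Python sketch. It does not reproduce the `_dedup()` function in `structural_impact.py`, whose code is not shown here; the `Edge` type, the `dedup_edges` helper, and the labels are all hypothetical.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """Hypothetical edge record: source label, relation type, target label."""

    source: str
    relation: str
    target: str


def dedup_edges(edges: list[Edge]) -> list[Edge]:
    """Drop exact duplicates while keeping first-seen order."""
    seen: set[tuple[str, str, str]] = set()
    unique: list[Edge] = []
    for edge in edges:
        # Key on the full labels plus the relation type: no truncation,
        # so labels sharing a long prefix stay distinct, and different
        # relations between the same pair are not collapsed.
        key = (edge.source, edge.relation, edge.target)
        if key not in seen:
            seen.add(key)
            unique.append(edge)
    return unique


# Hypothetical labels; the two sources share their first 30 characters.
EDGES = [
    Edge("pkg.module_with_a_rather_long_name.Alpha", "imports", "pkg.core"),
    Edge("pkg.module_with_a_rather_long_name.Beta", "imports", "pkg.core"),
    Edge("pkg.module_with_a_rather_long_name.Alpha", "calls", "pkg.core"),
    Edge("pkg.module_with_a_rather_long_name.Alpha", "imports", "pkg.core"),
]

# A key that truncates labels and ignores the relation merges everything.
truncated_keys = {(e.source[:30], e.target[:30]) for e in EDGES}

print(len(truncated_keys))      # 1
print(len(dedup_edges(EDGES)))  # 3
```

With the truncating key, all four edges collapse into one entry; with the full key, only the exact duplicate is removed.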
dependabot[bot]	1feb57ec03	ci: bump NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml (#596 ) Bumps the all-actions group with 1 update: [NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml](https://github.com/nvidia-nemo/fw-ci-templates). Updates `NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml` from 0.93.0 to 0.94.1 - [Release notes](https://github.com/nvidia-nemo/fw-ci-templates/releases) - [Changelog](https://github.com/NVIDIA-NeMo/FW-CI-templates/blob/main/CHANGELOG.md) - [Commits](`38cee3a372...211c302d64`) --- updated-dependencies: - dependency-name: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml dependency-version: 0.94.1 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: all-actions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Andre Manoel <165937436+andreatgretel@users.noreply.github.com>	2026-05-04 11:07:46 -03:00
Andre Manoel	482ab5a224	ci: raise agent audit turn limit and preserve logs (#571 ) * ci: raise agent audit turn limit and preserve logs The Friday test-health audit hit the 30-turn cap on its first-ever run (2026-04-24) and the agent log was discarded with the self-hosted runner. Heavier recipes need more room, and the next failure should be diagnosable. - Raise --max-turns from 30 to 50 - Switch --output-format from text to stream-json so events are emitted during the run instead of only at process exit; prefix with stdbuf -oL -eL to line-buffer the pipe - Upload /tmp/claude-audit-log.txt and /tmp/audit-<suite>.md as an artifact (if: always(), 14-day retention) using the upload-artifact SHA already pinned in build-notebooks.yml Signed-off-by: Andre Manoel <amanoel@nvidia.com> * ci: disambiguate audit artifact name across run attempts actions/upload-artifact@v4+ rejects duplicate names within a workflow, and re-running a failed run reuses the same github.run_id. Append github.run_attempt so re-runs upload successfully instead of failing at the exact moment the artifact is most useful. Found by Codex review of #571. Signed-off-by: Andre Manoel <amanoel@nvidia.com> * ci: only upload agent log on failure Raise the bar for persisting the full verbose stream-json event log: we only need it when we're actually debugging a failure, and the audit report itself still lands in the step summary on success. Shrinks the window where tool inputs, read file contents, or other verbose-stream detail could end up in a 14-day artifact.
Addresses the minor privacy finding from Codex review of #571. Signed-off-by: Andre Manoel <amanoel@nvidia.com> * ci: drop raw agent log from job summary With --output-format stream-json the previous tail -100 of the agent log emitted raw NDJSON into the GH Actions UI summary, which is unreadable. The audit report itself (/tmp/audit-<suite>.md) already carries the human-readable payload, and the full event stream is available as an on-failure artifact, so the raw tail was redundant and worse than nothing for the summary surface. Also rewords the fallback message to point at the artifact when no report lands (typically a failure). Signed-off-by: Andre Manoel <amanoel@nvidia.com> --------- Signed-off-by: Andre Manoel <amanoel@nvidia.com>	2026-04-28 15:53:48 -03:00
dependabot[bot]	8266eb79a9	ci: bump the all-actions group across 1 directory with 5 updates (#558 ) * ci: bump the all-actions group with 5 updates Bumps the all-actions group with 5 updates: \| Package \| From \| To \| \| --- \| --- \| --- \| \| [actions/checkout](https://github.com/actions/checkout) \| `4` \| `6` \| \| [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) \| `7.6.0` \| `8.1.0` \| \| [actions/cache](https://github.com/actions/cache) \| `5.0.4` \| `5.0.5` \| \| [cloudflare/wrangler-action](https://github.com/cloudflare/wrangler-action) \| `3.14.1` \| `3.15.0` \| \| [NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml](https://github.com/nvidia-nemo/fw-ci-templates) \| `0.88.1` \| `0.93.0` \| Updates `actions/checkout` from 4 to 6 - [Release notes](https://github.com/actions/checkout/releases) - [Commits](https://github.com/actions/checkout/compare/v4...v6) Updates `astral-sh/setup-uv` from 7.6.0 to 8.1.0 - [Release notes](https://github.com/astral-sh/setup-uv/releases) - [Commits](https://github.com/astral-sh/setup-uv/compare/v7.6...08807647e7069bb48b6ef5acd8ec9567f424441b) Updates `actions/cache` from 5.0.4 to 5.0.5 - [Release notes](https://github.com/actions/cache/releases) - [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md) - [Commits](`668228422a...27d5ce7f10`) Updates `cloudflare/wrangler-action` from 3.14.1 to 3.15.0 - [Release notes](https://github.com/cloudflare/wrangler-action/releases) - [Changelog](https://github.com/cloudflare/wrangler-action/blob/main/CHANGELOG.md) - [Commits](`da0e0dfe58...9acf94ace1`) Updates `NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml` from 0.88.1 to 0.93.0 - [Release notes](https://github.com/nvidia-nemo/fw-ci-templates/releases) - [Changelog](https://github.com/NVIDIA-NeMo/FW-CI-templates/blob/main/CHANGELOG.md) - [Commits](`2a49420d5a...38cee3a372`) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '6' 
dependency-type: direct:production update-type: version-update:semver-major dependency-group: all-actions - dependency-name: astral-sh/setup-uv dependency-version: 8.1.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: all-actions - dependency-name: actions/cache dependency-version: 5.0.5 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: all-actions - dependency-name: cloudflare/wrangler-action dependency-version: 3.15.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: all-actions - dependency-name: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml dependency-version: 0.93.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: all-actions ... Signed-off-by: dependabot[bot] <support@github.com> * ci: pin actions/checkout to SHA in agentic-ci-issue-triage --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Andre Manoel <amanoel@nvidia.com>	2026-04-21 15:21:19 -03:00
Andre Manoel	addece9828	fix(ci): grant permissions to reusable workflow calls in build-docs and pack-tutorials (#561 ) The top-level `permissions: {}` added in #517 restricts all jobs to zero permissions by default. The `build-notebooks` jobs that call the reusable workflow did not override this, so GitHub Actions refused to start them (startup_failure). Add the required `actions: read` and `contents: write` permissions to both calling jobs. Fixes the v0.5.7 release docs build failure.	2026-04-21 12:48:29 -03:00
Andre Manoel	b220f3697b	ci: add daily audit suites with 5 rotating recipes and scheduled workflow (#543 ) * ci: add daily audit suites with 5 recipes and scheduled workflow Add the daily maintenance infrastructure (Phase 2+3 of the agentic CI plan). A new workflow runs one audit suite per weekday via day-of-week rotation, with runner memory persisted via actions/cache. Recipes: docs-and-references (Mon), dependencies (Tue), structure (Wed), code-quality (Thu), test-health (Fri). Each targets gaps that CI and ruff don't cover: cross-reference validation, transitive dep analysis, lazy import compliance, complexity trends, and test-to-source mapping. Reports go to the Actions step summary. Code changes use /create-pr. * ci: add executable smoke checks and harden runner memory Add executable smoke checks to test-health and code-quality recipes that exercise real code paths (config build, validate, import timing, registry completeness, error hierarchy, input rejection) without needing an LLM provider. Checks are split into fixed canaries (same every run) and creative checks (agent varies inputs each run). Harden runner memory: define JSON schema in _runner.md with TTL and size rules, validate state file after agent runs, only update last_run on success, drop unused audit-log.md. Add make install-dev workflow step so recipes can run Python against the installed packages. 
* ci: fix codex review findings - test paths, provider check, step gating Fix issues found by Codex review: - Fix test paths: tests/ does not exist at repo root, use packages//tests/ and packages/data-designer/tests/test_import_perf.py - Remove DataDesigner(model_providers=[]) from smoke checks - raises NoModelProvidersError; keep config-layer checks only - Fix audit step gating: remove continue-on-error, use step outcome to gate runner memory update (\|\| true + continue-on-error made the step always "succeed", defeating the success() condition) ci: fix review findings - heredoc, state validation, lazy import wording Fix heredoc with indented EOF terminator that never terminates - replace with printf. Run state validation on all outcomes (not just success) so corrupted state from a failed audit is caught before caching. Only stamp last_run when audit succeeds. Align test-health lazy import section with its own Constraints (report count only, don't duplicate structure audit). Also fixes datetime.utcnow() deprecation and shell variable injection in Python string by using os.environ instead.	2026-04-17 14:48:55 -03:00
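The day-of-week rotation described in b220f3697b can be sketched roughly as follows. This is an illustrative sketch only, not the workflow's actual implementation: the mapping is taken from the commit message, while `SUITE_BY_WEEKDAY` and `pick_suite` are hypothetical names, and returning `None` on weekends is an assumption.

```python
import datetime
from typing import Optional

# Weekday index (Monday == 0) -> audit recipe, as listed in the commit message.
# The dictionary name and the function below are hypothetical.
SUITE_BY_WEEKDAY = {
    0: "docs-and-references",
    1: "dependencies",
    2: "structure",
    3: "code-quality",
    4: "test-health",
}


def pick_suite(day: datetime.date) -> Optional[str]:
    """Return the audit suite for a date, or None on weekends (an assumption)."""
    return SUITE_BY_WEEKDAY.get(day.weekday())


if __name__ == "__main__":
    # 2026-04-24 is the Friday mentioned in 482ab5a224.
    print(pick_suite(datetime.date(2026, 4, 24)))
```

A lookup table keeps the schedule readable in one place; adding a weekend suite would only need two more entries.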
Andre Manoel	6ef49538a4	fix: use pull_request_target for agentic CI on fork PRs (#541 ) * fix: use pull_request_target for agentic CI on fork PRs * fix: read recipe files from base branch to prevent prompt injection Recipe files define the agent's prompt. When using pull_request_target, the fork's HEAD is checked out, so a malicious fork could craft recipe files to exfiltrate API secrets via prompt injection. Fix by adding a second sparse checkout from the base branch for .agents/recipes/ and reading prompts from there instead of the fork tree. * fix: align actions/checkout version for base-recipes checkout Match the base-branch recipe checkout to v6.0.2 (same SHA as the PR branch checkout) for consistency. * fix: move expression interpolations to env vars in gate and review jobs Replace direct ${{ }} interpolation in run: blocks with env vars. Most values are GitHub-controlled, but github.event.label.name can contain arbitrary characters and could break shell quoting. Moving everything to env: is consistent with the injection-hardening pattern applied in the rest of the workflow.	2026-04-15 19:11:29 -03:00
Andre Manoel	f267e19a60	fix(ci): replace yq with Python nav patching in publish-devnotes (#548 ) The yq JSON roundtrip was mangling the entire mkdocs.yml file (indentation, quoting, comments), causing mike deploy to fail. Extract a Python script that surgically replaces only the Dev Notes nav block, leaving all other content byte-identical.	2026-04-14 16:03:49 -03:00
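The idea behind f267e19a60, replacing one nav block as plain text instead of round-tripping the whole YAML file, might look something like the sketch below. The script in the repository is not shown here, so `replace_nav_block`, its parameters, and the assumption that child entries are indented deeper than the header line are all illustrative.

```python
def replace_nav_block(text: str, header: str, new_block: str) -> str:
    """Replace the nav entry whose stripped line equals `header`, plus its children.

    Everything outside that block is returned byte-for-byte unchanged.
    Hypothetical helper; assumes children are indented deeper than the header.
    """
    lines = text.splitlines(keepends=True)
    for start, line in enumerate(lines):
        if line.strip() == header:
            break
    else:
        raise ValueError(f"nav entry {header!r} not found")

    indent = len(line) - len(line.lstrip())
    end = start + 1
    # Walk forward while lines are blank or indented deeper than the header.
    while end < len(lines):
        candidate = lines[end]
        if candidate.strip() and len(candidate) - len(candidate.lstrip()) <= indent:
            break
        end += 1
    # Give back trailing blank lines so spacing after the block is preserved.
    while end > start + 1 and not lines[end - 1].strip():
        end -= 1
    return "".join(lines[:start]) + new_block + "".join(lines[end:])
```

Because the file is never parsed and re-serialized, comments, quoting, and indentation elsewhere in the file cannot be altered by the patch.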
Andre Manoel	1a237d95d0	fix: text-to-sql devnote date, images, and publish-devnotes nav (#546 ) - Update post date from 2026-03-11 to 2026-04-14 so it appears as the newest post on the devnotes page. - Replace raw <img> tags with markdown image syntax so mkdocs rewrites relative paths correctly for the blog plugin's slug-based URLs. - Overlay mkdocs.yml from HEAD in publish-devnotes workflow so new nav entries are included in devnotes-only rebuilds.	2026-04-14 15:48:23 -03:00
dependabot[bot]	abe5c2d177	ci: bump the all-actions group with 5 updates (#539 ) * ci: bump the all-actions group with 5 updates Bumps the all-actions group with 5 updates: \| Package \| From \| To \| \| --- \| --- \| --- \| \| [actions/checkout](https://github.com/actions/checkout) \| `4.3.1` \| `6.0.2` \| \| [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) \| `7.6.0` \| `8.0.0` \| \| [actions/download-artifact](https://github.com/actions/download-artifact) \| `7.0.0` \| `8.0.1` \| \| [actions/upload-artifact](https://github.com/actions/upload-artifact) \| `6.0.0` \| `7.0.1` \| \| [NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml](https://github.com/nvidia-nemo/fw-ci-templates) \| `0.65.12` \| `0.88.1` \| Updates `actions/checkout` from 4.3.1 to 6.0.2 - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/v4.3.1...de0fac2e4500dabe0009e67214ff5f5447ce83dd) Updates `astral-sh/setup-uv` from 7.6.0 to 8.0.0 - [Release notes](https://github.com/astral-sh/setup-uv/releases) - [Commits](`37802adc94...cec208311d`) Updates `actions/download-artifact` from 7.0.0 to 8.0.1 - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](`37930b1c2a...3e5f45b2cf`) Updates `actions/upload-artifact` from 6.0.0 to 7.0.1 - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](`b7c566a772...043fb46d1a`) Updates `NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml` from 0.65.12 to 0.88.1 - [Release notes](https://github.com/nvidia-nemo/fw-ci-templates/releases) - [Changelog](https://github.com/NVIDIA-NeMo/FW-CI-templates/blob/main/CHANGELOG.md) - [Commits](`21f18ae8b6...2a49420d5a`) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: 6.0.2 dependency-type: direct:production update-type: version-update:semver-major 
dependency-group: all-actions - dependency-name: astral-sh/setup-uv dependency-version: 8.0.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: all-actions - dependency-name: actions/download-artifact dependency-version: 8.0.1 dependency-type: direct:production update-type: version-update:semver-major dependency-group: all-actions - dependency-name: actions/upload-artifact dependency-version: 7.0.1 dependency-type: direct:production update-type: version-update:semver-major dependency-group: all-actions - dependency-name: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_semantic_pull_request.yml dependency-version: 0.88.1 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: all-actions ... Signed-off-by: dependabot[bot] <support@github.com> * ci: skip docs preview deploy for Dependabot PRs GitHub does not expose repository secrets to Dependabot PRs, so the Cloudflare Pages deploy always fails with a missing API token. Skip the entire job when the actor is dependabot[bot]. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Andre Manoel <amanoel@nvidia.com> Co-authored-by: Andre Manoel <165937436+andreatgretel@users.noreply.github.com>	2026-04-13 20:28:38 -03:00
Andre Manoel	82c1a69739	ci: add PR hygiene automation (linked issue check + stale PR cleanup) (#521 ) * ci: add PR hygiene automation (linked issue check + stale PR cleanup) Add two workflows to enforce contribution quality and clean up abandoned PRs: - pr-linked-issue.yml: required status check that validates external PRs reference a triaged issue. Collaborators bypass. Re-triggers automatically when a maintainer adds the `triaged` label to the linked issue. - pr-stale.yml: daily cron that reminds authors of failing checks after 7/14 days of inactivity and auto-closes after 14/28 days (external/collaborator). Respects `keep-open` label. New labels created: `triaged`, `task`, `keep-open`. Closes #518 Signed-off-by: Andrea Manoel <amanoel@nvidia.com> * ci: add agentic repository triage workflow Add a weekly scheduled workflow that uses Claude to triage all open issues and PRs, producing a combined dashboard report on a pinned tracking issue. - New recipe (.agents/recipes/issue-triage/) classifies issues, checks staleness, cross-references merged PRs, detects duplicates, and flags PR health problems (missing linked issues, failing checks, orphaned PRs) - New workflow (.github/workflows/agentic-ci-issue-triage.yml) runs every Monday 10:00 UTC on the agentic-ci runner, with manual dispatch support - pr-stale.yml now adds needs-attention label to linked issues when a PR is auto-closed, bridging the two workflows via labels * docs: document stale PR policy and auto-retrigger in CONTRIBUTING.md * fix: address review findings in PR hygiene workflows - pr-linked-issue: fix comment gate so failure comments are posted - pr-stale: upgrade issues permission to write for labeling - pr-stale: compare reminder timestamp against last activity so push/comment actually resets the stale timer * fix: use --body-file in retrigger job to avoid shell quoting issues PR bodies with backticks or unmatched quotes would break the gh pr edit --body "$NEW_BODY" call. 
Write to a temp file and use --body-file instead. * fix: retrigger job drops PRs after the first jq outputs newline-separated numbers but GITHUB_OUTPUT only preserves the first line. Convert to space-separated so the for loop processes all matching PRs. * fix: harden workflows against shell injection - Move attacker-influenced values (${{ user.login }}, step outputs) from expression interpolation in run: blocks to env vars - Replace echo "$PR_BODY" \| grep with write-to-file + grep-file to avoid shell expansion of untrusted PR body content - Same treatment for PR body handling in retrigger and stale jobs * refactor: replace peter-evans actions with gh api calls Remove peter-evans/find-comment and peter-evans/create-or-update-comment third-party action dependencies. Replace with gh api calls for finding, creating, updating, and deleting bot comments. Eliminates supply chain risk from unpinned third-party actions. * docs: add pull_request_target security comment --------- Signed-off-by: Andrea Manoel <amanoel@nvidia.com>	2026-04-13 20:26:02 -03:00
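The retrigger fix in 82c1a69739 (collapsing newline-separated jq output into one space-separated value so a plain `name=value` output line carries every PR number) can be illustrated with a small sketch. The workflow itself does this in shell; the Python function below, including the name `format_output`, is a hypothetical stand-in for the same transformation.

```python
def format_output(name: str, jq_output: str) -> str:
    """Collapse newline-separated values into a single `name=value` line.

    Hypothetical helper: a plain `name=value` line in the step output file
    holds one line of value, so multiple values are joined with spaces.
    """
    values = jq_output.split()
    return f"{name}={' '.join(values)}\n"


if __name__ == "__main__":
    print(format_output("prs", "12\n34\n56\n"), end="")
```

A shell `for` loop over the resulting value then iterates over every PR number rather than only the first.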
Andre Manoel	aee3d3ff90	ci: publish devnotes independently of releases (#536 ) * ci: add workflow to publish devnotes independently of releases Adds a GitHub Actions workflow that rebuilds the `latest` docs alias when devnotes change on main, so blog posts go live without cutting a package release. * ci: pin actions to commit SHAs and restrict default permissions Address Greptile review findings: - Pin checkout, setup-uv, and download-artifact to commit SHAs matching the pattern from #517 - Add top-level permissions: {} to restrict default token scope * ci: build devnotes from last deployed state, not main Instead of building the full site from main (which could include unreleased docs), checkout the commit that latest was last built from (tracked in gh-pages commit messages) and overlay only docs/devnotes/ from main. Download notebooks from the last successful build-docs run instead of rebuilding them. * ci: add actions:read permission for notebook download The gh run list/download calls need actions:read on GITHUB_TOKEN, which is denied by the top-level permissions: {} block.	2026-04-13 14:39:11 -03:00
Andre Manoel	47be28c799	fix: tune Dependabot config and fix DCO assistant bugs (#534 ) * fix: restrict Dependabot pip updates to security-only The Dependabot config added in #517 included weekly version-bump PRs for all three pip packages. This would generate noisy PRs for routine dep updates we don't need. Set open-pull-requests-limit: 0 on the pip ecosystems so only CVE-triggered security updates open PRs. GitHub Actions weekly bumps are kept as-is to keep SHA pins current. * fix: group Dependabot Actions PRs and fix DCO allowlist - Add a Dependabot group to bundle all GitHub Actions updates into a single weekly PR instead of one per action - Fix DCO allowlist: dependabot -> dependabot[bot] to match the actual GitHub username (the old value never matched, but there were no Dependabot PRs before #517 to expose the bug) * fix: align DCO assistant if-condition with custom sign-off text The step's if-condition checked for the default sign-off text but custom-pr-sign-comment uses different wording. This meant the issue_comment trigger was always skipped - sign-offs only worked by accident when a subsequent push re-triggered the action via pull_request_target.	2026-04-13 12:12:26 -03:00
Andre Manoel	54d51bdf89	chore: harden CI supply chain (#517 ) * ci: harden CI supply chain Pin all GitHub Actions to commit SHAs to prevent tag-based supply chain attacks (same class as CVE-2025-30066). Replace softprops/action-gh-release (single-maintainer, no security policy) with gh CLI. Add top-level permissions: {} to all workflows that lacked it, enforcing least-privilege by default. Enable Dependabot for GitHub Actions and pip dependencies. Closes #471 * fix: add dependabot pip entries for each sub-package The root directory has no pyproject.toml; the actual packages live under packages/data-designer-config, packages/data-designer-engine, and packages/data-designer.	2026-04-13 10:34:26 -03:00
Andre Manoel	13cd6879bb	fix: narrow docs-preview workflow path filter (#515 ) The docs-preview workflow triggered on all source code changes due to the broad `packages//src/data_designer/` path glob. This caused unnecessary Cloudflare Pages deployments on code-only PRs like #505. Remove the source code path filter so the workflow only triggers on actual docs content changes (docs/*, mkdocs.yml, and the workflow file itself).	2026-04-09 16:51:10 -03:00
Andre Manoel	0e90ea644b	docs: add async engine dev note (#490 ) * fix: address review feedback on async engine dev note - Fix wall-clock claim: 41% -> 22% to match benchmark table - Fix dual-model speedup rounding: 1.7x -> 1.6x (10.0/6.1 = 1.64) - Fix run_config API: use dd.set_run_config() instead of passing to create() * docs: add async engine dev note Add "Async All the Way Down" dev note covering the async task-queue scheduler built across PRs #356, #378, #404, #429, #456. Includes benchmark results, architecture diagrams, and DAG shape illustrations. * feat: add docs preview workflow for PRs Build MkDocs site on PRs that touch docs and deploy to Cloudflare Pages. Each PR gets a browseable preview URL posted as a comment. Notebook tutorials use placeholder stubs since they require API keys to execute. Requires CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID repo secrets. * fix: update speedup chart alt text from 1.7x to 1.6x * docs: improve timeline figure context and labeling Add DAG subtitle to sync-vs-async timeline figure and bridge the surrounding text to explain which workload shape is being shown. * edits+additions to async-all-the-way-down dev notes * clarify two semaphore dance * remove dead link * replace hero image * docs: update scale figures with nginx-accurate data and adjust sizing Regenerate scale-model-timeline and scale-boxplot from nginx access logs (column_progress.csv, sync/summary.json) instead of buffered execution logs. Optimize both PNGs to palette mode. Adjust figure widths and update model timeline commentary. 
* add link from owning-the-model-stack to async-dev-node * docs: address review feedback on async blog post - Tighten intro to a concise abstract, move pipeline narrative into "The Bottleneck Was Structural" section - Remove multi-column generators / seed readers paragraph (TMI) - Clarify sync engine ran columns sequentially within each batch --------- Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>	2026-04-08 15:51:04 -03:00
Andre Manoel	6b92351682	ci: add PR review workflow and recipe for agentic CI (#498 ) * ci: add PR review workflow and recipe Add the remaining Phase 1 deliverables for the agentic CI plan: - PR review recipe that composes the existing review-code skill - PR review workflow with collaborator-only gate, auth detection, pre-flight checks, and re-review label support - Mark Phase 1 items complete in the plan (except docs) * fix: use explicit draft == false instead of ! operator in workflow if The ! operator in a >- YAML block may cause parsing issues. Use explicit comparison instead. * fix: address review feedback + simplify if condition for debugging - Fix: only re-review label triggers on labeled events (greptile) - Fix: use printf instead of echo -e for prompt assembly (greptile) - Debug: simplify if condition to isolate why job is skipping * debug: set if to true to test runner connectivity * debug: add job to dump event context for if-condition debugging * fix: use collaborator API for permission check instead of author_association author_association in webhook payloads reports NONE when org membership is private, causing the job to skip even for members. Replace with a gate job that checks collaborator permissions via the API, which works regardless of org visibility settings. 
* fix: disable prompt caching and skip posting on review failure - Set DISABLE_PROMPT_CACHING=1 for Bedrock-backed endpoints that don't support cache_control parameters - Don't post a comment when the review file isn't produced, just emit a warning annotation on the workflow run * fix: rename label to agent-review, remove synchronize trigger - Rename re-review -> agent-review for clarity - Remove synchronize from trigger types so reviews are opt-in on subsequent pushes (use the agent-review label to retrigger) - Reviews still auto-run on PR open and draft -> ready transitions * fix: validate PR number input and remove unused auth mode step * fix: address review feedback - quoting, checkout ordering, stale docs - Pass all step outputs through env vars instead of direct expression injection in shell (PR number, model name) - Resolve head SHA before checkout so dispatch doesn't clone at wrong ref - Use set -o pipefail + continue-on-error instead of \|\| true - Remove stale synchronize references from plan doc * fix: add specific review guidance for plan docs * fix: check labeler permission for agent-review on external PRs For labeled events, check the sender (who added the label) instead of the PR author. This lets maintainers authorize agent reviews on PRs from external contributors by adding the agent-review label.	2026-04-07 21:47:42 -03:00
Nabin Mulepati	4768a3671d	chore: plan 427, PR 2 of agent-first development plan (#478 ) * save progress * undo review-code skill change * delete status file * small tweaks * Fix 429 info * update workind on skill info * updates * Update architecture/overview.md Co-authored-by: Johnny Greco <jogreco@nvidia.com> * fix: correct symbol names and CLI commands in architecture docs Address review comments: - models.md: describe clients as native httpx adapters, not SDK wrappers - agent-introspection.md: use actual family keys (columns, samplers, etc.) not column-types - cli.md: use correct command `data-designer config models` - plugins.md: SEED_READER not SEED_SOURCE, inject_into_processor_config_type_union Made-with: Cursor --------- Co-authored-by: Johnny Greco <jogreco@nvidia.com>	2026-04-06 15:26:33 -06:00
Andre Manoel	0d80858b60	fix: use --bare and --tools in health probe CLI check (#489 ) The "Verify Claude CLI" step fails on the CI runner because Claude Code tries to initialize keychain, LSP, plugins, and CLAUDE.md discovery before making the API call. On a bare runner these resources don't exist, causing exit code 1. - Add --bare to skip all initialization and force ANTHROPIC_API_KEY auth - Add --tools "" to disable tool definitions (health check doesn't need them, and this avoids sending a large payload to the gateway)	2026-04-02 13:48:32 -03:00
Andre Manoel	5265745335	ci: add agentic CI plan, health probe workflow, and recipe scaffold (#473 ) * docs: add agentic CI plan for automated PR reviews and daily maintenance Closes #472 * docs: add API configuration and auth modes to agentic CI plan * docs: add PoC lessons and operational details to agentic CI plan * docs: add runner label targeting to agentic CI plan * docs: add re-review label and workflow_dispatch triggers to PR review * docs: rename runner label to agentic-ci * docs: add check run as gate for PR review, output stays as comment * ci: add agentic CI health probe workflow and recipe scaffold - Health probe: pings inference API, checks latency, verifies Claude CLI - Runs every 6h on self-hosted agentic-ci runner, plus manual dispatch - Dual auth mode: custom endpoint (secret) or OAuth fallback - Recipe scaffold: _runner.md shared context, health-probe recipe - Update .agents/README.md to include recipes directory * docs: address Greptile review feedback on agentic CI plan - Add checks: write to recipe frontmatter example - Add concurrency group to daily maintenance workflow spec - Clarify fork PRs are out of scope (pull_request event only) - Document workflow_dispatch callers as trusted (accepted risk) * fix: skip API curl in OAuth mode, add branch protection note - Health probe: skip the direct API ping step in OAuth mode (no API key available for curl; Claude CLI step is the sole health signal) - Guard latency threshold check on custom auth mode - Plan: note that contents:write on daily suites requires branch protection rules to prevent agent self-merging * fix: address Nabin's second review feedback - Health probe: fix latency threshold string comparison with fromJSON() - Health probe: add permissions: contents: read - Health probe: fail fast if AGENTIC_CI_MODEL variable is not set - Runner context: add prompt-injection defense and output sanitization - Plan: update Phase 2 deliverable to match cache-based memory approach - Plan: reference 
STYLEGUIDE.md in code-quality suite - README: note that recipes don't need a .claude/ symlink * docs: sync plan with implementation decisions - Health probe uses workflow failure, not issue open/close - Pre-flight checks should fail fast on missing config - Add GHA string comparison gotcha to PoC lessons - Add explicit permissions block recommendation to PoC lessons - Bump max_turns from 20 to 30 in recipe example * docs: address PR review feedback on agentic CI plan - Review docs PRs with lighter recipe instead of skipping by file type - Switch runner memory from committed branch to GH Actions cache - Add import perf check to test-health suite - Add nuance on dependency pinning strictness vs DX - Add Follow-up: Weekend Agents section (perf, AI-QA, repo triage) - Add cost guardrails open question - Add status field to frontmatter	2026-04-01 16:43:31 -03:00
oliver könig	8ca8e2447b	ci: upgrade GitHub Actions for Node.js 24 compatibility (#450 ) * ci: upgrade GitHub Actions for Node.js 24 compatibility Upgrades actions to versions compatible with the Node.js 24 runtime: - actions/checkout: → v6 - actions/upload-artifact: → v6 - actions/download-artifact: → v7 - actions/github-script: → v8 - actions/setup-python: → v6 Mirrors: `1d5e68b074` Signed-off-by: oliver könig <okoenig@nvidia.com> * ci: also upgrade actions/cache and astral-sh/setup-uv to node24-compatible versions - actions/cache: v4 → v5 in build-notebooks.yml - astral-sh/setup-uv: v5/v6 → v7 in ci.yml, check-colab-notebooks.yml, health-checks.yml, build-docs.yml, build-notebooks.yml Addresses: https://github.com/NVIDIA-NeMo/DataDesigner/pull/450#issuecomment-4154872141 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Signed-off-by: oliver könig <okoenig@nvidia.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Andre Manoel <165937436+andreatgretel@users.noreply.github.com>	2026-03-30 17:39:05 -03:00
Andre Manoel	2564834a47	fix: cache notebook builds to avoid flaky upstream model failures (#370 ) * fix: cache notebook builds to avoid failures from flaky upstream models The build-notebooks CI executes all tutorial notebooks on every run. When an upstream model (e.g. black-forest-labs/flux.2-pro) is down, the entire docs build fails even if no notebooks changed. Add per-notebook caching based on source file SHA-256 hashes. Unchanged notebooks are served from cache, and only modified ones are re-executed. On the first CI run (empty cache), the workflow seeds the cache from the last successful build artifact. Also add a minimal test script (test_flux_image_gen.py) to reproduce the flux.2-pro health check failure locally. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review comments on notebook caching - Don't write .sha256 during seeding so changed notebooks are detected - Rename TMPDIR to SEED_TMPDIR to avoid shadowing the POSIX env var - Use portable sha256 helper (sha256sum with shasum fallback) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: only seed cache when truly empty, restore hash writing Skip artifact seeding when a partial cache was restored (it already has correct per-file hashes). Only seed + write current hashes when the cache dir is completely empty (true bootstrapping). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: restrict artifact seed lookup to main branch Prevents seeding from feature branch runs that may have different notebook sources. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add actions:read permission for artifact seeding The seed step uses gh run list and gh run download which require actions:read. Without it, these calls silently fail and the cold-start cache bootstrapping never executes. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: only use notebook cache when called from build-docs Scheduled Monday runs and manual workflow_dispatch should execute all notebooks to catch regressions (e.g. library changes that break a notebook). Caching is only used via workflow_call (from build-docs) where the goal is fast, resilient doc deployment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use jq // empty to avoid "null" string on empty run list Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add use_cache input flag to notebook and docs workflows Replace event_name-based cache logic with an explicit use_cache boolean input. Defaults: - build-notebooks: workflow_call=true, dispatch=false, schedule=false - build-docs: dispatch=true (toggleable), release=false This gives full control over caching from the GitHub Actions UI. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-05 12:30:14 -03:00
Andre Manoel	46358461ee	fix: repair notebook CI (dead model, missing API key, pyarrow type bug) (#348 ) * fix: repair notebook CI by replacing dead vision model and adding missing API key - Replace `meta/llama-4-scout-17b-16e-instruct` (no longer serving on build.nvidia.com) with `nvidia/nemotron-nano-12b-v2-vl` (project default) in tutorial notebook 4 - Add `OPENROUTER_API_KEY` to the `build-notebooks` workflow so notebooks 5 and 6 (which use OpenRouter for image generation) can authenticate - Regenerate colab notebooks to reflect the model change * fix: handle pyarrow list types in notebook 6 display_image When image columns are loaded from parquet with pyarrow backend, list values are pyarrow ListScalars, not Python lists. The isinstance(x, list) check fails, causing the whole ListScalar to be treated as a single path string (producing filenames ending in `png')]`). Use isinstance(x, str) instead to correctly handle any iterable type.	2026-02-23 13:27:47 -03:00
Andre Manoel	58734d09f0	test: add provider health checks script and CI workflow (#301 ) * test: add e2e health checks for default provider models Add parametrized tests that verify model connectivity for all default providers (nvidia, openai, openrouter). Tests check API key availability and skip when not configured. * chore: move health checks out of e2e tests - Convert pytest test to standalone script at scripts/health_checks.py - Add `make health-checks` target - Add CI workflow (weekly + on release + manual dispatch) - Remove test_health_checks.py from tests_e2e/ * chore: make health checks non-blocking in CI * fix: print traceback to stdout to avoid interleaving * chore: add all provider API keys to health checks CI Co-authored-by: Cursor <cursoragent@cursor.com> * chore: remove temporary push trigger from health checks Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-06 15:18:35 -03:00
Andre Manoel	2630201a37	chore: add CODEOWNERS for automatic PR review assignment (#251 ) Assigns @NVIDIA-NeMo/data_designer_reviewers as default reviewers for all pull requests.	2026-01-28 11:04:37 -03:00
Johnny Greco	c19f35639f	chore: add publish script and update license headers (#253 )	2026-01-28 08:47:34 -05:00
Johnny Greco	ae0665fa16	refactor: slim package refactor into three subpackages (#240 ) * remove old structure * major shuffle * streamline project configs * update make commands * updates to make commands * remove essentials * initialize logger in interface * uv lock * ignore notepad * update workflows * fix e2e project config * generate colab notebooks * resolve default model settings in interface * fix build commands * update perf import make command * cleaning up some slop * update recipes * move conftest files to tests/ * update subpackage readmes * streamline config_logging * use exports * update perf import usage pattern * update for IDE behavior with ruff * remove engine's fixtures file * add note about lazy imports * update dependencies * update docs * doc fixes * uv lock * updates to catch up with main * clean up makefile * remove package gitignores * define deps only once * isolate tests * add test for protection rule * create temp dirs for isolated tests * catch up to main * update headers * re apply changes * better result summaries for isolated tests * move exports into top-level init * fix client importlib version syntax * catch up with main	2026-01-27 13:53:20 -05:00
Johnny Greco	1ea824c692	chore: minor issue template tweaks (#198 ) * tweaks * update placeholder	2026-01-12 15:34:10 -05:00
Johnny Greco	738b183bfd	add templates (#197 )	2026-01-12 13:18:05 -05:00
Mike Knepper	2300230346	chore: Relax rich upper bound to allow 14.x series (#196 ) * Bump rich to 14.x series * Disable uv cache in CI e2e tests * Accept rich 13	2026-01-12 09:44:46 -06:00
Johnny Greco	f8c201e085	chore: update header script to check for diffs (#195 ) * update script * update headers * refactor a bit and add test script * update headers * update for edge case * update headers * add step to get file creation date * use git history to get copyright year * generation type is printed with inference parameters * fix unit test	2026-01-09 17:10:58 -05:00
Mike Knepper	2cfff52581	feat: Seed reader plugins (#191 )	2026-01-09 13:50:47 -06:00
Johnny Greco	82fbbf1d45	force py11 (#170 )	2026-01-05 16:57:02 -05:00
Andre Manoel	7fa9a413ac	docs: add option to open notebook directly in Colab (#126 )	2025-12-12 15:15:26 -03:00
Andre Manoel	9547b6854a	fix: add git user/email and allow manual trigger for docs pipeline (#105 ) * fix: add git user/email and allow manual trigger for docs pipeline * add push as trigger temporarily * fetching branch * removing push from trigger	2025-12-08 13:52:37 -03:00
Andre Manoel	275bbbf646	docs: add versioning using `mike` (#102 ) * initial changes * fix to override, adapting ci	2025-12-08 11:06:24 -03:00
Andre Manoel	fa86be1eae	fix: allow docs CI to be manually triggered, better download button (#99 )	2025-12-04 14:48:16 -03:00
Andre Manoel	279299f2dc	fix: update Python version to 3.11 on build notebooks CI (#96 )	2025-12-04 09:45:18 -03:00
Andre Manoel	5d4ad10b11	chore: moving notebooks to jupytext and cleaning up workflows (#91 ) * adding basic jupytext structure Co-authored-by: Johnny Greco <jogreco@nvidia.com> * few fixes * first test for ci * adding error intentionally to check workflow behavior * test calling from other workflows * typo * trying as job instead * couple of fixes * checking path * trying to fix path * wrapping up --------- Co-authored-by: Johnny Greco <jogreco@nvidia.com>	2025-12-03 17:29:07 -03:00
Andre Manoel	ce0fc0805a	docs: streamlining tutorials (#61 ) * first attempt * typo * it works! cleaning up * adding trigger again just to run once * cleanup * typo	2025-11-21 16:14:48 -03:00
Johnny Greco	dbe165723e	chore: add 3.10 to ci (#39 ) * add 3.10 to ci * strenum update for 3.10 * update type hint for 3.10 * import Self from type extensions for 3.10	2025-11-17 10:44:04 -05:00

74 commits