* feat(agentic-ci): decision-ready triage and daily PR fixes
Reorganize the weekly issue-triage report around recommended actions
(close as resolved, close as duplicate, needs maintainer decision,
ready for assignment, stuck PR, duplicate PRs, stale) so each flagged
item carries action + evidence + rationale and can be resolved without
opening it. Long reports are split across multiple comments with i/N
markers, and orphaned comments are reconciled when the report grows or
shrinks.
Flip the four daily audit suites with mechanical fix categories from
read-only reports to opening one PR per run:
- docs-and-references: broken-link, docstring-drift, arch-ref-rename
- structure: missing-future, lazy-import
- dependencies: transitive-gap, unused
- code-quality: bare-except (draft until landing rate proven)
test-health stays report-only (all candidates require inferring intent).
The shared procedure - fix_backlog selection, finding-hash spec for
stable cross-run identification, attempted_fixes lifecycle with
two-strike escalation, allowlists, ranking, branch/PR conventions -
lives in .agents/recipes/_fix-policy.md. Each suite recipe declares
only its eligible categories, branch types, and test requirements.
Workflow runs claude twice per suite (audit, then conditionally fix),
each capped at the existing --max-turns 50. Fix call is gated on
non-empty fix_backlog and skipped entirely for test-health.
* fix(agentic-ci): address review findings before merge
- Map per-package test targets explicitly in _fix-policy.md (Makefile
exposes test-config/test-engine/test-interface, not test-<package>).
- Use github-actions[bot] noreply identity for commits the recipes
produce.
- Refresh fix_backlog.data when an id already exists so the fix phase
cannot drive a PR from stale data after the underlying file changed.
- Stop time-pruning closed/abandoned attempted_fixes entries — pruning
before the two-strike threshold erases the history needed to
escalate. Single-strike entries now age out only via the 200-entry
cap.
- Disambiguate bare-except findings within the same function by
including a try-body hash in the finding id.
- Audit grep for code-quality now matches both `except:` and
`except BaseException:`, in parity with the fix eligibility.
- Restrict transitive-gap fix eligibility to cases where a sibling
package already declares the dep (avoids inventing version
specifiers from scratch).
- Issue-triage workflow handles multi-part reports in both the fallback
post step and the job summary; recipe always writes numbered parts.
* fix(agentic-ci): close residuals from review pass 2
- Replace remaining `make test-<package>` references with pointers to
the mapping table; only the table itself uses that placeholder now.
- Fix `gh api --paginate | jq length` returning per-page counts: slurp
with `jq -s 'add // 0'` to get a single total.
- Compare posted-comment count to expected part count so a partial post
(agent posted part 1 but not 2/3) triggers the fallback instead of
being silently treated as success.
- Add `shell: bash` to triage steps using `shopt`/`mapfile` so they're
not at the mercy of the runner's default shell.
- Disambiguate bare-except findings whose try-body hashes collide by
adding a per-function ordinal to the canonical_key.
- Tie the 200-entry attempted_fixes cap eviction to `attempts[0].at`
(the schema has no `first_seen` field).
* fix(agentic-ci): identity-based partial-post detection in triage fallback
Replace the count-only POSTED_COUNT >= EXPECTED_PARTS check with an
identity-based check that extracts every i/N marker seen in
today-dated bot comments and verifies each expected i is present.
A duplicate post of one part can no longer mask a missing other.
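
A minimal Python sketch of the identity-based check described above, for illustration only. The marker shape `part i/N` and the names `PART_MARKER` and `missing_parts` are assumptions; the workflow itself does this in shell with `jq`.

```python
import re

# Assumed marker shape, e.g. "part 2/3". The real marker text may differ.
PART_MARKER = re.compile(r"part (\d+)/(\d+)")


def missing_parts(comment_bodies: list[str], expected_parts: int) -> list[int]:
    """Return the indices in 1..expected_parts that no comment carries."""
    seen: set[int] = set()
    for body in comment_bodies:
        match = PART_MARKER.search(body)
        if match is None:
            continue  # unrelated bot comment: skip it instead of failing
        index, total = int(match.group(1)), int(match.group(2))
        if total == expected_parts and 1 <= index <= expected_parts:
            seen.add(index)
    return [i for i in range(1, expected_parts + 1) if i not in seen]
```

With three comments of which two are both part 1, a count-only check sees "3 of 3" while this check still reports part 2 as missing.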
* fix(agentic-ci): close remaining bot-review findings
- Exempt two-strike attempted_fixes entries from the 200-entry cap
eviction. Cap now evicts non-two-strike oldest-first by
attempts[0].at; two-strike entries are silently forgotten only in
the pathological all-200-are-two-strike case (itself a signal).
- Specify the attempted_fixes PR-marker reconciliation algorithm:
scan open PR bodies for the `<!-- agentic-ci finding=<id> -->`
marker and back-fill missing entries.
- Tighten the daily workflow conditionals to gate on explicit step
outcomes (steps.audit.outcome == 'success' rather than success())
so a future pre-audit gate cannot accidentally trip the fix step.
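
The cap eviction rule above could look roughly like the following sketch. It assumes that each entry has a non-empty `attempts` list, that an entry counts as two-strike once it has two attempts, and that `at` is an ISO-8601 string in a single timezone so string order matches time order. The name `evict_to_cap` is hypothetical. The all-two-strike case is deliberately left unhandled here.

```python
MAX_ENTRIES = 200  # cap named in the commit message
TWO_STRIKE = 2     # assumption: two attempts make an entry "two-strike"


def evict_to_cap(entries: list[dict], cap: int = MAX_ENTRIES) -> list[dict]:
    """Drop the oldest non-two-strike entries until at most `cap` remain."""
    overflow = len(entries) - cap
    if overflow <= 0:
        return list(entries)
    # Only single-strike entries are eligible, oldest first attempt first.
    evictable = sorted(
        (e for e in entries if len(e["attempts"]) < TWO_STRIKE),
        key=lambda e: e["attempts"][0]["at"],
    )
    doomed = {id(e) for e in evictable[:overflow]}
    return [e for e in entries if id(e) not in doomed]
```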
* fix(agentic-ci): close Greptile pass-2 findings (timeout, re-verify wording)
- Bump daily-suite job timeout from 20 to 40 minutes. The split into
two sequential `claude --max-turns 50` invocations can saturate a
20-minute budget; a mid-fix SIGTERM would leave an orphaned branch
and inconsistent runner-state.
- Disambiguate the `_phase-fix.md` "do NOT re-scan" rule. It forbids
rebuilding fix_backlog from scratch but does NOT override the
per-candidate re-verification step required by _fix-policy.md
step 4.1 (re-grep / re-read the specific file the candidate points
at). Single-candidate re-verification is required; whole-codebase
re-scanning is forbidden.
* fix(agentic-ci): close Greptile pass-3 P1s in triage fallback
- Guard `jq capture()` with a `test()` select. `capture()` errors on
non-match instead of returning empty, which would truncate
SEEN_PARTS if any unrelated today-dated bot comment lacks the
triage marker (e.g. from a sibling workflow). Adding the test()
guard ensures capture() only runs on bodies that already match.
- Iterate the MISSING[] array when posting fallback parts, not the
full PARTS[] array. Posting all parts when only some were missing
was creating duplicate comments for the parts the agent already
successfully posted.
* fix(agentic-ci): close johnnygreco review-pass warnings
Address the five Warnings from the 2026-05-07 review focused on the
trust boundary for autonomous PR generation. Five workflow/policy
adjustments shrink the surface where agent compliance is load-bearing:
- Workflow-level scope gate. After the fix step, re-derive the diff
against `origin/main` and validate against the per-suite path
allowlist (regex mirrored from `_fix-policy.md`), the 50-LOC cap, and
the 3-file cap. On violation, close the PR with `--delete-branch`
and flip the `attempted_fixes` entry from `open` to `abandoned` so
two-strike logic still sees the failure. The recipe alone could not
bind the agent's path choices; the workflow now does.
- Dependencies install-dev verification. For the dependencies suite
only, re-run `make install-dev` after the scope gate so the agent's
pyproject edit is exercised against the lockfile resolver. Closes
the PR if `install-dev` fails — catches the failure mode where the
per-package test target passed against the old cached lockfile.
- Flip matrix-job `cancel-in-progress` from true to false. A
cancellation between the agent's git push and `gh pr create` would
leave an orphaned branch with no `attempted_fixes` record;
reconciliation only covers PRs that were opened. Queueing a
duplicate run is the lesser evil. `_fix-policy.md` Atomicity
section now documents the trade-off.
- Allow `/tmp/audit-{{suite}}.md` in `_phase-audit.md`'s "do not
modify outside `{{memory_path}}/`" directive. A literal-minded
agent could refuse to write the report file, which would break the
job summary, artifact upload, and the fix phase's audit context.
- Always upload the agent log artifact (was `if: failure()` only) and
include `runner-state.json`. For autonomous mode, the most
interesting failure is "the workflow succeeded but the PR was
wrong"; the stream-json log is the only way to look back days
later.
Also takes johnnygreco's Suggestion 2: spell out in the policy doc
that the `draft_until_proven` flip is the sole human-gated
promotion step in the fix policy and must not be automated.
Greptile and the github-actions auto-reviewer's findings were
already closed in the prior pass-2/pass-3 commits; no action needed
on those.
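
As an illustration of the workflow-level scope gate, here is a hedged Python sketch of the three checks (path allowlist, 50-LOC cap, 3-file cap). `FileChange`, `scope_violations`, and counting LOC as added plus deleted lines are assumptions, not the workflow's actual implementation.

```python
import re
from dataclasses import dataclass

MAX_CHANGED_LINES = 50  # LOC cap from the commit message
MAX_CHANGED_FILES = 3   # file cap from the commit message


@dataclass(frozen=True)
class FileChange:
    path: str
    added: int
    deleted: int


def scope_violations(changes: list[FileChange], allowlist: re.Pattern[str]) -> list[str]:
    """Return the reasons a diff is out of scope; an empty list means pass."""
    reasons = []
    if len(changes) > MAX_CHANGED_FILES:
        reasons.append(f"{len(changes)} files changed (cap {MAX_CHANGED_FILES})")
    total = sum(c.added + c.deleted for c in changes)  # assumed LOC definition
    if total > MAX_CHANGED_LINES:
        reasons.append(f"{total} lines changed (cap {MAX_CHANGED_LINES})")
    for change in changes:
        if not allowlist.fullmatch(change.path):
            reasons.append(f"path outside allowlist: {change.path}")
    return reasons
```

A caller would close the PR and mark the entry `abandoned` whenever the returned list is non-empty.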
* fix(agentic-ci): close Codex review-pass-2 findings on workflow gates
Codex flagged five issues in the prior commit's scope/lockfile gates.
This commit closes all five:
- HIGH: Wrong-PR targeting. Both gates selected the last globally-open
attempted_fixes entry, which could match a stale orphan from a
prior crashed run rather than the PR opened by *this* run. Adds a
pre-fix snapshot step that captures `(id, attempts-length)` pairs
before the fix runs, and changes the post-fix selectors to require
that the entry's attempts count grew during this run.
- HIGH: Docstring-only enforcement gap on the docs-and-references
suite. The .py path allowlist was at workflow level but the
docstring-only caveat was still policy-only. Adds an AST-based
check: for each .py file changed, parse the post-change tree,
collect docstring line ranges (module/class/function), then verify
every added line in the diff is inside a docstring, is a comment,
or is whitespace. Verified locally with both pass and fail
fixtures.
- MEDIUM: Diff-ref mismatch. Gates diffed `origin/main...HEAD` rather
than `origin/main...origin/$BRANCH`, so a misbehaving agent that
left HEAD pointing elsewhere would have validated the wrong tree.
Now fetches `origin/$BRANCH` first and prefers that ref. Falls
back to HEAD only if fetch fails (with a warning).
- MEDIUM: FILE_COUNT bug. `grep -c '.' || echo 0` produced "0\n0" on
empty diff, breaking the downstream integer comparison. Replaces
with `mapfile -t FILE_ARR` + `${#FILE_ARR[@]}`, which is correct
for any input including empty.
- LOW: Non-atomic JSON writes. The runner-state mutations could leave
the file half-written if the workflow was cancelled mid-write.
Switches both gates to the temp-file + os.replace pattern.
Also: dependencies-lockfile gate now does an explicit
`git checkout --detach origin/$BRANCH` before re-running install-dev,
so verification runs against what was actually pushed rather than
relying on local working-tree state.
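
The docstring-only check in this commit can be sketched in Python as follows. This is an approximation under stated assumptions: the function names are hypothetical, the caller supplies the added line numbers from the diff, and a line is treated as a comment if its stripped text starts with `#` (which would also accept a `#` line inside a non-docstring string literal).

```python
import ast

_DOC_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def docstring_lines(source: str) -> set[int]:
    """1-based line numbers covered by module, class, and function docstrings."""
    lines: set[int] = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, _DOC_OWNERS):
            continue
        body = node.body
        if not body or not isinstance(body[0], ast.Expr):
            continue
        value = body[0].value
        if isinstance(value, ast.Constant) and isinstance(value.value, str):
            lines.update(range(value.lineno, value.end_lineno + 1))
    return lines


def non_docstring_additions(source: str, added_lines: list[int]) -> list[int]:
    """Added line numbers that are not docstring, comment, or whitespace."""
    allowed = docstring_lines(source)
    text = source.splitlines()
    bad = []
    for number in added_lines:
        stripped = text[number - 1].strip()
        if number in allowed or not stripped or stripped.startswith("#"):
            continue
        bad.append(number)
    return bad
```

A non-empty result from `non_docstring_additions` would mean the change touched executable code and should fail the gate.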
* fix(agentic-ci): gate fix + scope_gate steps on snapshot.outcome
Greptile review on 872d5617 flagged that the fix step's custom `if:`
expression bypasses GitHub Actions' implicit success() check. Without
explicitly referencing steps.snapshot.outcome, a snapshot failure
(corrupt runner-state, disk error) would let the fix step run anyway.
The scope gate's `jq --slurpfile prior /tmp/prior-attempted-fixes.json`
would then exit non-zero on the missing file, leave OPEN empty, and
hit the "nothing to validate" early-exit — silently approving whatever
the agent pushed.
Adds steps.snapshot.outcome == 'success' to both the fix step's
condition (the actual fix) and the scope_gate step's condition
(belt-and-suspenders against future refactors).
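
For context, the snapshot comparison that these gates depend on can be sketched in Python as below. The function names and the JSON shape (a mapping of id to attempt count) are assumptions; the workflow uses `jq --slurpfile`. The loader raises on a missing or corrupt file, so "no snapshot" cannot be mistaken for "nothing to validate". This is separate from the `if:` gating the commit adds.

```python
import json
from pathlib import Path


def snapshot(entries: list[dict]) -> dict[str, int]:
    """Record how many attempts each entry had before the fix step ran."""
    return {e["id"]: len(e["attempts"]) for e in entries}


def load_snapshot(path: Path) -> dict[str, int]:
    """Fail closed: a missing or corrupt snapshot raises instead of returning {}."""
    raw = json.loads(path.read_text())
    return {str(key): int(count) for key, count in raw.items()}


def grown_entries(prior: dict[str, int], entries: list[dict]) -> list[dict]:
    """Entries that are new or gained an attempt since the snapshot."""
    return [e for e in entries if len(e["attempts"]) > prior.get(e["id"], 0)]
```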
* fix(agentic-ci): harden daily fix gates
Signed-off-by: Andre Manoel <amanoel@nvidia.com>
* fix(agentic-ci): validate all grown fix attempts
* fix(agentic-ci): harden post-fix gates
---------
Signed-off-by: Andre Manoel <amanoel@nvidia.com>
* ci: add graphify structural impact analysis to PR review and structure audit
Add a graphify-based AST analysis tool that builds a directed graph of the
codebase (~2s, no LLM calls) to detect architectural impact. Integrates
into both the PR review workflow (pre-computed before claude runs) and the
Wednesday structure audit (with week-over-week diff).
PR review: extracts changed files against the full codebase graph, reports
risk level (LOW/MEDIUM/HIGH), god nodes affected, import direction
violations, and cross-package dependencies. Output saved to /tmp and read
by the review agent.
Structure audit: produces god node rankings, cross-package edge summary
table, import violation detection, and graph diff against previous week's
cached graph. Baselines saved for runner memory trend tracking.
* fix: harden graphify integration - security, correctness, and CI weight
- Fix KeyError: god_nodes() returns 'degree' not 'edges' (3 call sites)
- Fix deduped vs raw violation count inconsistency in baselines.json
- Security: run structural_impact.py from base-branch checkout so fork
PRs cannot inject code that executes with GH_TOKEN in scope
- Add --repo-root flag so the tool resolves package paths correctly when
invoked from a different checkout directory
- Replace make install-dev + .venv with lightweight /tmp/graphify-venv
(only graphifyy needed, saves ~2min CI per PR review)
- Add graphify-out/ to .gitignore (9MB graph cache is CI-only)
* fix: pin graphifyy version and fix dedup truncation
Pin graphifyy==0.4.23 in both CI workflows to prevent
breaking changes from unpinned installs. Fix _dedup()
label truncation at 30 chars that could merge distinct
entities sharing a common prefix.
* fix(ci): use array expansion for changed-files arg to handle special filenames
Replace unquoted $CHANGED_PY word-split with mapfile + array
expansion to prevent glob expansion and correctly handle
filenames containing spaces or special characters.
* fix: derive changed nodes from graph and improve MEDIUM risk reason
Derive changed_node_ids from the already-built graph by matching
source_file paths instead of running a separate extraction pass.
Removes implicit dependency on graphify ID stability across
independent extractions.
Fix MEDIUM risk reason to reflect the actual trigger (cluster
spread vs high-connectivity entity) instead of always reporting
cluster count.
* fix: address Codex review findings - security, edge coverage, dedup, stale artifacts
Split the workflow step to isolate GH_TOKEN from graphifyy execution,
preventing a compromised package release from exfiltrating write-scoped
tokens.
Scan both edge directions in _cross_package_edges so inbound dependents
and violations where the changed node is the target are visible. Detect
deleted files and report them as a risk signal.
Include relation type in dedup key so distinct edge types between the
same labels are not collapsed.
Clean stale /tmp artifacts before running analysis to prevent reruns
from reading old reports.
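
A small Python sketch of a dedup key that includes the relation type, as described above. The edge field names (`source`, `target`, `relation`) and the function name are hypothetical and may not match what `structural_impact.py` uses.

```python
def dedup_edges(edges: list[dict]) -> list[dict]:
    """Keep the first edge per (source, target, relation), preserving order."""
    seen: set[tuple[str, str, str]] = set()
    unique = []
    for edge in edges:
        # Full labels plus the relation: distinct edge types stay distinct.
        key = (edge["source"], edge["target"], edge["relation"])
        if key in seen:
            continue
        seen.add(key)
        unique.append(edge)
    return unique
```

Keying on labels alone would collapse an `imports` edge and a `calls` edge between the same two nodes into one.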
* fix: address review feedback - type annotations, hoist imports, narrow except, isolate daily graphify
- structural_impact.py:
- replace bare _build_graph dict return with frozen _Analysis dataclass
- add G: Any annotation on _cross_package_edges (STYLEGUIDE: all params typed)
- hoist `from graphify.export import to_json` and
`from networkx.readwrite import json_graph` to module top
(no perf justification for deferred import)
- narrow `except Exception` in graph-diff fallback to
(JSONDecodeError, KeyError, TypeError, OSError)
- agentic-ci-daily.yml: install graphifyy into /tmp/graphify-venv instead of
the project .venv, matching agentic-ci-pr-review.yml. Keeps graphify's
transitives (networkx) out of the project venv permanently.
- structure/recipe.md: invoke the tool via /tmp/graphify-venv/bin/python
to match the workflow change.
* feat(ci): warn when changed files touch unknown packages
A new package under packages/ that isn't in _PACKAGE_SUBDIRS is silently
absent from the graph - the analyzer would falsely report LOW risk with
0 entities. Add a _Note line in the changed-files report when any changed
or deleted file lives under packages/<unknown>/, so the failure mode the
analyzer is supposed to surface isn't itself silent.
_KNOWN_PACKAGE_DIRS is derived from _PACKAGE_SUBDIRS so future additions
stay in sync without a second source of truth.
* ci: raise agent audit turn limit and preserve logs
The Friday test-health audit hit the 30-turn cap on its first-ever run
(2026-04-24) and the agent log was discarded with the self-hosted
runner. Heavier recipes need more room, and the next failure should be
diagnosable.
- Raise --max-turns from 30 to 50
- Switch --output-format from text to stream-json so events are emitted
during the run instead of only at process exit; prefix with
stdbuf -oL -eL to line-buffer the pipe
- Upload /tmp/claude-audit-log.txt and /tmp/audit-<suite>.md as an
artifact (if: always(), 14-day retention) using the upload-artifact
SHA already pinned in build-notebooks.yml
Signed-off-by: Andre Manoel <amanoel@nvidia.com>
* ci: disambiguate audit artifact name across run attempts
actions/upload-artifact@v4+ rejects duplicate names within a workflow,
and re-running a failed run reuses the same github.run_id. Append
github.run_attempt so re-runs upload successfully instead of failing at
the exact moment the artifact is most useful.
Found by Codex review of #571.
Signed-off-by: Andre Manoel <amanoel@nvidia.com>
* ci: only upload agent log on failure
Raise the bar for persisting the full verbose stream-json event log:
we only need it when we're actually debugging a failure, and the audit
report itself still lands in the step summary on success. Shrinks the
window where tool inputs, read file contents, or other verbose-stream
detail could end up in a 14-day artifact.
Addresses the minor privacy finding from Codex review of #571.
Signed-off-by: Andre Manoel <amanoel@nvidia.com>
* ci: drop raw agent log from job summary
With --output-format stream-json the previous tail -100 of the agent
log emitted raw NDJSON into the GH Actions UI summary, which is
unreadable. The audit report itself (/tmp/audit-<suite>.md) already
carries the human-readable payload, and the full event stream is
available as an on-failure artifact, so the raw tail was redundant and
worse than nothing for the summary surface.
Also rewords the fallback message to point at the artifact when no
report lands (typically a failure).
Signed-off-by: Andre Manoel <amanoel@nvidia.com>
---------
Signed-off-by: Andre Manoel <amanoel@nvidia.com>
The top-level `permissions: {}` added in #517 restricts all jobs to zero
permissions by default. The `build-notebooks` jobs that call the reusable
workflow did not override this, so GitHub Actions refused to start them
(startup_failure). Add the required `actions: read` and `contents: write`
permissions to both calling jobs.
Fixes the v0.5.7 release docs build failure.
* ci: add daily audit suites with 5 recipes and scheduled workflow
Add the daily maintenance infrastructure (Phase 2+3 of the agentic CI
plan). A new workflow runs one audit suite per weekday via day-of-week
rotation, with runner memory persisted via actions/cache.
Recipes: docs-and-references (Mon), dependencies (Tue), structure (Wed),
code-quality (Thu), test-health (Fri). Each targets gaps that CI and ruff
don't cover: cross-reference validation, transitive dep analysis, lazy
import compliance, complexity trends, and test-to-source mapping.
Reports go to the Actions step summary. Code changes use /create-pr.
* ci: add executable smoke checks and harden runner memory
Add executable smoke checks to test-health and code-quality recipes
that exercise real code paths (config build, validate, import timing,
registry completeness, error hierarchy, input rejection) without
needing an LLM provider. Checks are split into fixed canaries (same
every run) and creative checks (agent varies inputs each run).
Harden runner memory: define JSON schema in _runner.md with TTL and
size rules, validate state file after agent runs, only update
last_run on success, drop unused audit-log.md. Add make install-dev
workflow step so recipes can run Python against the installed packages.
* ci: fix codex review findings - test paths, provider check, step gating
Fix issues found by Codex review:
- Fix test paths: tests/ does not exist at repo root, use
packages/*/tests/ and packages/data-designer/tests/test_import_perf.py
- Remove DataDesigner(model_providers=[]) from smoke checks - raises
NoModelProvidersError; keep config-layer checks only
- Fix audit step gating: remove continue-on-error, use step outcome
to gate runner memory update (|| true + continue-on-error made the
step always "succeed", defeating the success() condition)
* ci: fix review findings - heredoc, state validation, lazy import wording
Fix heredoc with indented EOF terminator that never terminates - replace
with printf. Run state validation on all outcomes (not just success) so
corrupted state from a failed audit is caught before caching. Only stamp
last_run when audit succeeds. Align test-health lazy import section with
its own Constraints (report count only, don't duplicate structure audit).
Also fixes datetime.utcnow() deprecation and shell variable injection
in Python string by using os.environ instead.
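
A hedged Python sketch of the last two fixes: a timezone-aware timestamp in place of the deprecated `datetime.utcnow()`, and reading workflow values from the environment rather than interpolating shell variables into Python source. `AUDIT_OUTCOME` and `stamp_last_run` are hypothetical names.

```python
import os
from datetime import datetime, timezone


def stamp_last_run(state: dict) -> dict:
    """Return a copy of `state` with last_run set only when the audit succeeded."""
    updated = dict(state)
    # AUDIT_OUTCOME is an assumed variable the workflow would export via env:.
    if os.environ.get("AUDIT_OUTCOME") == "success":
        # Timezone-aware replacement for the deprecated datetime.utcnow().
        updated["last_run"] = datetime.now(timezone.utc).isoformat()
    return updated
```

Because the value arrives through `os.environ`, quotes or other special characters in it never become part of the Python source text.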
* fix: use pull_request_target for agentic CI on fork PRs
* fix: read recipe files from base branch to prevent prompt injection
Recipe files define the agent's prompt. When using pull_request_target,
the fork's HEAD is checked out, so a malicious fork could craft recipe
files to exfiltrate API secrets via prompt injection. Fix by adding a
second sparse checkout from the base branch for .agents/recipes/ and
reading prompts from there instead of the fork tree.
* fix: align actions/checkout version for base-recipes checkout
Match the base-branch recipe checkout to v6.0.2 (same SHA as the PR
branch checkout) for consistency.
* fix: move expression interpolations to env vars in gate and review jobs
Replace direct ${{ }} interpolation in run: blocks with env vars.
Most values are GitHub-controlled, but github.event.label.name can
contain arbitrary characters and could break shell quoting. Moving
everything to env: is consistent with the injection-hardening pattern
applied in the rest of the workflow.
The yq JSON roundtrip was mangling the entire mkdocs.yml file
(indentation, quoting, comments), causing mike deploy to fail.
Extract a Python script that surgically replaces only the Dev Notes
nav block, leaving all other content byte-identical.
- Update post date from 2026-03-11 to 2026-04-14 so it appears as the
newest post on the devnotes page.
- Replace raw <img> tags with markdown image syntax so mkdocs rewrites
relative paths correctly for the blog plugin's slug-based URLs.
- Overlay mkdocs.yml from HEAD in publish-devnotes workflow so new nav
entries are included in devnotes-only rebuilds.
* ci: add PR hygiene automation (linked issue check + stale PR cleanup)
Add two workflows to enforce contribution quality and clean up abandoned PRs:
- pr-linked-issue.yml: required status check that validates external PRs
reference a triaged issue. Collaborators bypass. Re-triggers automatically
when a maintainer adds the `triaged` label to the linked issue.
- pr-stale.yml: daily cron that reminds authors of failing checks after 7/14
days of inactivity and auto-closes after 14/28 days (external/collaborator).
Respects `keep-open` label.
New labels created: `triaged`, `task`, `keep-open`.
Closes #518
Signed-off-by: Andrea Manoel <amanoel@nvidia.com>
* ci: add agentic repository triage workflow
Add a weekly scheduled workflow that uses Claude to triage all open issues
and PRs, producing a combined dashboard report on a pinned tracking issue.
- New recipe (.agents/recipes/issue-triage/) classifies issues, checks
staleness, cross-references merged PRs, detects duplicates, and flags
PR health problems (missing linked issues, failing checks, orphaned PRs)
- New workflow (.github/workflows/agentic-ci-issue-triage.yml) runs every
Monday 10:00 UTC on the agentic-ci runner, with manual dispatch support
- pr-stale.yml now adds needs-attention label to linked issues when a PR
is auto-closed, bridging the two workflows via labels
* docs: document stale PR policy and auto-retrigger in CONTRIBUTING.md
* fix: address review findings in PR hygiene workflows
- pr-linked-issue: fix comment gate so failure comments are posted
- pr-stale: upgrade issues permission to write for labeling
- pr-stale: compare reminder timestamp against last activity so
push/comment actually resets the stale timer
* fix: use --body-file in retrigger job to avoid shell quoting issues
PR bodies with backticks or unmatched quotes would break the
gh pr edit --body "$NEW_BODY" call. Write to a temp file and
use --body-file instead.
* fix: retrigger job drops PRs after the first
jq outputs newline-separated numbers but GITHUB_OUTPUT only
preserves the first line. Convert to space-separated so the
for loop processes all matching PRs.
* fix: harden workflows against shell injection
- Move attacker-influenced values (${{ user.login }}, step outputs)
from expression interpolation in run: blocks to env vars
- Replace echo "$PR_BODY" | grep with write-to-file + grep-file
to avoid shell expansion of untrusted PR body content
- Same treatment for PR body handling in retrigger and stale jobs
* refactor: replace peter-evans actions with gh api calls
Remove peter-evans/find-comment and peter-evans/create-or-update-comment
third-party action dependencies. Replace with gh api calls for finding,
creating, updating, and deleting bot comments. Eliminates supply chain
risk from unpinned third-party actions.
* docs: add pull_request_target security comment
---------
Signed-off-by: Andrea Manoel <amanoel@nvidia.com>
* ci: add workflow to publish devnotes independently of releases
Adds a GitHub Actions workflow that rebuilds the `latest` docs alias
when devnotes change on main, so blog posts go live without cutting
a package release.
* ci: pin actions to commit SHAs and restrict default permissions
Address Greptile review findings:
- Pin checkout, setup-uv, and download-artifact to commit SHAs
matching the pattern from #517
- Add top-level permissions: {} to restrict default token scope
* ci: build devnotes from last deployed state, not main
Instead of building the full site from main (which could include
unreleased docs), checkout the commit that latest was last built
from (tracked in gh-pages commit messages) and overlay only
docs/devnotes/ from main. Download notebooks from the last
successful build-docs run instead of rebuilding them.
* ci: add actions:read permission for notebook download
The gh run list/download calls need actions:read on GITHUB_TOKEN,
which is denied by the top-level permissions: {} block.
* fix: restrict Dependabot pip updates to security-only
The Dependabot config added in #517 included weekly version-bump PRs for
all three pip packages. This would generate noisy PRs for routine dep
updates we don't need. Set open-pull-requests-limit: 0 on the pip
ecosystems so only CVE-triggered security updates open PRs.
GitHub Actions weekly bumps are kept as-is to keep SHA pins current.
* fix: group Dependabot Actions PRs and fix DCO allowlist
- Add a Dependabot group to bundle all GitHub Actions updates into a
single weekly PR instead of one per action
- Fix DCO allowlist: dependabot -> dependabot[bot] to match the actual
GitHub username (the old value never matched, but there were no
Dependabot PRs before #517 to expose the bug)
* fix: align DCO assistant if-condition with custom sign-off text
The step's if-condition checked for the default sign-off text but
custom-pr-sign-comment uses different wording. This meant the
issue_comment trigger was always skipped - sign-offs only worked
by accident when a subsequent push re-triggered the action via
pull_request_target.
* ci: harden CI supply chain
Pin all GitHub Actions to commit SHAs to prevent tag-based supply chain
attacks (same class as CVE-2025-30066). Replace softprops/action-gh-release
(single-maintainer, no security policy) with gh CLI. Add top-level
permissions: {} to all workflows that lacked it, enforcing least-privilege
by default. Enable Dependabot for GitHub Actions and pip dependencies.
Closes #471
* fix: add dependabot pip entries for each sub-package
The root directory has no pyproject.toml; the actual packages live under
packages/data-designer-config, packages/data-designer-engine, and
packages/data-designer.
The docs-preview workflow triggered on all source code changes due to
the broad `packages/*/src/data_designer/**` path glob. This caused
unnecessary Cloudflare Pages deployments on code-only PRs like #505.
Remove the source code path filter so the workflow only triggers on
actual docs content changes (docs/**, mkdocs.yml, and the workflow
file itself).
* fix: address review feedback on async engine dev note
- Fix wall-clock claim: 41% -> 22% to match benchmark table
- Fix dual-model speedup rounding: 1.7x -> 1.6x (10.0/6.1 = 1.64)
- Fix run_config API: use dd.set_run_config() instead of passing to create()
* docs: add async engine dev note
Add "Async All the Way Down" dev note covering the async task-queue
scheduler built across PRs #356, #378, #404, #429, #456. Includes
benchmark results, architecture diagrams, and DAG shape illustrations.
* feat: add docs preview workflow for PRs
Build MkDocs site on PRs that touch docs and deploy to Cloudflare
Pages. Each PR gets a browseable preview URL posted as a comment.
Notebook tutorials use placeholder stubs since they require API
keys to execute.
Requires CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID repo secrets.
* fix: update speedup chart alt text from 1.7x to 1.6x
* docs: improve timeline figure context and labeling
Add DAG subtitle to sync-vs-async timeline figure and bridge the
surrounding text to explain which workload shape is being shown.
* edits+additions to async-all-the-way-down dev notes
* clarify two-semaphore dance
* remove dead link
* replace hero image
* docs: update scale figures with nginx-accurate data and adjust sizing
Regenerate scale-model-timeline and scale-boxplot from nginx access
logs (column_progress.csv, sync/summary.json) instead of buffered
execution logs. Optimize both PNGs to palette mode. Adjust figure
widths and update model timeline commentary.
* add link from owning-the-model-stack to async-dev-note
* docs: address review feedback on async blog post
- Tighten intro to a concise abstract, move pipeline narrative into
"The Bottleneck Was Structural" section
- Remove multi-column generators / seed readers paragraph (TMI)
- Clarify sync engine ran columns sequentially within each batch
---------
Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>
* ci: add PR review workflow and recipe
Add the remaining Phase 1 deliverables for the agentic CI plan:
- PR review recipe that composes the existing review-code skill
- PR review workflow with collaborator-only gate, auth detection,
pre-flight checks, and re-review label support
- Mark Phase 1 items complete in the plan (except docs)
* fix: use explicit draft == false instead of ! operator in workflow if
The ! operator in a >- YAML block may cause parsing issues. Use
explicit comparison instead.
* fix: address review feedback + simplify if condition for debugging
- Fix: only re-review label triggers on labeled events (greptile)
- Fix: use printf instead of echo -e for prompt assembly (greptile)
- Debug: simplify if condition to isolate why job is skipping
* debug: set if to true to test runner connectivity
* debug: add job to dump event context for if-condition debugging
* fix: use collaborator API for permission check instead of author_association
author_association in webhook payloads reports NONE when org membership
is private, causing the job to skip even for members. Replace with a
gate job that checks collaborator permissions via the API, which works
regardless of org visibility settings.
* fix: disable prompt caching and skip posting on review failure
- Set DISABLE_PROMPT_CACHING=1 for Bedrock-backed endpoints that don't
support cache_control parameters
- Don't post a comment when the review file isn't produced, just emit
a warning annotation on the workflow run
* fix: rename label to agent-review, remove synchronize trigger
- Rename re-review -> agent-review for clarity
- Remove synchronize from trigger types so reviews are opt-in on
subsequent pushes (use the agent-review label to retrigger)
- Reviews still auto-run on PR open and draft -> ready transitions
* fix: validate PR number input and remove unused auth mode step
* fix: address review feedback - quoting, checkout ordering, stale docs
- Pass all step outputs through env vars instead of direct expression
injection in shell (PR number, model name)
- Resolve head SHA before checkout so dispatch doesn't clone at wrong ref
- Use set -o pipefail + continue-on-error instead of || true
- Remove stale synchronize references from plan doc
* fix: add specific review guidance for plan docs
* fix: check labeler permission for agent-review on external PRs
For labeled events, check the sender (who added the label) instead of
the PR author. This lets maintainers authorize agent reviews on PRs
from external contributors by adding the agent-review label.
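The actor selection described above might look like the following sketch. It assumes the standard `pull_request` webhook payload fields (`action`, `sender.login`, `pull_request.user.login`); the function name is hypothetical.

```python
def actor_to_check(event: dict) -> str:
    """Pick the login whose permission should gate the review job."""
    if event.get("action") == "labeled":
        # The person who added the label authorizes the review.
        return event["sender"]["login"]
    # For every other action, the PR author is checked.
    return event["pull_request"]["user"]["login"]
```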
* save progress
* undo review-code skill change
* delete status file
* small tweaks
* Fix 429 info
* update workind on skill info

* updates
* Update architecture/overview.md
Co-authored-by: Johnny Greco <jogreco@nvidia.com>
* fix: correct symbol names and CLI commands in architecture docs
Address review comments:
- models.md: describe clients as native httpx adapters, not SDK wrappers
- agent-introspection.md: use actual family keys (columns, samplers, etc.) not column-types
- cli.md: use correct command `data-designer config models`
- plugins.md: SEED_READER not SEED_SOURCE, inject_into_processor_config_type_union
Made-with: Cursor
---------
Co-authored-by: Johnny Greco <jogreco@nvidia.com>
The "Verify Claude CLI" step fails on the CI runner because Claude
Code tries to initialize keychain, LSP, plugins, and CLAUDE.md
discovery before making the API call. On a bare runner these
resources don't exist, causing exit code 1.
- Add --bare to skip all initialization and force ANTHROPIC_API_KEY auth
- Add --tools "" to disable tool definitions (health check doesn't need
them, and this avoids sending a large payload to the gateway)
* docs: add agentic CI plan for automated PR reviews and daily maintenance
Closes #472
* docs: add API configuration and auth modes to agentic CI plan
* docs: add PoC lessons and operational details to agentic CI plan
* docs: add runner label targeting to agentic CI plan
* docs: add re-review label and workflow_dispatch triggers to PR review
* docs: rename runner label to agentic-ci
* docs: add check run as gate for PR review, output stays as comment
* ci: add agentic CI health probe workflow and recipe scaffold
- Health probe: pings inference API, checks latency, verifies Claude CLI
- Runs every 6h on self-hosted agentic-ci runner, plus manual dispatch
- Dual auth mode: custom endpoint (secret) or OAuth fallback
- Recipe scaffold: _runner.md shared context, health-probe recipe
- Update .agents/README.md to include recipes directory
* docs: address Greptile review feedback on agentic CI plan
- Add checks: write to recipe frontmatter example
- Add concurrency group to daily maintenance workflow spec
- Clarify fork PRs are out of scope (pull_request event only)
- Document workflow_dispatch callers as trusted (accepted risk)
* fix: skip API curl in OAuth mode, add branch protection note
- Health probe: skip the direct API ping step in OAuth mode (no API
key available for curl; Claude CLI step is the sole health signal)
- Guard latency threshold check on custom auth mode
- Plan: note that contents:write on daily suites requires branch
protection rules to prevent agent self-merging
* fix: address Nabin's second review feedback
- Health probe: fix latency threshold string comparison with fromJSON()
- Health probe: add permissions: contents: read
- Health probe: fail fast if AGENTIC_CI_MODEL variable is not set
- Runner context: add prompt-injection defense and output sanitization
- Plan: update Phase 2 deliverable to match cache-based memory approach
- Plan: reference STYLEGUIDE.md in code-quality suite
- README: note that recipes don't need a .claude/ symlink
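The latency threshold fix in the list above concerns values that arrive as strings. The sketch below shows the analogous pitfall in Python, not in GitHub Actions expression syntax; the function name and the sample values are illustrative assumptions.

```python
def exceeds_threshold(latency_ms: str, threshold_ms: str) -> bool:
    """Compare as numbers; both inputs arrive as strings (e.g. step outputs)."""
    return float(latency_ms) > float(threshold_ms)


# String comparison is lexicographic, so "900" sorts after "1000".
string_result = "900" > "1000"
numeric_result = exceeds_threshold("900", "1000")
```

Converting both sides to numbers before comparing plays the role that fromJSON() plays in the workflow.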
* docs: sync plan with implementation decisions
- Health probe uses workflow failure, not issue open/close
- Pre-flight checks should fail fast on missing config
- Add GHA string comparison gotcha to PoC lessons
- Add explicit permissions block recommendation to PoC lessons
- Bump max_turns from 20 to 30 in recipe example
* docs: address PR review feedback on agentic CI plan
- Review docs PRs with lighter recipe instead of skipping by file type
- Switch runner memory from committed branch to GH Actions cache
- Add import perf check to test-health suite
- Add nuance on dependency pinning strictness vs DX
- Add Follow-up: Weekend Agents section (perf, AI-QA, repo triage)
- Add cost guardrails open question
- Add status field to frontmatter
* fix: cache notebook builds to avoid failures from flaky upstream models
The build-notebooks CI executes all tutorial notebooks on every run.
When an upstream model (e.g. black-forest-labs/flux.2-pro) is down, the
entire docs build fails even if no notebooks changed.
Add per-notebook caching based on source file SHA-256 hashes. Unchanged
notebooks are served from cache, and only modified ones are re-executed.
On the first CI run (empty cache), the workflow seeds the cache from the
last successful build artifact.
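The per-notebook hash check described above can be sketched as follows. This is an illustration under assumptions, not the workflow's implementation: the cache layout (one `<name>.sha256` marker per notebook) and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_rebuild(source: Path, cache_dir: Path) -> bool:
    """True when no stored hash exists or it differs from the current source hash."""
    marker = cache_dir / f"{source.name}.sha256"
    if not marker.exists():
        return True
    return marker.read_text().strip() != file_sha256(source)


def record_hash(source: Path, cache_dir: Path) -> None:
    """Store the current source hash after a successful execution."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / f"{source.name}.sha256").write_text(file_sha256(source))
```

A notebook is re-executed only when `needs_rebuild` returns True; otherwise the cached output is reused.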
Also add a minimal test script (test_flux_image_gen.py) to reproduce the
flux.2-pro health check failure locally.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address review comments on notebook caching
- Don't write .sha256 during seeding so changed notebooks are detected
- Rename TMPDIR to SEED_TMPDIR to avoid shadowing the POSIX env var
- Use portable sha256 helper (sha256sum with shasum fallback)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: only seed cache when truly empty, restore hash writing
Skip artifact seeding when a partial cache was restored (it already has
correct per-file hashes). Only seed + write current hashes when the
cache dir is completely empty (true bootstrapping).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: restrict artifact seed lookup to main branch
Prevents seeding from feature branch runs that may have different
notebook sources.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add actions:read permission for artifact seeding
The seed step uses gh run list and gh run download which require
actions:read. Without it, these calls silently fail and the cold-start
cache bootstrapping never executes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: only use notebook cache when called from build-docs
Scheduled Monday runs and manual workflow_dispatch should execute all
notebooks to catch regressions (e.g. library changes that break a
notebook). Caching is only used via workflow_call (from build-docs)
where the goal is fast, resilient doc deployment.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: use jq // empty to avoid "null" string on empty run list
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add use_cache input flag to notebook and docs workflows
Replace event_name-based cache logic with an explicit use_cache boolean
input. Defaults:
- build-notebooks: workflow_call=true, dispatch=false, schedule=false
- build-docs: dispatch=true (toggleable), release=false
This gives full control over caching from the GitHub Actions UI.
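The defaults listed above can be read as a lookup table with an explicit override. The sketch below models that in Python; the function name and the fallback to False for unlisted combinations are assumptions, and the real logic lives in the workflow YAML.

```python
from typing import Optional

# Defaults as listed in the commit message, keyed by (workflow, trigger event).
DEFAULT_USE_CACHE = {
    ("build-notebooks", "workflow_call"): True,
    ("build-notebooks", "workflow_dispatch"): False,
    ("build-notebooks", "schedule"): False,
    ("build-docs", "workflow_dispatch"): True,
    ("build-docs", "release"): False,
}


def resolve_use_cache(workflow: str, event: str, explicit: Optional[bool] = None) -> bool:
    """Return the explicit input when given, otherwise the per-trigger default."""
    if explicit is not None:
        return explicit
    # Assumption: unlisted combinations run without the cache.
    return DEFAULT_USE_CACHE.get((workflow, event), False)
```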
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: repair notebook CI by replacing dead vision model and adding missing API key
- Replace `meta/llama-4-scout-17b-16e-instruct` (no longer serving on
build.nvidia.com) with `nvidia/nemotron-nano-12b-v2-vl` (project default)
in tutorial notebook 4
- Add `OPENROUTER_API_KEY` to the `build-notebooks` workflow so notebooks
5 and 6 (which use OpenRouter for image generation) can authenticate
- Regenerate colab notebooks to reflect the model change
* fix: handle pyarrow list types in notebook 6 display_image
When image columns are loaded from parquet with pyarrow backend,
list values are pyarrow ListScalars, not Python lists. The
isinstance(x, list) check fails, causing the whole ListScalar to be
treated as a single path string (producing filenames ending in
`png')]`). Use isinstance(x, str) instead to correctly handle any
iterable type.
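The type check described above can be sketched without pyarrow. The helper below is hypothetical and is not the notebook's display_image. It treats a string as a single path and anything else iterable as a collection of paths; real pyarrow scalars may additionally need conversion to Python values, which this sketch leaves out.

```python
from collections.abc import Iterable


def normalize_paths(value: object) -> list:
    """Return image paths as a list, given one path or an iterable of paths."""
    if isinstance(value, str):
        # Check for str first: a string is itself iterable, character by character.
        return [value]
    if isinstance(value, Iterable):
        # Covers lists and list-like containers that are not Python lists.
        return list(value)
    raise TypeError(f"unsupported image value: {type(value).__name__}")
```

Testing for `str` rather than `list` means a list-like container that fails `isinstance(x, list)` is no longer mistaken for one path.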
* test: add e2e health checks for default provider models
Add parametrized tests that verify model connectivity for all
default providers (nvidia, openai, openrouter). Tests check API
key availability and skip when not configured.
* chore: move health checks out of e2e tests
- Convert pytest test to standalone script at scripts/health_checks.py
- Add `make health-checks` target
- Add CI workflow (weekly + on release + manual dispatch)
- Remove test_health_checks.py from tests_e2e/
* chore: make health checks non-blocking in CI
* fix: print traceback to stdout to avoid interleaving
* chore: add all provider API keys to health checks CI
Co-authored-by: Cursor <cursoragent@cursor.com>
* chore: remove temporary push trigger from health checks
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
* update script
* update headers
* refactor a bit and add test script
* update headers
* update for edge case
* update headers
* add step to get file creation date
* use git history to get copyright year
* generation type is printed with inference parameters
* fix unit test
* adding basic jupytext structure
Co-authored-by: Johnny Greco <jogreco@nvidia.com>
* few fixes
* first test for ci
* adding error intentionally to check workflow behavior
* test calling from other workflows
* typo
* trying as job instead
* couple of fixes
* checking path
* trying to fix path
* wrapping up
---------
Co-authored-by: Johnny Greco <jogreco@nvidia.com>