🎨✨ Contributing to NeMo Data Designer 🎨✨
The skills and workflows in this repository are for developing DataDesigner. If you're looking to use DataDesigner to build datasets, see the product documentation instead.
This project uses agent-assisted development. Contributors are expected to use agents for investigation, planning, and implementation. The repository includes skills and guidance that make agents effective contributors.
Agents accelerate work; humans stay accountable. People make design decisions and own quality — agents help get there faster.
How to Contribute
- Open an issue using the appropriate issue template.
- Include investigation output. If you used an agent, paste its diagnostics. If you didn't, include the troubleshooting you tried.
- For non-trivial changes, create a plan document at plans/<issue-number>/ before building. Have your agent draft the plan; it should describe the approach, trade-offs considered, affected subsystems, and a delivery strategy. See existing plans in plans/ for reference. Submit the plan in a PR for review.
- Once the plan is approved, implement using agent-assisted development. See DEVELOPMENT.md for local setup and workflow.
Before You Open an Issue
- Clone the repo and point your agent at it
- Have the agent search docs and existing issues (the search-docs and search-github skills can help)
- If the agent can't resolve it, include the diagnostics in your issue
- If you didn't use an agent, include the troubleshooting you already tried
When to Open an Issue
- Real bugs — reproduced or agent-confirmed
- Feature proposals with design context
- Problems that search-docs/search-github couldn't resolve
When NOT to Open an Issue
- Questions about how things work — an agent can answer these from the codebase
- Configuration problems — an agent can diagnose these
- "How do I..." requests — try the product documentation first
Development Skills
The repository includes skills for common development tasks. These are located in .agents/skills/ and are automatically discovered by agent harnesses.
| Category | Skills | Purpose |
|---|---|---|
| Investigation | search-docs, search-github | Find information, check for duplicates |
| Development | commit, create-pr, update-pr | Standard development cycle |
| Review | review-code | Multi-pass code review |
Pull Requests
- PRs must link to the issue they address (Fixes #NNN or Closes #NNN). For external contributors, this is enforced by a required status check: the linked issue must exist and carry the triaged label (added by a maintainer after review). Collaborators are exempt from this check. You can open the PR before the issue is triaged; the check re-runs automatically once a maintainer adds the label.
- PRs with failing checks that remain inactive are automatically reminded after 7 days and closed after 14 days (collaborators: 14/28 days). Push an update or leave a comment to reset the timer. If you need more time, ask a maintainer to add the keep-open label.
- Use the create-pr skill for well-formatted PR descriptions, or follow the PR template
- Ensure all checks pass before requesting review:
  make check-all-fix
  make test
- Run a self-review before opening the PR using the review-code skill. Address any critical or warning-level findings before requesting human review. If you have access to multiple models, run the review with different models across passes: different models surface different issues, and a single pass rarely catches everything.
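The linked-issue rule and the stale-PR timer can be sketched in a few lines of Python. This is an illustration of the policy as described in this section, not the repository's actual workflow code. The names `linked_issues`, `stale_action`, `THRESHOLDS`, and `author_type` are hypothetical, the regular expression is an assumed reading of "Fixes #NNN or Closes #NNN", and the sketch assumes a threshold is reached once the idle time is at least the stated number of days. It does not model the triaged label lookup.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed pattern for "Fixes #NNN" / "Closes #NNN" (case-insensitive).
LINK_RE = re.compile(r"\b(?:fixes|closes)\s+#(\d+)", re.IGNORECASE)

# (reminder_days, close_days) per author type, taken from the policy text.
THRESHOLDS = {"external": (7, 14), "collaborator": (14, 28)}


def linked_issues(pr_body: str) -> list[int]:
    """Return the issue numbers a PR body references with a closing keyword."""
    return [int(number) for number in LINK_RE.findall(pr_body)]


def stale_action(
    last_activity: datetime,
    now: datetime,
    author_type: str = "external",
    keep_open: bool = False,
    checks_failing: bool = True,
) -> str:
    """Return "none", "remind", or "close" for a PR under the stale policy."""
    # The policy only applies to PRs with failing checks and no keep-open label.
    if keep_open or not checks_failing:
        return "none"
    remind_after, close_after = THRESHOLDS[author_type]
    idle = now - last_activity
    if idle >= timedelta(days=close_after):
        return "close"
    if idle >= timedelta(days=remind_after):
        return "remind"
    return "none"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(linked_issues("Fixes #12 and closes #34"))
    print(stale_action(now - timedelta(days=8), now))
```

Because a push or comment updates the last activity time, calling `stale_action` with a fresh `last_activity` returns "none", which is how the timer reset described above plays out in this model.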
Pull Request Review Process
- PRs receive an automated CI code review. You must address all critical and warning-level findings from the automated review before requesting human review.
- Maintainers will review your PR and may request changes
- Address feedback by pushing additional commits to your branch
- Reply to each comment before resolving it. If the comment resulted in a code change, include the commit hash that addresses it. Do not resolve comments without a response.
- Once approved, a maintainer will merge your PR
Commit Messages
- Use imperative mood ("add feature" not "added feature")
- Keep the subject line under 50 characters (hard limit: 72)
- Reference issue numbers when applicable (Fixes #123)
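The subject-length guidance above can be checked mechanically. The following is a minimal sketch, assuming "under 50 characters" means a subject of 50 or more characters earns a warning and anything over 72 is an error. The function name `check_subject` and the message wording are hypothetical, and the sketch does not attempt to check imperative mood.

```python
SOFT_LIMIT = 50  # subjects should stay under this length (assumed: < 50)
HARD_LIMIT = 72  # subjects must not exceed this length


def check_subject(message: str) -> list[str]:
    """Return a list of problems found in the commit subject line."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    problems = []
    if not subject.strip():
        problems.append("error: empty subject line")
    elif len(subject) > HARD_LIMIT:
        problems.append(f"error: subject is {len(subject)} chars (hard limit {HARD_LIMIT})")
    elif len(subject) >= SOFT_LIMIT:
        problems.append(f"warning: subject is {len(subject)} chars (keep under {SOFT_LIMIT})")
    return problems


if __name__ == "__main__":
    for problem in check_subject("add stale PR cleanup workflow\n\nFixes #123"):
        print(problem)
```

Only the first line of the message is inspected, so a body line such as Fixes #123 does not count toward the limit.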
License Headers
All code files must include the NVIDIA copyright header:
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
Use make update-license-headers to add headers automatically.
Signing Off on Your Work (DCO)
When contributing, you must agree that you have authored 100% of the content, that you have the necessary rights to the content, and that the content you contribute may be provided under the project license. All contributors are asked to sign the Developer Certificate of Origin (DCO) when submitting their first pull request. The process is automated by a bot that will comment on the pull request.
Code of Conduct
Data Designer follows the Contributor Covenant Code of Conduct. Please read our complete Code of Conduct for full details.
Reference
- AGENTS.md — architecture, layering, design principles
- STYLEGUIDE.md — code style, naming, imports, type annotations
- DEVELOPMENT.md — local setup, testing, day-to-day workflow