Add three violation recovery scenarios:
- Claimed complete without running verification
- Ran command but didn't paste output
- Used banned phrases before verification
Each includes detection, recovery steps, and rationale.
Task 26/26 from infrastructure implementation plan.
Add three violation recovery scenarios:
- Jumped to fix before finding root cause
- Multiple hypotheses tested simultaneously
- Skipped Phase 1 checklist
Each includes detection, recovery steps, and rationale.
Task 25/26 from infrastructure implementation plan.
Add three violation recovery scenarios:
- Wrote implementation before test
- Test passes without implementation (false green)
- Kept code as reference instead of deleting
Each includes detection, recovery steps, and rationale.
Task 24/26 from infrastructure implementation plan.
Add comprehensive 'When You Violate This Skill' section with template and example.
Provides standardized structure for skills to document violation detection and recovery.
Task 23/26 from infrastructure implementation plan.
**High (H1):** Add `$` anchor to the VERDICT value regex so that
invalid values such as "PASS with notes" are rejected. The check now
strictly accepts only PASS, FAIL, or NEEDS_DISCUSSION, with nothing
after the value.
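
A minimal sketch of the anchoring fix, assuming a hypothetical `VERDICT: <value>` line format. The patterns and the `is_valid_verdict` helper are illustrative only; the actual regex in the repository may differ.

```python
import re

# Hypothetical patterns for illustration; not the repository's actual regex.
UNANCHORED = re.compile(r"^VERDICT: (PASS|FAIL|NEEDS_DISCUSSION)")
ANCHORED = re.compile(r"^VERDICT: (PASS|FAIL|NEEDS_DISCUSSION)$")


def is_valid_verdict(line, pattern=ANCHORED):
    """Return True when the line matches the given verdict pattern."""
    # Note: in Python, `$` also matches just before a single trailing newline.
    return pattern.match(line) is not None
```

With the unanchored pattern, `"VERDICT: PASS with notes"` is accepted because the match stops after `PASS`; the anchored pattern rejects it.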
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
**Critical (C3):** Fix output limiting by using `head` to actually
restrict results to the top 5 skills. The previous implementation used
`break` in a pipe subshell, which did not work correctly.
Changes:
- Filter zero-score skills in `awk` before limiting
- Use `head -5` to enforce the limit before the `while` loop
- Move the count increment inside the loop for correct numbering
- Show the warning only if `total_matches` > 5
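
The filter-then-limit order described above can be sketched as follows. This is a Python illustration rather than the shell implementation, and the `top_skills` name, the output format, and the descending-score ranking are assumptions.

```python
def top_skills(scores, limit=5):
    """Return (numbered_lines, warning) for the highest-scoring skills."""
    # Filter zero-score skills before limiting, as the change list describes.
    matches = [(name, score) for name, score in scores.items() if score > 0]
    # Assumption: results are ranked by descending score, ties broken by name.
    matches.sort(key=lambda item: (-item[1], item[0]))
    total_matches = len(matches)
    # Number only the entries that survive the limit.
    lines = [
        f"{index}. {name} (score: {score})"
        for index, (name, score) in enumerate(matches[:limit], start=1)
    ]
    warning = None
    if total_matches > limit:
        warning = f"Showing top {limit} of {total_matches} matching skills"
    return lines, warning
```

Counting matches before truncating is what allows the warning to report how many results were left out.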
**Critical (C2):** Add optional event-id parameter that enables
idempotent metrics tracking. Duplicate events are detected and
skipped, preventing double-counting.
**High (H4):** Make metrics updates atomic by combining the timestamp
and data updates into single `jq` commands, ensuring consistency.
Changes:
- Accept optional 5th parameter: event-id
- Track event_ids in metrics.json
- Check for duplicates before processing
- Record event_id atomically with data updates
- Return success for duplicate events (idempotent)
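
A rough sketch of the idempotency check, using an in-memory dict in place of `metrics.json`. The schema (`counts`, `last_updated`) and the `record_event` signature are hypothetical; only the `event_ids` list is named in the commit.

```python
import copy


def record_event(metrics, skill, outcome, timestamp, event_id=None):
    """Return (new_metrics, applied); duplicate event ids are skipped."""
    # A duplicate is not an error: report "not applied" and change nothing.
    if event_id is not None and event_id in metrics.get("event_ids", []):
        return metrics, False
    # Build the complete new document first, so the counter, the timestamp,
    # and the event id change together or not at all.
    updated = copy.deepcopy(metrics)
    counts = updated.setdefault("counts", {}).setdefault(skill, {})
    counts[outcome] = counts.get(outcome, 0) + 1
    updated["last_updated"] = timestamp
    if event_id is not None:
        updated.setdefault("event_ids", []).append(event_id)
    return updated, True
```

Calling `record_event` twice with the same `event_id` leaves the counts unchanged on the second call, which is the double-counting protection described above. Calls without an `event_id` are always applied.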
**Critical (C1):** Fix `git_diff_order` to use file modification times
instead of `git log` timestamps for uncommitted changes. This ensures
TDD compliance checks work correctly during development.
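
As an illustration of ordering by modification time, here is a minimal Python sketch. The helper names are hypothetical and this is not the `git_diff_order` implementation; modification times are only a heuristic for the order in which files were written.

```python
import os


def modified_order(paths):
    """Return paths sorted oldest-first by file modification time."""
    return sorted(paths, key=lambda path: os.stat(path).st_mtime_ns)


def written_test_first(test_path, impl_path):
    """Heuristic: True when the test was last modified no later than the implementation."""
    return os.stat(test_path).st_mtime_ns <= os.stat(impl_path).st_mtime_ns
```

Uncommitted files have no commit timestamp to compare, which is why a filesystem-based ordering is needed for them.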
**High (H3):** Optimize `find` performance by pruning `node_modules` and
`vendor` directories, preventing unnecessary directory traversal.
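
The same pruning idea can be sketched in Python with `os.walk`. This is an analogue for illustration, not the shell command used in the repository; `find_files` and `PRUNED_DIRS` are hypothetical names.

```python
import os

# Directory names to skip entirely, mirroring the commit description.
PRUNED_DIRS = {"node_modules", "vendor"}


def find_files(root, suffix):
    """Return sorted paths under root ending with suffix, skipping pruned dirs."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to the slice in place stops os.walk from descending
        # into the pruned directories at all.
        dirnames[:] = [name for name in dirnames if name not in PRUNED_DIRS]
        for name in filenames:
            if name.endswith(suffix):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Pruning during traversal avoids visiting the excluded trees, which is cheaper than walking everything and filtering the results afterwards.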
Bumps plugin version to 0.0.3.
The prompts for `code-reviewer`, `business-logic-reviewer`, and
`security-reviewer` are completely rewritten. This introduces a
standardized, highly-structured format for all three agents.
Key enhancements include:
- Prioritized checklists for systematic, gate-specific review
- A strict markdown output format with a clear verdict section
- Explicit pass/fail criteria to reduce ambiguity
- Common anti-patterns and best-practice examples to improve guidance
The previous agent prompts were less structured, which could lead to
inconsistent review quality and less thorough analysis. This overhaul
provides a more methodical and repeatable process for the agents. The
standardized structure, checklists, and explicit criteria ensure that
reviews are more comprehensive, reliable, and easier for developers to parse
and act upon, improving the entire automated review workflow.
This commit replaces the single code reviewer with a structured,
three-gate sequential review process to improve quality and efficiency.
The new workflow consists of three specialized agents run in order:
1. `code-reviewer` (Gate 1 - Foundation): Validates architecture,
code quality, and maintainability.
2. `business-logic-reviewer` (Gate 2 - Correctness): A new agent to
verify business rules, requirements, and edge cases.
3. `security-reviewer` (Gate 3 - Safety): A new agent for auditing
security vulnerabilities.
This sequential process ensures that each stage builds on a validated
foundation, preventing wasted effort. For instance, security analysis
is only performed on code that has passed architectural and business
logic reviews.
All reviewer agents are also upgraded to the `opus` model, and all
related skills and documentation are updated to reflect this new workflow.
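
The sequential gating can be sketched as below. The `run_gates` helper and the callable-per-agent interface are assumptions made for illustration; only the agent names and their order come from the commit.

```python
# Gate order as described in the commit message.
GATES = ["code-reviewer", "business-logic-reviewer", "security-reviewer"]


def run_gates(reviewers, change):
    """Run gates in order and stop at the first verdict that is not PASS."""
    results = []
    for gate in GATES:
        # Assumption: each reviewer is a callable returning a verdict string.
        verdict = reviewers[gate](change)
        results.append((gate, verdict))
        if verdict != "PASS":
            # Later gates are skipped, so no effort is spent on code that
            # has not cleared the earlier ones.
            break
    return results
```

In this sketch, a `FAIL` from `business-logic-reviewer` means `security-reviewer` is never invoked for that change.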
---
This change replaces the general-purpose code reviewer with a more robust, three-stage sequential review process involving specialized agents. By introducing distinct gates for code quality, business logic, and security, reviews become more focused and effective. Each gate builds on the previous one, ensuring foundational issues are resolved before more specific analysis begins, which prevents wasted effort and improves final code quality.
Overhaul README.md from a brief summary into a comprehensive guide
for developers. It now includes sections on installation, philosophy, a
full skill list, usage examples, and contribution guidelines.
Refactor CLAUDE.md, the primary guide for the AI, for better structure
and actionability. The file now clearly outlines the repository's
architecture, key workflows, common commands, and mandatory patterns.
The previous documentation was minimal and no longer reflected the
project's maturity and complexity. This extensive rewrite provides a
professional and thorough onboarding experience for new users and
contributors. For the AI, the new structure offers a clearer and more
actionable model of the repository, enabling it to use the skills and
workflows more effectively and consistently.
This introduces the "Skill Bulletproofing" initiative, a major effort
to make agent skills more robust, reliable, and resistant to common
failure modes like rationalization and incorrect execution.
A detailed implementation plan outlines the systematic enhancement of
core skills with features like mandatory checkpoints, state tracking, and
explicit failure conditions.
To improve skill discovery and awareness, a new quick reference guide is
added and injected into the agent's context at session start. This
provides a scannable overview of all available skills. Formal test
plans are also added to document baseline behaviors and validate the
effectiveness of the improvements.
The legacy `RELEASE-NOTES.md` file is removed, as this new structured
approach to planning and documentation provides a more granular and
useful historical record.