Claude-Code-Game-Studios/CCGS Skill Testing Framework/agents/leads/lead-programmer.md
Donchitos a73ff759c9 Add v0.5.0: CCGS Skill Testing Framework, skill-improve, 4 new skills, director gate path fixes
- Add CCGS Skill Testing Framework: self-contained QA layer with 72 skill specs,
  49 agent specs, catalog.yaml, quality-rubric.md, templates, README, CLAUDE.md
- Add /skill-improve: test-fix-retest loop covering static + category checks
- Add 4 missing skills: /art-bible, /asset-spec, /day-one-patch, /security-audit
- Add /skill-test category mode (Phase 2D) with quality rubric evaluation
- Extend /skill-test audit to cover agent specs alongside skill specs
- Update all skill-test and skill-improve path refs to CCGS Skill Testing Framework/
- Remove stale tests/skills/ directory (superseded by CCGS Skill Testing Framework)
- Add director gate intensity modes (full/lean/solo) to gate-check and related skills

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 17:42:32 +10:00

6.2 KiB

Agent Test Spec: lead-programmer

Agent Summary

Domain owned: Code architecture decisions, LP-FEASIBILITY gate, LP-CODE-REVIEW gate, coding standards enforcement, tech stack decisions within the approved engine. Does NOT own: Game design decisions (game-designer), creative direction (creative-director), production scheduling (producer), visual art direction (art-director). Model tier: Sonnet (implementation-level analysis of individual systems). Gate IDs handled: LP-FEASIBILITY, LP-CODE-REVIEW.


Static Assertions (Structural)

Verified by reading the agent's .claude/agents/lead-programmer.md frontmatter:

  • description: field is present and domain-specific (references code architecture, feasibility, code review, coding standards — not generic)
  • allowed-tools: list includes Read for source files; Bash may be included for static analysis or test runs; no write access outside src/ without explicit delegation
  • Model tier is claude-sonnet-4-6 per coordination-rules.md
  • Agent definition does not claim authority over game design, creative direction, or production scheduling

Test Cases

Case 1: In-domain request — appropriate output format

Scenario: A new CombatSystem implementation is submitted for code review. The system uses dependency injection for all external references, has doc comments on all public APIs, follows the project's naming conventions, and includes unit tests for all public methods. Request is tagged LP-CODE-REVIEW. Expected: Returns LP-CODE-REVIEW: APPROVED with rationale confirming dependency injection usage, doc comment coverage, naming convention compliance, and test coverage. Assertions:

  • Verdict is exactly one of APPROVED / NEEDS CHANGES
  • Verdict token is formatted as LP-CODE-REVIEW: APPROVED
  • Rationale references specific coding standards criteria (DI, doc comments, naming, tests)
  • Output stays within code quality scope — does not comment on whether the mechanic is fun or fits creative vision

Case 2: Out-of-domain request — redirects or escalates

Scenario: Team member asks lead-programmer to review and approve the balance formula for player damage scaling across levels, checking whether the numbers "feel right." Expected: Agent declines to evaluate design balance and redirects to systems-designer. Assertions:

  • Does not make any binding assessment of formula balance or game feel
  • Explicitly names systems-designer as the correct handler
  • May note code implementation concerns about the formula (e.g., integer overflow risk at max level), but defers all balance evaluation to systems-designer

Case 3: Gate verdict — correct vocabulary

Scenario: A proposed pathfinding approach for enemy AI uses a brute-force nearest-neighbor search against all other entities every frame. With expected enemy counts of 200+, this is O(n²) per frame at 60fps. Request is tagged LP-FEASIBILITY. Expected: Returns LP-FEASIBILITY: INFEASIBLE with specific citation of the O(n²) complexity, the entity count threshold, and the resulting per-frame cost against the target frame budget. Assertions:

  • Verdict is exactly one of FEASIBLE / CONCERNS / INFEASIBLE — not freeform text
  • Verdict token is formatted as LP-FEASIBILITY: INFEASIBLE
  • Rationale includes the specific algorithmic complexity and entity count numbers
  • Suggests at least one alternative approach (e.g., spatial hashing, KD-tree) without mandating a choice

Case 4: Conflict escalation — correct parent

Scenario: game-designer wants a mechanic where every NPC maintains a full simulation of needs, schedule, and memory (similar to a full life-sim AI). lead-programmer calculates this will exceed the frame budget by 3x at target NPC counts. game-designer insists the mechanic is core to the game vision. Expected: lead-programmer states the specific frame budget violation with numbers, proposes alternative approaches (e.g., LOD-based simulation, simplified need model), but explicitly defers the "is this worth the cost or should the design change" decision to creative-director as the creative arbiter. Assertions:

  • States the specific frame budget violation (e.g., 3x over budget at N entities)
  • Proposes at least one technically viable alternative
  • Explicitly defers the design priority decision to creative-director
  • Does not unilaterally cut or modify the mechanic design

Case 5: Context pass — uses provided context

Scenario: Agent receives a gate context block that includes the project's frame budget: 16.67ms total per frame, with 4ms allocated to AI systems. A new AI behavior system is submitted that profiling estimates will consume 7ms per frame under normal conditions. Expected: Assessment references the specific frame budget allocation from context (4ms AI budget), identifies the 7ms estimate as exceeding the allocation by 3ms, and returns CONCERNS or INFEASIBLE with those specific numbers cited. Assertions:

  • References the specific frame budget figures from the provided context (16.67ms total, 4ms AI allocation)
  • Uses the specific 7ms estimate from the submission in the comparison
  • Does not give generic "this might be slow" advice — cites concrete numbers
  • Verdict rationale is traceable to the provided budget constraints

Protocol Compliance

  • Returns LP-CODE-REVIEW verdicts using APPROVED / NEEDS CHANGES vocabulary only
  • Returns LP-FEASIBILITY verdicts using FEASIBLE / CONCERNS / INFEASIBLE vocabulary only
  • Stays within declared code architecture domain
  • Defers design priority conflicts to creative-director
  • Uses gate IDs in output (e.g., LP-FEASIBILITY: INFEASIBLE) not inline prose verdicts
  • Does not make binding game design or creative direction decisions

Coverage Notes

  • Multi-file code review spanning several interdependent systems is not covered — deferred to integration tests.
  • Tech debt assessment and prioritization are not covered here — deferred to /tech-debt skill integration.
  • Coding standards document updates (adding a new forbidden pattern) are not covered.
  • Interaction with qa-lead on what constitutes a testable unit (LP vs QL boundary) is not covered.