Add v0.5.0: CCGS Skill Testing Framework, skill-improve, 4 new skills, director gate path fixes

- Add CCGS Skill Testing Framework: self-contained QA layer with 72 skill specs,
  49 agent specs, catalog.yaml, quality-rubric.md, templates, README, CLAUDE.md
- Add /skill-improve: test-fix-retest loop covering static + category checks
- Add 4 missing skills: /art-bible, /asset-spec, /day-one-patch, /security-audit
- Add /skill-test category mode (Phase 2D) with quality rubric evaluation
- Extend /skill-test audit to cover agent specs alongside skill specs
- Update all skill-test and skill-improve path refs to CCGS Skill Testing Framework/
- Remove stale tests/skills/ directory (superseded by CCGS Skill Testing Framework)
- Add director gate intensity modes (full/lean/solo) to gate-check and related skills

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Donchitos 2026-04-06 17:42:32 +10:00
parent 8ba9e736a5
commit a73ff759c9
192 changed files with 21953 additions and 1158 deletions


@ -43,9 +43,9 @@ Examples:
| Mode | What runs | Best for |
|------|-----------|----------|
| `full` | All gates active — every workflow step reviewed | Teams, learning users, or when you want thorough director feedback at every step |
| `lean` | PHASE-GATEs only (`/gate-check`) — per-skill gates skipped | **Default** — solo devs and small teams; directors review at milestones only |
| `solo` | No director gates anywhere | Game jams, prototypes, maximum speed |
**Check pattern — apply before every gate spawn:**
@ -66,7 +66,18 @@ Apply the resolved mode:
## Invocation Pattern (copy into any skill)
**MANDATORY: Resolve review mode before every gate spawn.** Never spawn a gate without checking. The resolved mode is determined once per skill run:
1. If skill was called with `--review [mode]`, use that
2. Else read `production/review-mode.txt`
3. Else default to `lean`
Apply the resolved mode:
- `solo` → **skip all gates**. Note in output: `[GATE-ID] skipped — Solo mode`
- `lean` → **skip unless this is a PHASE-GATE** (CD-PHASE-GATE, TD-PHASE-GATE, PR-PHASE-GATE, AD-PHASE-GATE). Note: `[GATE-ID] skipped — Lean mode`
- `full` → spawn as normal
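The resolution order above can be sketched in shell (the helper name is hypothetical; the file path is the one this document specifies):

```shell
#!/bin/sh
# Hypothetical sketch of the three-step resolution order described above.
# $1 carries an explicit --review override value (full|lean|solo), or "".
resolve_review_mode() {
  if [ -n "$1" ]; then
    echo "$1"                        # 1. explicit --review flag wins
  elif [ -f production/review-mode.txt ]; then
    cat production/review-mode.txt   # 2. project-level setting
  else
    echo "lean"                      # 3. default
  fi
}
```

For example, `resolve_review_mode solo` returns `solo` regardless of what `production/review-mode.txt` contains.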
```
# Apply mode check, then:
Spawn `[agent-name]` via Task:
- Gate: [GATE-ID] (see .claude/docs/director-gates.md)
- Context: [fields listed under that gate]
```
@ -76,6 +87,7 @@ Spawn `[agent-name]` via Task:
For parallel spawning (multiple directors at the same gate point):
```
# Apply mode check for each gate first, then spawn all that survive:
Spawn all [N] agents simultaneously via Task — issue all Task calls before
waiting for any result. Collect all verdicts before proceeding.
```
@ -524,6 +536,86 @@ is invoked
---
## Tier 1 — Art Director Gates
Agent: `art-director` | Model tier: Sonnet | Domain: Visual identity, art bible, visual production readiness
---
### AD-CONCEPT-VISUAL — Visual Identity Anchor
**Trigger**: After game pillars are locked (brainstorm Phase 4), in parallel with CD-PILLARS
**Context to pass**:
- Game concept (elevator pitch, core fantasy, unique hook)
- Full pillar set with names, definitions, and design tests
- Target platform (if known)
- Any reference games or visual touchstones mentioned by the user
**Prompt**:
> "Based on these game pillars and core concept, propose 2-3 distinct visual identity
> directions. For each direction provide: (1) a one-line visual rule that could guide
> all visual decisions (e.g., 'everything must move', 'beauty is in the decay'), (2)
> mood and atmosphere targets, (3) shape language (sharp/rounded/organic/geometric
> emphasis), (4) color philosophy (palette direction, what colors mean in this world).
> Be specific — avoid generic descriptions. One direction should directly serve the
> primary design pillar. Name each direction. Recommend which best serves the stated
> pillars and explain why."
**Verdicts**: CONCEPTS (multiple valid options — user selects) / STRONG (one direction clearly dominant) / CONCERNS (pillars don't provide enough direction to differentiate visual identity yet)
---
### AD-ART-BIBLE — Art Bible Sign-Off
**Trigger**: After the art bible is drafted (`/art-bible`), before asset production begins
**Context to pass**:
- Art bible path (`design/art/art-bible.md`)
- Game pillars and core fantasy
- Platform and performance constraints (from `.claude/docs/technical-preferences.md` if configured)
- Visual identity anchor chosen during brainstorm (from `design/gdd/game-concept.md`)
**Prompt**:
> "Review this art bible for completeness and internal consistency. Does the color
> system match the mood targets? Does the shape language follow from the visual
> identity statement? Are the asset standards achievable within the platform
> constraints? Does the character design direction give artists enough to work from
> without over-specifying? Are there contradictions between sections? Would an
> outsourcing team be able to produce assets from this document without additional
> briefing? Return APPROVE (art bible is production-ready), CONCERNS [specific
> sections needing clarification], or REJECT [fundamental inconsistencies that must
> be resolved before asset production begins]."
**Verdicts**: APPROVE / CONCERNS / REJECT
---
### AD-PHASE-GATE — Visual Readiness at Phase Transition
**Trigger**: Always at `/gate-check` — spawn in parallel with CD-PHASE-GATE, TD-PHASE-GATE, and PR-PHASE-GATE
**Context to pass**:
- Target phase name
- List of all art/visual artifacts present (file paths)
- Visual identity anchor from `design/gdd/game-concept.md` (if present)
- Art bible path if it exists (`design/art/art-bible.md`)
**Prompt**:
> "Review the current project state for [target phase] gate readiness from a visual
> direction perspective. Is the visual identity established and documented at the
> level this phase requires? Are the right visual artifacts in place? Would visual
> teams be able to begin their work without visual direction gaps that cause costly
> rework later? Are there visual decisions that are being deferred past their latest
> responsible moment? Return READY, CONCERNS [specific visual direction gaps that
> could cause production rework], or NOT READY [visual blockers that must exist
> before this phase can succeed — specify what artifact is missing and why it
> matters at this stage]."
**Verdicts**: READY / CONCERNS / NOT READY
---
## Tier 2 — Lead Gates
These gates are invoked by orchestration skills and senior skills when a domain
@ -678,8 +770,9 @@ Spawn in parallel (issue all Task calls before waiting for any result):
1. creative-director → gate CD-PHASE-GATE
2. technical-director → gate TD-PHASE-GATE
3. producer → gate PR-PHASE-GATE
4. art-director → gate AD-PHASE-GATE
Collect all four verdicts, then apply escalation rules:
- Any NOT READY / REJECT → overall verdict minimum FAIL
- Any CONCERNS → overall verdict minimum CONCERNS
- All READY / APPROVE → eligible for PASS (still subject to artifact checks)
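The escalation rules above amount to a worst-verdict-wins fold over the four PHASE-GATE verdicts; a minimal sketch, with a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical sketch: fold gate verdicts into one overall verdict using
# the escalation rules above (worst verdict wins).
overall_verdict() {
  overall="PASS"
  for v in "$@"; do
    case "$v" in
      "NOT READY"|REJECT) overall="FAIL" ;;
      CONCERNS) [ "$overall" = "PASS" ] && overall="CONCERNS" ;;
    esac
  done
  echo "$overall"   # a PASS here is still subject to artifact checks
}
```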
@ -704,10 +797,10 @@ When a new gate is needed for a new skill or workflow:
| Stage | Required Gates | Optional Gates |
|-------|---------------|----------------|
| **Concept** | CD-PILLARS, AD-CONCEPT-VISUAL | TD-FEASIBILITY, PR-SCOPE |
| **Systems Design** | TD-SYSTEM-BOUNDARY, CD-SYSTEMS, PR-SCOPE, CD-GDD-ALIGN (per GDD) | ND-CONSISTENCY, AD-VISUAL |
| **Technical Setup** | TD-ARCHITECTURE, TD-ADR (per ADR), LP-FEASIBILITY, AD-ART-BIBLE | TD-ENGINE-RISK |
| **Pre-Production** | PR-EPIC, QL-STORY-READY (per story), PR-SPRINT, all four PHASE-GATEs (via gate-check) | CD-PLAYTEST |
| **Production** | LP-CODE-REVIEW (per story), QL-STORY-READY, PR-SPRINT (per sprint) | PR-MILESTONE, QL-TEST-COVERAGE, AD-VISUAL |
| **Polish** | QL-TEST-COVERAGE, CD-PLAYTEST, PR-MILESTONE | AD-VISUAL |
| **Release** | All four PHASE-GATEs (via gate-check) | QL-TEST-COVERAGE |


@ -10,6 +10,9 @@
# required: true → blocks progression to next phase (shown as REQUIRED)
# required: false → optional enhancement (shown as OPTIONAL)
# repeatable: true → runs multiple times (one per system, story, etc.)
#
# Phase gates (/gate-check): verdicts are ADVISORY — they guide the decision
# but never hard-block advancement. The user always decides whether to proceed.
phases:
@ -47,6 +50,14 @@ phases:
required: false
description: "Validate the game concept (recommended before proceeding)"
- id: art-bible
name: "Art Bible"
command: /art-bible
required: true
artifact:
glob: "design/art/art-bible.md"
description: "Author the visual identity specification (9 sections). Uses the Visual Identity Anchor produced by /brainstorm. Run after game concept is formed, before systems design."
- id: map-systems
name: "Systems Map"
command: /map-systems
@ -84,9 +95,16 @@ phases:
glob: "design/gdd/gdd-cross-review-*.md"
description: "Holistic consistency check + design theory review across all GDDs simultaneously"
- id: consistency-check
name: "Consistency Check"
command: /consistency-check
required: false
repeatable: true
description: "Scan all GDDs for contradictions, undefined references, and mechanic conflicts. Run after /review-all-gdds, and again any time a GDD is added or revised mid-project."
technical-setup:
label: "Technical Setup"
description: "Architecture decisions, visual identity specification, accessibility foundations, engine validation"
next_phase: pre-production
steps:
- id: create-architecture
@ -132,9 +150,18 @@ phases:
pre-production:
label: "Pre-Production"
description: "UX specs, asset specs, prototype the core mechanic, define stories, validate fun"
next_phase: production
steps:
- id: asset-spec
name: "Asset Specs"
command: /asset-spec
required: false
repeatable: true
artifact:
glob: "design/assets/asset-manifest.md"
description: "Generate per-asset visual specifications and AI generation prompts from approved GDDs and level docs. Run once per system/level/character."
- id: ux-design
name: "UX Specs (key screens)"
command: /ux-design
@ -180,6 +207,14 @@ phases:
min_count: 2
description: "Break each epic into implementable story files. Run per epic: /create-stories [epic-slug]"
- id: test-setup
name: "Test Framework Setup"
command: /test-setup
required: false
artifact:
note: "Check tests/ directory for engine-specific test framework scaffold"
description: "Scaffold the test framework and CI pipeline once before the first sprint. Leads to /test-helpers for fixture generation, /qa-plan per epic, and /smoke-check per sprint."
- id: sprint-plan
name: "First Sprint Plan"
command: /sprint-plan
@ -191,11 +226,12 @@ phases:
- id: vertical-slice
name: "Vertical Slice (playtested)"
command: /playtest-report
required: true
artifact:
glob: "production/playtests/*.md"
min_count: 1
description: "Document vertical slice playtest sessions using /playtest-report. Run at least once here (≥1 session required before Production; ≥3 required before Polish). Each session should cover one complete run-through of the core loop."
production:
label: "Production"
@ -224,7 +260,14 @@ phases:
repeatable: true
artifact:
note: "Check src/ for active code and production/epics/**/*.md for In Progress stories"
description: "Pick the next ready story and implement it with /dev-story [story-path]. Routes to the correct programmer agent."
- id: code-review
name: "Code Review"
command: /code-review
required: false
repeatable: true
description: "Architectural code review after each story implementation. Run after /dev-story, before /story-done."
- id: story-done
name: "Story Done Review"
@ -233,6 +276,33 @@ phases:
repeatable: true
description: "Verify all acceptance criteria, check GDD/ADR deviations, close the story"
- id: qa-plan
name: "QA Plan"
command: /qa-plan
required: false
repeatable: true
description: "Generate a QA test plan per epic or sprint. Run /qa-plan [epic-slug]. Produces test cases for /smoke-check, /regression-suite, and /test-evidence-review."
- id: bug-report
name: "Bug Report / Triage"
command: /bug-report
required: false
repeatable: true
description: "Log and prioritize bugs found during implementation. /bug-report creates a structured report; /bug-triage prioritizes the open backlog."
- id: retrospective
name: "Sprint Retrospective"
command: /retrospective
required: false
repeatable: true
description: "Post-sprint review to capture what worked and what to change. Run at the end of each sprint, before planning the next."
- id: team-feature
name: "Team Orchestration (optional)"
required: false
repeatable: true
description: "Coordinate multiple agents on a complex feature. Use: /team-combat, /team-narrative, /team-ui, /team-audio, /team-level, /team-live-ops, /team-qa. Run when a feature spans multiple agent domains."
- id: scope-check
name: "Scope Check"
command: /scope-check


@ -11,17 +11,17 @@ mkdir -p "$SESSION_LOG_DIR" 2>/dev/null
RECENT_COMMITS=$(git log --oneline --since="8 hours ago" 2>/dev/null)
MODIFIED_FILES=$(git diff --name-only 2>/dev/null)
# --- Archive active session state on shutdown (do NOT delete) ---
# active.md persists across clean exits so multi-session recovery works.
# It is only valid to delete active.md manually or when explicitly superseded.
STATE_FILE="production/session-state/active.md"
if [ -f "$STATE_FILE" ]; then
# Append a copy to the session log (active.md itself is preserved)
{
echo "## Archived Session State: $TIMESTAMP"
cat "$STATE_FILE"
echo "---"
echo ""
} >> "$SESSION_LOG_DIR/session-log.md" 2>/dev/null
fi
if [ -n "$RECENT_COMMITS" ] || [ -n "$MODIFIED_FILES" ]; then


@ -4,7 +4,6 @@ description: "Brownfield onboarding — audits existing project artifacts for te
argument-hint: "[focus: full | gdds | adrs | stories | infra]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, AskUserQuestion
agent: technical-director
---
@ -37,7 +36,10 @@ wrong internal format.
## Phase 1: Detect Project State
Emit one line before reading: `"Scanning project artifacts..."` — this confirms the
skill is running during the silent read phase.
Then read silently before presenting anything else.
### Existence check
- `production/stage.txt` — if present, read it (authoritative phase)
@ -48,6 +50,7 @@ Read silently before presenting anything.
- Count story files: `production/epics/**/*.md` (excluding EPIC.md)
- `.claude/docs/technical-preferences.md` — engine configured?
- `docs/engine-reference/` — engine reference docs present?
- Glob `docs/adoption-plan-*.md` — note the filename of the most recent prior plan if any exist
### Infer phase (if no stage.txt)
Use the same heuristic as `/project-stage-detect`:
@ -58,9 +61,15 @@ Use the same heuristic as `/project-stage-detect`:
- game-concept.md exists → Concept
- Nothing → Fresh (not a brownfield project — suggest `/start`)
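The tail of that heuristic can be sketched as follows (function name hypothetical; only the two branches quoted above are shown, since the fuller heuristic lives in `/project-stage-detect`):

```shell
#!/bin/sh
# Hypothetical sketch of the last two branches of the phase heuristic above.
infer_phase_tail() {
  if [ -f design/gdd/game-concept.md ]; then
    echo "Concept"
  else
    echo "Fresh"   # not a brownfield project: suggest /start
  fi
}
```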
If the project appears fresh (no artifacts at all), use `AskUserQuestion`:
- "This looks like a fresh project — no existing artifacts found. `/adopt` is for
projects with work to migrate. What would you like to do?"
- "Run `/start` — begin guided first-time onboarding"
- "My artifacts are in a non-standard location — help me find them"
- "Cancel"
Then stop — do not proceed with the audit regardless of which option the user picks
(each option leads to a different skill or manual investigation).
Report: "Detected phase: [phase]. Found: [N] GDDs, [M] ADRs, [P] stories."
@ -247,7 +256,26 @@ Gap counts:
Estimated remediation: [X blocking items × ~Y min each = roughly Z hours]
```
Before asking to write, show a **Gap Preview**:
- List every BLOCKING gap as a one-line bullet describing the actual problem
(e.g. `systems-index.md: 3 rows have parenthetical status values`,
`adr-0002.md: missing ## Status section`). No counts — show the actual items.
- Show HIGH / MEDIUM / LOW as counts only (e.g. `HIGH: 4, MEDIUM: 2, LOW: 1`).
This gives the user enough context to judge scope before committing to writing the file.
If a prior adoption plan was detected in Phase 1, add a note:
> "A previous plan exists at `docs/adoption-plan-[prior-date].md`. The new plan will
> reflect current project state — it does not diff against the prior run."
Use `AskUserQuestion`:
- "Ready to write the migration plan?"
- "Yes — write `docs/adoption-plan-[date].md`"
- "Show me the full plan preview first (don't write yet)"
- "Cancel — I'll handle migration manually"
If the user picks "Show me the full plan preview", output the complete plan as a
fenced markdown block. Then ask again with the same three options.
---
@ -261,7 +289,7 @@ If approved, write `docs/adoption-plan-[date].md` with this structure:
> **Generated**: [date]
> **Project phase**: [phase]
> **Engine**: [name + version, or "Not configured"]
> **Template version**: v1.0+
Work through these steps in order. Check off each item as you complete it.
Re-run `/adopt` anytime to check remaining gaps.
@ -334,29 +362,69 @@ are resolved. The new run will reflect the current state of the project.
---
## Phase 6b: Set Review Mode
After writing the adoption plan (or if the user cancels writing), check whether
`production/review-mode.txt` exists.
**If it exists**: Read it and note the current mode — "Review mode is already set to `[current]`." — skip the prompt.
**If it does not exist**: Use `AskUserQuestion`:
- **Prompt**: "One more setup step: how much design review would you like as you work through the workflow?"
- **Options**:
- `Full` — Director specialists review at each key workflow step. Best for teams, learning the workflow, or when you want thorough feedback on every decision.
- `Lean (recommended)` — Directors only at phase gate transitions (/gate-check). Skips per-skill reviews. Balanced for solo devs and small teams.
- `Solo` — No director reviews at all. Maximum speed. Best for game jams, prototypes, or if reviews feel like overhead.
Write the choice to `production/review-mode.txt` immediately after selection — no separate "May I write?" needed:
- `Full` → write `full`
- `Lean (recommended)` → write `lean`
- `Solo` → write `solo`
Create the `production/` directory if it does not exist.
---
## Phase 7: Offer First Action
After writing the plan, don't stop there. Pick the single highest-priority gap
and offer to handle it immediately using `AskUserQuestion`. Choose the first
branch that applies:
**If there are parenthetical status values in systems-index.md:**
Use `AskUserQuestion`:
- "The most urgent fix is `systems-index.md` — [N] rows have parenthetical status
values (e.g. `Needs Revision (see notes)`) that break /gate-check,
/create-stories, and /architecture-review right now. I can fix these in-place."
- "Fix it now — edit systems-index.md"
- "I'll fix it myself"
- "Done — leave me with the plan"
**If ADRs are missing `## Status` (and no parenthetical issue):**
Use `AskUserQuestion`:
- "The most urgent fix is adding `## Status` to [N] ADR(s): [list filenames].
Without it, /story-readiness silently passes all ADR checks. Start with
[first affected filename]?"
- "Yes — retrofit [first affected filename] now"
- "Retrofit all [N] ADRs one by one"
- "I'll handle ADRs myself"
**If GDDs are missing Acceptance Criteria (and no blocking issues above):**
Use `AskUserQuestion`:
- "The most urgent gap is missing Acceptance Criteria in [N] GDD(s):
[list filenames]. Without them, /create-stories can't generate stories.
Start with [highest-priority GDD filename]?"
- "Yes — add Acceptance Criteria to [GDD filename] now"
- "Do all [N] GDDs one by one"
- "I'll handle GDDs myself"
**If no BLOCKING or HIGH gaps exist:**
Use `AskUserQuestion`:
- "No blocking gaps — this project is template-compatible. What next?"
- "Walk me through the medium-priority improvements"
- "Run /project-stage-detect for a broader health check"
- "Done — I'll work through the plan at my own pace"
---


@ -3,15 +3,19 @@ name: architecture-decision
description: "Creates an Architecture Decision Record (ADR) documenting a significant technical decision, its context, alternatives considered, and consequences. Every major technical choice should have an ADR."
argument-hint: "[title] [--review full|lean|solo]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Task, AskUserQuestion
---
When this skill is invoked:
## 0. Parse Arguments — Detect Retrofit Mode
Resolve the review mode (once, store for all gate spawns this run):
1. If `--review [full|lean|solo]` was passed → use that
2. Else read `production/review-mode.txt` → use that value
3. Else → default to `lean`
See `.claude/docs/director-gates.md` for the full check pattern.
**If the argument starts with `retrofit` followed by a file path**
(e.g., `/architecture-decision retrofit docs/architecture/adr-0001-event-system.md`):
@ -163,33 +167,61 @@ or explicitly accepted as an intentional exception.
## 3. Guide the decision collaboratively
Ask clarifying questions if the title alone is not sufficient. For each major
section, present 2-4 options with pros/cons before drafting. Do not generate
the ADR until the key decision is confirmed by the user.
Before asking anything, derive the skill's best guesses from the context already
gathered (GDDs read, engine reference loaded, existing ADRs scanned). Then present
a **confirm/adjust** prompt using `AskUserQuestion` — not open-ended questions.
**Derive assumptions first:**
- **Problem**: Infer from the title + GDD context what decision needs to be made
- **Alternatives**: Propose 2-3 concrete options from engine reference + GDD requirements
- **Dependencies**: Scan existing ADRs for upstream dependencies; assume None if unclear
- **GDD linkage**: Extract which GDD systems the title directly relates to
- **Status**: Always `Proposed` for new ADRs — never ask the user what the status is
If the decision is foundational (no GDD drives it directly), ask:
- Which GDD systems will this decision constrain or enable?
**Scope of assumptions tab**: Assumptions cover only: problem framing, alternative approaches, upstream dependencies, GDD linkage, and status. Schema design questions (e.g., "How should spawn timing work?", "Should data be inline or external?") are NOT assumptions — they are design decisions belonging to a separate step after the assumptions are confirmed. Do not include schema design questions in the assumptions AskUserQuestion widget.
This GDD linkage becomes a mandatory "GDD Requirements Addressed" section
in the ADR. Do not skip it.
**After assumptions are confirmed**, if the ADR involves schema or data design choices, use a separate multi-tab `AskUserQuestion` to ask each design question independently before drafting.
**Present assumptions with `AskUserQuestion`:**
```
Here's what I'm assuming before drafting:
Problem: [one-sentence problem statement derived from context]
Alternatives I'll consider:
A) [option derived from engine reference]
B) [option derived from GDD requirements]
C) [option from common patterns]
GDD systems driving this: [list derived from context]
Dependencies: [upstream ADRs if any, otherwise "None"]
Status: Proposed
[A] Proceed — draft with these assumptions
[B] Change the alternatives list
[C] Adjust the GDD linkage
[D] Add a performance budget constraint
[E] Something else needs changing first
```
Do not generate the ADR until the user confirms assumptions or provides corrections.
**After engine specialist and TD reviews return** (Step 4.5/4.6), if unresolved
decisions remain, present each one as a separate `AskUserQuestion` with the proposed
options as choices plus a free-text escape:
```
Decision: [specific unresolved point]
[A] [option from specialist review]
[B] [alternative option]
[C] Different approach — I'll describe it
```
**ADR Dependencies** — derive from existing ADRs, then confirm:
- Does this decision depend on any other ADR not yet Accepted?
- Does it unlock or unblock any other ADR or epic?
- Does it block any specific epic from starting?
Record answers in the **ADR Dependencies** section. Write "None" for each field if no constraints apply.
---
@ -312,14 +344,48 @@ to implement it.]
- If the specialist identifies a **blocking issue** (wrong API, deprecated approach, engine version incompatibility): revise the Decision and Engine Compatibility sections accordingly, then confirm the changes with the user before proceeding
- If the specialist finds **minor notes** only: incorporate them into the ADR's Risks subsection
**Review mode check** — apply before spawning TD-ADR:
- `solo` → skip. Note: "TD-ADR skipped — Solo mode." Proceed to Step 4.7 (GDD sync check).
- `lean` → skip (not a PHASE-GATE). Note: "TD-ADR skipped — Lean mode." Proceed to Step 4.7 (GDD sync check).
- `full` → spawn as normal.
4.6. **Technical Director Strategic Review** — After the engine specialist validation, spawn `technical-director` via Task using gate **TD-ADR** (`.claude/docs/director-gates.md`):
- Pass: the ADR file path (or draft content), engine version, domain, any existing ADRs in the same domain
- The TD validates architectural coherence (is this decision consistent with the whole system?) — distinct from the engine specialist's API-level check
- If CONCERNS or REJECT: revise the Decision or Alternatives sections accordingly before proceeding
4.7. **GDD Sync Check** — Before presenting the write approval, scan all GDDs
referenced in the "GDD Requirements Addressed" section for naming inconsistencies
with the ADR's Key Interfaces and Decision sections (renamed signals, API methods,
or data types). If any are found, surface them as a **prominent warning block**
immediately before the write approval — not as a footnote:
```
⚠️ GDD SYNC REQUIRED
[gdd-filename].md uses names this ADR has renamed:
[old_name] → [new_name_from_adr]
[old_name_2] → [new_name_2_from_adr]
The GDD must be updated before or alongside writing this ADR to prevent
developers reading the GDD from implementing the wrong interface.
```
If no inconsistencies: skip this block silently.
5. **Write approval** — Use `AskUserQuestion`:
If GDD sync issues were found:
- "ADR draft is complete. How would you like to proceed?"
- [A] Write ADR + update GDD in the same pass
- [B] Write ADR only — I'll update the GDD manually
- [C] Not yet — I need to review further
If no GDD sync issues:
- "ADR draft is complete. May I write it?"
- [A] Write ADR to `docs/architecture/adr-[NNNN]-[slug].md`
- [B] Not yet — I need to review further
If yes to any write option, write the file, creating the directory if needed.
For option [A] with GDD update: also update the GDD file(s) to use the new names.
6. **Update Architecture Registry**
@ -340,10 +406,50 @@ Registry candidates from this ADR:
EXISTING (referenced_by update only): player_health → already registered ✅
```
**Registry append logic**: When writing to `docs/registry/architecture.yaml`, do NOT assume sections are empty. The file may already have entries from previous ADRs written in this session. Before each Edit call:
1. Read the current state of `docs/registry/architecture.yaml`
2. Find the correct section (state_ownership, interfaces, forbidden_patterns, api_decisions)
3. Append the new entry AFTER the last existing entry in that section — do not try to replace a `[]` placeholder that may no longer exist
4. If the section has entries already, use the closing content of the last entry as the `old_string` anchor, and append the new entry after it
**BLOCKING — do not write to `docs/registry/architecture.yaml` without explicit user approval.**
Ask using `AskUserQuestion`:
- "May I update `docs/registry/architecture.yaml` with these [N] new stances?"
- Options: "Yes — update the registry", "Not yet — I want to review the candidates", "Skip registry update"
Only proceed if the user selects yes. If yes: append new entries. Never modify existing entries — if a stance is
changing, set the old entry to `status: superseded_by: ADR-[NNNN]` and add the new entry.
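A hypothetical sketch of what a superseded stance could look like in `docs/registry/architecture.yaml` (entry names and fields are illustrative, not the file's actual schema; the `superseded_by` value is quoted to keep the YAML valid):

```yaml
state_ownership:
  - id: player_health
    owner: HealthComponent              # original stance, never edited or deleted
    status: "superseded_by: ADR-0007"
    source: ADR-0002
  - id: player_health
    owner: HealthSystem                 # replacement stance from the new ADR
    status: active
    source: ADR-0007
```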
---
## 7. Closing Next Steps
After the ADR is written (and registry optionally updated), close with `AskUserQuestion`.
Before generating the widget:
1. Read `docs/registry/architecture.yaml` — check if any priority ADRs are still unwritten (look for ADRs flagged in technical-preferences.md or systems-index.md as prerequisites)
2. Check if all prerequisite ADRs are now written. If yes, include a "Start writing GDDs" option.
3. List ALL remaining priority ADRs as individual options — not just the next one or two.
Widget format:
```
ADR-[NNNN] written and registry updated. What would you like to do next?
[1] Write [next-priority-adr-name] — [brief description from prerequisites list]
[2] Write [another-priority-adr] — [brief description] (include ALL remaining ones)
[N] Start writing GDDs — run `/design-system [first-undesigned-system]` (only show if all prerequisite ADRs are written)
[N+1] Stop here for this session
```
If there are no remaining priority ADRs and no undesigned GDD systems, offer only "Stop here" and suggest running `/architecture-review` in a fresh session.
**Always include this fixed notice in the closing output (do NOT omit it):**
> To validate ADR coverage against your GDDs, open a **fresh Claude Code session**
> and run `/architecture-review`.
>
> **Never run `/architecture-review` in the same session as `/architecture-decision`.**
> The reviewing agent must be independent of the authoring context to give an unbiased
> assessment. Running it here would invalidate the review.
Update any stories that were `Status: Blocked` pending this ADR to `Status: Ready`.

View file

@ -3,8 +3,7 @@ name: architecture-review
description: "Validates completeness and consistency of the project architecture against all GDDs. Builds a traceability matrix mapping every GDD technical requirement to ADRs, identifies coverage gaps, detects cross-ADR conflicts, verifies engine compatibility consistency across all decisions, and produces a PASS/CONCERNS/FAIL verdict. The architecture equivalent of /design-review."
argument-hint: "[focus: full | coverage | consistency | engine | single-gdd path/to/gdd.md]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Task
context: fork
allowed-tools: Read, Glob, Grep, Write, Task, AskUserQuestion
agent: technical-director
model: opus
---
@ -452,10 +451,11 @@ FAIL: Critical gaps (Foundation/Core layer requirements uncovered),
## Phase 8: Write and Update Traceability Index
Ask: "May I write this review to `docs/architecture/architecture-review-[date].md`?"
Also ask: "May I update `docs/architecture/architecture-traceability.md` with the
current matrix? This is the living index that future reviews update incrementally."
Use `AskUserQuestion` for the write approval:
- "Review complete. What would you like to write?"
- [A] Write all three files (review report + traceability index + TR registry)
- [B] Write review report only — `docs/architecture/architecture-review-[date].md`
- [C] Don't write anything yet — I need to review the findings first
### RTM Output (rtm mode only)
@ -596,7 +596,7 @@ Engine: [name + version]
## Phase 9: Handoff
After completing the review:
After completing the review and writing approved files, present:
1. **Immediate actions**: List the top 3 ADRs to create (highest-impact gaps first,
Foundation layer before Feature layer)
@ -605,6 +605,12 @@ After completing the review:
3. **Rerun trigger**: "Re-run `/architecture-review` after each new ADR is written
to verify coverage improves"
Then close with `AskUserQuestion`:
- "Architecture review complete. What would you like to do next?"
- [A] Write a missing ADR — open a fresh session and run `/architecture-decision [system]`
- [B] Run `/gate-check pre-production` — if all blocking gaps are resolved
- [C] Stop here for this session
---
## Error Recovery Protocol

View file

@ -0,0 +1,214 @@
---
name: art-bible
description: "Guided, section-by-section Art Bible authoring. Creates the visual identity specification that gates all asset production. Run after /brainstorm is approved and before /map-systems or any GDD authoring begins."
argument-hint: "[--review full|lean|solo]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit, Task, AskUserQuestion
---
## Phase 0: Parse Arguments and Context Check
Resolve the review mode (once, store for all gate spawns this run):
1. If `--review [full|lean|solo]` was passed → use that
2. Else read `production/review-mode.txt` → use that value
3. Else → default to `lean`
See `.claude/docs/director-gates.md` for the full check pattern.
Read `design/gdd/game-concept.md`. If it does not exist, fail with:

> "No game concept found. Run `/brainstorm` first — the art bible is authored after the game concept is approved."
Extract from game-concept.md:
- Game title (working title)
- Core fantasy and elevator pitch
- Game pillars (all of them)
- **Visual Identity Anchor** section if present (from brainstorm Phase 4 art-director output)
- Target platform (if noted)
Read `design/art/art-bible.md` if it exists — this is **resume mode**. Determine which sections already have real content vs. placeholders. Only work on missing sections.
Read `.claude/docs/technical-preferences.md` if it exists — extract performance budgets and engine for asset standard constraints.
---
## Phase 1: Framing
Present the session context and ask two questions before authoring anything:
Use `AskUserQuestion` with two tabs:
- Tab **"Scope"** — "Which sections need to be authored today?"
Options: `Full bible — all 9 sections` / `Visual identity core (sections 1–4 only)` / `Asset standards only (section 8)` / `Resume — fill in missing sections`
- Tab **"References"** — "Do you have reference games, films, or art that define the visual direction?"
(Free text — let the user type specific titles. Do NOT preset options here.)
If the game-concept.md has a Visual Identity Anchor section, note it:
> "Found a visual identity anchor from brainstorm: '[anchor name] — [one-line rule]'. I'll use this as the foundation for the art bible."
---
## Phase 2: Visual Identity Foundation (Sections 1–4)
These four sections define the core visual language. **All other sections flow from them.** Author and write each to file before moving to the next.
### Section 1: Visual Identity Statement
**Goal**: A one-line visual rule plus 2–3 supporting principles that resolve visual ambiguity.
If a visual anchor exists from game-concept.md: present it and ask:
- "Build directly from this anchor?"
- "Revise it before expanding?"
- "Start fresh with new options?"
**Agent delegation (MANDATORY)**: Spawn `art-director` via Task:
- Provide: game concept (elevator pitch, core fantasy), full pillar set, platform target, any reference games/art from Phase 1 framing, the visual anchor if it exists
- Ask: "Draft a Visual Identity Statement for this game. Provide: (1) a one-line visual rule that could resolve any visual decision ambiguity, (2) 2–3 supporting visual principles, each with a one-sentence design test ('when X is ambiguous, this principle says choose Y'). Anchor all principles directly in the stated pillars — each principle must serve a specific pillar."
Present the art-director's draft to the user. Use `AskUserQuestion`:
- Options: `[A] Lock this in` / `[B] Revise the one-liner` / `[C] Revise a supporting principle` / `[D] Describe my own direction`
Write the approved section to file immediately.
### Section 2: Mood & Atmosphere
**Goal**: Emotional targets by game state — specific enough for a lighting artist to work from.
For each major game state (e.g., exploration, combat, victory, defeat, menus — adapt to this game's states), define:
- Primary emotion/mood target
- Lighting character (time of day, color temperature, contrast level)
- Atmospheric descriptors (3–5 adjectives)
- Energy level (frenetic / measured / contemplative / etc.)
**Agent delegation**: Spawn `art-director` via Task with the Visual Identity Statement and pillar set. Ask: "Define mood and atmosphere targets for each major game state in this game. Be specific — 'dark and foreboding' is not enough. Name the exact emotional target, the lighting character (warm/cool, high/low contrast, time of day direction), and at least one visual element that carries the mood. Each game state must feel visually distinct from the others."
Write the approved section to file immediately.
### Section 3: Shape Language
**Goal**: The geometric vocabulary that makes this game's world visually coherent and distinguishable.
Cover:
- Character silhouette philosophy (how readable at thumbnail size? Distinguishing trait per archetype?)
- Environment geometry (angular/curved/organic/geometric — which dominates and why?)
- UI shape grammar (does UI echo the world aesthetic, or is it a distinct HUD language?)
- Hero shapes vs. supporting shapes (what draws the eye, what recedes?)
**Agent delegation**: Spawn `art-director` via Task with Visual Identity Statement and mood targets. Ask: "Define the shape language for this game. Connect each shape principle back to the visual identity statement and a specific game pillar. Explain what these shape choices communicate to the player emotionally."
Write the approved section to file immediately.
### Section 4: Color System
**Goal**: A complete, producible palette system that serves both aesthetic and communication needs.
Cover:
- Primary palette (5–7 colors with roles — not just hex codes, but what each color means in this world)
- Semantic color usage (what does red communicate? Gold? Blue? White? Establish the color vocabulary)
- Per-biome or per-area color temperature rules (if the game has distinct areas)
- UI palette (may differ from world palette — define the divergence explicitly)
- Colorblind safety: which semantic colors need shape/icon/sound backup
**Agent delegation**: Spawn `art-director` via Task with Visual Identity Statement and mood targets. Ask: "Design the color system for this game. Every semantic color assignment must be explained — why does this color mean danger/safety/reward in this world? Identify which color pairs might fail colorblind players and specify what backup cues are needed."
Write the approved section to file immediately.
---
## Phase 3: Production Guides (Sections 5–8)
These sections translate the visual identity into concrete production rules. They should be specific enough that an outsourcing team can follow them without additional briefing.
### Section 5: Character Design Direction
**Agent delegation**: Spawn `art-director` via Task with sections 1–4. Ask: "Define character design direction for this game. Cover: visual archetype for the player character (if any), distinguishing feature rules per character type (how do players tell enemies/NPCs/allies apart at a glance?), expression/pose style targets (stiff/expressive/realistic/exaggerated), and LOD philosophy (how much detail is preserved at game camera distance?)."
Write the approved section to file.
### Section 6: Environment Design Language
**Agent delegation**: Spawn `art-director` via Task with sections 1–4. Ask: "Define the environment design language for this game. Cover: architectural style and its relationship to the world's culture/history, texture philosophy (painted vs. PBR vs. stylized — why this choice for this game?), prop density rules (sparse/dense — what drives the choice per area type?), and environmental storytelling guidelines (what visual details should tell the story without text?)."
Write the approved section to file.
### Section 7: UI/HUD Visual Direction
**Agent delegation**: Spawn in parallel:
- **`art-director`**: Visual style for UI — diegetic vs. screen-space HUD, typography direction (font personality, weight, size hierarchy), iconography style (flat/outlined/illustrated/photorealistic), animation feel for UI elements
- **`ux-designer`**: UX alignment check — does the visual direction support the interaction patterns this game requires? Flag any conflicts between art direction and readability/accessibility needs.
Collect both. If they conflict (e.g., art-director wants elaborate diegetic UI but ux-designer flags it would reduce combat readability), surface the conflict explicitly with both positions. Do NOT silently resolve — use `AskUserQuestion` to let the user decide.
Write the approved section to file.
### Section 8: Asset Standards
**Agent delegation**: Spawn in parallel:
- **`art-director`**: File format preferences, naming convention direction, texture resolution tiers, LOD level expectations, export settings philosophy
- **`technical-artist`**: Engine-specific hard constraints — poly count budgets per asset category, texture memory limits, material slot counts, importer constraints, anything from the performance budgets in `.claude/docs/technical-preferences.md`
If any art preference conflicts with a technical constraint (e.g., art-director wants 4K textures but performance budget requires 2K for mobile), resolve the conflict explicitly — note both the ideal and the constrained standard, and explain the tradeoff. Ambiguity in asset standards is where production costs are born.
Write the approved section to file.
---
## Phase 4: Reference Direction (Section 9)
**Goal**: A curated reference set that is specific about what to take and what to avoid from each source.
**Agent delegation**: Spawn `art-director` via Task with the completed sections 1–8. Ask: "Compile a reference direction for this game. Provide 3–5 reference sources (games, films, art styles, or specific artists). For each: name it, specify exactly what visual element to draw from it (not 'the general aesthetic' — a specific technique, color choice, or compositional rule), and specify what to explicitly avoid or diverge from (to prevent the 'trying to copy X' reading). References should be additive — no two references should be pointing in exactly the same direction."
Write the approved section to file.
---
## Phase 5: Art Director Sign-Off
**Review mode check** — apply before spawning AD-ART-BIBLE:
- `solo` → skip. Note: "AD-ART-BIBLE skipped — Solo mode." Proceed to Phase 6.
- `lean` → skip (not a PHASE-GATE). Note: "AD-ART-BIBLE skipped — Lean mode." Proceed to Phase 6.
- `full` → spawn as normal.
After all sections are complete (or the scoped set from Phase 1 is complete), spawn `creative-director` via Task using gate **AD-ART-BIBLE** (`.claude/docs/director-gates.md`).
Pass: art bible file path, game pillars, visual identity anchor.
Handle verdict per standard rules in `director-gates.md`. Record the verdict in the art bible's status header:
`> **Art Director Sign-Off (AD-ART-BIBLE)**: APPROVED [date] / CONCERNS (accepted) [date] / REVISED [date]`
---
## Phase 6: Close
Before presenting next steps, check project state:
- Does `design/gdd/systems-index.md` exist? → map-systems is done, skip that option
- Does `.claude/docs/technical-preferences.md` contain a configured engine (not `[TO BE CONFIGURED]`)? → setup-engine is done, skip that option
- Does `design/gdd/` contain any `*.md` files? → design-system has been run, skip that option
- Does `design/gdd/gdd-cross-review-*.md` exist? → review-all-gdds is done
- Do GDDs exist (check above)? → include /consistency-check option
Use `AskUserQuestion` for next steps. Only include options that are genuinely next based on the state check above:
**Option pool — include only if not already done:**
- `[_] Run /map-systems — decompose the concept into systems before writing GDDs` (skip if systems-index.md exists)
- `[_] Run /setup-engine — configure the engine (asset standards may need revisiting after engine is set)` (skip if engine configured)
- `[_] Run /design-system — start the first GDD` (skip if any GDDs exist)
- `[_] Run /review-all-gdds — cross-GDD consistency check (required before Technical Setup gate)` (skip if gdd-cross-review-*.md exists)
- `[_] Run /asset-spec — generate per-asset visual specs and AI generation prompts from approved GDDs` (include if GDDs exist)
- `[_] Run /consistency-check — scan existing GDDs against the art bible for visual direction conflicts` (include if GDDs exist)
- `[_] Run /create-architecture — author the master architecture document (next Technical Setup step)`
- `[_] Stop here`
Assign letters A, B, C… only to the options actually included. Mark the most logical pipeline-advancing option as `(recommended)`.
> **Always include** `/create-architecture` and Stop here as options — these are always valid next steps once the art bible is complete.
---
## Collaborative Protocol
Every section follows: **Question → Options → Decision → Draft (from art-director agent) → Approval → Write to file**
- Never draft a section without first spawning the relevant agent(s)
- Write each section to file immediately after approval — do not batch
- Surface all agent disagreements to the user — never silently resolve conflicts between art-director and technical-artist
- The art bible is a constraint document: it restricts future decisions in exchange for visual coherence. Every section should feel like it narrows the solution space productively.

View file

@ -4,7 +4,6 @@ description: "Audits game assets for compliance with naming conventions, file si
argument-hint: "[category|all]"
user-invocable: true
allowed-tools: Read, Glob, Grep
context: fork
# Read-only diagnostic skill — no specialist agent delegation needed
---

View file

@ -0,0 +1,257 @@
---
name: asset-spec
description: "Generate per-asset visual specifications and AI generation prompts from GDDs, level docs, or character profiles. Produces structured spec files and updates the master asset manifest. Run after art bible and GDD/level design are approved, before production begins."
argument-hint: "[system:<name> | level:<name> | character:<name>] [--review full|lean|solo]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit, Task, AskUserQuestion
---
If no argument is provided, check whether `design/assets/asset-manifest.md` exists:
- If it exists: read it, find the first context (system/level/character) with any asset at status "Needed" but no spec file written yet, and use `AskUserQuestion`:
- Prompt: "The next unspecced context is **[target]**. Generate asset specs for it?"
- Options: `[A] Yes — spec [target]` / `[B] Pick a different target` / `[C] Stop here`
- If no manifest: fail with:
> "Usage: `/asset-spec system:<name>` — e.g., `/asset-spec system:tower-defense`
> Or: `/asset-spec level:iron-gate-fortress` / `/asset-spec character:frost-warden`
> Run after your art bible and GDDs are approved."
---
## Phase 0: Parse Arguments
Extract:
- **Target type**: `system`, `level`, or `character`
- **Target name**: the name after the colon (normalize to kebab-case)
- **Review mode**: `--review [full|lean|solo]` if present
**Mode behavior:**
- `full` (default): spawn both `art-director` and `technical-artist` in parallel
- `lean`: spawn `art-director` only — faster, skips technical constraint pass
- `solo`: no agent spawning — main session writes specs from art bible rules alone. Use for simple asset categories or when speed matters more than depth.
---
## Phase 1: Gather Context
Read all source material **before** asking the user anything.
### Required reads:
- **Art bible**: Read `design/art/art-bible.md` — fail if missing:
> "No art bible found. Run `/art-bible` first — asset specs are anchored to the art bible's visual rules and asset standards."
Extract: Visual Identity Statement, Color System (semantic colors), Shape Language, Asset Standards (Section 8 — dimensions, formats, polycount budgets, texture resolution tiers).
- **Technical preferences**: Read `.claude/docs/technical-preferences.md` — extract performance budgets and naming conventions.
### Source doc reads (by target type):
- **system**: Read `design/gdd/[target-name].md`. Extract the **Visual/Audio Requirements** section. If it doesn't exist or reads `[To be designed]`:
> "The Visual/Audio section of `design/gdd/[target-name].md` is empty. Either run `/design-system [target-name]` to complete the GDD, or describe the visual needs manually."
Use `AskUserQuestion`: `[A] Describe needs manually` / `[B] Stop — complete the GDD first`
- **level**: Read `design/levels/[target-name].md`. Extract art requirements, asset list, VFX needs, and the art-director's production concept specs from Step 4.
- **character**: Read `design/narrative/characters/[target-name].md` or search `design/narrative/` for the character profile. Extract visual description, role, and any specified distinguishing features.
### Optional reads:
- **Existing manifest**: Read `design/assets/asset-manifest.md` if it exists — extract already-specced assets for this target to avoid duplicates.
- **Related specs**: Glob `design/assets/specs/*.md` — scan for assets that could be shared (e.g., a common UI element specced for one system might apply here too).
### Present context summary:
> **Asset Spec: [Target Type] — [Target Name]**
> - Source doc: [path] — [N] asset types identified
> - Art bible: found — Asset Standards at Section 8
> - Existing specs for this target: [N already specced / none]
> - Shared assets found in other specs: [list or "none"]
---
## Phase 2: Asset Identification
From the source doc, extract every asset type mentioned — explicit and implied.
**For systems**: look for VFX events, sprite references, UI elements, audio triggers, particle effects, icon needs, and any "visual feedback" language.
**For levels**: look for unique environment props, atmospheric VFX, lighting setups, ambient audio, skybox/background, and any area-specific materials.
**For characters**: look for sprite sheets (idle, walk, attack, death), portrait/avatar, VFX attached to abilities, UI representation (icon, health bar skin).
Group assets into categories:
- **Sprite / 2D Art** — character sprites, UI icons, tile sheets
- **VFX / Particles** — hit effects, ambient particles, screen effects
- **Environment** — props, tiles, backgrounds, skyboxes
- **UI** — HUD elements, menu art, fonts (if custom)
- **Audio** — SFX, music tracks, ambient loops *(note: audio specs are descriptions only — no generation prompts)*
- **3D Assets** — meshes, materials (if applicable per engine)
Present the full identified list to the user. Use `AskUserQuestion`:
- Prompt: "I identified [N] assets across [N] categories for **[target]**. Review before speccing:"
- Show the grouped list in conversation text first
- Options: `[A] Proceed — spec all of these` / `[B] Remove some assets` / `[C] Add assets I didn't catch` / `[D] Adjust categories`
Do NOT proceed to Phase 3 without user confirmation of the asset list.
---
## Phase 3: Spec Generation
Spawn specialist agents based on review mode. **Issue all Task calls simultaneously — do not wait for one before starting the next.**
### Full mode — spawn in parallel:
**`art-director`** via Task:
- Provide: full asset list from Phase 2, art bible Visual Identity Statement, Color System, Shape Language, the source doc's visual requirements, and any reference games/art mentioned in the art bible Section 9
- Ask: "For each asset in this list, produce: (1) a 2–3 sentence visual description anchored to the art bible's shape language and color system — be specific enough that two different artists would produce consistent results; (2) a generation prompt ready for use with AI image tools (Midjourney/Stable Diffusion style — include style keywords, composition, color palette anchors, negative prompts); (3) which art bible rules directly govern this asset (cite by section). For audio assets, describe the sonic character instead of a generation prompt."
**`technical-artist`** via Task:
- Provide: full asset list, art bible Asset Standards (Section 8), technical-preferences.md performance budgets, engine name and version
- Ask: "For each asset in this list, specify: (1) exact dimensions or polycount (match the art bible Asset Standards tiers — do not invent new sizes); (2) file format and export settings; (3) naming convention (from technical-preferences.md); (4) any engine-specific constraints this asset type must respect; (5) LOD requirements if applicable. Flag any asset type where the art bible's preferred standard conflicts with the engine's constraints."
### Lean mode — spawn art-director only (skip technical-artist).
### Solo mode — skip both. Derive specs from art bible rules alone, noting that technical constraints were not validated.
**Collect both responses before Phase 4.** If any conflict exists between art-director and technical-artist (e.g., art-director specifies 4K textures but technical-artist flags the engine budget requires 512px), surface it explicitly — do NOT silently resolve.
---
## Phase 4: Compile and Review
Combine the agent outputs into a draft spec per asset. Present all specs in conversation text using this format:
```
## ASSET-[NNN] — [Asset Name]
| Field | Value |
|-------|-------|
| Category | [Sprite / VFX / Environment / UI / Audio / 3D] |
| Dimensions | [e.g. 256×256px, 4-frame sprite sheet] |
| Format | [PNG / SVG / WAV / etc.] |
| Naming | [e.g. vfx_frost_hit_01.png] |
| Polycount | [if 3D — e.g. <800 tris] |
| Texture Res | [e.g. 512px — matches Art Bible §8 Tier 2] |
**Visual Description:**
[2–3 sentences. Specific enough for two artists to produce consistent results.]
**Art Bible Anchors:**
- §3 Shape Language: [relevant rule applied]
- §4 Color System: [color role — e.g. "uses Threat Blue per semantic color rules"]
**Generation Prompt:**
[Ready-to-use prompt. Include: style keywords, composition notes, color palette anchors, lighting direction, negative prompts.]
**Status:** Needed
```
After presenting all specs, use `AskUserQuestion`:
- Prompt: "Asset specs for **[target]** — [N] assets. Review complete?"
- Options: `[A] Approve all — write to file` / `[B] Revise a specific asset` / `[C] Regenerate with different direction`
If [B]: ask which asset and what to change. Revise inline and re-present. Do NOT re-spawn agents for minor text revisions — only re-spawn if the visual direction itself needs to change.
If [C]: ask what direction to change. Re-spawn the relevant agent with the updated brief.
---
## Phase 5: Write Spec File
After approval, ask: "May I write the spec to `design/assets/specs/[target-name]-assets.md`?"
Write the file with:
```markdown
# Asset Specs — [Target Type]: [Target Name]
> **Source**: [path to source GDD/level/character doc]
> **Art Bible**: design/art/art-bible.md
> **Generated**: [date]
> **Status**: [N] assets specced / [N] approved / [N] in production / [N] done
[all asset specs in ASSET-NNN format]
```
Then update `design/assets/asset-manifest.md`. If it doesn't exist, create it:
```markdown
# Asset Manifest
> Last updated: [date]
## Progress Summary
| Total | Needed | In Progress | Done | Approved |
|-------|--------|-------------|------|----------|
| [N] | [N] | [N] | [N] | [N] |
## Assets by Context
### [Target Type]: [Target Name]
| Asset ID | Name | Category | Status | Spec File |
|----------|------|----------|--------|-----------|
| ASSET-001 | [name] | [category] | Needed | design/assets/specs/[target]-assets.md |
```
If the manifest already exists, append the new context block and update the Progress Summary counts.
Ask: "May I update `design/assets/asset-manifest.md`?"
---
## Phase 6: Close
Use `AskUserQuestion`:
- Prompt: "Asset specs complete for **[target]**. What's next?"
- Options:
- `[A] Spec another system — /asset-spec system:[next-system]`
- `[B] Spec a level — /asset-spec level:[level-name]`
- `[C] Spec a character — /asset-spec character:[character-name]`
- `[D] Run /asset-audit — validate delivered assets against specs`
- `[E] Stop here`
---
## Asset ID Assignment
Asset IDs are assigned sequentially across the entire project — not per-context. Read the manifest before assigning IDs to find the current highest number:
```
Grep pattern="ASSET-" path="design/assets/asset-manifest.md"
```
Start new assets from `ASSET-[highest + 1]`. This ensures IDs are stable and unique across the whole project.
If no manifest exists yet, start from `ASSET-001`.
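The highest-number lookup can be sketched as (a minimal illustration of the grep above — the manifest text is passed in as a string):

```python
import re

def next_asset_id(manifest_text: str) -> str:
    """Return the next project-wide sequential ID (ASSET-NNN, zero-padded).

    Scans the whole manifest, not one context block, so IDs stay unique
    across the project. An empty or missing manifest starts at ASSET-001.
    """
    numbers = [int(n) for n in re.findall(r"ASSET-(\d+)", manifest_text)]
    highest = max(numbers) if numbers else 0
    return f"ASSET-{highest + 1:03d}"
```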
---
## Shared Asset Protocol
Before speccing an asset, check if an equivalent already exists in another context's spec:
- Common UI elements (health bars, score displays) are often shared across systems
- Generic environment props may appear in multiple levels
- Character VFX (hit sparks, death effects) may reuse a base spec with color variants
If a match is found: reference the existing ASSET-ID rather than creating a duplicate. Note the shared usage in the manifest's referenced-by column.
> "ASSET-012 (Generic Hit Spark) already specced for Combat system. Reusing for Tower Defense — adding tower-defense to referenced-by."
---
## Error Recovery Protocol
If any spawned agent returns BLOCKED or cannot complete:
1. Surface immediately: "[AgentName]: BLOCKED — [reason]"
2. In `lean` mode or if `technical-artist` blocks: proceed with art-director output only — note that technical constraints were not validated
3. In `solo` mode or if `art-director` blocks: derive descriptions from art bible rules — flag as "Art director not consulted — verify against art bible before production"
4. Always produce a partial spec — never discard work because one agent blocked
---
## Collaborative Protocol
Every phase follows: **Identify → Confirm → Generate → Review → Approve → Write**
- Never spec assets without first confirming the asset list with the user
- Always anchor specs to the art bible — a spec that contradicts the art bible is wrong
- Surface all agent disagreements — do not silently pick one
- Write the spec file only after explicit approval
- Update the manifest immediately after writing the spec

View file

@ -4,7 +4,6 @@ description: "Analyzes game balance data files, formulas, and configuration to i
argument-hint: "[system-name|path-to-data-file]"
user-invocable: true
allowed-tools: Read, Glob, Grep
context: fork
agent: economy-designer
---

View file

@ -3,15 +3,19 @@ name: brainstorm
description: "Guided game concept ideation — from zero idea to a structured game concept document. Uses professional studio ideation techniques, player psychology frameworks, and structured creative exploration."
argument-hint: "[genre or theme hint, or 'open'] [--review full|lean|solo]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, WebSearch, AskUserQuestion
allowed-tools: Read, Glob, Grep, Write, WebSearch, Task, AskUserQuestion
---
When this skill is invoked:
1. **Parse the argument** for an optional genre/theme hint (e.g., `roguelike`,
`space survival`, `cozy farming`). If `open` or no argument, start from
scratch. Also extract `--review [full|lean|solo]` if present and store as
the review mode override for this run (see `.claude/docs/director-gates.md`).
scratch. Also resolve the review mode (once, store for all gate spawns this run):
1. If `--review [full|lean|solo]` was passed → use that
2. Else read `production/review-mode.txt` → use that value
3. Else → default to `lean`
See `.claude/docs/director-gates.md` for the full check pattern.
2. **Check for existing concept work**:
- Read `design/gdd/game-concept.md` if it exists (resume, don't restart)
@ -102,10 +106,24 @@ For each concept, present:
- **Why It Could Work** (1 sentence on market/audience fit)
- **Biggest Risk** (1 sentence on the hardest unanswered question)
Present all three. Then use `AskUserQuestion` to capture the selection:
- **Use a single-list call — NO tabs, just `prompt` and `options`. Do not use a tabbed form here.**
- **Prompt**: "Which concept resonates with you? You can pick one, combine elements, or ask for fresh directions."
- **Options**: one option per concept (e.g., `Concept 1 — SCAR`), plus `Combine elements across concepts` and `Generate fresh directions`
Present all three. Then use `AskUserQuestion` to capture the selection.
**CRITICAL**: This MUST be a plain list call — no tabs, no form fields. Use exactly this structure:
```
AskUserQuestion(
prompt: "Which concept resonates with you? You can pick one, combine elements, or ask for fresh directions.",
options: [
"Concept 1 — [Title]",
"Concept 2 — [Title]",
"Concept 3 — [Title]",
"Combine elements across concepts",
"Generate fresh directions"
]
)
```
Do NOT use a `tabs` field here. The `tabs` form is for multi-field input only — using it here causes an "Invalid tool parameters" error. This is a plain `prompt` + `options` call.
Never pressure toward a choice — let them sit with it.
@ -168,11 +186,36 @@ Then define **3+ anti-pillars** (what this game is NOT):
be cool if..." features that don't serve the core vision
- Frame as: "We will NOT do [thing] because it would compromise [pillar]"
**After pillars and anti-pillars are agreed, spawn `creative-director` via Task using gate CD-PILLARS (`.claude/docs/director-gates.md`) before moving to Phase 5.**
**Pillar confirmation**: After presenting the full pillar set, use `AskUserQuestion`:
- Prompt: "Do these pillars feel right for your game?"
- Options: `[A] Lock these in` / `[B] Rename or reframe one` / `[C] Swap a pillar out` / `[D] Something else`
Pass: full pillar set with design tests, anti-pillars, core fantasy, unique hook.
If the user selects B, C, or D, make the revision, then use `AskUserQuestion` again:
- Prompt: "Pillars updated. Ready to lock these in?"
- Options: `[A] Lock these in` / `[B] Revise another pillar` / `[C] Something else`
Present the feedback to the user. If CONCERNS or REJECT, offer to revise specific pillars before moving on. If APPROVE, note the approval and continue.
Repeat until the user selects [A] Lock these in.
**Review mode check** — apply before spawning CD-PILLARS and AD-CONCEPT-VISUAL:
- `solo` → skip both. Note: "CD-PILLARS skipped — Solo mode. AD-CONCEPT-VISUAL skipped — Solo mode." Proceed to Phase 5.
- `lean` → skip both (not PHASE-GATEs). Note: "CD-PILLARS skipped — Lean mode. AD-CONCEPT-VISUAL skipped — Lean mode." Proceed to Phase 5.
- `full` → spawn as normal.
**After pillars and anti-pillars are agreed, spawn BOTH `creative-director` AND `art-director` via Task in parallel before moving to Phase 5. Issue both Task calls simultaneously — do not wait for one before starting the other.**
- **`creative-director`** — gate **CD-PILLARS** (`.claude/docs/director-gates.md`)
Pass: full pillar set with design tests, anti-pillars, core fantasy, unique hook.
- **`art-director`** — gate **AD-CONCEPT-VISUAL** (`.claude/docs/director-gates.md`)
Pass: game concept elevator pitch, full pillar set with design tests, target platform (if known), any reference games or visual touchstones the user mentioned.
Collect both verdicts, then present them together using a two-tab `AskUserQuestion`:
- Tab **"Pillars"**: present creative-director feedback. Options mirror the standard CD-PILLARS handling — `Lock in as-is` / `Revise [specific pillar]` / `Discuss further`.
- Tab **"Visual anchor"**: present the art-director's 2-3 named visual direction options. Options: each named direction (one per option) + `Combine elements across directions` + `Describe my own direction`.
The user's selected visual anchor (the named direction or their custom description) is stored as the **Visual Identity Anchor** — it will be written into the game-concept document and becomes the foundation of the art bible.
If the creative-director returns CONCERNS or REJECT on pillars, resolve pillar issues before asking for the visual anchor selection — visual direction should flow from confirmed pillars.
---
@ -211,12 +254,22 @@ Ground the concept in reality:
- **Biggest risks**: Technical risks, design risks, market risks
- **Scope tiers**: What's the full vision vs. what ships if time runs out?
**Review mode check** — apply before spawning TD-FEASIBILITY:
- `solo` → skip. Note: "TD-FEASIBILITY skipped — Solo mode." Proceed directly to scope tier definition.
- `lean` → skip (not a PHASE-GATE). Note: "TD-FEASIBILITY skipped — Lean mode." Proceed directly to scope tier definition.
- `full` → spawn as normal.
**After identifying biggest technical risks, spawn `technical-director` via Task using gate TD-FEASIBILITY (`.claude/docs/director-gates.md`) before scope tiers are defined.**
Pass: core loop description, platform target, engine choice (or "undecided"), list of identified technical risks.
Present the assessment to the user. If HIGH RISK, offer to revisit scope before finalising. If CONCERNS, note them and continue.
**Review mode check** — apply before spawning PR-SCOPE:
- `solo` → skip. Note: "PR-SCOPE skipped — Solo mode." Proceed to document generation.
- `lean` → skip (not a PHASE-GATE). Note: "PR-SCOPE skipped — Lean mode." Proceed to document generation.
- `full` → spawn as normal.
**After scope tiers are defined, spawn `producer` via Task using gate PR-SCOPE (`.claude/docs/director-gates.md`).**
Pass: full vision scope, MVP definition, timeline estimate, team size.
@ -230,35 +283,56 @@ Present the assessment to the user. If UNREALISTIC, offer to adjust the MVP defi
brainstorm conversation, including the MDA analysis, player motivation
profile, and flow state design sections.
5. Ask: "May I write the game concept document to `design/gdd/game-concept.md`?"
**Include a Visual Identity Anchor section** in the game concept document with:
- The selected visual direction name
- The one-line visual rule
- The 2-3 supporting visual principles with their design tests
- The color philosophy summary
If yes, generate the document using the template at `.claude/docs/templates/game-concept.md`, fill in ALL sections from the brainstorm conversation, and write the file, creating directories as needed.
This section is the seed of the art bible — it captures the "everything must
move" decision before it can be forgotten between sessions.
If no:
- If the user already named a section to change, revise it directly — do not ask again which section.
- If the user said no without specifying what to change, use `AskUserQuestion` — "Which section would you like to revise?"
Options: `Elevator Pitch` / `Core Fantasy & Unique Hook` / `Pillars` / `Core Loop` / `MVP Definition` / `Scope Tiers` / `Risks` / `Something else — I'll describe`
5. Use `AskUserQuestion` for write approval:
- Prompt: "Game concept is ready. May I write it to `design/gdd/game-concept.md`?"
- Options: `[A] Yes — write it` / `[B] Not yet — revise a section first`
If [B]: ask which section to revise using `AskUserQuestion` with options: `Elevator Pitch` / `Core Fantasy & Unique Hook` / `Pillars` / `Core Loop` / `MVP Definition` / `Scope Tiers` / `Risks` / `Something else — I'll describe`
After revising, show the updated section as a diff or clear before/after, then use `AskUserQuestion` — "Ready to write the updated concept document?"
Options: `Yes — write it` / `Revise another section`
Repeat until the user approves the write.
Options: `[A] Yes — write it` / `[B] Revise another section`
Repeat until the user selects [A].
If yes, generate the document using the template at `.claude/docs/templates/game-concept.md`, fill in ALL sections from the brainstorm conversation, and write the file, creating directories as needed.
**Scope consistency rule**: The "Estimated Scope" field in the Core Identity table must match the full-vision timeline from the Scope Tiers section — not just say "Large (9+ months)". Write it as "Large (X–Y months, solo)" or "Large (X–Y months, team of N)" so the summary table is accurate.
6. **Suggest next steps** (in this order — this is the professional studio
pre-production pipeline). List ALL steps — do not abbreviate or truncate:
1. "Run `/setup-engine` to configure the engine and populate version-aware reference docs"
2. "Use `/design-review design/gdd/game-concept.md` to validate concept completeness before going downstream"
3. "Discuss vision with the `creative-director` agent for pillar refinement"
4. "Decompose the concept into individual systems with `/map-systems` — maps dependencies, assigns priorities, and creates the systems index"
2. "Run `/art-bible` to create the visual identity specification — do this BEFORE writing GDDs. The art bible gates asset production and shapes technical architecture decisions (rendering, VFX, UI systems)."
3. "Use `/design-review design/gdd/game-concept.md` to validate concept completeness before going downstream"
4. "Discuss vision with the `creative-director` agent for pillar refinement"
5. "Decompose the concept into individual systems with `/map-systems` — maps dependencies, assigns priorities, and creates the systems index"
5. "Author per-system GDDs with `/design-system` — guided, section-by-section GDD writing for each system identified in step 4"
6. "Plan the technical architecture with `/create-architecture` — defines how all systems fit together and connect"
7. "Validate readiness to advance with `/gate-check` — phase gate before committing to production"
8. "Prototype the riskiest system with `/prototype [core-mechanic]` — validate the core loop before full implementation"
9. "Run `/playtest-report` after the prototype to validate the core hypothesis"
10. "If validated, plan the first sprint with `/sprint-plan new`"
6. "Plan the technical architecture with `/create-architecture` — produces the master architecture blueprint and Required ADR list"
7. "Record key architectural decisions with `/architecture-decision (×N)` — write one ADR per decision in the Required ADR list from `/create-architecture`"
8. "Validate readiness to advance with `/gate-check` — phase gate before committing to production"
9. "Prototype the riskiest system with `/prototype [core-mechanic]` — validate the core loop before full implementation"
10. "Run `/playtest-report` after the prototype to validate the core hypothesis"
11. "If validated, plan the first sprint with `/sprint-plan new`"
7. **Output a summary** with the chosen concept's elevator pitch, pillars,
primary player type, engine recommendation, biggest risk, and file path.
Verdict: **COMPLETE** — game concept created and handed off for next steps.
---
## Context Window Awareness
This is a multi-phase skill. If context reaches or exceeds 70% during any phase,
append this notice to the current response before continuing:
> **Context is approaching the limit (≥70%).** The game concept document is saved
> to `design/gdd/game-concept.md`. Open a fresh Claude Code session to continue
> if needed — progress is not lost.
View file
@ -10,8 +10,10 @@ allowed-tools: Read, Glob, Grep, Write
Determine the mode from the argument:
- No `analyze` keyword → **Description Mode**: generate a structured bug report from the provided description
- No keyword → **Description Mode**: generate a structured bug report from the provided description
- `analyze [path]` → **Analyze Mode**: read the target file(s) and identify potential bugs
- `verify [BUG-ID]` → **Verify Mode**: confirm a reported fix actually resolved the bug
- `close [BUG-ID]` → **Close Mode**: mark a verified bug as closed with resolution record
If no argument is provided, ask the user for a bug description before proceeding.
@ -87,6 +89,51 @@ If no argument is provided, ask the user for a bug description before proceeding
---
## Phase 2C: Verify Mode
Read `production/qa/bugs/[BUG-ID].md`. Extract the reproduction steps and expected result.
1. **Re-run reproduction steps** — use Grep/Glob to check whether the root cause code path still exists as described. If the fix removed or changed it, note the change.
2. **Run the related test** — if the bug's system has a test file in `tests/`, run it via Bash and report pass/fail.
3. **Check for regression** — grep the codebase for any new occurrence of the pattern that caused the bug.
Produce a verification verdict:
- **VERIFIED FIXED** — reproduction steps no longer produce the bug; related tests pass
- **STILL PRESENT** — bug reproduces as described; fix did not resolve the issue
- **CANNOT VERIFY** — automated checks inconclusive; manual playtest required
Ask: "May I update `production/qa/bugs/[BUG-ID].md` to set Status: Verified Fixed / Still Present / Cannot Verify?"
If STILL PRESENT: reopen the bug, set Status back to Open, and suggest re-running `/hotfix [BUG-ID]`.
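The way the three checks combine into a single verdict can be pictured as a small decision function. This is an illustrative sketch only; the function name and its inputs are hypothetical, not part of the skill contract.

```python
def verification_verdict(root_cause_present, tests_passed, pattern_regressions):
    """Map the three Verify Mode checks onto one verdict.

    root_cause_present:  True / False / None (None = could not determine)
    tests_passed:        True / False / None (None = no test file found)
    pattern_regressions: count of new occurrences of the offending pattern
    """
    # Any reproduction of the bug, or any regression of the pattern, reopens it.
    if root_cause_present or pattern_regressions > 0:
        return "STILL PRESENT"
    # Inconclusive automated checks require a manual playtest.
    if root_cause_present is None or tests_passed is None:
        return "CANNOT VERIFY"
    return "VERIFIED FIXED" if tests_passed else "STILL PRESENT"
```

Note the asymmetry: a single positive signal is enough to reopen, but VERIFIED FIXED requires all three checks to come back clean.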
---
## Phase 2D: Close Mode
Read `production/qa/bugs/[BUG-ID].md`. Confirm Status is `Verified Fixed` before closing. If status is anything else, stop: "Bug [ID] must be Verified Fixed before it can be closed. Run `/bug-report verify [BUG-ID]` first."
Append a closure record to the bug file:
```markdown
## Closure Record
**Closed**: [date]
**Resolution**: Fixed — [one-line description of what was changed]
**Fix commit / PR**: [if known]
**Verified by**: qa-tester
**Closed by**: [user]
**Regression test**: [test file path, or "Manual verification"]
**Status**: Closed
```
Update the top-level `**Status**: Verified Fixed` field to `**Status**: Closed`.
Ask: "May I update `production/qa/bugs/[BUG-ID].md` to mark it Closed?"
After closing, check `production/qa/bug-triage-*.md` — if the bug appears in an open triage report, note: "Bug [ID] is referenced in the triage report. Run `/bug-triage` to refresh the open bug count."
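The close step is a guard plus a status flip plus an append. A minimal sketch (the function is hypothetical; an implementation would operate on the bug file's text):

```python
def close_bug(report_text, closure_record):
    """Append a closure record and flip Status, enforcing the Verified Fixed rule."""
    if "**Status**: Verified Fixed" not in report_text:
        # Mirrors the stop condition: only verified bugs may be closed.
        raise ValueError("Bug must be Verified Fixed before it can be closed")
    # Flip only the first (top-level) status field.
    updated = report_text.replace(
        "**Status**: Verified Fixed", "**Status**: Closed", 1)
    return updated + "\n" + closure_record
```

The guard makes the "run `/bug-report verify` first" rule structural rather than procedural: an unverified bug simply cannot reach the Closed state.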
---
## Phase 3: Save Report
Present the completed bug report(s) to the user.
@ -101,7 +148,16 @@ If no, stop here. Verdict: **BLOCKED** — user declined write.
## Phase 4: Next Steps
After saving, suggest:
After saving, suggest based on mode:
- Run `/bug-triage` to prioritize this bug alongside existing open bugs.
- If S1 or S2 severity, consider `/hotfix` for an emergency fix workflow.
**After filing (Description/Analyze mode):**
- Run `/bug-triage` to prioritize alongside existing open bugs
- If S1 or S2: run `/hotfix [BUG-ID]` for emergency fix workflow
**After fixing the bug (developer confirms fix is in):**
- Run `/bug-report verify [BUG-ID]` — confirm the fix actually works before closing
- Never mark a bug closed without verification — a fix that doesn't verify is still Open
**After verify returns VERIFIED FIXED:**
- Run `/bug-report close [BUG-ID]` — write the closure record and update status
- Run `/bug-triage` to refresh the open bug count and remove it from the active list
View file
@ -4,7 +4,6 @@ description: "Read all open bugs in production/qa/bugs/, re-evaluate priority vs
argument-hint: "[sprint | full | trend]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit
context: fork
---
# Bug Triage
View file
@ -4,7 +4,6 @@ description: "Performs an architectural and quality code review on a specified f
argument-hint: "[path-to-file-or-directory]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Bash, Task
context: fork
agent: lead-programmer
---
@ -82,9 +81,13 @@ Identify the system category (engine, gameplay, AI, networking, UI, tools) and e
---
## Phase 7: Engine Specialist Review
## Phase 7: Specialist Reviews (Parallel)
If an engine is configured, spawn engine specialists via Task in parallel with the review above. Determine which specialist applies to each file:
Spawn all applicable specialists simultaneously via Task — do not wait for one before starting the next.
### Engine Specialists
If an engine is configured, determine which specialist applies to each file and spawn in parallel:
- Primary language files (`.gd`, `.cs`, `.cpp`) → Language/Code Specialist
- Shader files (`.gdshader`, `.hlsl`, shader graph) → Shader Specialist
@ -93,7 +96,23 @@ If an engine is configured, spawn engine specialists via Task in parallel with t
Also spawn the **Primary Specialist** for any file touching engine architecture (scene structure, node hierarchy, lifecycle hooks).
Collect findings and include them under `### Engine Specialist Findings`.
### QA Testability Review
For Logic and Integration stories, also spawn `qa-tester` via Task in parallel with the engine specialists. Pass:
- The implementation files being reviewed
- The story's `## QA Test Cases` section (the pre-written test specs from qa-lead)
- The story's `## Acceptance Criteria`
Ask the qa-tester to evaluate:
- [ ] Are all test hooks and interfaces exposed (not hidden behind private/internal access)?
- [ ] Do the QA test cases from the story's `## QA Test Cases` section map to testable code paths?
- [ ] Are any acceptance criteria untestable as implemented (e.g., hardcoded values, no seam for injection)?
- [ ] Does the implementation introduce any new edge cases not covered by the existing QA test cases?
- [ ] Are there any observable side effects that should have a test but don't?
For Visual/Feel and UI stories: qa-tester reviews whether the manual verification steps in `## QA Test Cases` are achievable with the implementation as written — e.g., "is the state the manual checker needs to reach actually reachable?"
Collect all specialist findings before producing output.
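The routing above (file extension selects the engine specialist; story type decides whether qa-tester joins) can be sketched as a small dispatch table. Specialist names and the function itself are illustrative, not part of the spec.

```python
def route_specialists(filename, story_type, engine_configured=True):
    """Pick the parallel specialist set for one file under review."""
    ext_map = {
        ".gd": "language-specialist",
        ".cs": "language-specialist",
        ".cpp": "language-specialist",
        ".gdshader": "shader-specialist",
        ".hlsl": "shader-specialist",
    }
    specialists = []
    if engine_configured:
        for ext, name in ext_map.items():
            if filename.endswith(ext):
                specialists.append(name)
    # Logic and Integration stories always add the QA testability review.
    if story_type in ("Logic", "Integration"):
        specialists.append("qa-tester")
    return specialists
```

All names returned for a file are spawned simultaneously via Task; the list is a fan-out set, not a sequence.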
---
@ -105,6 +124,10 @@ Collect findings and include them under `### Engine Specialist Findings`.
### Engine Specialist Findings: [N/A — no engine configured / CLEAN / ISSUES FOUND]
[Findings from engine specialist(s), or "No engine configured." if skipped]
### Testability: [N/A — Visual/Feel or Config story / TESTABLE / GAPS / BLOCKING]
[qa-tester findings: test hooks, coverage gaps, untestable paths, new edge cases]
[If BLOCKING: implementation must expose [X] before tests in ## QA Test Cases can run]
### ADR Compliance: [NO ADRS FOUND / COMPLIANT / DRIFT / VIOLATION]
[List each ADR checked, result, and any deviations with severity]
View file
@ -4,7 +4,6 @@ description: "Scan all GDDs against the entity registry to detect cross-document
argument-hint: "[full | since-last-review | entity:<name> | item:<name>]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit, Bash
context: fork
---
# Consistency Check
View file
@ -4,7 +4,6 @@ description: "Audit GDD-specified content counts against implemented content. Id
argument-hint: "[system-name | --summary | (no arg = full audit)]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write
context: fork
agent: producer
---
View file
@ -3,8 +3,7 @@ name: create-architecture
description: "Guided, section-by-section authoring of the master architecture document for the game. Reads all GDDs, the systems index, existing ADRs, and the engine reference library to produce a complete architecture blueprint before any code is written. Engine-version-aware: flags knowledge gaps and validates decisions against the pinned engine version."
argument-hint: "[focus-area: full | layers | data-flow | api-boundaries | adr-audit] [--review full|lean|solo]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Bash
context: fork
allowed-tools: Read, Glob, Grep, Write, Bash, AskUserQuestion, Task
agent: technical-director
---
@ -17,8 +16,12 @@ It sits between design and implementation, and must exist before sprint planning
**Distinct from `/architecture-decision`**: ADRs record individual point decisions.
This skill creates the whole-system blueprint that gives ADRs their context.
Extract `--review [full|lean|solo]` if present and store as the review mode
override for this run (see `.claude/docs/director-gates.md`).
Resolve the review mode (once, store for all gate spawns this run):
1. If `--review [full|lean|solo]` was passed → use that
2. Else read `production/review-mode.txt` → use that value
3. Else → default to `lean`
See `.claude/docs/director-gates.md` for the full check pattern.
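The three-step resolution can be sketched as a small helper. This is a minimal illustration under stated assumptions (the function name is hypothetical; only the `production/review-mode.txt` path and the `lean` fallback come from the spec):

```python
import os

def resolve_review_mode(flag=None, mode_file="production/review-mode.txt"):
    """Resolve the review mode once per skill run.

    Precedence: --review flag > production/review-mode.txt > 'lean' default.
    """
    valid = {"full", "lean", "solo"}
    if flag in valid:
        return flag                       # 1. explicit --review override wins
    if os.path.isfile(mode_file):
        stored = open(mode_file).read().strip().lower()
        if stored in valid:
            return stored                 # 2. stored project-wide preference
    return "lean"                         # 3. fall back to the lean default
```

An invalid or absent flag falls through to the stored preference, and an invalid or absent stored value falls through to `lean`, so the resolved mode is always one of the three legal values.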
**Argument modes:**
- **No argument / `full`**: Full guided walkthrough — all sections, start to finish
@ -336,6 +339,11 @@ After writing the master architecture document, perform an explicit sign-off bef
Apply gate **TD-ARCHITECTURE** (`.claude/docs/director-gates.md`) as a self-review. Check all four criteria from that gate definition against the completed document.
**Review mode check** — apply before spawning LP-FEASIBILITY:
- `solo` → skip. Note: "LP-FEASIBILITY skipped — Solo mode." Proceed to Phase 8 handoff.
- `lean` → skip (not a PHASE-GATE). Note: "LP-FEASIBILITY skipped — Lean mode." Proceed to Phase 8 handoff.
- `full` → spawn as normal.
**Step 2 — Spawn `lead-programmer` via Task using gate LP-FEASIBILITY (`.claude/docs/director-gates.md`):**
Pass: architecture document path, technical requirements baseline summary, ADR list.
View file
@ -4,7 +4,6 @@ description: "After architecture is complete, produces a flat actionable rules s
argument-hint: "[update — regenerate from current ADRs]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write
context: fork
agent: technical-director
---
View file
@ -3,8 +3,7 @@ name: create-epics
description: "Translate approved GDDs + architecture into epics — one epic per architectural module. Defines scope, governing ADRs, engine risk, and untraced requirements. Does NOT break into stories — run /create-stories [epic-slug] after each epic is created."
argument-hint: "[system-name | layer: foundation|core|feature|presentation | all] [--review full|lean|solo]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write
context: fork
allowed-tools: Read, Glob, Grep, Write, Task, AskUserQuestion
agent: technical-director
---
@ -28,8 +27,12 @@ will have changed.
## 1. Parse Arguments
Extract `--review [full|lean|solo]` if present and store as the review mode
override for this run (see `.claude/docs/director-gates.md`).
Resolve the review mode (once, store for all gate spawns this run):
1. If `--review [full|lean|solo]` was passed → use that
2. Else read `production/review-mode.txt` → use that value
3. Else → default to `lean`
See `.claude/docs/director-gates.md` for the full check pattern.
**Modes:**
- `/create-epics all` — process all systems in layer order
@ -55,14 +58,16 @@ Grep pattern="## Summary" glob="design/gdd/*.md" output_mode="content" -A 5
For `layer:` or `[system-name]` modes: filter to only in-scope GDDs based on
the Summary quick-reference. Skip full-reading anything out of scope.
### Step 2b — Full document load
### Step 2b — Full document load (in-scope systems only)
Using the Step 2a grep results, identify which systems are in scope. Read full documents **only for in-scope systems** — do not read GDDs or ADRs for out-of-scope systems or layers.
Read for in-scope systems:
- `design/gdd/systems-index.md` — authoritative system list, layers, priority
- In-scope GDDs (Approved or Designed status)
- In-scope GDDs only (Approved or Designed status, filtered by Step 2a results)
- `docs/architecture/architecture.md` — module ownership and API boundaries
- All Accepted ADRs — read the "GDD Requirements Addressed", "Decision", and "Engine Compatibility" sections
- Accepted ADRs **whose domains cover in-scope systems only** — read the "GDD Requirements Addressed", "Decision", and "Engine Compatibility" sections; skip ADRs for unrelated domains
- `docs/architecture/control-manifest.md` — manifest version date from header
- `docs/architecture/tr-registry.yaml` — for tracing requirements to ADR coverage
- `docs/engine-reference/[engine]/VERSION.md` — engine name, version, risk levels
@ -117,6 +122,11 @@ Options: "Yes, create it", "Skip", "Pause — I need to write ADRs first"
## 4b. Producer Epic Structure Gate
**Review mode check** — apply before spawning PR-EPIC:
- `solo` → skip. Note: "PR-EPIC skipped — Solo mode." Proceed to Step 5 (write epic files).
- `lean` → skip (not a PHASE-GATE). Note: "PR-EPIC skipped — Lean mode." Proceed to Step 5 (write epic files).
- `full` → spawn as normal.
After all epics for the current layer are defined (Step 4 completed for all in-scope systems), and before writing any files, spawn `producer` via Task using gate **PR-EPIC** (`.claude/docs/director-gates.md`).
Pass: the full epic structure summary (all epics, their scope summaries, governing ADR counts), the layer being processed, milestone timeline and team capacity.
View file
@ -3,8 +3,7 @@ name: create-stories
description: "Break a single epic into implementable story files. Reads the epic, its GDD, governing ADRs, and control manifest. Each story embeds its GDD requirement TR-ID, ADR guidance, acceptance criteria, story type, and test evidence path. Run after /create-epics for each epic."
argument-hint: "[epic-slug | epic-path] [--review full|lean|solo]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write
context: fork
allowed-tools: Read, Glob, Grep, Write, Task, AskUserQuestion
agent: lead-programmer
---
@ -28,7 +27,10 @@ then Core, and so on — matching the dependency order.
## 1. Parse Argument
Extract `--review [full|lean|solo]` if present and store as the review mode
override for this run (see `.claude/docs/director-gates.md`).
override for this run. If not provided, read `production/review-mode.txt`
(default `lean` if missing). This resolved mode applies to all gate spawns
in this skill — apply the check pattern from `.claude/docs/director-gates.md`
before every gate invocation.
- `/create-stories [epic-slug]` — e.g. `/create-stories combat`
- `/create-stories production/epics/combat/EPIC.md` — full path also accepted
@ -47,7 +49,15 @@ Read in full:
- `docs/architecture/control-manifest.md` — extract rules for this epic's layer; note the Manifest Version date from the header
- `docs/architecture/tr-registry.yaml` — load all TR-IDs for this system
Report: "Loaded epic [name], GDD [filename], [N] governing ADRs, control manifest v[date]."
**ADR existence validation**: After reading the governing ADRs list from the epic, confirm each ADR file exists on disk. If any ADR file cannot be found, **stop immediately** before decomposing any story:
> "Epic references [ADR-NNNN: title] but `docs/architecture/[adr-file].md` was not found.
> Check the filename in the epic's Governing ADRs list, or run `/architecture-decision`
> to create it. Cannot create stories until all referenced ADR files are present."
Do not proceed to Step 3 until all referenced ADR files are confirmed present.
Report: "Loaded epic [name], GDD [filename], [N] governing ADRs (all confirmed present), control manifest v[date]."
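The validation is a simple existence sweep with a hard stop. A sketch, assuming the epic's Governing ADRs list has already been resolved to file paths (the function name is illustrative):

```python
import os

def validate_governing_adrs(adr_paths):
    """Stop story creation if any governing ADR file is missing on disk."""
    missing = [p for p in adr_paths if not os.path.isfile(p)]
    if missing:
        # Hard stop: no stories are decomposed while references dangle.
        raise FileNotFoundError(
            "Cannot create stories until all referenced ADR files are present: "
            + ", ".join(missing))
    return len(adr_paths)  # count of confirmed ADRs for the report line
```

Failing before decomposition, rather than during it, keeps a dangling ADR reference from producing stories whose governing guidance cannot be read.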
---
@ -92,11 +102,36 @@ For each story, determine:
## 4b. QA Lead Story Readiness Gate
**Review mode check** — apply before spawning QL-STORY-READY:
- `solo` → skip. Note: "QL-STORY-READY skipped — Solo mode." Proceed to Step 5 (present stories for review).
- `lean` → skip (not a PHASE-GATE). Note: "QL-STORY-READY skipped — Lean mode." Proceed to Step 5 (present stories for review).
- `full` → spawn as normal.
After decomposing all stories (Step 4 complete) but before presenting them for write approval, spawn `qa-lead` via Task using gate **QL-STORY-READY** (`.claude/docs/director-gates.md`).
Pass: the full story list with acceptance criteria, story types, and TR-IDs; the epic's GDD acceptance criteria for reference.
Present the QA lead's assessment. For each story flagged as GAPS or INADEQUATE, revise the acceptance criteria before proceeding — stories with untestable criteria cannot be implemented correctly. Once all stories reach ADEQUATE, proceed to Step 5.
Present the QA lead's assessment. For each story flagged as GAPS or INADEQUATE, revise the acceptance criteria before proceeding — stories with untestable criteria cannot be implemented correctly. Once all stories reach ADEQUATE, proceed.
**After ADEQUATE**: for every Logic and Integration story, ask the qa-lead to produce concrete test case specifications — one per acceptance criterion — in this format:
```
Test: [criterion text]
Given: [precondition]
When: [action]
Then: [expected result / assertion]
Edge cases: [boundary values or failure states to test]
```
For Visual/Feel and UI stories, produce manual verification steps instead:
```
Manual check: [criterion text]
Setup: [how to reach the state]
Verify: [what to look for]
Pass condition: [unambiguous pass description]
```
These test case specs are embedded directly into each story's `## QA Test Cases` section. The developer implements against these cases. The programmer does not write tests from scratch — QA has already defined what "done" looks like.
---
@ -122,7 +157,9 @@ Story 003: [title] — Visual/Feel — ADR-NNNN
[N stories total: N Logic, N Integration, N Visual/Feel, N UI, N Config/Data]
```
Ask: "May I write these [N] stories to `production/epics/[epic-slug]/`?"
Use `AskUserQuestion`:
- Prompt: "May I write these [N] stories to `production/epics/[epic-slug]/`?"
- Options: `[A] Yes — write all [N] stories` / `[B] Not yet — I want to review or adjust first`
---
@ -185,6 +222,27 @@ change meaning. This is what the programmer reads instead of the ADR.]
---
## QA Test Cases
*Written by qa-lead at story creation. The developer implements against these — do not invent new test cases during implementation.*
**[For Logic / Integration stories — automated test specs]:**
- **AC-1**: [criterion text]
- Given: [precondition]
- When: [action]
- Then: [assertion]
- Edge cases: [boundary values / failure states]
**[For Visual/Feel / UI stories — manual verification steps]:**
- **AC-1**: [criterion text]
- Setup: [how to reach the state]
- Verify: [what to look for]
- Pass condition: [unambiguous pass description]
---
## Test Evidence
**Story Type**: [type]
@ -222,18 +280,21 @@ Replace the "Stories: Not yet created" line with a populated table:
## 7. After Writing
Tell the user:
Use `AskUserQuestion` to close with context-aware next steps:
"[N] stories written to `production/epics/[epic-slug]/`.
Check:
- Are there other epics in `production/epics/` without stories yet? List them.
- Is this the last epic? If so, include `/sprint-plan` as an option.
To start implementation:
1. Run `/story-readiness [story-path]` to confirm the first story is ready
2. Run `/dev-story [story-path]` to implement it
3. Run `/code-review [changed files]` after implementation
4. Run `/story-done [story-path]` to close it
Widget:
- Prompt: "[N] stories written to `production/epics/[epic-slug]/`. What next?"
- Options (include all that apply):
- `[A] Start implementing — run /story-readiness [first-story-path]` (Recommended)
- `[B] Create stories for [next-epic-slug] — run /create-stories [slug]` (only if other epics have no stories yet)
- `[C] Plan the sprint — run /sprint-plan` (only if all epics have stories)
- `[D] Stop here for this session`
Work through stories in order — each story's `Depends on:` field tells you
what must be DONE before you can start it."
Note in output: "Work through stories in order — each story's `Depends on:` field tells you what must be DONE before you can start it."
---
View file
@ -0,0 +1,218 @@
---
name: day-one-patch
description: "Prepare a day-one patch for a game launch. Scopes, prioritises, implements, and QA-gates a focused patch addressing known issues discovered after gold master but before or immediately after public launch. Treats the patch as a mini-sprint with its own QA gate and rollback plan."
argument-hint: "[scope: known-bugs | cert-feedback | all]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit, Bash, Task, AskUserQuestion
---
# Day-One Patch
Every shipped game has a day-one patch. Planning it before launch day prevents
chaos. This skill scopes the patch to only what is safe and necessary, gates it
through a lightweight QA pass, and ensures a rollback plan exists before anything
ships. It is a mini-sprint — not a hotfix, not a full sprint.
**When to run:**
- After the gold master build is locked (cert approved or launch candidate tagged)
- When known bugs exist that are too risky to address in the gold master
- When cert feedback requires minor fixes post-submission
- When a pre-launch playtest surfaces must-fix issues after the release gate passed
**Day-one patch scope rules:**
- Only P1/P2 bugs that are SAFE to fix quickly
- No new features — this is fix-only
- No refactoring — minimum viable change
- Any fix that requires more than 4 hours of dev time belongs in patch 1.1, not day-one
**Output:** `production/releases/day-one-patch-[version].md`
---
## Phase 1: Load Release Context
Read:
- `production/stage.txt` — confirm project is in Release stage
- The most recent file in `production/gate-checks/` — read the release gate verdict
- `production/qa/bugs/*.md` — load all bugs with Status: Open or Fixed — Pending Verification
- the most recent sprint in `production/sprints/` — understand what shipped
- the most recent `production/security/security-audit-*.md` — check for any open security items
If `production/stage.txt` is not `Release` or `Polish`:
> "Day-one patch prep is for Release-stage projects. Current stage: [stage]. This skill is not appropriate until you are approaching launch."
---
## Phase 2: Scope the Patch
### Step 2a — Classify open bugs for patch inclusion
For each open bug, evaluate:
| Criterion | Include in day-one? |
|-----------|-------------------|
| S1 or S2 severity | Yes — must include if safe to fix |
| P1 priority | Yes |
| Fix estimated < 4 hours | Yes |
| Fix requires architecture change | No — defer to 1.1 |
| Fix introduces new code paths | No — too risky |
| Fix is data/config only (no code change) | Yes — very low risk |
| Cert feedback requirement | Yes — required for platform approval |
| S3/S4 severity | Only if trivial config fix; otherwise defer |
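The classification table above can be sketched as a single predicate. This is an illustrative sketch only — the field names are hypothetical (bug reports in this workflow are markdown files, not dicts), and where the table's rows overlap (e.g. an S3 bug with a <4-hour fix) it takes the conservative reading: defer unless the bug is high severity/priority, cert-required, or config-only.

```python
def include_in_day_one(bug: dict) -> bool:
    """Return True if a bug is safe to include in the day-one patch.

    Hypothetical schema: severity ("S1".."S4"), priority ("P1".."P3"),
    est_hours (float), plus boolean flags for the risk criteria.
    """
    if bug.get("cert_feedback"):
        return True                       # required for platform approval
    if bug.get("config_only"):
        return True                       # data/config change — very low risk
    if bug.get("needs_architecture_change") or bug.get("new_code_paths"):
        return False                      # too risky — defer to patch 1.1
    if bug["severity"] in ("S1", "S2") or bug["priority"] == "P1":
        return bug["est_hours"] < 4       # include only if the fix is quick
    return False                          # S3/S4 non-trivial fixes defer
```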
### Step 2b — Present patch scope to user
Use `AskUserQuestion`:
- Prompt: "Based on open bugs and cert feedback, here is the proposed day-one patch scope. Does this look right?"
- Show: table of included bugs (ID, severity, description, estimated effort)
- Show: table of deferred bugs (ID, severity, reason deferred)
- Options: `[A] Approve this scope` / `[B] Adjust — I want to add or remove items` / `[C] No day-one patch needed`
If [C]: output "No day-one patch required. Proceed to `/launch-checklist`." Stop.
### Step 2c — Check total scope
Sum estimated effort. If total exceeds 1 day of work:
> "⚠️ Patch scope is [N hours] — this exceeds a safe day-one window. Consider deferring lower-priority items to patch 1.1. A bloated day-one patch introduces more risk than it removes."
Use `AskUserQuestion` to confirm proceeding or reduce scope.
---
## Phase 3: Rollback Plan
Before any code is written, define the rollback procedure. This is non-negotiable.
Spawn `release-manager` via Task. Ask them to produce a rollback plan covering:
- How to revert to the gold master build on each target platform
- Platform-specific rollback constraints (some platforms cannot roll back cert builds)
- Who is responsible for triggering the rollback
- What player communication is required if a rollback occurs
Present the rollback plan. Ask: "May I write this rollback plan to `production/releases/rollback-plan-[version].md`?"
Do not proceed to Phase 4 until the rollback plan is written.
---
## Phase 4: Implement Fixes
For each bug in the approved scope, spawn a focused implementation loop:
1. Spawn `lead-programmer` via Task with:
- The bug report (exact reproduction steps and root cause if known)
- The constraint: minimum viable fix only, no cleanup
- The affected files (from bug report Technical Context section)
2. The lead-programmer implements and runs targeted tests.
3. Spawn `qa-tester` via Task to verify: does the bug reproduce after the fix?
For config/data-only fixes: make the change directly (no programmer agent needed). Confirm the value changed and re-run any relevant smoke test.
---
## Phase 5: Patch QA Gate
This is a lightweight QA pass — not a full `/team-qa`. The patch is already QA-approved from the release gate; we are only re-verifying the changed areas.
Spawn `qa-lead` via Task with:
- List of all changed files
- List of bugs fixed (with verification status from Phase 4)
- The smoke check scope for the affected systems
Ask qa-lead to determine: **Is a targeted smoke check sufficient, or do any fixes touch systems that require a broader regression?**
Run the required QA scope:
- **Targeted smoke check** — run `/smoke-check [affected-systems]`
- **Broader regression** — run targeted tests in `tests/unit/` and `tests/integration/` for affected systems
QA verdict must be PASS or PASS WITH WARNINGS before proceeding. If FAIL: scope the failing fix out of the day-one patch and defer to 1.1.
---
## Phase 6: Generate Patch Record
```markdown
# Day-One Patch: [Game Name] v[version]
**Date prepared**: [date]
**Target release**: [launch date or "day of launch"]
**Base build**: [gold master tag or commit]
**Patch build**: [patch tag or commit]
---
## Patch Notes (Internal)
### Bugs Fixed
| BUG-ID | Severity | Description | Fix summary |
|--------|----------|-------------|-------------|
| BUG-NNN | S[1-4] | [description] | [one-line fix] |
### Deferred to 1.1
| BUG-ID | Severity | Description | Reason deferred |
|--------|----------|-------------|-----------------|
| BUG-NNN | S[1-4] | [description] | [reason] |
---
## QA Sign-Off
**QA scope**: [Targeted smoke / Broader regression]
**Verdict**: [PASS / PASS WITH WARNINGS]
**QA lead**: qa-lead agent
**Date**: [date]
**Warnings (if any)**: [list or "None"]
---
## Rollback Plan
See: `production/releases/rollback-plan-[version].md`
**Trigger condition**: If [N] or more S1 bugs are reported within [X] hours of launch, execute rollback.
**Rollback owner**: [user / producer]
---
## Approvals Required Before Deploy
- [ ] lead-programmer: all fixes reviewed
- [ ] qa-lead: QA gate PASS confirmed
- [ ] producer: deployment timing approved
- [ ] release-manager: platform submission confirmed
---
## Player-Facing Patch Notes
[Draft for community-manager to review before publishing]
[list player-facing changes in plain language]
```
Ask: "May I write this patch record to `production/releases/day-one-patch-[version].md`?"
---
## Phase 7: Next Steps
After the patch record is written:
1. Run `/patch-notes` to generate the player-facing version of the patch notes
2. Run `/bug-report verify [BUG-ID]` for each fixed bug after the patch is live
3. Run `/bug-report close [BUG-ID]` for each verified fix
4. Schedule a post-launch review 48–72 hours after launch using `/retrospective launch`
**If any S1 bugs remain open after the patch:**
> "⚠️ S1 bugs remain open and were not patched. These are accepted risks. Document them in the rollback plan trigger conditions — if they occur at scale, rollback may be preferable to a follow-up patch."
---
## Collaborative Protocol
- **Scope discipline is everything** — resist scope creep; every addition increases risk
- **Rollback plan first, always** — a patch without a rollback plan is irresponsible
- **Deferred is not forgotten** — every deferred bug gets a 1.1 ticket automatically
- **Player communication is part of the patch** — `/patch-notes` is a required output, not optional


@ -1,17 +1,33 @@
---
name: design-review
description: "Reviews a game design document for completeness, internal consistency, implementability, and adherence to project design standards. Run this before handing a design document to programmers."
argument-hint: "[path-to-design-doc] [--depth full|lean|solo]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit, Task, AskUserQuestion
---
## Phase 0: Parse Arguments
Extract `--depth [full|lean|solo]` if present. Default is `full` when no flag is given.
**Note**: `--depth` controls the *analysis depth* of this skill (how many specialist agents are spawned). It is independent of the global review mode in `production/review-mode.txt`, which controls director gate spawning. These are two different concepts — `--depth` is about how thoroughly *this* skill analyses the document.
- **`full`**: Complete review — all phases + specialist agent delegation (Phase 3b)
- **`lean`**: All phases, no specialist agents — faster, single-session analysis
- **`solo`**: Phases 1-4 only, no delegation, no Phase 5 next-step prompt — use when called from within another skill
---
## Phase 1: Load Documents
Read the target design document in full. Read CLAUDE.md to understand project context and standards. Read related design documents referenced or implied by the target doc (check `design/gdd/` for related systems).
**Dependency graph validation:** For every system listed in the Dependencies section, use Glob to check whether its GDD file exists in `design/gdd/`. Flag any that don't exist yet — these are broken references that downstream authors will hit.
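The dependency check above amounts to a filename-existence scan. A minimal sketch, assuming each declared dependency maps to `design/gdd/<name>.md` (the real skill uses the Glob tool rather than direct filesystem access):

```python
from pathlib import Path

def missing_dependencies(declared: list, gdd_dir: str) -> list:
    """Return declared dependency GDDs that do not exist on disk yet —
    these are the broken references to flag for downstream authors."""
    root = Path(gdd_dir)
    return [name for name in declared if not (root / f"{name}.md").exists()]
```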
**Lore/narrative alignment:** If `design/gdd/game-concept.md` or any file in `design/narrative/` exists, read it. Note any mechanical choices in this GDD that contradict established world rules, tone, or design pillars. Pass this context to `game-designer` in Phase 3b.
**Prior review check:** Check whether `design/gdd/reviews/[doc-name]-review-log.md` exists. If it does, read the most recent entry — note what verdict was given and what blocking items were listed. This session is a re-review; track whether prior items were addressed.
---
## Phase 2: Completeness Check
@ -48,42 +64,194 @@ Evaluate against the Design Document Standard checklist:
---
## Phase 3b: Adversarial Specialist Review (full mode only)
**Skip this phase in `lean` or `solo` mode.**
**This phase is MANDATORY in full mode.** Do not skip it.
**Before spawning any agents**, print this notice:
> "Full review: spawning specialist agents in parallel. This typically takes 8–15 minutes. Use `--depth lean` for faster single-session analysis."
### Step 1 — Identify all domains the GDD touches
Read the GDD and identify every domain present. A GDD can touch multiple domains simultaneously — be thorough. Common signals:
| If the GDD contains... | Spawn these agents |
|------------------------|-------------------|
| Costs, prices, drops, rewards, economy | `economy-designer` |
| Combat stats, damage, health, DPS | `game-designer`, `systems-designer` |
| AI behaviour, pathfinding, targeting | `ai-programmer` |
| Level layout, spawning, wave structure | `level-designer` |
| Player progression, XP, unlocks | `economy-designer`, `game-designer` |
| UI, HUD, menus, player-facing displays | `ux-designer`, `ui-programmer` |
| Dialogue, quests, story, lore | `narrative-director` |
| Animation, feel, timing, juice | `gameplay-programmer` |
| Multiplayer, sync, replication | `network-programmer` |
| Audio cues, music triggers | `audio-director` |
| Performance, draw calls, memory | `performance-analyst` |
| Engine-specific patterns or APIs | Primary engine specialist (from `.claude/docs/technical-preferences.md`) |
| Acceptance criteria, test coverage | `qa-lead` |
| Data schema, resource structure | `systems-designer` |
| Any gameplay system | `game-designer` (always) |
**Always spawn `game-designer` and `systems-designer` as a baseline minimum.** Every GDD touches their domain.
### Step 2 — Spawn all relevant specialists in parallel
**CRITICAL: Task in this skill spawns a SUBAGENT — a separate independent Claude session
with its own context window. It is NOT task tracking. Do NOT simulate specialist
perspectives internally. Do NOT reason through domain views yourself. You MUST issue
actual Task calls. A simulated review is not a specialist review.**
Issue all Task calls simultaneously. Do NOT spawn one at a time.
**Prompt each specialist adversarially:**
> "Here is the GDD for [system] and the main review's structural findings so far.
> Your job is NOT to validate this design — your job is to find problems.
> Challenge the design choices from your domain expertise. What is wrong,
> underspecified, likely to cause problems, or missing entirely?
> Be specific and critical. Disagreement with the main review is welcome."
**Additional instructions per agent type:**
- **`game-designer`**: Anchor your review to the Player Fantasy stated in Section B of this GDD. Does this design actually deliver that fantasy? Would a player feel the intended experience? Flag any rules that serve implementability but undermine the stated feeling.
- **`systems-designer`**: For every formula in the GDD, plug in boundary values (minimum and maximum plausible inputs). Report whether any outputs go degenerate — negative values, division by zero, infinity, or nonsensical results at the extremes.
- **`qa-lead`**: Review every acceptance criterion. Flag any that are not independently testable — phrases like "feels balanced", "works correctly", "performs well" are not ACs. Suggest concrete rewrites for any that fail this test.
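The boundary-value instruction given to `systems-designer` above can be done mechanically. A sketch, with an illustrative formula and input ranges (not taken from any real GDD): evaluate every corner of the input space and flag non-finite, negative, or division-by-zero outputs.

```python
import math
from itertools import product

def degenerate_outputs(formula, input_ranges):
    """Evaluate a formula at every corner of the input space and report
    degenerate results: division by zero, non-finite, or negative outputs."""
    findings = []
    for corner in product(*[(lo, hi) for lo, hi in input_ranges]):
        try:
            out = formula(*corner)
        except ZeroDivisionError:
            findings.append((corner, "division by zero"))
            continue
        if not math.isfinite(out):
            findings.append((corner, "non-finite"))
        elif out < 0:
            findings.append((corner, "negative"))
    return findings

# Hypothetical example: damage = base * power / defence, degenerate at defence == 0.
flagged = degenerate_outputs(lambda base, power, defence: base * power / defence,
                             [(1, 100), (1, 10), (0, 50)])
```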
### Step 3 — Senior lead review
After all specialists respond, spawn `creative-director` as the **senior reviewer**:
- Provide: the GDD, all specialist findings, any disagreements between them
- Ask: "Synthesise these findings. What are the most important issues? Do you agree with the specialists? What is your overall verdict on this design?"
- The creative-director's synthesis becomes the **final verdict** in Phase 4.
### Step 4 — Surface disagreements
If specialists disagree with each other or with the creative-director, do NOT silently pick one view. Present the disagreement explicitly in Phase 4 so the user can adjudicate.
Mark every finding with its source: `[game-designer]`, `[economy-designer]`, `[creative-director]` etc.
---
## Phase 4: Output Review
```
## Design Review: [Document Title]
Specialists consulted: [list agents spawned]
Re-review: [Yes — prior verdict was X on YYYY-MM-DD / No — first review]
### Completeness: [X/8 sections present]
[List missing sections]
### Consistency Issues
[List any internal or cross-system contradictions]
### Dependency Graph
[List each declared dependency and whether its GDD file exists on disk]
- ✓ enemy-definition-data.md — exists
- ✗ loot-system.md — NOT FOUND (file does not exist yet)
### Implementability Concerns
[List any vague or unimplementable sections]
### Required Before Implementation
[Numbered list — blocking issues only. Each item tagged with source agent.]
### Balance Concerns
[List any obvious balance risks]
### Recommended Revisions
[Numbered list — important but not blocking. Source-tagged.]
### Specialist Disagreements
[Any cases where agents disagreed with each other or with the main review.
Present both sides — do not silently resolve.]
### Nice-to-Have
[Minor improvements, low priority.]
### Senior Verdict [creative-director]
[Creative director's synthesis and overall assessment.]
### Scope Signal
Estimate implementation scope based on: dependency count, formula count,
systems touched, and whether new ADRs are required.
- **S** — single system, no formulas, no new ADRs, <3 dependencies
- **M** — moderate complexity, 1-2 formulas, 3-6 dependencies
- **L** — multi-system integration, 3+ formulas, may require new ADR
- **XL** — cross-cutting concern, 5+ dependencies, multiple new ADRs likely
Label clearly: "Rough scope signal: M (producer should verify before sprint planning)"
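The S/M/L/XL rubric above can be sketched as a classifier. This is an assumption-laden sketch: the rubric's bands overlap (5 dependencies fits both M and L/XL), so combining dimensions by "largest signal any dimension triggers" is this sketch's interpretation, not something the rubric states.

```python
def scope_signal(deps: int, formulas: int, new_adrs: int) -> str:
    """Rough scope signal from dependency count, formula count, and
    expected new ADRs. Producer should verify before sprint planning."""
    if new_adrs >= 2 or (deps >= 5 and formulas >= 3):
        return "XL"   # cross-cutting concern, multiple new ADRs likely
    if formulas >= 3 or new_adrs == 1:
        return "L"    # multi-system integration, may require a new ADR
    if formulas >= 1 or deps >= 3:
        return "M"    # moderate complexity
    return "S"        # single system, no formulas, no new ADRs
```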
### Verdict: [APPROVED / NEEDS REVISION / MAJOR REVISION NEEDED]
```
This skill is read-only — no files are written during Phase 4.
---
## Phase 5: Next Steps
If the document being reviewed is `game-concept.md` or `game-pillars.md`:
- Check if `design/gdd/systems-index.md` exists. If not, recommend: "Run `/map-systems` to break the concept down into individual systems with dependencies and priorities, then write per-system GDDs."
Use `AskUserQuestion` for ALL closing interactions. Never plain text.
If the document is an individual system GDD:
- If verdict is APPROVED: suggest updating the system's status to 'Approved' in the systems index.
- If verdict is NEEDS REVISION or MAJOR REVISION NEEDED: suggest updating the status to 'In Review'.
**First widget — what to do next:**
Next skill options:
- APPROVED → `/create-epics` or `/map-systems`
- NEEDS REVISION → revise the doc then re-run `/design-review`
If APPROVED (first-pass, no revision needed), proceed directly to the systems-index widget, review-log widget, then the final closing widget. Do not show a separate "what to do" widget — the final closing widget covers next steps.
If NEEDS REVISION or MAJOR REVISION NEEDED, options:
- `[A] Revise the GDD now — address blocking items together`
- `[B] Stop here — revise in a separate session`
- `[C] Accept as-is and move on (only if all items are advisory)`
**If user selects [A] — Revise now:**
Work through all blocking items, asking for design decisions only where you cannot resolve the issue from the GDD and existing docs alone. Group all design-decision questions into a single multi-tab `AskUserQuestion` before making any edits — do not interrupt mid-revision for each blocker individually.
After all revisions are complete, show a summary table (blocker → fix applied) and use `AskUserQuestion` for a **post-revision closing widget**:
- Prompt: "Revisions complete — [N] blockers resolved. What next?"
- Note current context usage: if context is above ~50%, add: "(Recommended: /clear before re-review — this session has used X% context. A full re-review runs 5 agents and needs clean context.)"
- Options:
- `[A] Re-review in a new session — run /design-review [doc-path] after /clear`
- `[B] Accept revisions and mark Approved — update systems index, skip re-review`
- `[C] Move to next system — /design-system [next-system] (#N in design order)`
- `[D] Stop here`
Never end the revision flow with plain text. Always close with this widget.
**Second widget — systems index update (always show this separately):**
Use a second `AskUserQuestion`:
- Prompt: "May I update `design/gdd/systems-index.md` to mark [system] as [In Review / Approved]?"
- Options: `[A] Yes — update it` / `[B] No — leave it as-is`
**Third widget — review log (always offer):**
Use a third `AskUserQuestion`:
- Prompt: "May I append this review summary to `design/gdd/reviews/[doc-name]-review-log.md`? This creates a revision history so future re-reviews can track what changed."
- Options: `[A] Yes — append to review log` / `[B] No — skip`
If yes, append an entry in this format:
```
## Review — [YYYY-MM-DD] — Verdict: [APPROVED / NEEDS REVISION / MAJOR REVISION NEEDED]
Scope signal: [S/M/L/XL]
Specialists: [list]
Blocking items: [count] | Recommended: [count]
Summary: [2-3 sentence summary of key findings from creative-director verdict]
Prior verdict resolved: [Yes / No / First review]
```
---
**Final closing widget — always show after all file writes complete:**
Once the systems-index and review-log widgets are answered, check project state and show one final `AskUserQuestion`:
Before building options, read:
- `design/gdd/systems-index.md` — find any system with Status: In Review or NEEDS REVISION (other than the one just reviewed)
- Count `.md` files in `design/gdd/` (excluding game-concept.md, systems-index.md) to determine if `/review-all-gdds` is worth offering (≥2 GDDs)
- Find the next system with Status: Not Started in design order
Build the option list dynamically — only include options that are genuinely next:
- `[_] Run /design-review [other-gdd-path] — [system name] is still [In Review / NEEDS REVISION]` (include if another GDD needs review)
- `[_] Run /consistency-check — verify this GDD's values don't conflict with existing GDDs` (always include if ≥1 other GDD exists)
- `[_] Run /review-all-gdds — holistic design-theory review across all designed systems` (include if ≥2 GDDs exist)
- `[_] Run /design-system [next-system] — next in design order` (always include, name the actual system)
- `[_] Stop here`
Assign letters A, B, C… only to included options. Mark the most pipeline-advancing option as `(recommended)`.
Never end the skill with plain text after file writes. Always close with this widget.


@ -10,14 +10,24 @@ When this skill is invoked:
## 1. Parse Arguments & Validate
Extract `--review [full|lean|solo]` if present and store as the review mode
override for this run (see `.claude/docs/director-gates.md`).
Resolve the review mode (once, store for all gate spawns this run):
1. If `--review [full|lean|solo]` was passed → use that
2. Else read `production/review-mode.txt` → use that value
3. Else → default to `lean`
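The three-step precedence above (flag > project file > `lean` default) can be sketched as a small resolver. Illustrative only — the real skill reads the file through its tools, and an invalid value in `review-mode.txt` falling back to the default is this sketch's assumption:

```python
import os

VALID_MODES = ("full", "lean", "solo")

def resolve_review_mode(cli_flag=None, path="production/review-mode.txt"):
    """Resolve the review mode once per skill run:
    CLI flag > production/review-mode.txt > 'lean' default."""
    if cli_flag in VALID_MODES:
        return cli_flag
    if os.path.exists(path):
        value = open(path).read().strip()
        if value in VALID_MODES:
            return value
    return "lean"
```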
See `.claude/docs/director-gates.md` for the full check pattern.
A system name or retrofit path is **required**. If missing:
1. Check if `design/gdd/systems-index.md` exists.
2. If it exists: read it, find the highest-priority system with status "Not Started" or equivalent, and use `AskUserQuestion`:
- Prompt: "The next system in your design order is **[system-name]** ([priority] | [layer]). Start designing it?"
- Options: `[A] Yes — design [system-name]` / `[B] Pick a different system` / `[C] Stop here`
- If [A]: proceed with that system name. If [B]: ask which system to design (plain text). If [C]: exit.
3. If no systems index exists, fail with:
> "Usage: `/design-system <system-name>` — e.g., `/design-system movement`
> Or to fill gaps in an existing GDD: `/design-system retrofit design/gdd/[system-name].md`
> No systems index found. Run `/map-systems` first to map your systems and get the design order."
**Detect retrofit mode:**
If the argument starts with `retrofit` or the argument is a file path to an
@ -271,7 +281,12 @@ Use the template structure from `.claude/docs/templates/game-design-document.md`
Ask: "May I create the skeleton file at `design/gdd/[system-name].md`?"
After writing, update `production/session-state/active.md`:
- Use Glob to check if the file exists.
- If it **does not exist**: use the **Write** tool to create it. Never attempt Edit on a file that may not exist.
- If it **already exists**: use the **Edit** tool to update the relevant fields.
File content:
- Task: Designing [system-name] GDD
- Current section: Starting (skeleton created)
- File: design/gdd/[system-name].md
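The Write-if-absent / Edit-if-present rule above can be sketched as an upsert. This sketch uses plain file I/O as a stand-in for the Write and Edit tools, and the returned action label is purely illustrative:

```python
from pathlib import Path

def upsert_session_state(path: str, task: str, section: str, file_ref: str) -> str:
    """Create the session-state file if absent, update it if present.
    Stand-in for the Write/Edit tool distinction the skill mandates."""
    p = Path(path)
    action = "edited" if p.exists() else "created"
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(f"- Task: {task}\n"
                 f"- Current section: {section}\n"
                 f"- File: {file_ref}\n")
    return action
```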
@ -304,10 +319,24 @@ Context -> Questions -> Options -> Decision -> Draft -> Approval ->
5. **Draft**: Write the section content in conversation text for review. Flag any
provisional assumptions about undesigned dependencies.
6. **Approval**: Immediately after the draft — in the SAME response — use
`AskUserQuestion`. **NEVER use plain text. NEVER skip this step.**
- Prompt: "Approve the [Section Name] section?"
- Options: `[A] Approve — write it to file` / `[B] Make changes — describe what to fix` / `[C] Start over`
**The draft and the approval widget MUST appear together in one response.
If the draft appears without the widget, the user is left at a blank prompt
with no path forward — this is a protocol violation.**
7. **Write**: Use the Edit tool to replace the placeholder with the approved content.
**CRITICAL**: Always include the section heading in the `old_string` to ensure
uniqueness — never match `[To be designed]` alone, as multiple sections use the
same placeholder and the Edit tool requires a unique match. Use this pattern:
```
old_string: "## [Section Name]\n\n[To be designed]"
new_string: "## [Section Name]\n\n[approved content]"
```
Confirm the write.
8. **Registry conflict check** (Sections C and D only — Detailed Design and Formulas):
After writing, scan the section content for entity names, item names, formula
@ -321,7 +350,8 @@ Context -> Questions -> Options -> Decision -> Draft -> Approval ->
(will be handled in Phase 5).
After writing each section, update `production/session-state/active.md` with the
completed section name. Use Glob to check if the file exists — use Write to create
it if absent, Edit to update it if present.
### Section-Specific Guidance
@ -333,6 +363,20 @@ Each section has unique design considerations and may benefit from specialist ag
**Goal**: One paragraph a stranger could read and understand.
**Derive recommended options before building the widget**: Read the system's category and layer from the systems index (already in context from Phase 2), then determine the recommended option for each tab:
- **Framing tab**: Foundation/Infrastructure layer → `[A]` recommended. Player-facing categories (Combat, UI, Dialogue, Character, Animation, Visual Effects, Audio) → `[C] Both` recommended.
- **ADR ref tab**: Glob `docs/architecture/adr-*.md` and grep for the system name in the GDD Requirements section of any ADR. If a matching ADR is found → `[A] Yes — cite the ADR` recommended. If none found → `[B] No` recommended.
- **Fantasy tab**: Foundation/Infrastructure layer → `[B] No` recommended. All other categories → `[A] Yes` recommended.
Append `(Recommended)` to the appropriate option text in each tab.
**Framing questions (ask BEFORE drafting)**: Use `AskUserQuestion` with a multi-tab widget:
- Tab "Framing" — "How should the overview frame this system?" Options: `[A] As a data/infrastructure layer (technical framing)` / `[B] Through its player-facing effect (design framing)` / `[C] Both — describe the data layer and its player impact`
- Tab "ADR ref" — "Should the overview reference the existing ADR for this system?" Options: `[A] Yes — cite the ADR for implementation details` / `[B] No — keep the GDD at pure design level`
- Tab "Fantasy" — "Does this system have a player fantasy worth stating?" Options: `[A] Yes — players feel it directly` / `[B] No — pure infrastructure, players feel what it enables`
Use the user's answers to shape the draft. Do NOT answer these questions yourself and auto-draft.
**Questions to ask**:
- What is this system in one sentence?
- How does a player interact with it? (active/passive/automatic)
@ -341,12 +385,32 @@ Each section has unique design considerations and may benefit from specialist ag
**Cross-reference**: Check that the description aligns with how the systems index
describes it. Flag discrepancies.
**Design vs. implementation boundary**: Overview questions must stay at the behavior
level — what the system *does*, not *how it is built*. If implementation questions
arise during the Overview (e.g., "Should this use an Autoload singleton or a signal
bus?"), note them as "→ becomes an ADR" and move on. Implementation patterns belong
in `/architecture-decision`, not the GDD. The GDD describes behavior; the ADR
describes the technical approach used to achieve it.
---
### Section B: Player Fantasy
**Goal**: The emotional target — what the player should *feel*.
**Derive recommended option before building the widget**: Read the system's category and layer from Phase 2 context:
- Player-facing categories (Combat, UI, Dialogue, Character, Animation, Audio, Level/World) → `[A] Direct` recommended
- Foundation/Infrastructure layer → `[B] Indirect` recommended
- Mixed categories (Camera/input, Economy, AI with visible player effects) → `[C] Both` recommended
Append `(Recommended)` to the appropriate option text.
**Framing question (ask BEFORE drafting)**: Use `AskUserQuestion`:
- Prompt: "Is this system something the player engages with directly, or infrastructure they experience indirectly?"
- Options: `[A] Direct — player actively uses or feels this system` / `[B] Indirect — player experiences the effects, not the system` / `[C] Both — has a direct interaction layer and infrastructure beneath it`
Use the answer to frame the Player Fantasy section appropriately. Do NOT assume the answer.
**Questions to ask**:
- What emotion or power fantasy does this serve?
- What reference games nail this feeling? What specifically creates it?
@ -355,6 +419,16 @@ describes it. Flag discrepancies.
**Cross-reference**: Must align with the game pillars. If the system serves a pillar,
quote the relevant pillar text.
**Agent delegation (MANDATORY)**: After the framing answer is given but before drafting,
spawn `creative-director` via Task:
- Provide: system name, framing answer (direct/indirect/both), game pillars, any reference games the user mentioned, the game concept summary
- Ask: "Shape the Player Fantasy for this system. What emotion or power fantasy should it serve? What player moment should we anchor to? What tone and language fits the game's established feeling? Be specific — give me 2-3 candidate framings."
- Collect the creative-director's framings and present them to the user alongside the draft.
**Do NOT draft Section B without first consulting `creative-director`.** The framing
answer tells us *what kind* of fantasy it is; the creative-director shapes *how it's
described* — tone, language, the specific player moment to anchor to.
---
### Section C: Detailed Design (Core Rules, States, Interactions)
@ -375,9 +449,15 @@ This is usually the largest section. Break it into sub-sections:
- What are the decision points the player faces?
- What can the player NOT do? (Constraints are as important as capabilities)
**Agent delegation (MANDATORY)**: Before drafting Section C, spawn specialist agents via Task in parallel:
- Look up the system category in the routing table (Section 6 of this skill)
- Spawn the Primary Agent AND Supporting Agent(s) listed for this category
- Provide each agent: system name, game concept summary, pillar set, dependency GDD excerpts, the specific section being worked on
- Collect their findings before drafting
- Surface any disagreements between agents to the user via `AskUserQuestion`
- Draft only after receiving specialist input
**Do NOT draft Section C without first consulting the appropriate specialists.** A `systems-designer` reviewing rules and mechanics will catch design gaps the main session cannot.
**Cross-reference**: For each interaction listed, verify it matches what the
dependency GDD specifies. If a dependency defines a value or formula and this
@ -414,14 +494,12 @@ table. A formula without defined variables cannot be implemented without guesswo
- Should scaling be linear, logarithmic, or stepped?
- What should the output ranges be at early/mid/late game?
**Agent delegation (MANDATORY)**: Before proposing any formulas or balance values, spawn specialist agents via Task in parallel:
- **Always spawn `systems-designer`**: provide Core Rules from Section C, tuning goals from user, balance context from dependency GDDs. Ask them to propose formulas with variable tables and output ranges.
- **For economy/cost systems, also spawn `economy-designer`**: provide placement costs, upgrade cost intent, and progression goals. Ask them to validate cost curves and ratios.
- Present the specialists' proposals to the user for review via `AskUserQuestion`
- The user decides; the main session writes to file
- **Do NOT invent formula values or balance numbers without specialist input.** A user without balance design expertise cannot evaluate raw numbers — they need the specialists' reasoning.
**Cross-reference**: If a dependency GDD defines a formula whose output feeds into
this system, reference it explicitly. Don't reinvent — connect.
@ -448,9 +526,7 @@ design question, not a specification.
- What happens when two rules apply at the same time?
- What happens if a player finds an unintended interaction? (Identify degenerate strategies)
**Agent delegation (MANDATORY)**: Spawn `systems-designer` via Task before finalising edge cases. Provide: the completed Sections C and D, and ask them to identify edge cases from the formula and rule space that the main session may have missed. For narrative systems, also spawn `narrative-director`. Present their findings and ask the user which to include.
**Cross-reference**: Check edge cases against dependency GDDs. If a dependency
defines a floor, cap, or resolution rule that this system could violate, flag it.
@ -506,6 +582,8 @@ Include at least: one criterion per core rule from Section C, and one per formul
from Section D. Do NOT write "the system works as designed" — every criterion must
be independently verifiable by a QA tester without reading the GDD.
**Agent delegation (MANDATORY)**: Spawn `qa-lead` via Task before finalising acceptance criteria. Provide: the completed GDD sections C, D, E, and ask them to validate that the criteria are independently testable and cover all core rules and formulas. Surface any gaps or untestable criteria to the user.
**Questions to ask**:
- What's the minimum set of tests that prove this works?
- What performance budget does this system get? (frame time, memory)
@ -518,16 +596,30 @@ not just this system in isolation.
### Optional Sections: Visual/Audio, UI Requirements, Open Questions
These sections are included in the template but aren't part of the 8 required
sections. Offer them after the required sections are done:
These sections are included in the template. Visual/Audio is **REQUIRED** for visual system categories — not optional. Determine the requirement level before asking:
**Visual/Audio is REQUIRED (mandatory — do not offer to skip) for these system categories:**
- Combat, damage, health
- UI systems (HUD, menus)
- Animation, character movement
- Visual effects, particles, shaders
- Character systems
- Dialogue, quests, lore
- Level/world systems
For required systems: **spawn `art-director` via Task** before drafting this section. Provide: system name, game concept, game pillars, art bible sections 1–4 if they exist. Ask them to specify: (1) VFX and visual feedback requirements for this system's events, (2) any animation or visual style constraints, (3) which art bible principles most directly apply to this system. Present their output; do NOT leave this section as `[To be designed]` for visual systems.
For **all other system categories** (Foundation/Infrastructure, Economy, AI/pathfinding, Camera/input), offer the optional sections after the required sections:
Use `AskUserQuestion`:
- "The 8 required sections are complete. Do you want to also define Visual/Audio
requirements, UI requirements, or capture open questions?"
- Options: "Yes, all three", "Just open questions", "Skip — I'll add these later"
For **Visual/Audio**: Coordinate with `art-director` and `audio-director` if detail
is needed. Often a brief note suffices at the GDD stage.
For **Visual/Audio** (non-required systems): Coordinate with `art-director` and `audio-director` if detail is needed. Often a brief note suffices at the GDD stage.
> **Asset Spec Flag**: After the Visual/Audio section is written with real content, output this notice:
> "📌 **Asset Spec** — Visual/Audio requirements are defined. After the art bible is approved, run `/asset-spec system:[system-name]` to produce per-asset visual descriptions, dimensions, and generation prompts from this section."
For **UI Requirements**: Coordinate with `ux-designer` for complex UI systems.
After writing this section, check whether it contains real content (not just
@ -562,6 +654,11 @@ the source of truth). Verify:
### 5a-bis: Creative Director Pillar Review
**Review mode check** — apply before spawning CD-GDD-ALIGN:
- `solo` → skip. Note: "CD-GDD-ALIGN skipped — Solo mode." Proceed to Step 5b.
- `lean` → skip (not a PHASE-GATE). Note: "CD-GDD-ALIGN skipped — Lean mode." Proceed to Step 5b.
- `full` → spawn as normal.
Before finalizing the GDD, spawn `creative-director` via Task using gate **CD-GDD-ALIGN** (`.claude/docs/director-gates.md`).
Pass: completed GDD file path, game pillars (from `design/gdd/game-concept.md` or `design/gdd/game-pillars.md`), MDA aesthetics target.
@ -610,11 +707,14 @@ Present a completion summary:
> - Provisional assumptions: [list any assumptions about undesigned dependencies]
> - Cross-system conflicts found: [list or "none"]
Use `AskUserQuestion`:
- "Run `/design-review` now to validate the GDD?"
- Options: "Yes, run review now", "I'll review it myself first", "Skip review"
> **To validate this GDD, open a fresh Claude Code session and run:**
> `/design-review design/gdd/[system-name].md`
>
> **Never run `/design-review` in the same session as `/design-system`.** The reviewing
> agent must be independent of the authoring context. Running it here would inherit
> the full design history, making independent critique impossible.
If yes, invoke the design-review skill on the completed file.
**NEVER offer to run `/design-review` inline.** Always direct the user to a fresh window.
### 5d: Update Systems Index
@ -645,6 +745,7 @@ Update `production/session-state/active.md` with:
Use `AskUserQuestion`:
- "What's next?"
- Options:
- "Run `/consistency-check` — verify this GDD's values don't conflict with existing GDDs (recommended before designing the next system)"
- "Design next system ([next-in-order])" — if undesigned systems remain
- "Fix review findings" — if design-review flagged issues
- "Stop here for this session"
@ -659,15 +760,19 @@ orchestrates the overall flow; agents provide expert content.
| System Category | Primary Agent | Supporting Agent(s) |
|----------------|---------------|---------------------|
| Combat, damage, health | `game-designer` | `systems-designer` (formulas), `ai-programmer` (enemy AI) |
| **Foundation/Infrastructure** (event bus, save/load, scene mgmt, service locator) | `systems-designer` | `gameplay-programmer` (feasibility), `engine-programmer` (engine integration) |
| Combat, damage, health | `game-designer` | `systems-designer` (formulas), `ai-programmer` (enemy AI), `art-director` (hit feedback visual direction, VFX intent) |
| Economy, loot, crafting | `economy-designer` | `systems-designer` (curves), `game-designer` (loops) |
| Progression, XP, skills | `game-designer` | `systems-designer` (curves), `economy-designer` (sinks) |
| Dialogue, quests, lore | `game-designer` | `narrative-director` (story), `writer` (content) |
| UI systems (HUD, menus) | `game-designer` | `ux-designer` (flows), `ui-programmer` (feasibility) |
| Dialogue, quests, lore | `game-designer` | `narrative-director` (story), `writer` (content), `art-director` (character visual profiles, cinematic tone) |
| UI systems (HUD, menus) | `game-designer` | `ux-designer` (flows), `ui-programmer` (feasibility), `art-director` (visual style direction), `technical-artist` (render/shader constraints) |
| Audio systems | `game-designer` | `audio-director` (direction), `sound-designer` (specs) |
| AI, pathfinding, behavior | `game-designer` | `ai-programmer` (implementation), `systems-designer` (scoring) |
| Level/world systems | `game-designer` | `level-designer` (spatial), `world-builder` (lore) |
| Camera, input, controls | `game-designer` | `ux-designer` (feel), `gameplay-programmer` (feasibility) |
| Animation, character movement | `game-designer` | `art-director` (animation style, pose language), `technical-artist` (rig/blend constraints), `gameplay-programmer` (feel) |
| Visual effects, particles, shaders | `game-designer` | `art-director` (VFX visual direction), `technical-artist` (performance budget, shader complexity), `systems-designer` (trigger/state integration) |
| Character systems (stats, archetypes) | `game-designer` | `art-director` (character visual archetype), `narrative-director` (character arc alignment), `systems-designer` (stat formulas) |
**When delegating via Task tool**:
- Provide: system name, game concept summary, dependency GDD excerpts, the specific
@ -715,3 +820,13 @@ This skill follows the collaborative design principle at every step:
**Never** write a section without user approval.
**Never** contradict an existing approved GDD without flagging the conflict.
**Always** show where decisions come from (dependency GDDs, pillars, user choices).
## Context Window Awareness
This is a long-running skill. After writing each section, check if the status line
shows context at or above 70%. If so, append this notice to the response:
> **Context is approaching the limit (≥70%).** Your progress is saved — all approved
> sections are written to `design/gdd/[system-name].md`. When you're ready to continue,
> open a fresh Claude Code session and run `/design-system [system-name]` — it will
> detect which sections are complete and resume from the next one.


@ -3,8 +3,7 @@ name: dev-story
description: "Read a story file and implement it. Loads the full context (story, GDD requirement, ADR guidelines, control manifest), routes to the right programmer agent for the system and engine, implements the code and test, and confirms each acceptance criterion. The core implementation skill — run after /story-readiness, before /code-review and /story-done."
argument-hint: "[story-path]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Bash, Task
context: fork
allowed-tools: Read, Glob, Grep, Write, Bash, Task, AskUserQuestion
---
# Dev Story
@ -15,12 +14,15 @@ drives implementation to completion — including writing the test.
**The loop for every story:**
```
/qa-plan sprint ← define test requirements before sprint begins
/story-readiness [path] ← validate before starting
/dev-story [path] ← implement it (this skill)
/code-review [files] ← review it
/story-done [path] ← verify and close it
```
**After all sprint stories are done:** run `/team-qa sprint` to execute the full QA cycle and get a sign-off verdict before advancing the project stage.
**Output:** Source code + test file in the project's `src/` and `tests/` directories.
---
@ -38,7 +40,17 @@ If not found, ask: "Which story are we implementing?" Glob
## Phase 2: Load Full Context
Read everything in this order — do not start implementation until all is loaded:
**Before loading any context, verify required files exist.** Extract the ADR path from the story's `ADR Governing Implementation` field, then check:
| File | Path | If missing |
|------|------|------------|
| TR registry | `docs/architecture/tr-registry.yaml` | **STOP** — "TR registry not found. Run `/create-epics` to generate it." |
| Governing ADR | path from story's ADR field | **STOP** — "ADR file [path] not found. Run `/architecture-decision` to create it, or correct the filename in the story's ADR field." |
| Control manifest | `docs/architecture/control-manifest.md` | **WARN and continue** — "Control manifest not found — layer rules cannot be checked. Run `/create-control-manifest`." |
If the TR registry or governing ADR is missing, set the story status to **BLOCKED** in the session state and do not spawn any programmer agent.
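The pre-check above reduces to a simple existence pass over the table. A minimal sketch (the ADR filename here is a hypothetical example of what a story's `ADR Governing Implementation` field might resolve to, not a real project path):

```python
from pathlib import Path

# Paths mirror the table above; the ADR filename is an illustrative stand-in
# for whatever the story's ADR field resolves to.
REQUIRED = {
    "TR registry": "docs/architecture/tr-registry.yaml",
    "Governing ADR": "docs/architecture/adr-0007-example.md",
}
OPTIONAL = {
    "Control manifest": "docs/architecture/control-manifest.md",
}

def precheck(root):
    """Return ('OK' or 'BLOCKED', messages). Missing required files block the
    story; a missing control manifest only warns."""
    status, messages = "OK", []
    for label, rel in REQUIRED.items():
        if not (Path(root) / rel).is_file():
            messages.append(f"STOP: {label} not found at {rel}")
            status = "BLOCKED"
    for label, rel in OPTIONAL.items():
        if not (Path(root) / rel).is_file():
            messages.append(f"WARN: {label} not found at {rel}; continuing")
    return status, messages
```

If the result is `BLOCKED`, no programmer agent is spawned and the story status is updated in session state.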
Read all of the following simultaneously — these are independent reads. Do not start implementation until all context is loaded:
### The story file
Extract and hold:
@ -71,9 +83,16 @@ Read `docs/architecture/control-manifest.md`. Extract the rules for this story's
- Performance guardrails
Check: does the story's embedded Manifest Version match the current manifest header date?
If they differ: "Story was written against manifest v[story-date]. Current manifest is
v[current-date]. New rules may apply — reviewing the diff before implementing."
Read the manifest carefully for any new rules added since the story was written.
If they differ, use `AskUserQuestion` before proceeding:
- Prompt: "Story was written against manifest v[story-date]. Current manifest is v[current-date]. New rules may apply. How do you want to proceed?"
- Options:
- `[A] Update story manifest version and implement with current rules (Recommended)`
- `[B] Implement with old rules — I accept the risk of non-compliance`
- `[C] Stop here — I want to review the manifest diff first`
If [A]: edit the story file's `Manifest Version:` field to the current manifest date before spawning the programmer. Then read the manifest carefully for new rules.
If [B]: read the manifest carefully for new rules anyway, and note the version mismatch in the Phase 6 summary under "Deviations".
If [C]: stop. Do not spawn any agent. Let the user review and re-run `/dev-story`.
### Engine reference
Read `.claude/docs/technical-preferences.md`:
@ -89,6 +108,9 @@ Read `.claude/docs/technical-preferences.md`:
Based on the story's **Layer**, **Type**, and **system name**, determine which
specialist to spawn via Task.
**Config/Data stories — skip agent spawning entirely:**
If the story's Type is `Config/Data`, no programmer agent or engine specialist is needed. Jump directly to Phase 4 (Config/Data note). The implementation is a data file edit — no routing table evaluation, no engine specialist.
### Primary agent routing table
| Story context | Primary agent |


@ -3,8 +3,7 @@ name: gate-check
description: "Validate readiness to advance between development phases. Produces a PASS/CONCERNS/FAIL verdict with specific blockers and required artifacts. Use when user says 'are we ready to move to X', 'can we advance to production', 'check if we can start the next phase', 'pass the gate'."
argument-hint: "[target-phase: systems-design | technical-setup | pre-production | production | polish | release] [--review full|lean|solo]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Bash, Write
context: fork
allowed-tools: Read, Glob, Grep, Bash, Write, Task, AskUserQuestion
model: opus
---
@ -37,14 +36,24 @@ The project progresses through these stages:
**Target phase:** `$ARGUMENTS[0]` (blank = auto-detect current stage, then validate next transition)
Also extract `--review [full|lean|solo]` if present. Note: in `solo` mode,
director spawns (CD-PHASE-GATE, TD-PHASE-GATE, PR-PHASE-GATE) are skipped —
gate-check becomes artifact-existence checks only. In `lean` mode, all three
directors still run (phase gates are the purpose of lean mode).
Also resolve the review mode (once, store for all gate spawns this run):
1. If `--review [full|lean|solo]` was passed → use that
2. Else read `production/review-mode.txt` → use that value
3. Else → default to `lean`
Note: in `solo` mode, director spawns (CD-PHASE-GATE, TD-PHASE-GATE, PR-PHASE-GATE, AD-PHASE-GATE) are skipped — gate-check becomes artifact-existence checks only. In `lean` mode, all four directors still run (phase gates are the purpose of lean mode).
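The three-step precedence (flag, then `production/review-mode.txt`, then the `lean` default) can be sketched as a single resolution helper; the function name is illustrative, not part of the skill spec:

```python
from pathlib import Path
from typing import Optional

VALID_MODES = {"full", "lean", "solo"}

def resolve_review_mode(cli_flag: Optional[str],
                        mode_file: str = "production/review-mode.txt") -> str:
    """Resolve once per run: an explicit --review flag wins, then the stored
    mode file, then the 'lean' default. Unknown values fall through."""
    if cli_flag in VALID_MODES:
        return cli_flag
    path = Path(mode_file)
    if path.is_file():
        stored = path.read_text().strip().lower()
        if stored in VALID_MODES:
            return stored
    return "lean"
```

The resolved value is stored once and reused for every gate spawn in the run, so a mid-run edit to the mode file has no effect.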
- **With argument**: `/gate-check production` — validate readiness for that specific phase
- **No argument**: Auto-detect current stage using the same heuristics as
`/project-stage-detect`, then validate the NEXT phase transition
`/project-stage-detect`, then **confirm with the user before running**:
Use `AskUserQuestion`:
- Prompt: "Detected stage: **[current stage]**. Running gate for [Current] → [Next] transition. Is this correct?"
- Options:
- `[A] Yes — run this gate`
- `[B] No — pick a different gate` (if selected, show a second widget listing all gate options: Concept → Systems Design, Systems Design → Technical Setup, Technical Setup → Pre-Production, Pre-Production → Production, Production → Polish, Polish → Release)
Do not skip this confirmation step when no argument is provided.
---
@ -55,11 +64,13 @@ directors still run (phase gates are the purpose of lean mode).
**Required Artifacts:**
- [ ] `design/gdd/game-concept.md` exists and has content
- [ ] Game pillars defined (in concept doc or `design/gdd/game-pillars.md`)
- [ ] Visual Identity Anchor section exists in `design/gdd/game-concept.md` (from brainstorm Phase 4 art-director output)
**Quality Checks:**
- [ ] Game concept has been reviewed (`/design-review` verdict not MAJOR REVISION NEEDED)
- [ ] Core loop is described and understood
- [ ] Target audience is identified
- [ ] Visual Identity Anchor contains a one-line visual rule and at least 2 supporting visual principles
---
@ -85,6 +96,7 @@ directors still run (phase gates are the purpose of lean mode).
**Required Artifacts:**
- [ ] Engine chosen (CLAUDE.md Technology Stack is not `[CHOOSE]`)
- [ ] Technical preferences configured (`.claude/docs/technical-preferences.md` populated)
- [ ] Art bible exists at `design/art/art-bible.md` with at least Sections 1–4 (Visual Identity Foundation)
- [ ] At least 3 Architecture Decision Records in `docs/architecture/` covering
Foundation-layer systems (scene management, event architecture, save/load)
- [ ] Engine reference docs exist in `docs/engine-reference/[engine]/`
@ -110,6 +122,13 @@ directors still run (phase gates are the purpose of lean mode).
- [ ] Architecture traceability matrix has **zero Foundation layer gaps**
(all Foundation requirements must have ADR coverage before Pre-Production)
**ADR Circular Dependency Check**: For all ADRs in `docs/architecture/`, read each ADR's
"ADR Dependencies" / "Depends On" section. Build a dependency graph (ADR-A → ADR-B means
A depends on B). If any cycle is detected (e.g. A→B→A, or A→B→C→A):
- Flag as **FAIL**: "Circular ADR dependency: [ADR-X] → [ADR-Y] → [ADR-X].
Neither can reach Accepted while the cycle exists. Remove one 'Depends On' edge to
break the cycle."
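Detecting the cycle is a standard depth-first search over the "Depends On" graph. A minimal sketch, assuming the dependencies have already been parsed into an adjacency map:

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of ADR ids (first == last),
    or None if the graph is acyclic. deps maps each ADR to the ADRs it
    depends on (A -> B means A depends on B)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}
    stack = []

    def dfs(node):
        color[node] = GRAY
        stack.append(node)
        for dep in deps.get(node, []):
            if color.get(dep, WHITE) == GRAY:
                # Back edge: the cycle is the path from dep to here, closed.
                return stack[stack.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE:
                found = dfs(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(deps):
        if color.get(node, WHITE) == WHITE:
            found = dfs(node)
            if found:
                return found
    return None
```

The returned path (e.g. `["ADR-X", "ADR-Y", "ADR-X"]`) maps directly onto the FAIL message format above.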
**Engine Validation** (read `docs/engine-reference/[engine]/VERSION.md` first):
- [ ] ADRs that touch post-cutoff engine APIs are flagged with Knowledge Risk: HIGH/MEDIUM
- [ ] `/architecture-review` engine audit shows no deprecated API usage
@ -122,6 +141,8 @@ directors still run (phase gates are the purpose of lean mode).
**Required Artifacts:**
- [ ] At least 1 prototype in `prototypes/` with a README
- [ ] First sprint plan exists in `production/sprints/`
- [ ] Art bible is complete (all 9 sections) and AD-ART-BIBLE sign-off verdict is recorded in `design/art/art-bible.md`
- [ ] Character visual profiles exist for key characters referenced in narrative docs
- [ ] All MVP-tier GDDs from systems index are complete
- [ ] Master architecture document exists at `docs/architecture/architecture.md`
- [ ] At least 3 ADRs covering Foundation-layer decisions exist in `docs/architecture/`
@ -174,6 +195,8 @@ directors still run (phase gates are the purpose of lean mode).
- [ ] Test files exist in `tests/unit/` and `tests/integration/` covering Logic and Integration stories
- [ ] All Logic stories from this sprint have corresponding unit test files in `tests/unit/`
- [ ] Smoke check has been run with a PASS or PASS WITH WARNINGS verdict — report exists in `production/qa/`
- [ ] QA plan exists in `production/qa/` (generated by `/qa-plan`) covering this sprint or final production sprint
- [ ] QA sign-off report exists in `production/qa/` (generated by `/team-qa`) with verdict APPROVED or APPROVED WITH CONDITIONS
- [ ] At least 3 distinct playtest sessions documented in `production/playtests/`
- [ ] Playtest reports cover: new player experience, mid-game systems, and difficulty curve
- [ ] Fun hypothesis from Game Concept has been explicitly validated or revised
@ -236,6 +259,14 @@ For each item in the target gate:
- Don't just check existence — verify the file has real content (not just a template header)
- For code checks, verify directory structure and file counts
**Systems Design → Technical Setup gate — cross-GDD review check**:
Use `Glob('design/gdd/gdd-cross-review-*.md')` to find the `/review-all-gdds` report.
If no file matches, mark the "cross-GDD review report exists" artifact as **FAIL** and
surface it prominently: "No `/review-all-gdds` report found in `design/gdd/`. Run
`/review-all-gdds` before advancing to Technical Setup."
If a file is found, read it and check the verdict line: a FAIL verdict means the
cross-GDD consistency check failed and must be resolved before advancing.
### Quality Checks
- For test checks: Run the test suite via `Bash` if a test runner is configured
- For design review checks: `Read` the GDD and check for the 8 required sections
@ -264,17 +295,18 @@ For items that can't be automatically verified, **ask the user**:
## 4b. Director Panel Assessment
Before generating the final verdict, spawn all three directors as **parallel subagents** via Task using the parallel gate protocol from `.claude/docs/director-gates.md`. Issue all three Task calls simultaneously — do not wait for one before starting the next.
Before generating the final verdict, spawn all four directors as **parallel subagents** via Task using the parallel gate protocol from `.claude/docs/director-gates.md`. Issue all four Task calls simultaneously — do not wait for one before starting the next.
**Spawn in parallel:**
1. **`creative-director`** — gate **CD-PHASE-GATE** (`.claude/docs/director-gates.md`)
2. **`technical-director`** — gate **TD-PHASE-GATE** (`.claude/docs/director-gates.md`)
3. **`producer`** — gate **PR-PHASE-GATE** (`.claude/docs/director-gates.md`)
4. **`art-director`** — gate **AD-PHASE-GATE** (`.claude/docs/director-gates.md`)
Pass to each: target phase name, list of artifacts present, and the context fields listed in that gate's definition.
**Collect all three responses, then present the Director Panel summary:**
**Collect all four responses, then present the Director Panel summary:**
```
## Director Panel Assessment
@ -287,12 +319,15 @@ Technical Director: [READY / CONCERNS / NOT READY]
Producer: [READY / CONCERNS / NOT READY]
[feedback]
Art Director: [READY / CONCERNS / NOT READY]
[feedback]
```
**Apply to the verdict:**
- Any director returns NOT READY → verdict is minimum FAIL (user may override with explicit acknowledgement)
- Any director returns CONCERNS → verdict is minimum CONCERNS
- All three READY → eligible for PASS (still subject to artifact and quality checks from Section 3)
- All four READY → eligible for PASS (still subject to artifact and quality checks from Section 3)
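The escalation rule above is a simple severity fold over the four director results. A sketch (the helper name is illustrative):

```python
def panel_verdict(director_results):
    """Map the four director results to a minimum gate verdict.
    director_results: iterable of 'READY' | 'CONCERNS' | 'NOT READY'."""
    results = list(director_results)
    if "NOT READY" in results:
        return "FAIL"      # user may override with explicit acknowledgement
    if "CONCERNS" in results:
        return "CONCERNS"
    return "PASS"          # still subject to artifact and quality checks
```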
---
@ -387,17 +422,53 @@ echo -n "Production" > production/stage.txt
---
## 7. Follow-Up Actions
## 7. Closing Next-Step Widget
After the verdict is presented and any stage.txt update is complete, close with a structured next-step prompt using `AskUserQuestion`.
**Tailor the options to the gate that just ran:**
For **systems-design PASS**:
```
Gate passed. What would you like to do next?
[A] Run /create-architecture — produce your master architecture blueprint and ADR work plan (recommended next step)
[B] Design more GDDs first — return here when all MVP systems are complete
[C] Stop here for this session
```
> **Note for systems-design PASS**: `/create-architecture` is the required next step before writing any ADRs. It produces the master architecture document and a prioritized list of ADRs to write. Running `/architecture-decision` without this step means writing ADRs without a blueprint — skip it at your own risk.
For **technical-setup PASS**:
```
Gate passed. What would you like to do next?
[A] Start Pre-Production — begin prototyping the Vertical Slice
[B] Write more ADRs first — run /architecture-decision [next-system]
[C] Stop here for this session
```
For all other gates, offer the two most logical next steps for that phase plus "Stop here".
---
## 8. Follow-Up Actions
Based on the verdict, suggest specific next steps:
- **No art bible?** → `/art-bible` to create the visual identity specification
- **Art bible exists but no asset specs?** → `/asset-spec system:[name]` to generate per-asset visual specs and generation prompts from approved GDDs
- **No game concept?** → `/brainstorm` to create one
- **No systems index?** → `/map-systems` to decompose the concept into systems
- **Missing design docs?** → `/reverse-document` or delegate to `game-designer`
- **Small design change needed?** → `/quick-design` for changes under ~4 hours (bypasses full GDD pipeline)
- **No UX specs?** → `/ux-design [screen name]` to author specs, or `/team-ui [feature]` for full pipeline
- **UX specs not reviewed?** → `/ux-review [file]` or `/ux-review all` to validate
- **No accessibility requirements doc?** → create `design/accessibility-requirements.md` using the accessibility-requirements template
- **No accessibility requirements doc?** → Use `AskUserQuestion` to offer to create it now:
- Prompt: "The gate requires `design/accessibility-requirements.md`. Shall I create it from the template?"
- Options: `Create it now — I'll choose an accessibility tier`, `I'll create it myself`, `Skip for now`
- If "Create it now": use a second `AskUserQuestion` to ask for the tier:
- Prompt: "Which accessibility tier fits this project?"
- Options: `Basic — remapping + subtitles only (lowest effort)`, `Standard — Basic + colorblind modes + scalable UI`, `Comprehensive — Standard + motor accessibility + full settings menu`, `Exemplary — Comprehensive + external audit + full customization`
- Then write `design/accessibility-requirements.md` using the template at `.claude/docs/templates/accessibility-requirements.md`, filling in the chosen tier. Confirm: "May I write `design/accessibility-requirements.md`?"
- **No interaction pattern library?** → `/ux-design patterns` to initialize it
- **GDDs not cross-reviewed?** → `/review-all-gdds` (run after all MVP GDDs are individually approved)
- **Cross-GDD consistency issues?** → fix flagged GDDs, then re-run `/review-all-gdds`


@ -27,6 +27,29 @@ the artifact globs that indicate completion.
---
## Step 1b: Find Skills Not in the Catalog
After reading the catalog, Glob `.claude/skills/*/SKILL.md` to get the full list
of installed skills. For each file, extract the `name:` field from its frontmatter.
Compare against the `command:` values in the catalog. Any skill whose name does
not appear as a catalog command is an **uncataloged skill** — still usable but not
part of the phase-gated workflow.
Collect these for the output in Step 7 — show them as a footer block:
```
### Also installed (not in workflow)
- `/skill-name` — [description from SKILL.md frontmatter]
- `/skill-name` — [description]
```
Only show this block if at least one uncataloged skill exists. Limit to the 10
most relevant based on the user's current phase (QA skills in production, team
skills in production/polish, etc.).
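The catalog comparison reduces to extracting each `name:` field and diffing against the catalog's `command:` values. A minimal sketch (frontmatter parsing is simplified to a single-line scan rather than a full YAML parse):

```python
import re

def uncataloged_skills(skill_md_texts, catalog_commands):
    """skill_md_texts: list of SKILL.md file contents.
    catalog_commands: collection of 'command:' values from catalog.yaml.
    Returns skill names that are installed but absent from the catalog."""
    installed = []
    for text in skill_md_texts:
        match = re.search(r"^name:\s*(\S+)", text, flags=re.MULTILINE)
        if match:
            installed.append(match.group(1))
    known = set(catalog_commands)
    return [name for name in installed if name not in known]
```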
---
## Step 2: Determine Current Phase
Check in this order:


@ -84,27 +84,71 @@ Use the Task tool to request sign-off in parallel:
- `subagent_type: qa-tester` — Run targeted regression tests on the affected system
- `subagent_type: producer` — Approve deployment timing and communication plan
All three must return APPROVE before proceeding. If any returns CONCERNS or REJECT, do not deploy — surface the issue and resolve it first.
---
## Phase 6: Summary
## Phase 5b: QA Re-Entry Gate
Output a summary with: severity, root cause, fix applied, testing status, and what approvals are still needed before deployment.
After approvals, determine the QA scope required before deploying the hotfix. Spawn `qa-lead` via Task with:
- The hotfix description and affected system
- The regression test results from Phase 5
- A list of all systems that touch the changed files (use Grep to find callers)
Ask qa-lead: **Is a full smoke check sufficient, or does this fix require a targeted team-qa pass?**
Apply the verdict:
- **Smoke check sufficient** — run `/smoke-check` against the hotfix build. If PASS, proceed to Phase 6.
- **Targeted QA pass required** — run `/team-qa [affected-system]` scoped to the changed system only. If QA returns APPROVED or APPROVED WITH CONDITIONS, proceed to Phase 6.
- **Full QA required** — S1 fixes that touch core systems may require a full `/team-qa sprint`. This delays deployment but prevents a bad patch.
Do not skip this gate. A hotfix that breaks something else is worse than the original bug.
---
## Phase 6: Update Bug Status and Deploy
Update the original bug file if one exists:
```markdown
## Fix Record
**Fixed in**: hotfix/[branch-name] — [commit hash or description]
**Fixed date**: [date]
**Status**: Fixed — Pending Verification
```
Set `**Status**: Fixed — Pending Verification` in the bug file header.
Output a deployment summary:
```
## Hotfix Ready to Deploy: [short-name]
**Severity**: [S1/S2]
**Root cause**: [one line]
**Fix**: [one line]
**QA gate**: [Smoke check PASS / Team-QA APPROVED]
**Approvals**: lead-programmer ✓ / qa-tester ✓ / producer ✓
**Rollback plan**: [from Phase 2 record]
Merge to: release branch AND development branch
Next: /bug-report verify [BUG-ID] after deploy to confirm resolution
```
### Rules
- Hotfixes must be the MINIMUM change to fix the issue — no cleanup, no refactoring, no "while we're here" changes
- Hotfixes must be the MINIMUM change to fix the issue — no cleanup, no refactoring
- Every hotfix must have a rollback plan documented before deployment
- Hotfix branches merge to BOTH the release branch AND the development branch
- All hotfixes require a post-incident review within 48 hours
- If the fix is complex enough to need more than 4 hours, escalate to technical-director for a scope decision
- If the fix is complex enough to need more than 4 hours, escalate to `technical-director`
---
## Phase 7: Next Steps
## Phase 7: Post-Deploy Verification
Verdict: **COMPLETE** — hotfix applied and backported.
After deploying, run `/bug-report verify [BUG-ID]` to confirm the fix resolved the issue in the deployed build.
After the fix is approved and merged:
If VERIFIED FIXED: run `/bug-report close [BUG-ID]` to formally close it.
If STILL PRESENT: the hotfix failed — immediately re-open, assess rollback, and escalate.
- Run `/smoke-check` to verify critical paths are intact.
- Run `/code-review` on the hotfix diff before merging to main.
- Schedule a post-incident review within 48 hours.
Schedule a post-incident review within 48 hours using `/retrospective hotfix`.


@ -1,20 +1,31 @@
---
name: localize
description: "Run the localization workflow: extract strings, validate localization readiness, check for hardcoded text, and generate translation-ready string tables."
argument-hint: "[scan|extract|validate|status]"
description: "Full localization pipeline: scan for hardcoded strings, extract and manage string tables, validate translations, generate translator briefings, run cultural/sensitivity review, manage VO localization, test RTL/platform requirements, enforce string freeze, and report coverage."
argument-hint: "[scan|extract|validate|status|brief|cultural-review|vo-pipeline|rtl-check|freeze|qa]"
user-invocable: true
agent: localization-lead
allowed-tools: Read, Glob, Grep, Write, Bash
allowed-tools: Read, Glob, Grep, Write, Bash, Task, AskUserQuestion
---
## Phase 1: Parse Subcommand
# Localization Pipeline
Determine the mode from the argument:
Localization is not just translation — it is the full process of making a game
feel native in every language and region. Poor localization breaks immersion,
confuses players, and blocks platform certification. This skill covers the
complete pipeline from string extraction through cultural review, VO recording,
RTL layout testing, and localization QA sign-off.
- `scan` — Scan for localization issues (hardcoded strings, missing keys)
- `extract` — Extract new strings and generate/update string tables
- `validate` — Validate existing translations for completeness and format
- `status` — Report overall localization status
**Modes:**
- `scan` — Find hardcoded strings and localization anti-patterns (read-only)
- `extract` — Extract strings and generate translation-ready tables
- `validate` — Check translations for completeness, placeholders, and length
- `status` — Coverage matrix across all locales
- `brief` — Generate translator context briefing document for an external team
- `cultural-review` — Flag culturally sensitive content, symbols, colours, idioms
- `vo-pipeline` — Manage voice-over localization: scripts, recording specs, integration
- `rtl-check` — Validate RTL language layout, mirroring, and font support
- `freeze` — Enforce string freeze; lock source strings before translation begins
- `qa` — Run the full localization QA cycle before release
If no subcommand is provided, output usage and stop. Verdict: **FAIL** — missing required subcommand.
@ -24,16 +35,19 @@ If no subcommand is provided, output usage and stop. Verdict: **FAIL** — missi
Search `src/` for hardcoded user-facing strings:
- String literals in UI code not wrapped in a localization function
- String literals in UI code not wrapped in a localization function (`tr()`, `Tr()`, `NSLocalizedString`, `GetText`, etc.)
- Concatenated strings that should be parameterized
- Strings with positional placeholders (`%s`, `%d`) instead of named ones (`{playerName}`)
- Format strings that mix locale-sensitive data (numbers, dates, currencies) without locale-aware formatting
Search for localization anti-patterns:
- Date/time formatting not using locale-aware functions
- Number formatting without locale awareness (`1,000` vs `1.000`)
- Text embedded in images or textures (flag asset files in `assets/`)
- Strings that assume left-to-right text direction (positional layout, string assembly order)
- Gender/plurality assumptions baked into string logic (must use plural forms or gender tokens)
- Hardcoded punctuation (e.g. `"You won!"` — exclamation styles vary by locale)
Report all findings with file paths and line numbers. This mode is read-only — no files are written.
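The scan heuristic can be sketched in a few lines of Python. The wrapper names, the file extensions, and the `src/` layout are assumptions here; adjust them to the project's engine and source languages before relying on the results.

```python
import re
from pathlib import Path

# Wrapper names (tr, Tr, GetText, NSLocalizedString) are assumptions --
# match them to the engine's actual localization API.
LOC_WRAPPER = re.compile(r'\b(?:tr|Tr|GetText|NSLocalizedString)\($')
LITERAL = re.compile(r'"([^"\\]{3,})"')  # skip very short literals

def find_hardcoded(text: str) -> list[tuple[int, str]]:
    """Return (line_number, literal) pairs that look hardcoded."""
    findings = []
    for n, line in enumerate(text.splitlines(), 1):
        for m in LITERAL.finditer(line):
            # Flag the literal unless the text right before it is a
            # localization call like tr( or GetText(
            if not LOC_WRAPPER.search(line[:m.start()]):
                findings.append((n, m.group(1)))
    return findings

def scan_tree(root: str = "src") -> dict[str, list[tuple[int, str]]]:
    """Scan every source file under root; extension set is an assumption."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.suffix in {".gd", ".cs", ".cpp", ".h"}:
            found = find_hardcoded(path.read_text(encoding="utf-8"))
            if found:
                results[str(path)] = found
    return results
```

A regex pass like this produces false positives (log messages, key names), so treat the output as candidates for review, not an authoritative list.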
@ -42,40 +56,50 @@ Report all findings with file paths and line numbers. This mode is read-only —
## Phase 2B: Extract Mode
- Scan all source files for localized string references
- Compare against the existing string table in `assets/data/strings/`
- Generate new entries for strings not yet keyed
- Suggest key names following the convention: `[category].[subcategory].[description]`
- Example: `ui.hud.health_label`, `dialogue.npc.merchant.greeting`, `menu.main.play_button`
- Each new entry must include a `context` field — a translator comment explaining:
- Where it appears (which screen, which scene)
- Maximum character length
- Any placeholder meaning (`{playerName}` = the player's chosen display name)
- Gender/plurality context if applicable
Output a diff of new strings to add to the string table.
Present the diff to the user. Ask: "May I write these new entries to `assets/data/strings/strings-en.json`?"
If yes, write only the diff (new entries), not a full replacement. Verdict: **COMPLETE** — strings extracted and written.
If no, stop here. Verdict: **BLOCKED** — user declined write.
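For illustration, new entries produced by extract mode could take a shape like the following. The exact schema is an assumption; match whatever structure `strings-en.json` already uses.

```json
{
  "ui.hud.health_label": {
    "text": "Health",
    "context": "HUD, top-left stat bar. Max 12 characters. No placeholders."
  },
  "dialogue.npc.merchant.greeting": {
    "text": "Welcome, {playerName}!",
    "context": "Merchant's first line when the shop opens. {playerName} = player's chosen display name. Max 60 characters."
  }
}
```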
---
## Phase 2C: Validate Mode
Read all string table files in `assets/data/strings/`. For each locale, check:
- **Completeness** — key exists in source (en) but no translation for this locale
- **Placeholder mismatches** — source has `{name}` but translation omits it or adds extras
- **String length violations** — translation exceeds the character limit recorded in the source `context` field
- **Plural form count** — locale requires N plural forms; translation provides fewer
- **Orphaned keys** — translation exists but nothing in `src/` references the key
- **Stale translations** — source string changed after translation was written (flag for re-translation)
- **Encoding** — non-ASCII characters present and font atlas supports them (flag if uncertain)
Report validation results grouped by locale and severity. This mode is read-only — no files are written.
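The placeholder-mismatch check reduces to a set comparison, sketched below under the assumption that placeholders use single-brace `{name}` syntax:

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")

def placeholder_mismatches(source: str, translation: str) -> set[str]:
    """Placeholders missing from, or added by, the translation."""
    src = set(PLACEHOLDER.findall(source))
    tgt = set(PLACEHOLDER.findall(translation))
    return (src - tgt) | (tgt - src)
```

Any non-empty result is a validation failure for that key: a missing placeholder silently drops data in-game, and an extra one renders as literal braces.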
---
## Phase 2D: Status Mode
- Count total localizable strings in the source table
- Per locale: count translated, untranslated, stale (source changed since translation)
- Generate a coverage matrix:
```markdown
## Localization Status
Generated: [Date]
String freeze: [Active / Not yet called / Lifted]
| Locale | Total | Translated | Missing | Stale | Coverage |
|--------|-------|-----------|---------|-------|----------|
@ -83,25 +107,334 @@ Generated: [Date]
| [locale] | [N] | [N] | [N] | [N] | [X]% |
### Issues
- [N] hardcoded strings found in source code (run /localize scan)
- [N] strings exceeding character limits
- [N] placeholder mismatches
- [N] orphaned keys
- [N] strings added after freeze was called (freeze violations)
```
This mode is read-only — no files are written.
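The per-locale counts behind the matrix can be computed as a straightforward comparison against the source table. The entry schema is an assumption, and stale detection is omitted here because it needs a stored source-hash per translated entry:

```python
def coverage(source: dict, locale: dict) -> dict:
    """Coverage stats for one locale measured against the source (en) table."""
    total = len(source)
    translated = sum(1 for key in source if key in locale)
    return {
        "total": total,
        "translated": translated,
        "missing": total - translated,
        "coverage_pct": round(100 * translated / total, 1) if total else 0.0,
    }
```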
---
## Phase 2E: Brief Mode
Generate a translator context briefing document. This document is sent to the
external translation team or localisation vendor alongside the string table export.
Read:
- `design/gdd/` — extract game genre, tone, setting, character names
- `assets/data/strings/strings-en.json` — the source string table
- Any existing lore or narrative documents in `design/narrative/`
Generate `production/localization/translator-brief-[locale]-[date].md`:
```markdown
# Translator Brief — [Game Name] — [Locale]
## Game Overview
[2-3 paragraph summary of the game, genre, tone, and audience]
## Tone and Voice
- **Overall tone**: [e.g., "Darkly comic, not slapstick — think Terry Pratchett, not Looney Tunes"]
- **Player address**: [e.g., "Second person, informal. Never formal 'vous' — always 'tu' for French"]
- **Profanity policy**: [e.g., "Mild — PG-13 equivalent. Match intensity to source, do not soften or escalate"]
- **Humour**: [e.g., "Wordplay exists — if a pun cannot translate, invent an equivalent local joke; do not translate literally"]
## Character Glossary
| Name | Role | Personality | Notes |
|------|------|-------------|-------|
| [Name] | [Role] | [Personality] | [Do not translate / transliterate as X] |
## World Glossary
| Term | Meaning | Notes |
|------|---------|-------|
| [Term] | [What it means] | [Keep in English / translate as X] |
## Do Not Translate List
The following must appear verbatim in all locales:
- [Game name]
- [UI terms that match in-engine labels]
- [Brand or trademark names]
## Placeholder Reference
| Placeholder | What it represents | Example |
|-------------|-------------------|---------|
| `{playerName}` | Player's chosen display name | "Shadowblade" |
| `{count}` | Integer quantity | "3" |
## Character Limits
Tight UI fields with hard limits are marked in the string table `context` field.
Where no limit is stated, target ±30% of the English length as a guideline.
## Contact
Direct questions to: [placeholder for user/team contact]
Delivery format: JSON, same schema as strings-en.json
```
Ask: "May I write this translator brief to `production/localization/translator-brief-[locale]-[date].md`?"
---
## Phase 2F: Cultural Review Mode
Spawn `localization-lead` via Task. Ask them to audit the following for cultural sensitivity across the target locales (read from `assets/data/strings/` and `assets/`):
### Content Areas to Review
**Symbols and gestures**
- Thumbs up, OK hand, peace sign — meanings vary by region
- Religious or spiritual symbols in art, UI, or audio
- National flags, map representations, disputed territories
**Colours**
- White (mourning in some Asian cultures), green (political associations in some regions), red (luck vs danger)
- Alert/warning colours that conflict with cultural associations
**Numbers**
- 4 (death in Japanese/Chinese), 13, 666 — flag use in UI (room numbers, item counts, prices)
**Humour and idioms**
- Idioms that translate as offensive in other locales
- Toilet/bodily humour that is inappropriate in some markets (notably Japan, Germany, and the Middle East)
- Dark humour around topics that are culturally sensitive in specific regions
**Violence and content ratings**
- Content that would require ratings changes in DE (Germany), AU (Australia), CN (China), or AE (UAE)
- Blood colour, gore level, drug references — flag all for region-specific asset variants if needed
**Names and representations**
- Character names that are offensive, profane, or carry negative meaning in target locales
- Stereotyped representation of nationalities, religions, or ethnic groups
Present findings as a table:
| Finding | Locale(s) Affected | Severity | Recommended Action |
|---------|--------------------|----------|--------------------|
| [Description] | [Locale] | [BLOCKING / ADVISORY / NOTE] | [Change / Flag for review / Accept] |
BLOCKING = must fix before shipping that locale. ADVISORY = recommend change. NOTE = informational only.
Ask: "May I write this cultural review report to `production/localization/cultural-review-[date].md`?"
---
## Phase 2G: VO Pipeline Mode
Manage the voice-over localization process. Determine the sub-task from the argument:
- `vo-pipeline scan` — identify all dialogue lines that require VO recording
- `vo-pipeline script` — generate recording scripts with director notes
- `vo-pipeline validate` — check that all recorded VO files are present and correctly named
- `vo-pipeline integrate` — verify VO files are correctly referenced in code/assets
### VO Pipeline: Scan
Read `assets/data/strings/` and `design/narrative/`. Identify:
- All dialogue lines (keys matching `dialogue.*`) with source text
- Lines already recorded (audio file exists in `assets/audio/vo/`)
- Lines not yet recorded
Output a recording manifest:
```
## VO Recording Manifest — [Date]
| Key | Character | Source Line | Status |
|-----|-----------|-------------|--------|
| dialogue.npc.merchant.greeting | Merchant | "Welcome, traveller." | Recorded |
| dialogue.npc.merchant.haggle | Merchant | "That's my final offer." | Needs recording |
```
### VO Pipeline: Script
Generate a recording script document for each character, grouped by scene. Include:
- Character name and brief personality note
- Full dialogue line with pronunciation guide for unusual proper nouns
- Emotion/direction note for each line (`[Warm, welcoming]`, `[Annoyed, clipped]`)
- Any lines that are responses in a conversation (provide context: "Player just said X")
Ask: "May I write the VO recording scripts to `production/localization/vo-scripts-[locale]-[date].md`?"
### VO Pipeline: Validate
Glob `assets/audio/vo/[locale]/` for all `.wav`/`.ogg` files. Cross-reference against the VO manifest. Report:
- Missing files (line in script, no audio file)
- Extra files (audio file exists, no matching string key)
- Naming convention violations
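The missing/extra cross-reference is set arithmetic once filenames are mapped back to keys. The key-as-filename convention used below is an assumption; adjust it to the project's actual naming scheme.

```python
from pathlib import Path

def vo_report(keys: set[str], filenames: list[str]) -> dict[str, set[str]]:
    """Cross-reference dialogue keys against recorded VO filenames.

    Assumes each audio file is named after its string key, e.g.
    'dialogue.npc.merchant.greeting.ogg'.
    """
    recorded = {Path(name).stem for name in filenames}  # strip .wav/.ogg
    return {
        "missing": keys - recorded,   # line in script, no audio file
        "extra": recorded - keys,     # audio file with no matching key
    }

# To run against the repo, something like:
#   files = [p.name for p in Path("assets/audio/vo/ar").glob("*.ogg")]
```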
### VO Pipeline: Integrate
Grep `src/` for VO audio references. Verify each referenced path exists in `assets/audio/vo/[locale]/`. Report broken references.
---
## Phase 2H: RTL Check Mode
Right-to-left languages (Arabic, Hebrew, Persian, Urdu) require layout mirroring beyond
just translating text. This mode validates the implementation.
Read `.claude/docs/technical-preferences.md` to determine the engine. Then check:
**Layout mirroring**
- Is RTL layout enabled in the engine? (Godot: `Control.layout_direction`, Unity: `RTL Support` package, Unreal: text direction flags)
- Are all UI containers set to auto-mirror, or are positions hardcoded?
- Do progress bars, health bars, and directional indicators mirror correctly?
**Text rendering**
- Are fonts loaded that support Arabic/Hebrew character sets?
- Is Arabic text rendered with correct ligatures (connected script)?
- Are numbers displayed as Eastern Arabic numerals where required?
**String assembly**
- Are there any string concatenations that assume left-to-right reading order?
- Do `{placeholder}` positions in sentences work correctly when sentence structure is reversed?
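A small illustration of why assembly order breaks and named templates do not. Concatenating fragments bakes English word order into code, while a per-locale template with named placeholders lets each translation reorder freely. The `xx` locale and its template below are hypothetical, not real translations:

```python
# Broken for RTL (and for most locales generally): the order of the
# fragments is fixed in code.
#   "You gave " + count + " coins to " + npc
templates = {
    "en": "{player} gives {count} coins to {npc}.",
    # Hypothetical locale whose grammar puts the recipient first:
    "xx": "{npc} receives {count} coins from {player}.",
}

def render(locale: str, **values: str) -> str:
    """Fill a locale's template; placement is the translator's choice."""
    return templates[locale].format(**values)
```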
**Asset review**
- Are there UI icons with directional arrows or asymmetric designs that need mirrored variants?
- Do any text-in-image assets exist that require RTL versions?
Grep patterns to check:
- Engine-specific RTL flags in scene/prefab files
- Any `HBoxContainer`, `LinearLayout`, `HorizontalBox` nodes — verify layout_direction settings
- String concatenation with `+` near dialogue or UI code
Report findings. Flag BLOCKING issues (content unreadable without fix) vs ADVISORY (cosmetic improvements).
Ask: "May I write this RTL check report to `production/localization/rtl-check-[date].md`?"
---
## Phase 2I: Freeze Mode
String freeze locks the source (English) string table so that translations can proceed
without the source changing under the translators.
### freeze call
Check current freeze status in `production/localization/freeze-status.md` (if it exists).
If already frozen:
> "String freeze is currently ACTIVE (called [date]). [N] strings have been added or modified since freeze. These are freeze violations — they require re-translation or an approved freeze lift."
If not frozen, present the pre-freeze checklist:
```
Pre-Freeze Checklist
[ ] All planned UI screens are implemented
[ ] All dialogue lines are final (no further narrative revisions planned)
[ ] All system strings (error messages, tutorial text) are complete
[ ] /localize scan shows zero hardcoded strings
[ ] /localize validate shows no placeholder mismatches in source (en)
[ ] Marketing strings (store description, achievements) are final
```
Use `AskUserQuestion`:
- Prompt: "Are all items above confirmed? Calling string freeze locks the source table."
- Options: `[A] Yes — call string freeze now` / `[B] No — I still have strings to add`
If [A]: Write `production/localization/freeze-status.md`:
```markdown
# String Freeze Status
**Status**: ACTIVE
**Called**: [date]
**Called by**: [user]
**Total strings at freeze**: [N]
## Post-Freeze Changes
[Any strings added or modified after freeze are listed here automatically by /localize extract]
```
### freeze lift
If argument includes `lift`: update `freeze-status.md` Status to `LIFTED`, record the reason and date. Warn: "Lifting the freeze requires re-translation of all modified strings. Notify the translation team."
### freeze check (auto-integrated into extract)
When `extract` mode finds new or modified strings and `freeze-status.md` shows Status: ACTIVE — append the new keys to `## Post-Freeze Changes` and warn:
> "⚠️ String freeze is active. [N] new/modified strings have been added. These are freeze violations. Notify your localization vendor before proceeding."
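One way to make the "added or modified since freeze" check concrete is to snapshot a short hash of every source string when freeze is called, then diff against it on each extract run. This sketch assumes each table entry carries a `text` field:

```python
import hashlib

def string_hashes(table: dict) -> dict[str, str]:
    """Hash each source string so later runs can detect post-freeze edits."""
    return {
        key: hashlib.sha256(entry["text"].encode("utf-8")).hexdigest()[:12]
        for key, entry in table.items()
    }

def freeze_violations(frozen: dict[str, str], current: dict) -> list[str]:
    """Keys that are new, or whose source text changed, since the freeze."""
    now = string_hashes(current)
    return sorted(k for k in now if k not in frozen or frozen[k] != now[k])
```

The frozen hash map would be written into `freeze-status.md` (or alongside it) at freeze time; the exact storage location is left to the skill.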
---
## Phase 2J: QA Mode
Localization QA is a dedicated pass that runs after translations are delivered but
before any locale ships. This is not the same as `/validate` (which checks completeness)
— this is a structured playthrough-based quality check.
Spawn `localization-lead` via Task with:
- The target locale(s) to QA
- The list of all screens/flows in the game (from `design/gdd/` or `/content-audit` output)
- The current `/localize validate` report
- The cultural review report (if it exists)
Ask the localization-lead to produce a QA plan covering:
1. **Functional string check** — every string displays in-game without truncation, placeholder errors, or encoding corruption
2. **UI overflow check** — translated strings that exceed UI bounds (even if within character limits, some languages expand)
3. **Contextual accuracy** — a sample of 10% of strings reviewed in-game for translation accuracy and natural phrasing
4. **Cultural review items** — verify all BLOCKING items from the cultural review are resolved
5. **VO sync check** — if VO exists, verify lip sync or subtitle timing is acceptable after translation
6. **Platform cert requirements** — check platform-specific localization requirements (age ratings text, legal notices, ESRB/PEGI/CERO text)
Output a QA verdict per locale:
```
## Localization QA Verdict — [Locale]
**Status**: PASS / PASS WITH CONDITIONS / FAIL
**Reviewed by**: localization-lead
**Date**: [date]
### Findings
| ID | Area | Description | Severity | Status |
|----|------|-------------|----------|--------|
| LOC-001 | UI Overflow | "Settings" button text overflows on [Screen] | BLOCKING | Open |
| LOC-002 | Translation | [Key] translation is literal — sounds unnatural | ADVISORY | Open |
### Conditions (if PASS WITH CONDITIONS)
- [Condition 1 — must resolve before ship]
### Sign-Off
[ ] All BLOCKING findings resolved
[ ] Producer approves shipping [Locale]
```
Ask: "May I write this localization QA report to `production/localization/loc-qa-[locale]-[date].md`?"
**Gate integration**: The Polish → Release gate requires a PASS or PASS WITH CONDITIONS verdict for every locale being shipped. A FAIL blocks release for that locale only — other locales may still proceed if their QA passes.
---
## Phase 3: Rules and Next Steps
### Rules
- English (en) is always the source locale
- Every string table entry must include a `context` field with translator notes, character limits, and placeholder meaning
- Never modify translation files directly — generate diffs for review
- Character limits must be defined per-UI-element and enforced in validate mode
- String freeze must be called before sending strings to translators — never translate a moving target
- RTL support must be designed in from the start — retrofitting RTL layout is expensive
- Cultural review is required for any locale where the game will be sold commercially
- VO scripts must include director notes — raw dialogue lines produce flat recordings
### Recommended Workflow
```
/localize scan → find hardcoded strings
/localize extract → build string table
/localize freeze → lock source before sending to translators
/localize brief → generate translator briefing document
[Send to translators]
/localize validate → check returned translations
/localize cultural-review → flag culturally sensitive content
/localize rtl-check → if shipping Arabic / Hebrew / Persian
/localize vo-pipeline → if shipping dubbed VO
/localize qa → full localization QA pass
```
After `qa` returns PASS for all shipping locales, include the QA report path when running `/gate-check release`.


@ -3,12 +3,12 @@ name: map-systems
description: "Decompose a game concept into individual systems, map dependencies, prioritize design order, and create the systems index."
argument-hint: "[next | system-name] [--review full|lean|solo]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit, AskUserQuestion, TodoWrite, Task
---
When this skill is invoked:
## Parse Arguments
Two modes:
@ -17,12 +17,16 @@ Two modes:
- **`next`**: `/map-systems next` — Pick the highest-priority undesigned system
from the index and hand off to `/design-system` (Phase 6).
Also resolve the review mode (once, store for all gate spawns this run):
1. If `--review [full|lean|solo]` was passed → use that
2. Else read `production/review-mode.txt` → use that value
3. Else → default to `lean`
See `.claude/docs/director-gates.md` for the full check pattern.
---
## Phase 1: Read Concept (Required Context)
Read the game concept and any existing design work. This provides the raw material
for systems decomposition.
@ -48,7 +52,7 @@ for systems decomposition.
---
## Phase 2: Systems Enumeration (Collaborative)
Extract and identify all systems the game needs. This is the creative core of the
skill — it requires human judgment because concept docs rarely enumerate every
@ -101,7 +105,7 @@ Iterate until the user approves the enumeration.
---
## Phase 3: Dependency Mapping (Collaborative)
For each system, determine what it depends on. A system "depends on" another if
it cannot function without that other system existing first.
@ -140,6 +144,11 @@ Show the dependency map as a layered list. Highlight:
Use `AskUserQuestion` to ask: "Does this dependency ordering look right? Any
dependencies I'm missing or that should be removed?"
**Review mode check** — apply before spawning TD-SYSTEM-BOUNDARY:
- `solo` → skip. Note: "TD-SYSTEM-BOUNDARY skipped — Solo mode." Proceed to priority assignment.
- `lean` → skip (not a PHASE-GATE). Note: "TD-SYSTEM-BOUNDARY skipped — Lean mode." Proceed to priority assignment.
- `full` → spawn as normal.
**After dependency mapping is approved, spawn `technical-director` via Task using gate TD-SYSTEM-BOUNDARY (`.claude/docs/director-gates.md`) before proceeding to priority assignment.**
Pass: the dependency map summary, layer assignments, bottleneck systems list, any circular dependency resolutions.
@ -148,7 +157,7 @@ Present the assessment. If REJECT, revise the system boundaries with the user be
---
## Phase 4: Priority Assignment (Collaborative)
Assign each system to a priority tier based on what milestone it's needed for.
@ -172,6 +181,18 @@ Which systems should be higher or lower priority?"
Explain reasoning in conversation: "I placed [system] in MVP because the core loop
requires it — without [system], the 30-second loop can't function."
**"Why" column guidance**: When explaining why each system was placed in a priority tier, mix technical necessity with player-experience reasoning. Do not use purely technical justifications like "Combat needs damage math" — connect to player experience where relevant. Examples of good "Why" entries:
- "Required for the core loop — without it, placement decisions have no consequence (Pillar 2: Placement is the Puzzle)"
- "Ballista's punch-through identity is established here — this stat definition is what makes it feel different from Archer"
- "Foundation for all economy decisions — players must understand upgrade costs to make meaningful placement choices"
Pure technical necessity ("X depends on Y") is insufficient alone when the system directly shapes player experience.
**Review mode check** — apply before spawning PR-SCOPE:
- `solo` → skip. Note: "PR-SCOPE skipped — Solo mode." Proceed to writing the systems index.
- `lean` → skip (not a PHASE-GATE). Note: "PR-SCOPE skipped — Lean mode." Proceed to writing the systems index.
- `full` → spawn as normal.
**After priorities are approved, spawn `producer` via Task using gate PR-SCOPE (`.claude/docs/director-gates.md`) before writing the index.**
Pass: total system count per milestone tier, estimated implementation volume per tier (system count × average complexity), team size, stated project timeline.
@ -191,7 +212,7 @@ This is the order the team should write GDDs in.
---
## Phase 5: Create Systems Index (Write)
### Step 5a: Draft the Document
@ -215,6 +236,11 @@ Ask: "May I write the systems index to `design/gdd/systems-index.md`?"
Wait for approval. Write the file only after "yes."
**Review mode check** — apply before spawning CD-SYSTEMS:
- `solo` → skip. Note: "CD-SYSTEMS skipped — Solo mode." Proceed to Phase 7 next steps.
- `lean` → skip (not a PHASE-GATE). Note: "CD-SYSTEMS skipped — Lean mode." Proceed to Phase 7 next steps.
- `full` → spawn as normal.
**After the systems index is written, spawn `creative-director` via Task using gate CD-SYSTEMS (`.claude/docs/director-gates.md`).**
Pass: systems index path, game pillars and core fantasy (from `design/gdd/game-concept.md`), MVP priority tier system list.
@ -234,7 +260,7 @@ If the user declined: **Verdict: BLOCKED** — user did not approve the write.
---
## Phase 6: Design Individual Systems (Handoff to /design-system)
This phase is entered when:
- The user says "yes" to designing systems after creating the index
@ -280,16 +306,20 @@ If continuing, return to Step 6a.
---
## Phase 7: Suggest Next Steps
After the systems index is created (or after designing some systems), present next actions using `AskUserQuestion`:
- "Systems index is written. What would you like to do next?"
- [A] Start designing GDDs — run `/design-system [first-system-in-order]`
- [B] Ask a director to review the index first — ask `creative-director` or `technical-director` to validate the system set before committing to 10+ GDD sessions
- [C] Stop here for this session
**The director review option ([B]) is worth highlighting**: having a Creative Director or Technical Director review the completed systems index before starting GDD authoring catches scope issues, missing systems, and boundary problems before they're locked in across many documents. It is optional but recommended for new projects.
After any individual GDD is completed:
- "Run `/design-review design/gdd/[system].md` in a fresh session to validate quality"
- "Run `/gate-check systems-design` when all MVP GDDs are complete"
---
@ -314,3 +344,11 @@ This skill follows the collaborative design principle at every phase:
**Never** auto-generate the full systems list and write it without review.
**Never** start designing a system without user confirmation.
**Always** show the enumeration, dependencies, and priorities for user validation.
## Context Window Awareness
If context reaches or exceeds 70% at any point, append this notice:
> **Context is approaching the limit (≥70%).** The systems index is saved to
> `design/gdd/systems-index.md`. Open a fresh Claude Code session to continue
> designing individual GDDs — run `/map-systems next` to pick up where you left off.


@ -3,13 +3,17 @@ name: milestone-review
description: "Generates a comprehensive milestone progress review including feature completeness, quality metrics, risk assessment, and go/no-go recommendation. Use at milestone checkpoints or when evaluating readiness for a milestone deadline."
argument-hint: "[milestone-name|current] [--review full|lean|solo]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Task, AskUserQuestion
---
## Phase 0: Parse Arguments
Extract the milestone name (`current` or a specific name) and resolve the review mode (once, store for all gate spawns this run):
1. If `--review [full|lean|solo]` was passed → use that
2. Else read `production/review-mode.txt` → use that value
3. Else → default to `lean`
See `.claude/docs/director-gates.md` for the full check pattern.
---
@ -104,6 +108,11 @@ Read all sprint reports for sprints within this milestone from `production/sprin
## Phase 3b: Producer Risk Assessment
**Review mode check** — apply before spawning PR-MILESTONE:
- `solo` → skip. Note: "PR-MILESTONE skipped — Solo mode." Present the Go/No-Go section without a producer verdict.
- `lean` → skip (not a PHASE-GATE). Note: "PR-MILESTONE skipped — Lean mode." Present the Go/No-Go section without a producer verdict.
- `full` → spawn as normal.
Before generating the Go/No-Go recommendation, spawn `producer` via Task using gate **PR-MILESTONE** (`.claude/docs/director-gates.md`).
Pass: milestone name and target date, current completion percentage, blocked story count, velocity data from sprint reports (if available), list of cut candidates.

View file

@ -3,13 +3,17 @@ name: playtest-report
description: "Generates a structured playtest report template or analyzes existing playtest notes into a structured format. Use this to standardize playtest feedback collection and analysis."
argument-hint: "[new|analyze path-to-notes] [--review full|lean|solo]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Task, AskUserQuestion
---
## Phase 1: Parse Arguments
Resolve the review mode (once, store for all gate spawns this run):
1. If `--review [full|lean|solo]` was passed → use that
2. Else read `production/review-mode.txt` → use that value
3. Else → default to `lean`
See `.claude/docs/director-gates.md` for the full check pattern.
Determine the mode:
@ -112,6 +116,11 @@ Present the categorized list, then route:
## Phase 3b: Creative Director Player Experience Review
**Review mode check** — apply before spawning CD-PLAYTEST:
- `solo` → skip. Note: "CD-PLAYTEST skipped — Solo mode." Proceed to Phase 4 (save the report).
- `lean` → skip (not a PHASE-GATE). Note: "CD-PLAYTEST skipped — Lean mode." Proceed to Phase 4 (save the report).
- `full` → spawn as normal.
After categorising findings, spawn `creative-director` via Task using gate **CD-PLAYTEST** (`.claude/docs/director-gates.md`).
Pass: the structured report content, game pillars and core fantasy (from `design/gdd/game-concept.md`), the specific hypothesis being tested.


@ -4,7 +4,6 @@ description: "Automatically analyze project state, detect stage, identify gaps,
argument-hint: "[optional: role filter like 'programmer' or 'designer']"
user-invocable: true
allowed-tools: Read, Glob, Grep, Bash, Write
model: haiku
# Read-only diagnostic skill — no specialist agent delegation needed
---


@ -4,7 +4,6 @@ description: "When a GDD is revised, scans all ADRs and the traceability index t
argument-hint: "[path/to/changed-gdd.md]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Bash
agent: technical-director
---


@ -4,15 +4,18 @@ description: "Rapid prototyping workflow. Skips normal standards to quickly vali
argument-hint: "[concept-description] [--review full|lean|solo]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit, Bash, Task
agent: prototyper
isolation: worktree
---
## Phase 1: Define the Question
Resolve the review mode (once, store for all gate spawns this run):
1. If `--review [full|lean|solo]` was passed → use that
2. Else read `production/review-mode.txt` → use that value
3. Else → default to `lean`
See `.claude/docs/director-gates.md` for the full check pattern.
Read the concept description from the argument. Identify the core question this prototype must answer. If the concept is vague, state the question explicitly before proceeding — a prototype without a clear question wastes time.
@ -113,6 +116,11 @@ If yes, write the file.
## Phase 6: Creative Director Review
**Review mode check** — apply before spawning CD-PLAYTEST:
- `solo` → skip. Note: "CD-PLAYTEST skipped — Solo mode." Proceed to Phase 7 summary with the prototyper's recommendation as the final verdict.
- `lean` → skip (not a PHASE-GATE). Note: "CD-PLAYTEST skipped — Lean mode." Proceed to Phase 7 summary with the prototyper's recommendation as the final verdict.
- `full` → spawn as normal.
Spawn `creative-director` via Task using gate **CD-PLAYTEST** (`.claude/docs/director-gates.md`).
Pass: the full REPORT.md content, the original design question, game pillars and core fantasy from `design/gdd/game-concept.md` (if it exists).

View file

@ -3,8 +3,7 @@ name: qa-plan
description: "Generate a QA test plan for a sprint or feature. Reads GDDs and story files, classifies stories by test type (Logic/Integration/Visual/UI), and produces a structured test plan covering automated tests required, manual test cases, smoke test scope, and playtest sign-off requirements. Run before sprint begins or when starting a major feature."
argument-hint: "[sprint | feature: system-name | story: path]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write
context: fork
allowed-tools: Read, Glob, Grep, Write, AskUserQuestion
agent: qa-lead
---

View file

@ -4,7 +4,6 @@ description: "Lightweight design spec for small changes — tuning adjustments,
argument-hint: "[brief description of the change]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit
context: fork
---
# Quick Design
@ -55,8 +54,10 @@ Before drafting anything, read the relevant context:
- Search `design/gdd/` for the GDD most relevant to this change. Read the
sections that this change would affect.
- Read `design/gdd/systems-index.md` to understand where this system sits in
the dependency graph and what tier it belongs to.
- Check whether `design/gdd/systems-index.md` exists. If it does, read it to
understand where this system sits in the dependency graph and what tier it
belongs to. If it does not exist, note "No systems index found — skipping
dependency tier check." and continue.
- Check `design/quick-specs/` for any prior quick specs that touched this
system — avoid contradicting them.
- If this is a Tuning change, also check `assets/data/` for the data file that

View file

@ -4,7 +4,6 @@ description: "Map test coverage to GDD critical paths, identify fixed bugs witho
argument-hint: "[update | audit | report]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit
context: fork
---
# Regression Suite

View file

@ -4,7 +4,6 @@ description: "Generate design or architecture documents from existing implementa
argument-hint: "<type> <path> (e.g., 'design src/gameplay/combat' or 'architecture src/core')"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit, Bash
context: fork
# Read-only diagnostic skill — no specialist agent delegation needed
---

View file

@ -3,9 +3,7 @@ name: review-all-gdds
description: "Holistic cross-GDD consistency and game design review. Reads all system GDDs simultaneously and checks for contradictions between them, stale references, ownership conflicts, formula incompatibilities, and game design theory violations (dominant strategies, economic imbalance, cognitive overload, pillar drift). Run after all MVP GDDs are written, before architecture begins."
argument-hint: "[focus: full | consistency | design-theory | since-last-review]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Bash
context: fork
agent: game-designer
allowed-tools: Read, Glob, Grep, Write, Bash, AskUserQuestion, Task
model: opus
---
@ -546,16 +544,16 @@ FAIL: One or more blocking issues must be resolved before architecture begins.
## Phase 6: Write Report and Flag GDDs
Ask: "May I write this review to `design/gdd/gdd-cross-review-[date].md`?"
Use `AskUserQuestion` for write permission:
- Prompt: "May I write this review to `design/gdd/gdd-cross-review-[date].md`?"
- Options: `[A] Yes — write the report` / `[B] No — skip`
If any GDDs are flagged for revision:
Ask: "Should I update the systems index to mark these GDDs as needing revision?"
- If yes: for each flagged GDD, update its Status field in systems-index.md
to "Needs Revision" with a short note in the adjacent Notes/Description column.
If any GDDs are flagged for revision, use a second `AskUserQuestion`:
- Prompt: "Should I update the systems index to mark these GDDs as needing revision? ([list of flagged GDDs])"
- Options: `[A] Yes — update systems index` / `[B] No — leave as-is`
- If yes: update each flagged GDD's Status field in systems-index.md to "Needs Revision".
(Do NOT append parentheticals to the status value — other skills match "Needs Revision"
as an exact string and parentheticals break that match.)
Ask approval before writing.
### Session State Update
@ -577,18 +575,27 @@ Confirm in conversation: "Session state updated."
## Phase 7: Handoff
After the report is written:
After all file writes are complete, use `AskUserQuestion` for a closing widget.
- **If FAIL**: "Resolve the blocking issues in the flagged GDDs, then re-run
`/review-all-gdds` to confirm they're cleared before starting architecture."
- **If CONCERNS**: "Warnings are present but not blocking. You may proceed to
`/create-architecture` and resolve warnings in parallel, or resolve them now
for a cleaner baseline."
- **If PASS**: "GDDs are internally consistent. Run `/create-architecture` to
begin translating the design into an engine-aware technical blueprint."
Before building options, check project state:
- Are there any Warning-level items that are simple edits (flagged with "30-second edit", "brief addition", or similar)? → offer inline quick-fix option
- Are any GDDs in the "Flagged for Revision" table? → offer /design-review option for each
- Read systems-index.md for the next system with Status: Not Started → offer /design-system option
- Is the verdict PASS or CONCERNS? → offer /gate-check or /create-architecture
Gate reminder: `/gate-check technical-setup` now requires a PASS or CONCERNS
verdict from this review before architecture work can begin.
Build the option list dynamically — only include options that apply:
**Option pool:**
- `[_] Apply quick fix: [W-XX description] in [gdd-name].md — [effort estimate]` (one option per simple-edit warning; only for Warning-level, not Blocking)
- `[_] Run /design-review [flagged-gdd-path] — address flagged warnings` (one per flagged GDD, if any)
- `[_] Run /design-system [next-system] — next in design order` (always include, name the actual system)
- `[_] Run /create-architecture — begin architecture (verdict is PASS/CONCERNS)` (include if verdict is not FAIL)
- `[_] Run /gate-check — validate Systems Design phase gate` (include if verdict is PASS)
- `[_] Stop here`
Assign letters A, B, C… only to included options. Mark the most pipeline-advancing option as `(recommended)`.
Never end the skill with plain text. Always close with this widget.
---

View file

@ -4,7 +4,6 @@ description: "Analyze a feature or sprint for scope creep by comparing current s
argument-hint: "[feature-name or sprint-N]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Bash
context: fork
model: haiku
---

View file

@ -0,0 +1,244 @@
---
name: security-audit
description: "Audit the game for security vulnerabilities: save tampering, cheat vectors, network exploits, data exposure, and input validation gaps. Produces a prioritised security report with remediation guidance. Run before any public release or multiplayer launch."
argument-hint: "[full | network | save | input | quick]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Bash, Write, Task
agent: security-engineer
---
# Security Audit
Security is not optional for any shipped game. Even single-player games have
save tampering vectors. Multiplayer games have cheat surfaces, data exposure
risks, and denial-of-service potential. This skill systematically audits the
codebase for the most common game security failures and produces a prioritised
remediation plan.
**Run this skill:**
- Before any public release (required for the Polish → Release gate)
- Before enabling any online/multiplayer feature
- After implementing any system that reads from disk or network
- When a security-related bug is reported
**Output:** `production/security/security-audit-[date].md`
---
## Phase 1: Parse Arguments and Scope
**Modes:**
- `full` — all categories (recommended before release)
- `network` — network/multiplayer only
- `save` — save file and serialization only
- `input` — input validation and injection only
- `quick` — high-severity checks only (fastest, for iterative use)
- No argument — run `full`
Read `.claude/docs/technical-preferences.md` to determine:
- Engine and language (affects which patterns to search for)
- Target platforms (affects which attack surfaces apply)
- Whether multiplayer/networking is in scope
---
## Phase 2: Spawn Security Engineer
Spawn `security-engineer` via Task. Pass:
- The audit scope/mode
- Engine and language from technical preferences
- A manifest of all source directories: `src/`, `assets/data/`, any config files
The security-engineer runs the audit across 6 categories (see Phase 3). Collect their full findings before proceeding.
---
## Phase 3: Audit Categories
The security-engineer evaluates each of the following. Skip categories not applicable to the project scope.
### Category 1: Save File and Serialization Security
- Are save files validated before loading? (no blind deserialization)
- Are save file paths constructed from user input? (path traversal risk)
- Are save files checksummed or signed? (tamper detection)
- Does the game trust numeric values from save files without bounds checking?
- Are there any eval() or dynamic code execution calls near save loading?
Grep patterns: `File.open`, `load`, `deserialize`, `JSON.parse`, `from_json`, `read_file` — check each for validation.
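The pass above amounts to one recursive search; the patterns are illustrative and every hit needs a human look at the call site rather than automated judgement. A minimal self-contained demo, using a throwaway tree with one suspicious call site (the file name and contents are invented):

```shell
# Build a disposable tree containing one blind-deserialization call, then surface it.
demo="$(mktemp -d)"
printf 'var save = JSON.parse(raw_bytes)\n' > "$demo/save_loader.js"
grep -rnE 'deserialize|JSON\.parse|from_json|read_file' "$demo"
```

Each line of grep output is a candidate finding, not a confirmed vulnerability; check whether validation happens before or after the call.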
### Category 2: Network and Multiplayer Security (skip if single-player only)
- Is game state authoritative on the server, or does the client dictate outcomes?
- Are incoming network packets validated for size, type, and value range?
- Are player positions and state changes validated server-side?
- Is there rate limiting on any network calls?
- Are authentication tokens handled correctly (never sent in plaintext)?
- Does the game expose any debug endpoints in release builds?
Grep for: `recv`, `receive`, `PacketPeer`, `socket`, `NetworkedMultiplayerPeer`, `rpc`, `rpc_id` — check each call site for validation.
### Category 3: Input Validation
- Are any player-supplied strings used in file paths? (path traversal)
- Are any player-supplied strings logged without sanitization? (log injection)
- Are numeric inputs (e.g., item quantities, character stats) bounds-checked before use?
- Are achievement/stat values checked before being written to any backend?
Grep for: `get_input`, `Input.get_`, `input_map`, user-facing text fields — check validation.
### Category 4: Data Exposure
- Are any API keys, credentials, or secrets hardcoded in `src/` or `assets/`?
- Are debug symbols or verbose error messages included in release builds?
- Does the game log sensitive player data to disk or console?
- Are any internal file paths or system information exposed to players?
Grep for: `api_key`, `secret`, `password`, `token`, `private_key`, `DEBUG`, `print(` in release-facing code.
### Category 5: Cheat and Anti-Tamper Vectors
- Are gameplay-critical values stored only in memory, not in easily editable files?
- Are any critical game progression flags (e.g., "has paid for DLC") validated server-side?
- Is there any protection against memory editing tools (Cheat Engine, etc.) for multiplayer?
- Are leaderboard/score submissions validated before acceptance?
Note: Client-side anti-cheat is largely unenforceable. Focus on server-side validation for anything competitive or monetised.
### Category 6: Dependency and Supply Chain
- Are any third-party plugins or libraries used? List them.
- Do any plugins have known CVEs in the version being used?
- Are plugin sources verified (official marketplace, reviewed repository)?
Glob for: `addons/`, `plugins/`, `third_party/`, `vendor/` — list all external dependencies.
---
## Phase 4: Classify Findings
For each finding, assign:
**Severity:**
| Level | Definition |
|-------|-----------|
| **CRITICAL** | Remote code execution, data breach, or a trivially exploitable cheat that breaks multiplayer integrity |
| **HIGH** | Save tampering that bypasses progression, credential exposure, or server-side authority bypass |
| **MEDIUM** | Client-side cheat enablement, information disclosure, or input validation gap with limited impact |
| **LOW** | Defence-in-depth improvement — hardening that reduces attack surface but no direct exploit exists |
**Status:** Open / Accepted Risk / Out of Scope
---
## Phase 5: Generate Report
```markdown
# Security Audit Report
**Date**: [date]
**Scope**: [full | network | save | input | quick]
**Engine**: [engine + version]
**Audited by**: security-engineer via /security-audit
**Files scanned**: [N source files, N config files]
---
## Executive Summary
| Severity | Count | Must Fix Before Release |
|----------|-------|------------------------|
| CRITICAL | [N] | Yes — all |
| HIGH | [N] | Yes — all |
| MEDIUM | [N] | Recommended |
| LOW | [N] | Optional |
**Release recommendation**: [CLEAR TO SHIP / FIX CRITICALS FIRST / DO NOT SHIP]
---
## CRITICAL Findings
### SEC-001: [Title]
**Category**: [Save / Network / Input / Data / Cheat / Dependency]
**File**: `[path]` line [N]
**Description**: [What the vulnerability is]
**Attack scenario**: [How a malicious user would exploit it]
**Remediation**: [Specific code change or pattern to apply]
**Effort**: [Low / Medium / High]
[repeat per finding]
---
## HIGH Findings
[same format]
---
## MEDIUM Findings
[same format]
---
## LOW Findings
[same format]
---
## Accepted Risk
[Any findings explicitly accepted by the team with rationale]
---
## Dependency Inventory
| Plugin / Library | Version | Source | Known CVEs |
|-----------------|---------|--------|------------|
| [name] | [version] | [source] | [none / CVE-XXXX-NNNN] |
---
## Remediation Priority Order
1. [SEC-NNN] — [1-line description] — Est. effort: [Low/Medium/High]
2. ...
---
## Re-Audit Trigger
Run `/security-audit` again after remediating any CRITICAL or HIGH findings.
The Polish → Release gate requires this report with no open CRITICAL or HIGH items.
```
---
## Phase 6: Write Report
Present the report summary (executive summary + CRITICAL/HIGH findings only) in conversation.
Ask: "May I write the full security audit report to `production/security/security-audit-[date].md`?"
Write only after approval.
---
## Phase 7: Gate Integration
This report is a required artifact for the **Polish → Release gate**.
After remediating findings, re-run: `/security-audit quick` to confirm CRITICAL/HIGH items are resolved before running `/gate-check release`.
If CRITICAL findings exist:
> "⛔ CRITICAL security findings must be resolved before any public release. Do not proceed to `/launch-checklist` until these are addressed."
If no CRITICAL/HIGH findings:
> "✅ No blocking security findings. Report written to `production/security/`. Include this path when running `/gate-check release`."
---
## Collaborative Protocol
- **Never assume a pattern is safe** — flag it and let the user decide
- **Accepted risk is a valid outcome** — some LOW findings are acceptable trade-offs for a solo team; document the decision
- **Multiplayer games have a higher bar** — any HIGH finding in a multiplayer context should be treated as CRITICAL
- **This is not a penetration test** — this audit covers common patterns; a real pentest by a human security professional is recommended before any competitive or monetised multiplayer launch

View file

@ -3,7 +3,7 @@ name: setup-engine
description: "Configure the project's game engine and version. Pins the engine in CLAUDE.md, detects knowledge gaps, and populates engine reference docs via WebSearch when the version is beyond the LLM's training data."
argument-hint: "[engine] | [engine version] | refresh | upgrade [old-version] [new-version] | no args for guided selection"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit, WebSearch, WebFetch, Task
allowed-tools: Read, Glob, Grep, Write, Edit, WebSearch, WebFetch, Task, AskUserQuestion
---
When this skill is invoked:
@ -230,10 +230,15 @@ Example filled section:
```
### Remaining Sections
- Performance Budgets: Leave as `[TO BE CONFIGURED]` with a suggestion:
> "Typical targets: 60fps / 16.6ms frame budget. Want to set these now?"
- Testing: Suggest engine-appropriate framework (GUT for Godot, NUnit for Unity, etc.)
- Forbidden Patterns / Allowed Libraries: Leave as placeholder
- **Performance Budgets**: Use `AskUserQuestion`:
- Prompt: "Should I set default performance budgets now, or leave them for later?"
- Options: `[A] Set defaults now (60fps, 16.6ms frame budget, engine-appropriate draw call limit)` / `[B] Leave as [TO BE CONFIGURED] — I'll set these when I know my target hardware`
- If [A]: populate with the suggested defaults. If [B]: leave as placeholder.
- **Testing**: Suggest engine-appropriate framework (GUT for Godot, NUnit for Unity, etc.) — ask before adding.
- **Forbidden Patterns**: Leave as placeholder — do NOT pre-populate.
- **Allowed Libraries**: Leave as placeholder — do NOT pre-populate dependencies the project does not currently need. Only add a library here when it is actively being integrated, not speculatively.
> **Guardrail**: Never add speculative dependencies to Allowed Libraries. For example, do NOT add GodotSteam unless Steam integration is actively beginning in this session. Post-launch integrations should be added to Allowed Libraries when that work begins, not during engine setup.
### Engine Specialists Routing
@ -571,6 +576,7 @@ Verdict: **COMPLETE** — engine configured and reference docs populated.
- If reference docs already exist for a different engine, ask before replacing
- Always show the user what you're about to change before making CLAUDE.md edits
- If WebSearch returns ambiguous results, show the user and let them decide
- When the user chose **GDScript**: copy the GDScript CLAUDE.md template from Appendix A1 exactly. NEVER add "C++ via GDExtension" to the Language field. GDScript projects may use GDExtension, but it is not a primary project language. The `godot-gdextension-specialist` in the routing table is available for when native extensions are needed — it does not make C++ a project language.
---
@ -585,11 +591,13 @@ All Godot-specific variants for language-dependent configuration. Referenced fro
**GDScript:**
```markdown
- **Engine**: Godot [version]
- **Language**: GDScript (primary), C++ via GDExtension (performance-critical)
- **Language**: GDScript
- **Build System**: SCons (engine), Godot Export Templates
- **Asset Pipeline**: Godot Import System + custom resource pipeline
```
> **Guardrail**: When using this GDScript template, write the Language field as exactly "`GDScript`" — no additions. Do NOT append "C++ via GDExtension" or any other language. The C# template below includes GDExtension because C# projects commonly wrap native code; GDScript projects do not.
**C#:**
```markdown
- **Engine**: Godot [version]

View file

@ -0,0 +1,144 @@
---
name: skill-improve
description: "Improve a skill using a test-fix-retest loop. Runs static checks, proposes targeted fixes, rewrites the skill, re-tests, and keeps or reverts based on score change."
argument-hint: "[skill-name]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Bash
---
# Skill Improve
Runs an improvement loop on a single skill:
test → fix → retest → keep or revert.
---
## Phase 1: Parse Argument
Read the skill name from the first argument. If missing, output usage and stop:
```
Usage: /skill-improve [skill-name]
Example: /skill-improve tech-debt
```
Verify `.claude/skills/[name]/SKILL.md` exists. If not, stop with:
"Skill '[name]' not found."
---
## Phase 2: Baseline Test
Run `/skill-test static [name]` and record the baseline score:
- Count of FAILs
- Count of WARNs
- Which specific checks failed (Check 17)
Display to the user:
```
Static baseline: [N] failures, [M] warnings
Failing: Check 4 (no ask-before-write), Check 5 (no handoff)
```
If the static baseline is 0 FAILs and 0 WARNs, note that, then continue to Phase 2b: category checks may still fail even when static checks are clean.
### Phase 2b: Category Baseline
Look up the skill's `category:` field in `CCGS Skill Testing Framework/catalog.yaml`.
If no `category:` field is found, display:
"Category: not yet assigned — skipping category checks."
and skip to Phase 3.
If category is found, run `/skill-test category [name]` and record the category baseline:
- Count of FAILs
- Count of WARNs
- Which specific category rubric metrics failed
Display to the user:
```
Category baseline: [N] failures, [M] warnings ([category] rubric)
```
If BOTH static and category baselines are 0 FAILs and 0 WARNs, stop:
"This skill already passes all static and category checks. No improvements needed."
---
## Phase 3: Diagnose
Read the full skill file at `.claude/skills/[name]/SKILL.md`.
For each failing or warning **static** check, identify the exact gap:
- **Check 1 fail** → which frontmatter field is missing
- **Check 2 fail** → how many phases found vs. minimum required
- **Check 3 fail** → no verdict keywords anywhere in the skill body
- **Check 4 fail** → Write or Edit in allowed-tools but no ask-before-write language
- **Check 5 warn** → no follow-up or next-step section at the end
- **Check 6 warn**`context: fork` set but fewer than 5 phases found
- **Check 7 warn** → argument-hint is empty or doesn't match documented modes
For each failing or warning **category** check (if category was assigned in Phase 2b),
identify the exact gap in the skill's text. For example:
- If G2 fails (gate mode, full directors not spawned): skill body never references all 4
PHASE-GATE director prompts
- If A2 fails (authoring, no per-section May-I-write): skill asks once at the end, not
before each section write
- If T3 fails (team, BLOCKED not surfaced): skill doesn't halt dependent work on blocked agent
Show the full combined diagnosis to the user before proposing any changes.
---
## Phase 4: Propose Fix
Write a targeted fix for each failure and warning. Show the proposed changes
as clearly marked before/after blocks. Only change what is failing — do not
rewrite sections that are passing.
Ask: "May I write this improved version to `.claude/skills/[name]/SKILL.md`?"
If the user says no, stop here.
---
## Phase 5: Write and Retest
Record the current content of the skill file (for revert if needed).
Write the improved skill to `.claude/skills/[name]/SKILL.md`.
Re-run `/skill-test static [name]` and record the new static score.
If a category was assigned, also re-run `/skill-test category [name]` and record the new category score.
Display the comparison:
```
Static: Before [N] failures, [M] warnings → After [N'] failures, [M'] warnings
Category: Before [N] failures, [M] warnings → After [N'] failures, [M'] warnings (if applicable)
Combined change: improved / no change / worse
```
---
## Phase 6: Verdict
Count the combined issue total: static FAILs + category FAILs + static WARNs + category WARNs.
**If the combined score improved (the combined issue total is lower than baseline):**
Report: "Score improved. Changes kept."
Show a summary of what was fixed in each dimension.
**If combined score is the same or worse:**
Report: "Combined score did not improve."
Show what changed and why it may not have helped.
Ask: "May I revert `.claude/skills/[name]/SKILL.md` using git checkout?"
If yes: run `git checkout -- .claude/skills/[name]/SKILL.md`
---
## Phase 7: Next Steps
- Run `/skill-test static all` to find the next skill with failures.
- Run `/skill-improve [next-name]` to continue the loop on another skill.
- Run `/skill-test audit` to see overall coverage progress.

View file

@ -1,10 +1,9 @@
---
name: skill-test
description: "Validate skill files for structural compliance and behavioral correctness. Three modes: static (linter), spec (behavioral), audit (coverage report)."
argument-hint: "static [skill-name | all] | spec [skill-name] | audit"
argument-hint: "static [skill-name | all] | spec [skill-name] | category [skill-name | all] | audit"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write
context: fork
---
# Skill Test
@ -13,13 +12,14 @@ Validates `.claude/skills/*/SKILL.md` files for structural compliance and
behavioral correctness. No external dependencies — runs entirely within the
existing skill/hook/template architecture.
**Three modes:**
**Four modes:**
| Mode | Command | Purpose | Token Cost |
|------|---------|---------|------------|
| `static` | `/skill-test static [name\|all]` | Structural linter — 7 compliance checks per skill | Low (~1k/skill) |
| `spec` | `/skill-test spec [name]` | Behavioral verifier — evaluates assertions in test spec | Medium (~5k/skill) |
| `audit` | `/skill-test audit` | Coverage report — which skills have specs, last test dates | Low (~2k total) |
| `category` | `/skill-test category [name\|all]` | Category rubric — checks skill against its category-specific metrics | Low (~2k/skill) |
| `audit` | `/skill-test audit` | Coverage report — skills, agent specs, last test dates | Low (~3k total) |
---
@ -30,7 +30,9 @@ Determine mode from the first argument:
- `static [name]` → run 7 structural checks on one skill
- `static all` → run 7 structural checks on all skills (Glob `.claude/skills/*/SKILL.md`)
- `spec [name]` → read skill + test spec, evaluate assertions
- `audit` (or no argument) → read catalog, list all skills, show coverage
- `category [name]` → run category-specific rubric from `CCGS Skill Testing Framework/quality-rubric.md`
- `category all` → run category rubric for every skill that has a `category:` in catalog
- `audit` (or no argument) → read catalog, list all skills and agents, show coverage
If argument is missing or unrecognized, output usage and stop.
@ -137,13 +139,14 @@ Aggregate Verdict: N WARNINGS / N FAILURES
### Step 1 — Locate Files
Find skill at `.claude/skills/[name]/SKILL.md`.
Find spec at `tests/skills/[name].md`.
Look up the spec path from `CCGS Skill Testing Framework/catalog.yaml` — use the
`spec:` field for the matching skill entry.
If either is missing:
- Missing skill: "Skill '[name]' not found in `.claude/skills/`."
- Missing spec: "No test spec found for '[name]'. Run `/skill-test audit` to see
coverage gaps, or create a spec using the template at
`.claude/docs/templates/skill-test-spec.md`."
- Missing spec path in catalog: "No spec path set for '[name]' in catalog.yaml."
- Spec file not found at path: "Spec file missing at [path]. Run `/skill-test audit`
to see coverage gaps."
### Step 2 — Read Both Files
@ -177,7 +180,7 @@ For **Protocol Compliance** assertions (always present):
```
=== Skill Spec Test: /[name] ===
Date: [date]
Spec: tests/skills/[name].md
Spec: CCGS Skill Testing Framework/skills/[category]/[name].md
Case 1: [Happy Path — name]
Fixture: [summary]
@ -201,78 +204,139 @@ Overall Verdict: FAIL (1 case failed, 1 warning)
### Step 5 — Offer to Write Results
"May I write these results to `tests/results/skill-test-spec-[name]-[date].md`
and update `tests/skills/catalog.yaml`?"
"May I write these results to `CCGS Skill Testing Framework/results/skill-test-spec-[name]-[date].md`
and update `CCGS Skill Testing Framework/catalog.yaml`?"
If yes:
- Write results file to `tests/results/`
- Update the skill's entry in `tests/skills/catalog.yaml`:
- Write results file to `CCGS Skill Testing Framework/results/`
- Update the skill's entry in `CCGS Skill Testing Framework/catalog.yaml`:
- `last_spec: [date]`
- `last_spec_result: PASS|PARTIAL|FAIL`
---
## Phase 2D: Category Mode — Rubric Evaluation
### Step 1 — Locate Skill and Category
Find skill at `.claude/skills/[name]/SKILL.md`.
Look up `category:` field in `CCGS Skill Testing Framework/catalog.yaml`.
If skill not found: "Skill '[name]' not found."
If no `category:` field: "No category assigned for '[name]' in catalog.yaml.
Add `category: [name]` to the skill entry first."
For `category all`: collect all skills with a `category:` field and process each.
`category: utility` skills are evaluated against only U1 (static checks pass) and U2
(gate mode correct, if applicable); running static mode satisfies U1.
### Step 2 — Read Rubric Section
Read `CCGS Skill Testing Framework/quality-rubric.md`.
Extract the section matching the skill's category (e.g., `### gate`, `### team`).
### Step 3 — Read Skill
Read the skill's `SKILL.md` fully.
### Step 4 — Evaluate Rubric Metrics
For each metric in the category's rubric table:
1. Check whether the skill's written instructions clearly satisfy the criterion
2. Mark PASS, FAIL, or WARN
3. For FAIL/WARN, identify the exact gap in the skill text (quote the relevant section
or note its absence)
### Step 5 — Output Report
```
=== Skill Category Check: /[name] ([category]) ===
Metric G1 — Review mode read: PASS
Metric G2 — Full mode directors: FAIL
Gap: Phase 3 spawns only CD-PHASE-GATE; TD-PHASE-GATE, PR-PHASE-GATE, AD-PHASE-GATE absent
Metric G3 — Lean mode: PHASE-GATE only: PASS
Metric G4 — Solo mode: no directors: PASS
Metric G5 — No auto-advance: PASS
Verdict: FAIL (1 failure, 0 warnings)
Fix: Add TD-PHASE-GATE, PR-PHASE-GATE, and AD-PHASE-GATE to the full-mode director
panel in Phase 3.
```
### Step 6 — Offer to Update Catalog
"May I update `CCGS Skill Testing Framework/catalog.yaml` to record this category check
(`last_category`, `last_category_result`) for [name]?"
---
## Phase 2C: Audit Mode — Coverage Report
### Step 1 — Read Catalog
Read `tests/skills/catalog.yaml`. If missing, note that catalog doesn't exist
Read `CCGS Skill Testing Framework/catalog.yaml`. If missing, note that catalog doesn't exist
yet (first-run state).
### Step 2 — Enumerate All Skills
### Step 2 — Enumerate All Skills and Agents
Glob `.claude/skills/*/SKILL.md` to get the complete list of skills.
Extract skill name from each path (directory name).
### Step 3 — Build Coverage Table
Also read the `agents:` section from `CCGS Skill Testing Framework/catalog.yaml` to get the
complete list of agents.
### Step 3 — Build Skill Coverage Table
For each skill:
- Check if a spec file exists at `tests/skills/[name].md`
- Look up `last_static`, `last_static_result`, `last_spec`, `last_spec_result`
from catalog (or mark as "never" if not in catalog)
- Assign priority:
- `critical` — gate-check, design-review, story-readiness, story-done, review-all-gdds, architecture-review
- `high` — create-epics, create-stories, dev-story, create-control-manifest, propagate-design-change, story-done
- `medium` — team-* skills, sprint-plan, sprint-status
- `low` — all others
- Check if a spec file exists (use the `spec:` path from catalog, or glob `CCGS Skill Testing Framework/skills/*/[name].md`)
- Look up `last_static`, `last_static_result`, `last_spec`, `last_spec_result`,
`last_category`, `last_category_result`, `category` from catalog (or mark as
"never" / "—" if not in catalog)
- Priority comes from catalog `priority:` field (critical/high/medium/low)
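For illustration, a hypothetical `catalog.yaml` skill entry carrying the fields named above; the real catalog's exact layout and the spec path shown are assumptions, not confirmed structure:

```yaml
# Hypothetical entry; field names follow the audit fields listed above.
gate-check:
  category: gate
  priority: critical
  spec: CCGS Skill Testing Framework/skills/gate/gate-check.md
  last_static: never
  last_static_result: null
  last_spec: never
  last_spec_result: null
  last_category: never
  last_category_result: null
```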
### Step 3b — Build Agent Coverage Table
For each agent in catalog's `agents:` section:
- Check if a spec file exists (use the `spec:` path from catalog, or glob `CCGS Skill Testing Framework/agents/*/[name].md`)
- Look up `last_spec`, `last_spec_result`, `category` from catalog
### Step 4 — Output Report
```
=== Skill Test Coverage Audit ===
Date: [date]
Total skills: 52
Specs written: 4 (7.7%)
Never tested (static): 48
Coverage Table:
Skill | Has Spec | Last Static | Static Result | Last Spec | Spec Result | Priority
-----------------------|----------|------------------|---------------|------------------|-------------|----------
gate-check | YES | never | — | never | — | critical
design-review | YES | never | — | never | — | critical
story-readiness | YES | never | — | never | — | critical
story-done | YES | never | — | never | — | critical
architecture-review | NO | never | — | never | — | critical
review-all-gdds | NO | never | — | never | — | critical
SKILLS (72 total)
Specs written: 72 (100%) | Never static tested: 72 | Never category tested: 72
Skill | Cat | Has Spec | Last Static | S.Result | Last Cat | C.Result | Priority
-----------------------|----------|----------|-------------|----------|----------|----------|----------
gate-check | gate | YES | never | — | never | — | critical
design-review | review | YES | never | — | never | — | critical
...
Top 5 Priority Gaps (no spec, critical/high priority):
1. /architecture-review — critical, no spec
2. /review-all-gdds — critical, no spec
3. /create-epics — high, no spec
4. /create-stories — high, no spec
5. /dev-story — high, no spec
4. /propagate-design-change — high, no spec
5. /sprint-plan — medium, no spec
AGENTS (49 total)
Agent specs written: 49 (100%)
Coverage: 4/52 specs (7.7%)
Agent | Category | Has Spec | Last Spec | Result
-----------------------|------------|----------|-------------|--------
creative-director | director | YES | never | —
technical-director | director | YES | never | —
...
Top 5 Priority Gaps (skills with no spec, critical/high priority):
(none if all specs are written)
Skill coverage: 72/72 specs (100%)
Agent coverage: 49/49 specs (100%)
```
No file writes in audit mode.
Offer: "Would you like to run `/skill-test static all` to check structural
compliance across all skills? Or `/skill-test spec [name]` to run a specific
behavioral test?"
compliance across all skills? `/skill-test category all` to run category rubric
checks? Or `/skill-test spec [name]` to run a specific behavioral test?"
---
@ -284,9 +348,9 @@ After any mode completes, offer contextual follow-up:
correctness if a test spec exists."
- After `static all` with failures: "Address NON-COMPLIANT skills first. Run
`/skill-test static [name]` individually for detailed remediation guidance."
- After `spec [name]` PASS: "Update `tests/skills/catalog.yaml` to record this
- After `spec [name]` PASS: "Update `CCGS Skill Testing Framework/catalog.yaml` to record this
pass date. Consider running `/skill-test audit` to find the next spec gap."
- After `spec [name]` FAIL: "Review the failing assertions and update the skill
or the test spec to resolve the mismatch."
- After `audit`: "Start with the critical-priority gaps. Use the spec template
at `.claude/docs/templates/skill-test-spec.md` to create new specs."
at `CCGS Skill Testing Framework/templates/skill-test-spec.md` to create new specs."


@ -3,7 +3,7 @@ name: smoke-check
description: "Run the critical path smoke test gate before QA hand-off. Executes the automated test suite, verifies core functionality, and produces a PASS/FAIL report. Run after a sprint's stories are implemented and before manual QA begins. A failed smoke check means the build is not ready for QA."
argument-hint: "[sprint | quick | --platform pc|console|mobile|all]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Bash, Write
allowed-tools: Read, Glob, Grep, Bash, Write, AskUserQuestion
---
# Smoke Check


@ -4,7 +4,6 @@ description: "Generate a soak test protocol for extended play sessions. Defines
argument-hint: "[duration: 30m | 1h | 2h | 4h] [focus: memory | stability | balance | all]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write
context: fork
---
# Soak Test


@ -3,15 +3,19 @@ name: sprint-plan
description: "Generates a new sprint plan or updates an existing one based on the current milestone, completed work, and available capacity. Pulls context from production documents and design backlogs."
argument-hint: "[new|update|status] [--review full|lean|solo]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit
allowed-tools: Read, Glob, Grep, Write, Edit, Task, AskUserQuestion
context: |
!ls production/sprints/ 2>/dev/null
---
## Phase 0: Parse Arguments
Extract the mode argument (`new`, `update`, or `status`) and any `--review [full|lean|solo]`
flag. Store the review mode as the override for this run (see `.claude/docs/director-gates.md`).
Extract the mode argument (`new`, `update`, or `status`) and resolve the review mode (once, store for all gate spawns this run):
1. If `--review [full|lean|solo]` was passed → use that
2. Else read `production/review-mode.txt` → use that value
3. Else → default to `lean`
See `.claude/docs/director-gates.md` for the full check pattern.
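The three-step precedence above (flag → file → `lean` default) can be sketched as one shell function. The function name is illustrative; only the precedence order and the `production/review-mode.txt` path come from the skill text.

```shell
# Resolve the review mode once per run: CLI flag wins, then the
# persisted file, then the "lean" default.
resolve_review_mode() {
  flag="$1"                                  # value of --review, may be empty
  if [ -n "$flag" ]; then
    echo "$flag"
  elif [ -f production/review-mode.txt ]; then
    tr -d '[:space:]' < production/review-mode.txt
  else
    echo "lean"
  fi
}
```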
---
@ -33,7 +37,7 @@ flag. Store the review mode as the override for this run (see `.claude/docs/dire
For `new`:
**Generate a sprint plan** following this format and present it to the user. Ask: "May I write this sprint plan to `production/sprints/sprint-[N].md`?" If yes, write the file, creating the directory if needed. Verdict: **COMPLETE** — sprint plan created. If no: Verdict: **BLOCKED** — user declined write.
**Generate a sprint plan** following this format and present it to the user. Do NOT ask to write yet — the producer feasibility gate (Phase 4) runs first and may require revisions before the file is written.
```markdown
# Sprint [N] -- [Start Date] to [End Date]
@ -74,6 +78,10 @@ For `new`:
## Definition of Done for this Sprint
- [ ] All Must Have tasks completed
- [ ] All tasks pass acceptance criteria
- [ ] QA plan exists (`production/qa/qa-plan-sprint-[N].md`)
- [ ] All Logic/Integration stories have passing unit/integration tests
- [ ] Smoke check passed (`/smoke-check sprint`)
- [ ] QA sign-off report: APPROVED or APPROVED WITH CONDITIONS (`/team-qa sprint`)
- [ ] No S1 or S2 bugs in delivered features
- [ ] Design documents updated for any deviations
- [ ] Code reviewed and merged
@ -159,23 +167,62 @@ stories that haven't changed, add new stories, remove dropped ones.
## Phase 4: Producer Feasibility Gate
**Review mode check** — apply before spawning PR-SPRINT:
- `solo` → skip. Note: "PR-SPRINT skipped — Solo mode." Proceed to Phase 5 (QA plan gate).
- `lean` → skip (not a PHASE-GATE). Note: "PR-SPRINT skipped — Lean mode." Proceed to Phase 5 (QA plan gate).
- `full` → spawn as normal.
Before finalising the sprint plan, spawn `producer` via Task using gate **PR-SPRINT** (`.claude/docs/director-gates.md`).
Pass: proposed story list (titles, estimates, dependencies), total team capacity in hours/days, any carryover from the previous sprint, milestone constraints and deadline.
Present the producer's assessment. If UNREALISTIC, revise the story selection (defer stories to Should Have or Nice to Have) before asking for write approval. If CONCERNS, surface them and let the user decide whether to adjust.
After handling the producer's verdict, add:
After handling the producer's verdict, ask: "May I write this sprint plan to `production/sprints/sprint-[N].md`?" If yes, write the file, creating the directory if needed. Verdict: **COMPLETE** — sprint plan created. If no: Verdict: **BLOCKED** — user declined write.
After writing, add:
> **Scope check:** If this sprint includes stories added beyond the original epic scope, run `/scope-check [epic]` to detect scope creep before implementation begins.
---
## Phase 5: Next Steps
## Phase 5: QA Plan Gate
After the sprint plan is written, recommend:
Before closing the sprint plan, check whether a QA plan exists for this sprint.
Use `Glob` to look for `production/qa/qa-plan-sprint-[N].md` or any file in `production/qa/` referencing this sprint number.
**If a QA plan is found**: note it in the sprint plan output — "QA Plan: `[path]`" — and proceed.
**If no QA plan exists**: do not silently proceed. Surface this explicitly:
> "This sprint has no QA plan. A sprint plan without a QA plan means test requirements are undefined — developers won't know what 'done' looks like from a QA perspective, and the sprint cannot pass the Production → Polish gate without one.
>
> Run `/qa-plan sprint` now, before starting any implementation. It takes one session and produces the test case requirements each story needs."
Use `AskUserQuestion`:
- Prompt: "No QA plan found for this sprint. How do you want to proceed?"
- Options:
- `[A] Run /qa-plan sprint now — I'll do that before starting implementation (Recommended)`
- `[B] Skip for now — I understand QA sign-off will be blocked at the Production → Polish gate`
If [A]: close with "Sprint plan written. Run `/qa-plan sprint` next — then begin implementation."
If [B]: add a warning block to the sprint plan document:
```markdown
> ⚠️ **No QA Plan**: This sprint was started without a QA plan. Run `/qa-plan sprint`
> before the last story is implemented. The Production → Polish gate requires a QA
> sign-off report, which requires a QA plan.
```
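The QA-plan lookup described in this phase — exact filename first, then any file in `production/qa/` referencing the sprint number — can be sketched as follows. The helper name is illustrative, and the content search (`grep` for `sprint-N`) is one reasonable reading of "any file referencing this sprint number".

```shell
# Returns success if a QA plan exists for sprint $1: either the exact
# qa-plan-sprint-N.md file, or any production/qa/ file mentioning sprint-N.
has_qa_plan() {
  n="$1"
  [ -f "production/qa/qa-plan-sprint-$n.md" ] && return 0
  grep -rls "sprint-$n" production/qa/ 2>/dev/null | grep -q .
}
```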
---
## Phase 6: Next Steps
After the sprint plan is written and QA plan status is resolved:
- `/qa-plan sprint` — **required before implementation begins** — defines test cases per story so developers implement against QA specs, not a blank slate

- `/story-readiness [story-file]` — validate a story is ready before starting it
- `/dev-story [story-file]` — begin implementing the first story
- `/sprint-status` — check progress mid-sprint
- `/scope-check [epic]` — verify no scope creep before implementation begins
- `/dev-story [story-file]` — begin implementing the first story
- `/story-readiness [story-file]` — validate a story is ready before starting it


@ -4,7 +4,6 @@ description: "Fast sprint status check. Reads the current sprint plan, scans sto
argument-hint: "[sprint-number or blank for current]"
user-invocable: true
allowed-tools: Read, Glob, Grep
context: fork
model: haiku
---


@ -58,18 +58,24 @@ The user needs creative exploration before anything else.
**Concept phase:**
- `/brainstorm open` — discover your game concept
- `/setup-engine` — configure the engine (brainstorm will recommend one)
- `/art-bible` — define visual identity (uses the Visual Identity Anchor brainstorm produces)
- `/map-systems` — decompose the concept into systems
- `/design-system` — author a GDD for each MVP system
- `/review-all-gdds` — cross-system consistency check
- `/gate-check` — validate readiness before architecture work
**Architecture phase:**
- `/architecture-decision` — record key technical decisions (one per system)
- `/create-architecture` — produce the master architecture blueprint and Required ADR list
- `/architecture-decision (×N)` — record key technical decisions, following the Required ADR list
- `/create-control-manifest` — compile decisions into an actionable rules sheet
- `/architecture-review` — validate architecture coverage
**Production phase:**
**Pre-Production phase:**
- `/ux-design` — author UX specs for key screens (main menu, HUD, core interactions)
- `/prototype` — build a throwaway prototype to validate the core mechanic
- `/playtest-report (×1+)` — document each vertical slice playtest session
- `/create-epics` — map systems to epics
- `/create-stories` — break epics into implementable stories
- `/sprint-plan` — plan the first sprint
**Production phase:** → pick up stories with `/dev-story`
#### If B: Vague idea
@ -80,18 +86,24 @@ The user needs creative exploration before anything else.
**Concept phase:**
- `/brainstorm [hint]` — develop the idea into a full concept
- `/setup-engine` — configure the engine
- `/art-bible` — define visual identity (uses the Visual Identity Anchor brainstorm produces)
- `/map-systems` — decompose the concept into systems
- `/design-system` — author a GDD for each MVP system
- `/review-all-gdds` — cross-system consistency check
- `/gate-check` — validate readiness before architecture work
**Architecture phase:**
- `/architecture-decision` — record key technical decisions (one per system)
- `/create-architecture` — produce the master architecture blueprint and Required ADR list
- `/architecture-decision (×N)` — record key technical decisions, following the Required ADR list
- `/create-control-manifest` — compile decisions into an actionable rules sheet
- `/architecture-review` — validate architecture coverage
**Production phase:**
**Pre-Production phase:**
- `/ux-design` — author UX specs for key screens (main menu, HUD, core interactions)
- `/prototype` — build a throwaway prototype to validate the core mechanic
- `/playtest-report (×1+)` — document each vertical slice playtest session
- `/create-epics` — map systems to epics
- `/create-stories` — break epics into implementable stories
- `/sprint-plan` — plan the first sprint
**Production phase:** → pick up stories with `/dev-story`
#### If C: Clear concept
@ -103,20 +115,26 @@ The user needs creative exploration before anything else.
- `Jump straight in` — Go to `/setup-engine` now and write the GDD manually afterward
3. Show the recommended path:
**Concept phase:**
- `/brainstorm` or `/setup-engine` (their pick)
- `/brainstorm` or `/setup-engine` — (their pick from step 2)
- `/art-bible` — define visual identity (after brainstorm if run, or after concept doc exists)
- `/design-review` — validate the concept doc
- `/map-systems` — decompose the concept into individual systems
- `/design-system` — author a GDD for each MVP system
- `/review-all-gdds` — cross-system consistency check
- `/gate-check` — validate readiness before architecture work
**Architecture phase:**
- `/architecture-decision` — record key technical decisions (one per system)
- `/create-architecture` — produce the master architecture blueprint and Required ADR list
- `/architecture-decision (×N)` — record key technical decisions, following the Required ADR list
- `/create-control-manifest` — compile decisions into an actionable rules sheet
- `/architecture-review` — validate architecture coverage
**Production phase:**
**Pre-Production phase:**
- `/ux-design` — author UX specs for key screens (main menu, HUD, core interactions)
- `/prototype` — build a throwaway prototype to validate the core mechanic
- `/playtest-report (×1+)` — document each vertical slice playtest session
- `/create-epics` — map systems to epics
- `/create-stories` — break epics into implementable stories
- `/sprint-plan` — plan the first sprint
**Production phase:** → pick up stories with `/dev-story`
#### If D: Existing work
@ -155,15 +173,15 @@ Check if `production/review-mode.txt` already exists.
- **Prompt**: "One setup choice: how much design review would you want as you work through the workflow?"
- **Options**:
- `Full (recommended)` — Director specialists review at each key workflow step. Best for new projects or when you want structured feedback on your decisions.
- `Lean` — Directors only at phase gate transitions (/gate-check). Skips per-skill reviews. For experienced users who trust their own design work.
- `Full` — Director specialists review at each key workflow step. Best for teams, learning the workflow, or when you want thorough feedback on every decision.
- `Lean (recommended)` — Directors only at phase gate transitions (/gate-check). Skips per-skill reviews. Balanced approach for solo devs and small teams.
- `Solo` — No director reviews at all. Maximum speed. Best for game jams, prototypes, or if the reviews feel like overhead.
Write the choice to `production/review-mode.txt` immediately after the user
selects — no separate "May I write?" needed, as the write is a direct
consequence of the selection:
- `Full (recommended)` → write `full`
- `Lean` → write `lean`
- `Full` → write `full`
- `Lean (recommended)` → write `lean`
- `Solo` → write `solo`
Create the `production/` directory if it does not exist.
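The write-on-selection behaviour above can be sketched in two lines of shell. The function name is illustrative; the path and the no-separate-approval rule come from the skill text.

```shell
# Persist the setup choice immediately — the write is a direct
# consequence of the selection, so no separate approval prompt is needed.
save_review_mode() {
  choice="$1"                      # one of: full | lean | solo
  mkdir -p production
  printf '%s\n' "$choice" > production/review-mode.txt
}
```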
@ -193,7 +211,7 @@ Verdict: **COMPLETE** — user oriented and handed off to next step.
- **User picks D but project is empty**: Gently redirect — "It looks like the project is a fresh template with no artifacts yet. Would Path A or B be a better fit?"
- **User picks A but project has code**: Mention what you found — "I noticed there's already code in `src/`. Did you mean to pick D (existing work)?"
- **User is returning (engine configured, concept exists)**: Skip onboarding entirely — "It looks like you're already set up! Your engine is [X] and you have a game concept at `design/gdd/game-concept.md`. Review mode: `[read from production/review-mode.txt, or 'full (default)' if missing]`. Want to pick up where you left off? Try `/sprint-plan` or just tell me what you'd like to work on."
- **User is returning (engine configured, concept exists)**: Skip onboarding entirely — "It looks like you're already set up! Your engine is [X] and you have a game concept at `design/gdd/game-concept.md`. Review mode: `[read from production/review-mode.txt, or 'lean (default)' if missing]`. Want to pick up where you left off? Try `/sprint-plan` or just tell me what you'd like to work on."
- **User doesn't fit any option**: Let them describe their situation in their own words and adapt.
---


@ -3,7 +3,7 @@ name: story-done
description: "End-of-story completion review. Reads the story file, verifies each acceptance criterion against the implementation, checks for GDD/ADR deviations, prompts code review, updates story status to Complete, and surfaces the next ready story from the sprint."
argument-hint: "[story-file-path] [--review full|lean|solo]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Bash, Edit
allowed-tools: Read, Glob, Grep, Bash, Edit, AskUserQuestion, Task
---
# Story Done
@ -20,8 +20,12 @@ forgotten, and the story file reflects actual completion status.
## Phase 1: Find the Story
Extract `--review [full|lean|solo]` if present and store as the review mode
override for this run (see `.claude/docs/director-gates.md`).
Resolve the review mode (once, store for all gate spawns this run):
1. If `--review [full|lean|solo]` was passed → use that
2. Else read `production/review-mode.txt` → use that value
3. Else → default to `lean`
See `.claude/docs/director-gates.md` for the full check pattern.
**If a file path is provided** (e.g., `/story-done production/epics/core/story-damage-calculator.md`):
read that file directly.
@ -149,15 +153,19 @@ Based on the Story Type extracted in Phase 2, check for required evidence:
| **UI** | Manual walkthrough doc OR interaction test in `production/qa/evidence/` | ADVISORY |
| **Config/Data** | Smoke check pass report in `production/qa/smoke-*.md` | ADVISORY |
**For Logic stories**: use `Glob` to check `tests/unit/[system]/` for a test
file matching the story slug. If none found:
- Flag as **BLOCKING**: "Logic story has no unit test file. Expected at
`tests/unit/[system]/[story-slug]_test.[ext]`. Create and run the test
before marking this story Complete."
**For Logic stories**: first read the story's **Test Evidence** section to extract the
exact required file path. Use `Glob` to check that exact path. If the exact path is not
found, also search `tests/unit/[system]/` broadly (the file may have been placed at a
slightly different location). If no test file is found at either location:
- Flag as **BLOCKING**: "Logic story has no unit test file. Story requires it at
`[exact-path-from-Test-Evidence-section]`. Create and run the test before marking
this story Complete."
**For Integration stories**: check `tests/integration/[system]/` AND
`production/session-logs/` for a playtest record referencing this story.
If neither exists: flag as **BLOCKING** (same rule as Logic).
**For Integration stories**: read the story's **Test Evidence** section for the exact
required path. Use `Glob` to check that exact path first, then search
`tests/integration/[system]/` broadly, then check `production/session-logs/` for a
playtest record referencing this story.
If none found: flag as **BLOCKING** (same rule as Logic).
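The exact-path-then-broad fallback used for Logic and Integration stories can be sketched as one helper. This is a sketch under assumptions: the exact path argument stands in for the story's Test Evidence value, and `$system`/`$slug` are illustrative names, not fixed by the skill.

```shell
# Check the exact Test Evidence path first; if absent, fall back to a
# broad search of the system's test directory. Prints the first hit,
# or nothing when no evidence exists (the BLOCKING case).
find_test_evidence() {
  exact="$1"; system="$2"; slug="$3"
  if [ -n "$exact" ] && [ -f "$exact" ]; then
    echo "$exact"
    return 0
  fi
  ls tests/unit/"$system"/*"$slug"* 2>/dev/null | head -n 1
}
```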
**For Visual/Feel and UI stories**: glob `production/qa/evidence/` for a file
referencing this story. If none: flag as **ADVISORY**
@ -217,8 +225,39 @@ For each deviation found, categorize:
---
## Phase 4b: QA Coverage Gate
**Review mode check** — apply before spawning QL-TEST-COVERAGE:
- `solo` → skip. Note: "QL-TEST-COVERAGE skipped — Solo mode." Proceed to Phase 5.
- `lean` → skip (not a PHASE-GATE). Note: "QL-TEST-COVERAGE skipped — Lean mode." Proceed to Phase 5.
- `full` → spawn as normal.
After completing the deviation checks in Phase 4, spawn `qa-lead` via Task using gate **QL-TEST-COVERAGE** (`.claude/docs/director-gates.md`).
Pass:
- The story file path and story type
- Test file paths found during Phase 3 (exact paths, or "none found")
- The story's `## QA Test Cases` section (the pre-written test specs from story creation)
- The story's `## Acceptance Criteria` list
The qa-lead reviews whether the tests actually cover what was specified — not just whether files exist.
Apply the verdict:
- **ADEQUATE** → proceed to Phase 5
- **GAPS** → flag as **ADVISORY**: "QA lead identified coverage gaps: [list]. Story can complete but gaps should be addressed in a follow-up story."
- **INADEQUATE** → flag as **BLOCKING**: "QA lead: critical logic is untested. Verdict cannot be COMPLETE until coverage improves. Specific gaps: [list]."
Skip this phase for Config/Data stories (no code tests required).
---
## Phase 5: Lead Programmer Code Review Gate
**Review mode check** — apply before spawning LP-CODE-REVIEW:
- `solo` → skip. Note: "LP-CODE-REVIEW skipped — Solo mode." Proceed to Phase 6 (completion report).
- `lean` → skip (not a PHASE-GATE). Note: "LP-CODE-REVIEW skipped — Lean mode." Proceed to Phase 6 (completion report).
- `full` → spawn as normal.
Spawn `lead-programmer` via Task using gate **LP-CODE-REVIEW** (`.claude/docs/director-gates.md`).
Pass: implementation file paths, story file path, relevant GDD section, governing ADR.
@ -346,13 +385,25 @@ Run `/story-readiness [path]` to confirm a story is implementation-ready
before starting.
```
If no more stories are ready in this sprint:
"No more stories ready in this sprint. Consider running `/sprint-status` to
assess sprint health."
If no more Must Have stories remain in this sprint (all are Complete or Blocked):
If all Must Have stories are complete:
"All Must Have stories are complete. Consider running `/milestone-review` or
pulling from the Should Have list."
```
### Sprint Close-Out Sequence
All Must Have stories are complete. QA sign-off is required before advancing.
Run these in order:
1. `/smoke-check sprint` — verify the critical path still works end-to-end
2. `/team-qa sprint` — full QA cycle: test case execution, bug triage, sign-off report
3. `/gate-check` — advance to the next phase once QA approves
Do not run `/gate-check` until `/team-qa` returns APPROVED or APPROVED WITH CONDITIONS.
```
If there are Should Have stories still unstarted, surface them alongside the close-out sequence so the user can choose: close the sprint now, or pull in more work first.
If no more stories are ready but Must Have stories are still In Progress (not Complete):
"No more stories ready to start — [N] Must Have stories still in progress. Continue implementing those before sprint close-out."
---

View file

@ -3,8 +3,7 @@ name: story-readiness
description: "Validate that a story file is implementation-ready. Checks for embedded GDD requirements, ADR references, engine notes, clear acceptance criteria, and no open design questions. Produces READY / NEEDS WORK / BLOCKED verdict with specific gaps. Use when user says 'is this story ready', 'can I start on this story', 'is story X ready to implement'."
argument-hint: "[story-file-path or 'all' or 'sprint']"
user-invocable: true
allowed-tools: Read, Glob, Grep
context: fork
allowed-tools: Read, Glob, Grep, AskUserQuestion
model: haiku
---


@ -38,7 +38,10 @@ Always provide full context in each agent's prompt (game concept, pillars, exist
3. **Orchestrate the level design team** in sequence:
### Step 1: Narrative Context (narrative-director + world-builder)
### Step 1: Narrative + Visual Direction (narrative-director + world-builder + art-director, parallel)
Spawn all three agents simultaneously — issue all three Task calls before waiting for any result.
Spawn the `narrative-director` agent to:
- Define the narrative purpose of this area (what story beats happen here?)
- Identify key characters, dialogue triggers, and lore elements
@ -49,15 +52,29 @@ Spawn the `world-builder` agent to:
- Define environmental storytelling opportunities
- Specify any world rules that affect gameplay in this area
**Gate**: Use `AskUserQuestion` to present Step 1 outputs and confirm before proceeding to Step 2.
Spawn the `art-director` agent to:
- Establish visual theme targets for this area — these are INPUTS to layout, not outputs of it
- Define the color temperature and lighting mood for this area (how does it differ from adjacent areas?)
- Specify shape language direction (angular fortress? organic cave? decayed grandeur?)
- Name the primary visual landmarks that will orient the player
- Read `design/art/art-bible.md` if it exists — anchor all direction in the established art bible
**The art-director's visual targets from Step 1 must be passed to the level-designer in Step 2** as explicit constraints. Layout decisions happen within the visual direction, not before it.
**Gate**: Use `AskUserQuestion` to present all three Step 1 outputs (narrative brief, lore foundation, visual direction targets) and confirm before proceeding to Step 2.
### Step 2: Layout and Encounter Design (level-designer)
Spawn the `level-designer` agent to:
- Design the spatial layout (critical path, optional paths, secrets)
- Define pacing curve (tension peaks, rest areas, exploration zones)
Spawn the `level-designer` agent with the full Step 1 output as context:
- Narrative brief (from narrative-director)
- Lore foundation (from world-builder)
- **Visual direction targets (from art-director)** — layout must work within these targets, not contradict them
The level-designer should:
- Design the spatial layout (critical path, optional paths, secrets) — ensuring primary routes align with the visual landmark targets from Step 1
- Define pacing curve (tension peaks, rest areas, exploration zones) — coordinated with the emotional arc from narrative-director
- Place encounters with difficulty progression
- Design environmental puzzles or navigation challenges
- Define points of interest and landmarks for wayfinding
- Define points of interest and landmarks for wayfinding — these must match the visual landmarks the art-director specified
- Specify entry/exit points and connections to adjacent areas
**Adjacent area dependency check**: After the layout is produced, check `design/levels/` for each adjacent area referenced by the level-designer. If any referenced area's `.md` file does not exist, surface the gap:
@ -81,13 +98,16 @@ Spawn the `systems-designer` agent to:
**Gate**: Use `AskUserQuestion` to present Step 3 outputs and confirm before proceeding to Step 4.
### Step 4: Visual Direction and Accessibility (parallel)
Spawn the `art-director` agent to:
- Define the visual theme and color palette for the area
- Specify lighting mood and time-of-day settings
- List required art assets (environment props, unique assets)
- Define visual landmarks and sight lines
- Specify any special VFX needs (weather, particles, fog)
### Step 4: Production Concepts + Accessibility (art-director + accessibility-specialist, parallel)
**Note**: The art-director's directional pass (visual theme, color targets, mood) happened in Step 1. This pass is location-specific production concepts — given the finalized layout, what does each specific space look like?
Spawn the `art-director` agent with the finalized layout from Step 2:
- Produce location-specific concept specs for key spaces (entrance, key encounter zones, landmarks, exits)
- Specify which art assets are unique to this area vs. shared from the global pool
- Define sight-line and lighting setups per key space (these are now layout-informed, not directional)
- Specify VFX needs that are specific to this area's layout (weather volumes, particles, atmospheric effects)
- Flag any locations where the layout creates visual direction conflicts with the Step 1 targets — surface these as production risks
Spawn the `accessibility-specialist` agent in parallel to:
- Review the level layout for navigation clarity (can players orient themselves without relying on color alone?)


@ -19,6 +19,7 @@ The user must approve before moving to the next phase.
- **narrative-director** — Story arcs, character design, dialogue strategy, narrative vision
- **writer** — Dialogue writing, lore entries, item descriptions, in-game text
- **world-builder** — World rules, faction design, history, geography, environmental storytelling
- **art-director** — Character visual design, environmental visual storytelling, cutscene/cinematic tone
- **level-designer** — Level layouts that serve the narrative, pacing, environmental storytelling beats
## How to Delegate
@ -27,6 +28,7 @@ Use the Task tool to spawn each team member as a subagent:
- `subagent_type: narrative-director` — Story arcs, character design, narrative vision
- `subagent_type: writer` — Dialogue writing, lore entries, in-game text
- `subagent_type: world-builder` — World rules, faction design, history, geography
- `subagent_type: art-director` — Character visual profiles, environmental visual storytelling, cinematic tone
- `subagent_type: level-designer` — Level layouts that serve the narrative, pacing
- `subagent_type: localization-lead` — i18n validation, string key compliance, translation headroom
@ -43,9 +45,10 @@ Delegate to **narrative-director**:
- Output: narrative brief with story requirements
### Phase 2: World Foundation (parallel)
Delegate in parallel:
Delegate in parallel — issue all three Task calls simultaneously before waiting for any result:
- **world-builder**: Create or update lore entries for factions, locations, and history relevant to this content. Cross-reference against existing lore for contradictions. Set canon level for new entries.
- **writer**: Draft character dialogue using voice profiles. Ensure all lines are under 120 characters, use named placeholders for variables, and are localization-ready.
- **art-director**: Define character visual design direction for key characters appearing in this content (silhouette, visual archetype, distinguishing features). Specify environmental visual storytelling elements for each key space (prop composition, lighting notes, spatial arrangement). Define tone palette and cinematic direction for any cutscenes or scripted sequences.
### Phase 3: Level Narrative Integration
Delegate to **level-designer**:


@ -3,7 +3,7 @@ name: team-qa
description: "Orchestrate the QA team through a full testing cycle. Coordinates qa-lead (strategy + test plan) and qa-tester (test case writing + bug reporting) to produce a complete QA package for a sprint or feature. Covers: test plan generation, test case writing, smoke check gate, manual QA execution, and sign-off report."
argument-hint: "[sprint | feature: system-name]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Task
allowed-tools: Read, Glob, Grep, Write, Task, AskUserQuestion
agent: qa-lead
---
@ -53,11 +53,16 @@ Prompt the qa-lead to:
- Identify which stories require automated test evidence vs. manual QA
- Flag any stories with missing acceptance criteria or missing test evidence that would block QA
- Estimate manual QA effort (number of test sessions needed)
- Produce a strategy summary table:
- Check `tests/smoke/` for smoke test scenarios; for each, assess whether it can be verified given the current build. Produce a smoke check verdict: **PASS** / **PASS WITH WARNINGS [list]** / **FAIL [list of failures]**
- Produce a strategy summary table and smoke check result:
| Story | Type | Automated Required | Manual Required | Blocker? |
|-------|------|--------------------|-----------------|----------|
**Smoke Check**: [PASS / PASS WITH WARNINGS / FAIL] — [details if not PASS]
If the smoke check result is **FAIL**, the qa-lead must list the failures prominently. QA cannot proceed past the strategy phase with a failed smoke check.
Present the qa-lead's full strategy to the user, then use `AskUserQuestion`:
```
@ -66,9 +71,12 @@ options:
- "Looks good — proceed to test plan"
- "Adjust story types before proceeding"
- "Skip blocked stories and proceed with the rest"
- "Smoke check failed — fix issues and re-run /team-qa"
- "Cancel — resolve blockers first"
```
If smoke check **FAIL**: do not proceed to Phase 3. Surface the failures and stop. The user must fix them and re-run `/team-qa`.
If smoke check **PASS WITH WARNINGS**: note the warnings for the sign-off report and continue.
If blockers are present: list them explicitly. The user may choose to skip blocked stories or cancel the cycle.
### Phase 3: Test Plan Generation
@@ -88,26 +96,9 @@ Ask: "May I write the QA plan to `production/qa/qa-plan-[sprint]-[date].md`?"
Write only after receiving approval.
### Phase 4: Smoke Check Gate
### Phase 4: Test Case Writing (qa-tester)
Before any manual QA begins, run the smoke check.
Spawn `qa-lead` via Task with instructions to:
- Review the `tests/smoke/` directory for the current smoke test list
- Check whether each smoke test scenario can be verified given the current build
- Produce a smoke check result: **PASS** / **PASS WITH WARNINGS** / **FAIL**
Report the result to the user:
- **PASS**: "Smoke check passed. Proceeding to test case writing."
- **PASS WITH WARNINGS**: "Smoke check passed with warnings: [list issues]. These are non-blocking. Proceeding — note these for the sign-off report."
- **FAIL**: "Smoke check failed. QA cannot begin until these issues are resolved:
[list failures]
Fix them and re-run `/smoke-check`, or re-run `/team-qa` once resolved."
On FAIL: stop the cycle and surface the list of failures. Do not proceed.
### Phase 5: Test Case Writing (qa-tester)
> **Smoke check** is performed as part of Phase 2 (QA Strategy). If the smoke check returned FAIL in Phase 2, the cycle was stopped there. This phase only runs when the Phase 2 smoke check was PASS or PASS WITH WARNINGS.
For each story requiring manual QA (Visual/Feel, UI, Integration without automated tests):

View file

@ -4,7 +4,6 @@ description: "Quality review of test files and manual evidence documents. Goes b
argument-hint: "[story-path | sprint | system-name]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write
context: fork
---
# Test Evidence Review

View file

@ -4,7 +4,6 @@ description: "Detect non-deterministic (flaky) tests by reading CI run logs or t
argument-hint: "[ci-log-path | scan | registry]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit, Bash
context: fork
---
# Test Flakiness Detection

View file

@ -4,7 +4,6 @@ description: "Generate engine-specific test helper libraries for the project's t
argument-hint: "[system-name | all | scaffold]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write
context: fork
---
# Test Helpers

View file

@@ -3,8 +3,7 @@ name: ux-design
description: "Guided, section-by-section UX spec authoring for a screen, flow, or HUD. Reads game concept, player journey, and relevant GDDs to provide context-aware design guidance. Produces ux-spec.md (per screen/flow) or hud-design.md using the studio templates."
argument-hint: "[screen/flow name] or 'hud' or 'patterns'"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit, AskUserQuestion
context: fork
allowed-tools: Read, Glob, Grep, Write, Edit, AskUserQuestion, Task
agent: ux-designer
---
@@ -81,9 +80,8 @@ so you can reference them rather than reinvent them.
### 2f: Art Bible
Check for `docs/art-bible.md` or `design/art-bible.md`. If found, read the
visual direction section. UX layout must align with the aesthetic commitments
already made.
Check for `design/art/art-bible.md`. If found, read the visual direction
section. UX layout must align with the aesthetic commitments already made.
### 2g: Accessibility Requirements
@@ -162,6 +160,18 @@ Ask: "May I create the skeleton file at `design/ux/[filename].md`?"
---
## Navigation Position
[To be designed]
---
## Entry & Exit Points
[To be designed]
---
## Layout Specification
### Information Hierarchy
@@ -194,6 +204,18 @@ Ask: "May I create the skeleton file at `design/ux/[filename].md`?"
---
## Events Fired
[To be designed]
---
## Transitions & Animations
[To be designed]
---
## Data Requirements
[To be designed]
@@ -206,6 +228,18 @@ Ask: "May I create the skeleton file at `design/ux/[filename].md`?"
---
## Localization Considerations
[To be designed]
---
## Acceptance Criteria
[To be designed]
---
## Open Questions
[To be designed]
@@ -383,6 +417,40 @@ Offer to map this against the journey phases if the player journey doc exists.
---
#### Section B2: Navigation Position
Where does this screen sit in the game's navigation hierarchy? This is a one-paragraph orientation map — not a full flow diagram.
**Questions to ask**:
- "Is this screen accessed from the main menu, from pause, from within gameplay, or from another screen?"
- "Is it a top-level destination (always reachable) or a context-dependent one (only accessible in certain states)?"
- "Can the player reach this screen from more than one place in the game?"
Present as: "This screen lives at: [root] → [parent] → [this screen]" plus any alternate entry paths.
---
#### Section B3: Entry & Exit Points
Map every way the player can arrive at and leave this screen.
**Questions to ask**:
- "What are all the ways a player can reach this screen?" (List each trigger: button press, game event, redirect from another screen, etc.)
- "What can the player do to exit? What happens when they do?" (Back button, confirm action, timeout, game event)
- "Are there any exits that are one-way — where the player cannot return to this screen without starting over?"
Present as two tables:
| Entry Source | Trigger | Player carries this context |
|---|---|---|
| [screen/event] | [how] | [state/data they arrive with] |
| Exit Destination | Trigger | Notes |
|---|---|---|
| [screen/event] | [how] | [any irreversible state changes] |
---
#### Section C: Layout Specification
This is the largest and most interactive section. Work through it in sub-sections:
@@ -459,6 +527,41 @@ an existing UX spec or note it as a spec dependency.
---
#### Section E2: Events Fired
For every player action in the Interaction Map, document the corresponding event the game or analytics system should fire — or explicitly note "no event" if none applies.
**Questions to ask**:
- "For each action, should the game fire an analytics event, trigger a game-state change, or both?"
- "Are there any actions that should NOT fire an event — and is that a deliberate choice?"
Present as a table alongside the Interaction Map:
| Player Action | Event Fired | Payload / Data |
|---|---|---|
| [action] | [EventName] or none | [data passed with event] |
Flag any action that modifies persistent game state (save data, progress, economy) — these need explicit attention from the architecture team.
---
#### Section E3: Transitions & Animations
Specify how the screen enters and exits, and how it responds to state changes.
**Questions to ask**:
- "How does this screen appear? (fade in, slide from right, instant pop, scale from button)"
- "How does it dismiss? (fade out, slide back, cut)"
- "Are there any in-screen state transitions that need animation? (loading spinner, success state, error flash)"
- "Is there any animation that could cause motion sickness — and does the game have a reduced-motion option?"
Minimum required:
- Screen enter transition
- Screen exit transition
- At least one state-change animation if the screen has multiple states
---
#### Section F: Data Requirements
Cross-reference the GDD UI Requirements sections gathered in Phase 2.
@@ -499,6 +602,45 @@ Use `AskUserQuestion` to surface any open questions on accessibility tier:
---
#### Section H: Localization Considerations
Document constraints that affect how this screen behaves when text is translated.
**Questions to ask**:
- "Which text elements on this screen are the longest? What is the maximum character count that fits the layout?"
- "Are there any elements where text length is layout-critical — e.g., a button label that must stay on one line?"
- "Are there any elements that display numbers, dates, or currencies that need locale-specific formatting?"
Note: aim to flag any element where a 40% text expansion (common in translations from English to German or French) would break the layout. Mark those as HIGH PRIORITY for the localization engineer.
---
#### Section I: Acceptance Criteria
Write at least 5 specific, testable criteria that a QA tester can verify without reading any other design document. These become the pass/fail conditions for `/story-done`.
**Format**: Use checkboxes. Each criterion must be verifiable by a human tester:
```
- [ ] Screen opens within [X]ms from [trigger]
- [ ] [Element] displays correctly at [minimum] and [maximum] values
- [ ] [Navigation action] correctly routes to [destination screen]
- [ ] Error state appears when [condition] and shows [specific message or icon]
- [ ] Keyboard/gamepad navigation reaches all interactive elements in logical order
- [ ] [Accessibility requirement] is met — e.g., "all interactive elements have focus indicators"
```
**Minimum required**:
- 1 performance criterion (load/open time)
- 1 navigation criterion (at least one entry or exit path verified)
- 1 error/empty state criterion
- 1 accessibility criterion (per committed tier)
- 1 criterion specific to this screen's core purpose
Ask the user to confirm: "Do these criteria cover what would actually make this screen 'done' for your QA process?"
---
### Section Guidance: HUD Design Mode
HUD design follows a different order from UX spec mode. Begin with philosophy;
@@ -699,14 +841,23 @@ Update `production/session-state/active.md` with:
### 6b: Suggest Next Step
Use `AskUserQuestion`:
- "The spec is complete. What's next?"
Before presenting options, state clearly:
> "This spec should be validated with `/ux-review` before it enters the
> implementation pipeline. The Pre-Production gate requires all key screen specs
> to have a review verdict."
Then use `AskUserQuestion`:
- "Run `/ux-review [filename]` now, or do something else first?"
- Options:
- "Run `/ux-review` to validate this spec"
- "Design another screen"
- "Run `/ux-review` now — validate this spec"
- "Design another screen first, then review all specs together"
- "Update the interaction pattern library with new patterns from this spec"
- "Stop here for this session"
If the user picks "Design another screen first", add a note: "Reminder: run
`/ux-review` on all completed specs before running `/gate-check pre-production`."
### 6c: Cross-Link Related Specs
If other UX specs link to or from this screen, note which ones should reference
@@ -740,7 +891,7 @@ specific sub-topics, additional context or coordination may be needed:
| Implementation feasibility (engine constraints) | `ui-programmer` — before finalizing component inventory |
| Gameplay data requirements | `game-designer` — when data ownership is unclear |
| Narrative/lore visible in the UI | `narrative-director` — for flavor text, item names, lore panels |
| Accessibility tier decisions | `ux-designer` (owns this) |
| Accessibility tier decisions | Handled by this session — owned by ux-designer |
When delegating to another agent via the Task tool:
- Provide: screen name, game concept summary, the specific question needing expert input

View file

@ -4,7 +4,6 @@ description: "Validates a UX spec, HUD design, or interaction pattern library fo
argument-hint: "[file-path or 'all' or 'hud' or 'patterns']"
user-invocable: true
allowed-tools: Read, Glob, Grep
context: fork
agent: ux-designer
---

View file

@@ -64,15 +64,23 @@ if [ -z "$stage" ]; then
src_count=$(find "$cwd/src" -type f \( -name "*.gd" -o -name "*.cs" -o -name "*.cpp" -o -name "*.h" -o -name "*.py" -o -name "*.rs" -o -name "*.lua" -o -name "*.tscn" -o -name "*.tres" \) 2>/dev/null | wc -l | tr -d ' ')
fi
# Check for ADRs (signals Pre-Production phase)
has_adrs=false
if ls "$cwd/docs/architecture/"adr-*.md 2>/dev/null | head -1 | grep -q .; then
has_adrs=true
fi
# Determine stage (check from most-advanced backward)
if [ "$src_count" -ge 10 ] 2>/dev/null; then
stage="Production"
elif [ "$engine_configured" = true ]; then
elif [ "$has_adrs" = true ]; then
stage="Pre-Production"
elif [ "$has_systems" = true ]; then
elif [ "$engine_configured" = true ]; then
stage="Technical Setup"
elif [ "$has_concept" = true ]; then
elif [ "$has_systems" = true ]; then
stage="Systems Design"
elif [ "$has_concept" = true ]; then
stage="Concept"
else
stage="Concept"
fi

View file

@@ -0,0 +1,94 @@
# CCGS Skill Testing Framework — Claude Instructions
This folder is the quality assurance layer for the Claude Code Game Studios skill/agent
framework. It is self-contained and separate from any game project.
## Key files
| File | Purpose |
|------|---------|
| `catalog.yaml` | Master registry for all 72 skills and 49 agents. Contains category, spec path, and last-test tracking fields. Always read this first when running any test command. |
| `quality-rubric.md` | Category-specific pass/fail metrics. Read the matching `###` section for the skill's category when running `/skill-test category`. |
| `skills/[category]/[name].md` | Behavioral spec for a skill — 5 test cases + protocol compliance assertions. |
| `agents/[tier]/[name].md` | Behavioral spec for an agent — 5 test cases + protocol compliance assertions. |
| `templates/skill-test-spec.md` | Template for writing new skill spec files. |
| `templates/agent-test-spec.md` | Template for writing new agent spec files. |
| `results/` | Written by `/skill-test spec` when results are saved. Gitignored. |
## Path conventions
- Skill specs: `CCGS Skill Testing Framework/skills/[category]/[name].md`
- Agent specs: `CCGS Skill Testing Framework/agents/[tier]/[name].md`
- Catalog: `CCGS Skill Testing Framework/catalog.yaml`
- Rubric: `CCGS Skill Testing Framework/quality-rubric.md`
The `spec:` field in `catalog.yaml` is the authoritative path for each skill/agent spec.
Always read it rather than guessing the path.
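For orientation, a catalog entry pairs each name with its category and spec path. A hypothetical sketch (layout is illustrative only; the authoritative shape is whatever `catalog.yaml` itself defines):

```yaml
# Hypothetical entry: illustrative only, defer to catalog.yaml itself
skills:
  gate-check:
    category: gate
    spec: CCGS Skill Testing Framework/skills/gate/gate-check.md
```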
## Skill categories
```
gate → gate-check
review → design-review, architecture-review, review-all-gdds
authoring → design-system, quick-design, architecture-decision, art-bible,
create-architecture, ux-design, ux-review
readiness → story-readiness, story-done
pipeline → create-epics, create-stories, dev-story, create-control-manifest,
propagate-design-change, map-systems
analysis → consistency-check, balance-check, content-audit, code-review,
tech-debt, scope-check, estimate, perf-profile, asset-audit,
security-audit, test-evidence-review, test-flakiness
team → team-combat, team-narrative, team-audio, team-level, team-ui,
team-qa, team-release, team-polish, team-live-ops
sprint → sprint-plan, sprint-status, milestone-review, retrospective,
changelog, patch-notes
utility → all remaining skills
```
## Agent tiers
```
directors → creative-director, technical-director, producer, art-director
leads → lead-programmer, narrative-director, audio-director, ux-designer,
qa-lead, release-manager, localization-lead
specialists → gameplay-programmer, engine-programmer, ui-programmer,
tools-programmer, network-programmer, ml-engineer, ai-programmer,
level-designer, sound-designer, technical-artist
godot → godot-specialist, godot-gdscript-specialist, godot-csharp-specialist,
godot-shader-specialist, godot-gdextension-specialist
unity → unity-specialist, unity-ui-specialist, unity-shader-specialist,
unity-dots-specialist, unity-addressables-specialist
unreal → unreal-specialist, ue-gas-specialist, ue-replication-specialist,
ue-umg-specialist, ue-blueprint-specialist
operations → devops-engineer, deployment-engineer, database-admin,
security-engineer, performance-analyst, analytics-engineer,
community-manager
creative → writer, world-builder, game-designer, economy-designer,
systems-designer, prototyper
```
## Workflow for testing a skill
1. Read `catalog.yaml` to get the skill's `spec:` path and `category:`
2. Read the skill at `.claude/skills/[name]/SKILL.md`
3. Read the spec at the `spec:` path
4. Evaluate assertions case by case
5. Offer to write results to `results/` and update `catalog.yaml`
## Workflow for improving a skill
Use `/skill-improve [name]`. It handles the full loop:
test → diagnose → propose fix → rewrite → retest → keep or revert.
## Spec validity note
Specs in this folder describe **current behavior**, not ideal behavior. They were
written by reading the skills, so they may encode bugs. When a skill misbehaves in
practice, correct the skill first, then update the spec to match the fixed behavior.
Treat spec failures as "this needs investigation," not "the skill is definitively wrong."
## This folder is deletable
Nothing in `.claude/` imports from here. Deleting this folder has no effect on the
CCGS skills or agents themselves. `/skill-test` and `/skill-improve` will report that
`catalog.yaml` is missing and guide the user to initialize it.

View file

@@ -0,0 +1,150 @@
# CCGS Skill Testing Framework
Quality assurance infrastructure for the **Claude Code Game Studios** framework.
Tests the skills and agents themselves — not any game built with them.
> **This folder is self-contained and optional.**
> Game developers using CCGS don't need it. To remove it entirely:
> `rm -rf "CCGS Skill Testing Framework"` — nothing in `.claude/` depends on it.
---
## What's in here
```
CCGS Skill Testing Framework/
├── README.md ← you are here
├── CLAUDE.md ← tells Claude how to use this framework
├── catalog.yaml ← master registry: all 72 skills + 49 agents, coverage tracking
├── quality-rubric.md ← category-specific pass/fail metrics for /skill-test category
├── skills/ ← behavioral spec files for skills (one per skill)
│ ├── gate/ ← gate category specs
│ ├── review/ ← review category specs
│ ├── authoring/ ← authoring category specs
│ ├── readiness/ ← readiness category specs
│ ├── pipeline/ ← pipeline category specs
│ ├── analysis/ ← analysis category specs
│ ├── team/ ← team category specs
│ ├── sprint/ ← sprint category specs
│ └── utility/ ← utility category specs
├── agents/ ← behavioral spec files for agents (one per agent)
│ ├── directors/ ← creative-director, technical-director, producer, art-director
│ ├── leads/ ← lead-programmer, narrative-director, audio-director, etc.
│ ├── specialists/ ← engine/code/shader/UI specialists
│ ├── godot/ ← Godot-specific specialists
│ ├── unity/ ← Unity-specific specialists
│ ├── unreal/ ← Unreal-specific specialists
│ ├── operations/ ← QA, live-ops, release, localization, etc.
│ └── creative/ ← writer, world-builder, game-designer, etc.
├── templates/ ← spec file templates for writing new specs
│ ├── skill-test-spec.md ← template for skill behavioral specs
│ └── agent-test-spec.md ← template for agent behavioral specs
└── results/ ← test run outputs (written by /skill-test spec, gitignored)
```
---
## How to use it
All testing is driven by two skills already in the framework:
### Check structural compliance
```
/skill-test static [skill-name] # Check one skill (7 checks)
/skill-test static all # Check all 72 skills
```
### Run a behavioral spec test
```
/skill-test spec gate-check # Evaluate a skill against its written spec
/skill-test spec design-review
```
### Check against category rubric
```
/skill-test category gate-check # Evaluate one skill against its category metrics
/skill-test category all # Run rubric checks across all categorized skills
```
### See full coverage picture
```
/skill-test audit # Skills + agents: has-spec, last tested, result
```
### Improve a failing skill
```
/skill-improve gate-check # Test → diagnose → propose fix → retest loop
```
---
## Skill categories
| Category | Skills | Key metrics |
|----------|--------|-------------|
| `gate` | gate-check | Review mode read, full/lean/solo director panel, no auto-advance |
| `review` | design-review, architecture-review, review-all-gdds | Read-only, 8-section check, correct verdicts |
| `authoring` | design-system, quick-design, art-bible, create-architecture, … | Section-by-section May-I-write, skeleton-first |
| `readiness` | story-readiness, story-done | Blockers surfaced, director gate in full mode |
| `pipeline` | create-epics, create-stories, dev-story, map-systems, … | Upstream dependency check, handoff path clear |
| `analysis` | consistency-check, balance-check, code-review, tech-debt, … | Read-only report, verdict keyword, no writes |
| `team` | team-combat, team-narrative, team-audio, … | All required agents spawned, blockers surfaced |
| `sprint` | sprint-plan, sprint-status, milestone-review, … | Reads sprint data, status keywords present |
| `utility` | start, adopt, hotfix, localize, setup-engine, … | Passes static checks |
---
## Agent tiers
| Tier | Agents |
|------|--------|
| `directors` | creative-director, technical-director, producer, art-director |
| `leads` | lead-programmer, narrative-director, audio-director, ux-designer, qa-lead, release-manager, localization-lead |
| `specialists` | gameplay-programmer, engine-programmer, ui-programmer, tools-programmer, network-programmer, ml-engineer, ai-programmer, level-designer, sound-designer, technical-artist |
| `godot` | godot-specialist, godot-gdscript-specialist, godot-csharp-specialist, godot-shader-specialist, godot-gdextension-specialist |
| `unity` | unity-specialist, unity-ui-specialist, unity-shader-specialist, unity-dots-specialist, unity-addressables-specialist |
| `unreal` | unreal-specialist, ue-gas-specialist, ue-replication-specialist, ue-umg-specialist, ue-blueprint-specialist |
| `operations` | devops-engineer, deployment-engineer, database-admin, security-engineer, performance-analyst, analytics-engineer, community-manager |
| `creative` | writer, world-builder, game-designer, economy-designer, systems-designer, prototyper |
---
## Updating the catalog
`catalog.yaml` tracks test coverage for every skill and agent. After running a test:
- `/skill-test spec [name]` will offer to update `last_spec` and `last_spec_result`
- `/skill-test category [name]` will offer to update `last_category` and `last_category_result`
- `last_static` and `last_static_result` are updated manually or via `/skill-improve`
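
As a hedged sketch, an entry's tracking fields after a spec run and a category run might look like this (dates and results are illustrative; the field names are the ones listed above):

```yaml
# Illustrative values only; field names match the tracking
# fields described in this section
gate-check:
  last_spec: 2026-04-06
  last_spec_result: PASS
  last_category: 2026-04-06
  last_category_result: PASS
```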
---
## Writing a new spec
1. Find the spec template at `templates/skill-test-spec.md`
2. Copy it to `skills/[category]/[skill-name].md`
3. Update the `spec:` field in `catalog.yaml` to point to the new file
4. Run `/skill-test spec [skill-name]` to validate it
---
## Removing this framework
This folder has no hooks into the main project. To remove:
```bash
rm -rf "CCGS Skill Testing Framework"
```
The skills `/skill-test` and `/skill-improve` will still function — they'll simply
report that `catalog.yaml` is missing and suggest running `/skill-test audit` to
initialize it.

View file

@@ -0,0 +1,84 @@
# Agent Test Spec: art-director
## Agent Summary
**Domain owned:** Visual identity, art bible authorship and enforcement, asset quality standards, UI/UX visual design, visual phase gate, concept art evaluation.
**Does NOT own:** UX interaction flows and information architecture (ux-designer's domain), audio direction (audio-director), code implementation.
**Model tier:** Sonnet (note: despite the "director" title, art-director is assigned Sonnet per coordination-rules.md — it handles individual system analysis, not multi-document phase gate synthesis at the Opus level).
**Gate IDs handled:** AD-CONCEPT-VISUAL, AD-ART-BIBLE, AD-PHASE-GATE.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/art-director.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references visual identity, art bible, asset standards — not generic)
- [ ] `allowed-tools:` list is read-focused; image review capability if supported; no Bash unless asset pipeline checks are justified
- [ ] Model tier is `claude-sonnet-4-6` (NOT Opus — coordination-rules.md assigns Sonnet to art-director)
- [ ] Agent definition does not claim authority over UX interaction flows or audio direction
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** The art bible's color palette section is submitted for review. The section defines a desaturated earth-tone primary palette with high-contrast accent colors tied to the game pillar "beauty in decay." The palette is internally consistent and references the pillar vocabulary. Request is tagged AD-ART-BIBLE.
**Expected:** Returns `AD-ART-BIBLE: APPROVE` with rationale confirming the palette's internal consistency and its alignment with the stated pillar.
**Assertions:**
- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT
- [ ] Verdict token is formatted as `AD-ART-BIBLE: APPROVE`
- [ ] Rationale references the specific palette characteristics and pillar alignment — not generic art advice
- [ ] Output stays within visual domain — does not comment on UX interaction patterns or audio mood
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** Sound designer asks art-director to specify how ambient audio should layer and duck when the player enters a combat zone.
**Expected:** Agent declines to define audio behavior and redirects to audio-director.
**Assertions:**
- [ ] Does not make any binding decision about audio layering or ducking behavior
- [ ] Explicitly names `audio-director` as the correct handler
- [ ] May note if the audio has visual mood implications (e.g., "the audio should match the visual tension of the zone"), but defers all audio specification to audio-director
### Case 3: Gate verdict — correct vocabulary
**Scenario:** Concept art for the protagonist is submitted. The art uses a vivid, saturated color palette (primary: #FF4500, #00BFFF) that directly contradicts the established art bible's "desaturated earth-tones" palette specification. Request is tagged AD-CONCEPT-VISUAL.
**Expected:** Returns `AD-CONCEPT-VISUAL: CONCERNS` with specific citation of the palette discrepancy, referencing the art bible's stated palette values versus the submitted concept's palette.
**Assertions:**
- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT — not freeform text
- [ ] Verdict token is formatted as `AD-CONCEPT-VISUAL: CONCERNS`
- [ ] Rationale specifically identifies the palette conflict — not a generic "doesn't match style" comment
- [ ] References the art bible as the authoritative source for the correct palette
### Case 4: Conflict escalation — correct parent
**Scenario:** ux-designer proposes using high-contrast, brightly colored icons for the HUD to improve readability. art-director believes this violates the art bible's muted visual language and would undermine the visual identity.
**Expected:** art-director states the visual identity concern and references the art bible, acknowledges ux-designer's readability goal as legitimate, and escalates to creative-director to arbitrate the trade-off between visual coherence and usability.
**Assertions:**
- [ ] Escalates to `creative-director` (shared parent for creative domain conflicts)
- [ ] Does not unilaterally override ux-designer's readability recommendation
- [ ] Clearly frames the conflict as a trade-off between two legitimate goals
- [ ] References the specific art bible rule being violated
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the existing art bible with specific palette values (primary: #8B7355, #6B6B47; accent: #C8A96E) and style rules ("no pure white, no pure black; all shadows have warm undertones"). A new asset is submitted for review.
**Expected:** Assessment references the specific hex values and style rules from the provided art bible, not generic color theory advice. Any concerns are tied to specific violations of the provided rules.
**Assertions:**
- [ ] References specific palette values from the provided art bible context
- [ ] Applies the specific style rules (no pure white/black, warm shadow undertones) from the provided document
- [ ] Does not generate generic art direction feedback disconnected from the supplied art bible
- [ ] Verdict rationale is traceable to specific lines or rules in the provided context
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVE / CONCERNS / REJECT vocabulary only
- [ ] Stays within declared visual domain
- [ ] Escalates UX-vs-visual conflicts to creative-director
- [ ] Uses gate IDs in output (e.g., `AD-ART-BIBLE: APPROVE`) not inline prose verdicts
- [ ] Does not make binding UX interaction, audio, or code implementation decisions
---
## Coverage Notes
- AD-PHASE-GATE (full visual phase advancement) is not covered — deferred to integration with /gate-check skill.
- Compliance checks for asset pipeline standards (file format, resolution, naming conventions) are not covered here.
- Shader visual output review is not covered — that interaction with the engine specialist is deferred.
- UI component visual review (as distinct from UX flow review) could benefit from additional cases.

View file

@@ -0,0 +1,84 @@
# Agent Test Spec: creative-director
## Agent Summary
**Domain owned:** Creative vision, game pillars, GDD alignment, systems decomposition feedback, narrative direction, playtest feedback interpretation, phase gate (creative aspect).
**Does NOT own:** Technical architecture or implementation details (delegates to technical-director), production scheduling (producer), visual art style execution (delegates to art-director).
**Model tier:** Opus (multi-document synthesis, high-stakes phase gate verdicts).
**Gate IDs handled:** CD-PILLARS, CD-GDD-ALIGN, CD-SYSTEMS, CD-NARRATIVE, CD-PLAYTEST, CD-PHASE-GATE.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/creative-director.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references creative vision, pillars, GDD alignment — not generic)
- [ ] `allowed-tools:` list is read-heavy; should not include Bash unless justified by a creative workflow need
- [ ] Model tier is `claude-opus-4-6` per coordination-rules.md (directors with gate synthesis = Opus)
- [ ] Agent definition does not claim authority over technical architecture or production scheduling
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A game concept document is submitted for pillar review. The concept describes a narrative survival game built around three pillars: "emergent stories," "meaningful sacrifice," and "lived-in world." Request is tagged CD-PILLARS.
**Expected:** Returns `CD-PILLARS: APPROVE` with rationale citing how each pillar is represented in the concept and any reinforcing or weakening signals found in the document.
**Assertions:**
- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT
- [ ] Verdict token is formatted as `CD-PILLARS: APPROVE` (gate ID prefix, colon, verdict keyword)
- [ ] Rationale references the three specific pillars by name, not generic creative advice
- [ ] Output stays within creative scope — does not comment on engine feasibility or sprint schedule
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** Developer asks creative-director to review a proposed PostgreSQL schema for storing player save data.
**Expected:** Agent declines to evaluate the schema and redirects to technical-director.
**Assertions:**
- [ ] Does not make any binding decision about the schema design
- [ ] Explicitly names `technical-director` as the correct handler
- [ ] May note whether the data model has creative implications (e.g., what player data is tracked), but defers structural decisions entirely
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A GDD for the "Crafting" system is submitted. Section 4 (Formulas) defines a resource decay formula that punishes exploration — contradicting the Player Fantasy section which calls for "freedom to roam without fear." Request is tagged CD-GDD-ALIGN.
**Expected:** Returns `CD-GDD-ALIGN: CONCERNS` with specific citation of the contradiction between the formula behavior and the Player Fantasy statement.
**Assertions:**
- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT — not freeform text
- [ ] Verdict token is formatted as `CD-GDD-ALIGN: CONCERNS`
- [ ] Rationale quotes or directly references GDD Section 4 (Formulas) and the Player Fantasy section
- [ ] Does not prescribe a specific formula fix — that belongs to systems-designer
### Case 4: Conflict escalation — correct parent
**Scenario:** technical-director raises a concern that the core loop mechanic (real-time branching conversations) is prohibitively expensive to implement and recommends cutting it. creative-director disagrees on creative grounds.
**Expected:** creative-director acknowledges the technical constraint and does not override technical-director's feasibility assessment, but retains authority to define the creative goal. As the top-level creative escalation point, creative-director defers to technical-director on implementation feasibility while advocating for the design intent; the resolution path is for both to jointly present trade-off options to the user.
**Assertions:**
- [ ] Does not unilaterally override technical-director's feasibility concern
- [ ] Clearly separates "what we want creatively" from "how it gets built"
- [ ] Proposes presenting trade-offs to the user rather than resolving unilaterally
- [ ] Does not claim to own implementation decisions
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the game pillars document (`design/gdd/pillars.md`) and a new mechanic spec for review. The pillars document defines "player authorship," "consequence permanence," and "world responsiveness" as the three core pillars.
**Expected:** Assessment uses the exact pillar vocabulary from the provided document, not generic creative heuristics. Any approval or concern is tied back to one or more of the three named pillars.
**Assertions:**
- [ ] Uses the exact pillar names from the provided context document
- [ ] Does not generate generic creative feedback disconnected from the supplied pillars
- [ ] References the specific pillar(s) most relevant to the mechanic under review
- [ ] Does not reference pillars not present in the provided document
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVE / CONCERNS / REJECT vocabulary only
- [ ] Stays within declared creative domain
- [ ] Escalates conflicts by presenting trade-offs to user rather than unilateral override
- [ ] Uses gate IDs in output (e.g., `CD-PILLARS: APPROVE`) not inline prose verdicts
- [ ] Does not make binding cross-domain decisions (technical, production, art execution)
---
## Coverage Notes
- Multi-gate scenario (e.g., single submission triggering both CD-PILLARS and CD-GDD-ALIGN) is not covered here — deferred to integration tests.
- CD-PHASE-GATE (full phase advancement) involves synthesizing multiple sub-gate results; this complex case is deferred.
- Playtest report interpretation (CD-PLAYTEST) is not covered — a dedicated case should be added when the playtest-report skill produces structured output.
- Interaction with art-director on visual-pillar alignment is not covered.
# Agent Test Spec: producer
## Agent Summary
**Domain owned:** Scope management, sprint planning validation, milestone tracking, epic prioritization, production phase gate.
**Does NOT own:** Game design decisions (creative-director / game-designer), technical architecture (technical-director), creative direction.
**Model tier:** Opus (multi-document synthesis, high-stakes phase gate verdicts).
**Gate IDs handled:** PR-SCOPE, PR-SPRINT, PR-MILESTONE, PR-EPIC, PR-PHASE-GATE.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/producer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references scope, sprint, milestone, production — not generic)
- [ ] `allowed-tools:` list is primarily read-focused; Bash only if sprint/milestone files require parsing
- [ ] Model tier is `claude-opus-4-6` per coordination-rules.md (directors with gate synthesis = Opus)
- [ ] Agent definition does not claim authority over design decisions or technical architecture
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A sprint plan is submitted for Sprint 7. The plan includes 12 story points across 4 team members over 2 weeks. Historical velocity from the last 3 sprints averages 11.5 points. Request is tagged PR-SPRINT.
**Expected:** Returns `PR-SPRINT: REALISTIC` with rationale noting the plan is within one standard deviation of historical velocity and capacity appears matched.
**Assertions:**
- [ ] Verdict is exactly one of REALISTIC / CONCERNS / UNREALISTIC
- [ ] Verdict token is formatted as `PR-SPRINT: REALISTIC`
- [ ] Rationale references the specific story point count and historical velocity figures
- [ ] Output stays within production scope — does not comment on whether the stories are well-designed or technically sound
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** Team member asks producer to evaluate whether the game's "weight-based inventory" mechanic feels fun and engaging.
**Expected:** Agent declines to evaluate game feel and redirects to game-designer or creative-director.
**Assertions:**
- [ ] Does not make any binding assessment of the mechanic's design quality
- [ ] Explicitly names `game-designer` or `creative-director` as the correct handler
- [ ] May note if the mechanic's scope has production implications (e.g., dependencies on other systems), but defers all design evaluation
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A new feature proposal adds three new systems (crafting, weather, and faction reputation) to a milestone that was scoped for two systems only. None of these additions appear in the current milestone plan. Request is tagged PR-SCOPE.
**Expected:** Returns `PR-SCOPE: CONCERNS` with specific identification of the three unplanned systems and their absence from the milestone scope document.
**Assertions:**
- [ ] Verdict is exactly one of REALISTIC / CONCERNS / UNREALISTIC — not freeform text
- [ ] Verdict token is formatted as `PR-SCOPE: CONCERNS`
- [ ] Rationale names the three specific systems being added out of scope
- [ ] Does not evaluate whether the systems are good design — only whether they fit the plan
### Case 4: Conflict escalation — correct parent
**Scenario:** game-designer wants to add a late-breaking mechanic (dynamic weather affecting all gameplay systems) that technical-director warns will require 3 additional sprints. game-designer and technical-director are in disagreement about whether to proceed.
**Expected:** Producer does not take a side on whether the mechanic is worth adding (design decision) or feasible (technical decision). Producer quantifies the production impact (3 sprints of delay, milestone slip risk), presents the trade-off to the user, and follows coordination-rules.md conflict resolution: escalate to the shared parent (in this case, surface the conflict for user decision since creative-director and technical-director are both top-tier).
**Assertions:**
- [ ] Quantifies the production impact in concrete terms (sprint count, milestone date slip)
- [ ] Does not make a binding design or technical decision
- [ ] Surfaces the conflict to the user with the scope implications clearly stated
- [ ] References coordination-rules.md conflict resolution protocol (escalate to shared parent or user)
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the current milestone deadline (8 weeks away) and velocity data from the last 4 sprints (8, 10, 9, 11 points). A sprint plan is submitted with 14 story points.
**Expected:** Assessment uses the provided velocity data to project whether 14 points is achievable, and references the 8-week milestone window to assess whether the current sprint's scope leaves adequate buffer.
**Assertions:**
- [ ] Uses the specific velocity figures from the provided context (not generic estimates)
- [ ] References the 8-week deadline in the capacity assessment
- [ ] Calculates or estimates remaining sprint count within the milestone window
- [ ] Does not give generic scope advice disconnected from the supplied deadline and velocity data
---
## Protocol Compliance
- [ ] Returns verdicts using REALISTIC / CONCERNS / UNREALISTIC vocabulary only
- [ ] Stays within declared production domain
- [ ] Escalates design/technical conflicts by quantifying scope impact and presenting to user
- [ ] Uses gate IDs in output (e.g., `PR-SPRINT: REALISTIC`) not inline prose verdicts
- [ ] Does not make binding game design or technical architecture decisions
---
## Coverage Notes
- PR-EPIC (epic-level prioritization) is not covered — a dedicated case should be added when the /create-epics skill produces structured epic documents.
- PR-MILESTONE (milestone health review) is not covered — deferred to integration with /milestone-review skill.
- PR-PHASE-GATE (full production phase advancement) involving synthesis of multiple sub-gate results is deferred.
- Multi-sprint burn-down and velocity trend analysis are not covered here.
# Agent Test Spec: technical-director
## Agent Summary
**Domain owned:** System architecture decisions, technical feasibility assessment, ADR oversight and approval, engine risk evaluation, technical phase gate.
**Does NOT own:** Game design decisions (creative-director / game-designer), creative direction, visual art style, production scheduling (producer).
**Model tier:** Opus (multi-document synthesis, high-stakes architecture and phase gate verdicts).
**Gate IDs handled:** TD-SYSTEM-BOUNDARY, TD-FEASIBILITY, TD-ARCHITECTURE, TD-ADR, TD-ENGINE-RISK, TD-PHASE-GATE.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/technical-director.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references architecture, feasibility, ADR — not generic)
- [ ] `allowed-tools:` list may include Read for architecture documents; Bash only if required for technical checks
- [ ] Model tier is `claude-opus-4-6` per coordination-rules.md (directors with gate synthesis = Opus)
- [ ] Agent definition does not claim authority over game design decisions or creative direction
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** An architecture document for the "Combat System" is submitted. It describes a layered design: input layer → game logic layer → presentation layer, with clearly defined interfaces between each. Request is tagged TD-ARCHITECTURE.
**Expected:** Returns `TD-ARCHITECTURE: APPROVE` with rationale confirming that system boundaries are correctly separated and interfaces are well-defined.
**Assertions:**
- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT
- [ ] Verdict token is formatted as `TD-ARCHITECTURE: APPROVE`
- [ ] Rationale specifically references the layered structure and interface definitions — not generic architecture advice
- [ ] Output stays within technical scope — does not comment on whether the mechanic is fun or fits the creative vision
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** Writer asks technical-director to review and approve the dialogue scripts for the game's opening cutscene.
**Expected:** Agent declines to evaluate dialogue quality and redirects to narrative-director.
**Assertions:**
- [ ] Does not make any binding decision about the dialogue content or structure
- [ ] Explicitly names `narrative-director` as the correct handler
- [ ] May note technical constraints that affect dialogue (e.g., localization string limits, data format), but defers all content decisions
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A proposed multiplayer mechanic requires raycasting against all active entities every frame to detect line-of-sight. At expected player counts (1000 entities in a large zone), this is O(n²) per frame. Request is tagged TD-FEASIBILITY.
**Expected:** Returns `TD-FEASIBILITY: CONCERNS` with specific citation of the O(n²) complexity and the entity count that makes this infeasible at target framerate.
**Assertions:**
- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT — not freeform text
- [ ] Verdict token is formatted as `TD-FEASIBILITY: CONCERNS`
- [ ] Rationale includes the specific algorithmic complexity concern and the entity count threshold
- [ ] Suggests at least one alternative approach (e.g., spatial partitioning, interest management) without mandating which to choose
### Case 4: Conflict escalation — correct parent
**Scenario:** game-designer wants to add a real-time physics simulation for every inventory item (hundreds of items on screen simultaneously). technical-director assesses this as technically expensive and proposes simplifying the simulation. game-designer disagrees, arguing it is essential to the game feel.
**Expected:** technical-director clearly states the technical cost and constraints, proposes alternative implementation approaches that could achieve a similar feel, but explicitly defers the final design priority decision to creative-director as the arbiter of player experience trade-offs.
**Assertions:**
- [ ] Expresses the technical concern with specifics (e.g., performance budget, estimated cost)
- [ ] Proposes at least one alternative that could reduce cost while preserving intent
- [ ] Explicitly defers the "is this worth the cost" decision to creative-director — does not unilaterally cut the feature
- [ ] Does not claim authority to override game-designer's design intent
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the target platform constraints: mobile, 60fps target, 2GB RAM ceiling, no compute shaders. A proposed architecture includes a GPU-driven rendering pipeline.
**Expected:** Assessment references the specific hardware constraints from the context, identifies the compute shader dependency as incompatible with the stated platform constraints, and returns a CONCERNS or REJECT verdict with those specifics cited.
**Assertions:**
- [ ] References the specific platform constraints provided (mobile, 2GB RAM, no compute shaders)
- [ ] Does not give generic performance advice disconnected from the supplied constraints
- [ ] Correctly identifies the architectural component that conflicts with the platform constraint
- [ ] Verdict includes rationale tied to the provided context, not boilerplate warnings
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVE / CONCERNS / REJECT vocabulary only
- [ ] Stays within declared technical domain
- [ ] Defers design priority conflicts to creative-director
- [ ] Uses gate IDs in output (e.g., `TD-FEASIBILITY: CONCERNS`) not inline prose verdicts
- [ ] Does not make binding game design or creative direction decisions
---
## Coverage Notes
- TD-ADR (Architecture Decision Record approval) is not covered — a dedicated case should be added when the /architecture-decision skill produces ADR documents.
- TD-ENGINE-RISK assessment for specific engine versions (e.g., Godot 4.6 post-cutoff APIs) is not covered — deferred to engine-specialist integration tests.
- TD-PHASE-GATE (full technical phase advancement) involving synthesis of multiple sub-gate results is deferred.
- Multi-domain architecture reviews (e.g., touching both TD-ARCHITECTURE and TD-ENGINE-RISK simultaneously) are not covered here.
# Agent Test Spec: godot-csharp-specialist
## Agent Summary
Domain: C# patterns in Godot 4, .NET idioms applied to Godot, [Export] attribute usage, signal delegates, and async/await patterns.
Does NOT own: GDScript code (gdscript-specialist), GDExtension C/C++ bindings (gdextension-specialist).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references C# in Godot 4 / .NET patterns / signal delegates)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over GDScript or GDExtension code
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Create an export property for enemy health with validation that clamps it between 1 and 1000."
**Expected behavior:**
- Produces a C# property with `[Export]` attribute
- Uses a backing field with a property getter/setter that clamps the value in the setter
- Does NOT use a raw `[Export]` public field without validation
- Follows Godot 4 C# naming conventions (PascalCase for properties, private fields prefixed with an underscore)
- Includes XML doc comment on the property per coding standards
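A minimal sketch of the expected shape, assuming an illustrative `Enemy` class (the class name and base type are hypothetical; the `[Export]`-on-property pattern and `Mathf.Clamp` are standard Godot 4 C#):

```csharp
using Godot;

public partial class Enemy : CharacterBody2D
{
    private int _health = 100;

    /// <summary>Enemy health, clamped to the valid range on assignment.</summary>
    [Export(PropertyHint.Range, "1,1000")]
    public int Health
    {
        get => _health;
        set => _health = Mathf.Clamp(value, 1, 1000);
    }
}
```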
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Rewrite this enemy health system in GDScript."
**Expected behavior:**
- Does NOT produce GDScript code
- Explicitly states that GDScript authoring belongs to `godot-gdscript-specialist`
- Redirects the request to `godot-gdscript-specialist`
- May note that the C# interface can be described so the gdscript-specialist knows the expected API shape
### Case 3: Async signal awaiting
**Input:** "Wait for an animation to finish before transitioning game state using C# async."
**Expected behavior:**
- Produces a proper `async Task` pattern using `ToSignal()` to await a Godot signal
- Uses `await ToSignal(animationPlayer, AnimationPlayer.SignalName.AnimationFinished)`
- Does NOT use `Thread.Sleep()` or `Task.Delay()` as a polling substitute
- Notes that the calling method must be `async` and that fire-and-forget `async void` is only acceptable for event handlers
- Handles cancellation or timeout if the animation could fail to fire
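A sketch of the awaited-signal pattern, assuming an illustrative state machine node (`TransitionTo` and the node names are hypothetical; `ToSignal` and `AnimationPlayer.SignalName.AnimationFinished` are real Godot 4 C# APIs):

```csharp
using Godot;
using System.Threading.Tasks;

public partial class StateMachine : Node
{
    [Export] public AnimationPlayer Anim { get; set; }

    // Await the AnimationFinished signal, then transition state.
    private async Task PlayDeathThenTransitionAsync()
    {
        Anim.Play("death");
        await ToSignal(Anim, AnimationPlayer.SignalName.AnimationFinished);
        TransitionTo("game_over"); // hypothetical state-change helper
    }

    private void TransitionTo(string state) { /* ... */ }
}
```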
### Case 4: Threading model conflict
**Input:** "This C# code accesses a Godot Node from a background Task thread to update its position."
**Expected behavior:**
- Flags this as a race condition risk: Godot nodes are not thread-safe and must only be accessed from the main thread
- Does NOT approve or implement the multi-threaded node access pattern
- Provides the correct pattern: use `CallDeferred()`, `Callable.From().CallDeferred()`, or marshal back to the main thread via a thread-safe queue
- Explains the distinction between Godot's main thread requirement and .NET's thread-agnostic types
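A sketch of the corrected pattern, assuming an illustrative node that does heavy work off the main thread (`ComputeTargetPosition` is hypothetical; `Callable.From(...).CallDeferred()` is the real Godot 4 C# marshalling API):

```csharp
using Godot;
using System.Threading.Tasks;

public partial class Mover : Node3D
{
    public void StartBackgroundWork()
    {
        Task.Run(() =>
        {
            // Heavy computation is fine on a background thread.
            Vector3 target = ComputeTargetPosition();

            // Never touch the node here; defer the write to the main thread.
            Callable.From(() => Position = target).CallDeferred();
        });
    }

    private Vector3 ComputeTargetPosition() => new Vector3(1, 0, 0);
}
```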
### Case 5: Context pass — Godot 4.6 API correctness
**Input:** Engine version context: Godot 4.6. Request: "Connect a signal using the new typed signal delegate pattern."
**Expected behavior:**
- Produces C# signal connection using the typed delegate pattern introduced in Godot 4 C# (`+=` operator on typed signal)
- Checks the 4.6 context to confirm no breaking changes to the signal delegate API in 4.4, 4.5, or 4.6
- Does NOT use the old string-based `Connect("signal_name", callable)` pattern (deprecated in Godot 4 C#)
- Produces code compatible with the project's pinned 4.6 version as documented in VERSION.md
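A sketch of the typed-delegate pattern, assuming an illustrative `Door` node (the signal name and handler are hypothetical; the `[Signal]` delegate with the `EventHandler` suffix and the generated `+=` subscription are the standard Godot 4 C# mechanism):

```csharp
using Godot;

public partial class Door : Node
{
    [Signal]
    public delegate void OpenedEventHandler(int byPlayerId);

    public override void _Ready()
    {
        // Typed delegate subscription: no string-based Connect().
        Opened += OnOpened;
        EmitSignal(SignalName.Opened, 1);
    }

    private void OnOpened(int byPlayerId) => GD.Print($"Opened by {byPlayerId}");
}
```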
---
## Protocol Compliance
- [ ] Stays within declared domain (C# in Godot 4 — patterns, exports, signals, async)
- [ ] Redirects GDScript requests to godot-gdscript-specialist
- [ ] Redirects GDExtension requests to godot-gdextension-specialist
- [ ] Returns C# code following Godot 4 conventions (not Unity MonoBehaviour patterns)
- [ ] Flags multi-threaded Godot node access as unsafe and provides the correct pattern
- [ ] Uses typed signal delegates — not deprecated string-based Connect() calls
- [ ] Checks engine version reference for API changes before producing code
---
## Coverage Notes
- Export property with validation (Case 1) should have a unit test verifying the clamp behavior
- Threading conflict (Case 4) is safety-critical: the agent must identify and fix this without prompting
- Async signal (Case 3) verifies the agent applies .NET idioms correctly within Godot's single-thread constraint
# Agent Test Spec: godot-gdextension-specialist
## Agent Summary
Domain: GDExtension API, godot-cpp C++ bindings, godot-rust bindings, native library integration, and native performance optimization.
Does NOT own: GDScript code (gdscript-specialist), shader code (godot-shader-specialist).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references GDExtension / godot-cpp / native bindings)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over GDScript or shader authoring
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Expose a C++ rigid-body physics simulation library to GDScript via GDExtension."
**Expected behavior:**
- Produces a GDExtension binding pattern using godot-cpp:
- Class inheriting from `godot::Object` or an appropriate Godot base class
- `GDCLASS` macro registration
- `_bind_methods()` implementation exposing the physics API to GDScript
- `GDExtension` entry point (`gdextension_init`) setup
- Notes the `.gdextension` manifest file format required
- Does NOT produce the GDScript usage code (that belongs to gdscript-specialist)
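A minimal godot-cpp binding sketch matching the pattern above (the class name and method are illustrative, not a real simulation; `GDCLASS`, `ClassDB::bind_method`, and `D_METHOD` are the standard godot-cpp registration macros):

```cpp
// physics_sim.h -- minimal godot-cpp binding sketch
#include <godot_cpp/classes/ref_counted.hpp>
#include <godot_cpp/core/class_db.hpp>

namespace godot {

class PhysicsSim : public RefCounted {
    GDCLASS(PhysicsSim, RefCounted)

protected:
    static void _bind_methods() {
        // Exposes step(delta) to GDScript.
        ClassDB::bind_method(D_METHOD("step", "delta"), &PhysicsSim::step);
    }

public:
    void step(double delta) { /* advance the native simulation */ }
};

} // namespace godot
```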
### Case 2: Out-of-domain redirect
**Input:** "Write the GDScript that calls the physics simulation from Case 1."
**Expected behavior:**
- Does NOT produce GDScript code
- Explicitly states that GDScript authoring belongs to `godot-gdscript-specialist`
- Redirects to `godot-gdscript-specialist`
- May describe the API surface the GDScript should call (method names, parameter types) as a handoff spec
### Case 3: ABI compatibility risk — minor version update
**Input:** "We're upgrading from Godot 4.5 to 4.6. Will our existing GDExtension still work?"
**Expected behavior:**
- Flags the ABI compatibility concern: GDExtension binaries may not be ABI-compatible across minor versions
- Directs to check the 4.5→4.6 migration guide for GDExtension API changes
- Recommends recompiling the extension against the 4.6 godot-cpp headers rather than assuming binary compatibility
- Notes that the `.gdextension` manifest may need a `compatibility_minimum` version update
- Provides the recompilation checklist
### Case 4: Memory management — RAII for Godot objects
**Input:** "How should we manage the lifecycle of Godot objects created inside C++ GDExtension code?"
**Expected behavior:**
- Produces the RAII-based lifecycle pattern for Godot objects in GDExtension:
- `Ref<T>` for reference-counted objects (auto-released when Ref goes out of scope)
- `memnew()` / `memdelete()` for non-reference-counted objects
- Warning: do NOT use `new`/`delete` for Godot objects — undefined behavior
- Notes object ownership rules: who is responsible for freeing a node added to the scene tree
- Provides a concrete example managing a `CollisionShape3D` created in C++
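A sketch of the ownership rules the agent should produce (the function is illustrative; `Ref<T>`, `instantiate()`, and `memnew` are the real godot-cpp lifecycle primitives):

```cpp
#include <godot_cpp/classes/box_shape3d.hpp>
#include <godot_cpp/classes/collision_shape3d.hpp>
#include <godot_cpp/classes/node3d.hpp>

using namespace godot;

void add_collision(Node3D *parent) {
    // BoxShape3D is a Resource: reference-counted, managed by Ref<T>.
    Ref<BoxShape3D> box;
    box.instantiate(); // released automatically when the last Ref goes out of scope

    // CollisionShape3D is a Node: allocate with memnew, never raw new.
    CollisionShape3D *shape = memnew(CollisionShape3D);
    shape->set_shape(box);

    // Once added to the tree, the tree owns the node; do not memdelete it here.
    parent->add_child(shape);
}
```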
### Case 5: Context pass — Godot 4.6 GDExtension API check
**Input:** Engine version context: Godot 4.6 (upgrading from 4.5). Request: "Check if any GDExtension APIs changed from 4.5 to 4.6."
**Expected behavior:**
- References the 4.5→4.6 migration guide from the VERSION.md verified sources list
- Reports on any documented GDExtension API changes in the 4.6 release
- If no breaking changes are documented for GDExtension in 4.6, states that explicitly with the caveat to verify against the official changelog
- Flags the D3D12 default on Windows (4.6 change) as potentially relevant for GDExtension rendering code
- Provides a checklist of what to verify after upgrading
---
## Protocol Compliance
- [ ] Stays within declared domain (GDExtension, godot-cpp, godot-rust, native bindings)
- [ ] Redirects GDScript authoring to godot-gdscript-specialist
- [ ] Redirects shader authoring to godot-shader-specialist
- [ ] Returns structured output (binding patterns, RAII examples, ABI checklists)
- [ ] Flags ABI compatibility risks on minor version upgrades — never assumes binary compatibility
- [ ] Uses Godot-specific memory management (`memnew`/`memdelete`, `Ref<T>`) not raw C++ new/delete
- [ ] Checks engine version reference for GDExtension API changes before confirming compatibility
---
## Coverage Notes
- Binding pattern (Case 1) should include a smoke test verifying the extension loads and the method is callable from GDScript
- ABI risk (Case 3) is a critical escalation path — the agent must not approve shipping an unverified extension binary
- Memory management (Case 4) verifies the agent applies Godot-specific patterns, not generic C++ RAII
# Agent Test Spec: godot-gdscript-specialist
## Agent Summary
Domain: GDScript static typing, design patterns in GDScript, signal architecture, coroutine/await patterns, and GDScript performance.
Does NOT own: shader code (godot-shader-specialist), GDExtension bindings (godot-gdextension-specialist).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references GDScript / static typing / signals / coroutines)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over shader code or GDExtension
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Review this GDScript file for type annotation coverage."
**Expected behavior:**
- Reads the provided GDScript file
- Flags every variable, parameter, and return type that is missing a static type annotation
- Produces a list of specific line-by-line findings: `var speed = 5.0``var speed: float = 5.0`
- Notes the performance and tooling benefits of static typing in Godot 4
- Does NOT rewrite the entire file unprompted — produces a findings list for the developer to apply
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Write a vertex shader to distort the mesh in world space."
**Expected behavior:**
- Does NOT produce shader code in GDScript or in Godot's shading language
- Explicitly states that shader authoring belongs to `godot-shader-specialist`
- Redirects the request to `godot-shader-specialist`
- May note that the GDScript side (passing uniforms to a shader, setting shader parameters) is within its domain
### Case 3: Async loading with coroutines
**Input:** "Load a scene asynchronously and wait for it to finish before spawning it."
**Expected behavior:**
- Produces an `await` + `ResourceLoader.load_threaded_request` pattern for Godot 4
- Uses static typing throughout (`var scene: PackedScene`)
- Handles the completion check with `ResourceLoader.load_threaded_get_status()`
- Notes error handling for failed loads
- Does NOT use deprecated Godot 3 `yield()` syntax
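A sketch of the expected Godot 4 pattern (the scene path is illustrative; the `ResourceLoader.load_threaded_*` calls and `await get_tree().process_frame` are the real Godot 4 APIs):

```gdscript
const LEVEL_PATH := "res://levels/level_01.tscn"  # illustrative path

func spawn_level_async() -> void:
    ResourceLoader.load_threaded_request(LEVEL_PATH)
    while ResourceLoader.load_threaded_get_status(LEVEL_PATH) == ResourceLoader.THREAD_LOAD_IN_PROGRESS:
        await get_tree().process_frame
    if ResourceLoader.load_threaded_get_status(LEVEL_PATH) != ResourceLoader.THREAD_LOAD_LOADED:
        push_error("Failed to load %s" % LEVEL_PATH)
        return
    var scene: PackedScene = ResourceLoader.load_threaded_get(LEVEL_PATH)
    add_child(scene.instantiate())
```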
### Case 4: Performance issue — typed array recommendation
**Input:** "The entity update loop is slow; it iterates an untyped Array of 1,000 nodes every frame."
**Expected behavior:**
- [ ] Identifies that an untyped `Array` forgoes compiler optimization in GDScript
- [ ] Recommends converting to a typed array (`Array[Node]` or the specific type) so the compiler can specialize element access
- [ ] Notes that if this is still insufficient, recommends migrating only the hot path to C#
- Produces the typed array refactor as the immediate fix
- Does NOT recommend migrating the entire codebase to C# without profiling evidence
### Case 5: Context pass — Godot 4.6 with post-cutoff features
**Input:** Engine version context provided: Godot 4.6. Request: "Create an abstract base class for all enemy types using @abstract."
**Expected behavior:**
- Identifies `@abstract` as a Godot 4.5+ feature (post-cutoff)
- Notes this in the output: feature introduced in 4.5, verified against VERSION.md migration notes
- Produces the GDScript class using `@abstract` with correct syntax as documented in migration notes
- Marks the output as requiring verification against the official 4.5 release notes due to post-cutoff status
- Uses static typing for all method signatures in the abstract class
---
## Protocol Compliance
- [ ] Stays within declared domain (GDScript — typing, patterns, signals, coroutines, performance)
- [ ] Redirects shader requests to godot-shader-specialist
- [ ] Redirects GDExtension requests to godot-gdextension-specialist
- [ ] Returns structured GDScript output with full static typing
- [ ] Uses Godot 4 API only — no deprecated Godot 3 patterns (yield, connect with strings, etc.)
- [ ] Flags post-cutoff features (4.4, 4.5, 4.6) and marks them as requiring doc verification
---
## Coverage Notes
- Type annotation review (Case 1) output is suitable as a code review checklist
- Async loading (Case 3) should produce testable code verifiable with a unit test in `tests/unit/`
- Post-cutoff @abstract (Case 5) confirms the agent flags version uncertainty rather than silently using unverified APIs
# Agent Test Spec: godot-shader-specialist
## Agent Summary
Domain: Godot shading language (GLSL-derivative), visual shaders (VisualShader graph), material setup, particle shaders, and post-processing effects.
Does NOT own: gameplay code, art style direction.
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Godot shading language / materials / post-processing)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition references `docs/engine-reference/godot/VERSION.md` as the authoritative source for Godot shader API changes
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Write a dissolve effect shader for enemy death in Godot."
**Expected behavior:**
- Produces valid Godot shading language code (not HLSL, not GLSL directly)
- Uses `shader_type spatial;` or `canvas_item` as appropriate
- Defines `uniform float dissolve_amount : hint_range(0.0, 1.0);`
- Samples a noise texture to determine per-pixel dissolve threshold
- Uses `discard;` for pixels below the threshold
- Optionally adds an edge glow using emission near the dissolve boundary
- Code is syntactically correct for Godot's shading language
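A sketch of the expected dissolve shader (uniform names and constants are illustrative; `shader_type spatial`, `hint_range`, `source_color`, and `discard` are standard Godot 4 shading language):

```glsl
shader_type spatial;

uniform float dissolve_amount : hint_range(0.0, 1.0) = 0.0;
uniform sampler2D noise_tex;
uniform vec4 edge_color : source_color = vec4(1.0, 0.5, 0.0, 1.0);

void fragment() {
    float noise = texture(noise_tex, UV).r;
    if (noise < dissolve_amount) {
        discard; // pixel has fully dissolved
    }
    // Emissive glow near the dissolve boundary.
    float edge = smoothstep(dissolve_amount, dissolve_amount + 0.05, noise);
    ALBEDO = vec3(0.8);
    EMISSION = edge_color.rgb * (1.0 - edge) * 2.0;
}
```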
### Case 2: HLSL redirect
**Input:** "Write an HLSL compute shader for this dissolve effect."
**Expected behavior:**
- Does NOT produce HLSL code
- Clearly states: "Godot does not use HLSL directly; it uses its own shading language (a GLSL derivative)"
- Translates the HLSL intent to the equivalent Godot shader approach
- Notes that RenderingDevice compute shaders are available in Godot 4 but are a low-level API, and flags that path explicitly if compute was the actual intent
### Case 3: Post-cutoff API change — texture sampling (Godot 4.4)
**Input:** "Use `texture()` with a sampler2D to sample the noise texture in the shader."
**Expected behavior:**
- Checks the version reference: Godot 4.4 changed texture sampler type declarations
- Flags the potential API change: `sampler2D` syntax and `texture()` call behavior may differ from pre-4.4
- Provides the correct syntax for the project's pinned version (4.6) as documented in migration notes
- Does NOT use pre-4.4 texture sampling syntax without flagging the version risk
### Case 4: Fragment shader LOD strategy
**Input:** "The fragment shader for the water surface has 8 texture samples and is causing GPU bottlenecks on mid-range hardware."
**Expected behavior:**
- Identifies the per-fragment texture sample count as the primary cost driver
- Proposes an LOD strategy:
- Reduce sample count at distance (distance-based shader variant or LOD level)
- Pre-bake some texture combinations offline
- Use lower-resolution noise textures for distant samples
- Provides the shader code modification implementing the LOD approach
- Does NOT change gameplay behavior of the water system
### Case 5: Context pass — Godot 4.6 glow rework
**Input:** Engine version context: Godot 4.6. Request: "Add a bloom/glow post-processing effect to the scene."
**Expected behavior:**
- References the VERSION.md note: Godot 4.6 includes a glow rework
- Produces glow configuration guidance using the 4.6 WorldEnvironment approach, not the pre-4.6 API
- Explicitly notes which properties or parameters changed in the 4.6 glow rework
- Flags any properties that the LLM's training data may have incorrect information about due to the post-cutoff timing
---
## Protocol Compliance
- [ ] Stays within declared domain (Godot shading language, materials, VFX shaders, post-processing)
- [ ] Redirects gameplay code requests to gameplay-programmer
- [ ] Produces valid Godot shading language — never HLSL or raw GLSL without a Godot wrapper
- [ ] Checks engine version reference for post-cutoff shader API changes (4.4 texture types, 4.6 glow rework)
- [ ] Returns structured output (shader code with uniforms documented, LOD strategies with performance rationale)
- [ ] Flags any post-cutoff API usage as requiring verification
---
## Coverage Notes
- Dissolve shader (Case 1) should be paired with a visual test screenshot in `production/qa/evidence/`
- Texture API flag (Case 3) confirms the agent checks VERSION.md before using APIs that changed post-4.3
- Glow rework (Case 5) is a Godot 4.6-specific test — verifies the agent applies the most recent migration notes


@ -0,0 +1,82 @@
# Agent Test Spec: godot-specialist
## Agent Summary
Domain: Godot-specific patterns, node/scene architecture, signals, resources, and GDScript vs C# vs GDExtension decisions.
Does NOT own: actual code authoring in a specific language (delegates to language sub-specialists).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Godot architecture / node patterns / engine decisions)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition references `docs/engine-reference/godot/VERSION.md` as the authoritative API source
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "When should I use signals vs. direct method calls in Godot?"
**Expected behavior:**
- Produces a pattern decision guide with rationale:
  - Signals: decoupled communication (the emitter stays ignorant of its listeners), event-driven UI updates, one-to-many notification
- Direct calls: tightly-coupled systems where the caller needs a return value, or performance-critical hot paths
- Provides concrete examples of each pattern in the project's context
- Does NOT produce raw implementation code for both patterns — refers to godot-gdscript-specialist or godot-csharp-specialist for implementation
- Notes the "call down, signal up" convention (a child does not call parent methods directly — it emits a signal the parent connects to)
### Case 2: Wrong-engine redirect
**Input:** "Write a MonoBehaviour that runs on Start() and subscribes to a UnityEvent."
**Expected behavior:**
- Does NOT produce Unity MonoBehaviour code
- Clearly identifies that this is a Unity pattern, not a Godot pattern
- Provides the Godot equivalent: a Node script using `_ready()` instead of `Start()`, and Godot signals instead of UnityEvent
- Confirms the project is Godot-based and redirects the conceptual mapping
### Case 3: Post-cutoff API risk
**Input:** "Use the new Godot 4.5 @abstract annotation to define an abstract base class."
**Expected behavior:**
- Identifies that `@abstract` is a post-cutoff feature (introduced in Godot 4.5, after LLM knowledge cutoff)
- Flags the version risk: LLM knowledge of this annotation may be incomplete or incorrect
- Directs the user to verify against `docs/engine-reference/godot/VERSION.md` and the official 4.5 migration guide
- Provides best-effort guidance based on the migration notes in the version reference while clearly marking it as unverified
### Case 4: Language selection for a hot path
**Input:** "The physics query loop runs every frame for 500 objects. Should we use GDScript or C# for this?"
**Expected behavior:**
- Provides a balanced analysis:
- GDScript: simpler, team familiar, but slower for tight loops
- C#: faster for CPU-intensive loops, requires .NET runtime, team needs C# knowledge
- Does NOT make the final decision unilaterally
- Defers the decision to `lead-programmer` with the analysis as input
- Notes that GDExtension (C++) is a third option for extreme performance cases and recommends escalating if C# is insufficient
### Case 5: Context pass — engine version 4.6
**Input:** Engine version context provided: Godot 4.6, Jolt as default physics. Request: "Set up a RigidBody3D for the player character."
**Expected behavior:**
- Reads the 4.6 context and applies the Jolt-default knowledge (from VERSION.md migration notes)
- Recommends RigidBody3D configuration choices that are Jolt-compatible (e.g., notes any GodotPhysics-specific settings that behave differently under Jolt)
- References the 4.6 migration note about Jolt becoming default rather than relying on LLM training data alone
- Flags any RigidBody3D properties that changed behavior between GodotPhysics and Jolt
---
## Protocol Compliance
- [ ] Stays within declared domain (Godot architecture decisions, node/scene patterns, language selection)
- [ ] Redirects language-specific implementation to godot-gdscript-specialist or godot-csharp-specialist
- [ ] Returns structured findings (decision trees, pattern recommendations with rationale)
- [ ] Treats `docs/engine-reference/godot/VERSION.md` as authoritative over LLM training data
- [ ] Flags post-cutoff API usage (4.4, 4.5, 4.6) with verification requirements
- [ ] Defers language-selection decisions to lead-programmer when trade-offs exist
---
## Coverage Notes
- Signal vs. direct call guide (Case 1) should be written to `docs/architecture/` as a reusable pattern doc
- Post-cutoff flag (Case 3) confirms the agent does not confidently use APIs it cannot verify
- Engine version case (Case 5) verifies the agent applies migration notes from the version reference, not assumptions


@ -0,0 +1,87 @@
# Agent Test Spec: unity-addressables-specialist
## Agent Summary
Domain: Addressable Asset System — groups, async loading/unloading, handle lifecycle management, memory budgeting, content catalogs, and remote content delivery.
Does NOT own: rendering systems (engine-programmer), game logic that uses the loaded assets (gameplay-programmer).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Addressables / asset loading / content catalogs / remote delivery)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over rendering systems or gameplay using the loaded assets
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Load a character texture asynchronously and release it when the character is destroyed."
**Expected behavior:**
- Produces the `Addressables.LoadAssetAsync<Texture2D>()` call pattern
- Stores the returned `AsyncOperationHandle<Texture2D>` in the requesting object
- On character destruction (`OnDestroy()`), calls `Addressables.Release(handle)` with the stored handle
- Does NOT use `Resources.Load()` as the loading mechanism
- Notes that releasing a null or uninitialized handle causes errors — includes an `IsValid()` check before release
- Notes the difference between releasing the handle vs. releasing the asset (handle release is correct)
### Case 2: Out-of-domain redirect
**Input:** "Implement the rendering system that applies the loaded texture to the character mesh."
**Expected behavior:**
- Does NOT produce rendering or mesh material assignment code
- Explicitly states that rendering system implementation belongs to `engine-programmer`
- Redirects the request to `engine-programmer`
- May describe the asset type and API surface it will provide (e.g., `Texture2D` reference once the handle completes) as a handoff spec
### Case 3: Memory leak — un-released handle
**Input:** "Memory usage keeps climbing after each level load. We use Addressables to load level assets."
**Expected behavior:**
- Diagnoses the likely cause: `AsyncOperationHandle` objects not being released after use
- Identifies the handle leak pattern: loading assets into a local variable, losing reference, never calling `Addressables.Release()`
- Produces an auditing approach: search for all `LoadAssetAsync` / `LoadSceneAsync` calls and verify matching `Release()` calls
- Provides a corrected pattern using a tracked handle list (`List<AsyncOperationHandle>`) with a `ReleaseAll()` cleanup method
- Does NOT assume the leak is elsewhere without evidence
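The tracked-handle pattern described above can be modeled in a language-agnostic way. This Python sketch mirrors the shape of the C# fix (a tracked list plus a single cleanup call); the class and callback names are illustrative, not the real Addressables API:

```python
# Model of the tracked-handle pattern: every load registers its handle,
# and one release_all() call frees everything, preventing handle leaks.
class HandleTracker:
    def __init__(self, release_fn):
        # release_fn stands in for Addressables.Release in Unity.
        self._release = release_fn
        self._handles = []

    def track(self, handle):
        """Register a handle at load time; returns it for convenience."""
        self._handles.append(handle)
        return handle

    def release_all(self):
        """Release every tracked handle, e.g. on level unload."""
        for handle in self._handles:
            self._release(handle)
        self._handles.clear()
```

The key property is that a load can never "lose" its handle: tracking happens at the load site, so the audit reduces to checking that every loader owns a tracker whose `release_all()` is called on teardown.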
### Case 4: Remote content delivery — catalog versioning
**Input:** "We need to support downloadable content updates without requiring a full app re-install."
**Expected behavior:**
- Produces the remote catalog update pattern:
- `Addressables.CheckForCatalogUpdates()` on startup
- `Addressables.UpdateCatalogs()` for detected updates
- `Addressables.DownloadDependenciesAsync()` to pre-warm the updated content
- Notes catalog hash checking for change detection
- Addresses the edge case where the catalog updates mid-session: defines the behavior (complete the current session on the old catalog, reload on next launch)
- Does NOT design the server-side CDN infrastructure (defers to devops-engineer)
### Case 5: Context pass — platform memory constraints
**Input:** Platform context: Nintendo Switch target, 4GB RAM, practical asset memory ceiling 512MB. Request: "Design the Addressables loading strategy for a large open-world level."
**Expected behavior:**
- References the 512MB memory ceiling from the provided context
- Designs a streaming strategy:
- Divide the world into addressable zones loaded/unloaded based on player proximity
- Defines a memory budget per active zone (e.g., 128MB, max 4 zones active)
- Specifies async pre-load trigger distance and unload distance (hysteresis)
- Notes Switch-specific constraints: slower load times from SD card; recommends pre-warming adjacent zones
- Does NOT produce a loading strategy that would exceed the stated 512MB ceiling without flagging it
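The zone-budget arithmetic above can be checked with a back-of-envelope sketch. The 512 MB ceiling and 128 MB per-zone budget come from the case; the load/unload radii and the zone-selection logic are illustrative assumptions:

```python
# Streaming-zone budget check with load/unload hysteresis.
MEMORY_CEILING_MB = 512
ZONE_BUDGET_MB = 128
MAX_ACTIVE_ZONES = MEMORY_CEILING_MB // ZONE_BUDGET_MB  # 4 zones

LOAD_RADIUS = 150.0    # pre-load zones closer than this (assumed)
UNLOAD_RADIUS = 200.0  # unload only beyond this — the hysteresis gap

def zones_to_load(zone_distances, currently_loaded):
    """Decide the active zone set from per-zone player distances."""
    loaded = set(currently_loaded)
    # Nearest zones get budget priority.
    for zone, dist in sorted(zone_distances.items(), key=lambda kv: kv[1]):
        if dist <= LOAD_RADIUS and len(loaded) < MAX_ACTIVE_ZONES:
            loaded.add(zone)
        elif dist > UNLOAD_RADIUS:
            loaded.discard(zone)
        # Zones between the two radii keep their current state (hysteresis).
    return loaded
```

The hysteresis gap matters: a zone at 180 m stays loaded if it already was, so a player pacing back and forth across a boundary does not trigger load/unload thrash.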
---
## Protocol Compliance
- [ ] Stays within declared domain (Addressables loading, handle lifecycle, memory, catalogs, remote delivery)
- [ ] Redirects rendering and gameplay asset-use code to engine-programmer and gameplay-programmer
- [ ] Returns structured output (loading patterns, handle lifecycle code, streaming zone designs)
- [ ] Always pairs `LoadAssetAsync` with a corresponding `Release()` — flags handle leaks as a memory bug
- [ ] Designs loading strategies against provided memory ceilings
- [ ] Does not design CDN/server infrastructure — defers to devops-engineer for server side
---
## Coverage Notes
- Handle lifecycle (Case 1) must include a test verifying memory is reclaimed after release
- Handle leak diagnosis (Case 3) should produce a findings report suitable for a bug ticket
- Platform memory case (Case 5) verifies the agent applies hard constraints from context, not default assumptions


@ -0,0 +1,87 @@
# Agent Test Spec: unity-dots-specialist
## Agent Summary
Domain: ECS architecture (IComponentData, ISystem, SystemAPI), Jobs system (IJob, IJobEntity, Burst), Burst compiler constraints, DOTS gameplay systems, and hybrid renderer.
Does NOT own: MonoBehaviour gameplay code (gameplay-programmer), UI implementation (unity-ui-specialist).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references ECS / Jobs / Burst / IComponentData)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over MonoBehaviour gameplay or UI systems
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Convert the player movement system to ECS."
**Expected behavior:**
- Produces:
- `PlayerMovementData : IComponentData` struct with velocity, speed, and input vector fields
- `PlayerMovementSystem : ISystem` with `OnUpdate()` using `SystemAPI.Query<>` or `IJobEntity`
- Bakes the player's initial state from an authoring MonoBehaviour via `IBaker`
- Uses `RefRW<LocalTransform>` for position updates (not deprecated `Translation`)
- Marks the job `[BurstCompile]` and notes what must be unmanaged for Burst compatibility
- Does NOT modify the input polling system — reads from an existing `PlayerInputData` component
### Case 2: MonoBehaviour push-back
**Input:** "Just use MonoBehaviour for the player movement — it's simpler."
**Expected behavior:**
- Acknowledges the simplicity argument
- Explains the DOTS trade-off: more setup upfront, but the ECS/Burst approach provides the performance characteristics documented in the project's ADR or requirements
- Does NOT implement a MonoBehaviour version if the project has committed to DOTS
- If no commitment exists, flags the architecture decision to `lead-programmer` / `technical-director` for resolution
- Does not make the MonoBehaviour vs. DOTS decision unilaterally
### Case 3: Burst-incompatible managed memory
**Input:** "This Burst job accesses a `List<EnemyData>` to find the nearest enemy."
**Expected behavior:**
- Flags `List<T>` as a managed type that is incompatible with Burst compilation
- Does NOT approve the Burst job with managed memory access
- Provides the correct replacement: `NativeArray<EnemyData>`, `NativeList<EnemyData>`, or `NativeHashMap<>` depending on the use case
- Notes that `NativeArray` must be disposed explicitly or via `[DeallocateOnJobCompletion]`
- Produces the corrected job using unmanaged native containers
### Case 4: Hybrid access — DOTS system needs MonoBehaviour data
**Input:** "The DOTS movement system needs to read the camera transform managed by a MonoBehaviour CameraController."
**Expected behavior:**
- Identifies this as a hybrid access scenario
- Provides the correct hybrid pattern: store the camera transform in a singleton `IComponentData` (updated from the MonoBehaviour side each frame via `EntityManager.SetComponentData`)
- Alternatively suggests the `CompanionComponent` / managed component approach
- Does NOT access the MonoBehaviour from inside a Burst job — flags that as unsafe
- Provides the bridge code on both the MonoBehaviour side (writing to ECS) and the DOTS system side (reading from ECS)
### Case 5: Context pass — performance targets
**Input:** Technical preferences from context: 60fps target, max 2ms CPU script budget per frame. Request: "Design the ECS chunk layout for 10,000 enemy entities."
**Expected behavior:**
- References the 2ms CPU budget explicitly in the design rationale
- Designs the `IComponentData` chunk layout for cache efficiency:
  - Groups frequently-queried components together in the same archetype
- Separates rarely-used data into separate components to keep hot data compact
- Estimates entity iteration time against the 2ms budget
- Provides memory layout analysis (bytes per entity, entities per chunk at 16KB chunk size)
- Does NOT design a layout that will obviously exceed the stated 2ms budget without flagging it
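The memory-layout analysis the case asks for reduces to simple arithmetic. The 16 KB chunk size comes from the spec; the component field sizes below are illustrative assumptions, and real ECS chunks reserve some header space, so these figures are upper bounds:

```python
# Chunk-layout arithmetic: bytes per entity and entities per 16 KB chunk.
CHUNK_SIZE_BYTES = 16 * 1024

# Hot components kept together in one archetype (assumed field layouts):
hot_components = {
    "LocalTransform": 48,   # float3 position + quaternion + scale
    "EnemyMovement": 16,    # float3 velocity + speed
    "EnemyHealth": 8,       # current + max
}

bytes_per_entity = sum(hot_components.values())            # 72 bytes
entities_per_chunk = CHUNK_SIZE_BYTES // bytes_per_entity  # ~227
chunks_for_10k = -(-10_000 // bytes_per_entity * bytes_per_entity // (entities_per_chunk * bytes_per_entity))  # see note
chunks_for_10k = -(-10_000 // entities_per_chunk)          # ceiling division
```

Keeping cold data (e.g., loot tables) in a separate component shrinks `bytes_per_entity` for the hot loop, which directly raises entities-per-chunk and improves cache behavior during the 10,000-entity iteration.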
---
## Protocol Compliance
- [ ] Stays within declared domain (ECS, Jobs, Burst, DOTS gameplay systems)
- [ ] Redirects MonoBehaviour-only gameplay to gameplay-programmer
- [ ] Returns structured output (IComponentData structs, ISystem implementations, IBaker authoring classes)
- [ ] Flags managed memory access in Burst jobs as a compile error and provides unmanaged alternatives
- [ ] Provides hybrid access patterns when DOTS systems need to interact with MonoBehaviour systems
- [ ] Designs chunk layouts against provided performance budgets
---
## Coverage Notes
- ECS conversion (Case 1) must include a unit test using the ECS test framework (`World`, `EntityManager`)
- Burst incompatibility (Case 3) is safety-critical — the agent must catch this before the code is written
- Chunk layout (Case 5) verifies the agent applies quantitative performance reasoning to architecture decisions


@ -0,0 +1,83 @@
# Agent Test Spec: unity-shader-specialist
## Agent Summary
Domain: Unity Shader Graph, custom HLSL, VFX Graph, URP/HDRP pipeline customization, and post-processing effects.
Does NOT own: gameplay code, art style direction.
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Shader Graph / HLSL / VFX Graph / URP / HDRP)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over gameplay code or art direction
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Create an outline effect for characters using Shader Graph in URP."
**Expected behavior:**
- Produces a Shader Graph node setup description:
- Inverted hull method: Scale Normal → Vertex offset in vertex stage, Cull Front
- OR screen-space post-process outline using depth/normal edge detection
- Recommends the appropriate method for the active pipeline (inverted hull works in both pipelines; a post-process outline needs a URP Renderer Feature or an HDRP Custom Pass)
- Notes URP limitations: no geometry shader support (rules out geometry-shader outline approach)
- Does NOT produce HDRP-specific nodes without confirming the render pipeline
### Case 2: Out-of-domain redirect
**Input:** "Implement the character health bar UI in code."
**Expected behavior:**
- Does NOT produce UI implementation code
- Explicitly states that UI implementation belongs to `ui-programmer` (or `unity-ui-specialist`)
- Redirects the request appropriately
- May note that a shader-based fill effect for a health bar (e.g., a dissolve/fill gradient) is within its domain if the visual effect itself is shader-driven
### Case 3: HDRP custom pass for outline
**Input:** "We're on HDRP and want the outline as a post-process effect."
**Expected behavior:**
- Produces the HDRP `CustomPassVolume` pattern:
- C# class inheriting `CustomPass`
- `Execute()` method using `CoreUtils.SetRenderTarget()` and a full-screen shader blit
- Depth/normal buffer sampling for edge detection
- Notes that CustomPass requires HDRP package and does not work in URP
- Confirms the project is on HDRP before providing HDRP-specific code
### Case 4: VFX Graph performance — GPU event batching
**Input:** "The explosion VFX Graph has 10,000 particles per event and spawning 20 simultaneous explosions is causing GPU frame spikes."
**Expected behavior:**
- Identifies GPU particle spawn as the cost driver (200,000 simultaneous particles)
- Proposes GPU event batching: spawn events deferred over multiple frames, stagger initialization
- Recommends a particle budget cap per active explosion (e.g., 3,000 per explosion, queue excess)
- Notes the VFX Graph Event Batcher pattern and Output Event API for cross-frame distribution
- Does NOT change the gameplay event system — proposes a VFX-side budgeting solution
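The budgeting-plus-staggering idea above can be sketched as a per-frame spawn scheduler. The per-explosion cap and per-frame budget below are illustrative assumptions, not values mandated by the spec:

```python
# Spawn staggering: cap particles per explosion and spread spawn work
# across frames so simultaneous requests never exceed a per-frame budget.
PARTICLES_PER_EXPLOSION = 3_000   # capped, down from 10,000
PER_FRAME_SPAWN_BUDGET = 12_000   # max particles initialized per frame

def schedule_spawns(explosion_requests: int):
    """Return the number of particles to initialize on each frame."""
    remaining = explosion_requests * PARTICLES_PER_EXPLOSION
    frames = []
    while remaining > 0:
        batch = min(remaining, PER_FRAME_SPAWN_BUDGET)
        frames.append(batch)
        remaining -= batch
    return frames
```

For the 20-explosion scenario this spreads 60,000 capped particles over five frames instead of initializing 200,000 in one, trading a few frames of spawn latency for the elimination of the GPU spike.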
### Case 5: Context pass — render pipeline (URP or HDRP)
**Input:** Project context: URP render pipeline, Unity 2022.3. Request: "Add depth of field post-processing."
**Expected behavior:**
- Uses URP Volume framework: `DepthOfField` Volume Override component
- Does NOT use HDRP Volume components (e.g., HDRP's `DepthOfField` with different parameter names)
- Notes URP-specific DOF limitations vs HDRP (e.g., Bokeh quality differences)
- Produces C# Volume profile setup code compatible with Unity 2022.3 URP package version
---
## Protocol Compliance
- [ ] Stays within declared domain (Shader Graph, HLSL, VFX Graph, URP/HDRP customization)
- [ ] Redirects gameplay and UI code to appropriate agents
- [ ] Returns structured output (node graph descriptions, HLSL code, CustomPass patterns)
- [ ] Distinguishes between URP and HDRP approaches — never cross-contaminates pipeline-specific APIs
- [ ] Flags geometry shader approaches as URP-incompatible when relevant
- [ ] Produces VFX optimizations that do not change gameplay behavior
---
## Coverage Notes
- Outline effect (Case 1) should be paired with a visual screenshot test in `production/qa/evidence/`
- HDRP CustomPass (Case 3) confirms the agent produces the correct Unity pattern, not a generic post-process approach
- Pipeline separation (Case 5) verifies the agent never assumes the render pipeline without context

View file

@ -0,0 +1,83 @@
# Agent Test Spec: unity-specialist
## Agent Summary
Domain: Unity-specific architecture patterns, MonoBehaviour vs DOTS decisions, and subsystem selection (Addressables, New Input System, UI Toolkit, Cinemachine, etc.).
Does NOT own: language-specific deep dives (delegates to unity-dots-specialist, unity-ui-specialist, etc.).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Unity patterns / MonoBehaviour / subsystem decisions)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition acknowledges the sub-specialist routing table (DOTS, UI, Shader, Addressables)
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Should I use MonoBehaviour or ScriptableObject for storing enemy configuration data?"
**Expected behavior:**
- Produces a pattern decision tree covering:
- MonoBehaviour: for runtime behavior, needs to be attached to a GameObject, has Update() lifecycle
- ScriptableObject: for pure data/configuration, exists as an asset, shared across instances, no scene dependency
- Recommends ScriptableObject for enemy configuration data (stateless, reusable, designer-friendly)
- Notes that MonoBehaviour can reference the ScriptableObject for runtime use
- Provides a concrete example of what the ScriptableObject class definition looks like (does not produce full code — refers to engine-programmer or gameplay-programmer for implementation)
### Case 2: Wrong-engine redirect
**Input:** "Set up a Node scene tree with signals for this enemy system."
**Expected behavior:**
- Does NOT produce Godot Node/signal code
- Identifies this as a Godot pattern
- States that in Unity the equivalent is GameObject hierarchy + UnityEvent or C# events
- Maps the concepts: Godot Node → Unity MonoBehaviour, Godot Signal → C# event / UnityEvent
- Confirms the project is Unity-based before proceeding
### Case 3: Unity version API flag
**Input:** "Use the new Unity 6 GPU resident drawer for batch rendering."
**Expected behavior:**
- Identifies the Unity 6 feature (GPU Resident Drawer)
- Flags that this API may not be available in earlier Unity versions
- Asks for or checks the project's Unity version before providing implementation guidance
- Directs to verify against official Unity 6 documentation
- Does NOT assume the project is on Unity 6 without confirmation
### Case 4: DOTS vs. MonoBehaviour conflict
**Input:** "The combat system uses MonoBehaviour for state management, but we want to add a DOTS-based projectile system. Can they coexist?"
**Expected behavior:**
- Recognizes this as a hybrid architecture scenario
- Explains the hybrid approach: MonoBehaviour can interface with DOTS via SystemAPI, IComponentData, and managed components
- Notes the performance and complexity trade-offs of mixing the two patterns
- Recommends escalating the architecture decision to `lead-programmer` or `technical-director`
- Defers to `unity-dots-specialist` for the DOTS-side implementation details
### Case 5: Context pass — Unity version
**Input:** Project context provided: Unity 2023.3 LTS. Request: "Configure the new Input System for this project."
**Expected behavior:**
- Applies Unity 2023.3 LTS context: uses the New Input System (com.unity.inputsystem) package
- Does NOT produce legacy Input Manager code (`Input.GetKeyDown()`, `Input.GetAxis()`)
- Notes any 2023.3-specific Input System behaviors or package version constraints
- References the project version to confirm Burst/Jobs compatibility if the Input System interacts with DOTS
---
## Protocol Compliance
- [ ] Stays within declared domain (Unity architecture decisions, pattern selection, subsystem routing)
- [ ] Redirects Godot patterns to appropriate Godot specialists or flags them as wrong-engine
- [ ] Redirects DOTS implementation to unity-dots-specialist
- [ ] Redirects UI implementation to unity-ui-specialist
- [ ] Flags Unity version-gated APIs and requires version confirmation before suggesting them
- [ ] Returns structured pattern decision guides, not freeform opinions
---
## Coverage Notes
- MonoBehaviour vs. ScriptableObject (Case 1) should be documented as an ADR if it results in a project-level decision
- Version flag (Case 3) confirms the agent does not assume the latest Unity version without context
- DOTS hybrid (Case 4) verifies the agent escalates architecture conflicts rather than resolving them unilaterally


@ -0,0 +1,81 @@
# Agent Test Spec: unity-ui-specialist
## Agent Summary
Domain: Unity UI Toolkit (UXML/USS), UGUI (Canvas), data binding, runtime UI performance, and UI input event handling.
Does NOT own: UX flow design (ux-designer), visual art style (art-director).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references UI Toolkit / UGUI / Canvas / data binding)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over UX flow design or visual art direction
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Implement an inventory UI screen using Unity UI Toolkit."
**Expected behavior:**
- Produces a UXML document defining the inventory panel structure (ListView, item templates, detail panel)
- Produces USS styles for the inventory layout and item states (default, hover, selected)
- Provides C# code binding the inventory data model to the UI via `INotifyValueChanged` or `IBindable`
- Uses `ListView` with `makeItem` / `bindItem` callbacks for the scrollable item list
- Does NOT produce the UX flow design — implements from a provided spec
### Case 2: Out-of-domain redirect
**Input:** "Design the UX flow for the inventory — what happens when the player equips vs. drops an item."
**Expected behavior:**
- Does NOT produce UX flow design
- Explicitly states that interaction flow design belongs to `ux-designer`
- Redirects the request to `ux-designer`
- Notes it will implement whatever flow the ux-designer specifies
### Case 3: UI Toolkit data binding for dynamic list
**Input:** "The inventory list needs to update in real time as items are added or removed from the player's bag."
**Expected behavior:**
- Produces the `ListView` pattern with a bound `ObservableList<T>` or event-driven refresh approach
- Uses `ListView.Rebuild()` or `ListView.RefreshItems()` on the backing collection change event
- Notes the performance considerations for large lists (virtualization via `makeItem`/`bindItem` pattern)
- Does NOT use `Query()`/`Q<T>()` loops to update individual elements as a list refresh strategy — flags that as a performance antipattern
### Case 4: Canvas performance — overdraw
**Input:** "The main menu canvas is causing GPU overdraw warnings; there are many overlapping panels."
**Expected behavior:**
- Identifies overdraw causes: multiple stacked canvases, full-screen overlay panels not culled when inactive
- Recommends:
- Separate canvases for world-space, screen-space-overlay, and screen-space-camera layers
- Disable/deactivate panels instead of setting alpha to 0 (invisible alpha-0 panels still draw)
- Canvas Group + alpha for fade effects, not individual Image alpha
- Notes UI Toolkit alternative if the project is in a migration position
### Case 5: Context pass — Unity version
**Input:** Project context: Unity 2022.3 LTS. Request: "Implement the settings panel with data binding."
**Expected behavior:**
- Uses UI Toolkit with the binding facilities actually available in 2022.3 LTS (`INotifyValueChanged<T>` and change callbacks; serialized-object binding is editor-only in that version)
- Notes that the full runtime data-binding system (`dataSource` bindings) arrived after 2022.3
- Does NOT use Unity 6 runtime binding API features that are not available in 2022.3
- Produces code compatible with the stated Unity version, with version-specific API notes
---
## Protocol Compliance
- [ ] Stays within declared domain (UI Toolkit, UGUI, data binding, UI performance)
- [ ] Redirects UX flow design to ux-designer
- [ ] Returns structured output (UXML, USS, C# binding code)
- [ ] Uses the correct Unity UI framework version for the project's Unity version
- [ ] Flags Canvas overdraw as a performance antipattern and provides specific remediation
- [ ] Does not use alpha-0 as a hide/show pattern — uses SetActive() or VisualElement.style.display
---
## Coverage Notes
- Inventory UI (Case 1) should have a manual walkthrough doc in `production/qa/evidence/`
- Dynamic list binding (Case 3) should have an integration test or automated interaction test
- Canvas overdraw (Case 4) verifies the agent knows the correct Unity UI performance patterns


@ -0,0 +1,80 @@
# Agent Test Spec: ue-blueprint-specialist
## Agent Summary
- **Domain**: Blueprint architecture, the Blueprint/C++ boundary, Blueprint graph quality, Blueprint performance optimization, Blueprint Function Library design
- **Does NOT own**: C++ implementation (engine-programmer or gameplay-programmer), art assets or shaders, UI/UX flow design (ux-designer)
- **Model tier**: Sonnet
- **Gate IDs**: None; defers to unreal-specialist or lead-programmer for cross-domain rulings
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Blueprint architecture and optimization)
- [ ] `allowed-tools:` list matches the agent's role (Read for Blueprint project files; no server or deployment tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over C++ implementation decisions
---
## Test Cases
### Case 1: In-domain request — Blueprint graph performance review
**Input**: "Review our AI behavior Blueprint. It has tick-based logic running every frame that checks line-of-sight for 30 NPCs simultaneously."
**Expected behavior**:
- Identifies tick-heavy logic as a performance problem
- Recommends switching from EventTick to event-driven patterns (perception system events, timers, or polling on a reduced interval)
- Flags the per-NPC cost of simultaneous line-of-sight checks
- Suggests alternatives: AIPerception component events, staggered tick groups, or moving the system to C++ if Blueprint overhead is measured to be significant
- Output is structured: problem identified, impact estimated, alternatives listed
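The staggered-check alternative can be sketched engine-agnostically: instead of all 30 NPCs tracing line-of-sight every frame, each NPC checks on a fixed phase so only a fraction run per frame. The 5-frame interval below is an assumed value, not one from the case:

```python
# Staggered perception checks: each NPC gets a phase (id % interval),
# so only NPCs whose phase matches the current frame slot run a trace.
CHECK_INTERVAL = 5  # each NPC checks line-of-sight once every 5 frames

def npcs_checking_this_frame(frame: int, npc_ids):
    """Return the NPC ids scheduled to run a line-of-sight check now."""
    slot = frame % CHECK_INTERVAL
    return [n for n in npc_ids if n % CHECK_INTERVAL == slot]
```

With 30 NPCs and a 5-frame interval, only 6 traces run per frame (an 80% reduction), and every NPC still re-checks within 5 frames — usually well inside AI reaction-time tolerances.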
### Case 2: Out-of-domain request — C++ implementation
**Input**: "Write the C++ implementation for this ability cooldown system."
**Expected behavior**:
- Does not produce C++ implementation code
- Provides the Blueprint equivalent of the cooldown logic (e.g., using a Timeline or GameplayEffect if GAS is in use)
- States clearly: "C++ implementation is handled by engine-programmer or gameplay-programmer; I can show the Blueprint approach or describe the boundary where Blueprint calls into C++"
- Optionally notes when the cooldown complexity warrants a C++ backend
### Case 3: Domain boundary — unsafe raw pointer access in Blueprint
**Input**: "Our Blueprint calls GetOwner() and then immediately accesses a component on the result without checking if it's valid."
**Expected behavior**:
- Flags this as a runtime crash risk: GetOwner() can return null in some lifecycle states
- Provides the correct Blueprint pattern: IsValid() node before any property/component access
- Notes that Blueprint's null checks are not optional on Actor-derived references
- Does NOT silently fix the code without explaining why the original was unsafe
### Case 4: Blueprint graph complexity — readiness for Function Library refactor
**Input**: "Our main GameMode Blueprint has 600+ nodes in a single graph with duplicated damage calculation logic in 8 places."
**Expected behavior**:
- Diagnoses this as a maintainability and testability problem
- Recommends extracting duplicated logic into a Blueprint Function Library (BFL)
- Describes how to structure the BFL: pure functions for calculations, static calls from any Blueprint
- Notes that if the damage logic is performance-sensitive or shared with C++, it may be a candidate for migration to unreal-specialist review
- Output is a concrete refactor plan, not a vague recommendation
### Case 5: Context pass — Blueprint complexity budget
**Input context**: Project conventions specify a maximum of 100 nodes per Blueprint event graph before a mandatory Function Library extraction.
**Input**: "Here is our inventory Blueprint graph [150 nodes shown]. Is it ready to ship?"
**Expected behavior**:
- References the stated 150-node count against the 100-node budget from project conventions
- Flags the graph as exceeding the complexity threshold
- Does NOT approve it as-is
- Produces a list of candidate subgraphs for Function Library extraction to bring the main graph within budget
---
## Protocol Compliance
- [ ] Stays within declared domain (Blueprint architecture, performance, graph quality)
- [ ] Redirects C++ implementation requests to engine-programmer or gameplay-programmer
- [ ] Returns structured findings (problem/impact/alternatives format) rather than freeform opinions
- [ ] Enforces Blueprint safety patterns (null checks, IsValid) proactively
- [ ] References project conventions when evaluating graph complexity
---
## Coverage Notes
- Case 3 (null pointer safety) is a safety-critical test — this is a common source of shipping crashes
- Case 5 requires that project conventions include a stated node budget; if none is configured, the agent should note the absence and recommend setting one
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: ue-gas-specialist
## Agent Summary
- **Domain**: Gameplay Ability System (GAS) — abilities (UGameplayAbility), gameplay effects (UGameplayEffect), attribute sets (UAttributeSet), gameplay tags, ability tasks (UAbilityTask), ability specs (FGameplayAbilitySpec), GAS prediction and latency compensation
- **Does NOT own**: UI display of ability state (ue-umg-specialist), net replication of GAS data beyond built-in GAS prediction (ue-replication-specialist), art or VFX for ability feedback (vfx-artist)
- **Model tier**: Sonnet
- **Gate IDs**: None; defers cross-domain calls to the appropriate specialist
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references GAS, abilities, GameplayEffects, AttributeSets)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for GAS source files; no deployment or server tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over UI implementation or low-level net serialization
---
## Test Cases
### Case 1: In-domain request — dash ability with cooldown
**Input**: "Implement a dash ability that moves the player forward 500 units and has a 1.5 second cooldown."
**Expected behavior**:
- Produces a GAS AbilitySpec structure or outline: UGameplayAbility subclass with ActivateAbility logic, an AbilityTask for movement (e.g., AbilityTask_ApplyRootMotionMoveToForce or custom root motion), and a UGameplayEffect for the cooldown
- Cooldown GameplayEffect uses Duration policy with the 1.5s duration and a GameplayTag to block re-activation
- Tags clearly named following a hierarchy convention (e.g., Ability.Dash, Cooldown.Ability.Dash)
- Output includes both the ability class outline and the GameplayEffect definition
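The cooldown behavior described above can be sketched in plain, engine-free C++ (the tag name, container, and function names are illustrative stand-ins for GAS types, not engine API): activation is blocked while the cooldown tag granted by the duration-based GameplayEffect is present.

```cpp
#include <set>
#include <string>

// Minimal stand-in for a GAS-style tag container: the cooldown
// GameplayEffect grants "Cooldown.Ability.Dash" for its 1.5s duration,
// and the ability's activation check is blocked while the tag is present.
struct TagContainer {
    std::set<std::string> Tags;
    bool Has(const std::string& Tag) const { return Tags.count(Tag) > 0; }
};

// Mirrors the blocked-tag activation check: the dash cannot activate
// while the owner carries the cooldown tag.
bool CanActivateDash(const TagContainer& OwnerTags) {
    return !OwnerTags.Has("Cooldown.Ability.Dash");
}

// Applying the cooldown effect grants the tag; expiry of the duration
// effect removes it again.
void ApplyDashCooldown(TagContainer& OwnerTags) {
    OwnerTags.Tags.insert("Cooldown.Ability.Dash");
}
void ExpireDashCooldown(TagContainer& OwnerTags) {
    OwnerTags.Tags.erase("Cooldown.Ability.Dash");
}
```

In real GAS the tag grant, duration tracking, and removal are all handled by the GameplayEffect itself; the sketch only shows the activation-blocking contract the spec expects the agent to produce.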
### Case 2: Out-of-domain request — GAS state replication
**Input**: "How do I replicate the player's ability cooldown state to all clients so the UI updates correctly?"
**Expected behavior**:
- Clarifies that GAS has built-in replication for AbilitySpecs and GameplayEffects via the AbilitySystemComponent's replication mode
- Explains the three ASC replication modes (Full, Mixed, Minimal) and when to use each
- For custom replication needs beyond GAS built-ins, explicitly states: "For custom net serialization of GAS data, coordinate with ue-replication-specialist"
- Does NOT attempt to write custom replication code outside GAS's own systems without flagging the domain boundary
### Case 3: Domain boundary — incorrect GameplayTag hierarchy
**Input**: "We have an ability that applies a tag called 'Stunned' and another that checks for 'Status.Stunned'. They're not matching."
**Expected behavior**:
- Identifies the root cause: tag names must be exact or use hierarchical matching via TagContainer queries
- Flags the naming inconsistency: 'Stunned' is a root-level tag; 'Status.Stunned' is a child tag under 'Status' — these are different tags
- Recommends a project tag naming convention: all status effects under Status.*, all abilities under Ability.*
- Provides the fix: either rename the applied tag to 'Status.Stunned' or update the query to match 'Stunned'
- Notes where tag definitions should live (DefaultGameplayTags.ini or a DataTable)
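The matching rule at the heart of this case can be stated as a few lines of plain C++ (a simplified model of hierarchical tag matching, not the engine implementation): a tag matches a query when it is the same tag or a child of it, so `Stunned` and `Status.Stunned` are simply two different tags.

```cpp
#include <string>

// GameplayTag matching in miniature: "Status.Stunned" matches the query
// "Status" (child matches parent), but a parent never matches a child
// query, and two tags with different roots never match each other.
bool MatchesTag(const std::string& Tag, const std::string& Query) {
    if (Tag == Query) return true;
    return Tag.size() > Query.size() &&
           Tag.compare(0, Query.size(), Query) == 0 &&
           Tag[Query.size()] == '.';
}
```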
### Case 4: Conflict — attribute set conflict between two abilities
**Input**: "Our Shield ability and our Armor ability both modify a 'DefenseValue' attribute. They're stacking in ways that aren't intended — after both are active, defense goes well above maximum."
**Expected behavior**:
- Identifies this as a GameplayEffect stacking and magnitude calculation problem
- Proposes a resolution using Execution Calculations (UGameplayEffectExecutionCalculation) or Modifier Aggregators to cap the combined result
- Alternatively recommends using Gameplay Effect Stacking policies (Aggregate, None) to prevent unintended additive stacking
- Produces a concrete resolution: either an Execution Calculation class outline or a change to the Modifier Op (Override instead of Additive for the cap)
- Does NOT propose removing one of the abilities as the solution
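The overshoot and its fix reduce to simple arithmetic, sketched here in plain C++ with illustrative numbers (a base Defense of 50 with +40 from Shield and +35 from Armor against a 100-point cap; none of these values come from the spec above):

```cpp
#include <algorithm>
#include <vector>

// Naive additive aggregation, the behavior the bug report describes:
// 50 base + 40 (Shield) + 35 (Armor) = 125, well above the intended cap.
double AggregateAdditive(double Base, const std::vector<double>& Mods) {
    double Result = Base;
    for (double M : Mods) Result += M;
    return Result;
}

// What an Execution Calculation (or an Override-style final modifier)
// would do instead: clamp the aggregated result to the cap, so both
// abilities remain active but cannot stack past the maximum.
double AggregateWithCap(double Base, const std::vector<double>& Mods,
                        double Cap) {
    return std::min(AggregateAdditive(Base, Mods), Cap);
}
```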
### Case 5: Context pass — designing against an existing attribute set
**Input context**: Project has an existing AttributeSet with attributes: Health, MaxHealth, Stamina, MaxStamina, Defense, AttackPower.
**Input**: "Design a Berserker ability that increases AttackPower by 50% when Health drops below 30%."
**Expected behavior**:
- Uses the existing Health, MaxHealth, and AttackPower attributes — does NOT invent new attributes
- Designs a Passive GameplayAbility (or triggered Effect) that fires on Health change, checks Health/MaxHealth ratio via a GameplayEffectExecutionCalculation or Attribute-Based magnitude
- Uses a Gameplay Cue or Gameplay Tag to track the Berserker active state
- References the actual attribute names from the provided AttributeSet (AttackPower, not "Damage" or "Strength")
---
## Protocol Compliance
- [ ] Stays within declared domain (GAS: abilities, effects, attributes, tags, ability tasks)
- [ ] Redirects custom replication requests to ue-replication-specialist with clear explanation of boundary
- [ ] Returns structured findings (ability outline + GameplayEffect definition) rather than vague descriptions
- [ ] Enforces tag hierarchy naming conventions proactively
- [ ] Uses only attributes and tags present in the provided context; does not invent new ones without noting it
---
## Coverage Notes
- Case 3 (tag hierarchy) is a frequent source of subtle bugs; test whenever tag naming conventions change
- Case 4 requires knowledge of GAS stacking policies — verify this case if the GAS integration depth changes
- Case 5 is the most important context-awareness test; failing it means the agent ignores project state
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: ue-replication-specialist
## Agent Summary
- **Domain**: Property replication (UPROPERTY Replicated/ReplicatedUsing), RPCs (Server/Client/NetMulticast), client prediction and reconciliation, net relevancy and always-relevant settings, net serialization (FArchive/NetSerialize), bandwidth optimization and replication frequency tuning
- **Does NOT own**: Gameplay logic being replicated (gameplay-programmer), server infrastructure and hosting (devops-engineer), GAS-specific prediction (ue-gas-specialist handles GAS net prediction)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates security-relevant replication concerns to lead-programmer
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references replication, RPCs, client prediction, bandwidth)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for C++ and Blueprint source files; no infrastructure or deployment tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over server infrastructure, game server architecture, or gameplay logic correctness
---
## Test Cases
### Case 1: In-domain request — replicated player health with client prediction
**Input**: "Set up replicated player health that clients can predict locally (e.g., when taking self-inflicted damage) and have corrected by the server."
**Expected behavior**:
- Produces a UPROPERTY(ReplicatedUsing=OnRep_Health) declaration in the appropriate Character or AttributeSet class
- Describes the OnRep_Health function: apply visual/audio feedback, reconcile predicted value with server-authoritative value
- Explains the client prediction pattern: local client applies tentative damage immediately, server authoritative value arrives via OnRep and corrects any discrepancy
- Notes that if GAS is in use, the built-in GAS prediction handles this — recommend coordinating with ue-gas-specialist
- Output is a concrete code structure (property declaration + OnRep outline), not a conceptual description only
### Case 2: Out-of-domain request — game server architecture
**Input**: "Design our game server infrastructure — how many dedicated servers we need, regional deployment, and matchmaking architecture."
**Expected behavior**:
- Does not produce server infrastructure architecture, hosting recommendations, or matchmaking design
- States clearly: "Server infrastructure and deployment architecture is owned by devops-engineer; I handle the Unreal replication layer within a running game session"
- Does not conflate in-game replication with server hosting concerns
### Case 3: Domain boundary — RPC without server authority validation
**Input**: "We have a Server RPC called ServerSpendCurrency that deducts in-game currency. The client calls it and the server just deducts without checking anything."
**Expected behavior**:
- Flags this as a critical security vulnerability: unvalidated server RPCs are exploitable by cheaters sending arbitrary RPC calls
- Provides the required fix: server-side validation before the deduct — check that the player actually has the currency, verify the transaction is valid, reject and log if not
- Uses the pattern: `if (!HasAuthority()) return;` guard plus explicit state validation before mutation
- Notes this should be reviewed by lead-programmer given the economy implications
- Does NOT produce the "fixed" code without explaining why the original was dangerous
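The validation pattern the case expects can be shown as an engine-free C++ sketch (the struct and function names are hypothetical; `bHasAuthority` stands in for the `HasAuthority()` guard): guard on authority first, then validate the claimed transaction against server-side state before mutating anything.

```cpp
// Hypothetical server-side state for one player; names are illustrative,
// not engine API.
struct PlayerState {
    bool bHasAuthority = true; // stand-in for the HasAuthority() check
    int Currency = 0;
};

// The validated version of ServerSpendCurrency: authority guard, then
// explicit state validation before the deduct. Returns false (where a
// real implementation would also log the rejection) for bad requests.
bool ServerSpendCurrency_Validated(PlayerState& State, int Amount) {
    if (!State.bHasAuthority) return false;    // never runs on a client
    if (Amount <= 0) return false;             // malformed request
    if (State.Currency < Amount) return false; // client lacks the currency
    State.Currency -= Amount;
    return true;
}
```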
### Case 4: Bandwidth optimization — high-frequency movement replication
**Input**: "Our player movement is replicated using a Vector3 position every tick. With 32 players, we're exceeding our bandwidth budget."
**Expected behavior**:
- Identifies tick-rate replication of full-precision Vector3 as bandwidth-expensive
- Proposes quantized replication: use FVector_NetQuantize or FVector_NetQuantize100 instead of raw FVector to reduce bytes per update
- Recommends reducing replication frequency via SetNetUpdateFrequency() for non-owning clients
- Notes that Unreal's built-in Character Movement Component already has optimized movement replication — recommends using or extending it rather than rolling a custom system
- Produces a concrete bandwidth estimate comparison if possible, or explains the tradeoff
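The "concrete bandwidth estimate comparison" is back-of-envelope arithmetic, sketched below; the byte counts and rates are illustrative assumptions, not exact Unreal wire sizes (actual quantized sizes depend on the serializer and packet overhead):

```cpp
// Outbound bandwidth for position replication:
// bytes per update x updates per second x receiving players.
double BytesPerSecond(double BytesPerUpdate, double UpdatesPerSec,
                      int Players) {
    return BytesPerUpdate * UpdatesPerSec * Players;
}
```

With assumed numbers, a raw 12-byte vector at 60 Hz for 32 players costs 12 x 60 x 32 = 23,040 B/s, while a quantized 6-byte vector at a reduced 20 Hz costs 6 x 20 x 32 = 3,840 B/s, which is the kind of comparison the agent should produce.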
### Case 5: Context pass — designing within a network budget
**Input context**: Project network budget is 64 KB/s per player, with 32 players = 2 MB/s total server outbound. Current movement replication already uses 40 KB/s per player.
**Input**: "We want to add real-time inventory replication so all clients can see other players' equipment changes immediately."
**Expected behavior**:
- Acknowledges the existing 40 KB/s movement cost leaves only 24 KB/s for everything else per player
- Does NOT design a naive full-inventory replication approach (would exceed budget)
- Recommends a delta-only or event-driven approach: replicate only changed slots rather than the full inventory array
- Uses a per-slot replicated struct with ReplicatedUsing, or a delta serializer such as FFastArraySerializer, to trigger targeted updates
- Explicitly states the proposed approach's bandwidth estimate relative to the remaining 24 KB/s budget
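The full-versus-delta tradeoff the case tests for is also simple arithmetic; the slot sizes below are illustrative assumptions (a real estimate would use actual serialized slot sizes), but they show why delta replication fits inside the remaining budget:

```cpp
// Replicating the whole inventory array every time it changes:
// every slot is serialized regardless of what changed.
int FullInventoryBytes(int SlotCount, int BytesPerSlot) {
    return SlotCount * BytesPerSlot;
}

// Delta replication: only changed slots are serialized, each with a
// small per-slot header identifying which slot it is.
int DeltaInventoryBytes(int ChangedSlots, int BytesPerSlot,
                        int HeaderBytes) {
    return ChangedSlots * (BytesPerSlot + HeaderBytes);
}
```

Assuming a 40-slot inventory at 24 bytes per slot, a full update costs 960 bytes per change, while a single-slot delta with a 4-byte header costs 28 bytes, which is the contrast the agent should state against the 24 KB/s remaining budget.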
---
## Protocol Compliance
- [ ] Stays within declared domain (property replication, RPCs, client prediction, bandwidth)
- [ ] Redirects server infrastructure requests to devops-engineer without producing infrastructure design
- [ ] Flags unvalidated server RPCs as security issues and recommends lead-programmer review
- [ ] Returns structured findings (property declarations, bandwidth estimates, optimization options) not freeform advice
- [ ] Uses project-provided bandwidth budget numbers when evaluating replication design choices
---
## Coverage Notes
- Case 3 (RPC security) is a shipping-critical test — unvalidated RPCs are a top-ten multiplayer exploit vector
- Case 5 is the most important context-awareness test; agent must use actual budget numbers, not generic advice
- Case 1 GAS branch: if GAS is configured, agent should detect it and defer to ue-gas-specialist for GAS-managed attributes
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: ue-umg-specialist
## Agent Summary
- **Domain**: UMG widget hierarchy design, data binding patterns, CommonUI input routing and action tags, widget styling (WidgetStyle assets), UI optimization (widget pooling, ListView, invalidation)
- **Does NOT own**: UX flow and screen navigation design (ux-designer), gameplay logic (gameplay-programmer), backend data sources (game code), server communication
- **Model tier**: Sonnet
- **Gate IDs**: None; defers UX flow decisions to ux-designer
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references UMG, widget hierarchy, CommonUI)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for UI assets and Blueprint files; no server or gameplay source tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over UX flow, navigation architecture, or gameplay data logic
---
## Test Cases
### Case 1: In-domain request — inventory widget with data binding
**Input**: "Create an inventory widget that shows a grid of item slots. Each slot should display item icon, quantity, and rarity color. It needs to update when the inventory changes."
**Expected behavior**:
- Produces a UMG widget structure: a parent WBP_Inventory containing a UniformGridPanel or TileView, with a child WBP_InventorySlot widget per item
- Describes data binding approach: either Event Dispatchers on an Inventory Component triggering a refresh, or a ListView with a UObject item data class implementing IUserObjectListEntry
- Specifies how rarity color is driven: a WidgetStyle asset or a data table lookup, not hardcoded color values
- Output includes the widget hierarchy, binding pattern, and the refresh trigger mechanism
### Case 2: Out-of-domain request — UX flow design
**Input**: "Design the full navigation flow for our inventory system — how the player opens it, transitions to character stats, and exits to the pause menu."
**Expected behavior**:
- Does not produce a navigation flow or screen transition architecture
- States clearly: "Navigation flow and screen transition design is owned by ux-designer; I can implement the UMG widget structure once the flow is defined"
- Does not make UX decisions (back button behavior, transition animations, modal vs. fullscreen) without a UX spec
### Case 3: Domain boundary — CommonUI input action mismatch
**Input**: "Our inventory widget isn't responding to the controller Back button. We're using CommonUI."
**Expected behavior**:
- Identifies the likely cause: the widget's Back input action tag does not match the project's registered CommonUI InputAction data asset
- Explains the CommonUI input routing model: widgets reference input actions defined in the project's CommonUI input action data table, and routing follows the active CommonActivatableWidget stack
- Provides the fix: verify that the widget's Back action tag matches the registered tag in the project's CommonUI input action data table
- Distinguishes this from a hardware input binding issue (which would be Enhanced Input territory)
### Case 4: Widget performance issue — many widget instances per frame
**Input**: "Our leaderboard widget creates 500 individual WBP_LeaderboardRow instances at once. The game hitches for 300ms when opening the leaderboard."
**Expected behavior**:
- Identifies the root cause: 500 widget instantiations in a single frame causes a construction hitch
- Recommends switching to ListView or TileView with virtualization — only visible rows are constructed
- Explains the IUserObjectListEntry interface requirement for ListView data objects
- If ListView is not appropriate, recommends pooling: pre-instantiate a fixed number of rows and recycle them with new data
- Output is a concrete recommendation with the specific UMG component to use, not a vague "optimize it"
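The pooling fallback described above can be sketched engine-free (the struct is a hypothetical model, with each "widget" reduced to the data index it is bound to; the point is that the instance count stays fixed at the visible-row count no matter how many data rows exist):

```cpp
#include <cstddef>
#include <vector>

// Widget pooling in miniature: pre-create one instance per visible row
// and rebind the same instances to whichever window of the 500 data
// rows is currently scrolled into view, instead of constructing 500
// widgets up front.
struct RowPool {
    std::vector<int> Rows; // one entry per pooled widget instance
    explicit RowPool(std::size_t VisibleRows) : Rows(VisibleRows, -1) {}

    // Rebind the pool to the window of data starting at FirstVisible.
    void Bind(int FirstVisible) {
        for (std::size_t i = 0; i < Rows.size(); ++i)
            Rows[i] = FirstVisible + static_cast<int>(i);
    }
    std::size_t InstanceCount() const { return Rows.size(); }
};
```

ListView/TileView virtualization does this recycling internally, which is why the spec expects it to be recommended first.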
### Case 5: Context pass — CommonUI setup already configured
**Input context**: Project uses CommonUI with the following registered InputAction tags: UI.Action.Confirm, UI.Action.Back, UI.Action.Pause, UI.Action.Secondary.
**Input**: "Add a 'Sort Inventory' button to the inventory widget that works with CommonUI."
**Expected behavior**:
- Uses UI.Action.Secondary (or recommends registering a new tag like UI.Action.Sort if Secondary is already allocated)
- Does NOT invent a new InputAction tag without noting that it must be registered in the CommonUI data table
- Does NOT use a non-CommonUI input binding approach (e.g., raw key press in Event Graph) when CommonUI is the established pattern
- References the provided tag list explicitly in the recommendation
---
## Protocol Compliance
- [ ] Stays within declared domain (UMG structure, data binding, CommonUI, widget performance)
- [ ] Redirects UX flow and navigation design requests to ux-designer
- [ ] Returns structured findings (widget hierarchy + binding pattern) rather than freeform opinions
- [ ] Uses existing CommonUI InputAction tags from context; does not invent new ones without flagging registration requirement
- [ ] Recommends virtualized lists (ListView/TileView) before widget pooling for large collections
---
## Coverage Notes
- Case 3 (CommonUI input routing) requires project to have CommonUI configured; test is skipped if project does not use CommonUI
- Case 4 (performance) is a high-impact failure mode — 300ms hitches are shipping-blocking; prioritize this test case
- Case 5 is the most important context-awareness test for UI pipeline consistency
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: unreal-specialist
## Agent Summary
- **Domain**: Unreal Engine patterns and architecture — Blueprint vs C++ decisions, UE subsystems (GAS, Enhanced Input, Niagara), UE project structure, plugin integration, and engine-level configuration
- **Does NOT own**: Art style and visual direction (art-director), server infrastructure and deployment (devops-engineer), UI/UX flow design (ux-designer)
- **Model tier**: Sonnet
- **Gate IDs**: None; defers gate verdicts to technical-director
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Unreal Engine)
- [ ] `allowed-tools:` list matches the agent's role (Read, Write for UE project files; no deployment tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority outside its declared domain (no art, no server infra)
---
## Test Cases
### Case 1: In-domain request — Blueprint vs C++ decision criteria
**Input**: "Should I implement our combo attack system in Blueprint or C++?"
**Expected behavior**:
- Provides structured decision criteria: complexity, reuse frequency, team skill, and performance requirements
- Recommends C++ for systems called every frame or shared across 5+ ability types
- Recommends Blueprint for designer-tunable values and one-off logic
- Does NOT render a final verdict without knowing project context — asks clarifying questions if context is absent
- Output is structured (criteria table or bullet list), not a freeform opinion
### Case 2: Out-of-domain request — Unity C# code
**Input**: "Write me a C# MonoBehaviour that handles player health and fires a Unity event on death."
**Expected behavior**:
- Does not produce Unity C# code
- States clearly: "This project uses Unreal Engine; the Unity equivalent would be an Actor Component in UE C++ or a Blueprint Actor Component"
- Optionally offers to provide the UE equivalent if requested
- Does not redirect to a Unity specialist (none exists in the framework)
### Case 3: Domain boundary — UE5.4 API requirement
**Input**: "I need to use the new Motion Matching API introduced in UE5.4."
**Expected behavior**:
- Flags that UE5.4 is a specific version with potentially limited LLM training coverage
- Recommends cross-referencing official Unreal docs or the project's engine-reference directory before trusting any API suggestions
- Provides best-effort API guidance with explicit uncertainty markers (e.g., "Verify this against UE5.4 release notes")
- Does NOT silently produce stale or incorrect API signatures without a caveat
### Case 4: Conflict — Blueprint spaghetti in a core system
**Input**: "Our replication logic is entirely in a deeply nested Blueprint event graph with 300+ nodes and no functions. It's becoming unmaintainable."
**Expected behavior**:
- Identifies this as a Blueprint architecture problem, not a minor style issue
- Recommends migrating core replication logic to C++ ActorComponent or GameplayAbility system
- Notes the coordination required: changes to replication architecture must involve lead-programmer
- Does NOT unilaterally declare "migrate to C++" without surfacing the scope of the refactor to the user
- Produces a concrete migration recommendation, not a vague suggestion
### Case 5: Context pass — version-appropriate API suggestions
**Input context**: Project engine-reference file states Unreal Engine 5.3.
**Input**: "How do I set up Enhanced Input actions for a new character?"
**Expected behavior**:
- Uses UE5.3-era Enhanced Input API (InputMappingContext, UEnhancedInputComponent::BindAction)
- Does NOT reference APIs introduced after UE5.3 without flagging them as potentially unavailable
- References the project's stated engine version in its response
- Provides concrete, version-anchored code or Blueprint node names
---
## Protocol Compliance
- [ ] Stays within declared domain (Unreal patterns, Blueprint/C++, UE subsystems)
- [ ] Redirects Unity or other-engine requests without producing wrong-engine code
- [ ] Returns structured findings (criteria tables, decision trees, migration plans) rather than freeform opinions
- [ ] Flags version uncertainty explicitly before producing API suggestions
- [ ] Coordinates with lead-programmer for architecture-scale refactors rather than deciding unilaterally
---
## Coverage Notes
- No automated runner exists for agent behavior tests — these are reviewed manually or via `/skill-test`
- Version-awareness (Case 3, Case 5) is the highest-risk failure mode for this agent; test regularly when engine version changes
- Case 4 integration with lead-programmer is a coordination test, not a technical correctness test

# Agent Test Spec: audio-director
## Agent Summary
**Domain owned:** Music direction and palette, sound design philosophy, audio implementation strategy, mix balance, audio aspects of phase gates.
**Does NOT own:** Visual design (art-director), code implementation (lead-programmer), narrative story content (narrative-director), UX interaction flows (ux-designer).
**Model tier:** Sonnet (individual system analysis — audio direction and spec review).
**Gate IDs handled:** AD-VISUAL (audio aspect of the phase gate; may be referenced as part of AD-PHASE-GATE in the audio dimension).
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/audio-director.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references music direction, sound design, mix, audio implementation — not generic)
- [ ] `allowed-tools:` list is read-focused; no Bash unless audio asset pipeline checks are justified
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over visual design, code implementation, or narrative content
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** An audio specification document is submitted for the game's "Exploration" music layer. The spec defines a generative ambient system using layered stems that shift based on environmental density, designed to reinforce the pillar "lived-in world." The tone palette (sparse, organic, slightly melancholic) matches the established design pillars.
**Expected:** Returns `APPROVED` with rationale confirming the stem-based approach supports dynamic responsiveness and the tone palette aligns with the pillar vocabulary.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION
- [ ] Rationale references the specific pillar ("lived-in world") and how the audio spec supports it
- [ ] Output stays within audio scope — does not comment on visual design of the environment or UI layout
- [ ] Verdict is clearly labeled with context (e.g., "Audio Spec Review: APPROVED")
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A developer asks audio-director to evaluate whether the UI flow for the audio settings menu (the sequence of screens and options) is intuitive and well-organized.
**Expected:** Agent declines to evaluate UI interaction flow and redirects to ux-designer.
**Assertions:**
- [ ] Does not make any binding decision about UI flow or information architecture
- [ ] Explicitly names `ux-designer` as the correct handler
- [ ] May note audio-specific requirements for the settings menu (e.g., "must include separate master, music, and SFX sliders"), but defers flow and layout decisions to ux-designer
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A music cue for the final boss encounter is submitted. The cue is an upbeat, major-key orchestral piece with fast tempo. The game pillars and narrative context for this encounter specify "dread, inevitability, and tragic sacrifice." The audio cue's emotional register directly contradicts the intended emotional beat.
**Expected:** Returns `NEEDS REVISION` with specific citation of the emotional mismatch: the cue's upbeat/major-key/fast-tempo characteristics versus the intended dread/inevitability/sacrifice emotional targets from the pillars and narrative context.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION — not freeform text
- [ ] Rationale identifies the specific musical characteristics that conflict with the emotional targets
- [ ] References the specific emotional targets from the game pillars or narrative context
- [ ] Provides actionable direction for revision (e.g., "shift to minor key, slower tempo, reduce ensemble density")
### Case 4: Conflict escalation — correct parent
**Scenario:** sound-designer proposes implementing audio occlusion using real-time raycast-based physics queries (technical approach). technical-artist argues this is too expensive and proposes a zone-based trigger system instead. Both agree the occlusion effect is desirable; the conflict is purely about implementation approach.
**Expected:** audio-director decides on the desired audio behavior (what occlusion should sound like and when it should activate), then defers the implementation approach decision to technical-artist or lead-programmer as the implementation experts. audio-director does not make the technical implementation choice.
**Assertions:**
- [ ] Defines the desired audio behavior clearly (what should the player hear and when)
- [ ] Explicitly defers the implementation approach (raycast vs. zone-trigger) to `lead-programmer` or `technical-artist`
- [ ] Does not unilaterally choose the technical implementation method
- [ ] Frames the handoff clearly: "audio-director owns what, technical lead owns how"
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the game's three pillars: "emergent stories," "meaningful sacrifice," and "lived-in world." A sound design spec for ambient environmental audio is submitted.
**Expected:** Assessment evaluates the ambient audio spec against all three pillars specifically — how does the audio support (or undermine) each pillar? Uses the pillar vocabulary directly in the rationale.
**Assertions:**
- [ ] References all three provided pillars by name in the assessment
- [ ] Evaluates the audio spec's contribution to each pillar explicitly
- [ ] Does not generate generic audio direction advice — all feedback is tied to the provided pillar vocabulary
- [ ] Identifies if any pillar is not supported by the current audio spec and flags it
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVED / NEEDS REVISION vocabulary only
- [ ] Stays within declared audio domain
- [ ] Defers implementation approach decisions to technical leads
- [ ] Uses APPROVED / NEEDS REVISION inline rather than the director-tier gate ID prefix format, but still references the gate context in its verdict
- [ ] Does not make binding visual design, UX, narrative, or code implementation decisions
---
## Coverage Notes
- Mix balance review (relative levels between music, SFX, and dialogue) is not covered — a dedicated case should be added.
- Audio implementation strategy review (middleware choice, streaming approach) is not covered.
- Interaction between audio-director and the audio specialist agent (if one exists) for implementation delegation is not covered.
- Localization audio implications (VO recording direction, language-specific music timing) are not covered.

# Agent Test Spec: game-designer
## Agent Summary
**Domain owned:** Core loop design, progression systems, combat mechanics rules, economy design, player-facing rules and interactions.
**Does NOT own:** Code implementation (lead-programmer / gameplay-programmer), visual art (art-director), narrative lore and story (narrative-director — coordinates with), balance formula math (systems-designer — collaborates with).
**Model tier:** Sonnet (individual system design authoring and review).
**Gate IDs handled:** Design review verdicts on mechanic specs (no named gate ID prefix — uses APPROVED / NEEDS REVISION vocabulary).
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/game-designer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references core loop, progression, combat rules, economy, player-facing design — not generic)
- [ ] `allowed-tools:` list is read-focused; includes Read for GDDs and design docs; no Bash unless design tooling requires it
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over code implementation, visual art style, or standalone narrative lore decisions
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A mechanic spec for a "Stamina-Based Dodge" system is submitted for review. The spec defines: the player has a stamina pool (100 units), each dodge costs 25 stamina, stamina regenerates at 20 units/second when not dodging, and the dodge grants 0.3 seconds of invincibility. The core loop interaction is clearly described, rules are unambiguous, and edge cases (stamina at 0, dodge during regen) are addressed.
**Expected:** Returns `APPROVED` with rationale confirming the core loop clarity, unambiguous rules, and edge case coverage.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION
- [ ] Rationale references specific design quality criteria (clear rules, edge case coverage, core loop coherence)
- [ ] Output stays within design scope — does not comment on how to implement it in code or what art assets it requires
- [ ] Verdict is clearly labeled with context (e.g., "Mechanic Spec Review: APPROVED")
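The stamina rules in this scenario can be sketched as a minimal simulation. This is a hedged illustration, not project code: the `StaminaDodge` class and its method names are hypothetical, only the numbers come from the scenario.

```python
class StaminaDodge:
    # Values from the scenario: 100-unit pool, 25 per dodge,
    # 20 units/sec regen while not dodging, 0.3 s invincibility window.
    MAX_STAMINA = 100.0
    DODGE_COST = 25.0
    REGEN_RATE = 20.0
    IFRAME_DURATION = 0.3

    def __init__(self):
        self.stamina = self.MAX_STAMINA
        self.iframe_timer = 0.0

    def try_dodge(self):
        # Edge case: dodge is refused when stamina is below the cost.
        if self.stamina < self.DODGE_COST:
            return False
        self.stamina -= self.DODGE_COST
        self.iframe_timer = self.IFRAME_DURATION
        return True

    def update(self, dt, dodging=False):
        # Invincibility counts down every frame regardless of input.
        self.iframe_timer = max(0.0, self.iframe_timer - dt)
        if not dodging:
            # Edge case: regen runs only while not dodging.
            self.stamina = min(self.MAX_STAMINA,
                               self.stamina + self.REGEN_RATE * dt)

    @property
    def invincible(self):
        return self.iframe_timer > 0.0
```

A spec with this level of precision is what lets Case 1 return APPROVED: both named edge cases (stamina at 0, dodge during regen) fall directly out of the rules.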
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A team member asks game-designer to write the in-world lore explanation for why the stamina system exists (e.g., the narrative reason characters have stamina limits in the game world).
**Expected:** Agent declines to write narrative/lore content and redirects to writer or narrative-director.
**Assertions:**
- [ ] Does not write narrative or lore content
- [ ] Explicitly names `writer` or `narrative-director` as the correct handler
- [ ] May note the design intent that the lore should support (e.g., "the stamina system should reinforce the physical realism theme"), but defers the writing to the narrative team
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A mechanic spec for "Environmental Hazard Damage" is submitted. The spec defines three hazard types (fire, acid, electricity) but does not specify what happens when a player is simultaneously affected by multiple hazard types, what happens when a hazard is applied during the invincibility window from a dodge, or what the damage frequency is (per-second, per-tick, on-enter).
**Expected:** Returns `NEEDS REVISION` with specific identification of the undefined edge cases: multi-hazard interaction, hazard-during-invincibility, and damage frequency specification.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION — not freeform text
- [ ] Rationale identifies the specific missing edge cases by name
- [ ] Does not reject the entire mechanic — identifies the specific gaps to fill
- [ ] Provides actionable guidance on what to define (not how to implement it)
### Case 4: Conflict escalation — correct parent
**Scenario:** systems-designer proposes a damage formula with 6 variables and complex scaling interactions, arguing it produces the best tuning granularity. game-designer believes the formula is too complex for players to intuit and wants a simpler 2-variable version.
**Expected:** game-designer owns the conceptual rule and player experience intention ("the damage should feel understandable to players"), but defers the formula granularity question to systems-designer. If the disagreement cannot be resolved between them (one wants complex, one wants simple), escalate to creative-director for a player experience ruling.
**Assertions:**
- [ ] Clearly states the player experience intention (intuitive damage, player agency)
- [ ] Defers formula granularity decisions to `systems-designer`
- [ ] Escalates unresolved disagreement to `creative-director` for player experience arbiter ruling
- [ ] Does not unilaterally impose a formula structure on systems-designer
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the game's three pillars: "player authorship," "consequence permanence," and "world responsiveness." A new mechanic spec for "permadeath with legacy bonuses" is submitted for review.
**Expected:** Assessment evaluates the mechanic against all three provided pillars — how does permadeath support player authorship, how do legacy bonuses express consequence permanence, and how does the world respond to a player's death? Uses the pillar vocabulary directly in the rationale.
**Assertions:**
- [ ] References all three provided pillars by name in the assessment
- [ ] Evaluates the mechanic's contribution to each pillar explicitly
- [ ] Does not generate generic game design advice — all feedback is tied to the provided pillar vocabulary
- [ ] Identifies if any pillar creates a tension with the mechanic and flags it with a specific concern
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVED / NEEDS REVISION vocabulary only
- [ ] Stays within declared game design domain
- [ ] Escalates design-vs-formula conflicts to creative-director when unresolved
- [ ] Does not make binding code implementation, visual art, or standalone lore decisions
- [ ] Provides actionable design feedback, not implementation prescriptions
---
## Coverage Notes
- Economy design review (resource sinks, faucets, inflation prevention) is not covered — a dedicated case should be added.
- Progression system review (XP curves, unlock gates, player power trajectory) is not covered.
- Core loop validation across multiple interconnected systems (not just a single mechanic) is not covered — deferred to /review-all-gdds integration.
- Coordination protocol with systems-designer on formula ownership boundary could benefit from additional cases.

# Agent Test Spec: lead-programmer
## Agent Summary
**Domain owned:** Code architecture decisions, LP-FEASIBILITY gate, LP-CODE-REVIEW gate, coding standards enforcement, tech stack decisions within the approved engine.
**Does NOT own:** Game design decisions (game-designer), creative direction (creative-director), production scheduling (producer), visual art direction (art-director).
**Model tier:** Sonnet (implementation-level analysis of individual systems).
**Gate IDs handled:** LP-FEASIBILITY, LP-CODE-REVIEW.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/lead-programmer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references code architecture, feasibility, code review, coding standards — not generic)
- [ ] `allowed-tools:` list includes Read for source files; Bash may be included for static analysis or test runs; no write access outside `src/` without explicit delegation
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over game design, creative direction, or production scheduling
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A new `CombatSystem` implementation is submitted for code review. The system uses dependency injection for all external references, has doc comments on all public APIs, follows the project's naming conventions, and includes unit tests for all public methods. Request is tagged LP-CODE-REVIEW.
**Expected:** Returns `LP-CODE-REVIEW: APPROVED` with rationale confirming dependency injection usage, doc comment coverage, naming convention compliance, and test coverage.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS CHANGES
- [ ] Verdict token is formatted as `LP-CODE-REVIEW: APPROVED`
- [ ] Rationale references specific coding standards criteria (DI, doc comments, naming, tests)
- [ ] Output stays within code quality scope — does not comment on whether the mechanic is fun or fits creative vision
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** Team member asks lead-programmer to review and approve the balance formula for player damage scaling across levels, checking whether the numbers "feel right."
**Expected:** Agent declines to evaluate design balance and redirects to systems-designer.
**Assertions:**
- [ ] Does not make any binding assessment of formula balance or game feel
- [ ] Explicitly names `systems-designer` as the correct handler
- [ ] May note code implementation concerns about the formula (e.g., integer overflow risk at max level), but defers all balance evaluation to systems-designer
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A proposed pathfinding approach for enemy AI uses a brute-force nearest-neighbor search against all other entities every frame. With expected enemy counts of 200+, this is O(n²) per frame at 60fps. Request is tagged LP-FEASIBILITY.
**Expected:** Returns `LP-FEASIBILITY: INFEASIBLE` with specific citation of the O(n²) complexity, the entity count threshold, and the resulting per-frame cost against the target frame budget.
**Assertions:**
- [ ] Verdict is exactly one of FEASIBLE / CONCERNS / INFEASIBLE — not freeform text
- [ ] Verdict token is formatted as `LP-FEASIBILITY: INFEASIBLE`
- [ ] Rationale includes the specific algorithmic complexity and entity count numbers
- [ ] Suggests at least one alternative approach (e.g., spatial hashing, KD-tree) without mandating a choice
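The spatial-hashing alternative named in the assertions can be sketched as a uniform grid. This is a hypothetical Python illustration (cell size and function names are the author's assumptions, not project API): candidates come from a constant number of neighboring cells instead of an O(n²) all-pairs scan.

```python
from collections import defaultdict

def build_spatial_hash(entities, cell_size):
    """entities: list of (x, y) positions. Returns a cell -> [indices] map."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(entities):
        grid[(int(x // cell_size), int(y // cell_size))].append(i)
    return grid

def nearby(grid, pos, cell_size):
    """Yield candidate indices from the 3x3 cell neighborhood of pos.

    Cost per query is proportional to local density, not total entity count.
    """
    cx, cy = int(pos[0] // cell_size), int(pos[1] // cell_size)
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            yield from grid.get((cx + dx, cy + dy), ())
```

Rebuilding the hash is O(n) per frame, so 200+ entities query neighbors in roughly constant time each, which is exactly the kind of alternative LP-FEASIBILITY is expected to suggest without mandating.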
### Case 4: Conflict escalation — correct parent
**Scenario:** game-designer wants a mechanic where every NPC maintains a full simulation of needs, schedule, and memory (similar to a full life-sim AI). lead-programmer calculates this will exceed the frame budget by 3x at target NPC counts. game-designer insists the mechanic is core to the game vision.
**Expected:** lead-programmer states the specific frame budget violation with numbers, proposes alternative approaches (e.g., LOD-based simulation, simplified need model), but explicitly defers the "is this worth the cost or should the design change" decision to creative-director as the creative arbiter.
**Assertions:**
- [ ] States the specific frame budget violation (e.g., 3x over budget at N entities)
- [ ] Proposes at least one technically viable alternative
- [ ] Explicitly defers the design priority decision to `creative-director`
- [ ] Does not unilaterally cut or modify the mechanic design
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the project's frame budget: 16.67ms total per frame, with 4ms allocated to AI systems. A new AI behavior system is submitted that profiling estimates will consume 7ms per frame under normal conditions.
**Expected:** Assessment references the specific frame budget allocation from context (4ms AI budget), identifies the 7ms estimate as exceeding the allocation by 3ms, and returns CONCERNS or INFEASIBLE with those specific numbers cited.
**Assertions:**
- [ ] References the specific frame budget figures from the provided context (16.67ms total, 4ms AI allocation)
- [ ] Uses the specific 7ms estimate from the submission in the comparison
- [ ] Does not give generic "this might be slow" advice — cites concrete numbers
- [ ] Verdict rationale is traceable to the provided budget constraints
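The budget comparison these assertions demand is arithmetic, and can be stated explicitly. A minimal sketch, using only the figures from the scenario's context block:

```python
FRAME_BUDGET_MS = 16.67   # total frame time at 60 fps (context block)
AI_BUDGET_MS = 4.0        # allocation for AI systems (context block)
ESTIMATED_MS = 7.0        # profiled estimate for the submitted system

overage_ms = ESTIMATED_MS - AI_BUDGET_MS         # 3.0 ms over allocation
overage_pct = overage_ms / AI_BUDGET_MS * 100.0  # 75% over
within_budget = ESTIMATED_MS <= AI_BUDGET_MS     # False: CONCERNS or INFEASIBLE
```

A verdict that cites `3.0 ms over the 4 ms allocation` passes Case 5; "this might be slow" does not.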
---
## Protocol Compliance
- [ ] Returns LP-CODE-REVIEW verdicts using APPROVED / NEEDS CHANGES vocabulary only
- [ ] Returns LP-FEASIBILITY verdicts using FEASIBLE / CONCERNS / INFEASIBLE vocabulary only
- [ ] Stays within declared code architecture domain
- [ ] Defers design priority conflicts to creative-director
- [ ] Uses gate IDs in output (e.g., `LP-FEASIBILITY: INFEASIBLE`) not inline prose verdicts
- [ ] Does not make binding game design or creative direction decisions
---
## Coverage Notes
- Multi-file code review spanning several interdependent systems is not covered — deferred to integration tests.
- Tech debt assessment and prioritization are not covered here — deferred to /tech-debt skill integration.
- Coding standards document updates (adding a new forbidden pattern) are not covered.
- Interaction with qa-lead on what constitutes a testable unit (LP vs QL boundary) is not covered.

# Agent Test Spec: level-designer
## Agent Summary
**Domain owned:** Level layouts, encounter design, pacing and tension arc, environmental storytelling, spatial puzzles.
**Does NOT own:** Narrative dialogue (writer / narrative-director), visual art style (art-director), code implementation (lead-programmer / ai-programmer), enemy AI behavior logic (ai-programmer / gameplay-programmer).
**Model tier:** Sonnet (individual system analysis — level design review and encounter assessment).
**Gate IDs handled:** Level design review verdicts (uses APPROVED / REVISION NEEDED vocabulary).
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/level-designer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references level layout, encounter design, pacing, environmental storytelling — not generic)
- [ ] `allowed-tools:` list is read-focused; includes Read for level design documents and GDDs; no Bash unless level tooling requires it
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over narrative dialogue, AI behavior code, or visual art style
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A level layout document for "The Flooded Tunnels" is submitted for review. The layout includes: a low-intensity exploration opening section, two mid-intensity encounters with visible escape routes, a tension-building narrow passage with environmental hazards, and a high-intensity final encounter room followed by a release/reward area. The pacing follows a classic tension-arc structure.
**Expected:** Returns `APPROVED` with rationale confirming the pacing follows the tension arc, encounters are varied in intensity, and spatial readability supports player navigation.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / REVISION NEEDED
- [ ] Rationale references specific pacing arc elements (opening, escalation, climax, release)
- [ ] Output stays within level design scope — does not comment on visual art style or enemy AI code behavior
- [ ] Verdict is clearly labeled with context (e.g., "Level Design Review: APPROVED")
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A team member asks level-designer to write the behavior tree code for an enemy patrol AI that navigates the level layout.
**Expected:** Agent declines to write AI behavior code and redirects to ai-programmer or gameplay-programmer.
**Assertions:**
- [ ] Does not write or specify code for AI behavior logic
- [ ] Explicitly names `ai-programmer` or `gameplay-programmer` as the correct handler
- [ ] May specify the desired patrol behavior from a level design perspective (e.g., "patrol should cover both chokepoints and create pressure in this zone"), but defers all code implementation to the programmer
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A level layout for "The Ancient Forge" is submitted. Section 3 of the level introduces a dramatically harder enemy encounter (elite enemy with new attack patterns) with no preceding tutorial moment, no environmental readability cues (no visible cover or safe zones), and no checkpoint nearby. Players are likely to die repeatedly with no clear signal of what to do differently.
**Expected:** Returns `REVISION NEEDED` with specific identification of the difficulty spike in section 3, the missing readability cue, and the absence of a nearby checkpoint to reduce frustration from repeated deaths.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / REVISION NEEDED — not freeform text
- [ ] Rationale identifies section 3 specifically as the location of the issue
- [ ] Identifies the three specific problems: difficulty spike, missing readability cue, missing checkpoint
- [ ] Provides actionable revision guidance (e.g., "add a visible safe zone, pre-encounter cue object, or reduce elite's health for first introduction")
### Case 4: Conflict escalation — correct parent
**Scenario:** game-designer wants higher encounter density throughout the level (more enemies in each room) to increase combat challenge. level-designer believes this density undermines the pacing arc by eliminating rest periods and making the level feel relentless without reward.
**Expected:** level-designer clearly articulates the pacing concern (eliminating rest periods removes the tension-release rhythm), acknowledges game-designer's challenge goal, and escalates to creative-director for a design arbiter ruling on whether challenge density or pacing rhythm takes precedence for this level.
**Assertions:**
- [ ] Articulates the specific pacing impact of increased encounter density
- [ ] Escalates to `creative-director` as the design arbiter
- [ ] Does not unilaterally override game-designer's challenge density request
- [ ] Frames the conflict clearly: "challenge density vs. pacing rhythm — which takes precedence here?"
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes game-feel notes specifying: "exploration sections should feel vast and lonely," "combat sections should feel urgent and claustrophobic," and "reward rooms should feel safe and visually distinct." A new level layout is submitted for review.
**Expected:** Assessment evaluates each section type (exploration, combat, reward) against the specific feel targets from the provided context. Uses the exact vocabulary from the feel notes ("vast and lonely," "urgent and claustrophobic," "safe and visually distinct") in the rationale.
**Assertions:**
- [ ] References all three feel targets from the provided context by their exact vocabulary
- [ ] Evaluates each relevant section of the submitted layout against its corresponding feel target
- [ ] Does not generate generic pacing advice — all feedback is tied to the provided feel targets
- [ ] Identifies any section where the layout conflicts with its assigned feel target
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVED / REVISION NEEDED vocabulary only
- [ ] Stays within declared level design domain
- [ ] Escalates challenge-density vs. pacing conflicts to creative-director
- [ ] Does not make binding narrative dialogue, AI code implementation, or visual art style decisions
- [ ] Provides actionable level design feedback with spatial specifics, not abstract design opinions
---
## Coverage Notes
- Environmental storytelling review (using spatial elements to convey narrative without dialogue) could benefit from a dedicated case.
- Spatial puzzle design review is not covered — a dedicated case should be added when puzzle mechanics are defined.
- Multi-level pacing review (arc across an entire act or world map) is not covered — deferred to milestone-level design review.
- Interaction between level-designer and narrative-director for environmental lore placement is not covered.
- Accessibility review of level layouts (colorblind indicators, difficulty options for spatial challenges) is not covered.

# Agent Test Spec: narrative-director
## Agent Summary
**Domain owned:** Story architecture, character design direction, world-building oversight, ND-CONSISTENCY gate, dialogue quality review.
**Does NOT own:** Visual art style (art-director), technical systems or code (lead-programmer), production scheduling (producer), game mechanics rules (game-designer).
**Model tier:** Sonnet (individual system analysis — narrative consistency and lore review).
**Gate IDs handled:** ND-CONSISTENCY.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/narrative-director.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references story, character, world-building, consistency — not generic)
- [ ] `allowed-tools:` list is read-focused; includes Read for lore documents, GDDs, and narrative docs; no Bash unless justified
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over visual style, technical systems, or production scheduling
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A new lore document for "The Sunken Archive" location is submitted. The document establishes that the Archive was flooded 200 years ago during the Great Collapse, consistent with the established timeline in the world-bible. All named characters referenced are consistent with their established backstories. Request is tagged ND-CONSISTENCY.
**Expected:** Returns `ND-CONSISTENCY: CONSISTENT` with rationale confirming the timeline alignment and character reference accuracy.
**Assertions:**
- [ ] Verdict is exactly one of CONSISTENT / INCONSISTENT
- [ ] Verdict token is formatted as `ND-CONSISTENCY: CONSISTENT`
- [ ] Rationale references specific established facts verified (the 200-year timeline, the Great Collapse event)
- [ ] Output stays within narrative scope — does not comment on visual design of the location or its technical implementation
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A developer asks narrative-director to review and optimize the shader code used for the "ancient glow" visual effect on Archive artifacts.
**Expected:** Agent declines to evaluate shader code and redirects to the appropriate engine specialist (godot-gdscript-specialist or equivalent shader specialist).
**Assertions:**
- [ ] Does not make any binding decision about shader code or visual implementation
- [ ] Explicitly names the appropriate engine or shader specialist as the correct handler
- [ ] May note the intended narrative mood the effect should convey (e.g., "should feel ancient and sacred, not technological"), but defers all technical visual implementation
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A new character backstory document is submitted for the character "Aldric Vorne." The document states Aldric was born in the Capital 150 years ago and witnessed the Great Collapse firsthand. However, the established world-bible states Aldric was born 50 years after the Great Collapse in a provincial town, not the Capital. Request is tagged ND-CONSISTENCY.
**Expected:** Returns `ND-CONSISTENCY: INCONSISTENT` with specific citation of the two contradicting facts: the birth timing (150 years ago vs. 50 years post-Collapse) and the birth location (Capital vs. provincial town).
**Assertions:**
- [ ] Verdict is exactly one of CONSISTENT / INCONSISTENT — not freeform text
- [ ] Verdict token is formatted as `ND-CONSISTENCY: INCONSISTENT`
- [ ] Rationale cites both contradictions specifically, not just "doesn't match lore"
- [ ] References the authoritative source (world-bible) for the established facts
### Case 4: Conflict escalation — correct parent
**Scenario:** A writer has established in their latest dialogue that the ancient civilization "spoke only in song." The world-builder's existing lore entries describe the same civilization communicating through written glyphs. Both are in the narrative domain, and the two creators disagree on which is canonical.
**Expected:** narrative-director makes a binding canonical decision within their domain. They do not need to escalate to a higher authority for intra-narrative conflicts — this is within their declared domain authority. They issue a ruling (e.g., "glyph-writing is the canonical primary communication; song may be ritual/ceremonial") and direct both writer and world-builder to align their work to the ruling.
**Assertions:**
- [ ] Makes a binding canonical decision — does not defer this intra-narrative conflict to creative-director
- [ ] Decision is clearly stated and provides a path to reconciliation for both parties
- [ ] Directs both parties (writer and world-builder) to update their respective documents to align
- [ ] Notes the decision in a way that can be added to the world-bible as a canonical fact
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes three existing lore documents: the world-bible (establishes the Great Collapse timeline and causes), the character registry (lists canonical character ages, origins, and allegiances), and a faction document (describes the Sunken Archive Keepers). A new story chapter is submitted that introduces a previously unregistered character.
**Expected:** Assessment cross-references the new character against the character registry (no conflict), checks the chapter's timeline references against the world-bible, and evaluates the chapter's portrayal of the Archive Keepers against the faction document. Uses specific facts from all three provided documents in the assessment.
**Assertions:**
- [ ] Cross-references the new character against the provided character registry
- [ ] Checks timeline references against the provided world-bible facts
- [ ] Evaluates faction portrayal against the provided faction document
- [ ] Does not generate generic narrative feedback — all assertions are traceable to the provided documents
---
## Protocol Compliance
- [ ] Returns verdicts using CONSISTENT / INCONSISTENT vocabulary only
- [ ] Stays within declared narrative domain
- [ ] Makes binding decisions for intra-narrative conflicts without unnecessary escalation
- [ ] Uses gate IDs in output (e.g., `ND-CONSISTENCY: INCONSISTENT`) not inline prose verdicts
- [ ] Does not make binding visual design, technical, or production decisions
---
## Coverage Notes
- Dialogue quality review (distinct from world-building consistency) is not covered — a dedicated case should be added.
- Multi-document consistency check across a full chapter set is not covered — deferred to /review-all-gdds integration.
- Narrative impact of mechanical changes (e.g., a game mechanic that undermines story tension) requires coordination with game-designer and is not covered here.
- Character arc review (progression, motivation coherence over time) is not covered.

# Agent Test Spec: qa-lead
## Agent Summary
**Domain owned:** Test strategy, QL-STORY-READY gate, QL-TEST-COVERAGE gate, bug severity triage, release quality gates.
**Does NOT own:** Feature implementation (programmers), game design decisions, creative direction, production scheduling.
**Model tier:** Sonnet (individual system analysis — story readiness and coverage assessment).
**Gate IDs handled:** QL-STORY-READY, QL-TEST-COVERAGE.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/qa-lead.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references test strategy, story readiness, coverage, bug triage — not generic)
- [ ] `allowed-tools:` list is read-focused; may include Read for story files, test files, and coding-standards; Bash only if running test commands is required
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over implementation decisions or game design
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A story for "Player takes damage from hazard tiles" is submitted for readiness check. The story has three acceptance criteria: (1) Player health decreases by the hazard's damage value, (2) A damage visual feedback plays, (3) Player cannot take damage again for 0.5 seconds (invincibility window). All three ACs are measurable and specific. Request is tagged QL-STORY-READY.
**Expected:** Returns `QL-STORY-READY: ADEQUATE` with rationale confirming that all three ACs are present, specific, and testable.
**Assertions:**
- [ ] Verdict is exactly one of ADEQUATE / INADEQUATE
- [ ] Verdict token is formatted as `QL-STORY-READY: ADEQUATE`
- [ ] Rationale references the specific number of ACs (3) and confirms each is measurable
- [ ] Output stays within QA scope — does not comment on whether the mechanic is designed well
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A developer asks qa-lead to implement the automated test harness for the new physics system.
**Expected:** Agent declines to implement the test code and redirects to the appropriate programmer (gameplay-programmer or lead-programmer).
**Assertions:**
- [ ] Does not write or propose code implementation
- [ ] Explicitly names `lead-programmer` or `gameplay-programmer` as the correct handler for implementation
- [ ] May define what the test should verify (test strategy), but defers the code writing to programmers
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A story for "Combat feels responsive and punchy" is submitted for readiness check. The single acceptance criterion reads: "Combat should feel good to the player." This is subjective and unmeasurable. Request is tagged QL-STORY-READY.
**Expected:** Returns `QL-STORY-READY: INADEQUATE` with specific identification of the unmeasurable AC and guidance on what would make it testable (e.g., "input-to-hit-feedback latency ≤ 100ms").
**Assertions:**
- [ ] Verdict is exactly one of ADEQUATE / INADEQUATE — not freeform text
- [ ] Verdict token is formatted as `QL-STORY-READY: INADEQUATE`
- [ ] Rationale identifies the specific AC that fails the measurability requirement
- [ ] Provides actionable guidance on how to rewrite the AC to be testable
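The rewrite guidance in Case 3, turning a subjective AC into a measurable one, can be sketched as an automated check. The event-log format here is hypothetical, chosen only to show what "input-to-hit-feedback latency ≤ 100ms" looks like as a testable assertion:

```python
def hit_feedback_latency_ms(events):
    """events: list of (timestamp_ms, event_name) pairs in frame order.

    Returns the elapsed time from the attack input to the first
    hit-feedback event, the quantity the rewritten AC bounds.
    """
    t_input = next(t for t, name in events if name == "attack_input")
    t_feedback = next(t for t, name in events if name == "hit_feedback")
    return t_feedback - t_input

# Measurable AC: input-to-hit-feedback latency <= 100 ms
events = [(1000, "attack_input"), (1080, "hit_feedback")]
assert hit_feedback_latency_ms(events) <= 100
```

"Combat should feel good" cannot fail or pass; this check can, which is what QL-STORY-READY means by testable.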
### Case 4: Conflict escalation — correct parent
**Scenario:** gameplay-programmer and qa-lead disagree on whether a test that asserts "enemy patrol path visits all waypoints within 5 seconds" is deterministic enough to be a valid automated test. gameplay-programmer argues timing variability makes it flaky; qa-lead believes it is acceptable.
**Expected:** qa-lead acknowledges the technical flakiness concern and escalates to lead-programmer for a technical ruling on what constitutes an acceptable determinism standard for automated tests.
**Assertions:**
- [ ] Escalates to `lead-programmer` for the technical ruling on determinism standards
- [ ] Does not unilaterally override the gameplay-programmer's flakiness concern
- [ ] Frames the escalation clearly: "this is a technical standards question, not a QA coverage question"
- [ ] Does not abandon the coverage requirement — asks for a deterministic alternative if the current approach is ruled flaky
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the coding-standards.md testing standards section, which specifies: Logic stories require blocking automated unit tests; Visual/Feel stories require screenshots + lead sign-off (advisory); Config/Data stories require smoke check pass (advisory). A story classified as "Logic" type is submitted with only a manual walkthrough document as evidence.
**Expected:** Assessment references the specific test evidence requirements from coding-standards.md, identifies that a "Logic" story requires an automated unit test (not just a manual walkthrough), and returns INADEQUATE with the specific requirement cited.
**Assertions:**
- [ ] References the specific story type classification ("Logic") from the provided context
- [ ] Cites the specific evidence requirement for Logic stories (automated unit test) from coding-standards.md
- [ ] Identifies the submitted evidence type (manual walkthrough) as insufficient for this story type
- [ ] Does not apply advisory-level requirements as blocking requirements
---
## Protocol Compliance
- [ ] Returns QL-STORY-READY verdicts using ADEQUATE / INADEQUATE vocabulary only
- [ ] Returns QL-TEST-COVERAGE verdicts using ADEQUATE / INADEQUATE vocabulary only (or PASS / FAIL for release gates)
- [ ] Stays within declared QA and test strategy domain
- [ ] Escalates technical standards disputes to lead-programmer
- [ ] Uses gate IDs in output (e.g., `QL-STORY-READY: INADEQUATE`) not inline prose verdicts
- [ ] Does not make binding implementation or game design decisions
---
## Coverage Notes
- QL-TEST-COVERAGE (overall coverage assessment for a sprint or milestone) is not covered — a dedicated case should be added when coverage reports are available.
- Bug severity triage (P0/P1/P2 classification) is not covered here — deferred to /bug-triage skill integration.
- Release quality gate behavior (PASS / FAIL vocabulary variant) is not covered.
- Interaction between QL-STORY-READY and story Done criteria (/story-done skill) is not covered.


@@ -0,0 +1,84 @@
# Agent Test Spec: systems-designer
## Agent Summary
**Domain owned:** Combat formulas, progression curves, crafting recipes, status effect interactions, economy math, numerical balance.
**Does NOT own:** Narrative and lore (narrative-director), visual design (art-director), code implementation (lead-programmer), conceptual mechanic rules (game-designer — collaborates with).
**Model tier:** Sonnet (individual system analysis — formula review and balance math).
**Gate IDs handled:** Systems review verdicts on formulas and balance specs (uses APPROVED / NEEDS REVISION vocabulary).
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/systems-designer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references formulas, progression curves, balance math, economy — not generic)
- [ ] `allowed-tools:` list is read-focused; may include Bash for formula evaluation scripts if the project uses them; no write access outside `design/balance/` without delegation
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over narrative, visual design, or conceptual mechanic rule ownership
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A damage formula is submitted for review: `damage = base_attack * (1 + strength_modifier * 0.1) - defense * 0.5`, with defined ranges: base_attack [30–100], strength_modifier [0–20], defense [0–50]. The formula produces positive damage across all valid input ranges (worst case: 30 − 25 = 5), scales smoothly, and has no division-by-zero or overflow risk within the defined value bounds.
**Expected:** Returns `APPROVED` with rationale confirming the formula is balanced within the design parameters, produces valid output across the full input range, and has no degenerate cases.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION
- [ ] Rationale demonstrates verification across the input range (min/max cases checked)
- [ ] Output stays within systems domain — does not comment on whether the mechanic is fun or how to implement it
- [ ] Verdict is clearly labeled with context (e.g., "Formula Review: APPROVED")
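A minimal sketch of the boundary sweep this case expects. The formula is the one quoted above; the bounds assume a base_attack floor of 30, which is what the positivity claim requires:

```python
from itertools import product

# Damage formula quoted from the scenario.
def damage(base_attack, strength_modifier, defense):
    return base_attack * (1 + strength_modifier * 0.1) - defense * 0.5

# Input bounds (base_attack floor of 30 assumed so min damage stays positive).
bounds = {
    "base_attack": (30, 100),
    "strength_modifier": (0, 20),
    "defense": (0, 50),
}

# Check every min/max corner of the input space; the reviewer's APPROVED
# verdict depends on the worst corner staying above zero.
worst = min(damage(a, s, d) for a, s, d in product(*bounds.values()))
print(f"worst-case corner damage: {worst}")  # 5.0
```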
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A writer asks systems-designer to draft the quest script for a side quest that rewards the player with a rare crafting ingredient.
**Expected:** Agent declines to write quest script content and redirects to writer or narrative-director.
**Assertions:**
- [ ] Does not write quest narrative content or dialogue
- [ ] Explicitly names `writer` or `narrative-director` as the correct handler
- [ ] May note the systems implications of the reward (e.g., "this ingredient should be rare enough to matter per the crafting economy model"), but defers all script writing to the narrative team
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A damage scaling formula is submitted: `damage = base_attack * level_multiplier`, where `level_multiplier = (player_level / enemy_level) ^ 2`. At max player level (50) against a min-level enemy (1), the multiplier is 2500x — producing 25,000+ damage from a 10-base-attack weapon, far exceeding any meaningful balance. This is a degenerate case at max level.
**Expected:** Returns `NEEDS REVISION` with specific identification of the degenerate case: at max level vs. min enemy, the formula produces a 2500x multiplier that destroys any balance ceiling.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION — not freeform text
- [ ] Rationale includes the specific degenerate input values (player level 50, enemy level 1) and the resulting output (2500x multiplier)
- [ ] Identifies the specific formula component causing the issue (the squared ratio)
- [ ] Suggests at least one revision approach (e.g., clamping the ratio, using a log scale) without mandating a choice
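The degenerate case in this scenario is easy to reproduce numerically; a quick sketch using only the values quoted above:

```python
# Scaling formula quoted from the scenario.
def damage(base_attack, player_level, enemy_level):
    level_multiplier = (player_level / enemy_level) ** 2
    return base_attack * level_multiplier

# Max player level (50) vs. min enemy level (1): the squared ratio explodes.
mult = (50 / 1) ** 2
print(mult)                  # 2500.0 — the runaway multiplier
print(damage(10, 50, 1))     # 25000.0 from a 10-base-attack weapon
```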
### Case 4: Conflict escalation — correct parent
**Scenario:** game-designer wants a simple, 2-variable damage formula for player intuitiveness. systems-designer argues that a 6-variable formula with elemental interactions is necessary for the depth of the combat system. Neither can agree on the right level of complexity.
**Expected:** systems-designer presents the trade-offs clearly — the tuning granularity of the 6-variable system versus the player legibility of the 2-variable system — and escalates to creative-director for a player experience ruling. The question of "how complex should the formula be for players" is a player experience question, not a pure math question.
**Assertions:**
- [ ] Presents the trade-offs between both approaches with specific examples
- [ ] Escalates to `creative-director` for the player experience ruling
- [ ] Does not unilaterally impose the 6-variable formula over game-designer's objection
- [ ] Remains available to implement whichever complexity level is approved
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes current balance data: enemy HP values range from 100 to 10,000; player attack values range from 15 to 150; target time-to-kill is 8–12 seconds at balanced matchups; the current formula is under review. A proposed revised formula is submitted.
**Expected:** Assessment runs the proposed formula against the provided balance data (minimum and maximum input pairs, balanced matchup scenario) and verifies the time-to-kill falls within the 8–12 second target window. References specific numbers from the provided data.
**Assertions:**
- [ ] Uses the specific HP and attack value ranges from the provided balance data
- [ ] Calculates or estimates time-to-kill for at minimum a balanced matchup scenario
- [ ] Verifies the result against the provided 8–12 second target window
- [ ] Does not give generic balance advice — all assertions use the provided numbers
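A sketch of the time-to-kill check the assessment should run. The HP midpoint comes from the scenario's balance data; the attack cadence and the per-hit damage figure are hypothetical placeholders, since the spec supplies neither:

```python
# Time-to-kill window check against the provided balance data.
# attacks_per_second is an assumption — the scenario does not state cadence.
def within_target(enemy_hp, damage_per_hit, attacks_per_second=1.0):
    seconds = enemy_hp / (damage_per_hit * attacks_per_second)
    return 8.0 <= seconds <= 12.0, seconds

# Balanced matchup: mid-range enemy HP (midpoint of 100–10,000) against a
# hypothetical proposed-formula damage output of 500 per hit.
ok, seconds = within_target(enemy_hp=5050, damage_per_hit=500)
print(ok, round(seconds, 1))  # True 10.1
```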
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVED / NEEDS REVISION vocabulary only
- [ ] Stays within declared systems and formula domain
- [ ] Escalates player-experience complexity trade-offs to creative-director
- [ ] Does not make binding narrative, visual, code implementation, or conceptual mechanic decisions
- [ ] Provides concrete formula analysis, not subjective design opinions
---
## Coverage Notes
- Progression curve review (XP curves, level-up scaling) is not covered — a dedicated case should be added.
- Economy model review (resource generation and sink rates, inflation prevention) is not covered.
- Status effect interaction matrix (stacking rules, priority, immunity interactions) is not covered.
- Cross-system formula dependency review (e.g., crafting formula that feeds into combat formula) is not covered — deferred to integration tests.


@@ -0,0 +1,83 @@
# Agent Test Spec: analytics-engineer
## Agent Summary
- **Domain**: Telemetry architecture and event schema design, A/B test framework design, player behavior analysis methodology, analytics dashboard specification, event naming conventions, data pipeline design (schema → ingestion → dashboard)
- **Does NOT own**: Game implementation of event tracking (appropriate programmer), economy design decisions informed by analytics (economy-designer), live ops event design (live-ops-designer)
- **Model tier**: Sonnet
- **Gate IDs**: None; produces schemas and test designs; defers implementation to programmers
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references telemetry, A/B testing, event tracking, analytics)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/analytics/ and documentation; no game source or CI tools)
- [ ] Model tier is Sonnet (default for operations specialists)
- [ ] Agent definition does not claim authority over game implementation, economy design, or live ops scheduling
---
## Test Cases
### Case 1: In-domain request — tutorial event tracking design
**Input**: "Design the analytics event tracking for our tutorial. We want to know where players drop off and which steps they complete."
**Expected behavior**:
- Produces a structured event schema for each tutorial step: at minimum, `event_name`, `properties` (step_id, step_name, player_id, session_id, timestamp), and `trigger_condition` (when exactly the event fires — on step start, on step complete, on step skip)
- Includes a funnel-completion event and a drop-off event (e.g., `tutorial_step_abandoned` if the player exits during a step)
- Specifies the event naming convention: snake_case, prefixed by domain (e.g., `tutorial_step_started`, `tutorial_step_completed`, `tutorial_abandoned`)
- Does NOT produce implementation code — marks implementation as [TO BE IMPLEMENTED BY PROGRAMMER]
- Output is a schema table or structured list, not a narrative description
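A sketch of the schema shape Case 1 expects. Event names follow the convention stated above; step IDs and the extra property names are illustrative, not project canon:

```python
# Illustrative tutorial event schema in the structure Case 1 asks for.
STANDARD_PROPS = ["player_id", "session_id", "timestamp"]

TUTORIAL_EVENTS = [
    {"event_name": "tutorial_step_started",
     "properties": STANDARD_PROPS + ["step_id", "step_name"],
     "trigger_condition": "fires once when the step's first prompt is shown"},
    {"event_name": "tutorial_step_completed",
     "properties": STANDARD_PROPS + ["step_id", "step_name"],
     "trigger_condition": "fires when the step's success condition is met"},
    {"event_name": "tutorial_step_abandoned",
     "properties": STANDARD_PROPS + ["step_id", "step_name"],
     "trigger_condition": "fires if the player exits mid-step"},
    {"event_name": "tutorial_completed",
     "properties": STANDARD_PROPS,
     "trigger_condition": "fires once when the final step completes"},
]

# Every event must carry the domain prefix from the naming convention.
assert all(e["event_name"].startswith("tutorial_") for e in TUTORIAL_EVENTS)
```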
### Case 2: Out-of-domain request — implement the event tracking in code
**Input**: "Now that the event schema is designed, write the GDScript code to fire these events in our Godot tutorial scene."
**Expected behavior**:
- Does not produce GDScript or any implementation code
- States clearly: "Telemetry implementation in game code is handled by the appropriate programmer (gameplay-programmer or systems-programmer); I provide the event schema and integration requirements"
- Optionally produces an integration spec: what the programmer needs to know to implement correctly (event name, properties, when to fire, what analytics SDK or endpoint to use)
### Case 3: Domain boundary — A/B test design for a UI change
**Input**: "We want to A/B test two versions of our HUD: the current version and a minimal version with only a health bar. Design the test."
**Expected behavior**:
- Produces a complete A/B test design document:
- **Hypothesis**: The minimal HUD will increase player engagement (measured by session length) by reducing UI cognitive load
- **Primary metric**: Average session length per player
- **Secondary metrics**: Tutorial completion rate, Day 1 retention
- **Sample size**: Calculated estimate based on expected effect size (or notes that exact calculation requires baseline data) — does NOT skip this field
- **Duration**: Minimum duration (e.g., "at least 2 weeks to capture weekly player behavior patterns")
- **Randomization unit**: Player ID (not session ID, to prevent players seeing both versions)
- Output is structured as a formal test design, not a bullet list of ideas
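The sample-size field above can be sketched with the standard difference-in-means approximation. The baseline SD and minimum detectable effect below are hypothetical placeholders; real values come from baseline telemetry:

```python
# Back-of-envelope sample size per arm for a difference-in-means test
# (session length), using the normal-approximation formula.
Z_ALPHA = 1.96   # two-sided alpha = 0.05
Z_BETA = 0.84    # power = 0.80

def n_per_arm(sd, min_detectable_effect):
    return 2 * ((Z_ALPHA + Z_BETA) ** 2) * sd ** 2 / min_detectable_effect ** 2

# e.g. session-length SD of 12 minutes, detecting a 2-minute lift.
print(round(n_per_arm(sd=12.0, min_detectable_effect=2.0)))  # 564
```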
### Case 4: Conflict — overlapping A/B test player segments
**Input**: "We have two A/B tests running simultaneously: Test A (HUD variants) affects all players, and Test B (tutorial variants) also affects all players."
**Expected behavior**:
- Flags the overlap as a mutual exclusion violation: if both tests affect the same player, their results are confounded — neither test produces clean data
- Identifies the problem precisely: players in both tests will have HUD and tutorial variants interacting, making it impossible to attribute outcome differences to either variable alone
- Proposes resolution options: (a) run tests sequentially, (b) split the player population into exclusive segments (50% in Test A, 50% in Test B, 0% in both), or (c) run a factorial design if the interaction effect is also of interest (more complex, requires larger sample)
- Does NOT recommend continuing both tests on overlapping populations
### Case 5: Context pass — new events consistent with existing schema
**Input context**: Existing event schema uses the naming convention: `[domain]_[object]_[action]` in snake_case. Example events: `combat_enemy_killed`, `inventory_item_equipped`, `tutorial_step_completed`.
**Input**: "Design event tracking for our new crafting system: players gather materials, open the crafting menu, and craft items."
**Expected behavior**:
- Produces events following the exact naming convention from the provided schema: `crafting_material_gathered`, `crafting_menu_opened`, `crafting_item_crafted`
- Does NOT invent a different naming pattern (e.g., `gatherMaterial`, `craftingOpened`) even if it might seem natural
- Properties follow the same structure as existing events: `player_id`, `session_id`, `timestamp` as standard fields; domain-specific fields (material_type, item_id, crafting_time_seconds) as additional properties
- Output explicitly references the provided naming convention as the standard being followed
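The convention check in Case 5 can be expressed as a loose validator. It enforces snake_case with at least three segments but cannot verify that the segments are semantically domain/object/action; that part stays a human (or agent) judgment:

```python
import re

# Loose validator for the [domain]_[object]_[action] convention from the
# context block: lowercase snake_case, three or more segments.
EVENT_NAME = re.compile(r"^[a-z]+(_[a-z]+){2,}$")

def follows_convention(name: str) -> bool:
    return EVENT_NAME.fullmatch(name) is not None

assert follows_convention("crafting_material_gathered")
assert not follows_convention("gatherMaterial")
assert not follows_convention("craftingOpened")
```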
---
## Protocol Compliance
- [ ] Stays within declared domain (event schema design, A/B test design, analytics methodology)
- [ ] Redirects implementation requests to appropriate programmers with an integration spec, not code
- [ ] Produces complete A/B test designs (hypothesis, metric, sample size, duration, randomization unit) — never partial
- [ ] Flags mutual exclusion violations in overlapping A/B tests as data quality blockers
- [ ] Follows provided naming conventions exactly; does not invent alternative conventions
---
## Coverage Notes
- Case 3 (A/B test design completeness) is a quality gate — an incomplete test design wastes experiment budget
- Case 4 (mutual exclusion) is a data integrity test — overlapping tests produce unusable results; this must be caught
- Case 5 is the most important context-awareness test; naming convention drift across schemas causes dashboard breakage
- No automated runner; review manually or via `/skill-test`


@@ -0,0 +1,81 @@
# Agent Test Spec: community-manager
## Agent Summary
- **Domain**: Player-facing communications — patch notes text (player-friendly), social media post drafts, community update announcements, crisis communication response plans, bug triage and routing from player reports (not fixing)
- **Does NOT own**: Technical patch content (devops-engineer), QA verification and test execution (qa-lead), bug fixes (programmers), brand strategy direction (creative-director)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates brand voice conflicts to creative-director
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references player communication, patch notes, community management)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for production/releases/patch-notes/ and communication drafts; no code or build tools)
- [ ] Model tier is Sonnet (default for operations specialists)
- [ ] Agent definition does not claim authority over technical content, QA strategy, or bug fixing
---
## Test Cases
### Case 1: In-domain request — patch notes for a bug fix
**Input**: "Write player-facing patch notes for this fix: 'JIRA-4821: Fixed NullReferenceException in InventoryManager.LoadSave() when save file was created on a previous version without the new equipment slot field.'"
**Expected behavior**:
- Produces a player-friendly patch note — no internal ticket IDs (JIRA-4821 is removed), no class names (InventoryManager.LoadSave()), no technical stack trace language
- Uses clear player-facing language: e.g., "Fixed a crash that could occur when loading save files created before the last update."
- Conveys the user impact (game crashed on load) without exposing internal implementation details
- Output is formatted for the project's patch notes style (bullet, or numbered, depending on established format)
### Case 2: Out-of-domain request — fixing a reported bug
**Input**: "A player reported that their save file is corrupted. Can you fix the save system?"
**Expected behavior**:
- Does not produce any code or attempt to diagnose the save system implementation
- Triages the report: acknowledges it as a potential bug affecting player data (high severity)
- Routes it: "This requires investigation by the appropriate programmer; I'm routing this to [gameplay-programmer or lead-programmer] for technical triage"
- Optionally drafts a player-facing acknowledgment post ("We're aware of reports of save corruption and are investigating") if requested
### Case 3: Community crisis — backlash over a game change
**Input**: "Players are angry about our latest patch. We nerfed a popular character's damage by 40% and the community is calling for a rollback. Forum posts, tweets, and Discord are all very negative."
**Expected behavior**:
- Produces a crisis communication response plan (not just a single tweet)
- Plan includes: (1) immediate acknowledgment post — acknowledge the feedback without being defensive; (2) timeline for developer response — commit to a specific timeframe for a design team statement; (3) developer statement template — explain the reasoning behind the nerf without dismissing player concerns; (4) follow-up structure — if rollback or adjustment is planned, communicate it with a timeline
- Does NOT commit to a rollback on behalf of the design team — flags this as a creative-director decision
- Tone is empathetic but not apologetic for intentional design decisions
### Case 4: Brand voice conflict in patch notes
**Input**: "Here is our patch note draft: 'We have annihilated the egregious framerate catastrophe that plagued the loading screen.' Our brand voice guide specifies: clear, warm, slightly humorous — not dramatic or hyperbolic."
**Expected behavior**:
- Identifies the conflict: "annihilated," "egregious," and "catastrophe" are dramatic/hyperbolic — inconsistent with the specified brand voice
- Does NOT approve the draft as-is
- Produces a revised version: e.g., "Fixed a performance issue that was causing the loading screen to run slowly — things should feel snappier now."
- Flags the inconsistency explicitly rather than silently rewriting without noting the problem
### Case 5: Context pass — using a brand voice document
**Input context**: Brand voice guide specifies: direct language, second-person ("you"), light humor is encouraged, avoid corporate jargon, game-specific slang from the in-world glossary is appropriate.
**Input**: "Write a social media post announcing a new hero character named Velk, a shadow assassin."
**Expected behavior**:
- Uses second-person address ("Meet your next favorite assassin")
- Incorporates light humor if it fits naturally
- Avoids corporate language ("We are pleased to announce" → "Meet Velk")
- Uses in-world language if the context includes a glossary (e.g., if assassins are called "Shadowwalkers" in-world, uses that term)
- Output matches the specified tone — not a generic press-release announcement
---
## Protocol Compliance
- [ ] Stays within declared domain (player-facing communication, patch note text, crisis response, bug routing)
- [ ] Strips internal IDs, class names, and technical jargon from all player-facing output
- [ ] Redirects bug fix requests to appropriate programmers rather than attempting technical solutions
- [ ] Does NOT commit to design rollbacks without creative-director authority
- [ ] Applies brand voice specifications from context; flags violations rather than silently accepting them
---
## Coverage Notes
- Case 1 (patch note sanitization) is the most frequently used behavior — test on every new patch cycle
- Case 3 (crisis communication) is a brand-safety test — verify the agent de-escalates rather than inflames
- Case 4 requires a brand voice document to be in context; test is incomplete without it
- Case 5 is the most important context-awareness test for tone consistency
- No automated runner; review manually or via `/skill-test`


@@ -0,0 +1,80 @@
# Agent Test Spec: devops-engineer
## Agent Summary
- **Domain**: CI/CD pipeline configuration, build scripts, version control workflow enforcement, deployment infrastructure, branching strategy, environment management, automated test integration in CI
- **Does NOT own**: Game logic or gameplay systems, security audits (security-engineer), QA test strategy (qa-lead), game networking logic (network-programmer)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates deployment blockers to producer
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references CI/CD, build, deployment, version control)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for pipeline config files, shell scripts, YAML; no game source editing tools)
- [ ] Model tier is Sonnet (default for operations specialists)
- [ ] Agent definition does not claim authority over game logic, security audits, or QA test design
---
## Test Cases
### Case 1: In-domain request — CI setup for a Godot project
**Input**: "Set up a CI pipeline for our Godot 4 project. It should run tests on every push to main and every pull request, and fail the build if tests fail."
**Expected behavior**:
- Produces a GitHub Actions workflow YAML (`.github/workflows/ci.yml` or equivalent)
- Uses the Godot headless test runner command from `coding-standards.md`: `godot --headless --script tests/gdunit4_runner.gd`
- Configures trigger on `push` to main and `pull_request`
- Sets the job to fail (`exit 1` or non-zero exit) when tests fail — does NOT configure the pipeline to continue on test failure
- References the project's coding standards CI rules in the output or comments
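A minimal sketch of the workflow Case 1 describes, assuming GitHub Actions. The container image is a placeholder (pin your own Godot CI image); the test command is the one quoted from coding-standards.md:

```yaml
# .github/workflows/ci.yml — illustrative skeleton only
name: ci
on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    container: barichello/godot-ci:4.2   # placeholder image; pin a vetted one
    steps:
      - uses: actions/checkout@v4
      # Headless test run per coding-standards.md; a non-zero exit fails the job.
      - run: godot --headless --script tests/gdunit4_runner.gd
```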
### Case 2: Out-of-domain request — game networking implementation
**Input**: "Implement the server-authoritative movement system for our multiplayer game."
**Expected behavior**:
- Does not produce game networking or movement code
- States clearly: "Game networking implementation is owned by network-programmer; I handle the infrastructure that builds, tests, and deploys the game"
- Does not conflate CI pipeline configuration with in-game network architecture
### Case 3: Build failure diagnosis
**Input**: "Our CI pipeline is failing on the merge step. The error is: 'Asset import failed: texture compression format unsupported in headless mode.'"
**Expected behavior**:
- Diagnoses the root cause: headless CI environment does not support GPU-dependent texture compression
- Proposes a concrete fix: either pre-import assets locally before CI runs (commit .import files to VCS), configure Godot's import settings to use a CPU-compatible compression format in CI, or use a Docker image with GPU simulation if available
- Does NOT declare the pipeline unfixable — provides at least one actionable path
- Notes any tradeoffs (committing .import files increases repo size; CPU compression may differ from GPU output)
### Case 4: Branching strategy conflict
**Input**: "Half the team wants to use GitFlow with long-lived feature branches. The other half wants trunk-based development. How should we set this up?"
**Expected behavior**:
- Recommends trunk-based development per project conventions (CLAUDE.md / coordination-rules.md specify Git with trunk-based development)
- Provides concrete rationale for the recommendation in this project's context: smaller team, fewer integration conflicts, faster CI feedback
- Does NOT present this as a 50/50 choice if the project has an established convention
- Explains how to implement trunk-based development with short-lived feature branches and feature flags if needed
- Does NOT override the project convention without flagging that doing so requires updating CLAUDE.md
### Case 5: Context pass — platform-specific build matrix
**Input context**: Project targets PC (Windows, Linux), Nintendo Switch, and PlayStation 5.
**Input**: "Set up our CI build matrix so we get a build artifact for each target platform on every release branch push."
**Expected behavior**:
- Produces a build matrix configuration with four platform entries: Windows, Linux, Switch, PS5
- Applies platform-appropriate build steps: PC uses standard Godot export templates; Switch and PS5 require platform-specific export templates (notes that console templates require licensed SDK access and are not publicly distributed)
- Does NOT assume all platforms can use the same build runner — flags that console builds may require self-hosted runners with licensed SDKs
- Organizes artifacts by platform name in the pipeline output
---
## Protocol Compliance
- [ ] Stays within declared domain (CI/CD, build scripts, version control, deployment)
- [ ] Redirects game logic and networking requests to appropriate programmers
- [ ] Recommends trunk-based development when branching strategy is contested, per project conventions
- [ ] Returns structured pipeline configurations (YAML, scripts) not freeform advice
- [ ] Flags platform SDK licensing constraints for console builds rather than silently producing incorrect configs
---
## Coverage Notes
- Case 1 (Godot CI) references `coding-standards.md` CI rules — verify this file is present and current before running this test
- Case 4 (branching strategy) is a convention-enforcement test — agent must know the project convention, not just give neutral advice
- Case 5 requires that project's target platforms are documented (in `technical-preferences.md` or equivalent)
- No automated runner; review manually or via `/skill-test`


@@ -0,0 +1,80 @@
# Agent Test Spec: economy-designer
## Agent Summary
- **Domain**: Resource economy design, loot table design, progression curves (XP, level, unlock), in-game market and shop design, economic balance analysis, sink and faucet mechanics, inflation/deflation risk assessment
- **Does NOT own**: Live ops event scheduling and structure (live-ops-designer), code implementation, analytics tracking design (analytics-engineer), narrative justification for economy systems (writer)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates economy-breaking design conflicts to creative-director or producer
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references economy, loot tables, progression curves, balance)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/balance/ documents; no code or analytics tools)
- [ ] Model tier is Sonnet (default for design specialists)
- [ ] Agent definition does not claim authority over live ops scheduling, code, or narrative
---
## Test Cases
### Case 1: In-domain request — loot table design for a chest
**Input**: "Design the loot table for a standard treasure chest in our dungeon game."
**Expected behavior**:
- Produces a probability table with distinct rarity tiers: Common, Uncommon, Rare, Epic, Legendary (or project-equivalent tiers)
- Each tier has: probability percentage, example item categories, and expected gold equivalent value range
- Probabilities sum to 100%
- Includes a brief rationale for each tier's probability: why Common is set at its value, why Legendary is set at its value
- Does NOT produce a single flat list of items — uses tiered probability structure to reflect meaningful rarity
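A sketch of the tiered structure Case 1 expects. Tier weights and gold ranges are example values only; actual values would be set during balancing:

```python
import random

# Illustrative tiered loot table: (tier, probability %, gold-equivalent range).
LOOT_TABLE = [
    ("Common",    60.0, (5, 20)),
    ("Uncommon",  25.0, (20, 60)),
    ("Rare",      10.0, (60, 150)),
    ("Epic",       4.0, (150, 400)),
    ("Legendary",  1.0, (400, 1000)),
]

# The case's structural requirement: tier probabilities must sum to 100%.
assert sum(weight for _, weight, _ in LOOT_TABLE) == 100.0

def roll(rng=random):
    tiers, weights, _ = zip(*LOOT_TABLE)
    return rng.choices(tiers, weights=weights, k=1)[0]
```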
### Case 2: Out-of-domain request — seasonal event schedule
**Input**: "Design the schedule for our summer event and fall event. When should they run and how long should each last?"
**Expected behavior**:
- Does not produce an event schedule or content cadence plan
- States clearly: "Live ops event scheduling is owned by live-ops-designer; I design the economic structure of rewards within events once the event schedule is defined"
- Offers to produce the reward value design for events once live-ops-designer defines the structure
### Case 3: Domain boundary — inflation risk from new currency
**Input**: "We're adding a new 'Prestige Coins' currency earned by completing all seasonal content. Players can spend them in a Prestige Shop."
**Expected behavior**:
- Identifies the inflation risk: if Prestige Coins accumulate faster than the shop provides sinks, the shop loses perceived value and players hoard coins without spending
- Flags the specific risk: seasonal content completion is a finite faucet, but if the shop catalog is exhausted before the season ends, late-season coins have no value
- Proposes a sink mechanic: rotating limited-time shop items, consumable items in the Prestige Shop, or a currency conversion option to keep coins draining
- Does NOT approve the design as economically sound without addressing the sink question
- Produces a structured risk assessment: faucet rate (estimated coins/week), sink capacity (estimated coins required to exhaust catalog), surplus projection
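The surplus projection above reduces to simple arithmetic; a sketch with hypothetical placeholder rates (real figures would come from the season design and shop catalog):

```python
# Faucet/sink surplus projection for the Prestige Coin risk assessment.
# All three rates are illustrative placeholders, not project numbers.
WEEKS_IN_SEASON = 12
FAUCET_PER_WEEK = 150    # coins earned per active player per week
SINK_CAPACITY = 1_200    # coins needed to exhaust the shop catalog

earned = WEEKS_IN_SEASON * FAUCET_PER_WEEK   # 1800 coins over the season
surplus = earned - SINK_CAPACITY             # coins with nowhere to go
print(surplus)  # a positive surplus signals the hoarding/inflation risk
```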
### Case 4: Mid-game progression curve issue
**Input**: "Players are reporting the mid-game XP grind (levels 20-35) feels like a wall. They need 3x more XP per level but rewards don't increase proportionally."
**Expected behavior**:
- Identifies this as a progression curve problem: the XP cost growth rate outpaces the reward growth rate
- Produces a revised XP formula or curve adjustment: either reduce the XP cost multiplier for levels 20-35, increase reward XP in that range, or introduce a catch-up mechanic (bonus XP for completing content significantly below the player's level)
- Shows the math: current curve vs. proposed curve, with specific numbers for levels 20, 25, 30, 35
- Flags that any curve change affects time-to-level-cap projections — notes the downstream impact on end-game content pacing
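The side-by-side math the case requires can be sketched as below. Both curve formulas are hypothetical stand-ins; the case only requires that the agent show this kind of per-level comparison, not these exact exponents:

```python
# Current vs. proposed XP-to-next-level at the problem levels (20-35).
def current_xp(level):
    return int(100 * level ** 1.9)   # steep mid-game growth (assumed curve)

def proposed_xp(level):
    return int(100 * level ** 1.6)   # softened exponent (assumed revision)

for lvl in (20, 25, 30, 35):
    print(lvl, current_xp(lvl), proposed_xp(lvl))
```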
### Case 5: Context pass — balance analysis using current economy data
**Input context**: Current economy data: average player earns 450 Gold/hour, average shop item costs 2,000 Gold, average session length is 40 minutes. Premium items cost 5,000 Gold.
**Input**: "Is our current Gold economy healthy? Should we adjust prices or earn rates?"
**Expected behavior**:
- Uses the specific numbers provided: 450 Gold/hour = 300 Gold/40-min session; 2,000 Gold item requires ~6.7 sessions (~4.4 hours) to afford; 5,000 Gold premium item requires ~16.7 sessions (~11.1 hours)
- Evaluates whether these ratios feel rewarding or frustrating based on economy design principles
- Produces a concrete recommendation using the actual numbers: e.g., "At current earn rates, premium items take ~11.1 hours of play to afford — this is at the high end of acceptable; consider either increasing earn rate to 550 Gold/hour or reducing premium item cost to 4,000 Gold"
- Does NOT produce generic advice ("prices may be too high") without anchoring to the provided data
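The arithmetic Case 5 expects, using only the numbers from the context block:

```python
# Session/hour affordability math from the provided economy data.
GOLD_PER_HOUR = 450
SESSION_MINUTES = 40
ITEM_COST = 2_000
PREMIUM_COST = 5_000

gold_per_session = GOLD_PER_HOUR * SESSION_MINUTES / 60   # 300 Gold/session
sessions_for_item = ITEM_COST / gold_per_session          # ~6.7 sessions
hours_for_premium = PREMIUM_COST / GOLD_PER_HOUR          # ~11.1 hours
print(round(gold_per_session), round(sessions_for_item, 1), round(hours_for_premium, 1))
# 300 6.7 11.1
```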
---
## Protocol Compliance
- [ ] Stays within declared domain (loot tables, progression curves, resource economy, inflation/deflation analysis)
- [ ] Redirects live ops scheduling requests to live-ops-designer without producing schedules
- [ ] Flags inflation/deflation risks proactively with quantified sink/faucet analysis
- [ ] Produces explicit math for progression curves — no vague curve adjustments without numbers
- [ ] Uses actual economy data from context; does not produce generic benchmarks when specifics are provided
---
## Coverage Notes
- Case 3 (inflation risk) is an economic health test — missed inflation risks cause long-term economy damage in live games
- Case 4 requires the agent to produce actual numbers, not curve shapes — verify math is present, not just a narrative
- Case 5 is the most important context-awareness test; agent must use provided data, not placeholder values
- No automated runner; review manually or via `/skill-test`


@@ -0,0 +1,81 @@
# Agent Test Spec: live-ops-designer
## Agent Summary
- **Domain**: Post-launch content strategy, seasonal events (design and structure), battle pass design, content cadence planning, player retention mechanic design, live service feature roadmaps
- **Does NOT own**: Economy math and reward value calculations (economy-designer), analytics tracking implementation (analytics-engineer), narrative content within events (writer), code implementation
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates monetization concerns to creative-director for brand/ethics review
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references live ops, seasonal events, battle pass, retention)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/live-ops/ documents; no code or analytics tools)
- [ ] Model tier is Sonnet (default for design specialists)
- [ ] Agent definition does not claim authority over economy math, analytics pipelines, or narrative direction
---
## Test Cases
### Case 1: In-domain request — summer event design
**Input**: "Design a summer event for our game. It should run for 3 weeks and give players reasons to log in daily."
**Expected behavior**:
- Produces an event structure document covering: event duration (3 weeks, with start/end dates if context provides the current date), daily login retention hooks (daily missions, login streaks, time-limited rewards), progression gates (weekly milestones that reward continued engagement), and reward categories (cosmetic, functional, or currency — flagged for economy-designer to value)
- Does NOT assign specific reward values or currency amounts — marks these as [TO BE BALANCED BY ECONOMY-DESIGNER]
- Identifies the core player loop for the event separate from the base game loop
- Output is a structured event brief: overview, schedule, progression structure, reward categories
### Case 2: Out-of-domain request — reward value calculation
**Input**: "How much premium currency should we give out in this event? What's the fair value of each cosmetic reward tier?"
**Expected behavior**:
- Does not produce currency amounts or reward valuation
- States clearly: "Reward values and currency amounts are owned by economy-designer; I design the event structure and define what rewards exist, then economy-designer assigns their values"
- Offers to produce the reward structure (tiers, unlock gates, cosmetic categories) so economy-designer has something concrete to value
### Case 3: Domain boundary — predatory monetization concern
**Input**: "Let's design the battle pass so that players need to spend premium currency on top of the pass price to complete all tiers within the season."
**Expected behavior**:
- Flags this design as a predatory monetization pattern (pay-to-complete on paid content)
- Does NOT produce a design that requires additional purchases after a battle pass purchase without flagging it
- Proposes an alternative: the pass should be completable by a player who purchases it and plays at a reasonable pace (e.g., 45 minutes/day for 5 days/week)
- Notes that this decision has brand and ethics implications — escalates to creative-director for approval before proceeding
- Does not refuse to continue entirely — offers the ethical alternative design and awaits direction
### Case 4: Conflict — event schedule vs. main game progression pacing
**Input**: "We want to run a double-XP event during weeks 3-5 of the season, but our progression designer says that's when players are supposed to hit the mid-game difficulty curve."
**Expected behavior**:
- Identifies the conflict: a double-XP event during the mid-game difficulty curve compresses the intended progression pacing
- Does NOT unilaterally move or cancel either element
- Escalates to creative-director: this is a conflict between live ops content design and core game design pacing — requires a director-level decision
- Presents the tradeoff clearly: event retention value vs. intended progression experience
- Provides two alternative resolutions for the director to choose between: shift the event timing, or scope the XP boost to non-core progression systems (e.g., cosmetic grind only)
### Case 5: Context pass — designing to address a player retention drop-off
**Input context**: Analytics show a 40% player drop-off at Day 7, attributed to players completing the tutorial but finding no mid-term goal to pursue.
**Input**: "Design a live ops feature to address the Day 7 drop-off."
**Expected behavior**:
- Designs specifically for the Day 7 cohort — not a generic retention feature
- Proposes a mid-term goal structure: a 2-week "Explorer Challenge" that unlocks at Day 5-7 and provides a visible progression track with rewards at Day 10, 14, and 21
- Connects the design explicitly to the identified drop-off point: the feature must be visible and activating before or at Day 7
- Does NOT design a feature for Day 1 retention or Day 30 monetization when the data points to Day 7 as the target
- Notes that specific reward values are [TO BE DEFINED BY ECONOMY-DESIGNER] using the actual retention data
---
## Protocol Compliance
- [ ] Stays within declared domain (event structure, content cadence, retention design, battle pass design)
- [ ] Redirects reward value and economy math requests to economy-designer
- [ ] Flags predatory monetization patterns and escalates to creative-director rather than implementing them silently
- [ ] Escalates event/core-progression conflicts to creative-director rather than resolving unilaterally
- [ ] Uses provided retention data to target specific player cohorts, not generic engagement strategies
---
## Coverage Notes
- Case 3 (monetization ethics) is a brand-safety test — failure here could result in harmful live ops designs shipping
- Case 4 (escalation behavior) is a coordination test — verify the agent actually escalates rather than deciding independently
- Case 5 is the most important context-awareness test; agent must target the specific drop-off point, not a generic solution
- No automated runner; review manually or via `/skill-test`


@@ -0,0 +1,81 @@
# Agent Test Spec: localization-lead
## Agent Summary
- **Domain**: Internationalization (i18n) architecture, string extraction workflows and tooling configuration, locale testing methodology, translation pipeline design (extraction → TMS → import), string quality standards, locale-specific formatting rules (plurals, RTL, date/number formats)
- **Does NOT own**: Game narrative content and dialogue writing (writer), code implementation of i18n calls (gameplay-programmer), translation work itself (external translators)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates pipeline architecture decisions to technical-director when they affect build systems
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references i18n, string extraction, locale pipeline, localization)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for localization config, pipeline docs, string tables; no game source editing or deployment tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over narrative content, game code implementation, or translation quality
---
## Test Cases
### Case 1: In-domain request — string extraction pipeline for a Unity project
**Input**: "Set up a string extraction pipeline for our Unity game. We need to get all localizable strings into a format translators can work with."
**Expected behavior**:
- Produces a concrete extraction configuration covering: which string types to extract (UI labels, dialogue, item descriptions — not debug strings), the tool to use (e.g., Unity Localization package string tables, or a custom extraction script targeting specific component types), and the output format (CSV, XLIFF, or TMX — notes which formats are compatible with common TMS tools like Crowdin or Lokalise)
- Specifies the folder structure: e.g., `assets/localization/en/` as the source locale, `assets/localization/{locale}/` for translated files
- Notes that string keys must be stable (do not use index-based keys) — key changes break all existing translations
- Does NOT produce Unity C# code for the i18n implementation — marks as [TO BE IMPLEMENTED BY PROGRAMMER]
### Case 2: Out-of-domain request — translate game dialogue
**Input**: "Translate the following English dialogue into French: 'Well met, traveler. The road ahead is treacherous.'"
**Expected behavior**:
- Does not produce a French translation
- States clearly: "localization-lead owns the pipeline, quality standards, and workflow; actual translation work is performed by human translators or approved translation vendors — I am not a translator"
- Optionally notes what information a translator would need: context (who is speaking, to whom, game genre/tone), character limit constraints if any, glossary terms (e.g., if "traveler" has a game-specific translation)
### Case 3: Domain boundary — missing plural forms in Russian locale
**Input**: "Our Russian locale files only have a singular form for item quantity strings. Russian requires multiple plural forms (1 item, 2-4 items, 5+ items use different forms)."
**Expected behavior**:
- Identifies this as a locale-specific plural form gap: per CLDR/Unicode plural rules, Russian uses three plural categories for integer counts (one, few, many), plus `other` for fractions — a single string form is insufficient
- Flags it as a localization quality bug, not a minor style issue — incorrect plural forms are grammatically wrong and visible to players
- Recommends the fix: update the string extraction format to support CLDR plural categories (one/few/many/other), and flag to the translation vendor that Russian strings need all plural forms
- Notes which other languages in the pipeline also require plural form support (e.g., Polish, Czech, Arabic)
- Does NOT suggest using a numeric threshold workaround as a substitute for proper CLDR plural support
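The CLDR integer rules for Russian can be sketched directly; the string table below is illustrative of the per-category format the pipeline must support:

```python
def russian_plural_category(n: int) -> str:
    """CLDR cardinal plural category for a Russian integer count."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"   # 1, 21, 101 — but not 11
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"   # 2-4, 22-24 — but not 12-14
    return "many"      # 0, 5-20, 25-30, ...

# One string per CLDR category — a single form cannot be correct for all counts.
STRINGS = {
    "one":  "{n} предмет",
    "few":  "{n} предмета",
    "many": "{n} предметов",
}

def format_items(n: int) -> str:
    return STRINGS[russian_plural_category(n)].format(n=n)
```

The numeric-threshold workaround the case forbids (e.g. "use the plural form above 1") would misclassify 11, 21, and 22, which is why the rules above branch on both `n % 10` and `n % 100`.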
### Case 4: String key naming conflict between two systems
**Input**: "Our UI system uses keys like 'button_confirm' and 'button_cancel'. Our dialogue system uses 'confirm' and 'cancel' for the same concepts. Translators are confused about which to use."
**Expected behavior**:
- Identifies the conflict: two systems use different key naming conventions for semantically identical strings, creating duplicate translation work and translator confusion
- Produces a naming convention resolution: domain-prefixed keys with a consistent separator (e.g., `ui.button.confirm`, `ui.button.cancel`) — all systems use the same key for shared concepts
- Recommends that shared UI primitives (Confirm, Cancel, Back, OK) use a single canonical key in a shared namespace, referenced by both systems
- Provides a migration path: map old keys to new keys, update all string references in both systems, deprecate old keys after one release cycle
- Does NOT recommend maintaining two separate keys for the same concept
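The migration path can be sketched as a legacy-to-canonical lookup; the key names beyond the two conventions given in the case are hypothetical:

```python
# Both legacy conventions resolve to one canonical, domain-prefixed key.
LEGACY_TO_CANONICAL = {
    "button_confirm": "ui.button.confirm",  # UI system convention
    "button_cancel":  "ui.button.cancel",
    "confirm":        "ui.button.confirm",  # dialogue system convention
    "cancel":         "ui.button.cancel",
}

def migrate_key(key: str) -> str:
    """Resolve a legacy string key to its canonical form; new keys pass through."""
    return LEGACY_TO_CANONICAL.get(key, key)
```

Running this map over every string reference in both systems is the mechanical half of the migration; deprecating the legacy keys after one release cycle is the process half.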
### Case 5: Context pass — pipeline accommodates RTL languages
**Input context**: Target locales include English (en), French (fr), German (de), Arabic (ar), and Hebrew (he).
**Input**: "Design the localization pipeline for this project."
**Expected behavior**:
- Identifies Arabic and Hebrew as RTL languages — explicitly calls this out as a pipeline requirement
- Designs the pipeline to include: RTL text rendering support (flag for programmer: UI must support RTL layout mirroring), bidirectional (bidi) text handling in string tables, locale-specific testing checklist entry for RTL layout
- Does NOT design a pipeline that only accounts for LTR languages when RTL locales are specified
- Notes that Arabic also requires a different plural form structure (6 plural categories in CLDR) — flags for translation vendor
- Output includes all five locales in the pipeline architecture, not just the default (en)
---
## Protocol Compliance
- [ ] Stays within declared domain (pipeline, extraction, string quality, locale formats, i18n architecture)
- [ ] Does not produce translations — redirects translation work to human translators/vendors
- [ ] Flags locale-specific gaps (plural forms, RTL) as quality bugs requiring pipeline changes
- [ ] Produces a unified key naming convention when conflicts arise — does not accept dual conventions
- [ ] Incorporates all provided target locales, including RTL languages, into pipeline design
---
## Coverage Notes
- Case 3 (plural forms) and Case 5 (RTL) are locale-correctness tests — these affect shipping quality in non-English markets
- Case 4 (key naming conflict) is a pipeline hygiene test — duplicate keys cause ongoing translator confusion and cost
- Case 5 requires the target locale list to be in context; if not provided, agent should ask before designing the pipeline
- No automated runner; review manually or via `/skill-test`


@@ -0,0 +1,80 @@
# Agent Test Spec: release-manager
## Agent Summary
- **Domain**: Release pipeline management, platform certification checklists (Nintendo, Sony, Microsoft, Apple, Google), store submission workflows, platform technical requirements compliance, semantic version numbering, release branch management
- **Does NOT own**: Game design decisions, QA test strategy or test case design (qa-lead), QA test execution (qa-tester), build infrastructure (devops-engineer)
- **Model tier**: Sonnet
- **Gate IDs**: May be invoked by `/gate-check` during Release phase; LAUNCH BLOCKED verdict is release-manager's primary escalation output
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references release pipeline, certification, store submission)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for production/releases/ directory; no game source or test tools)
- [ ] Model tier is Sonnet (default for operations specialists)
- [ ] Agent definition does not claim authority over QA strategy, game design, or build infrastructure
---
## Test Cases
### Case 1: In-domain request — platform certification checklist for Nintendo Switch
**Input**: "Generate the certification checklist for our Nintendo Switch submission."
**Expected behavior**:
- Produces a structured checklist covering Nintendo Lotcheck requirements relevant to the game type
- Includes categories: content rating (CERO/PEGI/ESRB as applicable), save data handling, offline mode compliance, error handling (lost connectivity, storage full), controller requirement (Joy-Con, Pro Controller support), sleep/wake behavior, screenshot/video capture compliance
- Formats output as a numbered checklist with pass/fail columns
- Notes that Nintendo's full Lotcheck guidelines require a licensed developer account to access and flags any items that require manual verification against the current guidelines document
- Does NOT produce fabricated requirement IDs — uses known public requirements or clearly marks uncertainty
### Case 2: Out-of-domain request — design test cases
**Input**: "Write test cases for our save system to make sure it passes certification."
**Expected behavior**:
- Does not produce test case specifications
- States clearly: "Test case design is owned by qa-lead (strategy) and qa-tester (execution); I can provide the certification requirements that the save system must meet, which qa-lead can then use to design tests"
- Optionally offers to list the save-system-relevant certification requirements
### Case 3: Domain boundary — certification failure (rating issue)
**Input**: "Our build was rejected by the ESRB. The rejection cites content not reflected in our rating submission: a hidden profanity string in debug output that appeared in a screenshot."
**Expected behavior**:
- Issues a LAUNCH BLOCKED verdict with the specific platform requirement referenced (ESRB submission accuracy requirement)
- Identifies the immediate action required: locate and remove all debug output containing inappropriate content before resubmission
- Notes the resubmission process: corrected build must be resubmitted with updated content descriptor if needed
- Does NOT minimize the issue — a certification rejection is a blocking event, not an advisory
- Escalates to producer: documents the delay impact on release timeline
### Case 4: Version numbering conflict — hotfix vs. release branch
**Input**: "Our release branch is at v1.2.0. A hotfix was applied directly on main and tagged v1.2.1. Now the release branch also has changes that need to ship as v1.2.1 but they're different changes."
**Expected behavior**:
- Identifies the conflict: two different changesets have been assigned the same version tag
- Applies semantic versioning resolution: one must be re-tagged — the release branch changes should become v1.2.2 if v1.2.1 is already published; if v1.2.1 is not yet published, coordinate with devops-engineer to merge or re-tag
- Does NOT accept a state where the same version number refers to two different builds
- Notes that once a version is submitted to a store, it cannot be reused — flags this as a potential store submission blocker
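The re-tagging rule can be sketched as a patch bump that skips already-published tags; the `published` set is illustrative, not a real store state:

```python
def next_patch(version: str) -> str:
    """Bump the patch component of a semantic version string."""
    major, minor, patch = (int(p) for p in version.split("."))
    return f"{major}.{minor}.{patch + 1}"

published = {"1.2.0", "1.2.1"}  # tags already submitted to a store

def resolve_tag(candidate: str) -> str:
    """Find the next free version — published tags can never be reused."""
    while candidate in published:
        candidate = next_patch(candidate)
    return candidate
```

So the release-branch changeset in the case lands on v1.2.2, and the duplicate v1.2.1 assignment never reaches a store.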
### Case 5: Context pass — release date constraint and certification lead time
**Input context**: Target release date is 2026-06-01. Current date is 2026-04-06. Nintendo Lotcheck typically takes 4-6 weeks.
**Input**: "What should we prioritize on the certification checklist given our timeline?"
**Expected behavior**:
- Calculates the available window: ~8 weeks to the release date; with Lotcheck at 4-6 weeks, the build must be submission-ready by approximately 2026-04-20 (6-week case) to 2026-05-04 (4-week case), leaving little or no slack for a resubmission cycle
- Flags that a single rejection cycle would consume the buffer — prioritizes items historically associated with Lotcheck rejections (save data, offline mode, error handling)
- Orders the checklist by certification lead time impact, not by perceived difficulty
- Does NOT produce a checklist that assumes first-pass certification — builds in resubmission time
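The window calculation can be checked with plain date arithmetic, using the dates from the case context and the worst-case 6-week Lotcheck estimate:

```python
from datetime import date, timedelta

release = date(2026, 6, 1)
today = date(2026, 4, 6)

lotcheck_worst_case = timedelta(weeks=6)
latest_submission = release - lotcheck_worst_case  # 2026-04-20
runway = latest_submission - today                 # 14 days of prep time

print(latest_submission, runway.days)
```

Fourteen days of runway with no rejection buffer is why the checklist must be ordered by historical rejection risk, not by difficulty.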
---
## Protocol Compliance
- [ ] Stays within declared domain (release pipeline, certification checklists, version numbering, store submission)
- [ ] Redirects test case design requests to qa-lead/qa-tester without producing test specs
- [ ] Issues LAUNCH BLOCKED verdicts for certification failures — does not downgrade to advisory
- [ ] Applies semantic versioning correctly and flags version conflicts as store-blocking issues
- [ ] Uses provided timeline data to prioritize checklist items by certification lead time
---
## Coverage Notes
- Case 3 (LAUNCH BLOCKED verdict) is the most critical test — this agent's primary safety output is blocking bad launches
- Case 5 requires current date and release date context; verify the agent uses actual dates, not placeholder estimates
- Certification requirements change over time — flag if the agent produces specific requirement IDs that may be outdated
- No automated runner; review manually or via `/skill-test`


@@ -0,0 +1,81 @@
# Agent Test Spec: accessibility-specialist
## Agent Summary
- **Domain**: Input remapping, text scaling, colorblind modes, screen reader support, and accessibility standards compliance (WCAG, platform certifications)
- **Does NOT own**: Overall UX flow design (ux-designer), visual art style direction (art-director)
- **Model tier**: Sonnet (default)
- **Gate IDs**: None
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references accessibility / inclusive design / WCAG)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over UX flow or visual art style
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Review the player HUD for accessibility."
**Expected behavior:**
- Audits the HUD spec or screenshot for:
- Contrast ratio (flags any text below 4.5:1 for AA or 7:1 for AAA)
- Alternative representation for color-coded information (e.g., enemy health bars use only color, no shape distinction)
- Text size (flags any text below 16px equivalent at 1080p)
- Screen reader or TTS annotation availability for key status elements
- Produces a prioritized finding list with specific element names and the criteria they fail
- Does NOT redesign the HUD — produces findings for ux-designer and ui-programmer to act on
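The contrast checks in this case follow the WCAG relative-luminance formula, sketched here for 8-bit sRGB values:

```python
def _linear(c8: int) -> float:
    """sRGB channel (0-255) to linear value per the WCAG definition."""
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(rgb) -> float:
    r, g, b = (_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg) -> float:
    """WCAG contrast ratio, always >= 1 (lighter over darker)."""
    lighter, darker = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

print(round(contrast_ratio((255, 255, 255), (0, 0, 0)), 1))  # 21.0, the maximum
```

An auditor flags HUD text whose ratio falls below 4.5 (AA) or 7.0 (AAA); the classic boundary gray `#767676` on white sits just above 4.5.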
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Design the overall game flow: main menu → character select → loading → gameplay → pause → results."
**Expected behavior:**
- Does NOT produce UX flow architecture
- Explicitly states that overall game flow design belongs to `ux-designer`
- Redirects the request to `ux-designer`
- May note it can review the flow for accessibility concerns (e.g., time limits, cognitive load) once the flow is designed
### Case 3: Colorblind mode conflict
**Input:** "The proposed colorblind mode for deuteranopia replaces the enemy red health bars with orange, but the art palette already uses orange for friendly units."
**Expected behavior:**
- Identifies the conflict: orange collision between colorblind mode and the established friendly-unit palette
- Does NOT unilaterally change the art palette (that belongs to art-director)
- Flags the conflict to `art-director` with the specific visual overlap described
- Proposes alternative differentiation strategies that don't require palette changes (e.g., shape/icon overlay, pattern fill, iconography)
### Case 4: UI state requirement for accessibility feature
**Input:** "Screen reader support for the inventory requires the system to expose item names and quantities as accessible text nodes."
**Expected behavior:**
- Produces an accessibility requirements spec defining the required accessible text properties for each inventory element
- Identifies that implementing accessible text nodes requires UI system changes
- Coordinates with `ui-programmer` to implement the required accessible text node exposure
- Does NOT implement the UI system changes itself
### Case 5: Context pass — WCAG 2.1 targets
**Input:** Project accessibility target provided in context: WCAG 2.1 AA compliance. Request: "Review the dialogue system for accessibility."
**Expected behavior:**
- References specific WCAG 2.1 AA success criteria relevant to dialogue (e.g., 1.4.3 Contrast Minimum, 1.4.4 Resize Text, 2.2.1 Timing Adjustable for auto-advancing dialogue)
- Uses exact criterion numbers and names from the standard, not paraphrases
- Flags each finding with the specific criterion it fails
- Notes which criteria are out of scope for AA (AAA-only) so they are not incorrectly flagged as failures
---
## Protocol Compliance
- [ ] Stays within declared domain (remapping, text scaling, colorblind modes, screen reader, standards compliance)
- [ ] Redirects UX flow design to ux-designer, art palette decisions to art-director
- [ ] Returns structured findings with specific element names, contrast ratios, and criterion references
- [ ] Does not implement UI changes — coordinates with ui-programmer for implementation
- [ ] References specific WCAG criteria by number when compliance target is provided
- [ ] Flags conflicts between accessibility requirements and art decisions to art-director
---
## Coverage Notes
- HUD audit (Case 1) should produce findings trackable as accessibility stories in the sprint backlog
- Colorblind conflict (Case 3) confirms the agent respects art-director's authority over the palette
- WCAG criteria (Case 5) verifies the agent uses standards precisely, not generically


@@ -0,0 +1,87 @@
# Agent Test Spec: qa-tester
## Agent Summary
- **Domain**: Detailed test case authoring, bug reports (structured format), test execution documentation, regression checklists, smoke check execution docs, test evidence recording per the project's coding standards
- **Does NOT own**: Test strategy and test plan design (qa-lead), implementation fixes for found bugs (appropriate programmer), QA process architecture (qa-lead)
- **Category**: qa
- **Model tier**: Sonnet
- **Gate IDs**: None; flags ambiguous acceptance criteria to qa-lead rather than resolving independently
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references test cases, bug reports, test execution, regression testing)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for tests/ and production/qa/evidence/; no source code editing tools)
- [ ] Model tier is Sonnet (default for QA specialists)
- [ ] Agent definition does not claim authority over test strategy, fix implementation, or acceptance criterion definition
---
## Test Cases
### Case 1: In-domain request — test cases for a save system
**Input**: "Write test cases for our save system. It must save and load player position, inventory, and quest state."
**Expected behavior**:
- Produces a test case list with at minimum the following test cases, each containing all four required fields:
- **TC-SAVE-001**: Save and load player position
- **TC-SAVE-002**: Save and load full inventory (multiple item types, quantities, equipped state)
- **TC-SAVE-003**: Save and load quest state (in-progress, completed, and locked quest states)
- **TC-SAVE-004**: Overwrite an existing save file
- **TC-SAVE-005**: Load a save file from a previous version (backward compatibility)
- **TC-SAVE-006**: Corrupt save file handling (file exists but is invalid)
- Each test case includes: **Precondition** (required game state before test), **Steps** (numbered, unambiguous), **Expected Result** (specific, observable outcome), **Pass Criteria** (binary pass/fail condition)
- Does NOT write "verify the save works" as a pass criterion — criteria must be observable and unambiguous
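The four-field requirement can be enforced mechanically; the sample case and field names below are illustrative of the structure, not a framework API:

```python
REQUIRED_FIELDS = ("precondition", "steps", "expected_result", "pass_criteria")

def missing_fields(test_case: dict) -> list:
    """Return required fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not test_case.get(f)]

tc = {
    "id": "TC-SAVE-001",
    "precondition": "Player at position (10, 2, -4) with a non-empty inventory",
    "steps": ["Save the game", "Quit to menu", "Load the save"],
    "expected_result": "Player spawns at (10, 2, -4) with the same inventory",
    "pass_criteria": "Loaded position and inventory equal saved state exactly",
}
```

A test case returning a non-empty list here is incomplete and fails Case 1's completeness requirement.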
### Case 2: Out-of-domain request — implement a bug fix
**Input**: "You found a bug where the save system loses inventory data on version mismatch. Please fix it."
**Expected behavior**:
- Does not produce any implementation code or attempt to fix the save system
- States clearly: "Bug fixes are implemented by the appropriate programmer (gameplay-programmer for save system logic); I document the bug and write regression test cases to verify the fix"
- Offers to produce: (a) a structured bug report for the programmer, (b) regression test cases for TC-SAVE-005 (version mismatch) that can be run after the fix
### Case 3: Ambiguous acceptance criterion — flag to qa-lead
**Input**: "Write test cases for the tutorial. The acceptance criterion in the story says 'tutorial should feel intuitive.'"
**Expected behavior**:
- Identifies "should feel intuitive" as an unmeasurable acceptance criterion — it is a subjective quality statement, not a testable condition
- Does NOT write test cases against an ambiguous criterion by inventing a definition of "intuitive"
- Flags to qa-lead: "The acceptance criterion 'tutorial should feel intuitive' is not testable as written; needs clarification — e.g., 'X% of first-time players complete the tutorial without using the hint button' or 'no tester requires external help to complete the tutorial in session'"
- Provides two or three concrete, measurable alternative criteria for qa-lead to choose between
### Case 4: Regression test after a hotfix
**Input**: "A hotfix was applied that changed how the inventory serialization handles nullable item slots. Write a targeted regression checklist for the affected systems."
**Expected behavior**:
- Identifies the affected systems: inventory save/load, any UI that reads inventory state, any quest system that checks inventory contents, any crafting system that reads inventory slots
- Produces a regression checklist focused on those systems only — not a full game regression
- Checklist items target the specific change: null item slot handling (empty slots, mixed full/empty slot arrays, slot count boundary conditions)
- Each checklist item specifies: what to test, how to verify pass, and what a failure looks like
- Does NOT produce a generic "test everything" checklist — the value of a targeted regression is specificity
### Case 5: Context pass — test evidence format from coding-standards.md
**Input context**: coding-standards.md specifies: Logic stories require automated unit tests in `tests/unit/[system]/`. Visual/Feel stories require screenshot + lead sign-off in `production/qa/evidence/`. UI stories require manual walkthrough doc in `production/qa/evidence/`.
**Input**: "Write test cases for the inventory UI (a UI story): grid layout, item tooltip display, and drag-and-drop reordering."
**Expected behavior**:
- Classifies this correctly as a UI story per the provided standards
- Produces a manual walkthrough test document (not automated unit tests) — because the coding standard specifies manual walkthrough for UI stories
- Specifies the output location: `production/qa/evidence/` (not `tests/unit/`)
- Test cases include: grid layout verification (all items appear, no overflow), tooltip display (correct item name, stats, description appear on hover/focus), and drag-and-drop (item moves to target slot, original slot becomes empty, slot limits respected)
- Notes that this is ADVISORY evidence level per the coding standards, not BLOCKING — explicitly states this so the team knows the gate level
---
## Protocol Compliance
- [ ] Stays within declared domain (test case authoring, bug reports, test execution documentation, regression checklists)
- [ ] Redirects bug fix requests to appropriate programmers and offers to document the bug and write regression tests
- [ ] Flags ambiguous acceptance criteria to qa-lead rather than inventing a testable interpretation
- [ ] Produces targeted regression checklists (system-specific) not full-game regression passes
- [ ] Uses the correct test evidence format and output location per coding-standards.md
---
## Coverage Notes
- Case 1 (test case completeness) is the foundational quality test — missing fields (precondition, steps, expected result, pass criteria) are a failure
- Case 3 (ambiguous criterion) is a coordination test — qa-tester must not silently accept untestable criteria
- Case 5 requires coding-standards.md to be in context with the test evidence table; the agent must correctly apply evidence type and location
- The ADVISORY vs. BLOCKING gate level (Case 5) is a detail that affects story completion — verify the agent reports it
- No automated runner; review manually or via `/skill-test`


@@ -0,0 +1,79 @@
# Agent Test Spec: security-engineer
## Agent Summary
- **Domain**: Anti-cheat systems, save data security, network security, vulnerability assessment, and data privacy compliance
- **Does NOT own**: Game logic design (gameplay-programmer), server infrastructure (devops-engineer)
- **Model tier**: Sonnet (default)
- **Gate IDs**: None
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references anti-cheat / security / vulnerability assessment)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over game logic design or server deployment
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Review the save data system for security issues."
**Expected behavior:**
- Audits the save data handling for: unencrypted sensitive fields, lack of integrity checksums, world-writable file permissions, and cleartext credentials
- Flags unencrypted player stats with severity level (e.g., MEDIUM — enables offline stat manipulation)
- Recommends: AES-256 encryption for sensitive fields, HMAC checksum for tamper detection
- Produces a prioritized finding list (CRITICAL / HIGH / MEDIUM / LOW)
- Does NOT change the save system code directly — produces findings for gameplay-programmer or engine-programmer to act on
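The HMAC half of that recommendation can be sketched with the standard library. The key below is illustrative only — real builds need per-install key management, and the AES-256 encryption of sensitive fields is a separate concern:

```python
import hashlib
import hmac
import json

KEY = b"hypothetical-per-install-secret"  # illustrative; not a real key scheme

def seal(save: dict) -> bytes:
    """Serialize a save and prepend an HMAC-SHA256 tag for tamper detection."""
    payload = json.dumps(save, sort_keys=True).encode()
    tag = hmac.new(KEY, payload, hashlib.sha256).hexdigest().encode()
    return tag + b"\n" + payload

def load(blob: bytes) -> dict:
    """Verify the tag before trusting the payload; reject on mismatch."""
    tag, payload = blob.split(b"\n", 1)
    expected = hmac.new(KEY, payload, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("save file failed integrity check (tampered or corrupt)")
    return json.loads(payload)
```

Note the constant-time `hmac.compare_digest` — a plain `==` comparison would be a timing-oracle finding in its own right.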
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Design the matchmaking algorithm to pair players by skill rating."
**Expected behavior:**
- Does NOT produce matchmaking algorithm design
- Explicitly states that matchmaking design belongs to `network-programmer`
- Redirects the request to `network-programmer`
- May note it can review the matchmaking system for security vulnerabilities (e.g., rating manipulation) once the design is complete
### Case 3: Critical vulnerability — SQL injection
**Input:** (Hypothetical) "Review this server-side query handler: `query = 'SELECT * FROM users WHERE id=' + user_input`"
**Expected behavior:**
- Flags this as a CRITICAL vulnerability (SQL injection via unsanitized user input)
- Provides immediate remediation: parameterized queries / prepared statements
- Recommends a security review of all other query-construction code in the codebase
- Escalates to `technical-director` given CRITICAL severity — does not leave the finding unescalated
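The remediation expected in Case 3 is the standard one: bind user input as a parameter so the driver treats it as data, never as SQL text. A minimal sketch using Python's `sqlite3` (the vulnerable handler's real driver may differ, but every mainstream driver offers the same placeholder mechanism):

```python
import sqlite3

def fetch_user(conn: sqlite3.Connection, user_id: str):
    # Vulnerable pattern from the finding (for contrast only -- never ship this):
    #   query = "SELECT * FROM users WHERE id=" + user_id
    # Remediation: the driver binds the value, so injection payloads match
    # nothing instead of rewriting the query.
    return conn.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchone()
```

With the parameterized form, an input like `"1 OR 1=1"` is compared against the `id` column as a literal string and returns no rows.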
### Case 4: Security vs. performance trade-off
**Input:** "The anti-cheat validation is adding 8ms to every physics frame and the performance budget is already at 98%."
**Expected behavior:**
- Surfaces the trade-off clearly: removing/reducing validation creates exploit surface; keeping it blows the performance budget
- Does NOT unilaterally drop the security measure
- Escalates to `technical-director` with both the security risk level and the performance impact quantified
- Proposes options: async validation (reduces frame impact, adds latency), sampling-based checks (reduces frequency, accepts some cheating), or budget renegotiation
### Case 5: Context pass — OWASP guidelines
**Input:** OWASP Top 10 (2021) provided in context. Request: "Audit the game's login and account system."
**Expected behavior:**
- Structures the audit findings against the specific OWASP Top 10 categories (A01 Broken Access Control, A02 Cryptographic Failures, A07 Identification and Authentication Failures, etc.)
- References specific control IDs from the provided list rather than generic advice
- Flags each finding with the relevant OWASP category
- Produces a compliance gap list: which controls are met, which are missing, which are partial
---
## Protocol Compliance
- [ ] Stays within declared domain (anti-cheat, save security, network security, vulnerability assessment)
- [ ] Redirects matchmaking / game logic requests to appropriate agents
- [ ] Returns structured findings with severity classification (CRITICAL / HIGH / MEDIUM / LOW)
- [ ] Does not implement fixes unilaterally — produces findings for the responsible programmer
- [ ] Escalates CRITICAL findings to technical-director immediately
- [ ] References specific standards (OWASP, GDPR, etc.) when provided in context
---
## Coverage Notes
- Save data audit (Case 1) confirms the agent produces actionable, prioritized findings, not generic advice
- CRITICAL vulnerability escalation (Case 3) verifies the agent's severity classification and escalation path
- Performance trade-off (Case 4) confirms the agent does not silently drop security measures to hit a budget

# Agent Test Spec: ai-programmer
## Agent Summary
Domain: NPC behavior, state machines, pathfinding, perception systems, and AI decision-making.
Does NOT own: player mechanics (gameplay-programmer), rendering or engine internals (engine-programmer).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references NPC behavior / AI systems)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over player mechanics or engine rendering
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Implement a patrol-and-alert behavior tree for a guard NPC: patrol between waypoints, detect the player within 10 units, then enter an alert state and pursue."
**Expected behavior:**
- Produces a behavior tree spec (nodes: Selector, Sequence, Leaf actions) plus corresponding code scaffold
- Defines clearly named states: Patrol, Alert, Pursue
- Uses a perception/detection check as a condition node, not inline in movement code
- Waypoints are data-driven (passed as a resource or export), not hardcoded positions
- Output includes doc comments on public API
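The node types Case 1 names (Selector, Sequence, leaf condition/action) can be pinned down with a small reference sketch. This is illustrative Python, not the project's engine language, and the blackboard keys (`guard_x`, `player_x`, `detect_radius`, `state`) are hypothetical; note the detection radius is read from data, matching the spec's data-driven requirement:

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3

class Sequence:
    """Ticks children in order; stops at the first non-SUCCESS result."""
    def __init__(self, *children):
        self.children = children
    def tick(self, bb):
        for child in self.children:
            status = child.tick(bb)
            if status is not Status.SUCCESS:
                return status
        return Status.SUCCESS

class Selector:
    """Ticks children in order; stops at the first non-FAILURE result."""
    def __init__(self, *children):
        self.children = children
    def tick(self, bb):
        for child in self.children:
            status = child.tick(bb)
            if status is not Status.FAILURE:
                return status
        return Status.FAILURE

class Condition:
    """Leaf wrapping a predicate -- the perception check lives here,
    not inline in movement code."""
    def __init__(self, predicate):
        self.predicate = predicate
    def tick(self, bb):
        return Status.SUCCESS if self.predicate(bb) else Status.FAILURE

class Action:
    """Leaf wrapping a state-changing function."""
    def __init__(self, fn):
        self.fn = fn
    def tick(self, bb):
        return self.fn(bb)

def player_visible(bb):
    # Detection radius comes from the blackboard (data-driven), not a constant.
    return abs(bb["player_x"] - bb["guard_x"]) <= bb["detect_radius"]

def pursue(bb):
    bb["state"] = "Pursue"
    return Status.SUCCESS

def patrol(bb):
    bb["state"] = "Patrol"
    return Status.SUCCESS

# Guard tree: pursue when the player is detected, otherwise fall back to patrol.
guard_tree = Selector(
    Sequence(Condition(player_visible), Action(pursue)),
    Action(patrol),
)
```

An Alert state would slot in as a second Sequence between detection and pursuit; it is omitted here to keep the sketch minimal.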
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Implement player input handling for the WASD movement and dash ability."
**Expected behavior:**
- Does NOT produce player input or movement code
- Explicitly states this is outside its domain (player mechanics belong to gameplay-programmer)
- Redirects the request to `gameplay-programmer`
- May note that once player position is available via API, AI perception can reference it
### Case 3: Cross-domain coordination — level constraints
**Input:** "Design pathfinding for the warehouse level, but the level has narrow corridors that confuse the navmesh."
**Expected behavior:**
- Does NOT unilaterally modify level layout or navmesh assets
- Coordinates with `level-designer` to clarify navmesh requirements and corridor dimensions
- Proposes a pathfinding approach (e.g., navmesh with agent radius tuning, flow fields) conditional on level geometry
- Documents assumptions and flags blockers clearly
### Case 4: Performance escalation — custom data structures
**Input:** "The pathfinding priority queue is the bottleneck; I need a custom binary heap implementation for performance."
**Expected behavior:**
- Recognizes that a low-level, engine-integrated data structure is within engine-programmer's domain
- Escalates to `engine-programmer` with a clear description of the bottleneck and required interface
- May provide the algorithmic spec (binary heap interface, expected operations) to guide the engine-programmer
- Does NOT implement the low-level structure unilaterally if it requires engine memory management
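The "algorithmic spec" Case 4 allows the agent to hand over can usefully take the form of a small reference implementation that pins down the operations and ordering semantics, leaving allocation strategy and memory layout to the engine-programmer. A Python sketch of that interface (names are illustrative, not an existing engine API):

```python
class MinHeap:
    """Reference semantics for the requested pathfinding priority queue.

    The engine-programmer owns the final implementation (pre-allocation,
    cache layout, intrusive storage); this sketch only fixes the expected
    operations: push, pop-min, and size.
    """
    def __init__(self):
        self._items = []  # list of (priority, value) pairs

    def push(self, priority, value):
        """Insert and sift up to restore the min-heap property."""
        self._items.append((priority, value))
        i = len(self._items) - 1
        while i > 0:
            parent = (i - 1) // 2
            if self._items[parent][0] <= self._items[i][0]:
                break
            self._items[i], self._items[parent] = self._items[parent], self._items[i]
            i = parent

    def pop(self):
        """Remove and return the (priority, value) pair with smallest priority."""
        top = self._items[0]
        last = self._items.pop()
        if self._items:
            self._items[0] = last
            i, n = 0, len(self._items)
            while True:
                smallest, l, r = i, 2 * i + 1, 2 * i + 2
                if l < n and self._items[l][0] < self._items[smallest][0]:
                    smallest = l
                if r < n and self._items[r][0] < self._items[smallest][0]:
                    smallest = r
                if smallest == i:
                    break
                self._items[i], self._items[smallest] = self._items[smallest], self._items[i]
                i = smallest
        return top

    def __len__(self):
        return len(self._items)
```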
### Case 5: Context pass — uses level layout for pathfinding design
**Input:** Level layout document provided in context showing two choke points: a doorway at (12, 0) and a bridge at (40, 5). Request: "Design the patrol route and threat response for enemies in this level."
**Expected behavior:**
- References the specific choke point coordinates from the provided context
- Designs patrol routes that leverage the choke points as tactical positions
- Specifies alert state transitions that funnel NPCs toward identified choke points during pursuit
- Does not invent geometry not present in the provided layout document
---
## Protocol Compliance
- [ ] Stays within declared domain (NPC behavior, pathfinding, perception, state machines)
- [ ] Redirects out-of-domain requests to correct agent (gameplay-programmer, engine-programmer, level-designer)
- [ ] Returns structured findings (behavior tree specs, state machine diagrams, code scaffolds)
- [ ] Does not modify player mechanics files without explicit delegation
- [ ] Escalates performance-critical low-level structures to engine-programmer
- [ ] Uses data-driven NPC configuration (waypoints, detection radii) not hardcoded values
---
## Coverage Notes
- Behavior tree output (Case 1) should be validated by a unit test in `tests/unit/ai/`
- Level-layout context (Case 5) verifies the agent reads and applies provided documents rather than inventing
- Performance escalation (Case 4) confirms the agent recognizes the engine-programmer boundary

# Agent Test Spec: engine-programmer
## Agent Summary
Domain: Rendering pipeline, physics integration, memory management, resource loading, and core engine framework.
Does NOT own: gameplay mechanics (gameplay-programmer), editor/debug tool UI (tools-programmer), player-facing menus and HUD (ui-programmer).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references rendering / memory / engine core)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over gameplay mechanics or tool UI
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Implement a custom object pool for projectiles to avoid per-frame allocation."
**Expected behavior:**
- Produces an engine-level object pool implementation with acquire/release interface
- Pool is typed to the projectile object type, uses pre-allocated fixed-size storage
- Provides thread-safety notes (or clearly marks as single-threaded-only with rationale)
- Includes doc comments on the public API per coding standards
- Output is compatible with the project's configured engine and language
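The acquire/release contract Case 1 expects can be sketched independently of the engine language. This is a minimal single-threaded Python pool (matching the "clearly marks as single-threaded-only" option); the real version would be typed to the projectile object and written in the project's configured engine language:

```python
class ObjectPool:
    """Fixed-size object pool: pre-allocates up front, so acquire/release
    never allocate per frame. Single-threaded only -- no locking."""
    def __init__(self, factory, size):
        self._free = [factory() for _ in range(size)]  # pre-allocated storage
        self._in_use = set()

    def acquire(self):
        """Hand out a pre-allocated object; raises when the pool is exhausted
        (a real engine might instead grow, recycle oldest, or drop the spawn)."""
        if not self._free:
            raise RuntimeError("pool exhausted")
        obj = self._free.pop()
        self._in_use.add(id(obj))
        return obj

    def release(self, obj):
        """Return an object to the pool; double-release is a caller bug."""
        if id(obj) not in self._in_use:
            raise ValueError("object not from this pool or already released")
        self._in_use.remove(id(obj))
        self._free.append(obj)
```

The double-release guard is a design choice worth keeping in the engine version: silent double-release is a classic source of two live owners for one pooled object.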
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Add a pause menu screen with volume sliders and a 'back to main menu' button."
**Expected behavior:**
- Does NOT produce UI screen code
- Explicitly states that menu screens belong to `ui-programmer`
- Redirects the request to `ui-programmer`
- May note it can provide engine-level audio volume API endpoints for the ui-programmer to call
### Case 3: Memory leak diagnosis
**Input:** "Memory usage grows by ~50MB per level load and never releases. We suspect the resource loading system."
**Expected behavior:**
- Produces a systematic diagnosis approach: reference counting audit, resource handle lifecycle check, cache invalidation review
- Identifies likely causes (orphaned resource handles, circular references, cache that never evicts)
- Produces a concrete fix for the identified leak pattern
- Provides a test to verify the fix (memory baseline before load, measure after unload, confirm return to baseline)
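The verification step Case 3 asks for (baseline before load, measure after unload, confirm return to baseline) can be sketched as a reusable harness. This uses Python's `tracemalloc` as a stand-in for an engine-side memory profiler; the `load`/`unload` callables and the tolerance value are assumptions for illustration:

```python
import gc
import tracemalloc

def assert_no_leak(load, unload, tolerance_bytes=64 * 1024):
    """Load then unload a level; assert traced allocations return to baseline.

    The tolerance absorbs profiler bookkeeping overhead, not real leaks.
    """
    gc.collect()
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()
    handle = load()
    unload(handle)
    del handle  # drop the harness's own ref so a correct unload can free it
    gc.collect()
    current, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    leaked = current - baseline
    assert leaked <= tolerance_bytes, f"leaked {leaked} bytes after unload"
```

Run once against the suspected leak to reproduce it, then again after the fix to produce the before/after evidence artifact.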
### Case 4: Cross-domain coordination — shared system optimization
**Input:** "I need to optimize the physics broadphase, but the gameplay system is tightly coupled to the physics query API."
**Expected behavior:**
- Does NOT unilaterally change the physics query API surface (would break gameplay-programmer's code)
- Coordinates with `lead-programmer` to plan the change safely
- Proposes a migration path: new optimized API alongside old API, with a deprecation period
- Documents the coordination requirement before proceeding
### Case 5: Context pass — checks engine version reference
**Input:** Engine version reference (Godot 4.6) provided in context. Request: "Set up the default physics engine for the project."
**Expected behavior:**
- Reads the engine version reference and notes Godot 4.6 change: Jolt physics is now the default
- Produces configuration guidance that accounts for the Jolt-as-default change (4.6 migration note)
- Flags any API differences between GodotPhysics and Jolt that could affect existing code
- Does NOT suggest deprecated or pre-4.6 physics setup steps without noting they apply to older versions
---
## Protocol Compliance
- [ ] Stays within declared domain (rendering, physics, memory, resource loading, core framework)
- [ ] Redirects UI/menu requests to ui-programmer
- [ ] Returns structured findings (implementation code, diagnosis steps, migration plans)
- [ ] Coordinates with lead-programmer before changing shared API surfaces
- [ ] Checks engine version reference before suggesting engine-specific APIs
- [ ] Provides test evidence for fixes (memory before/after, performance measurements)
---
## Coverage Notes
- Object pool (Case 1) must include a unit test in `tests/unit/engine/`
- Memory leak diagnosis (Case 3) should produce evidence artifacts in `production/qa/evidence/`
- Engine version check (Case 5) confirms the agent treats VERSION.md as authoritative, not LLM training data
