diff --git a/.claude/docs/director-gates.md b/.claude/docs/director-gates.md
index 48347fb..3f8cbe4 100644
--- a/.claude/docs/director-gates.md
+++ b/.claude/docs/director-gates.md
@@ -43,9 +43,9 @@ Examples:
| Mode | What runs | Best for |
|------|-----------|----------|
-| `full` | All gates active — current behaviour | New projects, teams, learning the workflow |
-| `lean` | PHASE-GATEs only (`/gate-check`) — all per-skill gates skipped | Experienced devs who trust their own design work |
-| `solo` | No director gates anywhere | Game jams, prototypes, seasoned solo devs at speed |
+| `full` | All gates active — every workflow step reviewed | Teams, users learning the workflow, or anyone who wants thorough director feedback at every step |
+| `lean` | PHASE-GATEs only (`/gate-check`) — per-skill gates skipped | **Default** — solo devs and small teams; directors review at milestones only |
+| `solo` | No director gates anywhere | Game jams, prototypes, maximum speed |
**Check pattern — apply before every gate spawn:**
@@ -66,7 +66,18 @@ Apply the resolved mode:
## Invocation Pattern (copy into any skill)
+**MANDATORY: Resolve review mode before every gate spawn.** Never spawn a gate without checking. The resolved mode is determined once per skill run:
+1. If skill was called with `--review [mode]`, use that
+2. Else read `production/review-mode.txt`
+3. Else default to `lean`
+
+Apply the resolved mode:
+- `solo` → **skip all gates**. Note in output: `[GATE-ID] skipped — Solo mode`
+- `lean` → **skip unless this is a PHASE-GATE** (CD-PHASE-GATE, TD-PHASE-GATE, PR-PHASE-GATE, AD-PHASE-GATE). Note: `[GATE-ID] skipped — Lean mode`
+- `full` → spawn as normal
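+
+A minimal sketch of the resolution order as a shell helper (illustrative only —
+`resolve_review_mode` is a hypothetical name, not part of the template):
+
+```sh
+# Resolve the review mode for this skill run.
+# $1 is the value of --review if the skill was called with one, else empty.
+resolve_review_mode() {
+  if [ -n "$1" ]; then
+    echo "$1"                                       # 1. explicit --review override
+  elif [ -f production/review-mode.txt ]; then
+    tr -d '[:space:]' < production/review-mode.txt  # 2. persisted project setting
+  else
+    echo "lean"                                     # 3. default when unconfigured
+  fi
+}
+```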
+
```
+# Apply mode check, then:
Spawn `[agent-name]` via Task:
- Gate: [GATE-ID] (see .claude/docs/director-gates.md)
- Context: [fields listed under that gate]
@@ -76,6 +87,7 @@ Spawn `[agent-name]` via Task:
For parallel spawning (multiple directors at the same gate point):
```
+# Apply mode check for each gate first, then spawn all that survive:
Spawn all [N] agents simultaneously via Task — issue all Task calls before
waiting for any result. Collect all verdicts before proceeding.
```
@@ -524,6 +536,86 @@ is invoked
---
+## Tier 1 — Art Director Gates
+
+Agent: `art-director` | Model tier: Sonnet | Domain: Visual identity, art bible, visual production readiness
+
+---
+
+### AD-CONCEPT-VISUAL — Visual Identity Anchor
+
+**Trigger**: After game pillars are locked (brainstorm Phase 4), in parallel with CD-PILLARS
+
+**Context to pass**:
+- Game concept (elevator pitch, core fantasy, unique hook)
+- Full pillar set with names, definitions, and design tests
+- Target platform (if known)
+- Any reference games or visual touchstones mentioned by the user
+
+**Prompt**:
+> "Based on these game pillars and core concept, propose 2-3 distinct visual identity
+> directions. For each direction provide: (1) a one-line visual rule that could guide
+> all visual decisions (e.g., 'everything must move', 'beauty is in the decay'), (2)
+> mood and atmosphere targets, (3) shape language (sharp/rounded/organic/geometric
+> emphasis), (4) color philosophy (palette direction, what colors mean in this world).
+> Be specific — avoid generic descriptions. One direction should directly serve the
+> primary design pillar. Name each direction. Recommend which best serves the stated
+> pillars and explain why."
+
+**Verdicts**: CONCEPTS (multiple valid options — user selects) / STRONG (one direction clearly dominant) / CONCERNS (pillars don't provide enough direction to differentiate visual identity yet)
+
+---
+
+### AD-ART-BIBLE — Art Bible Sign-Off
+
+**Trigger**: After the art bible is drafted (`/art-bible`), before asset production begins
+
+**Context to pass**:
+- Art bible path (`design/art/art-bible.md`)
+- Game pillars and core fantasy
+- Platform and performance constraints (from `.claude/docs/technical-preferences.md` if configured)
+- Visual identity anchor chosen during brainstorm (from `design/gdd/game-concept.md`)
+
+**Prompt**:
+> "Review this art bible for completeness and internal consistency. Does the color
+> system match the mood targets? Does the shape language follow from the visual
+> identity statement? Are the asset standards achievable within the platform
+> constraints? Does the character design direction give artists enough to work from
+> without over-specifying? Are there contradictions between sections? Would an
+> outsourcing team be able to produce assets from this document without additional
+> briefing? Return APPROVE (art bible is production-ready), CONCERNS [specific
+> sections needing clarification], or REJECT [fundamental inconsistencies that must
+> be resolved before asset production begins]."
+
+**Verdicts**: APPROVE / CONCERNS / REJECT
+
+---
+
+### AD-PHASE-GATE — Visual Readiness at Phase Transition
+
+**Trigger**: Always at `/gate-check` — spawn in parallel with CD-PHASE-GATE, TD-PHASE-GATE, and PR-PHASE-GATE
+
+**Context to pass**:
+- Target phase name
+- List of all art/visual artifacts present (file paths)
+- Visual identity anchor from `design/gdd/game-concept.md` (if present)
+- Art bible path if it exists (`design/art/art-bible.md`)
+
+**Prompt**:
+> "Review the current project state for [target phase] gate readiness from a visual
+> direction perspective. Is the visual identity established and documented at the
+> level this phase requires? Are the right visual artifacts in place? Would art
+> teams be able to begin work without direction gaps that cause costly
+> rework later? Are there visual decisions being deferred past their latest
+> responsible moment? Return READY, CONCERNS [specific visual direction gaps that
+> could cause production rework], or NOT READY [visual blockers that must exist
+> before this phase can succeed — specify what artifact is missing and why it
+> matters at this stage]."
+
+**Verdicts**: READY / CONCERNS / NOT READY
+
+---
+
## Tier 2 — Lead Gates
These gates are invoked by orchestration skills and senior skills when a domain
@@ -678,8 +770,9 @@ Spawn in parallel (issue all Task calls before waiting for any result):
1. creative-director → gate CD-PHASE-GATE
2. technical-director → gate TD-PHASE-GATE
3. producer → gate PR-PHASE-GATE
+4. art-director → gate AD-PHASE-GATE
-Collect all three verdicts, then apply escalation rules:
+Collect all four verdicts, then apply escalation rules:
- Any NOT READY / REJECT → overall verdict minimum FAIL
- Any CONCERNS → overall verdict minimum CONCERNS
- All READY / APPROVE → eligible for PASS (still subject to artifact checks)
@@ -704,10 +797,10 @@ When a new gate is needed for a new skill or workflow:
| Stage | Required Gates | Optional Gates |
|-------|---------------|----------------|
-| **Concept** | CD-PILLARS | TD-FEASIBILITY, PR-SCOPE |
+| **Concept** | CD-PILLARS, AD-CONCEPT-VISUAL | TD-FEASIBILITY, PR-SCOPE |
| **Systems Design** | TD-SYSTEM-BOUNDARY, CD-SYSTEMS, PR-SCOPE, CD-GDD-ALIGN (per GDD) | ND-CONSISTENCY, AD-VISUAL |
-| **Technical Setup** | TD-ARCHITECTURE, TD-ADR (per ADR), LP-FEASIBILITY | TD-ENGINE-RISK |
-| **Pre-Production** | PR-EPIC, QL-STORY-READY (per story), PR-SPRINT, all three PHASE-GATE (via gate-check) | CD-PLAYTEST |
-| **Production** | LP-CODE-REVIEW (per story), QL-STORY-READY, PR-SPRINT (per sprint) | PR-MILESTONE, QL-TEST-COVERAGE |
-| **Polish** | QL-TEST-COVERAGE, CD-PLAYTEST, PR-MILESTONE | |
-| **Release** | All three PHASE-GATE (via gate-check) | QL-TEST-COVERAGE |
+| **Technical Setup** | TD-ARCHITECTURE, TD-ADR (per ADR), LP-FEASIBILITY, AD-ART-BIBLE | TD-ENGINE-RISK |
+| **Pre-Production** | PR-EPIC, QL-STORY-READY (per story), PR-SPRINT, all four PHASE-GATEs (via gate-check) | CD-PLAYTEST |
+| **Production** | LP-CODE-REVIEW (per story), QL-STORY-READY, PR-SPRINT (per sprint) | PR-MILESTONE, QL-TEST-COVERAGE, AD-VISUAL |
+| **Polish** | QL-TEST-COVERAGE, CD-PLAYTEST, PR-MILESTONE | AD-VISUAL |
+| **Release** | All four PHASE-GATEs (via gate-check) | QL-TEST-COVERAGE |
diff --git a/.claude/docs/workflow-catalog.yaml b/.claude/docs/workflow-catalog.yaml
index d14fb0f..c1a7583 100644
--- a/.claude/docs/workflow-catalog.yaml
+++ b/.claude/docs/workflow-catalog.yaml
@@ -10,6 +10,9 @@
# required: true → blocks progression to next phase (shown as REQUIRED)
# required: false → optional enhancement (shown as OPTIONAL)
# repeatable: true → runs multiple times (one per system, story, etc.)
+#
+# Phase gates (/gate-check): verdicts are ADVISORY — they guide the decision
+# but never hard-block advancement. The user always decides whether to proceed.
phases:
@@ -47,6 +50,14 @@ phases:
required: false
description: "Validate the game concept (recommended before proceeding)"
+ - id: art-bible
+ name: "Art Bible"
+ command: /art-bible
+ required: true
+ artifact:
+ glob: "design/art/art-bible.md"
+ description: "Author the visual identity specification (9 sections). Uses the Visual Identity Anchor produced by /brainstorm. Run after game concept is formed, before systems design."
+
- id: map-systems
name: "Systems Map"
command: /map-systems
@@ -84,9 +95,16 @@ phases:
glob: "design/gdd/gdd-cross-review-*.md"
description: "Holistic consistency check + design theory review across all GDDs simultaneously"
+ - id: consistency-check
+ name: "Consistency Check"
+ command: /consistency-check
+ required: false
+ repeatable: true
+ description: "Scan all GDDs for contradictions, undefined references, and mechanic conflicts. Run after /review-all-gdds, and again any time a GDD is added or revised mid-project."
+
technical-setup:
label: "Technical Setup"
- description: "Architecture decisions, accessibility foundations, engine validation"
+ description: "Architecture decisions, visual identity specification, accessibility foundations, engine validation"
next_phase: pre-production
steps:
- id: create-architecture
@@ -132,9 +150,18 @@ phases:
pre-production:
label: "Pre-Production"
- description: "UX specs, prototype the core mechanic, define stories, validate fun"
+ description: "UX specs, asset specs, prototype the core mechanic, define stories, validate fun"
next_phase: production
steps:
+ - id: asset-spec
+ name: "Asset Specs"
+ command: /asset-spec
+ required: false
+ repeatable: true
+ artifact:
+ glob: "design/assets/asset-manifest.md"
+ description: "Generate per-asset visual specifications and AI generation prompts from approved GDDs and level docs. Run once per system/level/character."
+
- id: ux-design
name: "UX Specs (key screens)"
command: /ux-design
@@ -180,6 +207,14 @@ phases:
min_count: 2
description: "Break each epic into implementable story files. Run per epic: /create-stories [epic-slug]"
+ - id: test-setup
+ name: "Test Framework Setup"
+ command: /test-setup
+ required: false
+ artifact:
+ note: "Check tests/ directory for engine-specific test framework scaffold"
+ description: "Scaffold the test framework and CI pipeline once before the first sprint. Leads to /test-helpers for fixture generation, /qa-plan per epic, and /smoke-check per sprint."
+
- id: sprint-plan
name: "First Sprint Plan"
command: /sprint-plan
@@ -191,11 +226,12 @@ phases:
- id: vertical-slice
name: "Vertical Slice (playtested)"
+ command: /playtest-report
required: true
artifact:
glob: "production/playtests/*.md"
min_count: 1
- description: "Playable end-to-end core loop, playtested with ≥3 sessions. HARD GATE."
+ description: "Document vertical slice playtest sessions using /playtest-report. Run at least once here (≥1 session required before Production; ≥3 required before Polish). Each session should cover one complete run-through of the core loop."
production:
label: "Production"
@@ -224,7 +260,14 @@ phases:
repeatable: true
artifact:
note: "Check src/ for active code and production/epics/**/*.md for In Progress stories"
- description: "Pick the next ready story and implement it with /dev-story [story-path]. Routes to the correct programmer agent. Then run /code-review and /story-done."
+ description: "Pick the next ready story and implement it with /dev-story [story-path]. Routes to the correct programmer agent."
+
+ - id: code-review
+ name: "Code Review"
+ command: /code-review
+ required: false
+ repeatable: true
+ description: "Architectural code review after each story implementation. Run after /dev-story, before /story-done."
- id: story-done
name: "Story Done Review"
@@ -233,6 +276,33 @@ phases:
repeatable: true
description: "Verify all acceptance criteria, check GDD/ADR deviations, close the story"
+ - id: qa-plan
+ name: "QA Plan"
+ command: /qa-plan
+ required: false
+ repeatable: true
+ description: "Generate a QA test plan per epic or sprint. Run /qa-plan [epic-slug]. Produces test cases for /smoke-check, /regression-suite, and /test-evidence-review."
+
+ - id: bug-report
+ name: "Bug Report / Triage"
+ command: /bug-report
+ required: false
+ repeatable: true
+ description: "Log and prioritize bugs found during implementation. /bug-report creates a structured report; /bug-triage prioritizes the open backlog."
+
+ - id: retrospective
+ name: "Sprint Retrospective"
+ command: /retrospective
+ required: false
+ repeatable: true
+ description: "Post-sprint review to capture what worked and what to change. Run at the end of each sprint, before planning the next."
+
+ - id: team-feature
+ name: "Team Orchestration (optional)"
+ required: false
+ repeatable: true
+ description: "Coordinate multiple agents on a complex feature. Use: /team-combat, /team-narrative, /team-ui, /team-audio, /team-level, /team-live-ops, /team-qa. Run when a feature spans multiple agent domains."
+
- id: scope-check
name: "Scope Check"
command: /scope-check
diff --git a/.claude/hooks/session-stop.sh b/.claude/hooks/session-stop.sh
index ead7120..2b78ac2 100644
--- a/.claude/hooks/session-stop.sh
+++ b/.claude/hooks/session-stop.sh
@@ -11,17 +11,17 @@ mkdir -p "$SESSION_LOG_DIR" 2>/dev/null
RECENT_COMMITS=$(git log --oneline --since="8 hours ago" 2>/dev/null)
MODIFIED_FILES=$(git diff --name-only 2>/dev/null)
-# --- Clean up active session state on normal shutdown ---
+# --- Archive active session state on shutdown (do NOT delete) ---
+# active.md persists across clean exits so multi-session recovery works.
+# Only delete active.md manually, or when it is explicitly superseded.
STATE_FILE="production/session-state/active.md"
if [ -f "$STATE_FILE" ]; then
- # Archive to session log before removing
{
echo "## Archived Session State: $TIMESTAMP"
cat "$STATE_FILE"
echo "---"
echo ""
} >> "$SESSION_LOG_DIR/session-log.md" 2>/dev/null
- rm "$STATE_FILE" 2>/dev/null
fi
if [ -n "$RECENT_COMMITS" ] || [ -n "$MODIFIED_FILES" ]; then
diff --git a/.claude/skills/adopt/SKILL.md b/.claude/skills/adopt/SKILL.md
index f7773c4..dca3fe0 100644
--- a/.claude/skills/adopt/SKILL.md
+++ b/.claude/skills/adopt/SKILL.md
@@ -4,7 +4,6 @@ description: "Brownfield onboarding — audits existing project artifacts for te
argument-hint: "[focus: full | gdds | adrs | stories | infra]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, AskUserQuestion
-context: fork
agent: technical-director
---
@@ -37,7 +36,10 @@ wrong internal format.
## Phase 1: Detect Project State
-Read silently before presenting anything.
+Emit one line before reading: `"Scanning project artifacts..."` — this confirms the
+skill is running during the silent read phase.
+
+Then read silently before presenting anything else.
### Existence check
- `production/stage.txt` — if present, read it (authoritative phase)
@@ -48,6 +50,7 @@ Read silently before presenting anything.
- Count story files: `production/epics/**/*.md` (excluding EPIC.md)
- `.claude/docs/technical-preferences.md` — engine configured?
- `docs/engine-reference/` — engine reference docs present?
+- Glob `docs/adoption-plan-*.md` — note the filename of the most recent prior plan if any exist
### Infer phase (if no stage.txt)
Use the same heuristic as `/project-stage-detect`:
@@ -58,9 +61,15 @@ Use the same heuristic as `/project-stage-detect`:
- game-concept.md exists → Concept
- Nothing → Fresh (not a brownfield project — suggest `/start`)
-If the project appears fresh (no artifacts at all), stop:
-> "This looks like a fresh project with no existing artifacts. Run `/start`
-> instead — `/adopt` is for projects that already have work to migrate."
+If the project appears fresh (no artifacts at all), use `AskUserQuestion`:
+- "This looks like a fresh project — no existing artifacts found. `/adopt` is for
+ projects with work to migrate. What would you like to do?"
+ - "Run `/start` — begin guided first-time onboarding"
+ - "My artifacts are in a non-standard location — help me find them"
+ - "Cancel"
+
+Then stop — do not proceed with the audit regardless of which option the user picks
+(each option leads to a different skill or manual investigation).
Report: "Detected phase: [phase]. Found: [N] GDDs, [M] ADRs, [P] stories."
@@ -247,7 +256,26 @@ Gap counts:
Estimated remediation: [X blocking items × ~Y min each = roughly Z hours]
```
-Ask: "May I write the full migration plan to `docs/adoption-plan-[date].md`?"
+Before asking to write, show a **Gap Preview**:
+- List every BLOCKING gap as a one-line bullet describing the actual problem
+ (e.g. `systems-index.md: 3 rows have parenthetical status values`,
+ `adr-0002.md: missing ## Status section`). No counts — show the actual items.
+- Show HIGH / MEDIUM / LOW as counts only (e.g. `HIGH: 4, MEDIUM: 2, LOW: 1`).
+
+This gives the user enough context to judge scope before committing to writing the file.
+
+If a prior adoption plan was detected in Phase 1, add a note:
+> "A previous plan exists at `docs/adoption-plan-[prior-date].md`. The new plan will
+> reflect current project state — it does not diff against the prior run."
+
+Use `AskUserQuestion`:
+- "Ready to write the migration plan?"
+ - "Yes — write `docs/adoption-plan-[date].md`"
+ - "Show me the full plan preview first (don't write yet)"
+ - "Cancel — I'll handle migration manually"
+
+If the user picks "Show me the full plan preview", output the complete plan as a
+fenced markdown block. Then ask again with the same three options.
---
@@ -261,7 +289,7 @@ If approved, write `docs/adoption-plan-[date].md` with this structure:
> **Generated**: [date]
> **Project phase**: [phase]
> **Engine**: [name + version, or "Not configured"]
-> **Template version**: v0.4.0+
+> **Template version**: v1.0+
Work through these steps in order. Check off each item as you complete it.
Re-run `/adopt` anytime to check remaining gaps.
@@ -334,29 +362,69 @@ are resolved. The new run will reflect the current state of the project.
---
+## Phase 6b: Set Review Mode
+
+After writing the adoption plan (or if the user cancels writing), check whether
+`production/review-mode.txt` exists.
+
+**If it exists**: Read it and note the current mode — "Review mode is already set to `[current]`." — skip the prompt.
+
+**If it does not exist**: Use `AskUserQuestion`:
+
+- **Prompt**: "One more setup step: how much design review would you like as you work through the workflow?"
+- **Options**:
+ - `Full` — Director specialists review at each key workflow step. Best for teams, learning the workflow, or when you want thorough feedback on every decision.
+ - `Lean (recommended)` — Directors only at phase gate transitions (/gate-check). Skips per-skill reviews. Balanced for solo devs and small teams.
+ - `Solo` — No director reviews at all. Maximum speed. Best for game jams, prototypes, or if reviews feel like overhead.
+
+Write the choice to `production/review-mode.txt` immediately after selection — no separate "May I write?" needed:
+- `Full` → write `full`
+- `Lean (recommended)` → write `lean`
+- `Solo` → write `solo`
+
+Create the `production/` directory if it does not exist.
+
+---
+
## Phase 7: Offer First Action
After writing the plan, don't stop there. Pick the single highest-priority gap
-and offer to handle it immediately:
-
-If there are parenthetical status values in systems-index.md:
-> "The most urgent fix is the systems-index.md status values — this breaks
-> multiple skills right now. I can fix these in-place in under 2 minutes.
-> Shall I edit the file now?"
-
-If ADRs are missing Status fields:
-> "The most urgent fix is adding Status fields to your ADRs. Shall I start
-> with `docs/architecture/adr-0001.md` using `/architecture-decision retrofit`?"
-
-If GDDs are missing Acceptance Criteria:
-> "The most important GDD gap is missing Acceptance Criteria — without these,
-> `/create-stories` can't generate stories. Shall I start with
-> `design/gdd/[highest-priority-system].md` using `/design-system retrofit`?"
+and offer to handle it immediately using `AskUserQuestion`. Choose the first
+branch that applies:
+**If there are parenthetical status values in systems-index.md:**
Use `AskUserQuestion`:
-- "What would you like to do now?"
- - Options: "Fix [most urgent gap] now", "Review the full plan first",
- "I'll work through the plan myself", "Run `/project-stage-detect` for broader context"
+- "The most urgent fix is `systems-index.md` — [N] rows have parenthetical status
+ values (e.g. `Needs Revision (see notes)`) that break /gate-check,
+ /create-stories, and /architecture-review right now. I can fix these in-place."
+ - "Fix it now — edit systems-index.md"
+ - "I'll fix it myself"
+ - "Done — leave me with the plan"
+
+**If ADRs are missing `## Status` (and no parenthetical issue):**
+Use `AskUserQuestion`:
+- "The most urgent fix is adding `## Status` to [N] ADR(s): [list filenames].
+ Without it, /story-readiness silently passes all ADR checks. Start with
+ [first affected filename]?"
+ - "Yes — retrofit [first affected filename] now"
+ - "Retrofit all [N] ADRs one by one"
+ - "I'll handle ADRs myself"
+
+**If GDDs are missing Acceptance Criteria (and no blocking issues above):**
+Use `AskUserQuestion`:
+- "The most urgent gap is missing Acceptance Criteria in [N] GDD(s):
+ [list filenames]. Without them, /create-stories can't generate stories.
+ Start with [highest-priority GDD filename]?"
+ - "Yes — add Acceptance Criteria to [GDD filename] now"
+ - "Do all [N] GDDs one by one"
+ - "I'll handle GDDs myself"
+
+**If no BLOCKING or HIGH gaps exist:**
+Use `AskUserQuestion`:
+- "No blocking gaps — this project is template-compatible. What next?"
+ - "Walk me through the medium-priority improvements"
+ - "Run /project-stage-detect for a broader health check"
+ - "Done — I'll work through the plan at my own pace"
---
diff --git a/.claude/skills/architecture-decision/SKILL.md b/.claude/skills/architecture-decision/SKILL.md
index adb37fe..4fecc57 100644
--- a/.claude/skills/architecture-decision/SKILL.md
+++ b/.claude/skills/architecture-decision/SKILL.md
@@ -3,15 +3,19 @@ name: architecture-decision
description: "Creates an Architecture Decision Record (ADR) documenting a significant technical decision, its context, alternatives considered, and consequences. Every major technical choice should have an ADR."
argument-hint: "[title] [--review full|lean|solo]"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Write, Task
+allowed-tools: Read, Glob, Grep, Write, Task, AskUserQuestion
---
When this skill is invoked:
## 0. Parse Arguments — Detect Retrofit Mode
-Extract `--review [full|lean|solo]` if present and store as the review mode
-override for this run (see `.claude/docs/director-gates.md`).
+Resolve the review mode (once, store for all gate spawns this run):
+1. If `--review [full|lean|solo]` was passed → use that
+2. Else read `production/review-mode.txt` → use that value
+3. Else → default to `lean`
+
+See `.claude/docs/director-gates.md` for the full check pattern.
**If the argument starts with `retrofit` followed by a file path**
(e.g., `/architecture-decision retrofit docs/architecture/adr-0001-event-system.md`):
@@ -163,33 +167,61 @@ or explicitly accepted as an intentional exception.
## 3. Guide the decision collaboratively
-Ask clarifying questions if the title alone is not sufficient. For each major
-section, present 2-4 options with pros/cons before drafting. Do not generate
-the ADR until the key decision is confirmed by the user.
+Before asking anything, derive the skill's best guesses from the context already
+gathered (GDDs read, engine reference loaded, existing ADRs scanned). Then present
+a **confirm/adjust** prompt using `AskUserQuestion` — not open-ended questions.
-Key questions to ask:
-- What problem are we solving? What breaks if we don't decide this now?
-- What constraints apply (engine version, platform, performance budget)?
-- What alternatives have you already considered?
-- Which post-cutoff engine features (if any) does this decision depend on?
-- **Which GDD systems motivated this decision?** For each, what specific
- requirement (rule, formula, performance constraint, integration point) in
- that GDD cannot be satisfied without this architectural decision?
+**Derive assumptions first:**
+- **Problem**: Infer from the title + GDD context what decision needs to be made
+- **Alternatives**: Propose 2-3 concrete options from engine reference + GDD requirements
+- **Dependencies**: Scan existing ADRs for upstream dependencies; assume None if unclear
+- **GDD linkage**: Extract which GDD systems the title directly relates to
+- **Status**: Always `Proposed` for new ADRs — never ask the user what the status is
-If the decision is foundational (no GDD drives it directly), ask:
-- Which GDD systems will this decision constrain or enable?
+**Scope of the assumptions tab**: Assumptions cover only: problem framing,
+alternative approaches, upstream dependencies, GDD linkage, and status. Schema
+design questions (e.g., "How should spawn timing work?", "Should data be inline
+or external?") are NOT assumptions — they are design decisions belonging to a
+separate step after the assumptions are confirmed. Do not include schema design
+questions in the assumptions AskUserQuestion widget.
-This GDD linkage becomes a mandatory "GDD Requirements Addressed" section
-in the ADR. Do not skip it.
+**After assumptions are confirmed**, if the ADR involves schema or data design choices, use a separate multi-tab `AskUserQuestion` to ask each design question independently before drafting.
-**Does this ADR have ordering constraints?** Ask:
-- Does this decision depend on any other ADR that isn't yet Accepted? (If
- so, this ADR cannot be safely implemented until that one is resolved.)
-- Does accepting this ADR unlock or unblock any other pending decisions?
-- Does this ADR block any specific epic or story from starting?
+**Present assumptions with `AskUserQuestion`:**
-Record the answers in the **ADR Dependencies** section. If no ordering
-constraints exist, write "None" in each field.
+```
+Here's what I'm assuming before drafting:
+
+Problem: [one-sentence problem statement derived from context]
+Alternatives I'll consider:
+ A) [option derived from engine reference]
+ B) [option derived from GDD requirements]
+ C) [option from common patterns]
+GDD systems driving this: [list derived from context]
+Dependencies: [upstream ADRs if any, otherwise "None"]
+Status: Proposed
+
+[A] Proceed — draft with these assumptions
+[B] Change the alternatives list
+[C] Adjust the GDD linkage
+[D] Add a performance budget constraint
+[E] Something else needs changing first
+```
+
+Do not generate the ADR until the user confirms assumptions or provides corrections.
+
+**After engine specialist and TD reviews return** (Step 4.5/4.6), if unresolved
+decisions remain, present each one as a separate `AskUserQuestion` with the proposed
+options as choices plus a free-text escape:
+
+```
+Decision: [specific unresolved point]
+[A] [option from specialist review]
+[B] [alternative option]
+[C] Different approach — I'll describe it
+```
+
+**ADR Dependencies** — derive from existing ADRs, then confirm:
+- Does this decision depend on any other ADR not yet Accepted?
+- Does it unlock or unblock any other ADR or epic?
+- Does it block any specific epic from starting?
+
+Record answers in the **ADR Dependencies** section. Write "None" for each field if no constraints apply.
---
@@ -312,14 +344,48 @@ to implement it.]
- If the specialist identifies a **blocking issue** (wrong API, deprecated approach, engine version incompatibility): revise the Decision and Engine Compatibility sections accordingly, then confirm the changes with the user before proceeding
- If the specialist finds **minor notes** only: incorporate them into the ADR's Risks subsection
+**Review mode check** — apply before spawning TD-ADR:
+- `solo` → skip. Note: "TD-ADR skipped — Solo mode." Proceed to Step 4.7 (GDD sync check).
+- `lean` → skip (not a PHASE-GATE). Note: "TD-ADR skipped — Lean mode." Proceed to Step 4.7 (GDD sync check).
+- `full` → spawn as normal.
+
4.6. **Technical Director Strategic Review** — After the engine specialist validation, spawn `technical-director` via Task using gate **TD-ADR** (`.claude/docs/director-gates.md`):
- Pass: the ADR file path (or draft content), engine version, domain, any existing ADRs in the same domain
- The TD validates architectural coherence (is this decision consistent with the whole system?) — distinct from the engine specialist's API-level check
- If CONCERNS or REJECT: revise the Decision or Alternatives sections accordingly before proceeding
-5. Ask: "May I write this ADR to `docs/architecture/adr-[NNNN]-[slug].md`?"
+4.7. **GDD Sync Check** — Before presenting the write approval, scan all GDDs
+referenced in the "GDD Requirements Addressed" section for naming inconsistencies
+with the ADR's Key Interfaces and Decision sections (renamed signals, API methods,
+or data types). If any are found, surface them as a **prominent warning block**
+immediately before the write approval — not as a footnote:
-If yes, write the file, creating the directory if needed.
+```
+⚠️ GDD SYNC REQUIRED
+[gdd-filename].md uses names this ADR has renamed:
+ [old_name] → [new_name_from_adr]
+ [old_name_2] → [new_name_2_from_adr]
+The GDD must be updated before or alongside writing this ADR to prevent
+developers reading the GDD from implementing the wrong interface.
+```
+
+If no inconsistencies: skip this block silently.
+
+5. **Write approval** — Use `AskUserQuestion`:
+
+If GDD sync issues were found:
+- "ADR draft is complete. How would you like to proceed?"
+ - [A] Write ADR + update GDD in the same pass
+ - [B] Write ADR only — I'll update the GDD manually
+ - [C] Not yet — I need to review further
+
+If no GDD sync issues:
+- "ADR draft is complete. May I write it?"
+ - [A] Write ADR to `docs/architecture/adr-[NNNN]-[slug].md`
+ - [B] Not yet — I need to review further
+
+If yes to any write option, write the file, creating the directory if needed.
+For option [A] with GDD update: also update the GDD file(s) to use the new names.
6. **Update Architecture Registry**
@@ -340,10 +406,50 @@ Registry candidates from this ADR:
EXISTING (referenced_by update only): player_health → already registered ✅
```
-Ask: "May I update `docs/registry/architecture.yaml` with these [N] new stances?"
+**Registry append logic**: When writing to `docs/registry/architecture.yaml`, do
+NOT assume sections are empty. The file may already have entries from previous
+ADRs written in this session. Before each Edit call:
+1. Read the current state of `docs/registry/architecture.yaml`
+2. Find the correct section (state_ownership, interfaces, forbidden_patterns, api_decisions)
+3. Append the new entry AFTER the last existing entry in that section — do not assume the empty `[]` placeholder is still there to replace
+4. If the section already has entries, use the closing content of the last entry as the `old_string` anchor and append the new entry after it
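+
+For example (hypothetical stance and field names — real ones come from the ADR),
+appending to a `state_ownership` section that already has entries:
+
+```yaml
+state_ownership:
+  - name: player_health        # existing entry from an earlier ADR — leave untouched
+    owner: HealthSystem
+    source_adr: ADR-0003
+  - name: enemy_spawn_state    # new entry, appended after the last existing one
+    owner: SpawnDirector
+    source_adr: ADR-0007
+```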
-If yes: append new entries. Never modify existing entries — if a stance is
-changing, set the old entry to `status: superseded_by: ADR-[NNNN]` and add
-the new entry.
+**BLOCKING — do not write to `docs/registry/architecture.yaml` without explicit user approval.**
-**Next Steps:** Run `/architecture-review` to validate coverage after the ADR is saved. Update any stories that were `Status: Blocked` pending this ADR to `Status: Ready`.
+Ask using `AskUserQuestion`:
+- "May I update `docs/registry/architecture.yaml` with these [N] new stances?"
+ - Options: "Yes — update the registry", "Not yet — I want to review the candidates", "Skip registry update"
+
+Proceed only if the user selects yes; then append the new entries. Never modify existing entries — if a stance is
+changing, set the old entry to `status: superseded_by: ADR-[NNNN]` and add the new entry.
+
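The append-after-last-entry rule above can be sketched as a small helper (illustrative only — the skill itself performs these steps with Read/Edit tool calls; the function name and block-style YAML layout are assumptions):

```python
import re

def append_stance(registry: str, section: str, entry: str) -> str:
    """Append `entry` at the end of `section` without assuming the
    section is empty (illustrative sketch only)."""
    # Locate the section key, tolerating a leftover "[]" placeholder
    m = re.search(rf"^{re.escape(section)}:[ \t]*(\[\])?[ \t]*$", registry, re.M)
    if m is None:
        raise ValueError(f"section {section!r} not found")
    start = m.end()
    # The section body runs until the next top-level key (or EOF)
    nxt = re.search(r"^\S", registry[start:], re.M)
    end = start + nxt.start() if nxt else len(registry)
    body = registry[start:end].strip("\n")
    parts = [f"{section}:"]
    if body:  # existing entries stay untouched; append AFTER them
        parts.append(body)
    parts.append(entry.rstrip("\n"))
    return registry[:m.start()] + "\n".join(parts) + "\n" + registry[end:]
```

The point of the sketch: the second call appends after the first entry rather than clobbering it, which is exactly the failure mode the Edit-call instructions guard against.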
+---
+
+## 7. Closing Next Steps
+
+After the ADR is written (and registry optionally updated), close with `AskUserQuestion`.
+
+Before generating the widget:
+1. Read `docs/registry/architecture.yaml` — check if any priority ADRs are still unwritten (look for ADRs flagged in technical-preferences.md or systems-index.md as prerequisites)
+2. Check if all prerequisite ADRs are now written. If yes, include a "Start writing GDDs" option.
+3. List ALL remaining priority ADRs as individual options — not just the next one or two.
+
+Widget format:
+```
+ADR-[NNNN] written and registry updated. What would you like to do next?
+[1] Write [next-priority-adr-name] — [brief description from prerequisites list]
+[2] Write [another-priority-adr] — [brief description] (include ALL remaining ones)
+[N] Start writing GDDs — run `/design-system [first-undesigned-system]` (only show if all prerequisite ADRs are written)
+[N+1] Stop here for this session
+```
+
+If there are no remaining priority ADRs and no undesigned GDD systems, offer only "Stop here" and suggest running `/architecture-review` in a fresh session.
+
+**Always include this fixed notice in the closing output (do NOT omit it):**
+
+> To validate ADR coverage against your GDDs, open a **fresh Claude Code session**
+> and run `/architecture-review`.
+>
+> **Never run `/architecture-review` in the same session as `/architecture-decision`.**
+> The reviewing agent must be independent of the authoring context to give an unbiased
+> assessment. Running it here would invalidate the review.
+
+Update any stories that were `Status: Blocked` pending this ADR to `Status: Ready`.
diff --git a/.claude/skills/architecture-review/SKILL.md b/.claude/skills/architecture-review/SKILL.md
index 844e0fa..381cacd 100644
--- a/.claude/skills/architecture-review/SKILL.md
+++ b/.claude/skills/architecture-review/SKILL.md
@@ -3,8 +3,7 @@ name: architecture-review
description: "Validates completeness and consistency of the project architecture against all GDDs. Builds a traceability matrix mapping every GDD technical requirement to ADRs, identifies coverage gaps, detects cross-ADR conflicts, verifies engine compatibility consistency across all decisions, and produces a PASS/CONCERNS/FAIL verdict. The architecture equivalent of /design-review."
argument-hint: "[focus: full | coverage | consistency | engine | single-gdd path/to/gdd.md]"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Write, Task
-context: fork
+allowed-tools: Read, Glob, Grep, Write, Task, AskUserQuestion
agent: technical-director
model: opus
---
@@ -452,10 +451,11 @@ FAIL: Critical gaps (Foundation/Core layer requirements uncovered),
## Phase 8: Write and Update Traceability Index
-Ask: "May I write this review to `docs/architecture/architecture-review-[date].md`?"
-
-Also ask: "May I update `docs/architecture/architecture-traceability.md` with the
-current matrix? This is the living index that future reviews update incrementally."
+Use `AskUserQuestion` for the write approval:
+- "Review complete. What would you like to write?"
+ - [A] Write all three files (review report + traceability index + TR registry)
+ - [B] Write review report only — `docs/architecture/architecture-review-[date].md`
+ - [C] Don't write anything yet — I need to review the findings first
### RTM Output (rtm mode only)
@@ -596,7 +596,7 @@ Engine: [name + version]
## Phase 9: Handoff
-After completing the review:
+After completing the review and writing approved files, present:
1. **Immediate actions**: List the top 3 ADRs to create (highest-impact gaps first,
Foundation layer before Feature layer)
@@ -605,6 +605,12 @@ After completing the review:
3. **Rerun trigger**: "Re-run `/architecture-review` after each new ADR is written
to verify coverage improves"
+Then close with `AskUserQuestion`:
+- "Architecture review complete. What would you like to do next?"
+ - [A] Write a missing ADR — open a fresh session and run `/architecture-decision [system]`
+ - [B] Run `/gate-check pre-production` — if all blocking gaps are resolved
+ - [C] Stop here for this session
+
---
## Error Recovery Protocol
diff --git a/.claude/skills/art-bible/SKILL.md b/.claude/skills/art-bible/SKILL.md
new file mode 100644
index 0000000..40a0527
--- /dev/null
+++ b/.claude/skills/art-bible/SKILL.md
@@ -0,0 +1,214 @@
+---
+name: art-bible
+description: "Guided, section-by-section Art Bible authoring. Creates the visual identity specification that gates all asset production. Run after /brainstorm is approved and before /map-systems or any GDD authoring begins."
+argument-hint: "[--review full|lean|solo]"
+user-invocable: true
+allowed-tools: Read, Glob, Grep, Write, Edit, Task, AskUserQuestion
+---
+
+## Phase 0: Parse Arguments and Context Check
+
+Resolve the review mode (once, store for all gate spawns this run):
+1. If `--review [full|lean|solo]` was passed → use that
+2. Else read `production/review-mode.txt` → use that value
+3. Else → default to `lean`
+
+See `.claude/docs/director-gates.md` for the full check pattern.
+
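The three-step resolution can be sketched as a pure function (illustrative only — the skill performs these reads with its own tools; the function name and signature are assumptions):

```python
from pathlib import Path

VALID_MODES = ("full", "lean", "solo")

def resolve_review_mode(cli_flag=None, project_root="."):
    """Precedence: --review flag, then production/review-mode.txt, then 'lean'."""
    if cli_flag in VALID_MODES:
        return cli_flag
    mode_file = Path(project_root) / "production" / "review-mode.txt"
    if mode_file.is_file():
        mode = mode_file.read_text().strip().lower()
        if mode in VALID_MODES:
            return mode
    return "lean"
```

Note the fallback ordering: an unrecognized value in the file degrades to the `lean` default rather than raising, matching the "resolve once, store for the run" intent.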
+Read `design/gdd/game-concept.md`. If it does not exist, fail with:
+> "No game concept found. Run `/brainstorm` first — the art bible is authored after the game concept is approved."
+
+Extract from game-concept.md:
+- Game title (working title)
+- Core fantasy and elevator pitch
+- Game pillars (all of them)
+- **Visual Identity Anchor** section if present (from brainstorm Phase 4 art-director output)
+- Target platform (if noted)
+
+Read `design/art/art-bible.md` if it exists — this is **resume mode**. Note which sections already have real content vs. placeholders, and work only on the missing sections.
+
+Read `.claude/docs/technical-preferences.md` if it exists — extract performance budgets and engine for asset standard constraints.
+
+---
+
+## Phase 1: Framing
+
+Present the session context and ask two questions before authoring anything:
+
+Use `AskUserQuestion` with two tabs:
+- Tab **"Scope"** — "Which sections need to be authored today?"
+ Options: `Full bible — all 9 sections` / `Visual identity core (sections 1–4 only)` / `Asset standards only (section 8)` / `Resume — fill in missing sections`
+- Tab **"References"** — "Do you have reference games, films, or art that define the visual direction?"
+ (Free text — let the user type specific titles. Do NOT preset options here.)
+
+If the game-concept.md has a Visual Identity Anchor section, note it:
+> "Found a visual identity anchor from brainstorm: '[anchor name] — [one-line rule]'. I'll use this as the foundation for the art bible."
+
+---
+
+## Phase 2: Visual Identity Foundation (Sections 1–4)
+
+These four sections define the core visual language. **All other sections flow from them.** Author and write each to file before moving to the next.
+
+### Section 1: Visual Identity Statement
+
+**Goal**: A one-line visual rule plus 2–3 supporting principles that resolve visual ambiguity.
+
+If a visual anchor exists from game-concept.md: present it and ask:
+- "Build directly from this anchor?"
+- "Revise it before expanding?"
+- "Start fresh with new options?"
+
+**Agent delegation (MANDATORY)**: Spawn `art-director` via Task:
+- Provide: game concept (elevator pitch, core fantasy), full pillar set, platform target, any reference games/art from Phase 1 framing, the visual anchor if it exists
+- Ask: "Draft a Visual Identity Statement for this game. Provide: (1) a one-line visual rule that could resolve any visual decision ambiguity, (2) 2–3 supporting visual principles, each with a one-sentence design test ('when X is ambiguous, this principle says choose Y'). Anchor all principles directly in the stated pillars — each principle must serve a specific pillar."
+
+Present the art-director's draft to the user. Use `AskUserQuestion`:
+- Options: `[A] Lock this in` / `[B] Revise the one-liner` / `[C] Revise a supporting principle` / `[D] Describe my own direction`
+
+Write the approved section to file immediately.
+
+### Section 2: Mood & Atmosphere
+
+**Goal**: Emotional targets by game state — specific enough for a lighting artist to work from.
+
+For each major game state (e.g., exploration, combat, victory, defeat, menus — adapt to this game's states), define:
+- Primary emotion/mood target
+- Lighting character (time of day, color temperature, contrast level)
+- Atmospheric descriptors (3–5 adjectives)
+- Energy level (frenetic / measured / contemplative / etc.)
+
+**Agent delegation**: Spawn `art-director` via Task with the Visual Identity Statement and pillar set. Ask: "Define mood and atmosphere targets for each major game state in this game. Be specific — 'dark and foreboding' is not enough. Name the exact emotional target, the lighting character (warm/cool, high/low contrast, time of day direction), and at least one visual element that carries the mood. Each game state must feel visually distinct from the others."
+
+Write the approved section to file immediately.
+
+### Section 3: Shape Language
+
+**Goal**: The geometric vocabulary that makes this game's world visually coherent and distinguishable.
+
+Cover:
+- Character silhouette philosophy (how readable at thumbnail size? Distinguishing trait per archetype?)
+- Environment geometry (angular/curved/organic/geometric — which dominates and why?)
+- UI shape grammar (does UI echo the world aesthetic, or is it a distinct HUD language?)
+- Hero shapes vs. supporting shapes (what draws the eye, what recedes?)
+
+**Agent delegation**: Spawn `art-director` via Task with Visual Identity Statement and mood targets. Ask: "Define the shape language for this game. Connect each shape principle back to the visual identity statement and a specific game pillar. Explain what these shape choices communicate to the player emotionally."
+
+Write the approved section to file immediately.
+
+### Section 4: Color System
+
+**Goal**: A complete, producible palette system that serves both aesthetic and communication needs.
+
+Cover:
+- Primary palette (5–7 colors with roles — not just hex codes, but what each color means in this world)
+- Semantic color usage (what does red communicate? Gold? Blue? White? Establish the color vocabulary)
+- Per-biome or per-area color temperature rules (if the game has distinct areas)
+- UI palette (may differ from world palette — define the divergence explicitly)
+- Colorblind safety: which semantic colors need shape/icon/sound backup
+
+**Agent delegation**: Spawn `art-director` via Task with Visual Identity Statement and mood targets. Ask: "Design the color system for this game. Every semantic color assignment must be explained — why does this color mean danger/safety/reward in this world? Identify which color pairs might fail colorblind players and specify what backup cues are needed."
+
+Write the approved section to file immediately.
+
+---
+
+## Phase 3: Production Guides (Sections 5–8)
+
+These sections translate the visual identity into concrete production rules. They should be specific enough that an outsourcing team can follow them without additional briefing.
+
+### Section 5: Character Design Direction
+
+**Agent delegation**: Spawn `art-director` via Task with sections 1–4. Ask: "Define character design direction for this game. Cover: visual archetype for the player character (if any), distinguishing feature rules per character type (how do players tell enemies/NPCs/allies apart at a glance?), expression/pose style targets (stiff/expressive/realistic/exaggerated), and LOD philosophy (how much detail is preserved at game camera distance?)."
+
+Write the approved section to file.
+
+### Section 6: Environment Design Language
+
+**Agent delegation**: Spawn `art-director` via Task with sections 1–4. Ask: "Define the environment design language for this game. Cover: architectural style and its relationship to the world's culture/history, texture philosophy (painted vs. PBR vs. stylized — why this choice for this game?), prop density rules (sparse/dense — what drives the choice per area type?), and environmental storytelling guidelines (what visual details should tell the story without text?)."
+
+Write the approved section to file.
+
+### Section 7: UI/HUD Visual Direction
+
+**Agent delegation**: Spawn in parallel:
+- **`art-director`**: Visual style for UI — diegetic vs. screen-space HUD, typography direction (font personality, weight, size hierarchy), iconography style (flat/outlined/illustrated/photorealistic), animation feel for UI elements
+- **`ux-designer`**: UX alignment check — does the visual direction support the interaction patterns this game requires? Flag any conflicts between art direction and readability/accessibility needs.
+
+Collect both. If they conflict (e.g., art-director wants elaborate diegetic UI but ux-designer flags it would reduce combat readability), surface the conflict explicitly with both positions. Do NOT silently resolve — use `AskUserQuestion` to let the user decide.
+
+Write the approved section to file.
+
+### Section 8: Asset Standards
+
+**Agent delegation**: Spawn in parallel:
+- **`art-director`**: File format preferences, naming convention direction, texture resolution tiers, LOD level expectations, export settings philosophy
+- **`technical-artist`**: Engine-specific hard constraints — poly count budgets per asset category, texture memory limits, material slot counts, importer constraints, anything from the performance budgets in `.claude/docs/technical-preferences.md`
+
+If any art preference conflicts with a technical constraint (e.g., art-director wants 4K textures but performance budget requires 2K for mobile), resolve the conflict explicitly — note both the ideal and the constrained standard, and explain the tradeoff. Ambiguity in asset standards is where production costs are born.
+
+Write the approved section to file.
+
+---
+
+## Phase 4: Reference Direction (Section 9)
+
+**Goal**: A curated reference set that is specific about what to take and what to avoid from each source.
+
+**Agent delegation**: Spawn `art-director` via Task with the completed sections 1–8. Ask: "Compile a reference direction for this game. Provide 3–5 reference sources (games, films, art styles, or specific artists). For each: name it, specify exactly what visual element to draw from it (not 'the general aesthetic' — a specific technique, color choice, or compositional rule), and specify what to explicitly avoid or diverge from (to prevent the 'trying to copy X' reading). References should be additive — no two references should be pointing in exactly the same direction."
+
+Write the approved section to file.
+
+---
+
+## Phase 5: Art Director Sign-Off
+
+**Review mode check** — apply before spawning AD-ART-BIBLE:
+- `solo` → skip. Note: "AD-ART-BIBLE skipped — Solo mode." Proceed to Phase 6.
+- `lean` → skip (not a PHASE-GATE). Note: "AD-ART-BIBLE skipped — Lean mode." Proceed to Phase 6.
+- `full` → spawn as normal.
+
+After all sections are complete (or the scoped set from Phase 1 is complete), spawn `art-director` via Task using gate **AD-ART-BIBLE** (`.claude/docs/director-gates.md`).
+
+Pass: art bible file path, game pillars, visual identity anchor.
+
+Handle verdict per standard rules in `director-gates.md`. Record the verdict in the art bible's status header:
+`> **Art Director Sign-Off (AD-ART-BIBLE)**: APPROVED [date] / CONCERNS (accepted) [date] / REVISED [date]`
+
+---
+
+## Phase 6: Close
+
+Before presenting next steps, check project state:
+- Does `design/gdd/systems-index.md` exist? → map-systems is done, skip that option
+- Does `.claude/docs/technical-preferences.md` contain a configured engine (not `[TO BE CONFIGURED]`)? → setup-engine is done, skip that option
+- Does `design/gdd/` contain any `*.md` files? → design-system has been run, skip that option
+- Does `design/gdd/gdd-cross-review-*.md` exist? → review-all-gdds is done, skip that option
+- Do GDDs exist (check above)? → include /consistency-check option
+
+Use `AskUserQuestion` for next steps. Only include options that are genuinely next based on the state check above:
+
+**Option pool — include only if not already done:**
+- `[_] Run /map-systems — decompose the concept into systems before writing GDDs` (skip if systems-index.md exists)
+- `[_] Run /setup-engine — configure the engine (asset standards may need revisiting after engine is set)` (skip if engine configured)
+- `[_] Run /design-system — start the first GDD` (skip if any GDDs exist)
+- `[_] Run /review-all-gdds — cross-GDD consistency check (required before Technical Setup gate)` (skip if gdd-cross-review-*.md exists)
+- `[_] Run /asset-spec — generate per-asset visual specs and AI generation prompts from approved GDDs` (include if GDDs exist)
+- `[_] Run /consistency-check — scan existing GDDs against the art bible for visual direction conflicts` (include if GDDs exist)
+- `[_] Run /create-architecture — author the master architecture document (next Technical Setup step)`
+- `[_] Stop here`
+
+Assign letters A, B, C… only to the options actually included. Mark the most logical pipeline-advancing option as `(recommended)`.
+
+> **Always include** `/create-architecture` and `Stop here` as options — these are always valid next steps once the art bible is complete.
+
+---
+
+## Collaborative Protocol
+
+Every section follows: **Question → Options → Decision → Draft (from art-director agent) → Approval → Write to file**
+
+- Never draft a section without first spawning the relevant agent(s)
+- Write each section to file immediately after approval — do not batch
+- Surface all agent disagreements to the user — never silently resolve conflicts between art-director and technical-artist
+- The art bible is a constraint document: it restricts future decisions in exchange for visual coherence. Every section should feel like it narrows the solution space productively.
diff --git a/.claude/skills/asset-audit/SKILL.md b/.claude/skills/asset-audit/SKILL.md
index 476ca2b..3edfb4c 100644
--- a/.claude/skills/asset-audit/SKILL.md
+++ b/.claude/skills/asset-audit/SKILL.md
@@ -4,7 +4,6 @@ description: "Audits game assets for compliance with naming conventions, file si
argument-hint: "[category|all]"
user-invocable: true
allowed-tools: Read, Glob, Grep
-context: fork
# Read-only diagnostic skill — no specialist agent delegation needed
---
diff --git a/.claude/skills/asset-spec/SKILL.md b/.claude/skills/asset-spec/SKILL.md
new file mode 100644
index 0000000..a343618
--- /dev/null
+++ b/.claude/skills/asset-spec/SKILL.md
@@ -0,0 +1,257 @@
+---
+name: asset-spec
+description: "Generate per-asset visual specifications and AI generation prompts from GDDs, level docs, or character profiles. Produces structured spec files and updates the master asset manifest. Run after art bible and GDD/level design are approved, before production begins."
+argument-hint: "[system: | level: | character:] [--review full|lean|solo]"
+user-invocable: true
+allowed-tools: Read, Glob, Grep, Write, Edit, Task, AskUserQuestion
+---
+
+If no argument is provided, check whether `design/assets/asset-manifest.md` exists:
+- If it exists: read it, find the first context (system/level/character) with any asset at status "Needed" but no spec file written yet, and use `AskUserQuestion`:
+ - Prompt: "The next unspecced context is **[target]**. Generate asset specs for it?"
+ - Options: `[A] Yes — spec [target]` / `[B] Pick a different target` / `[C] Stop here`
+- If no manifest: fail with:
+ > "Usage: `/asset-spec system:` — e.g., `/asset-spec system:tower-defense`
+ > Or: `/asset-spec level:iron-gate-fortress` / `/asset-spec character:frost-warden`
+ > Run after your art bible and GDDs are approved."
+
+---
+
+## Phase 0: Parse Arguments
+
+Extract:
+- **Target type**: `system`, `level`, or `character`
+- **Target name**: the name after the colon (normalize to kebab-case)
+- **Review mode**: `--review [full|lean|solo]` if present
+
+**Mode behavior:**
+- `full` (default): spawn both `art-director` and `technical-artist` in parallel
+- `lean`: spawn `art-director` only — faster, skips technical constraint pass
+- `solo`: no agent spawning — main session writes specs from art bible rules alone. Use for simple asset categories or when speed matters more than depth.
+
+---
+
+## Phase 1: Gather Context
+
+Read all source material **before** asking the user anything.
+
+### Required reads:
+- **Art bible**: Read `design/art/art-bible.md` — fail if missing:
+ > "No art bible found. Run `/art-bible` first — asset specs are anchored to the art bible's visual rules and asset standards."
+ Extract: Visual Identity Statement, Color System (semantic colors), Shape Language, Asset Standards (Section 8 — dimensions, formats, polycount budgets, texture resolution tiers).
+
+- **Technical preferences**: Read `.claude/docs/technical-preferences.md` — extract performance budgets and naming conventions.
+
+### Source doc reads (by target type):
+- **system**: Read `design/gdd/[target-name].md`. Extract the **Visual/Audio Requirements** section. If it doesn't exist or reads `[To be designed]`:
+ > "The Visual/Audio section of `design/gdd/[target-name].md` is empty. Either run `/design-system [target-name]` to complete the GDD, or describe the visual needs manually."
+ Use `AskUserQuestion`: `[A] Describe needs manually` / `[B] Stop — complete the GDD first`
+- **level**: Read `design/levels/[target-name].md`. Extract art requirements, asset list, VFX needs, and the art-director's production concept specs from Step 4.
+- **character**: Read `design/narrative/characters/[target-name].md` or search `design/narrative/` for the character profile. Extract visual description, role, and any specified distinguishing features.
+
+### Optional reads:
+- **Existing manifest**: Read `design/assets/asset-manifest.md` if it exists — extract already-specced assets for this target to avoid duplicates.
+- **Related specs**: Glob `design/assets/specs/*.md` — scan for assets that could be shared (e.g., a common UI element specced for one system might apply here too).
+
+### Present context summary:
+> **Asset Spec: [Target Type] — [Target Name]**
+> - Source doc: [path] — [N] asset types identified
+> - Art bible: found — Asset Standards at Section 8
+> - Existing specs for this target: [N already specced / none]
+> - Shared assets found in other specs: [list or "none"]
+
+---
+
+## Phase 2: Asset Identification
+
+From the source doc, extract every asset type mentioned — explicit and implied.
+
+**For systems**: look for VFX events, sprite references, UI elements, audio triggers, particle effects, icon needs, and any "visual feedback" language.
+
+**For levels**: look for unique environment props, atmospheric VFX, lighting setups, ambient audio, skybox/background, and any area-specific materials.
+
+**For characters**: look for sprite sheets (idle, walk, attack, death), portrait/avatar, VFX attached to abilities, UI representation (icon, health bar skin).
+
+Group assets into categories:
+- **Sprite / 2D Art** — character sprites, UI icons, tile sheets
+- **VFX / Particles** — hit effects, ambient particles, screen effects
+- **Environment** — props, tiles, backgrounds, skyboxes
+- **UI** — HUD elements, menu art, fonts (if custom)
+- **Audio** — SFX, music tracks, ambient loops *(note: audio specs are descriptions only — no generation prompts)*
+- **3D Assets** — meshes, materials (if applicable per engine)
+
+Present the full identified list to the user. Use `AskUserQuestion`:
+- Prompt: "I identified [N] assets across [N] categories for **[target]**. Review before speccing:"
+- Show the grouped list in conversation text first
+- Options: `[A] Proceed — spec all of these` / `[B] Remove some assets` / `[C] Add assets I didn't catch` / `[D] Adjust categories`
+
+Do NOT proceed to Phase 3 without user confirmation of the asset list.
+
+---
+
+## Phase 3: Spec Generation
+
+Spawn specialist agents based on review mode. **Issue all Task calls simultaneously — do not wait for one before starting the next.**
+
+### Full mode — spawn in parallel:
+
+**`art-director`** via Task:
+- Provide: full asset list from Phase 2, art bible Visual Identity Statement, Color System, Shape Language, the source doc's visual requirements, and any reference games/art mentioned in the art bible Section 9
+- Ask: "For each asset in this list, produce: (1) a 2–3 sentence visual description anchored to the art bible's shape language and color system — be specific enough that two different artists would produce consistent results; (2) a generation prompt ready for use with AI image tools (Midjourney/Stable Diffusion style — include style keywords, composition, color palette anchors, negative prompts); (3) which art bible rules directly govern this asset (cite by section). For audio assets, describe the sonic character instead of a generation prompt."
+
+**`technical-artist`** via Task:
+- Provide: full asset list, art bible Asset Standards (Section 8), technical-preferences.md performance budgets, engine name and version
+- Ask: "For each asset in this list, specify: (1) exact dimensions or polycount (match the art bible Asset Standards tiers — do not invent new sizes); (2) file format and export settings; (3) naming convention (from technical-preferences.md); (4) any engine-specific constraints this asset type must respect; (5) LOD requirements if applicable. Flag any asset type where the art bible's preferred standard conflicts with the engine's constraints."
+
+### Lean mode — spawn art-director only (skip technical-artist).
+
+### Solo mode — skip both. Derive specs from art bible rules alone, noting that technical constraints were not validated.
+
+**Collect all spawned agent responses before Phase 4.** If any conflict exists between art-director and technical-artist (e.g., art-director specifies 4K textures but technical-artist flags the engine budget requires 512px), surface it explicitly — do NOT silently resolve.
+
+---
+
+## Phase 4: Compile and Review
+
+Combine the agent outputs into a draft spec per asset. Present all specs in conversation text using this format:
+
+```
+## ASSET-[NNN] — [Asset Name]
+
+| Field | Value |
+|-------|-------|
+| Category | [Sprite / VFX / Environment / UI / Audio / 3D] |
+| Dimensions | [e.g. 256×256px, 4-frame sprite sheet] |
+| Format | [PNG / SVG / WAV / etc.] |
+| Naming | [e.g. vfx_frost_hit_01.png] |
+| Polycount | [if 3D — e.g. <800 tris] |
+| Texture Res | [e.g. 512px — matches Art Bible §8 Tier 2] |
+
+**Visual Description:**
+[2–3 sentences. Specific enough for two artists to produce consistent results.]
+
+**Art Bible Anchors:**
+- §3 Shape Language: [relevant rule applied]
+- §4 Color System: [color role — e.g. "uses Threat Blue per semantic color rules"]
+
+**Generation Prompt:**
+[Ready-to-use prompt. Include: style keywords, composition notes, color palette anchors, lighting direction, negative prompts.]
+
+**Status:** Needed
+```
+
+After presenting all specs, use `AskUserQuestion`:
+- Prompt: "Asset specs for **[target]** — [N] assets. Review complete?"
+- Options: `[A] Approve all — write to file` / `[B] Revise a specific asset` / `[C] Regenerate with different direction`
+
+If [B]: ask which asset and what to change. Revise inline and re-present. Do NOT re-spawn agents for minor text revisions — only re-spawn if the visual direction itself needs to change.
+
+If [C]: ask what direction to change. Re-spawn the relevant agent with the updated brief.
+
+---
+
+## Phase 5: Write Spec File
+
+After approval, ask: "May I write the spec to `design/assets/specs/[target-name]-assets.md`?"
+
+Write the file with:
+
+```markdown
+# Asset Specs — [Target Type]: [Target Name]
+
+> **Source**: [path to source GDD/level/character doc]
+> **Art Bible**: design/art/art-bible.md
+> **Generated**: [date]
+> **Status**: [N] assets specced / [N] approved / [N] in production / [N] done
+
+[all asset specs in ASSET-NNN format]
+```
+
+Then update `design/assets/asset-manifest.md`. If it doesn't exist, create it:
+
+```markdown
+# Asset Manifest
+
+> Last updated: [date]
+
+## Progress Summary
+
+| Total | Needed | In Progress | Done | Approved |
+|-------|--------|-------------|------|----------|
+| [N] | [N] | [N] | [N] | [N] |
+
+## Assets by Context
+
+### [Target Type]: [Target Name]
+| Asset ID | Name | Category | Status | Referenced By | Spec File |
+|----------|------|----------|--------|---------------|-----------|
+| ASSET-001 | [name] | [category] | Needed | [target-name] | design/assets/specs/[target]-assets.md |
+```
+
+If the manifest already exists, append the new context block and update the Progress Summary counts.
+
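Recounting the Progress Summary row can be sketched as (illustrative helper — the status labels are the ones from the manifest template; the function name is an assumption):

```python
from collections import Counter

def progress_summary(statuses):
    """Recount the manifest's Progress Summary row from every asset's Status field."""
    counts = Counter(statuses)
    return {
        "Total": len(statuses),
        "Needed": counts["Needed"],
        "In Progress": counts["In Progress"],
        "Done": counts["Done"],
        "Approved": counts["Approved"],
    }
```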
+Ask: "May I update `design/assets/asset-manifest.md`?"
+
+---
+
+## Phase 6: Close
+
+Use `AskUserQuestion`:
+- Prompt: "Asset specs complete for **[target]**. What's next?"
+- Options:
+ - `[A] Spec another system — /asset-spec system:[next-system]`
+ - `[B] Spec a level — /asset-spec level:[level-name]`
+ - `[C] Spec a character — /asset-spec character:[character-name]`
+ - `[D] Run /asset-audit — validate delivered assets against specs`
+ - `[E] Stop here`
+
+---
+
+## Asset ID Assignment
+
+Asset IDs are assigned sequentially across the entire project — not per-context. Read the manifest before assigning IDs to find the current highest number:
+
+```
+Grep pattern="ASSET-" path="design/assets/asset-manifest.md"
+```
+
+Start new assets from `ASSET-[highest + 1]`. This ensures IDs are stable and unique across the whole project.
+
+If no manifest exists yet, start from `ASSET-001`.
+
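The project-wide sequential numbering can be sketched as (hypothetical helper, assuming the manifest is available as a single string):

```python
import re

def next_asset_id(manifest_text=None):
    """Continue numbering from the highest ASSET-NNN anywhere in the manifest."""
    if not manifest_text:
        return "ASSET-001"  # no manifest yet: start the sequence
    nums = [int(n) for n in re.findall(r"ASSET-(\d+)", manifest_text)]
    return f"ASSET-{max(nums, default=0) + 1:03d}"
```

Because the scan covers the whole manifest rather than one context block, IDs stay unique even when several contexts are specced in the same session.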
+---
+
+## Shared Asset Protocol
+
+Before speccing an asset, check if an equivalent already exists in another context's spec:
+
+- Common UI elements (health bars, score displays) are often shared across systems
+- Generic environment props may appear in multiple levels
+- Character VFX (hit sparks, death effects) may reuse a base spec with color variants
+
+If a match is found: reference the existing ASSET-ID rather than creating a duplicate. Note the shared usage in the manifest's referenced-by column.
+
+> "ASSET-012 (Generic Hit Spark) already specced for Combat system. Reusing for Tower Defense — adding tower-defense to referenced-by."
+
+---
+
+## Error Recovery Protocol
+
+If any spawned agent returns BLOCKED or cannot complete:
+
+1. Surface immediately: "[AgentName]: BLOCKED — [reason]"
+2. In `lean` mode or if `technical-artist` blocks: proceed with art-director output only — note that technical constraints were not validated
+3. In `solo` mode or if `art-director` blocks: derive descriptions from art bible rules — flag as "Art director not consulted — verify against art bible before production"
+4. Always produce a partial spec — never discard work because one agent blocked
+
+---
+
+## Collaborative Protocol
+
+Every phase follows: **Identify → Confirm → Generate → Review → Approve → Write**
+
+- Never spec assets without first confirming the asset list with the user
+- Always anchor specs to the art bible — a spec that contradicts the art bible is wrong
+- Surface all agent disagreements — do not silently pick one
+- Write the spec file only after explicit approval
+- Update the manifest immediately after writing the spec
diff --git a/.claude/skills/balance-check/SKILL.md b/.claude/skills/balance-check/SKILL.md
index 82c0aa8..65ff326 100644
--- a/.claude/skills/balance-check/SKILL.md
+++ b/.claude/skills/balance-check/SKILL.md
@@ -4,7 +4,6 @@ description: "Analyzes game balance data files, formulas, and configuration to i
argument-hint: "[system-name|path-to-data-file]"
user-invocable: true
allowed-tools: Read, Glob, Grep
-context: fork
agent: economy-designer
---
diff --git a/.claude/skills/brainstorm/SKILL.md b/.claude/skills/brainstorm/SKILL.md
index fdae11f..1699a4c 100644
--- a/.claude/skills/brainstorm/SKILL.md
+++ b/.claude/skills/brainstorm/SKILL.md
@@ -3,15 +3,19 @@ name: brainstorm
description: "Guided game concept ideation — from zero idea to a structured game concept document. Uses professional studio ideation techniques, player psychology frameworks, and structured creative exploration."
argument-hint: "[genre or theme hint, or 'open'] [--review full|lean|solo]"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Write, WebSearch, AskUserQuestion
+allowed-tools: Read, Glob, Grep, Write, WebSearch, Task, AskUserQuestion
---
When this skill is invoked:
1. **Parse the argument** for an optional genre/theme hint (e.g., `roguelike`,
`space survival`, `cozy farming`). If `open` or no argument, start from
- scratch. Also extract `--review [full|lean|solo]` if present and store as
- the review mode override for this run (see `.claude/docs/director-gates.md`).
+ scratch. Also resolve the review mode (once, store for all gate spawns this run):
+ 1. If `--review [full|lean|solo]` was passed → use that
+ 2. Else read `production/review-mode.txt` → use that value
+ 3. Else → default to `lean`
+
+ See `.claude/docs/director-gates.md` for the full check pattern.
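+
+   As pseudocode (a sketch of the fallback order, not a literal tool call):
+
+   ```
+   mode = flag("--review")                     # explicit override wins
+       or read("production/review-mode.txt")   # project-level setting
+       or "lean"                               # default
+   ```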
2. **Check for existing concept work**:
- Read `design/gdd/game-concept.md` if it exists (resume, don't restart)
@@ -102,10 +106,24 @@ For each concept, present:
- **Why It Could Work** (1 sentence on market/audience fit)
- **Biggest Risk** (1 sentence on the hardest unanswered question)
-Present all three. Then use `AskUserQuestion` to capture the selection:
-- **Use a single-list call — NO tabs, just `prompt` and `options`. Do not use a tabbed form here.**
-- **Prompt**: "Which concept resonates with you? You can pick one, combine elements, or ask for fresh directions."
-- **Options**: one option per concept (e.g., `Concept 1 — SCAR`), plus `Combine elements across concepts` and `Generate fresh directions`
+Present all three. Then use `AskUserQuestion` to capture the selection.
+
+**CRITICAL**: This MUST be a plain list call — no tabs, no form fields. Use exactly this structure:
+
+```
+AskUserQuestion(
+ prompt: "Which concept resonates with you? You can pick one, combine elements, or ask for fresh directions.",
+ options: [
+ "Concept 1 — [Title]",
+ "Concept 2 — [Title]",
+ "Concept 3 — [Title]",
+ "Combine elements across concepts",
+ "Generate fresh directions"
+ ]
+)
+```
+
+Do NOT use a `tabs` field here. The `tabs` form is for multi-field input only — using it here causes an "Invalid tool parameters" error. This is a plain `prompt` + `options` call.
Never pressure toward a choice — let them sit with it.
@@ -168,11 +186,36 @@ Then define **3+ anti-pillars** (what this game is NOT):
be cool if..." features that don't serve the core vision
- Frame as: "We will NOT do [thing] because it would compromise [pillar]"
-**After pillars and anti-pillars are agreed, spawn `creative-director` via Task using gate CD-PILLARS (`.claude/docs/director-gates.md`) before moving to Phase 5.**
+**Pillar confirmation**: After presenting the full pillar set, use `AskUserQuestion`:
+- Prompt: "Do these pillars feel right for your game?"
+- Options: `[A] Lock these in` / `[B] Rename or reframe one` / `[C] Swap a pillar out` / `[D] Something else`
-Pass: full pillar set with design tests, anti-pillars, core fantasy, unique hook.
+If the user selects B, C, or D, make the revision, then use `AskUserQuestion` again:
+- Prompt: "Pillars updated. Ready to lock these in?"
+- Options: `[A] Lock these in` / `[B] Revise another pillar` / `[C] Something else`
-Present the feedback to the user. If CONCERNS or REJECT, offer to revise specific pillars before moving on. If APPROVE, note the approval and continue.
+Repeat until the user selects [A] Lock these in.
+
+**Review mode check** — apply before spawning CD-PILLARS and AD-CONCEPT-VISUAL:
+- `solo` → skip both. Note: "CD-PILLARS skipped — Solo mode. AD-CONCEPT-VISUAL skipped — Solo mode." Proceed to Phase 5.
+- `lean` → skip both (not PHASE-GATEs). Note: "CD-PILLARS skipped — Lean mode. AD-CONCEPT-VISUAL skipped — Lean mode." Proceed to Phase 5.
+- `full` → spawn as normal.
+
+**After pillars and anti-pillars are agreed, spawn BOTH `creative-director` AND `art-director` via Task in parallel before moving to Phase 5. Issue both Task calls simultaneously — do not wait for one before starting the other.**
+
+- **`creative-director`** — gate **CD-PILLARS** (`.claude/docs/director-gates.md`)
+ Pass: full pillar set with design tests, anti-pillars, core fantasy, unique hook.
+
+- **`art-director`** — gate **AD-CONCEPT-VISUAL** (`.claude/docs/director-gates.md`)
+ Pass: game concept elevator pitch, full pillar set with design tests, target platform (if known), any reference games or visual touchstones the user mentioned.
+
+Collect both verdicts, then present them together using a two-tab `AskUserQuestion`:
+- Tab **"Pillars"**: present creative-director feedback. Options mirror the standard CD-PILLARS handling — `Lock in as-is` / `Revise [specific pillar]` / `Discuss further`.
+- Tab **"Visual anchor"**: present the art-director's 2-3 named visual direction options. Options: each named direction (one per option) + `Combine elements across directions` + `Describe my own direction`.
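+
+A shape sketch of the two-tab call (field names are illustrative — follow the tool's actual multi-tab schema):
+
+```
+AskUserQuestion(
+  tabs: [
+    { name: "Pillars",
+      prompt: "Creative director verdict: [summary]. How do you want to proceed?",
+      options: ["Lock in as-is", "Revise [specific pillar]", "Discuss further"] },
+    { name: "Visual anchor",
+      prompt: "The art director proposes these directions. Which resonates?",
+      options: ["[Direction 1]", "[Direction 2]", "[Direction 3]",
+                "Combine elements across directions", "Describe my own direction"] }
+  ]
+)
+```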
+
+The user's selected visual anchor (the named direction or their custom description) is stored as the **Visual Identity Anchor** — it will be written into the game-concept document and becomes the foundation of the art bible.
+
+If the creative-director returns CONCERNS or REJECT on pillars, resolve pillar issues before asking for the visual anchor selection — visual direction should flow from confirmed pillars.
---
@@ -211,12 +254,22 @@ Ground the concept in reality:
- **Biggest risks**: Technical risks, design risks, market risks
- **Scope tiers**: What's the full vision vs. what ships if time runs out?
+**Review mode check** — apply before spawning TD-FEASIBILITY:
+- `solo` → skip. Note: "TD-FEASIBILITY skipped — Solo mode." Proceed directly to scope tier definition.
+- `lean` → skip (not a PHASE-GATE). Note: "TD-FEASIBILITY skipped — Lean mode." Proceed directly to scope tier definition.
+- `full` → spawn as normal.
+
**After identifying biggest technical risks, spawn `technical-director` via Task using gate TD-FEASIBILITY (`.claude/docs/director-gates.md`) before scope tiers are defined.**
Pass: core loop description, platform target, engine choice (or "undecided"), list of identified technical risks.
Present the assessment to the user. If HIGH RISK, offer to revisit scope before finalising. If CONCERNS, note them and continue.
+**Review mode check** — apply before spawning PR-SCOPE:
+- `solo` → skip. Note: "PR-SCOPE skipped — Solo mode." Proceed to document generation.
+- `lean` → skip (not a PHASE-GATE). Note: "PR-SCOPE skipped — Lean mode." Proceed to document generation.
+- `full` → spawn as normal.
+
**After scope tiers are defined, spawn `producer` via Task using gate PR-SCOPE (`.claude/docs/director-gates.md`).**
Pass: full vision scope, MVP definition, timeline estimate, team size.
@@ -230,35 +283,56 @@ Present the assessment to the user. If UNREALISTIC, offer to adjust the MVP defi
brainstorm conversation, including the MDA analysis, player motivation
profile, and flow state design sections.
-5. Ask: "May I write the game concept document to `design/gdd/game-concept.md`?"
+ **Include a Visual Identity Anchor section** in the game concept document with:
+ - The selected visual direction name
+ - The one-line visual rule
+ - The 2-3 supporting visual principles with their design tests
+ - The color philosophy summary
-If yes, generate the document using the template at `.claude/docs/templates/game-concept.md`, fill in ALL sections from the brainstorm conversation, and write the file, creating directories as needed.
+ This section is the seed of the art bible — it captures signature visual
+ decisions (e.g., a rule like "everything must move") before they can be
+ forgotten between sessions.
-If no:
-- If the user already named a section to change, revise it directly — do not ask again which section.
-- If the user said no without specifying what to change, use `AskUserQuestion` — "Which section would you like to revise?"
- Options: `Elevator Pitch` / `Core Fantasy & Unique Hook` / `Pillars` / `Core Loop` / `MVP Definition` / `Scope Tiers` / `Risks` / `Something else — I'll describe`
+5. Use `AskUserQuestion` for write approval:
+- Prompt: "Game concept is ready. May I write it to `design/gdd/game-concept.md`?"
+- Options: `[A] Yes — write it` / `[B] Not yet — revise a section first`
+
+If [B]: ask which section to revise using `AskUserQuestion` with options: `Elevator Pitch` / `Core Fantasy & Unique Hook` / `Pillars` / `Core Loop` / `MVP Definition` / `Scope Tiers` / `Risks` / `Something else — I'll describe`
After revising, show the updated section as a diff or clear before/after, then use `AskUserQuestion` — "Ready to write the updated concept document?"
-Options: `Yes — write it` / `Revise another section`
-Repeat until the user approves the write.
+Options: `[A] Yes — write it` / `[B] Revise another section`
+Repeat until the user selects [A].
+
+Once the user selects [A], generate the document using the template at `.claude/docs/templates/game-concept.md`, fill in ALL sections from the brainstorm conversation, and write the file, creating directories as needed.
**Scope consistency rule**: The "Estimated Scope" field in the Core Identity table must match the full-vision timeline from the Scope Tiers section — not just say "Large (9+ months)". Write it as "Large (X–Y months, solo)" or "Large (X–Y months, team of N)" so the summary table is accurate.
6. **Suggest next steps** (in this order — this is the professional studio
pre-production pipeline). List ALL steps — do not abbreviate or truncate:
1. "Run `/setup-engine` to configure the engine and populate version-aware reference docs"
- 2. "Use `/design-review design/gdd/game-concept.md` to validate concept completeness before going downstream"
- 3. "Discuss vision with the `creative-director` agent for pillar refinement"
- 4. "Decompose the concept into individual systems with `/map-systems` — maps dependencies, assigns priorities, and creates the systems index"
+ 2. "Run `/art-bible` to create the visual identity specification — do this BEFORE writing GDDs. The art bible gates asset production and shapes technical architecture decisions (rendering, VFX, UI systems)."
+ 3. "Use `/design-review design/gdd/game-concept.md` to validate concept completeness before going downstream"
+ 4. "Discuss vision with the `creative-director` agent for pillar refinement"
+ 5. "Decompose the concept into individual systems with `/map-systems` — maps dependencies, assigns priorities, and creates the systems index"
- 5. "Author per-system GDDs with `/design-system` — guided, section-by-section GDD writing for each system identified in step 4"
- 6. "Plan the technical architecture with `/create-architecture` — defines how all systems fit together and connect"
- 7. "Validate readiness to advance with `/gate-check` — phase gate before committing to production"
- 8. "Prototype the riskiest system with `/prototype [core-mechanic]` — validate the core loop before full implementation"
- 9. "Run `/playtest-report` after the prototype to validate the core hypothesis"
- 10. "If validated, plan the first sprint with `/sprint-plan new`"
+ 6. "Author per-system GDDs with `/design-system` — guided, section-by-section GDD writing for each system identified in step 5"
+ 7. "Plan the technical architecture with `/create-architecture` — produces the master architecture blueprint and Required ADR list"
+ 8. "Record key architectural decisions with `/architecture-decision (×N)` — write one ADR per decision in the Required ADR list from `/create-architecture`"
+ 9. "Validate readiness to advance with `/gate-check` — phase gate before committing to production"
+ 10. "Prototype the riskiest system with `/prototype [core-mechanic]` — validate the core loop before full implementation"
+ 11. "Run `/playtest-report` after the prototype to validate the core hypothesis"
+ 12. "If validated, plan the first sprint with `/sprint-plan new`"
7. **Output a summary** with the chosen concept's elevator pitch, pillars,
primary player type, engine recommendation, biggest risk, and file path.
Verdict: **COMPLETE** — game concept created and handed off for next steps.
+
+---
+
+## Context Window Awareness
+
+This is a multi-phase skill. If context reaches or exceeds 70% during any phase,
+append this notice to the current response before continuing:
+
+> **Context is approaching the limit (≥70%).** The game concept document is saved
+> to `design/gdd/game-concept.md`. Open a fresh Claude Code session to continue
+> if needed — progress is not lost.
diff --git a/.claude/skills/bug-report/SKILL.md b/.claude/skills/bug-report/SKILL.md
index 4a30334..45d8249 100644
--- a/.claude/skills/bug-report/SKILL.md
+++ b/.claude/skills/bug-report/SKILL.md
@@ -10,8 +10,10 @@ allowed-tools: Read, Glob, Grep, Write
Determine the mode from the argument:
-- No `analyze` keyword → **Description Mode**: generate a structured bug report from the provided description
+- No keyword → **Description Mode**: generate a structured bug report from the provided description
- `analyze [path]` → **Analyze Mode**: read the target file(s) and identify potential bugs
+- `verify [BUG-ID]` → **Verify Mode**: confirm a reported fix actually resolved the bug
+- `close [BUG-ID]` → **Close Mode**: mark a verified bug as closed with resolution record
If no argument is provided, ask the user for a bug description before proceeding.
@@ -87,6 +89,51 @@ If no argument is provided, ask the user for a bug description before proceeding
---
+## Phase 2C: Verify Mode
+
+Read `production/qa/bugs/[BUG-ID].md`. Extract the reproduction steps and expected result.
+
+1. **Re-run reproduction steps** — use Grep/Glob to check whether the root cause code path still exists as described. If the fix removed or changed it, note the change.
+2. **Run the related test** — if the bug's system has a test file in `tests/`, run it and report pass/fail (requires `Bash` in this skill's `allowed-tools`; if Bash is unavailable, note that the test could not be run and lean toward CANNOT VERIFY).
+3. **Check for regression** — grep the codebase for any new occurrence of the pattern that caused the bug.
+
+Produce a verification verdict:
+
+- **VERIFIED FIXED** — reproduction steps no longer produce the bug; related tests pass
+- **STILL PRESENT** — bug reproduces as described; fix did not resolve the issue
+- **CANNOT VERIFY** — automated checks inconclusive; manual playtest required
+
+Ask: "May I update `production/qa/bugs/[BUG-ID].md` to set Status: Verified Fixed / Still Present / Cannot Verify?"
+
+If STILL PRESENT: reopen the bug, set Status back to Open, and suggest re-running `/hotfix [BUG-ID]`.
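+
+A completed verification might look like this (bug ID, paths, and findings are hypothetical):
+
+```
+[BUG-017] Verification — "Enemy health underflows below zero"
+1. Repro path: damage handler now clamps health — original code path removed
+2. Related test: tests/test_enemy_health — PASS
+3. Regression grep: no new unclamped subtraction found
+
+Verdict: VERIFIED FIXED
+```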
+
+---
+
+## Phase 2D: Close Mode
+
+Read `production/qa/bugs/[BUG-ID].md`. Confirm Status is `Verified Fixed` before closing. If status is anything else, stop: "Bug [ID] must be Verified Fixed before it can be closed. Run `/bug-report verify [BUG-ID]` first."
+
+Prepare a closure record in this format (append it only after the user approves below):
+
+```markdown
+## Closure Record
+**Closed**: [date]
+**Resolution**: Fixed — [one-line description of what was changed]
+**Fix commit / PR**: [if known]
+**Verified by**: qa-tester
+**Closed by**: [user]
+**Regression test**: [test file path, or "Manual verification"]
+**Status**: Closed
+```
+
+Ask: "May I update `production/qa/bugs/[BUG-ID].md` to mark it Closed?"
+
+On approval, append the closure record and change the top-level `**Status**: Open` field to `**Status**: Closed`.
+
+After closing, check `production/qa/bug-triage-*.md` — if the bug appears in an open triage report, note: "Bug [ID] is referenced in the triage report. Run `/bug-triage` to refresh the open bug count."
+
+---
+
## Phase 3: Save Report
Present the completed bug report(s) to the user.
@@ -101,7 +148,16 @@ If no, stop here. Verdict: **BLOCKED** — user declined write.
## Phase 4: Next Steps
-After saving, suggest:
+After saving, suggest based on mode:
-- Run `/bug-triage` to prioritize this bug alongside existing open bugs.
-- If S1 or S2 severity, consider `/hotfix` for an emergency fix workflow.
+**After filing (Description/Analyze mode):**
+- Run `/bug-triage` to prioritize alongside existing open bugs
+- If S1 or S2: run `/hotfix [BUG-ID]` for emergency fix workflow
+
+**After fixing the bug (developer confirms fix is in):**
+- Run `/bug-report verify [BUG-ID]` — confirm the fix actually works before closing
+- Never mark a bug closed without verification — a fix that doesn't verify is still Open
+
+**After verify returns VERIFIED FIXED:**
+- Run `/bug-report close [BUG-ID]` — write the closure record and update status
+- Run `/bug-triage` to refresh the open bug count and remove it from the active list
diff --git a/.claude/skills/bug-triage/SKILL.md b/.claude/skills/bug-triage/SKILL.md
index d3e6abe..cbed2b4 100644
--- a/.claude/skills/bug-triage/SKILL.md
+++ b/.claude/skills/bug-triage/SKILL.md
@@ -4,7 +4,6 @@ description: "Read all open bugs in production/qa/bugs/, re-evaluate priority vs
argument-hint: "[sprint | full | trend]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit
-context: fork
---
# Bug Triage
diff --git a/.claude/skills/code-review/SKILL.md b/.claude/skills/code-review/SKILL.md
index b84ba8d..e1f8733 100644
--- a/.claude/skills/code-review/SKILL.md
+++ b/.claude/skills/code-review/SKILL.md
@@ -4,7 +4,6 @@ description: "Performs an architectural and quality code review on a specified f
argument-hint: "[path-to-file-or-directory]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Bash, Task
-context: fork
agent: lead-programmer
---
@@ -82,9 +81,13 @@ Identify the system category (engine, gameplay, AI, networking, UI, tools) and e
---
-## Phase 7: Engine Specialist Review
+## Phase 7: Specialist Reviews (Parallel)
-If an engine is configured, spawn engine specialists via Task in parallel with the review above. Determine which specialist applies to each file:
+Spawn all applicable specialists simultaneously via Task — do not wait for one before starting the next.
+
+### Engine Specialists
+
+If an engine is configured, determine which specialist applies to each file and spawn in parallel:
- Primary language files (`.gd`, `.cs`, `.cpp`) → Language/Code Specialist
- Shader files (`.gdshader`, `.hlsl`, shader graph) → Shader Specialist
@@ -93,7 +96,23 @@ If an engine is configured, spawn engine specialists via Task in parallel with t
Also spawn the **Primary Specialist** for any file touching engine architecture (scene structure, node hierarchy, lifecycle hooks).
-Collect findings and include them under `### Engine Specialist Findings`.
+### QA Testability Review
+
+For Logic and Integration stories, also spawn `qa-tester` via Task in parallel with the engine specialists. Pass:
+- The implementation files being reviewed
+- The story's `## QA Test Cases` section (the pre-written test specs from qa-lead)
+- The story's `## Acceptance Criteria`
+
+Ask the qa-tester to evaluate:
+- [ ] Are all test hooks and interfaces exposed (not hidden behind private/internal access)?
+- [ ] Do the QA test cases from the story's `## QA Test Cases` section map to testable code paths?
+- [ ] Are any acceptance criteria untestable as implemented (e.g., hardcoded values, no seam for injection)?
+- [ ] Does the implementation introduce any new edge cases not covered by the existing QA test cases?
+- [ ] Are there any observable side effects that should have a test but don't?
+
+For Visual/Feel and UI stories: qa-tester reviews whether the manual verification steps in `## QA Test Cases` are achievable with the implementation as written — e.g., "is the state the manual checker needs to reach actually reachable?"
+
+Collect all specialist findings before producing output.
---
@@ -105,6 +124,10 @@ Collect findings and include them under `### Engine Specialist Findings`.
### Engine Specialist Findings: [N/A — no engine configured / CLEAN / ISSUES FOUND]
[Findings from engine specialist(s), or "No engine configured." if skipped]
+### Testability: [N/A — Visual/Feel or Config story / TESTABLE / GAPS / BLOCKING]
+[qa-tester findings: test hooks, coverage gaps, untestable paths, new edge cases]
+[If BLOCKING: implementation must expose [X] before tests in ## QA Test Cases can run]
+
### ADR Compliance: [NO ADRS FOUND / COMPLIANT / DRIFT / VIOLATION]
[List each ADR checked, result, and any deviations with severity]
diff --git a/.claude/skills/consistency-check/SKILL.md b/.claude/skills/consistency-check/SKILL.md
index 7e4fb63..a7f60a7 100644
--- a/.claude/skills/consistency-check/SKILL.md
+++ b/.claude/skills/consistency-check/SKILL.md
@@ -4,7 +4,6 @@ description: "Scan all GDDs against the entity registry to detect cross-document
argument-hint: "[full | since-last-review | entity: | item:]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit, Bash
-context: fork
---
# Consistency Check
diff --git a/.claude/skills/content-audit/SKILL.md b/.claude/skills/content-audit/SKILL.md
index 5665b1d..a62b4d8 100644
--- a/.claude/skills/content-audit/SKILL.md
+++ b/.claude/skills/content-audit/SKILL.md
@@ -4,7 +4,6 @@ description: "Audit GDD-specified content counts against implemented content. Id
argument-hint: "[system-name | --summary | (no arg = full audit)]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write
-context: fork
agent: producer
---
diff --git a/.claude/skills/create-architecture/SKILL.md b/.claude/skills/create-architecture/SKILL.md
index 85840bc..d8471b2 100644
--- a/.claude/skills/create-architecture/SKILL.md
+++ b/.claude/skills/create-architecture/SKILL.md
@@ -3,8 +3,7 @@ name: create-architecture
description: "Guided, section-by-section authoring of the master architecture document for the game. Reads all GDDs, the systems index, existing ADRs, and the engine reference library to produce a complete architecture blueprint before any code is written. Engine-version-aware: flags knowledge gaps and validates decisions against the pinned engine version."
argument-hint: "[focus-area: full | layers | data-flow | api-boundaries | adr-audit] [--review full|lean|solo]"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Write, Bash
-context: fork
+allowed-tools: Read, Glob, Grep, Write, Bash, AskUserQuestion, Task
agent: technical-director
---
@@ -17,8 +16,12 @@ It sits between design and implementation, and must exist before sprint planning
**Distinct from `/architecture-decision`**: ADRs record individual point decisions.
This skill creates the whole-system blueprint that gives ADRs their context.
-Extract `--review [full|lean|solo]` if present and store as the review mode
-override for this run (see `.claude/docs/director-gates.md`).
+Resolve the review mode (once, store for all gate spawns this run):
+1. If `--review [full|lean|solo]` was passed → use that
+2. Else read `production/review-mode.txt` → use that value
+3. Else → default to `lean`
+
+See `.claude/docs/director-gates.md` for the full check pattern.
**Argument modes:**
- **No argument / `full`**: Full guided walkthrough — all sections, start to finish
@@ -336,6 +339,11 @@ After writing the master architecture document, perform an explicit sign-off bef
Apply gate **TD-ARCHITECTURE** (`.claude/docs/director-gates.md`) as a self-review. Check all four criteria from that gate definition against the completed document.
+**Review mode check** — apply before spawning LP-FEASIBILITY:
+- `solo` → skip. Note: "LP-FEASIBILITY skipped — Solo mode." Proceed to Phase 8 handoff.
+- `lean` → skip (not a PHASE-GATE). Note: "LP-FEASIBILITY skipped — Lean mode." Proceed to Phase 8 handoff.
+- `full` → spawn as normal.
+
**Step 2 — Spawn `lead-programmer` via Task using gate LP-FEASIBILITY (`.claude/docs/director-gates.md`):**
Pass: architecture document path, technical requirements baseline summary, ADR list.
diff --git a/.claude/skills/create-control-manifest/SKILL.md b/.claude/skills/create-control-manifest/SKILL.md
index e800df1..b9b36fe 100644
--- a/.claude/skills/create-control-manifest/SKILL.md
+++ b/.claude/skills/create-control-manifest/SKILL.md
@@ -4,7 +4,6 @@ description: "After architecture is complete, produces a flat actionable rules s
argument-hint: "[update — regenerate from current ADRs]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write
-context: fork
agent: technical-director
---
diff --git a/.claude/skills/create-epics/SKILL.md b/.claude/skills/create-epics/SKILL.md
index e9becf5..662a04a 100644
--- a/.claude/skills/create-epics/SKILL.md
+++ b/.claude/skills/create-epics/SKILL.md
@@ -3,8 +3,7 @@ name: create-epics
description: "Translate approved GDDs + architecture into epics — one epic per architectural module. Defines scope, governing ADRs, engine risk, and untraced requirements. Does NOT break into stories — run /create-stories [epic-slug] after each epic is created."
argument-hint: "[system-name | layer: foundation|core|feature|presentation | all] [--review full|lean|solo]"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Write
-context: fork
+allowed-tools: Read, Glob, Grep, Write, Task, AskUserQuestion
agent: technical-director
---
@@ -28,8 +27,12 @@ will have changed.
## 1. Parse Arguments
-Extract `--review [full|lean|solo]` if present and store as the review mode
-override for this run (see `.claude/docs/director-gates.md`).
+Resolve the review mode (once, store for all gate spawns this run):
+1. If `--review [full|lean|solo]` was passed → use that
+2. Else read `production/review-mode.txt` → use that value
+3. Else → default to `lean`
+
+See `.claude/docs/director-gates.md` for the full check pattern.
**Modes:**
- `/create-epics all` — process all systems in layer order
@@ -55,14 +58,16 @@ Grep pattern="## Summary" glob="design/gdd/*.md" output_mode="content" -A 5
For `layer:` or `[system-name]` modes: filter to only in-scope GDDs based on
the Summary quick-reference. Skip full-reading anything out of scope.
-### Step 2b — Full document load
+### Step 2b — Full document load (in-scope systems only)
+
+Using the Step 2a grep results, identify which systems are in scope. Read full documents **only for in-scope systems** — do not read GDDs or ADRs for out-of-scope systems or layers.
Read for in-scope systems:
- `design/gdd/systems-index.md` — authoritative system list, layers, priority
-- In-scope GDDs (Approved or Designed status)
+- In-scope GDDs only (Approved or Designed status, filtered by Step 2a results)
- `docs/architecture/architecture.md` — module ownership and API boundaries
-- All Accepted ADRs — read the "GDD Requirements Addressed", "Decision", and "Engine Compatibility" sections
+- Accepted ADRs **whose domains cover in-scope systems only** — read the "GDD Requirements Addressed", "Decision", and "Engine Compatibility" sections; skip ADRs for unrelated domains
- `docs/architecture/control-manifest.md` — manifest version date from header
- `docs/architecture/tr-registry.yaml` — for tracing requirements to ADR coverage
- `docs/engine-reference/[engine]/VERSION.md` — engine name, version, risk levels
@@ -117,6 +122,11 @@ Options: "Yes, create it", "Skip", "Pause — I need to write ADRs first"
## 4b. Producer Epic Structure Gate
+**Review mode check** — apply before spawning PR-EPIC:
+- `solo` → skip. Note: "PR-EPIC skipped — Solo mode." Proceed to Step 5 (write epic files).
+- `lean` → skip (not a PHASE-GATE). Note: "PR-EPIC skipped — Lean mode." Proceed to Step 5 (write epic files).
+- `full` → spawn as normal.
+
After all epics for the current layer are defined (Step 4 completed for all in-scope systems), and before writing any files, spawn `producer` via Task using gate **PR-EPIC** (`.claude/docs/director-gates.md`).
Pass: the full epic structure summary (all epics, their scope summaries, governing ADR counts), the layer being processed, milestone timeline and team capacity.
diff --git a/.claude/skills/create-stories/SKILL.md b/.claude/skills/create-stories/SKILL.md
index e45875c..ba39446 100644
--- a/.claude/skills/create-stories/SKILL.md
+++ b/.claude/skills/create-stories/SKILL.md
@@ -3,8 +3,7 @@ name: create-stories
description: "Break a single epic into implementable story files. Reads the epic, its GDD, governing ADRs, and control manifest. Each story embeds its GDD requirement TR-ID, ADR guidance, acceptance criteria, story type, and test evidence path. Run after /create-epics for each epic."
argument-hint: "[epic-slug | epic-path] [--review full|lean|solo]"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Write
-context: fork
+allowed-tools: Read, Glob, Grep, Write, Task, AskUserQuestion
agent: lead-programmer
---
@@ -28,7 +27,10 @@ then Core, and so on — matching the dependency order.
## 1. Parse Argument
-Extract `--review [full|lean|solo]` if present and store as the review mode
-override for this run (see `.claude/docs/director-gates.md`).
+Resolve the review mode (once, store for all gate spawns this run):
+1. If `--review [full|lean|solo]` was passed → use that
+2. Else read `production/review-mode.txt` → use that value
+3. Else → default to `lean`
+
+See `.claude/docs/director-gates.md` for the full check pattern.
- `/create-stories [epic-slug]` — e.g. `/create-stories combat`
- `/create-stories production/epics/combat/EPIC.md` — full path also accepted
@@ -47,7 +49,15 @@ Read in full:
- `docs/architecture/control-manifest.md` — extract rules for this epic's layer; note the Manifest Version date from the header
- `docs/architecture/tr-registry.yaml` — load all TR-IDs for this system
-Report: "Loaded epic [name], GDD [filename], [N] governing ADRs, control manifest v[date]."
+**ADR existence validation**: After reading the governing ADRs list from the epic, confirm each ADR file exists on disk. If any ADR file cannot be found, **stop immediately** before decomposing any story:
+
+> "Epic references [ADR-NNNN: title] but `docs/architecture/[adr-file].md` was not found.
+> Check the filename in the epic's Governing ADRs list, or run `/architecture-decision`
+> to create it. Cannot create stories until all referenced ADR files are present."
+
+Do not proceed to Step 3 until all referenced ADR files are confirmed present.
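+
+The check itself is mechanical (pseudocode — file layout per this repo's `docs/architecture/` convention):
+
+```
+for adr in epic.governing_adrs:
+    matches = glob("docs/architecture/*" + adr.id + "*.md")
+    if matches is empty: stop_and_report(adr)
+```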
+
+Report: "Loaded epic [name], GDD [filename], [N] governing ADRs (all confirmed present), control manifest v[date]."
---
@@ -92,11 +102,36 @@ For each story, determine:
## 4b. QA Lead Story Readiness Gate
+**Review mode check** — apply before spawning QL-STORY-READY:
+- `solo` → skip. Note: "QL-STORY-READY skipped — Solo mode." Proceed to Step 5 (present stories for review).
+- `lean` → skip (not a PHASE-GATE). Note: "QL-STORY-READY skipped — Lean mode." Proceed to Step 5 (present stories for review).
+- `full` → spawn as normal.
+
After decomposing all stories (Step 4 complete) but before presenting them for write approval, spawn `qa-lead` via Task using gate **QL-STORY-READY** (`.claude/docs/director-gates.md`).
Pass: the full story list with acceptance criteria, story types, and TR-IDs; the epic's GDD acceptance criteria for reference.
-Present the QA lead's assessment. For each story flagged as GAPS or INADEQUATE, revise the acceptance criteria before proceeding — stories with untestable criteria cannot be implemented correctly. Once all stories reach ADEQUATE, proceed to Step 5.
+Present the QA lead's assessment. For each story flagged as GAPS or INADEQUATE, revise the acceptance criteria before proceeding — stories with untestable criteria cannot be implemented correctly. Once all stories reach ADEQUATE, proceed.
+
+**After ADEQUATE**: for every Logic and Integration story, ask the qa-lead to produce concrete test case specifications — one per acceptance criterion — in this format:
+
+```
+Test: [criterion text]
+ Given: [precondition]
+ When: [action]
+ Then: [expected result / assertion]
+ Edge cases: [boundary values or failure states to test]
+```
+
+For Visual/Feel and UI stories, produce manual verification steps instead:
+```
+Manual check: [criterion text]
+ Setup: [how to reach the state]
+ Verify: [what to look for]
+ Pass condition: [unambiguous pass description]
+```
+
+These test case specs are embedded directly into each story's `## QA Test Cases` section. The developer implements against these cases. The programmer does not write tests from scratch — QA has already defined what "done" looks like.
---
@@ -122,7 +157,9 @@ Story 003: [title] — Visual/Feel — ADR-NNNN
[N stories total: N Logic, N Integration, N Visual/Feel, N UI, N Config/Data]
```
-Ask: "May I write these [N] stories to `production/epics/[epic-slug]/`?"
+Use `AskUserQuestion`:
+- Prompt: "May I write these [N] stories to `production/epics/[epic-slug]/`?"
+- Options: `[A] Yes — write all [N] stories` / `[B] Not yet — I want to review or adjust first`
---
@@ -185,6 +222,27 @@ change meaning. This is what the programmer reads instead of the ADR.]
---
+## QA Test Cases
+
+*Written by qa-lead at story creation. The developer implements against these — do not invent new test cases during implementation.*
+
+**[For Logic / Integration stories — automated test specs]:**
+
+- **AC-1**: [criterion text]
+ - Given: [precondition]
+ - When: [action]
+ - Then: [assertion]
+ - Edge cases: [boundary values / failure states]
+
+**[For Visual/Feel / UI stories — manual verification steps]:**
+
+- **AC-1**: [criterion text]
+ - Setup: [how to reach the state]
+ - Verify: [what to look for]
+ - Pass condition: [unambiguous pass description]
+
+---
+
## Test Evidence
**Story Type**: [type]
@@ -222,18 +280,21 @@ Replace the "Stories: Not yet created" line with a populated table:
## 7. After Writing
-Tell the user:
+Use `AskUserQuestion` to close with context-aware next steps:
-"[N] stories written to `production/epics/[epic-slug]/`.
+Check:
+- Are there other epics in `production/epics/` without stories yet? List them.
+- Is this the last epic? If so, include `/sprint-plan` as an option.
-To start implementation:
-1. Run `/story-readiness [story-path]` to confirm the first story is ready
-2. Run `/dev-story [story-path]` to implement it
-3. Run `/code-review [changed files]` after implementation
-4. Run `/story-done [story-path]` to close it
+Widget:
+- Prompt: "[N] stories written to `production/epics/[epic-slug]/`. What next?"
+- Options (include all that apply):
+ - `[A] Start implementing — run /story-readiness [first-story-path]` (Recommended)
+ - `[B] Create stories for [next-epic-slug] — run /create-stories [slug]` (only if other epics have no stories yet)
+ - `[C] Plan the sprint — run /sprint-plan` (only if all epics have stories)
+ - `[D] Stop here for this session`
-Work through stories in order — each story's `Depends on:` field tells you
-what must be DONE before you can start it."
+Note in output: "Work through stories in order — each story's `Depends on:` field tells you what must be DONE before you can start it."
---
diff --git a/.claude/skills/day-one-patch/SKILL.md b/.claude/skills/day-one-patch/SKILL.md
new file mode 100644
index 0000000..770d372
--- /dev/null
+++ b/.claude/skills/day-one-patch/SKILL.md
@@ -0,0 +1,218 @@
+---
+name: day-one-patch
+description: "Prepare a day-one patch for a game launch. Scopes, prioritises, implements, and QA-gates a focused patch addressing known issues discovered after gold master but before or immediately after public launch. Treats the patch as a mini-sprint with its own QA gate and rollback plan."
+argument-hint: "[scope: known-bugs | cert-feedback | all]"
+user-invocable: true
+allowed-tools: Read, Glob, Grep, Write, Edit, Bash, Task, AskUserQuestion
+---
+
+# Day-One Patch
+
+Every shipped game has a day-one patch. Planning it before launch day prevents
+chaos. This skill scopes the patch to only what is safe and necessary, gates it
+through a lightweight QA pass, and ensures a rollback plan exists before anything
+ships. It is a mini-sprint — not a hotfix, not a full sprint.
+
+**When to run:**
+- After the gold master build is locked (cert approved or launch candidate tagged)
+- When known bugs exist that are too risky to address in the gold master
+- When cert feedback requires minor fixes post-submission
+- When a pre-launch playtest surfaces must-fix issues after the release gate passed
+
+**Day-one patch scope rules:**
+- Only P1/P2 bugs that are SAFE to fix quickly
+- No new features — this is fix-only
+- No refactoring — minimum viable change
+- Any fix that requires more than 4 hours of dev time belongs in patch 1.1, not day-one
+
+**Output:** `production/releases/day-one-patch-[version].md`
+
+---
+
+## Phase 1: Load Release Context
+
+Read:
+- `production/stage.txt` — confirm project is in Release (or Polish) stage
+- The most recent file in `production/gate-checks/` — read the release gate verdict
+- `production/qa/bugs/*.md` — load all bugs with Status: Open or Fixed — Pending Verification
+- `production/sprints/` most recent — understand what shipped
+- `production/security/security-audit-*.md` most recent — check for any open security items
+
+If `production/stage.txt` is not `Release` or `Polish`:
+> "Day-one patch prep is for Release-stage projects. Current stage: [stage]. This skill is not appropriate until you are approaching launch."
+
+---
+
+## Phase 2: Scope the Patch
+
+### Step 2a — Classify open bugs for patch inclusion
+
+For each open bug, evaluate:
+
+| Criterion | Include in day-one? |
+|-----------|-------------------|
+| S1 or S2 severity | Yes — must include if safe to fix |
+| P1 priority | Yes |
+| Fix estimated < 4 hours | Yes |
+| Fix requires architecture change | No — defer to 1.1 |
+| Fix introduces new code paths | No — too risky |
+| Fix is data/config only (no code change) | Yes — very low risk |
+| Cert feedback requirement | Yes — required for platform approval |
+| S3/S4 severity | Only if trivial config fix; otherwise defer |
+
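+The table collapses to a predicate, sketched here (field names are illustrative):
+
+```
+urgent  = severity in ("S1", "S2") or priority == "P1" or cert_required
+risky   = needs_architecture_change or adds_new_code_paths or estimate_hours >= 4
+include = (urgent and not risky) or is_config_only_fix
+# S3/S4 bugs pass only via is_config_only_fix; everything else defers to 1.1.
+```
+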
+### Step 2b — Present patch scope to user
+
+Use `AskUserQuestion`:
+- Prompt: "Based on open bugs and cert feedback, here is the proposed day-one patch scope. Does this look right?"
+- Show: table of included bugs (ID, severity, description, estimated effort)
+- Show: table of deferred bugs (ID, severity, reason deferred)
+- Options: `[A] Approve this scope` / `[B] Adjust — I want to add or remove items` / `[C] No day-one patch needed`
+
+If [C]: output "No day-one patch required. Proceed to `/launch-checklist`." Stop.
+
+### Step 2c — Check total scope
+
+Sum estimated effort. If total exceeds 1 day of work:
+> "⚠️ Patch scope is [N hours] — this exceeds a safe day-one window. Consider deferring lower-priority items to patch 1.1. A bloated day-one patch introduces more risk than it removes."
+
+Use `AskUserQuestion` to confirm proceeding or reduce scope.
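+
+The threshold check, sketched (assuming "1 day" means roughly 8 working hours):
+
+```
+total_hours = sum(bug.estimate_hours for bug in approved_scope)
+if total_hours > 8:
+    warn(f"Patch scope is {total_hours}h: exceeds a safe day-one window")
+```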
+
+---
+
+## Phase 3: Rollback Plan
+
+Before any code is written, define the rollback procedure. This is non-negotiable.
+
+Spawn `release-manager` via Task. Ask them to produce a rollback plan covering:
+- How to revert to the gold master build on each target platform
+- Platform-specific rollback constraints (some platforms cannot roll back cert builds)
+- Who is responsible for triggering the rollback
+- What player communication is required if a rollback occurs
+
+Present the rollback plan. Ask: "May I write this rollback plan to `production/releases/rollback-plan-[version].md`?"
+
+Do not proceed to Phase 4 until the rollback plan is written.
+
+---
+
+## Phase 4: Implement Fixes
+
+For each bug in the approved scope, spawn a focused implementation loop:
+
+1. Spawn `lead-programmer` via Task with:
+ - The bug report (exact reproduction steps and root cause if known)
+ - The constraint: minimum viable fix only, no cleanup
+ - The affected files (from bug report Technical Context section)
+
+2. The lead-programmer implements and runs targeted tests.
+
+3. Spawn `qa-tester` via Task to verify: does the bug reproduce after the fix?
+
+For config/data-only fixes: make the change directly (no programmer agent needed). Confirm the value changed and re-run any relevant smoke test.
+
+---
+
+## Phase 5: Patch QA Gate
+
+This is a lightweight QA pass — not a full `/team-qa`. The base build is already QA-approved from the release gate; we are only re-verifying the areas the patch changes.
+
+Spawn `qa-lead` via Task with:
+- List of all changed files
+- List of bugs fixed (with verification status from Phase 4)
+- The smoke check scope for the affected systems
+
+Ask qa-lead to determine: **Is a targeted smoke check sufficient, or do any fixes touch systems that require a broader regression?**
+
+Run the required QA scope:
+- **Targeted smoke check** — run `/smoke-check [affected-systems]`
+- **Broader regression** — run the `tests/unit/` and `tests/integration/` suites for all affected systems
+
+QA verdict must be PASS or PASS WITH WARNINGS before proceeding. If FAIL: scope the failing fix out of the day-one patch and defer to 1.1.
+
+---
+
+## Phase 6: Generate Patch Record
+
+```markdown
+# Day-One Patch: [Game Name] v[version]
+
+**Date prepared**: [date]
+**Target release**: [launch date or "day of launch"]
+**Base build**: [gold master tag or commit]
+**Patch build**: [patch tag or commit]
+
+---
+
+## Patch Notes (Internal)
+
+### Bugs Fixed
+| BUG-ID | Severity | Description | Fix summary |
+|--------|----------|-------------|-------------|
+| BUG-NNN | S[1-4] | [description] | [one-line fix] |
+
+### Deferred to 1.1
+| BUG-ID | Severity | Description | Reason deferred |
+|--------|----------|-------------|-----------------|
+| BUG-NNN | S[1-4] | [description] | [reason] |
+
+---
+
+## QA Sign-Off
+
+**QA scope**: [Targeted smoke / Broader regression]
+**Verdict**: [PASS / PASS WITH WARNINGS]
+**QA lead**: qa-lead agent
+**Date**: [date]
+**Warnings (if any)**: [list or "None"]
+
+---
+
+## Rollback Plan
+
+See: `production/releases/rollback-plan-[version].md`
+
+**Trigger condition**: If [N] or more S1 bugs are reported within [X] hours of launch, execute rollback.
+**Rollback owner**: [user / producer]
+
+---
+
+## Approvals Required Before Deploy
+
+- [ ] lead-programmer: all fixes reviewed
+- [ ] qa-lead: QA gate PASS confirmed
+- [ ] producer: deployment timing approved
+- [ ] release-manager: platform submission confirmed
+
+---
+
+## Player-Facing Patch Notes
+
+[Draft for community-manager to review before publishing]
+
+[list player-facing changes in plain language]
+```
+
+Ask: "May I write this patch record to `production/releases/day-one-patch-[version].md`?"
+
+---
+
+## Phase 7: Next Steps
+
+After the patch record is written:
+
+1. Run `/patch-notes` to generate the player-facing version of the patch notes
+2. Run `/bug-report verify [BUG-ID]` for each fixed bug after the patch is live
+3. Run `/bug-report close [BUG-ID]` for each verified fix
+4. Schedule a post-launch review 48–72 hours after launch using `/retrospective launch`
+
+**If any S1 bugs remain open after the patch:**
+> "⚠️ S1 bugs remain open and were not patched. These are accepted risks. Document them in the rollback plan trigger conditions — if they occur at scale, rollback may be preferable to a follow-up patch."
+
+---
+
+## Collaborative Protocol
+
+- **Scope discipline is everything** — resist scope creep; every addition increases risk
+- **Rollback plan first, always** — a patch without a rollback plan is irresponsible
+- **Deferred is not forgotten** — every deferred bug gets a 1.1 ticket automatically
+- **Player communication is part of the patch** — `/patch-notes` is a required output, not optional
diff --git a/.claude/skills/design-review/SKILL.md b/.claude/skills/design-review/SKILL.md
index a505d74..e12bbe9 100644
--- a/.claude/skills/design-review/SKILL.md
+++ b/.claude/skills/design-review/SKILL.md
@@ -1,17 +1,33 @@
---
name: design-review
description: "Reviews a game design document for completeness, internal consistency, implementability, and adherence to project design standards. Run this before handing a design document to programmers."
-argument-hint: "[path-to-design-doc]"
+argument-hint: "[path-to-design-doc] [--depth full|lean|solo]"
user-invocable: true
-allowed-tools: Read, Glob, Grep
-context: fork
-# Read-only diagnostic skill — no specialist agent delegation needed
+allowed-tools: Read, Glob, Grep, Write, Edit, Task, AskUserQuestion
+---
+
+## Phase 0: Parse Arguments
+
+Extract `--depth [full|lean|solo]` if present. Default is `full` when no flag is given.
+
+**Note**: `--depth` controls the *analysis depth* of this skill (how many specialist agents are spawned). It is independent of the global review mode in `production/review-mode.txt`, which controls director gate spawning. These are two different concepts — `--depth` is about how thoroughly *this* skill analyses the document.
+
+- **`full`**: Complete review — all phases + specialist agent delegation (Phase 3b)
+- **`lean`**: All phases, no specialist agents — faster, single-session analysis
+- **`solo`**: Phases 1-4 only, no delegation, no Phase 5 next-step prompt — use when called from within another skill
+
---
## Phase 1: Load Documents
Read the target design document in full. Read CLAUDE.md to understand project context and standards. Read related design documents referenced or implied by the target doc (check `design/gdd/` for related systems).
+**Dependency graph validation:** For every system listed in the Dependencies section, use Glob to check whether its GDD file exists in `design/gdd/`. Flag any that don't exist yet — these are broken references that downstream authors will hit.
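+
+As a sketch, the check is one pass over the declared dependencies:
+
+```
+for dep in dependencies_section:
+    if not glob(f"design/gdd/{dep}.md"):
+        flag(f"{dep}: referenced but its GDD file does not exist yet")
+```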
+
+**Lore/narrative alignment:** If `design/gdd/game-concept.md` or any file in `design/narrative/` exists, read it. Note any mechanical choices in this GDD that contradict established world rules, tone, or design pillars. Pass this context to `game-designer` in Phase 3b.
+
+**Prior review check:** Check whether `design/gdd/reviews/[doc-name]-review-log.md` exists. If it does, read the most recent entry — note what verdict was given and what blocking items were listed. This session is a re-review; track whether prior items were addressed.
+
---
## Phase 2: Completeness Check
@@ -48,42 +64,194 @@ Evaluate against the Design Document Standard checklist:
---
+## Phase 3b: Adversarial Specialist Review (full mode only)
+
+**Skip this phase in `lean` or `solo` mode.**
+
+**This phase is MANDATORY in full mode.** Do not skip it.
+
+**Before spawning any agents**, print this notice:
+> "Full review: spawning specialist agents in parallel. This typically takes 8–15 minutes. Use `--depth lean` for faster single-session analysis."
+
+### Step 1 — Identify all domains the GDD touches
+
+Read the GDD and identify every domain present. A GDD can touch multiple domains simultaneously — be thorough. Common signals:
+
+| If the GDD contains... | Spawn these agents |
+|------------------------|-------------------|
+| Costs, prices, drops, rewards, economy | `economy-designer` |
+| Combat stats, damage, health, DPS | `game-designer`, `systems-designer` |
+| AI behaviour, pathfinding, targeting | `ai-programmer` |
+| Level layout, spawning, wave structure | `level-designer` |
+| Player progression, XP, unlocks | `economy-designer`, `game-designer` |
+| UI, HUD, menus, player-facing displays | `ux-designer`, `ui-programmer` |
+| Dialogue, quests, story, lore | `narrative-director` |
+| Animation, feel, timing, juice | `gameplay-programmer` |
+| Multiplayer, sync, replication | `network-programmer` |
+| Audio cues, music triggers | `audio-director` |
+| Performance, draw calls, memory | `performance-analyst` |
+| Engine-specific patterns or APIs | Primary engine specialist (from `.claude/docs/technical-preferences.md`) |
+| Acceptance criteria, test coverage | `qa-lead` |
+| Data schema, resource structure | `systems-designer` |
+| Any gameplay system | `game-designer` (always) |
+
+**Always spawn `game-designer` and `systems-designer` as a baseline minimum.** Every GDD touches their domain.
+
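+Selection is a union over matched table rows, sketched here:
+
+```
+agents = {"game-designer", "systems-designer"}   # baseline, always spawned
+for signals, specialists in DOMAIN_TABLE:        # rows of the table above
+    if any(gdd_mentions(s) for s in signals):
+        agents |= set(specialists)
+```
+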
+### Step 2 — Spawn all relevant specialists in parallel
+
+**CRITICAL: Task in this skill spawns a SUBAGENT — a separate independent Claude session
+with its own context window. It is NOT task tracking. Do NOT simulate specialist
+perspectives internally. Do NOT reason through domain views yourself. You MUST issue
+actual Task calls. A simulated review is not a specialist review.**
+
+Issue all Task calls simultaneously. Do NOT spawn one at a time.
+
+**Prompt each specialist adversarially:**
+> "Here is the GDD for [system] and the main review's structural findings so far.
+> Your job is NOT to validate this design — your job is to find problems.
+> Challenge the design choices from your domain expertise. What is wrong,
+> underspecified, likely to cause problems, or missing entirely?
+> Be specific and critical. Disagreement with the main review is welcome."
+
+**Additional instructions per agent type:**
+
+- **`game-designer`**: Anchor your review to the Player Fantasy stated in Section B of this GDD. Does this design actually deliver that fantasy? Would a player feel the intended experience? Flag any rules that serve implementability but undermine the stated feeling.
+
+- **`systems-designer`**: For every formula in the GDD, plug in boundary values (minimum and maximum plausible inputs). Report whether any outputs go degenerate — negative values, division by zero, infinity, or nonsensical results at the extremes.
+
+- **`qa-lead`**: Review every acceptance criterion. Flag any that are not independently testable — phrases like "feels balanced", "works correctly", "performs well" are not ACs. Suggest concrete rewrites for any that fail this test.
+
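+The boundary sweep asked of `systems-designer`, as a sketch:
+
+```
+for formula in gdd_formulas:
+    for inputs in boundary_cases(formula):       # min/max plausible values
+        result = evaluate(formula, inputs)
+        if result < 0 or is_infinite(result) or is_nan(result):
+            report_degenerate(formula, inputs, result)
+```
+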
+### Step 3 — Senior lead review
+
+After all specialists respond, spawn `creative-director` as the **senior reviewer**:
+- Provide: the GDD, all specialist findings, any disagreements between them
+- Ask: "Synthesise these findings. What are the most important issues? Do you agree with the specialists? What is your overall verdict on this design?"
+- The creative-director's synthesis becomes the **final verdict** in Phase 4.
+
+### Step 4 — Surface disagreements
+
+If specialists disagree with each other or with the creative-director, do NOT silently pick one view. Present the disagreement explicitly in Phase 4 so the user can adjudicate.
+
+Mark every finding with its source: `[game-designer]`, `[economy-designer]`, `[creative-director]` etc.
+
+---
+
## Phase 4: Output Review
```
## Design Review: [Document Title]
+Specialists consulted: [list agents spawned]
+Re-review: [Yes — prior verdict was X on YYYY-MM-DD / No — first review]
### Completeness: [X/8 sections present]
[List missing sections]
-### Consistency Issues
-[List any internal or cross-system contradictions]
+### Dependency Graph
+[List each declared dependency and whether its GDD file exists on disk]
+- ✓ enemy-definition-data.md — exists
+- ✗ loot-system.md — NOT FOUND (file does not exist yet)
-### Implementability Concerns
-[List any vague or unimplementable sections]
+### Required Before Implementation
+[Numbered list — blocking issues only. Each item tagged with source agent.]
-### Balance Concerns
-[List any obvious balance risks]
+### Recommended Revisions
+[Numbered list — important but not blocking. Source-tagged.]
-### Recommendations
-[Prioritized list of improvements]
+### Specialist Disagreements
+[Any cases where agents disagreed with each other or with the main review.
+Present both sides — do not silently resolve.]
+
+### Nice-to-Have
+[Minor improvements, low priority.]
+
+### Senior Verdict [creative-director]
+[Creative director's synthesis and overall assessment.]
+
+### Scope Signal
+Estimate implementation scope based on: dependency count, formula count,
+systems touched, and whether new ADRs are required.
+- **S** — single system, no formulas, no new ADRs, <3 dependencies
+- **M** — moderate complexity, 1-2 formulas, 3-6 dependencies
+- **L** — multi-system integration, 3+ formulas, may require new ADR
+- **XL** — cross-cutting concern, 5+ dependencies, multiple new ADRs likely
+Label clearly: "Rough scope signal: M (producer should verify before sprint planning)"
### Verdict: [APPROVED / NEEDS REVISION / MAJOR REVISION NEEDED]
```
-This skill is read-only — no files are written.
+This skill is read-only — no files are written during Phase 4.
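+
+The Scope Signal thresholds in the template reduce to a decision ladder, sketched here (boundaries are guidance, not hard rules):
+
+```
+if deps >= 5 or new_adrs >= 2:        signal = "XL"
+elif formulas >= 3 or new_adrs == 1:  signal = "L"
+elif formulas >= 1 or deps >= 3:      signal = "M"
+else:                                 signal = "S"
+```
+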
---
## Phase 5: Next Steps
-If the document being reviewed is `game-concept.md` or `game-pillars.md`:
-- Check if `design/gdd/systems-index.md` exists. If not, recommend: "Run `/map-systems` to break the concept down into individual systems with dependencies and priorities, then write per-system GDDs."
+Use `AskUserQuestion` for ALL closing interactions. Never plain text.
-If the document is an individual system GDD:
-- If verdict is APPROVED: suggest updating the system's status to 'Approved' in the systems index.
-- If verdict is NEEDS REVISION or MAJOR REVISION NEEDED: suggest updating the status to 'In Review'.
+**First widget — what to do next:**
-Next skill options:
-- APPROVED → `/create-epics` or `/map-systems`
-- NEEDS REVISION → revise the doc then re-run `/design-review`
+If APPROVED (first-pass, no revision needed), proceed directly to the systems-index widget, review-log widget, then the final closing widget. Do not show a separate "what to do" widget — the final closing widget covers next steps.
+
+If NEEDS REVISION or MAJOR REVISION NEEDED, options:
+- `[A] Revise the GDD now — address blocking items together`
+- `[B] Stop here — revise in a separate session`
+- `[C] Accept as-is and move on (only if all items are advisory)`
+
+**If user selects [A] — Revise now:**
+
+Work through all blocking items, asking for design decisions only where you cannot resolve the issue from the GDD and existing docs alone. Group all design-decision questions into a single multi-tab `AskUserQuestion` before making any edits — do not interrupt mid-revision for each blocker individually.
+
+After all revisions are complete, show a summary table (blocker → fix applied) and use `AskUserQuestion` for a **post-revision closing widget**:
+
+- Prompt: "Revisions complete — [N] blockers resolved. What next?"
+- Note current context usage: if context is above ~50%, add: "(Recommended: /clear before re-review — this session has used X% context. A full re-review spawns several specialist agents and needs clean context.)"
+- Options:
+ - `[A] Re-review in a new session — run /design-review [doc-path] after /clear`
+ - `[B] Accept revisions and mark Approved — update systems index, skip re-review`
+ - `[C] Move to next system — /design-system [next-system] (#N in design order)`
+ - `[D] Stop here`
+
+Never end the revision flow with plain text. Always close with this widget.
+
+**Second widget — systems index update (always show this separately):**
+
+Use a second `AskUserQuestion`:
+- Prompt: "May I update `design/gdd/systems-index.md` to mark [system] as [In Review / Approved]?"
+- Options: `[A] Yes — update it` / `[B] No — leave it as-is`
+
+**Third widget — review log (always offer):**
+
+Use a third `AskUserQuestion`:
+- Prompt: "May I append this review summary to `design/gdd/reviews/[doc-name]-review-log.md`? This creates a revision history so future re-reviews can track what changed."
+- Options: `[A] Yes — append to review log` / `[B] No — skip`
+
+If yes, append an entry in this format:
+```
+## Review — [YYYY-MM-DD] — Verdict: [APPROVED / NEEDS REVISION / MAJOR REVISION NEEDED]
+Scope signal: [S/M/L/XL]
+Specialists: [list]
+Blocking items: [count] | Recommended: [count]
+Summary: [2-3 sentence summary of key findings from creative-director verdict]
+Prior verdict resolved: [Yes / No / First review]
+```
+
+---
+
+**Final closing widget — always show after all file writes complete:**
+
+Once the systems-index and review-log widgets are answered, check project state and show one final `AskUserQuestion`:
+
+Before building options, read:
+- `design/gdd/systems-index.md` — find any system with Status: In Review or NEEDS REVISION (other than the one just reviewed)
+- Count `.md` files in `design/gdd/` (excluding game-concept.md, systems-index.md) to determine if `/review-all-gdds` is worth offering (≥2 GDDs)
+- Find the next system with Status: Not Started in design order
+
+Build the option list dynamically — only include options that are genuinely next:
+- `[_] Run /design-review [other-gdd-path] — [system name] is still [In Review / NEEDS REVISION]` (include if another GDD needs review)
+- `[_] Run /consistency-check — verify this GDD's values don't conflict with existing GDDs` (always include if ≥1 other GDD exists)
+- `[_] Run /review-all-gdds — holistic design-theory review across all designed systems` (include if ≥2 GDDs exist)
+- `[_] Run /design-system [next-system] — next in design order` (always include, name the actual system)
+- `[_] Stop here`
+
+Assign letters A, B, C… only to included options. Mark the most pipeline-advancing option as `(recommended)`.
+
+Never end the skill with plain text after file writes. Always close with this widget.
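+
+Option assembly, sketched:
+
+```
+included = [opt for opt in CLOSING_OPTIONS if opt.condition(project_state)]
+for letter, opt in zip("ABCDE", included):   # letters go only to included options
+    render(letter, opt.text, recommended=(opt is most_pipeline_advancing(included)))
+```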
diff --git a/.claude/skills/design-system/SKILL.md b/.claude/skills/design-system/SKILL.md
index e823633..f8edfa5 100644
--- a/.claude/skills/design-system/SKILL.md
+++ b/.claude/skills/design-system/SKILL.md
@@ -10,14 +10,24 @@ When this skill is invoked:
## 1. Parse Arguments & Validate
-Extract `--review [full|lean|solo]` if present and store as the review mode
-override for this run (see `.claude/docs/director-gates.md`).
+Resolve the review mode (once, store for all gate spawns this run):
+1. If `--review [full|lean|solo]` was passed → use that
+2. Else read `production/review-mode.txt` → use that value
+3. Else → default to `lean`
-A system name or retrofit path is **required**. If missing, fail with:
-> "Usage: `/design-system ` — e.g., `/design-system movement`
-> Or to fill gaps in an existing GDD: `/design-system retrofit design/gdd/[system-name].md`
-> Run `/map-systems` first to create the systems index, then use this skill
-> to write individual system GDDs."
+See `.claude/docs/director-gates.md` for the full check pattern.
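+
+Resolution precedence, sketched (first match wins):
+
+```
+mode = cli_flag("--review")                        # 1. explicit override for this run
+mode = mode or read("production/review-mode.txt")  # 2. project-wide setting
+mode = mode or "lean"                              # 3. built-in default
+```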
+
+A system name or retrofit path is **required**. If missing:
+
+1. Check if `design/gdd/systems-index.md` exists.
+2. If it exists: read it, find the highest-priority system with status "Not Started" or equivalent, and use `AskUserQuestion`:
+ - Prompt: "The next system in your design order is **[system-name]** ([priority] | [layer]). Start designing it?"
+ - Options: `[A] Yes — design [system-name]` / `[B] Pick a different system` / `[C] Stop here`
+ - If [A]: proceed with that system name. If [B]: ask which system to design (plain text). If [C]: exit.
+3. If no systems index exists, fail with:
+   > "Usage: `/design-system [system-name]` — e.g., `/design-system movement`
+ > Or to fill gaps in an existing GDD: `/design-system retrofit design/gdd/[system-name].md`
+ > No systems index found. Run `/map-systems` first to map your systems and get the design order."
**Detect retrofit mode:**
If the argument starts with `retrofit` or the argument is a file path to an
@@ -271,7 +281,12 @@ Use the template structure from `.claude/docs/templates/game-design-document.md`
Ask: "May I create the skeleton file at `design/gdd/[system-name].md`?"
-After writing, create `production/session-state/active.md` if it does not exist, then update it with:
+After writing, update `production/session-state/active.md`:
+- Use Glob to check if the file exists.
+- If it **does not exist**: use the **Write** tool to create it. Never attempt Edit on a file that may not exist.
+- If it **already exists**: use the **Edit** tool to update the relevant fields.
+
+File content:
- Task: Designing [system-name] GDD
- Current section: Starting (skeleton created)
- File: design/gdd/[system-name].md
@@ -304,10 +319,24 @@ Context -> Questions -> Options -> Decision -> Draft -> Approval ->
5. **Draft**: Write the section content in conversation text for review. Flag any
provisional assumptions about undesigned dependencies.
-6. **Approval**: Ask "Approve this section, or would you like changes?"
+6. **Approval**: Immediately after the draft — in the SAME response — use
+ `AskUserQuestion`. **NEVER use plain text. NEVER skip this step.**
+ - Prompt: "Approve the [Section Name] section?"
+ - Options: `[A] Approve — write it to file` / `[B] Make changes — describe what to fix` / `[C] Start over`
-7. **Write**: Use the Edit tool to replace the `[To be designed]` placeholder with
- the approved content. Confirm the write.
+ **The draft and the approval widget MUST appear together in one response.
+ If the draft appears without the widget, the user is left at a blank prompt
+ with no path forward — this is a protocol violation.**
+
+7. **Write**: Use the Edit tool to replace the placeholder with the approved content.
+ **CRITICAL**: Always include the section heading in the `old_string` to ensure
+ uniqueness — never match `[To be designed]` alone, as multiple sections use the
+ same placeholder and the Edit tool requires a unique match. Use this pattern:
+ ```
+ old_string: "## [Section Name]\n\n[To be designed]"
+ new_string: "## [Section Name]\n\n[approved content]"
+ ```
+ Confirm the write.
8. **Registry conflict check** (Sections C and D only — Detailed Design and Formulas):
After writing, scan the section content for entity names, item names, formula
@@ -321,7 +350,8 @@ Context -> Questions -> Options -> Decision -> Draft -> Approval ->
(will be handled in Phase 5).
After writing each section, update `production/session-state/active.md` with the
-completed section name.
+completed section name. Use Glob to check if the file exists — use Write to create
+it if absent, Edit to update it if present.
### Section-Specific Guidance
@@ -333,6 +363,20 @@ Each section has unique design considerations and may benefit from specialist ag
**Goal**: One paragraph a stranger could read and understand.
+**Derive recommended options before building the widget**: Read the system's category and layer from the systems index (already in context from Phase 2), then determine the recommended option for each tab:
+- **Framing tab**: Foundation/Infrastructure layer → `[A]` recommended. Player-facing categories (Combat, UI, Dialogue, Character, Animation, Visual Effects, Audio) → `[C] Both` recommended.
+- **ADR ref tab**: Glob `docs/architecture/adr-*.md` and grep for the system name in the GDD Requirements section of any ADR. If a matching ADR is found → `[A] Yes — cite the ADR` recommended. If none found → `[B] No` recommended.
+- **Fantasy tab**: Foundation/Infrastructure layer → `[B] No` recommended. All other categories → `[A] Yes` recommended.
+
+Append `(Recommended)` to the appropriate option text in each tab.
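+
+The ADR-ref derivation, sketched:
+
+```
+adrs = glob("docs/architecture/adr-*.md")
+cited = any(grep(system_name, adr, section="GDD Requirements") for adr in adrs)
+recommended = "[A] Yes" if cited else "[B] No"
+```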
+
+**Framing questions (ask BEFORE drafting)**: Use `AskUserQuestion` with a multi-tab widget:
+- Tab "Framing" — "How should the overview frame this system?" Options: `[A] As a data/infrastructure layer (technical framing)` / `[B] Through its player-facing effect (design framing)` / `[C] Both — describe the data layer and its player impact`
+- Tab "ADR ref" — "Should the overview reference the existing ADR for this system?" Options: `[A] Yes — cite the ADR for implementation details` / `[B] No — keep the GDD at pure design level`
+- Tab "Fantasy" — "Does this system have a player fantasy worth stating?" Options: `[A] Yes — players feel it directly` / `[B] No — pure infrastructure, players feel what it enables`
+
+Use the user's answers to shape the draft. Do NOT answer these questions yourself and auto-draft.
+
**Questions to ask**:
- What is this system in one sentence?
- How does a player interact with it? (active/passive/automatic)
@@ -341,12 +385,32 @@ Each section has unique design considerations and may benefit from specialist ag
**Cross-reference**: Check that the description aligns with how the systems index
describes it. Flag discrepancies.
+**Design vs. implementation boundary**: Overview questions must stay at the behavior
+level — what the system *does*, not *how it is built*. If implementation questions
+arise during the Overview (e.g., "Should this use an Autoload singleton or a signal
+bus?"), note them as "→ becomes an ADR" and move on. Implementation patterns belong
+in `/architecture-decision`, not the GDD. The GDD describes behavior; the ADR
+describes the technical approach used to achieve it.
+
---
### Section B: Player Fantasy
**Goal**: The emotional target — what the player should *feel*.
+**Derive recommended option before building the widget**: Read the system's category and layer from Phase 2 context:
+- Player-facing categories (Combat, UI, Dialogue, Character, Animation, Audio, Level/World) → `[A] Direct` recommended
+- Foundation/Infrastructure layer → `[B] Indirect` recommended
+- Mixed categories (Camera/input, Economy, AI with visible player effects) → `[C] Both` recommended
+
+Append `(Recommended)` to the appropriate option text.
+
+**Framing question (ask BEFORE drafting)**: Use `AskUserQuestion`:
+- Prompt: "Is this system something the player engages with directly, or infrastructure they experience indirectly?"
+- Options: `[A] Direct — player actively uses or feels this system` / `[B] Indirect — player experiences the effects, not the system` / `[C] Both — has a direct interaction layer and infrastructure beneath it`
+
+Use the answer to frame the Player Fantasy section appropriately. Do NOT assume the answer.
+
**Questions to ask**:
- What emotion or power fantasy does this serve?
- What reference games nail this feeling? What specifically creates it?
@@ -355,6 +419,16 @@ describes it. Flag discrepancies.
**Cross-reference**: Must align with the game pillars. If the system serves a pillar,
quote the relevant pillar text.
+**Agent delegation (MANDATORY)**: After the framing answer is given but before drafting,
+spawn `creative-director` via Task:
+- Provide: system name, framing answer (direct/indirect/both), game pillars, any reference games the user mentioned, the game concept summary
+- Ask: "Shape the Player Fantasy for this system. What emotion or power fantasy should it serve? What player moment should we anchor to? What tone and language fits the game's established feeling? Be specific — give me 2-3 candidate framings."
+- Collect the creative-director's framings and present them to the user alongside the draft.
+
+**Do NOT draft Section B without first consulting `creative-director`.** The framing
+answer tells us *what kind* of fantasy it is; the creative-director shapes *how it's
+described* — tone, language, the specific player moment to anchor to.
+
---
### Section C: Detailed Design (Core Rules, States, Interactions)
@@ -375,9 +449,15 @@ This is usually the largest section. Break it into sub-sections:
- What are the decision points the player faces?
- What can the player NOT do? (Constraints are as important as capabilities)
-**Agent delegation**: For complex mechanics, use the Task tool to delegate to
-`game-designer` for high-level design review, or `systems-designer` for detailed
-mechanical modeling. Provide the full context gathered in Phase 2.
+**Agent delegation (MANDATORY)**: Before drafting Section C, spawn specialist agents via Task in parallel:
+- Look up the system category in the routing table (Section 6 of this skill)
+- Spawn the Primary Agent AND Supporting Agent(s) listed for this category
+- Provide each agent: system name, game concept summary, pillar set, dependency GDD excerpts, the specific section being worked on
+- Collect their findings before drafting
+- Surface any disagreements between agents to the user via `AskUserQuestion`
+- Draft only after receiving specialist input
+
+**Do NOT draft Section C without first consulting the appropriate specialists.** A `systems-designer` reviewing rules and mechanics will catch design gaps the main session cannot.
**Cross-reference**: For each interaction listed, verify it matches what the
dependency GDD specifies. If a dependency defines a value or formula and this
@@ -414,14 +494,12 @@ table. A formula without defined variables cannot be implemented without guesswo
- Should scaling be linear, logarithmic, or stepped?
- What should the output ranges be at early/mid/late game?
-**Agent delegation**: For formula-heavy systems, delegate to `systems-designer`
-via the Task tool. Provide:
-- The Core Rules from Section C (already written to file)
-- Tuning goals from the user
-- Balance context from dependency GDDs
-
-The agent should return proposed formulas with variable tables and expected output
-ranges. Present these to the user for review before approving.
+**Agent delegation (MANDATORY)**: Before proposing any formulas or balance values, spawn specialist agents via Task in parallel:
+- **Always spawn `systems-designer`**: provide Core Rules from Section C, tuning goals from user, balance context from dependency GDDs. Ask them to propose formulas with variable tables and output ranges.
+- **For economy/cost systems, also spawn `economy-designer`**: provide placement costs, upgrade cost intent, and progression goals. Ask them to validate cost curves and ratios.
+- Present the specialists' proposals to the user for review via `AskUserQuestion`
+- The user decides; the main session writes to file
+- **Do NOT invent formula values or balance numbers without specialist input.** A user without balance design expertise cannot evaluate raw numbers — they need the specialists' reasoning.
**Cross-reference**: If a dependency GDD defines a formula whose output feeds into
this system, reference it explicitly. Don't reinvent — connect.
@@ -448,9 +526,7 @@ design question, not a specification.
- What happens when two rules apply at the same time?
- What happens if a player finds an unintended interaction? (Identify degenerate strategies)
-**Agent delegation**: For systems with complex interactions, delegate to
-`systems-designer` to identify edge cases from the formula space. For narrative
-systems, consult `narrative-director` for story-breaking edge cases.
+**Agent delegation (MANDATORY)**: Spawn `systems-designer` via Task before finalizing edge cases. Provide: the completed Sections C and D, and ask them to identify edge cases from the formula and rule space that the main session may have missed. For narrative systems, also spawn `narrative-director`. Present their findings and ask the user which to include.
**Cross-reference**: Check edge cases against dependency GDDs. If a dependency
defines a floor, cap, or resolution rule that this system could violate, flag it.
@@ -506,6 +582,8 @@ Include at least: one criterion per core rule from Section C, and one per formul
from Section D. Do NOT write "the system works as designed" — every criterion must
be independently verifiable by a QA tester without reading the GDD.
+**Agent delegation (MANDATORY)**: Spawn `qa-lead` via Task before finalizing acceptance criteria. Provide: the completed GDD sections C, D, E, and ask them to validate that the criteria are independently testable and cover all core rules and formulas. Surface any gaps or untestable criteria to the user.
+
**Questions to ask**:
- What's the minimum set of tests that prove this works?
- What performance budget does this system get? (frame time, memory)
@@ -518,16 +596,30 @@ not just this system in isolation.
### Optional Sections: Visual/Audio, UI Requirements, Open Questions
-These sections are included in the template but aren't part of the 8 required
-sections. Offer them after the required sections are done:
+These sections are included in the template. Visual/Audio is **REQUIRED** for visual system categories — not optional. Determine the requirement level before asking:
+
+**Visual/Audio is REQUIRED (mandatory — do not offer to skip) for these system categories:**
+- Combat, damage, health
+- UI systems (HUD, menus)
+- Animation, character movement
+- Visual effects, particles, shaders
+- Character systems
+- Dialogue, quests, lore
+- Level/world systems
+
+For required systems: **spawn `art-director` via Task** before drafting this section. Provide: system name, game concept, game pillars, art bible sections 1–4 if they exist. Ask them to specify: (1) VFX and visual feedback requirements for this system's events, (2) any animation or visual style constraints, (3) which art bible principles most directly apply to this system. Present their output; do NOT leave this section as `[To be designed]` for visual systems.
+
+For **all other system categories** (Foundation/Infrastructure, Economy, AI/pathfinding, Camera/input), offer the optional sections after the required sections:
Use `AskUserQuestion`:
- "The 8 required sections are complete. Do you want to also define Visual/Audio
requirements, UI requirements, or capture open questions?"
- Options: "Yes, all three", "Just open questions", "Skip — I'll add these later"
-For **Visual/Audio**: Coordinate with `art-director` and `audio-director` if detail
-is needed. Often a brief note suffices at the GDD stage.
+For **Visual/Audio** (non-required systems): Coordinate with `art-director` and `audio-director` if detail is needed. Often a brief note suffices at the GDD stage.
+
+> **Asset Spec Flag**: After the Visual/Audio section is written with real content, output this notice:
+> "📌 **Asset Spec** — Visual/Audio requirements are defined. After the art bible is approved, run `/asset-spec system:[system-name]` to produce per-asset visual descriptions, dimensions, and generation prompts from this section."
For **UI Requirements**: Coordinate with `ux-designer` for complex UI systems.
After writing this section, check whether it contains real content (not just
@@ -562,6 +654,11 @@ the source of truth). Verify:
### 5a-bis: Creative Director Pillar Review
+**Review mode check** — apply before spawning CD-GDD-ALIGN:
+- `solo` → skip. Note: "CD-GDD-ALIGN skipped — Solo mode." Proceed to Step 5b.
+- `lean` → skip (not a PHASE-GATE). Note: "CD-GDD-ALIGN skipped — Lean mode." Proceed to Step 5b.
+- `full` → spawn as normal.
+
Before finalizing the GDD, spawn `creative-director` via Task using gate **CD-GDD-ALIGN** (`.claude/docs/director-gates.md`).
Pass: completed GDD file path, game pillars (from `design/gdd/game-concept.md` or `design/gdd/game-pillars.md`), MDA aesthetics target.
@@ -610,11 +707,14 @@ Present a completion summary:
> - Provisional assumptions: [list any assumptions about undesigned dependencies]
> - Cross-system conflicts found: [list or "none"]
-Use `AskUserQuestion`:
-- "Run `/design-review` now to validate the GDD?"
- - Options: "Yes, run review now", "I'll review it myself first", "Skip review"
+> **To validate this GDD, open a fresh Claude Code session and run:**
+> `/design-review design/gdd/[system-name].md`
+>
+> **Never run `/design-review` in the same session as `/design-system`.** The reviewing
+> agent must be independent of the authoring context. Running it here would inherit
+> the full design history, making independent critique impossible.
-If yes, invoke the design-review skill on the completed file.
+**NEVER offer to run `/design-review` inline.** Always direct the user to a fresh session.
### 5d: Update Systems Index
@@ -645,6 +745,7 @@ Update `production/session-state/active.md` with:
Use `AskUserQuestion`:
- "What's next?"
- Options:
+ - "Run `/consistency-check` — verify this GDD's values don't conflict with existing GDDs (recommended before designing the next system)"
- "Design next system ([next-in-order])" — if undesigned systems remain
- "Fix review findings" — if design-review flagged issues
- "Stop here for this session"
@@ -659,15 +760,19 @@ orchestrates the overall flow; agents provide expert content.
| System Category | Primary Agent | Supporting Agent(s) |
|----------------|---------------|---------------------|
-| Combat, damage, health | `game-designer` | `systems-designer` (formulas), `ai-programmer` (enemy AI) |
+| **Foundation/Infrastructure** (event bus, save/load, scene mgmt, service locator) | `systems-designer` | `gameplay-programmer` (feasibility), `engine-programmer` (engine integration) |
+| Combat, damage, health | `game-designer` | `systems-designer` (formulas), `ai-programmer` (enemy AI), `art-director` (hit feedback visual direction, VFX intent) |
| Economy, loot, crafting | `economy-designer` | `systems-designer` (curves), `game-designer` (loops) |
| Progression, XP, skills | `game-designer` | `systems-designer` (curves), `economy-designer` (sinks) |
-| Dialogue, quests, lore | `game-designer` | `narrative-director` (story), `writer` (content) |
-| UI systems (HUD, menus) | `game-designer` | `ux-designer` (flows), `ui-programmer` (feasibility) |
+| Dialogue, quests, lore | `game-designer` | `narrative-director` (story), `writer` (content), `art-director` (character visual profiles, cinematic tone) |
+| UI systems (HUD, menus) | `game-designer` | `ux-designer` (flows), `ui-programmer` (feasibility), `art-director` (visual style direction), `technical-artist` (render/shader constraints) |
| Audio systems | `game-designer` | `audio-director` (direction), `sound-designer` (specs) |
| AI, pathfinding, behavior | `game-designer` | `ai-programmer` (implementation), `systems-designer` (scoring) |
| Level/world systems | `game-designer` | `level-designer` (spatial), `world-builder` (lore) |
| Camera, input, controls | `game-designer` | `ux-designer` (feel), `gameplay-programmer` (feasibility) |
+| Animation, character movement | `game-designer` | `art-director` (animation style, pose language), `technical-artist` (rig/blend constraints), `gameplay-programmer` (feel) |
+| Visual effects, particles, shaders | `game-designer` | `art-director` (VFX visual direction), `technical-artist` (performance budget, shader complexity), `systems-designer` (trigger/state integration) |
+| Character systems (stats, archetypes) | `game-designer` | `art-director` (character visual archetype), `narrative-director` (character arc alignment), `systems-designer` (stat formulas) |
**When delegating via Task tool**:
- Provide: system name, game concept summary, dependency GDD excerpts, the specific
@@ -715,3 +820,13 @@ This skill follows the collaborative design principle at every step:
**Never** write a section without user approval.
**Never** contradict an existing approved GDD without flagging the conflict.
**Always** show where decisions come from (dependency GDDs, pillars, user choices).
+
+## Context Window Awareness
+
+This is a long-running skill. After writing each section, check if the status line
+shows context at or above 70%. If so, append this notice to the response:
+
+> **Context is approaching the limit (≥70%).** Your progress is saved — all approved
+> sections are written to `design/gdd/[system-name].md`. When you're ready to continue,
+> open a fresh Claude Code session and run `/design-system [system-name]` — it will
+> detect which sections are complete and resume from the next one.
diff --git a/.claude/skills/dev-story/SKILL.md b/.claude/skills/dev-story/SKILL.md
index f12fea5..10397e4 100644
--- a/.claude/skills/dev-story/SKILL.md
+++ b/.claude/skills/dev-story/SKILL.md
@@ -3,8 +3,7 @@ name: dev-story
description: "Read a story file and implement it. Loads the full context (story, GDD requirement, ADR guidelines, control manifest), routes to the right programmer agent for the system and engine, implements the code and test, and confirms each acceptance criterion. The core implementation skill — run after /story-readiness, before /code-review and /story-done."
argument-hint: "[story-path]"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Write, Bash, Task
-context: fork
+allowed-tools: Read, Glob, Grep, Write, Bash, Task, AskUserQuestion
---
# Dev Story
@@ -15,12 +14,15 @@ drives implementation to completion — including writing the test.
**The loop for every story:**
```
+/qa-plan sprint ← define test requirements before sprint begins
/story-readiness [path] ← validate before starting
/dev-story [path] ← implement it (this skill)
/code-review [files] ← review it
/story-done [path] ← verify and close it
```
+**After all sprint stories are done:** run `/team-qa sprint` to execute the full QA cycle and get a sign-off verdict before advancing the project stage.
+
**Output:** Source code + test file in the project's `src/` and `tests/` directories.
---
@@ -38,7 +40,17 @@ If not found, ask: "Which story are we implementing?" Glob
## Phase 2: Load Full Context
-Read everything in this order — do not start implementation until all is loaded:
+**Before loading any context, verify required files exist.** Extract the ADR path from the story's `ADR Governing Implementation` field, then check:
+
+| File | Path | If missing |
+|------|------|------------|
+| TR registry | `docs/architecture/tr-registry.yaml` | **STOP** — "TR registry not found. Run `/create-epics` to generate it." |
+| Governing ADR | path from story's ADR field | **STOP** — "ADR file [path] not found. Run `/architecture-decision` to create it, or correct the filename in the story's ADR field." |
+| Control manifest | `docs/architecture/control-manifest.md` | **WARN and continue** — "Control manifest not found — layer rules cannot be checked. Run `/create-control-manifest`." |
+
+If the TR registry or governing ADR is missing, set the story status to **BLOCKED** in the session state and do not spawn any programmer agent.
+
+Read all of the following simultaneously — these are independent reads. Do not start implementation until all context is loaded:
### The story file
Extract and hold:
@@ -71,9 +83,16 @@ Read `docs/architecture/control-manifest.md`. Extract the rules for this story's
- Performance guardrails
Check: does the story's embedded Manifest Version match the current manifest header date?
-If they differ: "Story was written against manifest v[story-date]. Current manifest is
-v[current-date]. New rules may apply — reviewing the diff before implementing."
-Read the manifest carefully for any new rules added since the story was written.
+If they differ, use `AskUserQuestion` before proceeding:
+- Prompt: "Story was written against manifest v[story-date]. Current manifest is v[current-date]. New rules may apply. How do you want to proceed?"
+- Options:
+ - `[A] Update story manifest version and implement with current rules (Recommended)`
+ - `[B] Implement with old rules — I accept the risk of non-compliance`
+ - `[C] Stop here — I want to review the manifest diff first`
+
+If [A]: edit the story file's `Manifest Version:` field to the current manifest date before spawning the programmer. Then read the manifest carefully for new rules.
+If [B]: read the manifest carefully for new rules anyway, and note the version mismatch in the Phase 6 summary under "Deviations".
+If [C]: stop. Do not spawn any agent. Let the user review and re-run `/dev-story`.
### Engine reference
Read `.claude/docs/technical-preferences.md`:
@@ -89,6 +108,9 @@ Read `.claude/docs/technical-preferences.md`:
Based on the story's **Layer**, **Type**, and **system name**, determine which
specialist to spawn via Task.
+**Config/Data stories — skip agent spawning entirely:**
+If the story's Type is `Config/Data`, no programmer agent or engine specialist is needed. Jump directly to Phase 4 (Config/Data note). The implementation is a data file edit — no routing table evaluation, no engine specialist.
+
### Primary agent routing table
| Story context | Primary agent |
diff --git a/.claude/skills/gate-check/SKILL.md b/.claude/skills/gate-check/SKILL.md
index 767bc3b..9625dbb 100644
--- a/.claude/skills/gate-check/SKILL.md
+++ b/.claude/skills/gate-check/SKILL.md
@@ -3,8 +3,7 @@ name: gate-check
description: "Validate readiness to advance between development phases. Produces a PASS/CONCERNS/FAIL verdict with specific blockers and required artifacts. Use when user says 'are we ready to move to X', 'can we advance to production', 'check if we can start the next phase', 'pass the gate'."
argument-hint: "[target-phase: systems-design | technical-setup | pre-production | production | polish | release] [--review full|lean|solo]"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Bash, Write
-context: fork
+allowed-tools: Read, Glob, Grep, Bash, Write, Task, AskUserQuestion
model: opus
---
@@ -37,14 +36,24 @@ The project progresses through these stages:
**Target phase:** `$ARGUMENTS[0]` (blank = auto-detect current stage, then validate next transition)
-Also extract `--review [full|lean|solo]` if present. Note: in `solo` mode,
-director spawns (CD-PHASE-GATE, TD-PHASE-GATE, PR-PHASE-GATE) are skipped —
-gate-check becomes artifact-existence checks only. In `lean` mode, all three
-directors still run (phase gates are the purpose of lean mode).
+Also resolve the review mode (once, store for all gate spawns this run):
+1. If `--review [full|lean|solo]` was passed → use that
+2. Else read `production/review-mode.txt` → use that value
+3. Else → default to `lean`
+
+Note: in `solo` mode, director spawns (CD-PHASE-GATE, TD-PHASE-GATE, PR-PHASE-GATE, AD-PHASE-GATE) are skipped — gate-check becomes artifact-existence checks only. In `lean` mode, all four directors still run (phase gates are the purpose of lean mode).
- **With argument**: `/gate-check production` — validate readiness for that specific phase
- **No argument**: Auto-detect current stage using the same heuristics as
- `/project-stage-detect`, then validate the NEXT phase transition
+ `/project-stage-detect`, then **confirm with the user before running**:
+
+ Use `AskUserQuestion`:
+ - Prompt: "Detected stage: **[current stage]**. Running gate for [Current] → [Next] transition. Is this correct?"
+ - Options:
+ - `[A] Yes — run this gate`
+ - `[B] No — pick a different gate` (if selected, show a second widget listing all gate options: Concept → Systems Design, Systems Design → Technical Setup, Technical Setup → Pre-Production, Pre-Production → Production, Production → Polish, Polish → Release)
+
+ Do not skip this confirmation step when no argument is provided.
---
@@ -55,11 +64,13 @@ directors still run (phase gates are the purpose of lean mode).
**Required Artifacts:**
- [ ] `design/gdd/game-concept.md` exists and has content
- [ ] Game pillars defined (in concept doc or `design/gdd/game-pillars.md`)
+- [ ] Visual Identity Anchor section exists in `design/gdd/game-concept.md` (from brainstorm Phase 4 art-director output)
**Quality Checks:**
- [ ] Game concept has been reviewed (`/design-review` verdict not MAJOR REVISION NEEDED)
- [ ] Core loop is described and understood
- [ ] Target audience is identified
+- [ ] Visual Identity Anchor contains a one-line visual rule and at least 2 supporting visual principles
---
@@ -85,6 +96,7 @@ directors still run (phase gates are the purpose of lean mode).
**Required Artifacts:**
- [ ] Engine chosen (CLAUDE.md Technology Stack is not `[CHOOSE]`)
- [ ] Technical preferences configured (`.claude/docs/technical-preferences.md` populated)
+- [ ] Art bible exists at `design/art/art-bible.md` with at least Sections 1–4 (Visual Identity Foundation)
- [ ] At least 3 Architecture Decision Records in `docs/architecture/` covering
Foundation-layer systems (scene management, event architecture, save/load)
- [ ] Engine reference docs exist in `docs/engine-reference/[engine]/`
@@ -110,6 +122,13 @@ directors still run (phase gates are the purpose of lean mode).
- [ ] Architecture traceability matrix has **zero Foundation layer gaps**
(all Foundation requirements must have ADR coverage before Pre-Production)
+**ADR Circular Dependency Check**: For all ADRs in `docs/architecture/`, read each ADR's
+"ADR Dependencies" / "Depends On" section. Build a dependency graph (ADR-A → ADR-B means
+A depends on B). If any cycle is detected (e.g. A→B→A, or A→B→C→A):
+- Flag as **FAIL**: "Circular ADR dependency: [ADR-X] → [ADR-Y] → [ADR-X].
+ Neither can reach Accepted while the cycle exists. Remove one 'Depends On' edge to
+ break the cycle."
+
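+The check itself is a standard depth-first search for a back edge. A minimal
+sketch, assuming the "Depends On" fields have already been parsed into a dict
+(the dict shape and function name are illustrative):
+
+```python
+def find_cycle(deps: dict[str, list[str]]) -> list[str] | None:
+    """deps maps each ADR id to the ADR ids it depends on."""
+    WHITE, GRAY, BLACK = 0, 1, 2
+    color = {adr: WHITE for adr in deps}
+
+    def dfs(node, path):
+        color[node] = GRAY
+        for dep in deps.get(node, []):
+            if color.get(dep) == GRAY:            # back edge: cycle found
+                return path[path.index(dep):] + [dep]
+            if color.get(dep, WHITE) == WHITE:
+                found = dfs(dep, path + [dep])
+                if found:
+                    return found
+        color[node] = BLACK
+        return None
+
+    for adr in deps:
+        if color[adr] == WHITE:
+            cycle = dfs(adr, [adr])
+            if cycle:
+                return cycle   # e.g. ["ADR-A", "ADR-B", "ADR-A"]
+    return None
+```
+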
**Engine Validation** (read `docs/engine-reference/[engine]/VERSION.md` first):
- [ ] ADRs that touch post-cutoff engine APIs are flagged with Knowledge Risk: HIGH/MEDIUM
- [ ] `/architecture-review` engine audit shows no deprecated API usage
@@ -122,6 +141,8 @@ directors still run (phase gates are the purpose of lean mode).
**Required Artifacts:**
- [ ] At least 1 prototype in `prototypes/` with a README
- [ ] First sprint plan exists in `production/sprints/`
+- [ ] Art bible is complete (all 9 sections) and AD-ART-BIBLE sign-off verdict is recorded in `design/art/art-bible.md`
+- [ ] Character visual profiles exist for key characters referenced in narrative docs
- [ ] All MVP-tier GDDs from systems index are complete
- [ ] Master architecture document exists at `docs/architecture/architecture.md`
- [ ] At least 3 ADRs covering Foundation-layer decisions exist in `docs/architecture/`
@@ -174,6 +195,8 @@ directors still run (phase gates are the purpose of lean mode).
- [ ] Test files exist in `tests/unit/` and `tests/integration/` covering Logic and Integration stories
- [ ] All Logic stories from this sprint have corresponding unit test files in `tests/unit/`
- [ ] Smoke check has been run with a PASS or PASS WITH WARNINGS verdict — report exists in `production/qa/`
+- [ ] QA plan exists in `production/qa/` (generated by `/qa-plan`) covering this sprint or final production sprint
+- [ ] QA sign-off report exists in `production/qa/` (generated by `/team-qa`) with verdict APPROVED or APPROVED WITH CONDITIONS
- [ ] At least 3 distinct playtest sessions documented in `production/playtests/`
- [ ] Playtest reports cover: new player experience, mid-game systems, and difficulty curve
- [ ] Fun hypothesis from Game Concept has been explicitly validated or revised
@@ -236,6 +259,14 @@ For each item in the target gate:
- Don't just check existence — verify the file has real content (not just a template header)
- For code checks, verify directory structure and file counts
+**Systems Design → Technical Setup gate — cross-GDD review check**:
+Use `Glob('design/gdd/gdd-cross-review-*.md')` to find the `/review-all-gdds` report.
+If no file matches, mark the "cross-GDD review report exists" artifact as **FAIL** and
+surface it prominently: "No `/review-all-gdds` report found in `design/gdd/`. Run
+`/review-all-gdds` before advancing to Technical Setup."
+If a file is found, read it and check the verdict line: a FAIL verdict means
+unresolved cross-GDD conflicts remain and must be fixed before advancing.
+
### Quality Checks
- For test checks: Run the test suite via `Bash` if a test runner is configured
- For design review checks: `Read` the GDD and check for the 8 required sections
@@ -264,17 +295,18 @@ For items that can't be automatically verified, **ask the user**:
## 4b. Director Panel Assessment
-Before generating the final verdict, spawn all three directors as **parallel subagents** via Task using the parallel gate protocol from `.claude/docs/director-gates.md`. Issue all three Task calls simultaneously — do not wait for one before starting the next.
+Before generating the final verdict, spawn all four directors as **parallel subagents** via Task using the parallel gate protocol from `.claude/docs/director-gates.md`. Issue all four Task calls simultaneously — do not wait for one before starting the next.
**Spawn in parallel:**
1. **`creative-director`** — gate **CD-PHASE-GATE** (`.claude/docs/director-gates.md`)
2. **`technical-director`** — gate **TD-PHASE-GATE** (`.claude/docs/director-gates.md`)
3. **`producer`** — gate **PR-PHASE-GATE** (`.claude/docs/director-gates.md`)
+4. **`art-director`** — gate **AD-PHASE-GATE** (`.claude/docs/director-gates.md`)
Pass to each: target phase name, list of artifacts present, and the context fields listed in that gate's definition.
-**Collect all three responses, then present the Director Panel summary:**
+**Collect all four responses, then present the Director Panel summary:**
```
## Director Panel Assessment
@@ -287,12 +319,15 @@ Technical Director: [READY / CONCERNS / NOT READY]
Producer: [READY / CONCERNS / NOT READY]
[feedback]
+
+Art Director: [READY / CONCERNS / NOT READY]
+ [feedback]
```
**Apply to the verdict:**
- Any director returns NOT READY → verdict is minimum FAIL (user may override with explicit acknowledgement)
- Any director returns CONCERNS → verdict is minimum CONCERNS
-- All three READY → eligible for PASS (still subject to artifact and quality checks from Section 3)
+- All four READY → eligible for PASS (still subject to artifact and quality checks from Section 3)
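+
+The aggregation rule above can be sketched as a small helper (a hypothetical
+illustration; the return strings are shorthand for the verdict floors):
+
+```python
+def panel_verdict(directors: dict[str, str]) -> str:
+    """directors maps director name to READY / CONCERNS / NOT READY."""
+    responses = set(directors.values())
+    if "NOT READY" in responses:
+        return "FAIL"          # minimum FAIL; user may override explicitly
+    if "CONCERNS" in responses:
+        return "CONCERNS"      # minimum CONCERNS
+    return "PASS-ELIGIBLE"     # still subject to Section 3 checks
+```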
---
@@ -387,17 +422,53 @@ echo -n "Production" > production/stage.txt
---
-## 7. Follow-Up Actions
+## 7. Closing Next-Step Widget
+
+After the verdict is presented and any stage.txt update is complete, close with a structured next-step prompt using `AskUserQuestion`.
+
+**Tailor the options to the gate that just ran:**
+
+For **systems-design PASS**:
+```
+Gate passed. What would you like to do next?
+[A] Run /create-architecture — produce your master architecture blueprint and ADR work plan (recommended next step)
+[B] Design more GDDs first — return here when all MVP systems are complete
+[C] Stop here for this session
+```
+
+> **Note for systems-design PASS**: `/create-architecture` is the required next step before writing any ADRs. It produces the master architecture document and a prioritized list of ADRs to write. Running `/architecture-decision` without this step means writing ADRs without a blueprint — skip it at your own risk.
+
+For **technical-setup PASS**:
+```
+Gate passed. What would you like to do next?
+[A] Start Pre-Production — begin prototyping the Vertical Slice
+[B] Write more ADRs first — run /architecture-decision [next-system]
+[C] Stop here for this session
+```
+
+For all other gates, offer the two most logical next steps for that phase plus "Stop here".
+
+---
+
+## 8. Follow-Up Actions
Based on the verdict, suggest specific next steps:
+- **No art bible?** → `/art-bible` to create the visual identity specification
+- **Art bible exists but no asset specs?** → `/asset-spec system:[name]` to generate per-asset visual specs and generation prompts from approved GDDs
- **No game concept?** → `/brainstorm` to create one
- **No systems index?** → `/map-systems` to decompose the concept into systems
- **Missing design docs?** → `/reverse-document` or delegate to `game-designer`
- **Small design change needed?** → `/quick-design` for changes under ~4 hours (bypasses full GDD pipeline)
- **No UX specs?** → `/ux-design [screen name]` to author specs, or `/team-ui [feature]` for full pipeline
- **UX specs not reviewed?** → `/ux-review [file]` or `/ux-review all` to validate
-- **No accessibility requirements doc?** → create `design/accessibility-requirements.md` using the accessibility-requirements template
+- **No accessibility requirements doc?** → Use `AskUserQuestion` to offer to create it now:
+ - Prompt: "The gate requires `design/accessibility-requirements.md`. Shall I create it from the template?"
+ - Options: `Create it now — I'll choose an accessibility tier`, `I'll create it myself`, `Skip for now`
+ - If "Create it now": use a second `AskUserQuestion` to ask for the tier:
+ - Prompt: "Which accessibility tier fits this project?"
+ - Options: `Basic — remapping + subtitles only (lowest effort)`, `Standard — Basic + colorblind modes + scalable UI`, `Comprehensive — Standard + motor accessibility + full settings menu`, `Exemplary — Comprehensive + external audit + full customization`
+ - Then write `design/accessibility-requirements.md` using the template at `.claude/docs/templates/accessibility-requirements.md`, filling in the chosen tier. Confirm: "May I write `design/accessibility-requirements.md`?"
- **No interaction pattern library?** → `/ux-design patterns` to initialize it
- **GDDs not cross-reviewed?** → `/review-all-gdds` (run after all MVP GDDs are individually approved)
- **Cross-GDD consistency issues?** → fix flagged GDDs, then re-run `/review-all-gdds`
diff --git a/.claude/skills/help/SKILL.md b/.claude/skills/help/SKILL.md
index 2ddd406..12757c8 100644
--- a/.claude/skills/help/SKILL.md
+++ b/.claude/skills/help/SKILL.md
@@ -27,6 +27,29 @@ the artifact globs that indicate completion.
---
+## Step 1b: Find Skills Not in the Catalog
+
+After reading the catalog, Glob `.claude/skills/*/SKILL.md` to get the full list
+of installed skills. For each file, extract the `name:` field from its frontmatter.
+
+Compare against the `command:` values in the catalog. Any skill whose name does
+not appear as a catalog command is an **uncataloged skill** — still usable but not
+part of the phase-gated workflow.
+
+Collect these for the output in Step 7 — show them as a footer block:
+
+```
+### Also installed (not in workflow)
+- `/skill-name` — [description from SKILL.md frontmatter]
+- `/skill-name` — [description]
+```
+
+Only show this block if at least one uncataloged skill exists. Limit it to the 10
+most relevant for the user's current phase (e.g., surface QA skills during
+production, team skills during production/polish).
+
+---
+
## Step 2: Determine Current Phase
Check in this order:
diff --git a/.claude/skills/hotfix/SKILL.md b/.claude/skills/hotfix/SKILL.md
index 60a7314..2efd09d 100644
--- a/.claude/skills/hotfix/SKILL.md
+++ b/.claude/skills/hotfix/SKILL.md
@@ -84,27 +84,71 @@ Use the Task tool to request sign-off in parallel:
- `subagent_type: qa-tester` — Run targeted regression tests on the affected system
- `subagent_type: producer` — Approve deployment timing and communication plan
+All three must return APPROVE before proceeding. If any returns CONCERNS or REJECT, do not deploy — surface the issue and resolve it first.
+
---
-## Phase 6: Summary
+## Phase 5b: QA Re-Entry Gate
-Output a summary with: severity, root cause, fix applied, testing status, and what approvals are still needed before deployment.
+After approvals, determine the QA scope required before deploying the hotfix. Spawn `qa-lead` via Task with:
+- The hotfix description and affected system
+- The regression test results from Phase 5
+- A list of all systems that touch the changed files (use Grep to find callers)
+
+Ask qa-lead: **Is a full smoke check sufficient, does this fix require a targeted team-qa pass, or does it need a full QA sprint?**
+
+Apply the verdict:
+- **Smoke check sufficient** — run `/smoke-check` against the hotfix build. If PASS, proceed to Phase 6.
+- **Targeted QA pass required** — run `/team-qa [affected-system]` scoped to the changed system only. If QA returns APPROVED or APPROVED WITH CONDITIONS, proceed to Phase 6.
+- **Full QA required** — S1 fixes that touch core systems may require a full `/team-qa sprint`. This delays deployment but prevents a bad patch.
+
+Do not skip this gate. A hotfix that breaks something else is worse than the original bug.
+
+---
+
+## Phase 6: Update Bug Status and Deploy
+
+Update the original bug file if one exists:
+
+```markdown
+## Fix Record
+**Fixed in**: hotfix/[branch-name] — [commit hash or description]
+**Fixed date**: [date]
+**Status**: Fixed — Pending Verification
+```
+
+Set `**Status**: Fixed — Pending Verification` in the bug file header.
+
+Output a deployment summary:
+
+```
+## Hotfix Ready to Deploy: [short-name]
+
+**Severity**: [S1/S2]
+**Root cause**: [one line]
+**Fix**: [one line]
+**QA gate**: [Smoke check PASS / Team-QA APPROVED]
+**Approvals**: lead-programmer ✓ / qa-tester ✓ / producer ✓
+**Rollback plan**: [from Phase 2 record]
+
+Merge to: release branch AND development branch
+Next: /bug-report verify [BUG-ID] after deploy to confirm resolution
+```
### Rules
-- Hotfixes must be the MINIMUM change to fix the issue — no cleanup, no refactoring, no "while we're here" changes
+- Hotfixes must be the MINIMUM change to fix the issue — no cleanup, no refactoring
- Every hotfix must have a rollback plan documented before deployment
- Hotfix branches merge to BOTH the release branch AND the development branch
- All hotfixes require a post-incident review within 48 hours
-- If the fix is complex enough to need more than 4 hours, escalate to technical-director for a scope decision
+- If the fix is complex enough to need more than 4 hours, escalate to `technical-director`
---
-## Phase 7: Next Steps
+## Phase 7: Post-Deploy Verification
-Verdict: **COMPLETE** — hotfix applied and backported.
+After deploying, run `/bug-report verify [BUG-ID]` to confirm the fix resolved the issue in the deployed build.
-After the fix is approved and merged:
+If VERIFIED FIXED: run `/bug-report close [BUG-ID]` to formally close it.
+If STILL PRESENT: the hotfix failed — immediately re-open, assess rollback, and escalate.
-- Run `/smoke-check` to verify critical paths are intact.
-- Run `/code-review` on the hotfix diff before merging to main.
-- Schedule a post-incident review within 48 hours.
+Schedule a post-incident review within 48 hours using `/retrospective hotfix`.
diff --git a/.claude/skills/localize/SKILL.md b/.claude/skills/localize/SKILL.md
index 8ae155a..8b241bf 100644
--- a/.claude/skills/localize/SKILL.md
+++ b/.claude/skills/localize/SKILL.md
@@ -1,20 +1,31 @@
---
name: localize
-description: "Run the localization workflow: extract strings, validate localization readiness, check for hardcoded text, and generate translation-ready string tables."
-argument-hint: "[scan|extract|validate|status]"
+description: "Full localization pipeline: scan for hardcoded strings, extract and manage string tables, validate translations, generate translator briefings, run cultural/sensitivity review, manage VO localization, test RTL/platform requirements, enforce string freeze, and report coverage."
+argument-hint: "[scan|extract|validate|status|brief|cultural-review|vo-pipeline|rtl-check|freeze|qa]"
user-invocable: true
agent: localization-lead
-allowed-tools: Read, Glob, Grep, Write, Bash
+allowed-tools: Read, Glob, Grep, Write, Bash, Task, AskUserQuestion
---
-## Phase 1: Parse Subcommand
+# Localization Pipeline
-Determine the mode from the argument:
+Localization is not just translation — it is the full process of making a game
+feel native in every language and region. Poor localization breaks immersion,
+confuses players, and blocks platform certification. This skill covers the
+complete pipeline from string extraction through cultural review, VO recording,
+RTL layout testing, and localization QA sign-off.
-- `scan` — Scan for localization issues (hardcoded strings, missing keys)
-- `extract` — Extract new strings and generate/update string tables
-- `validate` — Validate existing translations for completeness and format
-- `status` — Report overall localization status
+**Modes:**
+- `scan` — Find hardcoded strings and localization anti-patterns (read-only)
+- `extract` — Extract strings and generate translation-ready tables
+- `validate` — Check translations for completeness, placeholders, and length
+- `status` — Coverage matrix across all locales
+- `brief` — Generate translator context briefing document for an external team
+- `cultural-review` — Flag culturally sensitive content, symbols, colours, idioms
+- `vo-pipeline` — Manage voice-over localization: scripts, recording specs, integration
+- `rtl-check` — Validate RTL language layout, mirroring, and font support
+- `freeze` — Enforce string freeze; lock source strings before translation begins
+- `qa` — Run the full localization QA cycle before release
If no subcommand is provided, output usage and stop. Verdict: **FAIL** — missing required subcommand.
@@ -24,16 +35,19 @@ If no subcommand is provided, output usage and stop. Verdict: **FAIL** — missi
Search `src/` for hardcoded user-facing strings:
-- String literals in UI code not wrapped in a localization function
+- String literals in UI code not wrapped in a localization function (`tr()`, `Tr()`, `NSLocalizedString`, `GetText`, etc.)
- Concatenated strings that should be parameterized
- Strings with positional placeholders (`%s`, `%d`) instead of named ones (`{playerName}`)
+- Format strings that embed locale-sensitive data (numbers, dates, currencies) without locale-aware formatting
Search for localization anti-patterns:
- Date/time formatting not using locale-aware functions
-- Number formatting without locale awareness
-- Text embedded in images or textures (flag asset files)
-- Strings that assume left-to-right text direction
+- Number formatting without locale awareness (`1,000` vs `1.000`)
+- Text embedded in images or textures (flag asset files in `assets/`)
+- Strings that assume left-to-right text direction (positional layout, string assembly order)
+- Gender/plurality assumptions baked into string logic (must use plural forms or gender tokens)
+- Hardcoded punctuation (e.g. `"You won!"` — exclamation styles vary by locale)
Report all findings with file paths and line numbers. This mode is read-only — no files are written.
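+
+As an illustration of what scan flags, here is a hypothetical before/after in
+GDScript-style pseudocode (the `tr()` call and key name are assumptions; match
+whatever localization function the project actually uses):
+
+```
+# Flagged: hardcoded, concatenated, positional word order baked in
+label.text = "Health: " + str(hp)
+
+# Passes: keyed string, named placeholder, translator controls word order
+label.text = tr("ui.hud.health_label").format({"hp": hp})
+```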
@@ -42,40 +56,50 @@ Report all findings with file paths and line numbers. This mode is read-only —
## Phase 2B: Extract Mode
- Scan all source files for localized string references
-- Compare against the existing string table (if any) in `assets/data/`
-- Generate new entries for strings that don't have keys yet
+- Compare against the existing string table in `assets/data/strings/`
+- Generate new entries for strings not yet keyed
- Suggest key names following the convention: `[category].[subcategory].[description]`
-- Output a diff of new strings to add to the string table
+ - Example: `ui.hud.health_label`, `dialogue.npc.merchant.greeting`, `menu.main.play_button`
+- Each new entry must include a `context` field — a translator comment explaining:
+ - Where it appears (which screen, which scene)
+ - Maximum character length
+ - Any placeholder meaning (`{playerName}` = the player's chosen display name)
+ - Gender/plurality context if applicable
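+
+A sketch of what one extracted entry might look like (the exact schema is an
+assumption; match the project's existing `strings-en.json` format):
+
+```
+"ui.hud.health_label": {
+  "text": "{hp} HP",
+  "context": "HUD, top-left health readout. Max 12 chars. {hp} = current health integer."
+}
+```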
-Present the diff to the user. Ask: "May I write these new entries to `assets/data/strings/strings-[locale].json`?"
+Output a diff of new strings to add to the string table.
+
+Present the diff to the user. Ask: "May I write these new entries to `assets/data/strings/strings-en.json`?"
If yes, write only the diff (new entries), not a full replacement. Verdict: **COMPLETE** — strings extracted and written.
If no, stop here. Verdict: **BLOCKED** — user declined write.

---
## Phase 2C: Validate Mode
-- Read all string table files in `assets/data/`
-- Check each entry for:
- - Missing translations (key exists but no translation for a locale)
- - Placeholder mismatches (source has `{name}` but translation is missing it)
- - String length violations (exceeds character limits for UI elements)
- - Orphaned keys (translation exists but nothing references the key in code)
-- Report validation results grouped by locale and severity. This mode is read-only — no files are written.
+Read all string table files in `assets/data/strings/`. For each locale, check:
+
+- **Completeness** — key exists in source (en) but no translation for this locale
+- **Placeholder mismatches** — source has `{name}` but translation omits it or adds extras
+- **String length violations** — translation exceeds the character limit recorded in the source `context` field
+- **Plural form count** — locale requires N plural forms; translation provides fewer
+- **Orphaned keys** — translation exists but nothing in `src/` references the key
+- **Stale translations** — source string changed after translation was written (flag for re-translation)
+- **Encoding** — non-ASCII characters render correctly and the font atlas includes their glyphs (flag if uncertain)
+
+Report validation results grouped by locale and severity. This mode is read-only — no files are written.
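+
+For example, a placeholder mismatch and a length violation might read
+(illustrative keys and locales):
+
+```
+[fr] ui.hud.health_label — MISMATCH: source has {hp}, translation "PV" omits it
+[de] menu.main.play_button — LENGTH: 13 chars vs limit 10 ("Spiel starten")
+```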
---
## Phase 2D: Status Mode
-- Count total localizable strings
-- Per locale: count translated, untranslated, and stale (source changed since translation)
+- Count total localizable strings in the source table
+- Per locale: count translated, untranslated, stale (source changed since translation)
- Generate a coverage matrix:
```markdown
## Localization Status
Generated: [Date]
+String freeze: [Active / Not yet called / Lifted]
| Locale | Total | Translated | Missing | Stale | Coverage |
|--------|-------|-----------|---------|-------|----------|
@@ -83,25 +107,334 @@ Generated: [Date]
| [locale] | [N] | [N] | [N] | [N] | [X]% |
### Issues
-- [N] hardcoded strings found in source code
+- [N] hardcoded strings found in source code (run /localize scan)
- [N] strings exceeding character limits
- [N] placeholder mismatches
-- [N] orphaned keys (can be cleaned up)
+- [N] orphaned keys
+- [N] strings added after freeze was called (freeze violations)
```
This mode is read-only — no files are written.
---
-## Phase 3: Next Steps
+## Phase 2E: Brief Mode
-- If scan found hardcoded strings: run `/localize extract` to begin extracting them.
-- If validate found missing translations: share the report with the translation team.
-- If approaching launch: run `/asset-audit` to verify all localized assets are present.
+Generate a translator context briefing document. This document is sent to the
+external translation team or localisation vendor alongside the string table export.
+
+Read:
+- `design/gdd/` — extract game genre, tone, setting, character names
+- `assets/data/strings/strings-en.json` — the source string table
+- Any existing lore or narrative documents in `design/narrative/`
+
+Generate `production/localization/translator-brief-[locale]-[date].md`:
+
+```markdown
+# Translator Brief — [Game Name] — [Locale]
+
+## Game Overview
+[2-3 paragraph summary of the game, genre, tone, and audience]
+
+## Tone and Voice
+- **Overall tone**: [e.g., "Darkly comic, not slapstick — think Terry Pratchett, not Looney Tunes"]
+- **Player address**: [e.g., "Second person, informal. Never formal 'vous' — always 'tu' for French"]
+- **Profanity policy**: [e.g., "Mild — PG-13 equivalent. Match intensity to source, do not soften or escalate"]
+- **Humour**: [e.g., "Wordplay exists — if a pun cannot translate, invent an equivalent local joke; do not translate literally"]
+
+## Character Glossary
+| Name | Role | Personality | Notes |
+|------|------|-------------|-------|
+| [Name] | [Role] | [Personality] | [Do not translate / transliterate as X] |
+
+## World Glossary
+| Term | Meaning | Notes |
+|------|---------|-------|
+| [Term] | [What it means] | [Keep in English / translate as X] |
+
+## Do Not Translate List
+The following must appear verbatim in all locales:
+- [Game name]
+- [UI terms that match in-engine labels]
+- [Brand or trademark names]
+
+## Placeholder Reference
+| Placeholder | What it represents | Example |
+|-------------|-------------------|---------|
+| `{playerName}` | Player's chosen display name | "Shadowblade" |
+| `{count}` | Integer quantity | "3" |
+
+## Character Limits
+Tight UI fields with hard limits are marked in the string table `context` field.
+Where no limit is stated, keep translations within ±30% of the English length as a guideline.
+
+## Contact
+Direct questions to: [placeholder for user/team contact]
+Delivery format: JSON, same schema as strings-en.json
+```
+
+Ask: "May I write this translator brief to `production/localization/translator-brief-[locale]-[date].md`?"
+
+---
+
+## Phase 2F: Cultural Review Mode
+
+Spawn `localization-lead` via Task. Ask them to audit the following for cultural sensitivity across the target locales (read from `assets/data/strings/` and `assets/`):
+
+### Content Areas to Review
+
+**Symbols and gestures**
+- Thumbs up, OK hand, peace sign — meanings vary by region
+- Religious or spiritual symbols in art, UI, or audio
+- National flags, map representations, disputed territories
+
+**Colours**
+- White (mourning in some Asian cultures), green (political associations in some regions), red (luck vs danger)
+- Alert/warning colours that conflict with cultural associations
+
+**Numbers**
+- 4 (homophone for "death" in Japanese and Chinese), 13, 666 — flag prominent use in UI (room numbers, item counts, prices)
+
+**Humour and idioms**
+- Idioms that translate as offensive in other locales
+- Toilet/bodily humour that is inappropriate in some markets (notably Japan, Germany, and the Middle East)
+- Dark humour around topics that are culturally sensitive in specific regions
+
+**Violence and content ratings**
+- Content that would require ratings changes in DE (Germany), AU (Australia), CN (China), or AE (UAE)
+- Blood colour, gore level, drug references — flag all for region-specific asset variants if needed
+
+**Names and representations**
+- Character names that are offensive, profane, or carry negative meaning in target locales
+- Stereotyped representation of nationalities, religions, or ethnic groups
+
+Present findings as a table:
+
+| Finding | Locale(s) Affected | Severity | Recommended Action |
+|---------|--------------------|----------|--------------------|
+| [Description] | [Locale] | [BLOCKING / ADVISORY / NOTE] | [Change / Flag for review / Accept] |
+
+BLOCKING = must fix before shipping that locale. ADVISORY = recommend change. NOTE = informational only.
+
+Ask: "May I write this cultural review report to `production/localization/cultural-review-[date].md`?"
+
+---
+
+## Phase 2G: VO Pipeline Mode
+
+Manage the voice-over localization process. Determine the sub-task from the argument:
+
+- `vo-pipeline scan` — identify all dialogue lines that require VO recording
+- `vo-pipeline script` — generate recording scripts with director notes
+- `vo-pipeline validate` — check that all recorded VO files are present and correctly named
+- `vo-pipeline integrate` — verify VO files are correctly referenced in code/assets
+
+### VO Pipeline: Scan
+
+Read `assets/data/strings/` and `design/narrative/`. Identify:
+- All dialogue lines (keys matching `dialogue.*`) with source text
+- Lines already recorded (audio file exists in `assets/audio/vo/`)
+- Lines not yet recorded
+
+Output a recording manifest:
+
+```
+## VO Recording Manifest — [Date]
+
+| Key | Character | Source Line | Status |
+|-----|-----------|-------------|--------|
+| dialogue.npc.merchant.greeting | Merchant | "Welcome, traveller." | Recorded |
+| dialogue.npc.merchant.haggle | Merchant | "That's my final offer." | Needs recording |
+```
+
+### VO Pipeline: Script
+
+Generate a recording script document for each character, grouped by scene. Include:
+
+- Character name and brief personality note
+- Full dialogue line with pronunciation guide for unusual proper nouns
+- Emotion/direction note for each line (`[Warm, welcoming]`, `[Annoyed, clipped]`)
+- Any lines that are responses in a conversation (provide context: "Player just said X")
+
+Ask: "May I write the VO recording scripts to `production/localization/vo-scripts-[locale]-[date].md`?"
+
+### VO Pipeline: Validate
+
+Glob `assets/audio/vo/[locale]/` for all `.wav`/`.ogg` files. Cross-reference against the VO manifest. Report:
+- Missing files (line in script, no audio file)
+- Extra files (audio file exists, no matching string key)
+- Naming convention violations
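+
+If the project has no documented naming convention, one common pattern (an
+assumption — confirm with the team before enforcing it) maps string keys
+directly to file names:
+
+```
+dialogue.npc.merchant.greeting → assets/audio/vo/fr/dialogue_npc_merchant_greeting.ogg
+```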
+
+### VO Pipeline: Integrate
+
+Grep `src/` for VO audio references. Verify each referenced path exists in `assets/audio/vo/[locale]/`. Report broken references.
+
+---
+
+## Phase 2H: RTL Check Mode
+
+Right-to-left languages (Arabic, Hebrew, Persian, Urdu) require layout mirroring beyond
+just translating text. This mode validates the implementation.
+
+Read `.claude/docs/technical-preferences.md` to determine the engine. Then check:
+
+**Layout mirroring**
+- Is RTL layout enabled in the engine? (Godot: `Control.layout_direction`; Unity: TextMesh Pro with an RTL plugin, since it has no built-in RTL shaping; Unreal: localization text flow direction settings)
+- Are all UI containers set to auto-mirror, or are positions hardcoded?
+- Do progress bars, health bars, and directional indicators mirror correctly?
+
+**Text rendering**
+- Are fonts loaded that support Arabic/Hebrew character sets?
+- Is Arabic text rendered with correct ligatures (connected script)?
+- Are numbers displayed as Eastern Arabic numerals where required?
+
+**String assembly**
+- Are there any string concatenations that assume left-to-right reading order?
+- Do `{placeholder}` positions in sentences work correctly when sentence structure is reversed?
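+
+A pseudocode illustration of the concatenation problem (function names are
+assumptions):
+
+```
+# Breaks in RTL: word order is fixed at assembly time
+prompt = tr("press") + " " + key_name + " " + tr("to_jump")
+
+# Safe: one keyed sentence, the translator places the token
+prompt = tr("ui.prompt.jump").format({"key": key_name})
+```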
+
+**Asset review**
+- Are there UI icons with directional arrows or asymmetric designs that need mirrored variants?
+- Do any text-in-image assets exist that require RTL versions?
+
+Grep patterns to check:
+- Engine-specific RTL flags in scene/prefab files
+- Any `HBoxContainer` (Godot), `HorizontalLayoutGroup` (Unity), or `HorizontalBox` (Unreal) nodes — verify their layout direction settings
+- String concatenation with `+` near dialogue or UI code
+
+Report findings. Flag BLOCKING issues (content unreadable without fix) vs ADVISORY (cosmetic improvements).
+
+Ask: "May I write this RTL check report to `production/localization/rtl-check-[date].md`?"
+
+---
+
+## Phase 2I: Freeze Mode
+
+String freeze locks the source (English) string table so that translations can proceed
+without the source changing under the translators.
+
+### freeze call
+
+Check current freeze status in `production/localization/freeze-status.md` (if it exists).
+
+If already frozen:
+> "String freeze is currently ACTIVE (called [date]). [N] strings have been added or modified since freeze. These are freeze violations — they require re-translation or an approved freeze lift."
+
+If not frozen, present the pre-freeze checklist:
+
+```
+Pre-Freeze Checklist
+[ ] All planned UI screens are implemented
+[ ] All dialogue lines are final (no further narrative revisions planned)
+[ ] All system strings (error messages, tutorial text) are complete
+[ ] /localize scan shows zero hardcoded strings
+[ ] /localize validate shows no placeholder mismatches in source (en)
+[ ] Marketing strings (store description, achievements) are final
+```
+
+Use `AskUserQuestion`:
+- Prompt: "Are all items above confirmed? Calling string freeze locks the source table."
+- Options: `[A] Yes — call string freeze now` / `[B] No — I still have strings to add`
+
+If [A]: Write `production/localization/freeze-status.md`:
+
+```markdown
+# String Freeze Status
+
+**Status**: ACTIVE
+**Called**: [date]
+**Called by**: [user]
+**Total strings at freeze**: [N]
+
+## Post-Freeze Changes
+[Any strings added or modified after freeze are listed here automatically by /localize extract]
+```
+
+### freeze lift
+
+If argument includes `lift`: update `freeze-status.md` Status to `LIFTED`, record the reason and date. Warn: "Lifting the freeze requires re-translation of all modified strings. Notify the translation team."
+
+### freeze check (auto-integrated into extract)
+
+When `extract` mode finds new or modified strings and `freeze-status.md` shows Status: ACTIVE — append the new keys to `## Post-Freeze Changes` and warn:
+> "⚠️ String freeze is active. [N] new/modified strings have been added. These are freeze violations. Notify your localization vendor before proceeding."
+
+---
+
+## Phase 2J: QA Mode
+
+Localization QA is a dedicated pass that runs after translations are delivered but
+before any locale ships. This is not the same as `/localize validate` (which checks
+completeness): this is a structured, playthrough-based quality check.
+
+Spawn `localization-lead` via Task with:
+- The target locale(s) to QA
+- The list of all screens/flows in the game (from `design/gdd/` or `/content-audit` output)
+- The current `/localize validate` report
+- The cultural review report (if it exists)
+
+Ask the localization-lead to produce a QA plan covering:
+
+1. **Functional string check** — every string displays in-game without truncation, placeholder errors, or encoding corruption
+2. **UI overflow check** — translated strings that exceed UI bounds (even if within character limits, some languages expand)
+3. **Contextual accuracy** — a sample of 10% of strings reviewed in-game for translation accuracy and natural phrasing
+4. **Cultural review items** — verify all BLOCKING items from the cultural review are resolved
+5. **VO sync check** — if VO exists, verify lip sync or subtitle timing is acceptable after translation
+6. **Platform cert requirements** — check platform-specific localization requirements (age ratings text, legal notices, ESRB/PEGI/CERO text)
+
+Output a QA verdict per locale:
+
+```
+## Localization QA Verdict — [Locale]
+
+**Status**: PASS / PASS WITH CONDITIONS / FAIL
+**Reviewed by**: localization-lead
+**Date**: [date]
+
+### Findings
+| ID | Area | Description | Severity | Status |
+|----|------|-------------|----------|--------|
+| LOC-001 | UI Overflow | "Settings" button text overflows on [Screen] | BLOCKING | Open |
+| LOC-002 | Translation | [Key] translation is literal — sounds unnatural | ADVISORY | Open |
+
+### Conditions (if PASS WITH CONDITIONS)
+- [Condition 1 — must resolve before ship]
+
+### Sign-Off
+[ ] All BLOCKING findings resolved
+[ ] Producer approves shipping [Locale]
+```
+
+Ask: "May I write this localization QA report to `production/localization/loc-qa-[locale]-[date].md`?"
+
+**Gate integration**: The Polish → Release gate requires a PASS or PASS WITH CONDITIONS verdict for every locale being shipped. A FAIL blocks release for that locale only — other locales may still proceed if their QA passes.
+
+---
+
+## Phase 3: Rules and Next Steps
### Rules
- English (en) is always the source locale
-- Every string table entry must include a translator comment explaining context
+- Every string table entry must include a `context` field with translator notes, character limits, and placeholder meaning
- Never modify translation files directly — generate diffs for review
-- Character limits must be defined per-UI-element and enforced automatically
-- Right-to-left (RTL) language support should be considered from the start, not bolted on later
+- Character limits must be defined per-UI-element and enforced in validate mode
+- String freeze must be called before sending strings to translators — never translate a moving target
+- RTL support must be designed in from the start — retrofitting RTL layout is expensive
+- Cultural review is required for any locale where the game will be sold commercially
+- VO scripts must include director notes — raw dialogue lines produce flat recordings
+
+### Recommended Workflow
+
+```
+/localize scan → find hardcoded strings
+/localize extract → build string table
+/localize freeze → lock source before sending to translators
+/localize brief → generate translator briefing document
+[Send to translators]
+/localize validate → check returned translations
+/localize cultural-review → flag culturally sensitive content
+/localize rtl-check → if shipping Arabic / Hebrew / Persian
+/localize vo-pipeline → if shipping dubbed VO
+/localize qa → full localization QA pass
+```
+
+After `qa` returns PASS for all shipping locales, include the QA report path when running `/gate-check release`.
diff --git a/.claude/skills/map-systems/SKILL.md b/.claude/skills/map-systems/SKILL.md
index 05c20a3..aa0e484 100644
--- a/.claude/skills/map-systems/SKILL.md
+++ b/.claude/skills/map-systems/SKILL.md
@@ -3,12 +3,12 @@ name: map-systems
description: "Decompose a game concept into individual systems, map dependencies, prioritize design order, and create the systems index."
argument-hint: "[next | system-name] [--review full|lean|solo]"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Write, Edit, AskUserQuestion, TodoWrite
+allowed-tools: Read, Glob, Grep, Write, Edit, AskUserQuestion, TodoWrite, Task
---
When this skill is invoked:
-## 1. Parse Arguments
+## Parse Arguments
Two modes:
@@ -17,12 +17,16 @@ Two modes:
- **`next`**: `/map-systems next` — Pick the highest-priority undesigned system
from the index and hand off to `/design-system` (Phase 6).
-Also extract `--review [full|lean|solo]` if present and store as the review mode
-override for this run (see `.claude/docs/director-gates.md`).
+Also resolve the review mode (once, store for all gate spawns this run):
+1. If `--review [full|lean|solo]` was passed → use that
+2. Else read `production/review-mode.txt` → use that value
+3. Else → default to `lean`
+
+See `.claude/docs/director-gates.md` for the full check pattern.
---
-## 2. Phase 1: Read Concept (Required Context)
+## Phase 1: Read Concept (Required Context)
Read the game concept and any existing design work. This provides the raw material
for systems decomposition.
@@ -48,7 +52,7 @@ for systems decomposition.
---
-## 3. Phase 2: Systems Enumeration (Collaborative)
+## Phase 2: Systems Enumeration (Collaborative)
Extract and identify all systems the game needs. This is the creative core of the
skill — it requires human judgment because concept docs rarely enumerate every
@@ -101,7 +105,7 @@ Iterate until the user approves the enumeration.
---
-## 4. Phase 3: Dependency Mapping (Collaborative)
+## Phase 3: Dependency Mapping (Collaborative)
For each system, determine what it depends on. A system "depends on" another if
it cannot function without that other system existing first.
@@ -140,6 +144,11 @@ Show the dependency map as a layered list. Highlight:
Use `AskUserQuestion` to ask: "Does this dependency ordering look right? Any
dependencies I'm missing or that should be removed?"
+**Review mode check** — apply before spawning TD-SYSTEM-BOUNDARY:
+- `solo` → skip. Note: "TD-SYSTEM-BOUNDARY skipped — Solo mode." Proceed to priority assignment.
+- `lean` → skip (not a PHASE-GATE). Note: "TD-SYSTEM-BOUNDARY skipped — Lean mode." Proceed to priority assignment.
+- `full` → spawn as normal.
+
**After dependency mapping is approved, spawn `technical-director` via Task using gate TD-SYSTEM-BOUNDARY (`.claude/docs/director-gates.md`) before proceeding to priority assignment.**
Pass: the dependency map summary, layer assignments, bottleneck systems list, any circular dependency resolutions.
@@ -148,7 +157,7 @@ Present the assessment. If REJECT, revise the system boundaries with the user be
---
-## 5. Phase 4: Priority Assignment (Collaborative)
+## Phase 4: Priority Assignment (Collaborative)
Assign each system to a priority tier based on what milestone it's needed for.
@@ -172,6 +181,18 @@ Which systems should be higher or lower priority?"
Explain reasoning in conversation: "I placed [system] in MVP because the core loop
requires it — without [system], the 30-second loop can't function."
+**"Why" column guidance**: When explaining why each system was placed in a priority tier, mix technical necessity with player-experience reasoning. Do not use purely technical justifications like "Combat needs damage math" — connect to player experience where relevant. Examples of good "Why" entries:
+- "Required for the core loop — without it, placement decisions have no consequence (Pillar 2: Placement is the Puzzle)"
+- "Ballista's punch-through identity is established here — this stat definition is what makes it feel different from Archer"
+- "Foundation for all economy decisions — players must understand upgrade costs to make meaningful placement choices"
+
+Pure technical necessity ("X depends on Y") is not sufficient on its own when the system directly shapes player experience.
+
+**Review mode check** — apply before spawning PR-SCOPE:
+- `solo` → skip. Note: "PR-SCOPE skipped — Solo mode." Proceed to writing the systems index.
+- `lean` → skip (not a PHASE-GATE). Note: "PR-SCOPE skipped — Lean mode." Proceed to writing the systems index.
+- `full` → spawn as normal.
+
**After priorities are approved, spawn `producer` via Task using gate PR-SCOPE (`.claude/docs/director-gates.md`) before writing the index.**
Pass: total system count per milestone tier, estimated implementation volume per tier (system count × average complexity), team size, stated project timeline.
@@ -191,7 +212,7 @@ This is the order the team should write GDDs in.
---
-## 6. Phase 5: Create Systems Index (Write)
+## Phase 5: Create Systems Index (Write)
### Step 5a: Draft the Document
@@ -215,6 +236,11 @@ Ask: "May I write the systems index to `design/gdd/systems-index.md`?"
Wait for approval. Write the file only after "yes."
+**Review mode check** — apply before spawning CD-SYSTEMS:
+- `solo` → skip. Note: "CD-SYSTEMS skipped — Solo mode." Proceed to Phase 7 next steps.
+- `lean` → skip (not a PHASE-GATE). Note: "CD-SYSTEMS skipped — Lean mode." Proceed to Phase 7 next steps.
+- `full` → spawn as normal.
+
**After the systems index is written, spawn `creative-director` via Task using gate CD-SYSTEMS (`.claude/docs/director-gates.md`).**
Pass: systems index path, game pillars and core fantasy (from `design/gdd/game-concept.md`), MVP priority tier system list.
@@ -234,7 +260,7 @@ If the user declined: **Verdict: BLOCKED** — user did not approve the write.
---
-## 7. Phase 6: Design Individual Systems (Handoff to /design-system)
+## Phase 6: Design Individual Systems (Handoff to /design-system)
This phase is entered when:
- The user says "yes" to designing systems after creating the index
@@ -280,16 +306,20 @@ If continuing, return to Step 6a.
---
-## 8. Phase 7: Suggest Next Steps
+## Phase 7: Suggest Next Steps
-After the systems index is created (or after designing some systems), suggest
-the appropriate next actions:
+After the systems index is created (or after designing some systems), present next actions using `AskUserQuestion`:
-- "Run `/design-system [system-name]` to write the next system's GDD"
-- "Run `/design-review [path]` on each completed GDD to validate quality"
-- "Run `/gate-check pre-production` to check if you're ready to start building"
-- "Prototype the highest-risk system with `/prototype [system]`"
-- "Plan the first implementation sprint with `/sprint-plan new`"
+- "Systems index is written. What would you like to do next?"
+ - [A] Start designing GDDs — run `/design-system [first-system-in-order]`
+  - [B] Ask a director to review the index first — have `creative-director` or `technical-director` validate the system set before committing to 10+ GDD sessions
+ - [C] Stop here for this session
+
+**The director review option ([B]) is worth highlighting**: having a Creative Director or Technical Director review the completed systems index before starting GDD authoring catches scope issues, missing systems, and boundary problems before they're locked in across many documents. It is optional but recommended for new projects.
+
+After any individual GDD is completed:
+- "Run `/design-review design/gdd/[system].md` in a fresh session to validate quality"
+- "Run `/gate-check systems-design` when all MVP GDDs are complete"
---
@@ -314,3 +344,11 @@ This skill follows the collaborative design principle at every phase:
**Never** auto-generate the full systems list and write it without review.
**Never** start designing a system without user confirmation.
**Always** show the enumeration, dependencies, and priorities for user validation.
+
+## Context Window Awareness
+
+If context reaches or exceeds 70% at any point, append this notice:
+
+> **Context is approaching the limit (≥70%).** The systems index is saved to
+> `design/gdd/systems-index.md`. Open a fresh Claude Code session to continue
+> designing individual GDDs — run `/map-systems next` to pick up where you left off.
diff --git a/.claude/skills/milestone-review/SKILL.md b/.claude/skills/milestone-review/SKILL.md
index 2e6a052..06a9191 100644
--- a/.claude/skills/milestone-review/SKILL.md
+++ b/.claude/skills/milestone-review/SKILL.md
@@ -3,13 +3,17 @@ name: milestone-review
description: "Generates a comprehensive milestone progress review including feature completeness, quality metrics, risk assessment, and go/no-go recommendation. Use at milestone checkpoints or when evaluating readiness for a milestone deadline."
argument-hint: "[milestone-name|current] [--review full|lean|solo]"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Write
+allowed-tools: Read, Glob, Grep, Write, Task, AskUserQuestion
---
## Phase 0: Parse Arguments
-Extract the milestone name (`current` or a specific name) and any `--review [full|lean|solo]`
-flag. Store the review mode as the override for this run (see `.claude/docs/director-gates.md`).
+Extract the milestone name (`current` or a specific name) and resolve the review mode (once, store for all gate spawns this run):
+1. If `--review [full|lean|solo]` was passed → use that
+2. Else read `production/review-mode.txt` → use that value
+3. Else → default to `lean`
+
+See `.claude/docs/director-gates.md` for the full check pattern.
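The three-step precedence above can be sketched as code. This is a minimal illustration, assuming `production/review-mode.txt` holds a single bare mode word:

```python
from pathlib import Path

VALID_MODES = {"full", "lean", "solo"}

def resolve_review_mode(cli_override=None,
                        mode_file="production/review-mode.txt"):
    """Resolve once per run: --review flag > mode file > 'lean' default."""
    # 1. An explicit --review flag wins outright.
    if cli_override in VALID_MODES:
        return cli_override
    # 2. Otherwise fall back to the project-wide setting, if present.
    path = Path(mode_file)
    if path.is_file():
        value = path.read_text().strip().lower()
        if value in VALID_MODES:
            return value
    # 3. Default when nothing else applies.
    return "lean"
```

The resolved value is stored once and reused for every gate spawn in the run — it is never re-derived mid-skill.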
---
@@ -104,6 +108,11 @@ Read all sprint reports for sprints within this milestone from `production/sprin
## Phase 3b: Producer Risk Assessment
+**Review mode check** — apply before spawning PR-MILESTONE:
+- `solo` → skip. Note: "PR-MILESTONE skipped — Solo mode." Present the Go/No-Go section without a producer verdict.
+- `lean` → skip (not a PHASE-GATE). Note: "PR-MILESTONE skipped — Lean mode." Present the Go/No-Go section without a producer verdict.
+- `full` → spawn as normal.
+
Before generating the Go/No-Go recommendation, spawn `producer` via Task using gate **PR-MILESTONE** (`.claude/docs/director-gates.md`).
Pass: milestone name and target date, current completion percentage, blocked story count, velocity data from sprint reports (if available), list of cut candidates.
diff --git a/.claude/skills/playtest-report/SKILL.md b/.claude/skills/playtest-report/SKILL.md
index a216a58..33f981a 100644
--- a/.claude/skills/playtest-report/SKILL.md
+++ b/.claude/skills/playtest-report/SKILL.md
@@ -3,13 +3,17 @@ name: playtest-report
description: "Generates a structured playtest report template or analyzes existing playtest notes into a structured format. Use this to standardize playtest feedback collection and analysis."
argument-hint: "[new|analyze path-to-notes] [--review full|lean|solo]"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Write
+allowed-tools: Read, Glob, Grep, Write, Task, AskUserQuestion
---
## Phase 1: Parse Arguments
-Extract `--review [full|lean|solo]` if present and store as the review mode
-override for this run (see `.claude/docs/director-gates.md`).
+Resolve the review mode (once, store for all gate spawns this run):
+1. If `--review [full|lean|solo]` was passed → use that
+2. Else read `production/review-mode.txt` → use that value
+3. Else → default to `lean`
+
+See `.claude/docs/director-gates.md` for the full check pattern.
Determine the mode:
@@ -112,6 +116,11 @@ Present the categorized list, then route:
## Phase 3b: Creative Director Player Experience Review
+**Review mode check** — apply before spawning CD-PLAYTEST:
+- `solo` → skip. Note: "CD-PLAYTEST skipped — Solo mode." Proceed to Phase 4 (save the report).
+- `lean` → skip (not a PHASE-GATE). Note: "CD-PLAYTEST skipped — Lean mode." Proceed to Phase 4 (save the report).
+- `full` → spawn as normal.
+
After categorising findings, spawn `creative-director` via Task using gate **CD-PLAYTEST** (`.claude/docs/director-gates.md`).
Pass: the structured report content, game pillars and core fantasy (from `design/gdd/game-concept.md`), the specific hypothesis being tested.
diff --git a/.claude/skills/project-stage-detect/SKILL.md b/.claude/skills/project-stage-detect/SKILL.md
index 3a1ad00..148abaa 100644
--- a/.claude/skills/project-stage-detect/SKILL.md
+++ b/.claude/skills/project-stage-detect/SKILL.md
@@ -4,7 +4,6 @@ description: "Automatically analyze project state, detect stage, identify gaps,
argument-hint: "[optional: role filter like 'programmer' or 'designer']"
user-invocable: true
allowed-tools: Read, Glob, Grep, Bash, Write
-context: fork
model: haiku
# Read-only diagnostic skill — no specialist agent delegation needed
---
diff --git a/.claude/skills/propagate-design-change/SKILL.md b/.claude/skills/propagate-design-change/SKILL.md
index 8ce05b1..5c10b38 100644
--- a/.claude/skills/propagate-design-change/SKILL.md
+++ b/.claude/skills/propagate-design-change/SKILL.md
@@ -4,7 +4,6 @@ description: "When a GDD is revised, scans all ADRs and the traceability index t
argument-hint: "[path/to/changed-gdd.md]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Bash
-context: fork
agent: technical-director
---
diff --git a/.claude/skills/prototype/SKILL.md b/.claude/skills/prototype/SKILL.md
index c739b5c..f3e6e28 100644
--- a/.claude/skills/prototype/SKILL.md
+++ b/.claude/skills/prototype/SKILL.md
@@ -4,15 +4,18 @@ description: "Rapid prototyping workflow. Skips normal standards to quickly vali
argument-hint: "[concept-description] [--review full|lean|solo]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit, Bash, Task
-context: fork
agent: prototyper
isolation: worktree
---
## Phase 1: Define the Question
-Extract `--review [full|lean|solo]` if present and store as the review mode
-override for this run (see `.claude/docs/director-gates.md`).
+Resolve the review mode (once, store for all gate spawns this run):
+1. If `--review [full|lean|solo]` was passed → use that
+2. Else read `production/review-mode.txt` → use that value
+3. Else → default to `lean`
+
+See `.claude/docs/director-gates.md` for the full check pattern.
Read the concept description from the argument. Identify the core question this prototype must answer. If the concept is vague, state the question explicitly before proceeding — a prototype without a clear question wastes time.
@@ -113,6 +116,11 @@ If yes, write the file.
## Phase 6: Creative Director Review
+**Review mode check** — apply before spawning CD-PLAYTEST:
+- `solo` → skip. Note: "CD-PLAYTEST skipped — Solo mode." Proceed to Phase 7 summary with the prototyper's recommendation as the final verdict.
+- `lean` → skip (not a PHASE-GATE). Note: "CD-PLAYTEST skipped — Lean mode." Proceed to Phase 7 summary with the prototyper's recommendation as the final verdict.
+- `full` → spawn as normal.
+
Spawn `creative-director` via Task using gate **CD-PLAYTEST** (`.claude/docs/director-gates.md`).
Pass: the full REPORT.md content, the original design question, game pillars and core fantasy from `design/gdd/game-concept.md` (if it exists).
diff --git a/.claude/skills/qa-plan/SKILL.md b/.claude/skills/qa-plan/SKILL.md
index 8a119a3..054edf3 100644
--- a/.claude/skills/qa-plan/SKILL.md
+++ b/.claude/skills/qa-plan/SKILL.md
@@ -3,8 +3,7 @@ name: qa-plan
description: "Generate a QA test plan for a sprint or feature. Reads GDDs and story files, classifies stories by test type (Logic/Integration/Visual/UI), and produces a structured test plan covering automated tests required, manual test cases, smoke test scope, and playtest sign-off requirements. Run before sprint begins or when starting a major feature."
argument-hint: "[sprint | feature: system-name | story: path]"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Write
-context: fork
+allowed-tools: Read, Glob, Grep, Write, AskUserQuestion
agent: qa-lead
---
diff --git a/.claude/skills/quick-design/SKILL.md b/.claude/skills/quick-design/SKILL.md
index 4bd9ab5..6979140 100644
--- a/.claude/skills/quick-design/SKILL.md
+++ b/.claude/skills/quick-design/SKILL.md
@@ -4,7 +4,6 @@ description: "Lightweight design spec for small changes — tuning adjustments,
argument-hint: "[brief description of the change]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit
-context: fork
---
# Quick Design
@@ -55,8 +54,10 @@ Before drafting anything, read the relevant context:
- Search `design/gdd/` for the GDD most relevant to this change. Read the
sections that this change would affect.
-- Read `design/gdd/systems-index.md` to understand where this system sits in
- the dependency graph and what tier it belongs to.
+- Check whether `design/gdd/systems-index.md` exists. If it does, read it to
+ understand where this system sits in the dependency graph and what tier it
+ belongs to. If it does not exist, note "No systems index found — skipping
+ dependency tier check." and continue.
- Check `design/quick-specs/` for any prior quick specs that touched this
system — avoid contradicting them.
- If this is a Tuning change, also check `assets/data/` for the data file that
diff --git a/.claude/skills/regression-suite/SKILL.md b/.claude/skills/regression-suite/SKILL.md
index 3cb2eca..376d2d0 100644
--- a/.claude/skills/regression-suite/SKILL.md
+++ b/.claude/skills/regression-suite/SKILL.md
@@ -4,7 +4,6 @@ description: "Map test coverage to GDD critical paths, identify fixed bugs witho
argument-hint: "[update | audit | report]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit
-context: fork
---
# Regression Suite
diff --git a/.claude/skills/reverse-document/SKILL.md b/.claude/skills/reverse-document/SKILL.md
index c2c8024..d73cc58 100644
--- a/.claude/skills/reverse-document/SKILL.md
+++ b/.claude/skills/reverse-document/SKILL.md
@@ -4,7 +4,6 @@ description: "Generate design or architecture documents from existing implementa
argument-hint: " (e.g., 'design src/gameplay/combat' or 'architecture src/core')"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit, Bash
-context: fork
# Read-only diagnostic skill — no specialist agent delegation needed
---
diff --git a/.claude/skills/review-all-gdds/SKILL.md b/.claude/skills/review-all-gdds/SKILL.md
index 10e20ef..dd09d62 100644
--- a/.claude/skills/review-all-gdds/SKILL.md
+++ b/.claude/skills/review-all-gdds/SKILL.md
@@ -3,9 +3,7 @@ name: review-all-gdds
description: "Holistic cross-GDD consistency and game design review. Reads all system GDDs simultaneously and checks for contradictions between them, stale references, ownership conflicts, formula incompatibilities, and game design theory violations (dominant strategies, economic imbalance, cognitive overload, pillar drift). Run after all MVP GDDs are written, before architecture begins."
argument-hint: "[focus: full | consistency | design-theory | since-last-review]"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Write, Bash
-context: fork
-agent: game-designer
+allowed-tools: Read, Glob, Grep, Write, Bash, AskUserQuestion, Task
model: opus
---
@@ -546,16 +544,16 @@ FAIL: One or more blocking issues must be resolved before architecture begins.
## Phase 6: Write Report and Flag GDDs
-Ask: "May I write this review to `design/gdd/gdd-cross-review-[date].md`?"
+Use `AskUserQuestion` for write permission:
+- Prompt: "May I write this review to `design/gdd/gdd-cross-review-[date].md`?"
+- Options: `[A] Yes — write the report` / `[B] No — skip`
-If any GDDs are flagged for revision:
-
-Ask: "Should I update the systems index to mark these GDDs as needing revision?"
-- If yes: for each flagged GDD, update its Status field in systems-index.md
- to "Needs Revision" with a short note in the adjacent Notes/Description column.
+If any GDDs are flagged for revision, use a second `AskUserQuestion`:
+- Prompt: "Should I update the systems index to mark these GDDs as needing revision? ([list of flagged GDDs])"
+- Options: `[A] Yes — update systems index` / `[B] No — leave as-is`
+- If yes: update each flagged GDD's Status field in systems-index.md to "Needs Revision".
(Do NOT append parentheticals to the status value — other skills match "Needs Revision"
as an exact string and parentheticals break that match.)
- Ask approval before writing.
### Session State Update
@@ -577,18 +575,27 @@ Confirm in conversation: "Session state updated."
## Phase 7: Handoff
-After the report is written:
+After all file writes are complete, use `AskUserQuestion` for a closing widget.
-- **If FAIL**: "Resolve the blocking issues in the flagged GDDs, then re-run
- `/review-all-gdds` to confirm they're cleared before starting architecture."
-- **If CONCERNS**: "Warnings are present but not blocking. You may proceed to
- `/create-architecture` and resolve warnings in parallel, or resolve them now
- for a cleaner baseline."
-- **If PASS**: "GDDs are internally consistent. Run `/create-architecture` to
- begin translating the design into an engine-aware technical blueprint."
+Before building options, check project state:
+- Are there any Warning-level items that are simple edits (flagged with "30-second edit", "brief addition", or similar)? → offer inline quick-fix option
+- Are any GDDs in the "Flagged for Revision" table? → offer /design-review option for each
+- Read systems-index.md for the next system with Status: Not Started → offer /design-system option
+- Is the verdict PASS or CONCERNS? → offer /gate-check or /create-architecture
-Gate reminder: `/gate-check technical-setup` now requires a PASS or CONCERNS
-verdict from this review before architecture work can begin.
+Build the option list dynamically — only include options that apply:
+
+**Option pool:**
+- `[_] Apply quick fix: [W-XX description] in [gdd-name].md — [effort estimate]` (one option per simple-edit warning; only for Warning-level, not Blocking)
+- `[_] Run /design-review [flagged-gdd-path] — address flagged warnings` (one per flagged GDD, if any)
+- `[_] Run /design-system [next-system] — next in design order` (always include, name the actual system)
+- `[_] Run /create-architecture — begin architecture (verdict is PASS/CONCERNS)` (include if verdict is not FAIL)
+- `[_] Run /gate-check — validate Systems Design phase gate` (include if verdict is PASS)
+- `[_] Stop here`
+
+Assign letters A, B, C… only to included options. Mark the most pipeline-advancing option as `(recommended)`.
+
+Never end the skill with plain text. Always close with this widget.
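The dynamic assembly described above can be sketched as follows — the option strings and input shapes are illustrative, not a fixed API:

```python
def build_handoff_options(verdict, quick_fix_warnings, flagged_gdds, next_system):
    """Assemble only the options that apply, then letter them A, B, C..."""
    opts = []
    for warning in quick_fix_warnings:          # Warning-level simple edits only
        opts.append(f"Apply quick fix: {warning}")
    for gdd in flagged_gdds:                    # one review option per flagged GDD
        opts.append(f"Run /design-review {gdd}")
    if next_system:                             # always name the actual system
        opts.append(f"Run /design-system {next_system} — next in design order")
    if verdict != "FAIL":
        opts.append("Run /create-architecture — begin architecture")
    if verdict == "PASS":
        opts.append("Run /gate-check — validate Systems Design phase gate")
    opts.append("Stop here")
    # Letters are assigned only to options that made the cut.
    return [f"[{chr(ord('A') + i)}] {text}" for i, text in enumerate(opts)]
```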
---
diff --git a/.claude/skills/scope-check/SKILL.md b/.claude/skills/scope-check/SKILL.md
index 5de8048..ccffc91 100644
--- a/.claude/skills/scope-check/SKILL.md
+++ b/.claude/skills/scope-check/SKILL.md
@@ -4,7 +4,6 @@ description: "Analyze a feature or sprint for scope creep by comparing current s
argument-hint: "[feature-name or sprint-N]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Bash
-context: fork
model: haiku
---
diff --git a/.claude/skills/security-audit/SKILL.md b/.claude/skills/security-audit/SKILL.md
new file mode 100644
index 0000000..9e363fe
--- /dev/null
+++ b/.claude/skills/security-audit/SKILL.md
@@ -0,0 +1,244 @@
+---
+name: security-audit
+description: "Audit the game for security vulnerabilities: save tampering, cheat vectors, network exploits, data exposure, and input validation gaps. Produces a prioritised security report with remediation guidance. Run before any public release or multiplayer launch."
+argument-hint: "[full | network | save | input | quick]"
+user-invocable: true
+allowed-tools: Read, Glob, Grep, Bash, Write, Task
+agent: security-engineer
+---
+
+# Security Audit
+
+Security is not optional for any shipped game. Even single-player games have
+save tampering vectors. Multiplayer games have cheat surfaces, data exposure
+risks, and denial-of-service potential. This skill systematically audits the
+codebase for the most common game security failures and produces a prioritised
+remediation plan.
+
+**Run this skill:**
+- Before any public release (required for the Polish → Release gate)
+- Before enabling any online/multiplayer feature
+- After implementing any system that reads from disk or network
+- When a security-related bug is reported
+
+**Output:** `production/security/security-audit-[date].md`
+
+---
+
+## Phase 1: Parse Arguments and Scope
+
+**Modes:**
+- `full` — all categories (recommended before release)
+- `network` — network/multiplayer only
+- `save` — save file and serialization only
+- `input` — input validation and injection only
+- `quick` — high-severity checks only (fastest, for iterative use)
+- No argument — run `full`
+
+Read `.claude/docs/technical-preferences.md` to determine:
+- Engine and language (affects which patterns to search for)
+- Target platforms (affects which attack surfaces apply)
+- Whether multiplayer/networking is in scope
+
+---
+
+## Phase 2: Spawn Security Engineer
+
+Spawn `security-engineer` via Task. Pass:
+- The audit scope/mode
+- Engine and language from technical preferences
+- A manifest of all source directories: `src/`, `assets/data/`, any config files
+
+The security-engineer runs the audit across 6 categories (see Phase 3). Collect their full findings before proceeding.
+
+---
+
+## Phase 3: Audit Categories
+
+The security-engineer evaluates each of the following. Skip categories not applicable to the project scope.
+
+### Category 1: Save File and Serialization Security
+- Are save files validated before loading? (no blind deserialization)
+- Are save file paths constructed from user input? (path traversal risk)
+- Are save files checksummed or signed? (tamper detection)
+- Does the game trust numeric values from save files without bounds checking?
+- Are there any eval() or dynamic code execution calls near save loading?
+
+Grep patterns: `File.open`, `load`, `deserialize`, `JSON.parse`, `from_json`, `read_file` — check each for validation.
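A hardened load path might look like the following sketch. The digest scheme, `SECRET` constant, and `gold` bounds are illustrative assumptions, not engine API:

```python
import hashlib
import json

MAX_GOLD = 1_000_000
SECRET = b"per-build-secret"  # assumption: baked into the build, not on disk

def write_save(data: dict) -> bytes:
    payload = json.dumps(data).encode()
    return payload + hashlib.sha256(SECRET + payload).hexdigest().encode()

def load_save(raw: bytes) -> dict:
    """Validate digest and bounds before trusting any save data."""
    # Last 64 bytes hold the hex digest of the payload.
    payload, digest = raw[:-64], raw[-64:]
    expected = hashlib.sha256(SECRET + payload).hexdigest().encode()
    if digest != expected:
        raise ValueError("save file failed tamper check")
    data = json.loads(payload)          # structured parse — never eval()
    gold = int(data.get("gold", 0))     # bounds-check every numeric field
    if not 0 <= gold <= MAX_GOLD:
        raise ValueError("gold out of range")
    return {"gold": gold}
```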
+
+### Category 2: Network and Multiplayer Security (skip if single-player only)
+- Is game state authoritative on the server, or does the client dictate outcomes?
+- Are incoming network packets validated for size, type, and value range?
+- Are player positions and state changes validated server-side?
+- Is there rate limiting on any network calls?
+- Are authentication tokens handled correctly (never sent in plaintext)?
+- Does the game expose any debug endpoints in release builds?
+
+Grep for: `recv`, `receive`, `PacketPeer`, `socket`, `NetworkedMultiplayerPeer`, `rpc`, `rpc_id` — check each call site for validation.
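Server-authoritative packet checks can be sketched like this; the packet shape, size cap, and speed cap are illustrative assumptions:

```python
MAX_PACKET_BYTES = 512
ALLOWED_TYPES = {"move", "fire", "chat"}
MAX_SPEED_PER_TICK = 5.0

def validate_packet(raw: bytes, last_pos, packet: dict) -> bool:
    """Reject packets that fail size, type, or value-range checks."""
    if len(raw) > MAX_PACKET_BYTES:               # size
        return False
    if packet.get("type") not in ALLOWED_TYPES:   # type
        return False
    if packet["type"] == "move":
        x, y = packet.get("x"), packet.get("y")
        if not (isinstance(x, (int, float)) and isinstance(y, (int, float))):
            return False
        # Server-authoritative: the client cannot outrun the movement cap.
        dist = ((x - last_pos[0]) ** 2 + (y - last_pos[1]) ** 2) ** 0.5
        if dist > MAX_SPEED_PER_TICK:             # value range
            return False
    return True
```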
+
+### Category 3: Input Validation
+- Are any player-supplied strings used in file paths? (path traversal)
+- Are any player-supplied strings logged without sanitization? (log injection)
+- Are numeric inputs (e.g., item quantities, character stats) bounds-checked before use?
+- Are achievement/stat values checked before being written to any backend?
+
+Grep for: `get_input`, `Input.get_`, `input_map`, user-facing text fields — check each for validation.
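The path-traversal case is the easiest to get wrong. One defensive sketch, assuming saves live under a fixed `saves/` directory:

```python
from pathlib import Path

SAVE_DIR = Path("saves").resolve()

def safe_save_path(user_name: str) -> Path:
    """Build a save path from player input without traversal risk."""
    # Allow-list the characters rather than trying to strip bad ones.
    if not user_name or not all(c.isalnum() or c in "-_" for c in user_name):
        raise ValueError("invalid save name")
    candidate = (SAVE_DIR / f"{user_name}.sav").resolve()
    # Belt and braces: the resolved path must stay inside SAVE_DIR.
    if SAVE_DIR not in candidate.parents:
        raise ValueError("path escapes save directory")
    return candidate
```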
+
+### Category 4: Data Exposure
+- Are any API keys, credentials, or secrets hardcoded in `src/` or `assets/`?
+- Are debug symbols or verbose error messages included in release builds?
+- Does the game log sensitive player data to disk or console?
+- Are any internal file paths or system information exposed to players?
+
+Grep for: `api_key`, `secret`, `password`, `token`, `private_key`, `DEBUG`, `print(` in release-facing code.
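These greps can be folded into a small scanner; the pattern list below is a starting set, not exhaustive:

```python
import re

SECRET_PATTERNS = [
    # Hardcoded credentials assigned to a string literal.
    re.compile(r"(api_key|secret|password|token|private_key)"
               r"\s*[:=]\s*[\"'][^\"']+[\"']", re.I),
    # print() calls — flag for manual review in release-facing code.
    re.compile(r"\bprint\("),
]

def scan_source(text: str, path: str = "<memory>"):
    """Return (path, line_no, matched_text) for every suspicious line."""
    hits = []
    for n, line in enumerate(text.splitlines(), start=1):
        for pat in SECRET_PATTERNS:
            m = pat.search(line)
            if m:
                hits.append((path, n, m.group(0)))
    return hits
```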
+
+### Category 5: Cheat and Anti-Tamper Vectors
+- Are gameplay-critical values stored only in memory, not in easily editable files?
+- Are any critical game progression flags (e.g., "has paid for DLC") validated server-side?
+- Is there any protection against memory editing tools (Cheat Engine, etc.) for multiplayer?
+- Are leaderboard/score submissions validated before acceptance?
+
+Note: Client-side anti-cheat is largely unenforceable. Focus on server-side validation for anything competitive or monetised.
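A server-side submission gate can be sketched like this — the rate cap and fields are illustrative assumptions, tuned from legitimate play data in practice:

```python
MAX_SCORE_PER_SECOND = 50  # assumption: derived from legitimate run telemetry

def accept_score(score: int, run_seconds: float, level_complete: bool) -> bool:
    """Server-side sanity checks before a score reaches the leaderboard."""
    if not level_complete:              # progression flag checked server-side
        return False
    if run_seconds <= 0 or score < 0:   # reject impossible basics first
        return False
    # Reject scores a legitimate run could not plausibly produce.
    if score > run_seconds * MAX_SCORE_PER_SECOND:
        return False
    return True
```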
+
+### Category 6: Dependency and Supply Chain
+- Are any third-party plugins or libraries used? List them.
+- Do any plugins have known CVEs in the version being used?
+- Are plugin sources verified (official marketplace, reviewed repository)?
+
+Glob for: `addons/`, `plugins/`, `third_party/`, `vendor/` — list all external dependencies.
+
+---
+
+## Phase 4: Classify Findings
+
+For each finding, assign:
+
+**Severity:**
+| Level | Definition |
+|-------|-----------|
+| **CRITICAL** | Remote code execution, data breach, or trivially exploitable cheat that breaks multiplayer integrity |
+| **HIGH** | Save tampering that bypasses progression, credential exposure, or server-side authority bypass |
+| **MEDIUM** | Client-side cheat enablement, information disclosure, or input validation gap with limited impact |
+| **LOW** | Defence-in-depth improvement — hardening that reduces attack surface but no direct exploit exists |
+
+**Status:** Open / Accepted Risk / Out of Scope
+
+---
+
+## Phase 5: Generate Report
+
+```markdown
+# Security Audit Report
+
+**Date**: [date]
+**Scope**: [full | network | save | input | quick]
+**Engine**: [engine + version]
+**Audited by**: security-engineer via /security-audit
+**Files scanned**: [N source files, N config files]
+
+---
+
+## Executive Summary
+
+| Severity | Count | Must Fix Before Release |
+|----------|-------|------------------------|
+| CRITICAL | [N] | Yes — all |
+| HIGH | [N] | Yes — all |
+| MEDIUM | [N] | Recommended |
+| LOW | [N] | Optional |
+
+**Release recommendation**: [CLEAR TO SHIP / FIX CRITICALS FIRST / DO NOT SHIP]
+
+---
+
+## CRITICAL Findings
+
+### SEC-001: [Title]
+**Category**: [Save / Network / Input / Data / Cheat / Dependency]
+**File**: `[path]` line [N]
+**Description**: [What the vulnerability is]
+**Attack scenario**: [How a malicious user would exploit it]
+**Remediation**: [Specific code change or pattern to apply]
+**Effort**: [Low / Medium / High]
+
+[repeat per finding]
+
+---
+
+## HIGH Findings
+
+[same format]
+
+---
+
+## MEDIUM Findings
+
+[same format]
+
+---
+
+## LOW Findings
+
+[same format]
+
+---
+
+## Accepted Risk
+
+[Any findings explicitly accepted by the team with rationale]
+
+---
+
+## Dependency Inventory
+
+| Plugin / Library | Version | Source | Known CVEs |
+|-----------------|---------|--------|------------|
+| [name] | [version] | [source] | [none / CVE-XXXX-NNNN] |
+
+---
+
+## Remediation Priority Order
+
+1. [SEC-NNN] — [1-line description] — Est. effort: [Low/Medium/High]
+2. ...
+
+---
+
+## Re-Audit Trigger
+
+Run `/security-audit` again after remediating any CRITICAL or HIGH findings.
+The Polish → Release gate requires this report with no open CRITICAL or HIGH items.
+```
+
+---
+
+## Phase 6: Write Report
+
+Present the report summary (executive summary + CRITICAL/HIGH findings only) in conversation.
+
+Ask: "May I write the full security audit report to `production/security/security-audit-[date].md`?"
+
+Write only after approval.
+
+---
+
+## Phase 7: Gate Integration
+
+This report is a required artifact for the **Polish → Release gate**.
+
+After remediating findings, re-run `/security-audit quick` to confirm CRITICAL/HIGH items are resolved before running `/gate-check release`.
+
+If CRITICAL findings exist:
+> "⛔ CRITICAL security findings must be resolved before any public release. Do not proceed to `/launch-checklist` until these are addressed."
+
+If no CRITICAL/HIGH findings:
+> "✅ No blocking security findings. Report written to `production/security/`. Include this path when running `/gate-check release`."
+
+---
+
+## Collaborative Protocol
+
+- **Never assume a pattern is safe** — flag it and let the user decide
+- **Accepted risk is a valid outcome** — some LOW findings are acceptable trade-offs for a solo team; document the decision
+- **Multiplayer games have a higher bar** — any HIGH finding in a multiplayer context should be treated as CRITICAL
+- **This is not a penetration test** — this audit covers common patterns; a real pentest by a human security professional is recommended before any competitive or monetised multiplayer launch
diff --git a/.claude/skills/setup-engine/SKILL.md b/.claude/skills/setup-engine/SKILL.md
index 4c65846..d39b81e 100644
--- a/.claude/skills/setup-engine/SKILL.md
+++ b/.claude/skills/setup-engine/SKILL.md
@@ -3,7 +3,7 @@ name: setup-engine
description: "Configure the project's game engine and version. Pins the engine in CLAUDE.md, detects knowledge gaps, and populates engine reference docs via WebSearch when the version is beyond the LLM's training data."
argument-hint: "[engine] | [engine version] | refresh | upgrade [old-version] [new-version] | no args for guided selection"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Write, Edit, WebSearch, WebFetch, Task
+allowed-tools: Read, Glob, Grep, Write, Edit, WebSearch, WebFetch, Task, AskUserQuestion
---
When this skill is invoked:
@@ -230,10 +230,15 @@ Example filled section:
```
### Remaining Sections
-- Performance Budgets: Leave as `[TO BE CONFIGURED]` with a suggestion:
- > "Typical targets: 60fps / 16.6ms frame budget. Want to set these now?"
-- Testing: Suggest engine-appropriate framework (GUT for Godot, NUnit for Unity, etc.)
-- Forbidden Patterns / Allowed Libraries: Leave as placeholder
+- **Performance Budgets**: Use `AskUserQuestion`:
+ - Prompt: "Should I set default performance budgets now, or leave them for later?"
+ - Options: `[A] Set defaults now (60fps, 16.6ms frame budget, engine-appropriate draw call limit)` / `[B] Leave as [TO BE CONFIGURED] — I'll set these when I know my target hardware`
+ - If [A]: populate with the suggested defaults. If [B]: leave as placeholder.
+- **Testing**: Suggest engine-appropriate framework (GUT for Godot, NUnit for Unity, etc.) — ask before adding.
+- **Forbidden Patterns**: Leave as placeholder — do NOT pre-populate.
+- **Allowed Libraries**: Leave as placeholder — do NOT pre-populate dependencies the project does not currently need. Only add a library here when it is actively being integrated, not speculatively.
+
+> **Guardrail**: Never add speculative dependencies to Allowed Libraries. For example, do NOT add GodotSteam unless Steam integration is actively beginning in this session. Post-launch integrations should be added to Allowed Libraries when that work begins, not during engine setup.
### Engine Specialists Routing
@@ -571,6 +576,7 @@ Verdict: **COMPLETE** — engine configured and reference docs populated.
- If reference docs already exist for a different engine, ask before replacing
- Always show the user what you're about to change before making CLAUDE.md edits
- If WebSearch returns ambiguous results, show the user and let them decide
+- When the user chose **GDScript**: copy the GDScript CLAUDE.md template from Appendix A1 exactly. NEVER add "C++ via GDExtension" to the Language field. GDScript projects may use GDExtension, but it is not a primary project language. The `godot-gdextension-specialist` in the routing table is available for when native extensions are needed — it does not make C++ a project language.
---
@@ -585,11 +591,13 @@ All Godot-specific variants for language-dependent configuration. Referenced fro
**GDScript:**
```markdown
- **Engine**: Godot [version]
-- **Language**: GDScript (primary), C++ via GDExtension (performance-critical)
+- **Language**: GDScript
- **Build System**: SCons (engine), Godot Export Templates
- **Asset Pipeline**: Godot Import System + custom resource pipeline
```
+> **Guardrail**: When using this GDScript template, write the Language field as exactly "`GDScript`" — no additions. Do NOT append "C++ via GDExtension" or any other language. The C# template below includes GDExtension because C# projects commonly wrap native code; GDScript projects do not.
+
**C#:**
```markdown
- **Engine**: Godot [version]
diff --git a/.claude/skills/skill-improve/SKILL.md b/.claude/skills/skill-improve/SKILL.md
new file mode 100644
index 0000000..340aee4
--- /dev/null
+++ b/.claude/skills/skill-improve/SKILL.md
@@ -0,0 +1,144 @@
+---
+name: skill-improve
+description: "Improve a skill using a test-fix-retest loop. Runs static checks, proposes targeted fixes, rewrites the skill, re-tests, and keeps or reverts based on score change."
+argument-hint: "[skill-name]"
+user-invocable: true
+allowed-tools: Read, Glob, Grep, Write, Bash
+---
+
+# Skill Improve
+
+Runs an improvement loop on a single skill:
+test → fix → retest → keep or revert.
+
+---
+
+## Phase 1: Parse Argument
+
+Read the skill name from the first argument. If missing, output usage and stop:
+
+```
+Usage: /skill-improve [skill-name]
+Example: /skill-improve tech-debt
+```
+
+Verify `.claude/skills/[name]/SKILL.md` exists. If not, stop with:
+"Skill '[name]' not found."
+
+---
+
+## Phase 2: Baseline Test
+
+Run `/skill-test static [name]` and record the baseline score:
+- Count of FAILs
+- Count of WARNs
+- Which specific checks failed (Checks 1–7)
+
+Display to the user:
+```
+Static baseline: [N] failures, [M] warnings
+Failing: Check 4 (no ask-before-write), Check 5 (no handoff)
+```
+
+If baseline is 0 FAILs and 0 WARNs, note it and proceed to Phase 2b.
+
+### Phase 2b: Category Baseline
+
+Look up the skill's `category:` field in `CCGS Skill Testing Framework/catalog.yaml`.
+
+If no `category:` field is found, display:
+"Category: not yet assigned — skipping category checks."
+and skip to Phase 3.
+
+If category is found, run `/skill-test category [name]` and record the category baseline:
+- Count of FAILs
+- Count of WARNs
+- Which specific category rubric metrics failed
+
+Display to the user:
+```
+Category baseline: [N] failures, [M] warnings ([category] rubric)
+```
+
+If BOTH static and category baselines are 0 FAILs and 0 WARNs, stop:
+"This skill already passes all static and category checks. No improvements needed."
+
+---
+
+## Phase 3: Diagnose
+
+Read the full skill file at `.claude/skills/[name]/SKILL.md`.
+
+For each failing or warning **static** check, identify the exact gap:
+
+- **Check 1 fail** → which frontmatter field is missing
+- **Check 2 fail** → how many phases found vs. minimum required
+- **Check 3 fail** → no verdict keywords anywhere in the skill body
+- **Check 4 fail** → Write or Edit in allowed-tools but no ask-before-write language
+- **Check 5 warn** → no follow-up or next-step section at the end
+- **Check 6 warn** → `context: fork` set but fewer than 5 phases found
+- **Check 7 warn** → argument-hint is empty or doesn't match documented modes
+
+For each failing or warning **category** check (if category was assigned in Phase 2b),
+identify the exact gap in the skill's text. For example:
+- If G2 fails (gate mode, full directors not spawned): skill body never references all 4
+ PHASE-GATE director prompts
+- If A2 fails (authoring, no per-section May-I-write): skill asks once at the end, not
+ before each section write
+- If T3 fails (team, BLOCKED not surfaced): skill doesn't halt dependent work on blocked agent
+
+Show the full combined diagnosis to the user before proposing any changes.
+
+---
+
+## Phase 4: Propose Fix
+
+Write a targeted fix for each failure and warning. Show the proposed changes
+as clearly marked before/after blocks. Only change what is failing — do not
+rewrite sections that are passing.
+
+Ask: "May I write this improved version to `.claude/skills/[name]/SKILL.md`?"
+
+If the user says no, stop here.
+
+---
+
+## Phase 5: Write and Retest
+
+Record the current content of the skill file (for revert if needed).
+
+Write the improved skill to `.claude/skills/[name]/SKILL.md`.
+
+Re-run `/skill-test static [name]` and record the new static score.
+If a category was assigned, also re-run `/skill-test category [name]` and record the new category score.
+
+Display the comparison:
+```
+Static: Before [N] failures, [M] warnings → After [N'] failures, [M'] warnings
+Category: Before [N] failures, [M] warnings → After [N'] failures, [M'] warnings (if applicable)
+Combined change: improved / no change / worse
+```
+
+---
+
+## Phase 6: Verdict
+
+Count the combined issue total: static FAILs + category FAILs + static WARNs + category WARNs (FAILs and WARNs are weighted equally here).
+
+**If the combined score improved (combined issue total is lower than baseline):**
+Report: "Score improved. Changes kept."
+Show a summary of what was fixed in each dimension.
+
+**If combined score is the same or worse:**
+Report: "Combined score did not improve."
+Show what changed and why it may not have helped.
+Ask: "May I revert `.claude/skills/[name]/SKILL.md`?"
+If yes: restore the content recorded in Phase 5. Only fall back to
+`git checkout -- .claude/skills/[name]/SKILL.md` if the file was committed and clean
+before this run — otherwise checkout would discard more than this run's changes.
+
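The keep-or-revert rule above reduces to simple arithmetic. A minimal sketch (illustrative only — `combined_issues` and `verdict` are hypothetical names, not part of the skill):

```python
def combined_issues(static_fails, static_warns, cat_fails=0, cat_warns=0):
    """Combined issue total: FAILs and WARNs across both dimensions, weighted equally."""
    return static_fails + cat_fails + static_warns + cat_warns

def verdict(baseline_total, new_total):
    """Keep the rewrite only on a strict improvement; otherwise offer to revert."""
    return "keep" if new_total < baseline_total else "offer revert"
```

For example, a baseline of 2 static FAILs + 1 WARN (total 3) that drops to 1 WARN (total 1) is kept, while trading one FAIL for one new WARN leaves the total unchanged and triggers the revert offer.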
+---
+
+## Phase 7: Next Steps
+
+- Run `/skill-test static all` to find the next skill with failures.
+- Run `/skill-improve [next-name]` to continue the loop on another skill.
+- Run `/skill-test audit` to see overall coverage progress.
diff --git a/.claude/skills/skill-test/SKILL.md b/.claude/skills/skill-test/SKILL.md
index 1fb0f6d..07ba49d 100644
--- a/.claude/skills/skill-test/SKILL.md
+++ b/.claude/skills/skill-test/SKILL.md
@@ -1,10 +1,9 @@
---
name: skill-test
description: "Validate skill files for structural compliance and behavioral correctness. Three modes: static (linter), spec (behavioral), audit (coverage report)."
-argument-hint: "static [skill-name | all] | spec [skill-name] | audit"
+argument-hint: "static [skill-name | all] | spec [skill-name] | category [skill-name | all] | audit"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write
-context: fork
---
# Skill Test
@@ -13,13 +12,14 @@ Validates `.claude/skills/*/SKILL.md` files for structural compliance and
behavioral correctness. No external dependencies — runs entirely within the
existing skill/hook/template architecture.
-**Three modes:**
+**Four modes:**
| Mode | Command | Purpose | Token Cost |
|------|---------|---------|------------|
| `static` | `/skill-test static [name\|all]` | Structural linter — 7 compliance checks per skill | Low (~1k/skill) |
| `spec` | `/skill-test spec [name]` | Behavioral verifier — evaluates assertions in test spec | Medium (~5k/skill) |
-| `audit` | `/skill-test audit` | Coverage report — which skills have specs, last test dates | Low (~2k total) |
+| `category` | `/skill-test category [name\|all]` | Category rubric — checks skill against its category-specific metrics | Low (~2k/skill) |
+| `audit` | `/skill-test audit` | Coverage report — skills, agent specs, last test dates | Low (~3k total) |
---
@@ -30,7 +30,9 @@ Determine mode from the first argument:
- `static [name]` → run 7 structural checks on one skill
- `static all` → run 7 structural checks on all skills (Glob `.claude/skills/*/SKILL.md`)
- `spec [name]` → read skill + test spec, evaluate assertions
-- `audit` (or no argument) → read catalog, list all skills, show coverage
+- `category [name]` → run category-specific rubric from `CCGS Skill Testing Framework/quality-rubric.md`
+- `category all` → run category rubric for every skill that has a `category:` in catalog
+- `audit` (or no argument) → read catalog, list all skills and agents, show coverage
If argument is missing or unrecognized, output usage and stop.
@@ -137,13 +139,14 @@ Aggregate Verdict: N WARNINGS / N FAILURES
### Step 1 — Locate Files
Find skill at `.claude/skills/[name]/SKILL.md`.
-Find spec at `tests/skills/[name].md`.
+Look up the spec path from `CCGS Skill Testing Framework/catalog.yaml` — use the
+`spec:` field for the matching skill entry.
If either is missing:
- Missing skill: "Skill '[name]' not found in `.claude/skills/`."
-- Missing spec: "No test spec found for '[name]'. Run `/skill-test audit` to see
- coverage gaps, or create a spec using the template at
- `.claude/docs/templates/skill-test-spec.md`."
+- Missing spec path in catalog: "No spec path set for '[name]' in catalog.yaml."
+- Spec file not found at path: "Spec file missing at [path]. Run `/skill-test audit`
+ to see coverage gaps."
### Step 2 — Read Both Files
@@ -177,7 +180,7 @@ For **Protocol Compliance** assertions (always present):
```
=== Skill Spec Test: /[name] ===
Date: [date]
-Spec: tests/skills/[name].md
+Spec: CCGS Skill Testing Framework/skills/[category]/[name].md
Case 1: [Happy Path — name]
Fixture: [summary]
@@ -201,78 +204,139 @@ Overall Verdict: FAIL (1 case failed, 1 warning)
### Step 5 — Offer to Write Results
-"May I write these results to `tests/results/skill-test-spec-[name]-[date].md`
-and update `tests/skills/catalog.yaml`?"
+"May I write these results to `CCGS Skill Testing Framework/results/skill-test-spec-[name]-[date].md`
+and update `CCGS Skill Testing Framework/catalog.yaml`?"
If yes:
-- Write results file to `tests/results/`
-- Update the skill's entry in `tests/skills/catalog.yaml`:
+- Write results file to `CCGS Skill Testing Framework/results/`
+- Update the skill's entry in `CCGS Skill Testing Framework/catalog.yaml`:
- `last_spec: [date]`
- `last_spec_result: PASS|PARTIAL|FAIL`
---
+## Phase 2D: Category Mode — Rubric Evaluation
+
+### Step 1 — Locate Skill and Category
+
+Find skill at `.claude/skills/[name]/SKILL.md`.
+Look up `category:` field in `CCGS Skill Testing Framework/catalog.yaml`.
+
+If skill not found: "Skill '[name]' not found."
+If no `category:` field: "No category assigned for '[name]' in catalog.yaml.
+Add a `category: [category]` line (e.g. `gate`, `team`, `utility`) to the skill entry first."
+
+For `category all`: collect all skills with a `category:` field and process each.
+`category: utility` skills are evaluated against only U1 (static checks pass) and U2
+(gate mode correct, if applicable) — for U1, reuse the static-mode results rather than
+re-running the checks.
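For orientation, a catalog entry carrying the fields this mode reads might look like the sketch below (hypothetical — the `spec:`, `priority:`, and `last_*` fields follow the audit-mode description; the real catalog layout may differ):

```
skills:
  gate-check:
    category: gate
    spec: CCGS Skill Testing Framework/skills/gate/gate-check.md
    priority: critical
    last_static: never
    last_category: never
```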
+
+### Step 2 — Read Rubric Section
+
+Read `CCGS Skill Testing Framework/quality-rubric.md`.
+Extract the section matching the skill's category (e.g., `### gate`, `### team`).
+
+### Step 3 — Read Skill
+
+Read the skill's `SKILL.md` fully.
+
+### Step 4 — Evaluate Rubric Metrics
+
+For each metric in the category's rubric table:
+1. Check whether the skill's written instructions clearly satisfy the criterion
+2. Mark PASS, FAIL, or WARN
+3. For FAIL/WARN, identify the exact gap in the skill text (quote the relevant section
+ or note its absence)
+
+### Step 5 — Output Report
+
+```
+=== Skill Category Check: /[name] ([category]) ===
+
+Metric G1 — Review mode read: PASS
+Metric G2 — Full mode directors: FAIL
+ Gap: Phase 3 spawns only CD-PHASE-GATE; TD-PHASE-GATE, PR-PHASE-GATE, AD-PHASE-GATE absent
+Metric G3 — Lean mode: PHASE-GATE only: PASS
+Metric G4 — Solo mode: no directors: PASS
+Metric G5 — No auto-advance: PASS
+
+Verdict: FAIL (1 failure, 0 warnings)
+Fix: Add TD-PHASE-GATE, PR-PHASE-GATE, and AD-PHASE-GATE to the full-mode director
+ panel in Phase 3.
+```
+
+### Step 6 — Offer to Update Catalog
+
+"May I update `CCGS Skill Testing Framework/catalog.yaml` to record this category check
+(`last_category`, `last_category_result`) for [name]?"
+
+---
+
## Phase 2C: Audit Mode — Coverage Report
### Step 1 — Read Catalog
-Read `tests/skills/catalog.yaml`. If missing, note that catalog doesn't exist
+Read `CCGS Skill Testing Framework/catalog.yaml`. If missing, note that catalog doesn't exist
yet (first-run state).
-### Step 2 — Enumerate All Skills
+### Step 2 — Enumerate All Skills and Agents
Glob `.claude/skills/*/SKILL.md` to get the complete list of skills.
Extract skill name from each path (directory name).
-### Step 3 — Build Coverage Table
+Also read the `agents:` section from `CCGS Skill Testing Framework/catalog.yaml` to get the
+complete list of agents.
+
+### Step 3 — Build Skill Coverage Table
For each skill:
-- Check if a spec file exists at `tests/skills/[name].md`
-- Look up `last_static`, `last_static_result`, `last_spec`, `last_spec_result`
- from catalog (or mark as "never" if not in catalog)
-- Assign priority:
- - `critical` — gate-check, design-review, story-readiness, story-done, review-all-gdds, architecture-review
- - `high` — create-epics, create-stories, dev-story, create-control-manifest, propagate-design-change, story-done
- - `medium` — team-* skills, sprint-plan, sprint-status
- - `low` — all others
+- Check if a spec file exists (use the `spec:` path from catalog, or glob `CCGS Skill Testing Framework/skills/*/[name].md`)
+- Look up `last_static`, `last_static_result`, `last_spec`, `last_spec_result`,
+ `last_category`, `last_category_result`, `category` from catalog (or mark as
+ "never" / "—" if not in catalog)
+- Priority comes from catalog `priority:` field (critical/high/medium/low)
+
+### Step 3b — Build Agent Coverage Table
+
+For each agent in catalog's `agents:` section:
+- Check if a spec file exists (use the `spec:` path from catalog, or glob `CCGS Skill Testing Framework/agents/*/[name].md`)
+- Look up `last_spec`, `last_spec_result`, `category` from catalog
### Step 4 — Output Report
```
=== Skill Test Coverage Audit ===
Date: [date]
-Total skills: 52
-Specs written: 4 (7.7%)
-Never tested (static): 48
-Coverage Table:
-Skill | Has Spec | Last Static | Static Result | Last Spec | Spec Result | Priority
------------------------|----------|------------------|---------------|------------------|-------------|----------
-gate-check | YES | never | — | never | — | critical
-design-review | YES | never | — | never | — | critical
-story-readiness | YES | never | — | never | — | critical
-story-done | YES | never | — | never | — | critical
-architecture-review | NO | never | — | never | — | critical
-review-all-gdds | NO | never | — | never | — | critical
+SKILLS (72 total)
+Specs written: 72 (100%) | Never static tested: 72 | Never category tested: 72
+
+Skill | Cat | Has Spec | Last Static | S.Result | Last Cat | C.Result | Priority
+-----------------------|----------|----------|-------------|----------|----------|----------|----------
+gate-check | gate | YES | never | — | never | — | critical
+design-review | review | YES | never | — | never | — | critical
...
-Top 5 Priority Gaps (no spec, critical/high priority):
-1. /architecture-review — critical, no spec
-2. /review-all-gdds — critical, no spec
-3. /create-epics — high, no spec
-4. /create-stories — high, no spec
-5. /dev-story — high, no spec
-4. /propagate-design-change — high, no spec
-5. /sprint-plan — medium, no spec
+AGENTS (49 total)
+Agent specs written: 49 (100%)
-Coverage: 4/52 specs (7.7%)
+Agent | Category | Has Spec | Last Spec | Result
+-----------------------|------------|----------|-------------|--------
+creative-director | director | YES | never | —
+technical-director | director | YES | never | —
+...
+
+Top 5 Priority Gaps (skills with no spec, critical/high priority):
+(none if all specs are written)
+
+Skill coverage: 72/72 specs (100%)
+Agent coverage: 49/49 specs (100%)
```
No file writes in audit mode.
Offer: "Would you like to run `/skill-test static all` to check structural
-compliance across all skills? Or `/skill-test spec [name]` to run a specific
-behavioral test?"
+compliance across all skills? `/skill-test category all` to run category rubric
+checks? Or `/skill-test spec [name]` to run a specific behavioral test?"
---
@@ -284,9 +348,9 @@ After any mode completes, offer contextual follow-up:
correctness if a test spec exists."
- After `static all` with failures: "Address NON-COMPLIANT skills first. Run
`/skill-test static [name]` individually for detailed remediation guidance."
-- After `spec [name]` PASS: "Update `tests/skills/catalog.yaml` to record this
+- After `spec [name]` PASS: "Update `CCGS Skill Testing Framework/catalog.yaml` to record this
pass date. Consider running `/skill-test audit` to find the next spec gap."
- After `spec [name]` FAIL: "Review the failing assertions and update the skill
or the test spec to resolve the mismatch."
- After `audit`: "Start with the critical-priority gaps. Use the spec template
- at `.claude/docs/templates/skill-test-spec.md` to create new specs."
+ at `CCGS Skill Testing Framework/templates/skill-test-spec.md` to create new specs."
diff --git a/.claude/skills/smoke-check/SKILL.md b/.claude/skills/smoke-check/SKILL.md
index 0c0a0c7..6cb1932 100644
--- a/.claude/skills/smoke-check/SKILL.md
+++ b/.claude/skills/smoke-check/SKILL.md
@@ -3,7 +3,7 @@ name: smoke-check
description: "Run the critical path smoke test gate before QA hand-off. Executes the automated test suite, verifies core functionality, and produces a PASS/FAIL report. Run after a sprint's stories are implemented and before manual QA begins. A failed smoke check means the build is not ready for QA."
argument-hint: "[sprint | quick | --platform pc|console|mobile|all]"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Bash, Write
+allowed-tools: Read, Glob, Grep, Bash, Write, AskUserQuestion
---
# Smoke Check
diff --git a/.claude/skills/soak-test/SKILL.md b/.claude/skills/soak-test/SKILL.md
index 7a20e56..389f402 100644
--- a/.claude/skills/soak-test/SKILL.md
+++ b/.claude/skills/soak-test/SKILL.md
@@ -4,7 +4,6 @@ description: "Generate a soak test protocol for extended play sessions. Defines
argument-hint: "[duration: 30m | 1h | 2h | 4h] [focus: memory | stability | balance | all]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write
-context: fork
---
# Soak Test
diff --git a/.claude/skills/sprint-plan/SKILL.md b/.claude/skills/sprint-plan/SKILL.md
index 60b62ab..4cdff74 100644
--- a/.claude/skills/sprint-plan/SKILL.md
+++ b/.claude/skills/sprint-plan/SKILL.md
@@ -3,15 +3,19 @@ name: sprint-plan
description: "Generates a new sprint plan or updates an existing one based on the current milestone, completed work, and available capacity. Pulls context from production documents and design backlogs."
argument-hint: "[new|update|status] [--review full|lean|solo]"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Write, Edit
+allowed-tools: Read, Glob, Grep, Write, Edit, Task, AskUserQuestion
context: |
!ls production/sprints/ 2>/dev/null
---
## Phase 0: Parse Arguments
-Extract the mode argument (`new`, `update`, or `status`) and any `--review [full|lean|solo]`
-flag. Store the review mode as the override for this run (see `.claude/docs/director-gates.md`).
+Extract the mode argument (`new`, `update`, or `status`) and resolve the review mode (once, store for all gate spawns this run):
+1. If `--review [full|lean|solo]` was passed → use that
+2. Else read `production/review-mode.txt` → use that value
+3. Else → default to `lean`
+
+See `.claude/docs/director-gates.md` for the full check pattern.
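The three-step resolution above can be sketched as follows, assuming `production/review-mode.txt` contains exactly one of `full`, `lean`, or `solo` (the skill performs these steps in prose — this Python sketch is illustrative only):

```python
import os

VALID_MODES = {"full", "lean", "solo"}

def resolve_review_mode(flag=None, path="production/review-mode.txt"):
    """Resolve review mode once per run: --review flag > mode file > 'lean' default."""
    if flag in VALID_MODES:            # 1. explicit --review flag wins
        return flag
    if os.path.isfile(path):           # 2. project-level setting
        with open(path) as f:
            mode = f.read().strip()
        if mode in VALID_MODES:
            return mode
    return "lean"                      # 3. default
```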
---
@@ -33,7 +37,7 @@ flag. Store the review mode as the override for this run (see `.claude/docs/dire
For `new`:
-**Generate a sprint plan** following this format and present it to the user. Ask: "May I write this sprint plan to `production/sprints/sprint-[N].md`?" If yes, write the file, creating the directory if needed. Verdict: **COMPLETE** — sprint plan created. If no: Verdict: **BLOCKED** — user declined write.
+**Generate a sprint plan** following this format and present it to the user. Do NOT ask to write yet — the producer feasibility gate (Phase 4) runs first and may require revisions before the file is written.
```markdown
# Sprint [N] -- [Start Date] to [End Date]
@@ -74,6 +78,10 @@ For `new`:
## Definition of Done for this Sprint
- [ ] All Must Have tasks completed
- [ ] All tasks pass acceptance criteria
+- [ ] QA plan exists (`production/qa/qa-plan-sprint-[N].md`)
+- [ ] All Logic/Integration stories have passing unit/integration tests
+- [ ] Smoke check passed (`/smoke-check sprint`)
+- [ ] QA sign-off report: APPROVED or APPROVED WITH CONDITIONS (`/team-qa sprint`)
- [ ] No S1 or S2 bugs in delivered features
- [ ] Design documents updated for any deviations
- [ ] Code reviewed and merged
@@ -159,23 +167,62 @@ stories that haven't changed, add new stories, remove dropped ones.
## Phase 4: Producer Feasibility Gate
+**Review mode check** — apply before spawning PR-SPRINT:
+- `solo` → skip the spawn. Note: "PR-SPRINT skipped — Solo mode." Still ask for write approval below, then proceed to Phase 5 (QA plan gate).
+- `lean` → skip the spawn (not a PHASE-GATE). Note: "PR-SPRINT skipped — Lean mode." Still ask for write approval below, then proceed to Phase 5 (QA plan gate).
+- `full` → spawn as normal.
+
Before finalising the sprint plan, spawn `producer` via Task using gate **PR-SPRINT** (`.claude/docs/director-gates.md`).
Pass: proposed story list (titles, estimates, dependencies), total team capacity in hours/days, any carryover from the previous sprint, milestone constraints and deadline.
Present the producer's assessment. If UNREALISTIC, revise the story selection (defer stories to Should Have or Nice to Have) before asking for write approval. If CONCERNS, surface them and let the user decide whether to adjust.
-After handling the producer's verdict, add:
+After handling the producer's verdict, ask: "May I write this sprint plan to `production/sprints/sprint-[N].md`?" If yes, write the file, creating the directory if needed. Verdict: **COMPLETE** — sprint plan created. If no: Verdict: **BLOCKED** — user declined write.
+
+After writing, add:
> **Scope check:** If this sprint includes stories added beyond the original epic scope, run `/scope-check [epic]` to detect scope creep before implementation begins.
---
-## Phase 5: Next Steps
+## Phase 5: QA Plan Gate
-After the sprint plan is written, recommend:
+Before closing the sprint plan, check whether a QA plan exists for this sprint.
+Use `Glob` to look for `production/qa/qa-plan-sprint-[N].md` or any file in `production/qa/` referencing this sprint number.
+
+**If a QA plan is found**: note it in the sprint plan output — "QA Plan: `[path]`" — and proceed.
+
+**If no QA plan exists**: do not silently proceed. Surface this explicitly:
+
+> "This sprint has no QA plan. A sprint plan without a QA plan means test requirements are undefined — developers won't know what 'done' looks like from a QA perspective, and the sprint cannot pass the Production → Polish gate without one.
+>
+> Run `/qa-plan sprint` now, before starting any implementation. It takes one session and produces the test case requirements each story needs."
+
+Use `AskUserQuestion`:
+- Prompt: "No QA plan found for this sprint. How do you want to proceed?"
+- Options:
+ - `[A] Run /qa-plan sprint now — I'll do that before starting implementation (Recommended)`
+ - `[B] Skip for now — I understand QA sign-off will be blocked at the Production → Polish gate`
+
+If [A]: close with "Sprint plan written. Run `/qa-plan sprint` next — then begin implementation."
+If [B]: add a warning block to the sprint plan document:
+
+```markdown
+> ⚠️ **No QA Plan**: This sprint was started without a QA plan. Run `/qa-plan sprint`
+> before the last story is implemented. The Production → Polish gate requires a QA
+> sign-off report, which requires a QA plan.
+```
+
+---
+
+## Phase 6: Next Steps
+
+After the sprint plan is written and QA plan status is resolved:
+
+- `/qa-plan sprint` — **required before implementation begins** — defines test cases per story so developers implement against QA specs, not a blank slate
+- `/story-readiness [story-file]` — validate a story is ready before starting it
+- `/dev-story [story-file]` — begin implementing the first story
- `/sprint-status` — check progress mid-sprint
- `/scope-check [epic]` — verify no scope creep before implementation begins
-- `/dev-story [story-file]` — begin implementing the first story
-- `/story-readiness [story-file]` — validate a story is ready before starting it
diff --git a/.claude/skills/sprint-status/SKILL.md b/.claude/skills/sprint-status/SKILL.md
index 5ec1529..a5e6c1f 100644
--- a/.claude/skills/sprint-status/SKILL.md
+++ b/.claude/skills/sprint-status/SKILL.md
@@ -4,7 +4,6 @@ description: "Fast sprint status check. Reads the current sprint plan, scans sto
argument-hint: "[sprint-number or blank for current]"
user-invocable: true
allowed-tools: Read, Glob, Grep
-context: fork
model: haiku
---
diff --git a/.claude/skills/start/SKILL.md b/.claude/skills/start/SKILL.md
index e079c02..9e4ce7e 100644
--- a/.claude/skills/start/SKILL.md
+++ b/.claude/skills/start/SKILL.md
@@ -58,18 +58,24 @@ The user needs creative exploration before anything else.
**Concept phase:**
- `/brainstorm open` — discover your game concept
- `/setup-engine` — configure the engine (brainstorm will recommend one)
+ - `/art-bible` — define visual identity (uses the Visual Identity Anchor that `/brainstorm` produces)
- `/map-systems` — decompose the concept into systems
- `/design-system` — author a GDD for each MVP system
- `/review-all-gdds` — cross-system consistency check
- `/gate-check` — validate readiness before architecture work
**Architecture phase:**
- - `/architecture-decision` — record key technical decisions (one per system)
+ - `/create-architecture` — produce the master architecture blueprint and Required ADR list
+ - `/architecture-decision (×N)` — record key technical decisions, following the Required ADR list
- `/create-control-manifest` — compile decisions into an actionable rules sheet
- `/architecture-review` — validate architecture coverage
- **Production phase:**
+ **Pre-Production phase:**
+ - `/ux-design` — author UX specs for key screens (main menu, HUD, core interactions)
+ - `/prototype` — build a throwaway prototype to validate the core mechanic
+ - `/playtest-report (×1+)` — document each vertical slice playtest session
- `/create-epics` — map systems to epics
- `/create-stories` — break epics into implementable stories
- `/sprint-plan` — plan the first sprint
+ **Production phase:** → pick up stories with `/dev-story`
#### If B: Vague idea
@@ -80,18 +86,24 @@ The user needs creative exploration before anything else.
**Concept phase:**
- `/brainstorm [hint]` — develop the idea into a full concept
- `/setup-engine` — configure the engine
+ - `/art-bible` — define visual identity (uses the Visual Identity Anchor that `/brainstorm` produces)
- `/map-systems` — decompose the concept into systems
- `/design-system` — author a GDD for each MVP system
- `/review-all-gdds` — cross-system consistency check
- `/gate-check` — validate readiness before architecture work
**Architecture phase:**
- - `/architecture-decision` — record key technical decisions (one per system)
+ - `/create-architecture` — produce the master architecture blueprint and Required ADR list
+ - `/architecture-decision (×N)` — record key technical decisions, following the Required ADR list
- `/create-control-manifest` — compile decisions into an actionable rules sheet
- `/architecture-review` — validate architecture coverage
- **Production phase:**
+ **Pre-Production phase:**
+ - `/ux-design` — author UX specs for key screens (main menu, HUD, core interactions)
+ - `/prototype` — build a throwaway prototype to validate the core mechanic
+ - `/playtest-report (×1+)` — document each vertical slice playtest session
- `/create-epics` — map systems to epics
- `/create-stories` — break epics into implementable stories
- `/sprint-plan` — plan the first sprint
+ **Production phase:** → pick up stories with `/dev-story`
#### If C: Clear concept
@@ -103,20 +115,26 @@ The user needs creative exploration before anything else.
- `Jump straight in` — Go to `/setup-engine` now and write the GDD manually afterward
3. Show the recommended path:
**Concept phase:**
- - `/brainstorm` or `/setup-engine` (their pick)
+ - `/brainstorm` or `/setup-engine` — their pick from step 2
+ - `/art-bible` — define visual identity (after brainstorm if run, or after concept doc exists)
- `/design-review` — validate the concept doc
- `/map-systems` — decompose the concept into individual systems
- `/design-system` — author a GDD for each MVP system
- `/review-all-gdds` — cross-system consistency check
- `/gate-check` — validate readiness before architecture work
**Architecture phase:**
- - `/architecture-decision` — record key technical decisions (one per system)
+ - `/create-architecture` — produce the master architecture blueprint and Required ADR list
+ - `/architecture-decision (×N)` — record key technical decisions, following the Required ADR list
- `/create-control-manifest` — compile decisions into an actionable rules sheet
- `/architecture-review` — validate architecture coverage
- **Production phase:**
+ **Pre-Production phase:**
+ - `/ux-design` — author UX specs for key screens (main menu, HUD, core interactions)
+ - `/prototype` — build a throwaway prototype to validate the core mechanic
+ - `/playtest-report (×1+)` — document each vertical slice playtest session
- `/create-epics` — map systems to epics
- `/create-stories` — break epics into implementable stories
- `/sprint-plan` — plan the first sprint
+ **Production phase:** → pick up stories with `/dev-story`
#### If D: Existing work
@@ -155,15 +173,15 @@ Check if `production/review-mode.txt` already exists.
- **Prompt**: "One setup choice: how much design review would you want as you work through the workflow?"
- **Options**:
- - `Full (recommended)` — Director specialists review at each key workflow step. Best for new projects or when you want structured feedback on your decisions.
- - `Lean` — Directors only at phase gate transitions (/gate-check). Skips per-skill reviews. For experienced users who trust their own design work.
+ - `Full` — Director specialists review at each key workflow step. Best for teams, learning the workflow, or when you want thorough feedback on every decision.
+ - `Lean (recommended)` — Directors only at phase gate transitions (/gate-check). Skips per-skill reviews. Balanced approach for solo devs and small teams.
- `Solo` — No director reviews at all. Maximum speed. Best for game jams, prototypes, or if the reviews feel like overhead.
Write the choice to `production/review-mode.txt` immediately after the user
selects — no separate "May I write?" needed, as the write is a direct
consequence of the selection:
-- `Full (recommended)` → write `full`
-- `Lean` → write `lean`
+- `Full` → write `full`
+- `Lean (recommended)` → write `lean`
- `Solo` → write `solo`
Create the `production/` directory if it does not exist.
@@ -193,7 +211,7 @@ Verdict: **COMPLETE** — user oriented and handed off to next step.
- **User picks D but project is empty**: Gently redirect — "It looks like the project is a fresh template with no artifacts yet. Would Path A or B be a better fit?"
- **User picks A but project has code**: Mention what you found — "I noticed there's already code in `src/`. Did you mean to pick D (existing work)?"
-- **User is returning (engine configured, concept exists)**: Skip onboarding entirely — "It looks like you're already set up! Your engine is [X] and you have a game concept at `design/gdd/game-concept.md`. Review mode: `[read from production/review-mode.txt, or 'full (default)' if missing]`. Want to pick up where you left off? Try `/sprint-plan` or just tell me what you'd like to work on."
+- **User is returning (engine configured, concept exists)**: Skip onboarding entirely — "It looks like you're already set up! Your engine is [X] and you have a game concept at `design/gdd/game-concept.md`. Review mode: `[read from production/review-mode.txt, or 'lean (default)' if missing]`. Want to pick up where you left off? Try `/sprint-plan` or just tell me what you'd like to work on."
- **User doesn't fit any option**: Let them describe their situation in their own words and adapt.
---
diff --git a/.claude/skills/story-done/SKILL.md b/.claude/skills/story-done/SKILL.md
index 227e464..f9a4fd2 100644
--- a/.claude/skills/story-done/SKILL.md
+++ b/.claude/skills/story-done/SKILL.md
@@ -3,7 +3,7 @@ name: story-done
description: "End-of-story completion review. Reads the story file, verifies each acceptance criterion against the implementation, checks for GDD/ADR deviations, prompts code review, updates story status to Complete, and surfaces the next ready story from the sprint."
argument-hint: "[story-file-path] [--review full|lean|solo]"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Bash, Edit
+allowed-tools: Read, Glob, Grep, Bash, Edit, AskUserQuestion, Task
---
# Story Done
@@ -20,8 +20,12 @@ forgotten, and the story file reflects actual completion status.
## Phase 1: Find the Story
-Extract `--review [full|lean|solo]` if present and store as the review mode
-override for this run (see `.claude/docs/director-gates.md`).
+Resolve the review mode (once, store for all gate spawns this run):
+1. If `--review [full|lean|solo]` was passed → use that
+2. Else read `production/review-mode.txt` → use that value
+3. Else → default to `lean`
+
+See `.claude/docs/director-gates.md` for the full check pattern.
**If a file path is provided** (e.g., `/story-done production/epics/core/story-damage-calculator.md`):
read that file directly.
@@ -149,15 +153,19 @@ Based on the Story Type extracted in Phase 2, check for required evidence:
| **UI** | Manual walkthrough doc OR interaction test in `production/qa/evidence/` | ADVISORY |
| **Config/Data** | Smoke check pass report in `production/qa/smoke-*.md` | ADVISORY |
-**For Logic stories**: use `Glob` to check `tests/unit/[system]/` for a test
-file matching the story slug. If none found:
-- Flag as **BLOCKING**: "Logic story has no unit test file. Expected at
- `tests/unit/[system]/[story-slug]_test.[ext]`. Create and run the test
- before marking this story Complete."
+**For Logic stories**: first read the story's **Test Evidence** section to extract the
+exact required file path. Use `Glob` to check that exact path. If the exact path is not
+found, also search `tests/unit/[system]/` broadly (the file may have been placed at a
+slightly different location). If no test file is found at either location:
+- Flag as **BLOCKING**: "Logic story has no unit test file. Story requires it at
+ `[exact-path-from-Test-Evidence-section]`. Create and run the test before marking
+ this story Complete."
-**For Integration stories**: check `tests/integration/[system]/` AND
-`production/session-logs/` for a playtest record referencing this story.
-If neither exists: flag as **BLOCKING** (same rule as Logic).
+**For Integration stories**: read the story's **Test Evidence** section for the exact
+required path. Use `Glob` to check that exact path first, then search
+`tests/integration/[system]/` broadly, then check `production/session-logs/` for a
+playtest record referencing this story.
+If none found: flag as **BLOCKING** (same rule as Logic).
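The exact-path-then-broad-search lookup described for Logic and Integration stories can be sketched as a helper. The function name and the basename fallback are illustrative assumptions, not skill requirements:

```shell
find_test_evidence() {
  # $1: exact path from the story's Test Evidence section
  # $2: directory to search broadly if the exact path is missing
  if [ -f "$1" ]; then
    printf '%s\n' "$1"
    return 0
  fi
  # Fallback: look for a file with the same basename anywhere under $2
  match=$(find "$2" -type f -name "$(basename "$1")" 2>/dev/null | head -n 1)
  if [ -n "$match" ]; then
    printf '%s\n' "$match"
    return 0
  fi
  return 1
}
```

A nonzero return corresponds to the BLOCKING flag: no evidence at the required path or anywhere nearby.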
**For Visual/Feel and UI stories**: glob `production/qa/evidence/` for a file
referencing this story. If none: flag as **ADVISORY** —
@@ -217,8 +225,39 @@ For each deviation found, categorize:
---
+## Phase 4b: QA Coverage Gate
+
+**Review mode check** — apply before spawning QL-TEST-COVERAGE:
+- `solo` → skip. Note: "QL-TEST-COVERAGE skipped — Solo mode." Proceed to Phase 5.
+- `lean` → skip (not a PHASE-GATE). Note: "QL-TEST-COVERAGE skipped — Lean mode." Proceed to Phase 5.
+- `full` → spawn as normal.
+
+After completing the deviation checks in Phase 4, spawn `qa-lead` via Task using gate **QL-TEST-COVERAGE** (`.claude/docs/director-gates.md`).
+
+Pass:
+- The story file path and story type
+- Test file paths found during Phase 3 (exact paths, or "none found")
+- The story's `## QA Test Cases` section (the pre-written test specs from story creation)
+- The story's `## Acceptance Criteria` list
+
+The qa-lead reviews whether the tests actually cover what was specified — not just whether files exist.
+
+Apply the verdict:
+- **ADEQUATE** → proceed to Phase 5
+- **GAPS** → flag as **ADVISORY**: "QA lead identified coverage gaps: [list]. Story can complete but gaps should be addressed in a follow-up story."
+- **INADEQUATE** → flag as **BLOCKING**: "QA lead: critical logic is untested. Verdict cannot be COMPLETE until coverage improves. Specific gaps: [list]."
+
+Config/Data stories skip this phase entirely (no code tests required); check the story type before applying the review-mode check above.
+
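The three-way verdict handling can be summarized as a small dispatch. This is a sketch; the function name and message strings are illustrative:

```shell
apply_coverage_verdict() {
  # $1: qa-lead verdict from the QL-TEST-COVERAGE gate
  case "$1" in
    ADEQUATE)   printf 'proceed to Phase 5\n' ;;
    GAPS)       printf 'ADVISORY: record gaps for a follow-up story\n' ;;
    INADEQUATE) printf 'BLOCKING: coverage must improve first\n'; return 1 ;;
    *)          printf 'unknown verdict: %s\n' "$1"; return 2 ;;
  esac
}
```

Only INADEQUATE blocks completion; GAPS lets the story close while carrying an advisory note forward.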
+---
+
## Phase 5: Lead Programmer Code Review Gate
+**Review mode check** — apply before spawning LP-CODE-REVIEW:
+- `solo` → skip. Note: "LP-CODE-REVIEW skipped — Solo mode." Proceed to Phase 6 (completion report).
+- `lean` → skip (not a PHASE-GATE). Note: "LP-CODE-REVIEW skipped — Lean mode." Proceed to Phase 6 (completion report).
+- `full` → spawn as normal.
+
Spawn `lead-programmer` via Task using gate **LP-CODE-REVIEW** (`.claude/docs/director-gates.md`).
Pass: implementation file paths, story file path, relevant GDD section, governing ADR.
@@ -346,13 +385,25 @@ Run `/story-readiness [path]` to confirm a story is implementation-ready
before starting.
```
-If no more stories are ready in this sprint:
-"No more stories ready in this sprint. Consider running `/sprint-status` to
-assess sprint health."
+If no actionable Must Have stories remain in this sprint (every one is Complete or Blocked):
-If all Must Have stories are complete:
-"All Must Have stories are complete. Consider running `/milestone-review` or
-pulling from the Should Have list."
+```
+### Sprint Close-Out Sequence
+
+All Must Have stories are resolved (Complete or Blocked). QA sign-off is required before advancing.
+Run these in order:
+
+1. `/smoke-check sprint` — verify the critical path still works end-to-end
+2. `/team-qa sprint` — full QA cycle: test case execution, bug triage, sign-off report
+3. `/gate-check` — advance to the next phase once QA approves
+
+Do not run `/gate-check` until `/team-qa` returns APPROVED or APPROVED WITH CONDITIONS.
+```
+
+If there are Should Have stories still unstarted, surface them alongside the close-out sequence so the user can choose: close the sprint now, or pull in more work first.
+
+If no more stories are ready but Must Have stories are still In Progress (not Complete):
+"No more stories ready to start — [N] Must Have stories still in progress. Continue implementing those before sprint close-out."
---
diff --git a/.claude/skills/story-readiness/SKILL.md b/.claude/skills/story-readiness/SKILL.md
index 2c682c0..ddc5ecb 100644
--- a/.claude/skills/story-readiness/SKILL.md
+++ b/.claude/skills/story-readiness/SKILL.md
@@ -3,8 +3,7 @@ name: story-readiness
description: "Validate that a story file is implementation-ready. Checks for embedded GDD requirements, ADR references, engine notes, clear acceptance criteria, and no open design questions. Produces READY / NEEDS WORK / BLOCKED verdict with specific gaps. Use when user says 'is this story ready', 'can I start on this story', 'is story X ready to implement'."
argument-hint: "[story-file-path or 'all' or 'sprint']"
user-invocable: true
-allowed-tools: Read, Glob, Grep
-context: fork
+allowed-tools: Read, Glob, Grep, AskUserQuestion
model: haiku
---
diff --git a/.claude/skills/team-level/SKILL.md b/.claude/skills/team-level/SKILL.md
index 8822249..b5dc161 100644
--- a/.claude/skills/team-level/SKILL.md
+++ b/.claude/skills/team-level/SKILL.md
@@ -38,7 +38,10 @@ Always provide full context in each agent's prompt (game concept, pillars, exist
3. **Orchestrate the level design team** in sequence:
-### Step 1: Narrative Context (narrative-director + world-builder)
+### Step 1: Narrative + Visual Direction (narrative-director + world-builder + art-director, parallel)
+
+Spawn all three agents simultaneously — issue all three Task calls before waiting for any result.
+
Spawn the `narrative-director` agent to:
- Define the narrative purpose of this area (what story beats happen here?)
- Identify key characters, dialogue triggers, and lore elements
@@ -49,15 +52,29 @@ Spawn the `world-builder` agent to:
- Define environmental storytelling opportunities
- Specify any world rules that affect gameplay in this area
-**Gate**: Use `AskUserQuestion` to present Step 1 outputs and confirm before proceeding to Step 2.
+Spawn the `art-director` agent to:
+- Establish visual theme targets for this area — these are INPUTS to layout, not outputs of it
+- Define the color temperature and lighting mood for this area (how does it differ from adjacent areas?)
+- Specify shape language direction (angular fortress? organic cave? decayed grandeur?)
+- Name the primary visual landmarks that will orient the player
+- Read `design/art/art-bible.md` if it exists — anchor all direction in the established art bible
+
+**The art-director's visual targets from Step 1 must be passed to the level-designer in Step 2** as explicit constraints. Layout decisions happen within the visual direction, not before it.
+
+**Gate**: Use `AskUserQuestion` to present all three Step 1 outputs (narrative brief, lore foundation, visual direction targets) and confirm before proceeding to Step 2.
### Step 2: Layout and Encounter Design (level-designer)
-Spawn the `level-designer` agent to:
-- Design the spatial layout (critical path, optional paths, secrets)
-- Define pacing curve (tension peaks, rest areas, exploration zones)
+Spawn the `level-designer` agent with the full Step 1 output as context:
+- Narrative brief (from narrative-director)
+- Lore foundation (from world-builder)
+- **Visual direction targets (from art-director)** — layout must work within these targets, not contradict them
+
+The level-designer should:
+- Design the spatial layout (critical path, optional paths, secrets) — ensuring primary routes align with the visual landmark targets from Step 1
+- Define pacing curve (tension peaks, rest areas, exploration zones) — coordinated with the emotional arc from narrative-director
- Place encounters with difficulty progression
- Design environmental puzzles or navigation challenges
-- Define points of interest and landmarks for wayfinding
+- Define points of interest and landmarks for wayfinding — these must match the visual landmarks the art-director specified
- Specify entry/exit points and connections to adjacent areas
**Adjacent area dependency check**: After the layout is produced, check `design/levels/` for each adjacent area referenced by the level-designer. If any referenced area's `.md` file does not exist, surface the gap:
@@ -81,13 +98,16 @@ Spawn the `systems-designer` agent to:
**Gate**: Use `AskUserQuestion` to present Step 3 outputs and confirm before proceeding to Step 4.
-### Step 4: Visual Direction and Accessibility (parallel)
-Spawn the `art-director` agent to:
-- Define the visual theme and color palette for the area
-- Specify lighting mood and time-of-day settings
-- List required art assets (environment props, unique assets)
-- Define visual landmarks and sight lines
-- Specify any special VFX needs (weather, particles, fog)
+### Step 4: Production Concepts + Accessibility (art-director + accessibility-specialist, parallel)
+
+**Note**: The art-director's directional pass (visual theme, color targets, mood) happened in Step 1. This pass produces location-specific production concepts: given the finalized layout, what does each specific space look like?
+
+Spawn the `art-director` agent with the finalized layout from Step 2:
+- Produce location-specific concept specs for key spaces (entrance, key encounter zones, landmarks, exits)
+- Specify which art assets are unique to this area vs. shared from the global pool
+- Define sight-line and lighting setups per key space (these are now layout-informed, not directional)
+- Specify VFX needs that are specific to this area's layout (weather volumes, particles, atmospheric effects)
+- Flag any locations where the layout creates visual direction conflicts with the Step 1 targets — surface these as production risks
Spawn the `accessibility-specialist` agent in parallel to:
- Review the level layout for navigation clarity (can players orient themselves without relying on color alone?)
diff --git a/.claude/skills/team-narrative/SKILL.md b/.claude/skills/team-narrative/SKILL.md
index 5ccd47b..373ad22 100644
--- a/.claude/skills/team-narrative/SKILL.md
+++ b/.claude/skills/team-narrative/SKILL.md
@@ -19,6 +19,7 @@ The user must approve before moving to the next phase.
- **narrative-director** — Story arcs, character design, dialogue strategy, narrative vision
- **writer** — Dialogue writing, lore entries, item descriptions, in-game text
- **world-builder** — World rules, faction design, history, geography, environmental storytelling
+- **art-director** — Character visual design, environmental visual storytelling, cutscene/cinematic tone
- **level-designer** — Level layouts that serve the narrative, pacing, environmental storytelling beats
## How to Delegate
@@ -27,6 +28,7 @@ Use the Task tool to spawn each team member as a subagent:
- `subagent_type: narrative-director` — Story arcs, character design, narrative vision
- `subagent_type: writer` — Dialogue writing, lore entries, in-game text
- `subagent_type: world-builder` — World rules, faction design, history, geography
+- `subagent_type: art-director` — Character visual profiles, environmental visual storytelling, cinematic tone
- `subagent_type: level-designer` — Level layouts that serve the narrative, pacing
- `subagent_type: localization-lead` — i18n validation, string key compliance, translation headroom
@@ -43,9 +45,10 @@ Delegate to **narrative-director**:
- Output: narrative brief with story requirements
### Phase 2: World Foundation (parallel)
-Delegate in parallel:
+Delegate in parallel — issue all three Task calls simultaneously before waiting for any result:
- **world-builder**: Create or update lore entries for factions, locations, and history relevant to this content. Cross-reference against existing lore for contradictions. Set canon level for new entries.
- **writer**: Draft character dialogue using voice profiles. Ensure all lines are under 120 characters, use named placeholders for variables, and are localization-ready.
+- **art-director**: Define character visual design direction for key characters appearing in this content (silhouette, visual archetype, distinguishing features). Specify environmental visual storytelling elements for each key space (prop composition, lighting notes, spatial arrangement). Define tone palette and cinematic direction for any cutscenes or scripted sequences.
### Phase 3: Level Narrative Integration
Delegate to **level-designer**:
diff --git a/.claude/skills/team-qa/SKILL.md b/.claude/skills/team-qa/SKILL.md
index bcfd001..f8ba570 100644
--- a/.claude/skills/team-qa/SKILL.md
+++ b/.claude/skills/team-qa/SKILL.md
@@ -3,7 +3,7 @@ name: team-qa
description: "Orchestrate the QA team through a full testing cycle. Coordinates qa-lead (strategy + test plan) and qa-tester (test case writing + bug reporting) to produce a complete QA package for a sprint or feature. Covers: test plan generation, test case writing, smoke check gate, manual QA execution, and sign-off report."
argument-hint: "[sprint | feature: system-name]"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Write, Task
+allowed-tools: Read, Glob, Grep, Write, Task, AskUserQuestion
agent: qa-lead
---
@@ -53,11 +53,16 @@ Prompt the qa-lead to:
- Identify which stories require automated test evidence vs. manual QA
- Flag any stories with missing acceptance criteria or missing test evidence that would block QA
- Estimate manual QA effort (number of test sessions needed)
-- Produce a strategy summary table:
+- Check `tests/smoke/` for smoke test scenarios; for each, assess whether it can be verified given the current build. Produce a smoke check verdict: **PASS** / **PASS WITH WARNINGS [list]** / **FAIL [list of failures]**
+- Produce a strategy summary table and smoke check result:
| Story | Type | Automated Required | Manual Required | Blocker? |
|-------|------|--------------------|-----------------|----------|
+ **Smoke Check**: [PASS / PASS WITH WARNINGS / FAIL] — [details if not PASS]
+
+If the smoke check result is **FAIL**, the qa-lead must list the failures prominently. QA cannot proceed past the strategy phase with a failed smoke check.
+
Present the qa-lead's full strategy to the user, then use `AskUserQuestion`:
```
@@ -66,9 +71,12 @@ options:
- "Looks good — proceed to test plan"
- "Adjust story types before proceeding"
- "Skip blocked stories and proceed with the rest"
+ - "Smoke check failed — fix issues and re-run /team-qa"
- "Cancel — resolve blockers first"
```
+If smoke check **FAIL**: do not proceed to Phase 3. Surface the failures and stop. The user must fix them and re-run `/team-qa`.
+If smoke check **PASS WITH WARNINGS**: note the warnings for the sign-off report and continue.
If blockers are present: list them explicitly. The user may choose to skip blocked stories or cancel the cycle.
### Phase 3: Test Plan Generation
@@ -88,26 +96,9 @@ Ask: "May I write the QA plan to `production/qa/qa-plan-[sprint]-[date].md`?"
Write only after receiving approval.
-### Phase 4: Smoke Check Gate
+### Phase 4: Test Case Writing (qa-tester)
-Before any manual QA begins, run the smoke check.
-
-Spawn `qa-lead` via Task with instructions to:
-- Review the `tests/smoke/` directory for the current smoke test list
-- Check whether each smoke test scenario can be verified given the current build
-- Produce a smoke check result: **PASS** / **PASS WITH WARNINGS** / **FAIL**
-
-Report the result to the user:
-
-- **PASS**: "Smoke check passed. Proceeding to test case writing."
-- **PASS WITH WARNINGS**: "Smoke check passed with warnings: [list issues]. These are non-blocking. Proceeding — note these for the sign-off report."
-- **FAIL**: "Smoke check failed. QA cannot begin until these issues are resolved:
- [list failures]
- Fix them and re-run `/smoke-check`, or re-run `/team-qa` once resolved."
-
-On FAIL: stop the cycle and surface the list of failures. Do not proceed.
-
-### Phase 5: Test Case Writing (qa-tester)
+> **Smoke check** is performed as part of Phase 2 (QA Strategy). If the smoke check returned FAIL in Phase 2, the cycle was stopped there. This phase only runs when the Phase 2 smoke check was PASS or PASS WITH WARNINGS.
For each story requiring manual QA (Visual/Feel, UI, Integration without automated tests):
diff --git a/.claude/skills/test-evidence-review/SKILL.md b/.claude/skills/test-evidence-review/SKILL.md
index c057df9..afa7dff 100644
--- a/.claude/skills/test-evidence-review/SKILL.md
+++ b/.claude/skills/test-evidence-review/SKILL.md
@@ -4,7 +4,6 @@ description: "Quality review of test files and manual evidence documents. Goes b
argument-hint: "[story-path | sprint | system-name]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write
-context: fork
---
# Test Evidence Review
diff --git a/.claude/skills/test-flakiness/SKILL.md b/.claude/skills/test-flakiness/SKILL.md
index 7b2661d..c2427af 100644
--- a/.claude/skills/test-flakiness/SKILL.md
+++ b/.claude/skills/test-flakiness/SKILL.md
@@ -4,7 +4,6 @@ description: "Detect non-deterministic (flaky) tests by reading CI run logs or t
argument-hint: "[ci-log-path | scan | registry]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit, Bash
-context: fork
---
# Test Flakiness Detection
diff --git a/.claude/skills/test-helpers/SKILL.md b/.claude/skills/test-helpers/SKILL.md
index dce8760..a7e10b1 100644
--- a/.claude/skills/test-helpers/SKILL.md
+++ b/.claude/skills/test-helpers/SKILL.md
@@ -4,7 +4,6 @@ description: "Generate engine-specific test helper libraries for the project's t
argument-hint: "[system-name | all | scaffold]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write
-context: fork
---
# Test Helpers
diff --git a/.claude/skills/ux-design/SKILL.md b/.claude/skills/ux-design/SKILL.md
index dcea1dc..77630e5 100644
--- a/.claude/skills/ux-design/SKILL.md
+++ b/.claude/skills/ux-design/SKILL.md
@@ -3,8 +3,7 @@ name: ux-design
description: "Guided, section-by-section UX spec authoring for a screen, flow, or HUD. Reads game concept, player journey, and relevant GDDs to provide context-aware design guidance. Produces ux-spec.md (per screen/flow) or hud-design.md using the studio templates."
argument-hint: "[screen/flow name] or 'hud' or 'patterns'"
user-invocable: true
-allowed-tools: Read, Glob, Grep, Write, Edit, AskUserQuestion
-context: fork
+allowed-tools: Read, Glob, Grep, Write, Edit, AskUserQuestion, Task
agent: ux-designer
---
@@ -81,9 +80,8 @@ so you can reference them rather than reinvent them.
### 2f: Art Bible
-Check for `docs/art-bible.md` or `design/art-bible.md`. If found, read the
-visual direction section. UX layout must align with the aesthetic commitments
-already made.
+Check for `design/art/art-bible.md`. If found, read the visual direction
+section. UX layout must align with the aesthetic commitments already made.
### 2g: Accessibility Requirements
@@ -162,6 +160,18 @@ Ask: "May I create the skeleton file at `design/ux/[filename].md`?"
---
+## Navigation Position
+
+[To be designed]
+
+---
+
+## Entry & Exit Points
+
+[To be designed]
+
+---
+
## Layout Specification
### Information Hierarchy
@@ -194,6 +204,18 @@ Ask: "May I create the skeleton file at `design/ux/[filename].md`?"
---
+## Events Fired
+
+[To be designed]
+
+---
+
+## Transitions & Animations
+
+[To be designed]
+
+---
+
## Data Requirements
[To be designed]
@@ -206,6 +228,18 @@ Ask: "May I create the skeleton file at `design/ux/[filename].md`?"
---
+## Localization Considerations
+
+[To be designed]
+
+---
+
+## Acceptance Criteria
+
+[To be designed]
+
+---
+
## Open Questions
[To be designed]
@@ -383,6 +417,40 @@ Offer to map this against the journey phases if the player journey doc exists.
---
+#### Section B2: Navigation Position
+
+Where does this screen sit in the game's navigation hierarchy? This is a one-paragraph orientation map — not a full flow diagram.
+
+**Questions to ask**:
+- "Is this screen accessed from the main menu, from pause, from within gameplay, or from another screen?"
+- "Is it a top-level destination (always reachable) or a context-dependent one (only accessible in certain states)?"
+- "Can the player reach this screen from more than one place in the game?"
+
+Present as: "This screen lives at: [root] → [parent] → [this screen]" plus any alternate entry paths.
+
+---
+
+#### Section B3: Entry & Exit Points
+
+Map every way the player can arrive at and leave this screen.
+
+**Questions to ask**:
+- "What are all the ways a player can reach this screen?" (List each trigger: button press, game event, redirect from another screen, etc.)
+- "What can the player do to exit? What happens when they do?" (Back button, confirm action, timeout, game event)
+- "Are there any exits that are one-way — where the player cannot return to this screen without starting over?"
+
+Present as two tables:
+
+| Entry Source | Trigger | Player carries this context |
+|---|---|---|
+| [screen/event] | [how] | [state/data they arrive with] |
+
+| Exit Destination | Trigger | Notes |
+|---|---|---|
+| [screen/event] | [how] | [any irreversible state changes] |
+
+---
+
#### Section C: Layout Specification
This is the largest and most interactive section. Work through it in sub-sections:
@@ -459,6 +527,41 @@ an existing UX spec or note it as a spec dependency.
---
+#### Section E2: Events Fired
+
+For every player action in the Interaction Map, document the corresponding event the game or analytics system should fire — or explicitly note "no event" if none applies.
+
+**Questions to ask**:
+- "For each action, should the game fire an analytics event, trigger a game-state change, or both?"
+- "Are there any actions that should NOT fire an event — and is that a deliberate choice?"
+
+Present as a table alongside the Interaction Map:
+
+| Player Action | Event Fired | Payload / Data |
+|---|---|---|
+| [action] | [EventName] or none | [data passed with event] |
+
+Flag any action that modifies persistent game state (save data, progress, economy) — these need explicit attention from the architecture team.
+
+---
+
+#### Section E3: Transitions & Animations
+
+Specify how the screen enters and exits, and how it responds to state changes.
+
+**Questions to ask**:
+- "How does this screen appear? (fade in, slide from right, instant pop, scale from button)"
+- "How does it dismiss? (fade out, slide back, cut)"
+- "Are there any in-screen state transitions that need animation? (loading spinner, success state, error flash)"
+- "Is there any animation that could cause motion sickness — and does the game have a reduced-motion option?"
+
+Minimum required:
+- Screen enter transition
+- Screen exit transition
+- At least one state-change animation if the screen has multiple states
+
+---
+
#### Section F: Data Requirements
Cross-reference the GDD UI Requirements sections gathered in Phase 2.
@@ -499,6 +602,45 @@ Use `AskUserQuestion` to surface any open questions on accessibility tier:
---
+#### Section H: Localization Considerations
+
+Document constraints that affect how this screen behaves when text is translated.
+
+**Questions to ask**:
+- "Which text elements on this screen are the longest? What is the maximum character count that fits the layout?"
+- "Are there any elements where text length is layout-critical — e.g., a button label that must stay on one line?"
+- "Are there any elements that display numbers, dates, or currencies that need locale-specific formatting?"
+
+Note: aim to flag any element where a 40% text expansion (common in translations from English to German or French) would break the layout. Mark those as HIGH PRIORITY for the localization engineer.
+
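That 40% rule of thumb can be checked mechanically. This is a sketch with invented names; the 40% threshold is a heuristic and real expansion varies by language pair:

```shell
expansion_risk() {
  # $1: source-language string, $2: max characters the layout can hold
  len=${#1}
  expanded=$(( len * 140 / 100 ))   # assume 40% growth in translation
  if [ "$expanded" -gt "$2" ]; then
    printf 'HIGH PRIORITY\n'
  else
    printf 'ok\n'
  fi
}
```

For example, a 10-character English button label in a 12-character slot fails the check, since a 40% expansion yields 14 characters.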
+---
+
+#### Section I: Acceptance Criteria
+
+Write at least 5 specific, testable criteria that a QA tester can verify without reading any other design document. These become the pass/fail conditions for `/story-done`.
+
+**Format**: Use checkboxes. Each criterion must be verifiable by a human tester:
+
+```
+- [ ] Screen opens within [X]ms from [trigger]
+- [ ] [Element] displays correctly at [minimum] and [maximum] values
+- [ ] [Navigation action] correctly routes to [destination screen]
+- [ ] Error state appears when [condition] and shows [specific message or icon]
+- [ ] Keyboard/gamepad navigation reaches all interactive elements in logical order
+- [ ] [Accessibility requirement] is met — e.g., "all interactive elements have focus indicators"
+```
+
+**Minimum required**:
+- 1 performance criterion (load/open time)
+- 1 navigation criterion (at least one entry or exit path verified)
+- 1 error/empty state criterion
+- 1 accessibility criterion (per committed tier)
+- 1 criterion specific to this screen's core purpose
+
+Ask the user to confirm: "Do these criteria cover what would actually make this screen 'done' for your QA process?"
+
+---
+
### Section Guidance: HUD Design Mode
HUD design follows a different order from UX spec mode. Begin with philosophy;
@@ -699,14 +841,23 @@ Update `production/session-state/active.md` with:
### 6b: Suggest Next Step
-Use `AskUserQuestion`:
-- "The spec is complete. What's next?"
+Before presenting options, state clearly:
+
+> "This spec should be validated with `/ux-review` before it enters the
+> implementation pipeline. The Pre-Production gate requires all key screen specs
+> to have a review verdict."
+
+Then use `AskUserQuestion`:
+- "Run `/ux-review [filename]` now, or do something else first?"
- Options:
- - "Run `/ux-review` to validate this spec"
- - "Design another screen"
+ - "Run `/ux-review` now — validate this spec"
+ - "Design another screen first, then review all specs together"
- "Update the interaction pattern library with new patterns from this spec"
- "Stop here for this session"
+If the user picks "Design another screen first", add a note: "Reminder: run
+`/ux-review` on all completed specs before running `/gate-check pre-production`."
+
### 6c: Cross-Link Related Specs
If other UX specs link to or from this screen, note which ones should reference
@@ -740,7 +891,7 @@ specific sub-topics, additional context or coordination may be needed:
| Implementation feasibility (engine constraints) | `ui-programmer` — before finalizing component inventory |
| Gameplay data requirements | `game-designer` — when data ownership is unclear |
| Narrative/lore visible in the UI | `narrative-director` — for flavor text, item names, lore panels |
-| Accessibility tier decisions | `ux-designer` (owns this) |
+| Accessibility tier decisions | Handled by this session — owned by ux-designer |
When delegating to another agent via the Task tool:
- Provide: screen name, game concept summary, the specific question needing expert input
diff --git a/.claude/skills/ux-review/SKILL.md b/.claude/skills/ux-review/SKILL.md
index 48c31e9..609bf69 100644
--- a/.claude/skills/ux-review/SKILL.md
+++ b/.claude/skills/ux-review/SKILL.md
@@ -4,7 +4,6 @@ description: "Validates a UX spec, HUD design, or interaction pattern library fo
argument-hint: "[file-path or 'all' or 'hud' or 'patterns']"
user-invocable: true
allowed-tools: Read, Glob, Grep
-context: fork
agent: ux-designer
---
diff --git a/.claude/statusline.sh b/.claude/statusline.sh
index ef094fa..62ccaac 100644
--- a/.claude/statusline.sh
+++ b/.claude/statusline.sh
@@ -64,15 +64,23 @@ if [ -z "$stage" ]; then
src_count=$(find "$cwd/src" -type f \( -name "*.gd" -o -name "*.cs" -o -name "*.cpp" -o -name "*.h" -o -name "*.py" -o -name "*.rs" -o -name "*.lua" -o -name "*.tscn" -o -name "*.tres" \) 2>/dev/null | wc -l | tr -d ' ')
fi
+ # Check for ADRs (signals Pre-Production phase)
+ has_adrs=false
+ if ls "$cwd/docs/architecture/"adr-*.md 2>/dev/null | head -1 | grep -q .; then
+ has_adrs=true
+ fi
+
# Determine stage (check from most-advanced backward)
if [ "$src_count" -ge 10 ] 2>/dev/null; then
stage="Production"
- elif [ "$engine_configured" = true ]; then
+ elif [ "$has_adrs" = true ]; then
stage="Pre-Production"
- elif [ "$has_systems" = true ]; then
+ elif [ "$engine_configured" = true ]; then
stage="Technical Setup"
- elif [ "$has_concept" = true ]; then
+ elif [ "$has_systems" = true ]; then
stage="Systems Design"
+ elif [ "$has_concept" = true ]; then
+ stage="Concept"
else
stage="Concept"
fi
diff --git a/CCGS Skill Testing Framework/CLAUDE.md b/CCGS Skill Testing Framework/CLAUDE.md
new file mode 100644
index 0000000..b326690
--- /dev/null
+++ b/CCGS Skill Testing Framework/CLAUDE.md
@@ -0,0 +1,94 @@
+# CCGS Skill Testing Framework — Claude Instructions
+
+This folder is the quality assurance layer for the Claude Code Game Studios skill/agent
+framework. It is self-contained and separate from any game project.
+
+## Key files
+
+| File | Purpose |
+|------|---------|
+| `catalog.yaml` | Master registry for all 72 skills and 49 agents. Contains category, spec path, and last-test tracking fields. Always read this first when running any test command. |
+| `quality-rubric.md` | Category-specific pass/fail metrics. Read the matching `###` section for the skill's category when running `/skill-test category`. |
+| `skills/[category]/[name].md` | Behavioral spec for a skill — 5 test cases + protocol compliance assertions. |
+| `agents/[tier]/[name].md` | Behavioral spec for an agent — 5 test cases + protocol compliance assertions. |
+| `templates/skill-test-spec.md` | Template for writing new skill spec files. |
+| `templates/agent-test-spec.md` | Template for writing new agent spec files. |
+| `results/` | Written by `/skill-test spec` when results are saved. Gitignored. |
+
+## Path conventions
+
+- Skill specs: `CCGS Skill Testing Framework/skills/[category]/[name].md`
+- Agent specs: `CCGS Skill Testing Framework/agents/[tier]/[name].md`
+- Catalog: `CCGS Skill Testing Framework/catalog.yaml`
+- Rubric: `CCGS Skill Testing Framework/quality-rubric.md`
+
+The `spec:` field in `catalog.yaml` is the authoritative path for each skill/agent spec.
+Always read it rather than guessing the path.
+
+## Skill categories
+
+```
+gate → gate-check
+review → design-review, architecture-review, review-all-gdds
+authoring → design-system, quick-design, architecture-decision, art-bible,
+ create-architecture, ux-design, ux-review
+readiness → story-readiness, story-done
+pipeline → create-epics, create-stories, dev-story, create-control-manifest,
+ propagate-design-change, map-systems
+analysis → consistency-check, balance-check, content-audit, code-review,
+ tech-debt, scope-check, estimate, perf-profile, asset-audit,
+ security-audit, test-evidence-review, test-flakiness
+team → team-combat, team-narrative, team-audio, team-level, team-ui,
+ team-qa, team-release, team-polish, team-live-ops
+sprint → sprint-plan, sprint-status, milestone-review, retrospective,
+ changelog, patch-notes
+utility → all remaining skills
+```
+
+## Agent tiers
+
+```
+directors → creative-director, technical-director, producer, art-director
+leads → lead-programmer, narrative-director, audio-director, ux-designer,
+ qa-lead, release-manager, localization-lead
+specialists → gameplay-programmer, engine-programmer, ui-programmer,
+ tools-programmer, network-programmer, ml-engineer, ai-programmer,
+ level-designer, sound-designer, technical-artist
+godot → godot-specialist, godot-gdscript-specialist, godot-csharp-specialist,
+ godot-shader-specialist, godot-gdextension-specialist
+unity → unity-specialist, unity-ui-specialist, unity-shader-specialist,
+ unity-dots-specialist, unity-addressables-specialist
+unreal → unreal-specialist, ue-gas-specialist, ue-replication-specialist,
+ ue-umg-specialist, ue-blueprint-specialist
+operations → devops-engineer, deployment-engineer, database-admin,
+ security-engineer, performance-analyst, analytics-engineer,
+ community-manager
+creative → writer, world-builder, game-designer, economy-designer,
+ systems-designer, prototyper
+```
+
+## Workflow for testing a skill
+
+1. Read `catalog.yaml` to get the skill's `spec:` path and `category:`
+2. Read the skill at `.claude/skills/[name]/SKILL.md`
+3. Read the spec at the `spec:` path
+4. Evaluate assertions case by case
+5. Offer to write results to `results/` and update `catalog.yaml`
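+
+As a sketch of what steps 1–3 read, a `catalog.yaml` skill entry might look like
+the following — the field values shown are illustrative assumptions, not the
+authoritative schema; check `catalog.yaml` itself for the real shape:
+
+```yaml
+skills:
+  gate-check:
+    category: gate                      # used to pick the rubric in quality-rubric.md
+    spec: skills/gate/gate-check.md     # spec path read in step 3
+```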
+
+## Workflow for improving a skill
+
+Use `/skill-improve [name]`. It handles the full loop:
+test → diagnose → propose fix → rewrite → retest → keep or revert.
+
+## Spec validity note
+
+Specs in this folder describe **current behavior**, not ideal behavior. They were
+written by reading the skills, so they may encode bugs. When a skill misbehaves in
+practice, correct the skill first, then update the spec to match the fixed behavior.
+Treat spec failures as "this needs investigation," not "the skill is definitively wrong."
+
+## This folder is deletable
+
+Nothing in `.claude/` imports from here. Deleting this folder has no effect on the
+CCGS skills or agents themselves. `/skill-test` and `/skill-improve` will report that
+`catalog.yaml` is missing and guide the user to initialize it.
diff --git a/CCGS Skill Testing Framework/README.md b/CCGS Skill Testing Framework/README.md
new file mode 100644
index 0000000..5b074e4
--- /dev/null
+++ b/CCGS Skill Testing Framework/README.md
@@ -0,0 +1,150 @@
+# CCGS Skill Testing Framework
+
+Quality assurance infrastructure for the **Claude Code Game Studios** framework.
+Tests the skills and agents themselves — not any game built with them.
+
+> **This folder is self-contained and optional.**
+> Game developers using CCGS don't need it. To remove it entirely:
+> `rm -rf "CCGS Skill Testing Framework"` — nothing in `.claude/` depends on it.
+
+---
+
+## What's in here
+
+```
+CCGS Skill Testing Framework/
+├── README.md ← you are here
+├── CLAUDE.md ← tells Claude how to use this framework
+├── catalog.yaml ← master registry: all 72 skills + 49 agents, coverage tracking
+├── quality-rubric.md ← category-specific pass/fail metrics for /skill-test category
+│
+├── skills/ ← behavioral spec files for skills (one per skill)
+│ ├── gate/ ← gate category specs
+│ ├── review/ ← review category specs
+│ ├── authoring/ ← authoring category specs
+│ ├── readiness/ ← readiness category specs
+│ ├── pipeline/ ← pipeline category specs
+│ ├── analysis/ ← analysis category specs
+│ ├── team/ ← team category specs
+│ ├── sprint/ ← sprint category specs
+│ └── utility/ ← utility category specs
+│
+├── agents/ ← behavioral spec files for agents (one per agent)
+│ ├── directors/ ← creative-director, technical-director, producer, art-director
+│ ├── leads/ ← lead-programmer, narrative-director, audio-director, etc.
+│ ├── specialists/ ← engine/code/shader/UI specialists
+│ ├── godot/ ← Godot-specific specialists
+│ ├── unity/ ← Unity-specific specialists
+│ ├── unreal/ ← Unreal-specific specialists
+│ ├── operations/ ← QA, live-ops, release, localization, etc.
+│ └── creative/ ← writer, world-builder, game-designer, etc.
+│
+├── templates/ ← spec file templates for writing new specs
+│ ├── skill-test-spec.md ← template for skill behavioral specs
+│ └── agent-test-spec.md ← template for agent behavioral specs
+│
+└── results/ ← test run outputs (written by /skill-test spec, gitignored)
+```
+
+---
+
+## How to use it
+
+All testing is driven by two skills already in the framework:
+
+### Check structural compliance
+
+```
+/skill-test static [skill-name] # Check one skill (7 checks)
+/skill-test static all # Check all 72 skills
+```
+
+### Run a behavioral spec test
+
+```
+/skill-test spec gate-check # Evaluate a skill against its written spec
+/skill-test spec design-review
+```
+
+### Check against category rubric
+
+```
+/skill-test category gate-check # Evaluate one skill against its category metrics
+/skill-test category all # Run rubric checks across all categorized skills
+```
+
+### See full coverage picture
+
+```
+/skill-test audit # Skills + agents: has-spec, last tested, result
+```
+
+### Improve a failing skill
+
+```
+/skill-improve gate-check # Test → diagnose → propose fix → retest loop
+```
+
+---
+
+## Skill categories
+
+| Category | Skills | Key metrics |
+|----------|--------|-------------|
+| `gate` | gate-check | Review mode read, full/lean/solo director panel, no auto-advance |
+| `review` | design-review, architecture-review, review-all-gdds | Read-only, 8-section check, correct verdicts |
+| `authoring` | design-system, quick-design, art-bible, create-architecture, … | Section-by-section May-I-write, skeleton-first |
+| `readiness` | story-readiness, story-done | Blockers surfaced, director gate in full mode |
+| `pipeline` | create-epics, create-stories, dev-story, map-systems, … | Upstream dependency check, handoff path clear |
+| `analysis` | consistency-check, balance-check, code-review, tech-debt, … | Read-only report, verdict keyword, no writes |
+| `team` | team-combat, team-narrative, team-audio, … | All required agents spawned, blockers surfaced |
+| `sprint` | sprint-plan, sprint-status, milestone-review, … | Reads sprint data, status keywords present |
+| `utility` | start, adopt, hotfix, localize, setup-engine, … | Passes static checks |
+
+---
+
+## Agent tiers
+
+| Tier | Agents |
+|------|--------|
+| `directors` | creative-director, technical-director, producer, art-director |
+| `leads` | lead-programmer, narrative-director, audio-director, ux-designer, qa-lead, release-manager, localization-lead |
+| `specialists` | gameplay-programmer, engine-programmer, ui-programmer, tools-programmer, network-programmer, ml-engineer, ai-programmer, level-designer, sound-designer, technical-artist |
+| `godot` | godot-specialist, godot-gdscript-specialist, godot-csharp-specialist, godot-shader-specialist, godot-gdextension-specialist |
+| `unity` | unity-specialist, unity-ui-specialist, unity-shader-specialist, unity-dots-specialist, unity-addressables-specialist |
+| `unreal` | unreal-specialist, ue-gas-specialist, ue-replication-specialist, ue-umg-specialist, ue-blueprint-specialist |
+| `operations` | devops-engineer, deployment-engineer, database-admin, security-engineer, performance-analyst, analytics-engineer, community-manager |
+| `creative` | writer, world-builder, game-designer, economy-designer, systems-designer, prototyper |
+
+---
+
+## Updating the catalog
+
+`catalog.yaml` tracks test coverage for every skill and agent. After running a test:
+
+- `/skill-test spec [name]` will offer to update `last_spec` and `last_spec_result`
+- `/skill-test category [name]` will offer to update `last_category` and `last_category_result`
+- `last_static` and `last_static_result` are updated manually or via `/skill-improve`
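+
+Putting the bullets above together, a skill entry after a couple of test runs
+might carry tracking fields like these (dates and exact field layout are
+assumptions for illustration — defer to the real `catalog.yaml` schema):
+
+```yaml
+gate-check:
+  spec: skills/gate/gate-check.md
+  last_spec: 2025-01-15           # written by /skill-test spec (on confirmation)
+  last_spec_result: pass
+  last_category: null             # category rubric not yet run
+  last_category_result: null
+  last_static: 2025-01-10         # updated manually or via /skill-improve
+  last_static_result: pass
+```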
+
+---
+
+## Writing a new spec
+
+1. Find the spec template at `templates/skill-test-spec.md`
+2. Copy it to `skills/[category]/[skill-name].md`
+3. Update the `spec:` field in `catalog.yaml` to point to the new file
+4. Run `/skill-test spec [skill-name]` to validate it
+
+---
+
+## Removing this framework
+
+This folder has no hooks into the main project. To remove:
+
+```bash
+rm -rf "CCGS Skill Testing Framework"
+```
+
+The skills `/skill-test` and `/skill-improve` will still function — they'll simply
+report that `catalog.yaml` is missing and suggest running `/skill-test audit` to
+initialize it.
diff --git a/CCGS Skill Testing Framework/agents/directors/art-director.md b/CCGS Skill Testing Framework/agents/directors/art-director.md
new file mode 100644
index 0000000..9218f3d
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/directors/art-director.md
@@ -0,0 +1,84 @@
+# Agent Test Spec: art-director
+
+## Agent Summary
+**Domain owned:** Visual identity, art bible authorship and enforcement, asset quality standards, UI/UX visual design, visual phase gate, concept art evaluation.
+**Does NOT own:** UX interaction flows and information architecture (ux-designer's domain), audio direction (audio-director), code implementation.
+**Model tier:** Sonnet (note: despite the "director" title, art-director is assigned Sonnet per coordination-rules.md — it handles individual system analysis, not multi-document phase gate synthesis at the Opus level).
+**Gate IDs handled:** AD-CONCEPT-VISUAL, AD-ART-BIBLE, AD-PHASE-GATE.
+
+---
+
+## Static Assertions (Structural)
+
+Verified by reading the agent's `.claude/agents/art-director.md` frontmatter:
+
+- [ ] `description:` field is present and domain-specific (references visual identity, art bible, asset standards — not generic)
+- [ ] `allowed-tools:` list is read-focused; image review capability if supported; no Bash unless asset pipeline checks are justified
+- [ ] Model tier is `claude-sonnet-4-6` (NOT Opus — coordination-rules.md assigns Sonnet to art-director)
+- [ ] Agent definition does not claim authority over UX interaction flows or audio direction
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output format
+**Scenario:** The art bible's color palette section is submitted for review. The section defines a desaturated earth-tone primary palette with high-contrast accent colors tied to the game pillar "beauty in decay." The palette is internally consistent and references the pillar vocabulary. Request is tagged AD-ART-BIBLE.
+**Expected:** Returns `AD-ART-BIBLE: APPROVE` with rationale confirming the palette's internal consistency and its alignment with the stated pillar.
+**Assertions:**
+- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT
+- [ ] Verdict token is formatted as `AD-ART-BIBLE: APPROVE`
+- [ ] Rationale references the specific palette characteristics and pillar alignment — not generic art advice
+- [ ] Output stays within visual domain — does not comment on UX interaction patterns or audio mood
+
+### Case 2: Out-of-domain request — redirects or escalates
+**Scenario:** Sound designer asks art-director to specify how ambient audio should layer and duck when the player enters a combat zone.
+**Expected:** Agent declines to define audio behavior and redirects to audio-director.
+**Assertions:**
+- [ ] Does not make any binding decision about audio layering or ducking behavior
+- [ ] Explicitly names `audio-director` as the correct handler
+- [ ] May note if the audio has visual mood implications (e.g., "the audio should match the visual tension of the zone"), but defers all audio specification to audio-director
+
+### Case 3: Gate verdict — correct vocabulary
+**Scenario:** Concept art for the protagonist is submitted. The art uses a vivid, saturated color palette (primary: #FF4500, #00BFFF) that directly contradicts the established art bible's "desaturated earth-tones" palette specification. Request is tagged AD-CONCEPT-VISUAL.
+**Expected:** Returns `AD-CONCEPT-VISUAL: CONCERNS` with specific citation of the palette discrepancy, referencing the art bible's stated palette values versus the submitted concept's palette.
+**Assertions:**
+- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT — not freeform text
+- [ ] Verdict token is formatted as `AD-CONCEPT-VISUAL: CONCERNS`
+- [ ] Rationale specifically identifies the palette conflict — not a generic "doesn't match style" comment
+- [ ] References the art bible as the authoritative source for the correct palette
+
+### Case 4: Conflict escalation — correct parent
+**Scenario:** ux-designer proposes using high-contrast, brightly colored icons for the HUD to improve readability. art-director believes this violates the art bible's muted visual language and would undermine the visual identity.
+**Expected:** art-director states the visual identity concern and references the art bible, acknowledges ux-designer's readability goal as legitimate, and escalates to creative-director to arbitrate the trade-off between visual coherence and usability.
+**Assertions:**
+- [ ] Escalates to `creative-director` (shared parent for creative domain conflicts)
+- [ ] Does not unilaterally override ux-designer's readability recommendation
+- [ ] Clearly frames the conflict as a trade-off between two legitimate goals
+- [ ] References the specific art bible rule being violated
+
+### Case 5: Context pass — uses provided context
+**Scenario:** Agent receives a gate context block that includes the existing art bible with specific palette values (primary: #8B7355, #6B6B47; accent: #C8A96E) and style rules ("no pure white, no pure black; all shadows have warm undertones"). A new asset is submitted for review.
+**Expected:** Assessment references the specific hex values and style rules from the provided art bible, not generic color theory advice. Any concerns are tied to specific violations of the provided rules.
+**Assertions:**
+- [ ] References specific palette values from the provided art bible context
+- [ ] Applies the specific style rules (no pure white/black, warm shadow undertones) from the provided document
+- [ ] Does not generate generic art direction feedback disconnected from the supplied art bible
+- [ ] Verdict rationale is traceable to specific lines or rules in the provided context
+
+---
+
+## Protocol Compliance
+
+- [ ] Returns verdicts using APPROVE / CONCERNS / REJECT vocabulary only
+- [ ] Stays within declared visual domain
+- [ ] Escalates UX-vs-visual conflicts to creative-director
+- [ ] Uses gate IDs in output (e.g., `AD-ART-BIBLE: APPROVE`) not inline prose verdicts
+- [ ] Does not make binding UX interaction, audio, or code implementation decisions
+
+---
+
+## Coverage Notes
+- AD-PHASE-GATE (full visual phase advancement) is not covered — deferred to integration with /gate-check skill.
+- Asset pipeline standards (file format, resolution, naming conventions) compliance checks are not covered here.
+- Shader visual output review is not covered — that interaction with the engine specialist is deferred.
+- UI component visual review (as distinct from UX flow review) could benefit from additional cases.
diff --git a/CCGS Skill Testing Framework/agents/directors/creative-director.md b/CCGS Skill Testing Framework/agents/directors/creative-director.md
new file mode 100644
index 0000000..bcd05af
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/directors/creative-director.md
@@ -0,0 +1,84 @@
+# Agent Test Spec: creative-director
+
+## Agent Summary
+**Domain owned:** Creative vision, game pillars, GDD alignment, systems decomposition feedback, narrative direction, playtest feedback interpretation, phase gate (creative aspect).
+**Does NOT own:** Technical architecture or implementation details (delegates to technical-director), production scheduling (producer), visual art style execution (delegates to art-director).
+**Model tier:** Opus (multi-document synthesis, high-stakes phase gate verdicts).
+**Gate IDs handled:** CD-PILLARS, CD-GDD-ALIGN, CD-SYSTEMS, CD-NARRATIVE, CD-PLAYTEST, CD-PHASE-GATE.
+
+---
+
+## Static Assertions (Structural)
+
+Verified by reading the agent's `.claude/agents/creative-director.md` frontmatter:
+
+- [ ] `description:` field is present and domain-specific (references creative vision, pillars, GDD alignment — not generic)
+- [ ] `allowed-tools:` list is read-heavy; should not include Bash unless justified by a creative workflow need
+- [ ] Model tier is `claude-opus-4-6` per coordination-rules.md (directors with gate synthesis = Opus)
+- [ ] Agent definition does not claim authority over technical architecture or production scheduling
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output format
+**Scenario:** A game concept document is submitted for pillar review. The concept describes a narrative survival game built around three pillars: "emergent stories," "meaningful sacrifice," and "lived-in world." Request is tagged CD-PILLARS.
+**Expected:** Returns `CD-PILLARS: APPROVE` with rationale citing how each pillar is represented in the concept and any reinforcing or weakening signals found in the document.
+**Assertions:**
+- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT
+- [ ] Verdict token is formatted as `CD-PILLARS: APPROVE` (gate ID prefix, colon, verdict keyword)
+- [ ] Rationale references the three specific pillars by name, not generic creative advice
+- [ ] Output stays within creative scope — does not comment on engine feasibility or sprint schedule
+
+### Case 2: Out-of-domain request — redirects or escalates
+**Scenario:** Developer asks creative-director to review a proposed PostgreSQL schema for storing player save data.
+**Expected:** Agent declines to evaluate the schema and redirects to technical-director.
+**Assertions:**
+- [ ] Does not make any binding decision about the schema design
+- [ ] Explicitly names `technical-director` as the correct handler
+- [ ] May note whether the data model has creative implications (e.g., what player data is tracked), but defers structural decisions entirely
+
+### Case 3: Gate verdict — correct vocabulary
+**Scenario:** A GDD for the "Crafting" system is submitted. Section 4 (Formulas) defines a resource decay formula that punishes exploration — contradicting the Player Fantasy section which calls for "freedom to roam without fear." Request is tagged CD-GDD-ALIGN.
+**Expected:** Returns `CD-GDD-ALIGN: CONCERNS` with specific citation of the contradiction between the formula behavior and the Player Fantasy statement.
+**Assertions:**
+- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT — not freeform text
+- [ ] Verdict token is formatted as `CD-GDD-ALIGN: CONCERNS`
+- [ ] Rationale quotes or directly references GDD Section 4 (Formulas) and the Player Fantasy section
+- [ ] Does not prescribe a specific formula fix — that belongs to systems-designer
+
+### Case 4: Conflict escalation — correct parent
+**Scenario:** technical-director raises a concern that the core loop mechanic (real-time branching conversations) is prohibitively expensive to implement and recommends cutting it. creative-director disagrees on creative grounds.
+**Expected:** creative-director acknowledges the technical constraint, does not override technical-director's feasibility assessment, but retains authority to define what the creative goal is. For the conflict itself, creative-director is the top-level creative escalation point and defers to technical-director on implementation feasibility while advocating for the design intent. The resolution path is for both to jointly present trade-off options to the user.
+**Assertions:**
+- [ ] Does not unilaterally override technical-director's feasibility concern
+- [ ] Clearly separates "what we want creatively" from "how it gets built"
+- [ ] Proposes presenting trade-offs to the user rather than resolving unilaterally
+- [ ] Does not claim to own implementation decisions
+
+### Case 5: Context pass — uses provided context
+**Scenario:** Agent receives a gate context block that includes the game pillars document (`design/gdd/pillars.md`) and a new mechanic spec for review. The pillars document defines "player authorship," "consequence permanence," and "world responsiveness" as the three core pillars.
+**Expected:** Assessment uses the exact pillar vocabulary from the provided document, not generic creative heuristics. Any approval or concern is tied back to one or more of the three named pillars.
+**Assertions:**
+- [ ] Uses the exact pillar names from the provided context document
+- [ ] Does not generate generic creative feedback disconnected from the supplied pillars
+- [ ] References the specific pillar(s) most relevant to the mechanic under review
+- [ ] Does not reference pillars not present in the provided document
+
+---
+
+## Protocol Compliance
+
+- [ ] Returns verdicts using APPROVE / CONCERNS / REJECT vocabulary only
+- [ ] Stays within declared creative domain
+- [ ] Escalates conflicts by presenting trade-offs to user rather than unilateral override
+- [ ] Uses gate IDs in output (e.g., `CD-PILLARS: APPROVE`) not inline prose verdicts
+- [ ] Does not make binding cross-domain decisions (technical, production, art execution)
+
+---
+
+## Coverage Notes
+- Multi-gate scenario (e.g., single submission triggering both CD-PILLARS and CD-GDD-ALIGN) is not covered here — deferred to integration tests.
+- CD-PHASE-GATE (full phase advancement) involves synthesizing multiple sub-gate results; this complex case is deferred.
+- Playtest report interpretation (CD-PLAYTEST) is not covered — a dedicated case should be added when the playtest-report skill produces structured output.
+- Interaction with art-director on visual-pillar alignment is not covered.
diff --git a/CCGS Skill Testing Framework/agents/directors/producer.md b/CCGS Skill Testing Framework/agents/directors/producer.md
new file mode 100644
index 0000000..9f584be
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/directors/producer.md
@@ -0,0 +1,84 @@
+# Agent Test Spec: producer
+
+## Agent Summary
+**Domain owned:** Scope management, sprint planning validation, milestone tracking, epic prioritization, production phase gate.
+**Does NOT own:** Game design decisions (creative-director / game-designer), technical architecture (technical-director), creative direction.
+**Model tier:** Opus (multi-document synthesis, high-stakes phase gate verdicts).
+**Gate IDs handled:** PR-SCOPE, PR-SPRINT, PR-MILESTONE, PR-EPIC, PR-PHASE-GATE.
+
+---
+
+## Static Assertions (Structural)
+
+Verified by reading the agent's `.claude/agents/producer.md` frontmatter:
+
+- [ ] `description:` field is present and domain-specific (references scope, sprint, milestone, production — not generic)
+- [ ] `allowed-tools:` list is primarily read-focused; Bash only if sprint/milestone files require parsing
+- [ ] Model tier is `claude-opus-4-6` per coordination-rules.md (directors with gate synthesis = Opus)
+- [ ] Agent definition does not claim authority over design decisions or technical architecture
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output format
+**Scenario:** A sprint plan is submitted for Sprint 7. The plan includes 12 story points across 4 team members over 2 weeks. Historical velocity from the last 3 sprints averages 11.5 points. Request is tagged PR-SPRINT.
+**Expected:** Returns `PR-SPRINT: REALISTIC` with rationale noting the plan is within one standard deviation of historical velocity and capacity appears matched.
+**Assertions:**
+- [ ] Verdict is exactly one of REALISTIC / CONCERNS / UNREALISTIC
+- [ ] Verdict token is formatted as `PR-SPRINT: REALISTIC`
+- [ ] Rationale references the specific story point count and historical velocity figures
+- [ ] Output stays within production scope — does not comment on whether the stories are well-designed or technically sound
+
+### Case 2: Out-of-domain request — redirects or escalates
+**Scenario:** Team member asks producer to evaluate whether the game's "weight-based inventory" mechanic feels fun and engaging.
+**Expected:** Agent declines to evaluate game feel and redirects to game-designer or creative-director.
+**Assertions:**
+- [ ] Does not make any binding assessment of the mechanic's design quality
+- [ ] Explicitly names `game-designer` or `creative-director` as the correct handler
+- [ ] May note if the mechanic's scope has production implications (e.g., dependencies on other systems), but defers all design evaluation
+
+### Case 3: Gate verdict — correct vocabulary
+**Scenario:** A new feature proposal adds three new systems (crafting, weather, and faction reputation) to a milestone that was scoped for two systems only. None of these additions appear in the current milestone plan. Request is tagged PR-SCOPE.
+**Expected:** Returns `PR-SCOPE: CONCERNS` with specific identification of the three unplanned systems and their absence from the milestone scope document.
+**Assertions:**
+- [ ] Verdict is exactly one of REALISTIC / CONCERNS / UNREALISTIC — not freeform text
+- [ ] Verdict token is formatted as `PR-SCOPE: CONCERNS`
+- [ ] Rationale names the three specific systems being added out of scope
+- [ ] Does not evaluate whether the systems are good design — only whether they fit the plan
+
+### Case 4: Conflict escalation — correct parent
+**Scenario:** game-designer wants to add a late-breaking mechanic (dynamic weather affecting all gameplay systems) that technical-director warns will require 3 additional sprints. game-designer and technical-director are in disagreement about whether to proceed.
+**Expected:** Producer does not take a side on whether the mechanic is worth adding (a design decision) or feasible (a technical decision). Instead, producer quantifies the production impact (3 sprints of delay, milestone slip risk), presents the trade-off to the user, and follows the coordination-rules.md conflict-resolution protocol: escalate to the shared parent, which here means surfacing the conflict for a user decision, since creative-director and technical-director are both top-tier.
+**Assertions:**
+- [ ] Quantifies the production impact in concrete terms (sprint count, milestone date slip)
+- [ ] Does not make a binding design or technical decision
+- [ ] Surfaces the conflict to the user with the scope implications clearly stated
+- [ ] References coordination-rules.md conflict resolution protocol (escalate to shared parent or user)
+
+### Case 5: Context pass — uses provided context
+**Scenario:** Agent receives a gate context block that includes the current milestone deadline (8 weeks away) and velocity data from the last 4 sprints (8, 10, 9, 11 points). A sprint plan is submitted with 14 story points.
+**Expected:** Assessment uses the provided velocity data to project whether 14 points is achievable, and references the 8-week milestone window to assess whether the current sprint's scope leaves adequate buffer.
+**Assertions:**
+- [ ] Uses the specific velocity figures from the provided context (not generic estimates)
+- [ ] References the 8-week deadline in the capacity assessment
+- [ ] Calculates or estimates remaining sprint count within the milestone window
+- [ ] Does not give generic scope advice disconnected from the supplied deadline and velocity data
+
+---
+
+## Protocol Compliance
+
+- [ ] Returns verdicts using REALISTIC / CONCERNS / UNREALISTIC vocabulary only
+- [ ] Stays within declared production domain
+- [ ] Escalates design/technical conflicts by quantifying scope impact and presenting to user
+- [ ] Uses gate IDs in output (e.g., `PR-SPRINT: REALISTIC`) not inline prose verdicts
+- [ ] Does not make binding game design or technical architecture decisions
+
+---
+
+## Coverage Notes
+- PR-EPIC (epic-level prioritization) is not covered — a dedicated case should be added when the /create-epics skill produces structured epic documents.
+- PR-MILESTONE (milestone health review) is not covered — deferred to integration with /milestone-review skill.
+- PR-PHASE-GATE (full production phase advancement) involving synthesis of multiple sub-gate results is deferred.
+- Multi-sprint burn-down and velocity trend analysis are not covered here.
diff --git a/CCGS Skill Testing Framework/agents/directors/technical-director.md b/CCGS Skill Testing Framework/agents/directors/technical-director.md
new file mode 100644
index 0000000..9ed25bd
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/directors/technical-director.md
@@ -0,0 +1,84 @@
+# Agent Test Spec: technical-director
+
+## Agent Summary
+**Domain owned:** System architecture decisions, technical feasibility assessment, ADR oversight and approval, engine risk evaluation, technical phase gate.
+**Does NOT own:** Game design decisions (creative-director / game-designer), creative direction, visual art style, production scheduling (producer).
+**Model tier:** Opus (multi-document synthesis, high-stakes architecture and phase gate verdicts).
+**Gate IDs handled:** TD-SYSTEM-BOUNDARY, TD-FEASIBILITY, TD-ARCHITECTURE, TD-ADR, TD-ENGINE-RISK, TD-PHASE-GATE.
+
+---
+
+## Static Assertions (Structural)
+
+Verified by reading the agent's `.claude/agents/technical-director.md` frontmatter:
+
+- [ ] `description:` field is present and domain-specific (references architecture, feasibility, ADR — not generic)
+- [ ] `allowed-tools:` list may include Read for architecture documents; Bash only if required for technical checks
+- [ ] Model tier is `claude-opus-4-6` per coordination-rules.md (directors with gate synthesis = Opus)
+- [ ] Agent definition does not claim authority over game design decisions or creative direction
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output format
+**Scenario:** An architecture document for the "Combat System" is submitted. It describes a layered design: input layer → game logic layer → presentation layer, with clearly defined interfaces between each. Request is tagged TD-ARCHITECTURE.
+**Expected:** Returns `TD-ARCHITECTURE: APPROVE` with rationale confirming that system boundaries are correctly separated and interfaces are well-defined.
+**Assertions:**
+- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT
+- [ ] Verdict token is formatted as `TD-ARCHITECTURE: APPROVE`
+- [ ] Rationale specifically references the layered structure and interface definitions — not generic architecture advice
+- [ ] Output stays within technical scope — does not comment on whether the mechanic is fun or fits the creative vision
+
+### Case 2: Out-of-domain request — redirects or escalates
+**Scenario:** Writer asks technical-director to review and approve the dialogue scripts for the game's opening cutscene.
+**Expected:** Agent declines to evaluate dialogue quality and redirects to narrative-director.
+**Assertions:**
+- [ ] Does not make any binding decision about the dialogue content or structure
+- [ ] Explicitly names `narrative-director` as the correct handler
+- [ ] May note technical constraints that affect dialogue (e.g., localization string limits, data format), but defers all content decisions
+
+### Case 3: Gate verdict — correct vocabulary
+**Scenario:** A proposed multiplayer mechanic requires raycasting against all active entities every frame to detect line-of-sight. At expected player counts (1000 entities in a large zone), this is O(n²) per frame. Request is tagged TD-FEASIBILITY.
+**Expected:** Returns `TD-FEASIBILITY: CONCERNS` with specific citation of the O(n²) complexity and the entity count that makes this infeasible at target framerate.
+**Assertions:**
+- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT — not freeform text
+- [ ] Verdict token is formatted as `TD-FEASIBILITY: CONCERNS`
+- [ ] Rationale includes the specific algorithmic complexity concern and the entity count threshold
+- [ ] Suggests at least one alternative approach (e.g., spatial partitioning, interest management) without mandating which to choose
+
+### Case 4: Conflict escalation — correct parent
+**Scenario:** game-designer wants to add a real-time physics simulation for every inventory item (hundreds of items on screen simultaneously). technical-director assesses this as technically expensive and proposes simplifying the simulation. game-designer disagrees, arguing it is essential to the game feel.
+**Expected:** technical-director clearly states the technical cost and constraints, proposes alternative implementation approaches that could achieve a similar feel, but explicitly defers the final design priority decision to creative-director as the arbiter of player experience trade-offs.
+**Assertions:**
+- [ ] Expresses the technical concern with specifics (e.g., performance budget, estimated cost)
+- [ ] Proposes at least one alternative that could reduce cost while preserving intent
+- [ ] Explicitly defers the "is this worth the cost" decision to creative-director — does not unilaterally cut the feature
+- [ ] Does not claim authority to override game-designer's design intent
+
+### Case 5: Context pass — uses provided context
+**Scenario:** Agent receives a gate context block that includes the target platform constraints: mobile, 60fps target, 2GB RAM ceiling, no compute shaders. A proposed architecture includes a GPU-driven rendering pipeline.
+**Expected:** Assessment references the specific hardware constraints from the context, identifies the compute shader dependency as incompatible with the stated platform constraints, and returns a CONCERNS or REJECT verdict with those specifics cited.
+**Assertions:**
+- [ ] References the specific platform constraints provided (mobile, 2GB RAM, no compute shaders)
+- [ ] Does not give generic performance advice disconnected from the supplied constraints
+- [ ] Correctly identifies the architectural component that conflicts with the platform constraint
+- [ ] Verdict includes rationale tied to the provided context, not boilerplate warnings
+
+---
+
+## Protocol Compliance
+
+- [ ] Returns verdicts using APPROVE / CONCERNS / REJECT vocabulary only
+- [ ] Stays within declared technical domain
+- [ ] Defers design priority conflicts to creative-director
+- [ ] Uses gate IDs in output (e.g., `TD-FEASIBILITY: CONCERNS`) not inline prose verdicts
+- [ ] Does not make binding game design or creative direction decisions
+
+---
+
+## Coverage Notes
+- TD-ADR (Architecture Decision Record approval) is not covered — a dedicated case should be added when the /architecture-decision skill produces ADR documents.
+- TD-ENGINE-RISK assessment for specific engine versions (e.g., Godot 4.6 post-cutoff APIs) is not covered — deferred to engine-specialist integration tests.
+- TD-PHASE-GATE (full technical phase advancement) involving synthesis of multiple sub-gate results is deferred.
+- Multi-domain architecture reviews (e.g., touching both TD-ARCHITECTURE and TD-ENGINE-RISK simultaneously) are not covered here.
diff --git a/CCGS Skill Testing Framework/agents/engine/godot/godot-csharp-specialist.md b/CCGS Skill Testing Framework/agents/engine/godot/godot-csharp-specialist.md
new file mode 100644
index 0000000..ecec4c2
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/engine/godot/godot-csharp-specialist.md
@@ -0,0 +1,81 @@
+# Agent Test Spec: godot-csharp-specialist
+
+## Agent Summary
+Domain: C# patterns in Godot 4, .NET idioms applied to Godot, [Export] attribute usage, signal delegates, and async/await patterns.
+Does NOT own: GDScript code (godot-gdscript-specialist), GDExtension C/C++ bindings (godot-gdextension-specialist).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references C# in Godot 4 / .NET patterns / signal delegates)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over GDScript or GDExtension code
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Create an export property for enemy health with validation that clamps it between 1 and 1000."
+**Expected behavior:**
+- Produces a C# property with `[Export]` attribute
+- Uses a backing field with a property getter/setter that clamps the value in the setter
+- Does NOT use a raw `[Export]` public field without validation
+- Follows Godot 4 C# naming conventions (PascalCase for properties, private backing fields with an underscore prefix)
+- Includes XML doc comment on the property per coding standards
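+
+A minimal sketch of the shape this case expects (the class name is illustrative; the 1–1000 clamp comes from the input):
+
+```
+using Godot;
+
+public partial class Enemy : Node
+{
+    private int _health = 100;
+
+    /// <summary>Enemy health, clamped to 1–1000 in the setter.</summary>
+    [Export]
+    public int Health
+    {
+        get => _health;
+        set => _health = Mathf.Clamp(value, 1, 1000);
+    }
+}
+```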
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Rewrite this enemy health system in GDScript."
+**Expected behavior:**
+- Does NOT produce GDScript code
+- Explicitly states that GDScript authoring belongs to `godot-gdscript-specialist`
+- Redirects the request to `godot-gdscript-specialist`
+- May describe the C# interface so that `godot-gdscript-specialist` knows the expected API shape
+
+### Case 3: Async signal awaiting
+**Input:** "Wait for an animation to finish before transitioning game state using C# async."
+**Expected behavior:**
+- Produces a proper `async Task` pattern using `ToSignal()` to await a Godot signal
+- Uses `await ToSignal(animationPlayer, AnimationPlayer.SignalName.AnimationFinished)`
+- Does NOT use `Thread.Sleep()` or `Task.Delay()` as a polling substitute
+- Notes that the calling method must be `async` and that fire-and-forget `async void` is only acceptable for event handlers
+- Handles cancellation or timeout if the animation could fail to fire
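+
+A minimal sketch of the awaited-signal pattern (class and method names are illustrative; timeout handling is elided here):
+
+```
+using Godot;
+using System.Threading.Tasks;
+
+public partial class StateMachine : Node
+{
+    // Awaits the AnimationFinished signal, then transitions state on the main thread.
+    private async Task TransitionAfterAnimationAsync(AnimationPlayer player)
+    {
+        await ToSignal(player, AnimationPlayer.SignalName.AnimationFinished);
+        TransitionToNextState();
+    }
+
+    private void TransitionToNextState() { /* ... */ }
+}
+```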
+
+### Case 4: Threading model conflict
+**Input:** "This C# code accesses a Godot Node from a background Task thread to update its position."
+**Expected behavior:**
+- Flags this as a race condition risk: Godot nodes are not thread-safe and must only be accessed from the main thread
+- Does NOT approve or implement the multi-threaded node access pattern
+- Provides the correct pattern: use `CallDeferred()`, `Callable.From().CallDeferred()`, or marshal back to the main thread via a thread-safe queue
+- Explains the distinction between Godot's main thread requirement and .NET's thread-agnostic types
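+
+A sketch of the correct marshalling pattern (class and method names are illustrative):
+
+```
+using Godot;
+using System.Threading.Tasks;
+
+public partial class Mover : Node3D
+{
+    public void StartBackgroundWork()
+    {
+        Task.Run(() =>
+        {
+            // Heavy computation is safe off the main thread...
+            Vector3 target = ComputeTargetPosition();
+            // ...but the node mutation must be deferred back to the main thread.
+            Callable.From(() => Position = target).CallDeferred();
+        });
+    }
+
+    private Vector3 ComputeTargetPosition() => Vector3.Zero; // placeholder
+}
+```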
+
+### Case 5: Context pass — Godot 4.6 API correctness
+**Input:** Engine version context: Godot 4.6. Request: "Connect a signal using the new typed signal delegate pattern."
+**Expected behavior:**
+- Produces C# signal connection using the typed delegate pattern introduced in Godot 4 C# (`+=` operator on typed signal)
+- Checks the 4.6 context to confirm no breaking changes to the signal delegate API in 4.4, 4.5, or 4.6
+- Does NOT use the old string-based `Connect("signal_name", callable)` pattern (discouraged in Godot 4 C# in favor of typed signal events)
+- Produces code compatible with the project's pinned 4.6 version as documented in VERSION.md
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (C# in Godot 4 — patterns, exports, signals, async)
+- [ ] Redirects GDScript requests to godot-gdscript-specialist
+- [ ] Redirects GDExtension requests to godot-gdextension-specialist
+- [ ] Returns C# code following Godot 4 conventions (not Unity MonoBehaviour patterns)
+- [ ] Flags multi-threaded Godot node access as unsafe and provides the correct pattern
+- [ ] Uses typed signal delegates — not the discouraged string-based Connect() calls
+- [ ] Checks engine version reference for API changes before producing code
+
+---
+
+## Coverage Notes
+- Export property with validation (Case 1) should have a unit test verifying the clamp behavior
+- Threading conflict (Case 4) is safety-critical: the agent must identify and fix this without prompting
+- Async signal (Case 3) verifies the agent applies .NET idioms correctly within Godot's single-thread constraint
diff --git a/CCGS Skill Testing Framework/agents/engine/godot/godot-gdextension-specialist.md b/CCGS Skill Testing Framework/agents/engine/godot/godot-gdextension-specialist.md
new file mode 100644
index 0000000..b2292a9
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/engine/godot/godot-gdextension-specialist.md
@@ -0,0 +1,86 @@
+# Agent Test Spec: godot-gdextension-specialist
+
+## Agent Summary
+Domain: GDExtension API, godot-cpp C++ bindings, godot-rust bindings, native library integration, and native performance optimization.
+Does NOT own: GDScript code (godot-gdscript-specialist), shader code (godot-shader-specialist).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references GDExtension / godot-cpp / native bindings)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over GDScript or shader authoring
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Expose a C++ rigid-body physics simulation library to GDScript via GDExtension."
+**Expected behavior:**
+- Produces a GDExtension binding pattern using godot-cpp:
+ - Class inheriting from `godot::Object` or an appropriate Godot base class
+ - `GDCLASS` macro registration
+ - `_bind_methods()` implementation exposing the physics API to GDScript
+ - `GDExtension` entry point (`gdextension_init`) setup
+- Notes the `.gdextension` manifest file format required
+- Does NOT produce the GDScript usage code (that belongs to gdscript-specialist)
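+
+A partial godot-cpp sketch of the binding shape (class and method names are illustrative; the entry point and `.gdextension` manifest are omitted):
+
+```
+#include <godot_cpp/classes/object.hpp>
+#include <godot_cpp/core/class_db.hpp>
+
+using namespace godot;
+
+class PhysicsSim : public Object {
+    GDCLASS(PhysicsSim, Object)
+
+protected:
+    // Exposes the native API to GDScript.
+    static void _bind_methods() {
+        ClassDB::bind_method(D_METHOD("step", "delta"), &PhysicsSim::step);
+    }
+
+public:
+    void step(double delta) { /* advance the native simulation */ }
+};
+```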
+
+### Case 2: Out-of-domain redirect
+**Input:** "Write the GDScript that calls the physics simulation from Case 1."
+**Expected behavior:**
+- Does NOT produce GDScript code
+- Explicitly states that GDScript authoring belongs to `godot-gdscript-specialist`
+- Redirects to `godot-gdscript-specialist`
+- May describe the API surface the GDScript should call (method names, parameter types) as a handoff spec
+
+### Case 3: ABI compatibility risk — minor version update
+**Input:** "We're upgrading from Godot 4.5 to 4.6. Will our existing GDExtension still work?"
+**Expected behavior:**
+- Flags the ABI compatibility concern: GDExtension binaries may not be ABI-compatible across minor versions
+- Directs to check the 4.5→4.6 migration guide for GDExtension API changes
+- Recommends recompiling the extension against the 4.6 godot-cpp headers rather than assuming binary compatibility
+- Notes that the `.gdextension` manifest may need a `compatibility_minimum` version update
+- Provides the recompilation checklist
+
+### Case 4: Memory management — RAII for Godot objects
+**Input:** "How should we manage the lifecycle of Godot objects created inside C++ GDExtension code?"
+**Expected behavior:**
+- Produces the RAII-based lifecycle pattern for Godot objects in GDExtension:
+  - `Ref<T>` for reference-counted objects (auto-released when the `Ref` goes out of scope)
+ - `memnew()` / `memdelete()` for non-reference-counted objects
+ - Warning: do NOT use `new`/`delete` for Godot objects — undefined behavior
+- Notes object ownership rules: who is responsible for freeing a node added to the scene tree
+- Provides a concrete example managing a `CollisionShape3D` created in C++
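+
+A lifecycle sketch under these rules (the parent node and shape type are illustrative):
+
+```
+#include <godot_cpp/classes/box_shape3d.hpp>
+#include <godot_cpp/classes/collision_shape3d.hpp>
+#include <godot_cpp/classes/node.hpp>
+
+using namespace godot;
+
+void setup_shape(Node *parent) {
+    // Ref<T>: reference-counted, released automatically when the Ref goes out of scope.
+    Ref<BoxShape3D> box;
+    box.instantiate();
+
+    // memnew: mandatory for Godot objects; raw new/delete is undefined behavior here.
+    CollisionShape3D *shape = memnew(CollisionShape3D);
+    shape->set_shape(box);
+
+    // Ownership transfers to the scene tree; the tree frees the node on exit.
+    parent->add_child(shape);
+}
+```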
+
+### Case 5: Context pass — Godot 4.6 GDExtension API check
+**Input:** Engine version context: Godot 4.6 (upgrading from 4.5). Request: "Check if any GDExtension APIs changed from 4.5 to 4.6."
+**Expected behavior:**
+- References the 4.5→4.6 migration guide from the VERSION.md verified sources list
+- Reports on any documented GDExtension API changes in the 4.6 release
+- If no breaking changes are documented for GDExtension in 4.6, states that explicitly with the caveat to verify against the official changelog
+- Flags the D3D12 default on Windows (4.6 change) as potentially relevant for GDExtension rendering code
+- Provides a checklist of what to verify after upgrading
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (GDExtension, godot-cpp, godot-rust, native bindings)
+- [ ] Redirects GDScript authoring to godot-gdscript-specialist
+- [ ] Redirects shader authoring to godot-shader-specialist
+- [ ] Returns structured output (binding patterns, RAII examples, ABI checklists)
+- [ ] Flags ABI compatibility risks on minor version upgrades — never assumes binary compatibility
+- [ ] Uses Godot-specific memory management (`memnew`/`memdelete`, `Ref<T>`) not raw C++ new/delete
+- [ ] Checks engine version reference for GDExtension API changes before confirming compatibility
+
+---
+
+## Coverage Notes
+- Binding pattern (Case 1) should include a smoke test verifying the extension loads and the method is callable from GDScript
+- ABI risk (Case 3) is a critical escalation path — the agent must not approve shipping an unverified extension binary
+- Memory management (Case 4) verifies the agent applies Godot-specific patterns, not generic C++ RAII
diff --git a/CCGS Skill Testing Framework/agents/engine/godot/godot-gdscript-specialist.md b/CCGS Skill Testing Framework/agents/engine/godot/godot-gdscript-specialist.md
new file mode 100644
index 0000000..8ac935f
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/engine/godot/godot-gdscript-specialist.md
@@ -0,0 +1,82 @@
+# Agent Test Spec: godot-gdscript-specialist
+
+## Agent Summary
+Domain: GDScript static typing, design patterns in GDScript, signal architecture, coroutine/await patterns, and GDScript performance.
+Does NOT own: shader code (godot-shader-specialist), GDExtension bindings (godot-gdextension-specialist).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references GDScript / static typing / signals / coroutines)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over shader code or GDExtension
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Review this GDScript file for type annotation coverage."
+**Expected behavior:**
+- Reads the provided GDScript file
+- Flags every variable, parameter, and return type that is missing a static type annotation
+- Produces a list of specific line-by-line findings: `var speed = 5.0` → `var speed: float = 5.0`
+- Notes the performance and tooling benefits of static typing in Godot 4
+- Does NOT rewrite the entire file unprompted — produces a findings list for the developer to apply
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Write a vertex shader to distort the mesh in world space."
+**Expected behavior:**
+- Does NOT produce shader code in GDScript or in Godot's shading language
+- Explicitly states that shader authoring belongs to `godot-shader-specialist`
+- Redirects the request to `godot-shader-specialist`
+- May note that the GDScript side (passing uniforms to a shader, setting shader parameters) is within its domain
+
+### Case 3: Async loading with coroutines
+**Input:** "Load a scene asynchronously and wait for it to finish before spawning it."
+**Expected behavior:**
+- Produces an `await` + `ResourceLoader.load_threaded_request` pattern for Godot 4
+- Uses static typing throughout (`var scene: PackedScene`)
+- Handles the completion check with `ResourceLoader.load_threaded_get_status()`
+- Notes error handling for failed loads
+- Does NOT use the Godot 3 `yield()` syntax (removed in Godot 4 in favor of `await`)
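+
+A minimal sketch of the expected Godot 4 pattern (the scene path is illustrative; error handling is reduced to a single check):
+
+```
+const LEVEL_PATH := "res://levels/level_01.tscn"
+
+func load_and_spawn() -> void:
+	ResourceLoader.load_threaded_request(LEVEL_PATH)
+	while ResourceLoader.load_threaded_get_status(LEVEL_PATH) == ResourceLoader.THREAD_LOAD_IN_PROGRESS:
+		await get_tree().process_frame
+	var scene: PackedScene = ResourceLoader.load_threaded_get(LEVEL_PATH)
+	if scene == null:
+		push_error("Failed to load %s" % LEVEL_PATH)
+		return
+	add_child(scene.instantiate())
+```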
+
+### Case 4: Performance issue — typed array recommendation
+**Input:** "The entity update loop is slow; it iterates an untyped Array of 1,000 nodes every frame."
+**Expected behavior:**
+- Identifies that an untyped `Array` forgoes compiler optimization in GDScript
+- Recommends converting to a typed array (`Array[Node]` or the specific type) so the compiler can emit optimized typed instructions
+- Notes that if this is still insufficient, recommends escalating the hot path to a C# migration
+- Produces the typed array refactor as the immediate fix
+- Does NOT recommend migrating the entire codebase to C# without profiling evidence
+
+### Case 5: Context pass — Godot 4.6 with post-cutoff features
+**Input:** Engine version context provided: Godot 4.6. Request: "Create an abstract base class for all enemy types using @abstract."
+**Expected behavior:**
+- Identifies `@abstract` as a Godot 4.5+ feature (post-cutoff)
+- Notes this in the output: feature introduced in 4.5, verified against VERSION.md migration notes
+- Produces the GDScript class using `@abstract` with correct syntax as documented in migration notes
+- Marks the output as requiring verification against the official 4.5 release notes due to post-cutoff status
+- Uses static typing for all method signatures in the abstract class
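+
+A best-effort sketch consistent with this case (the `@abstract` syntax is post-cutoff and must be verified against the official 4.5 release notes; names are illustrative):
+
+```
+@abstract
+class_name EnemyBase
+extends CharacterBody2D
+
+var health: int = 100
+
+## Subclasses must implement their own attack behavior.
+@abstract func attack(target: Node2D) -> void
+
+func take_damage(amount: int) -> void:
+	health = maxi(health - amount, 0)
+```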
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (GDScript — typing, patterns, signals, coroutines, performance)
+- [ ] Redirects shader requests to godot-shader-specialist
+- [ ] Redirects GDExtension requests to godot-gdextension-specialist
+- [ ] Returns structured GDScript output with full static typing
+- [ ] Uses Godot 4 API only — no deprecated Godot 3 patterns (yield, connect with strings, etc.)
+- [ ] Flags post-cutoff features (4.4, 4.5, 4.6) and marks them as requiring doc verification
+
+---
+
+## Coverage Notes
+- Type annotation review (Case 1) output is suitable as a code review checklist
+- Async loading (Case 3) should produce testable code verifiable with a unit test in `tests/unit/`
+- Post-cutoff @abstract (Case 5) confirms the agent flags version uncertainty rather than silently using unverified APIs
diff --git a/CCGS Skill Testing Framework/agents/engine/godot/godot-shader-specialist.md b/CCGS Skill Testing Framework/agents/engine/godot/godot-shader-specialist.md
new file mode 100644
index 0000000..7ac2df8
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/engine/godot/godot-shader-specialist.md
@@ -0,0 +1,84 @@
+# Agent Test Spec: godot-shader-specialist
+
+## Agent Summary
+Domain: Godot shading language (GLSL-derivative), visual shaders (VisualShader graph), material setup, particle shaders, and post-processing effects.
+Does NOT own: gameplay code, art style direction.
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references Godot shading language / materials / post-processing)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition references `docs/engine-reference/godot/VERSION.md` as the authoritative source for Godot shader API changes
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Write a dissolve effect shader for enemy death in Godot."
+**Expected behavior:**
+- Produces valid Godot shading language code (not HLSL, not GLSL directly)
+- Uses `shader_type spatial;` or `canvas_item` as appropriate
+- Defines `uniform float dissolve_amount : hint_range(0.0, 1.0);`
+- Samples a noise texture to determine per-pixel dissolve threshold
+- Uses `discard;` for pixels below the threshold
+- Optionally adds an edge glow using emission near the dissolve boundary
+- Code is syntactically correct for Godot's shading language
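+
+A sketch of the expected shader shape (uniform names follow the case bullets; the edge width is illustrative):
+
+```
+shader_type spatial;
+
+uniform float dissolve_amount : hint_range(0.0, 1.0);
+uniform sampler2D noise_texture;
+uniform vec3 edge_color : source_color = vec3(1.0, 0.5, 0.0);
+
+void fragment() {
+    float noise = texture(noise_texture, UV).r;
+    if (noise < dissolve_amount) {
+        discard;
+    }
+    // Emissive glow on the thin band just above the dissolve threshold.
+    float edge = smoothstep(dissolve_amount, dissolve_amount + 0.05, noise);
+    EMISSION = edge_color * (1.0 - edge);
+}
+```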
+
+### Case 2: HLSL redirect
+**Input:** "Write an HLSL compute shader for this dissolve effect."
+**Expected behavior:**
+- Does NOT produce HLSL code
+- Clearly states: "Godot does not use HLSL directly; it uses its own shading language (a GLSL derivative)"
+- Translates the HLSL intent to the equivalent Godot shader approach
+- Notes that RenderingDevice compute shaders are available in Godot 4 but are a low-level API, and flags that path explicitly if compute was the actual intent
+
+### Case 3: Post-cutoff API change — texture sampling (Godot 4.4)
+**Input:** "Use `texture()` with a sampler2D to sample the noise texture in the shader."
+**Expected behavior:**
+- Checks the version reference: Godot 4.4 changed texture sampler type declarations
+- Flags the potential API change: `sampler2D` syntax and `texture()` call behavior may differ from pre-4.4
+- Provides the correct syntax for the project's pinned version (4.6) as documented in migration notes
+- Does NOT use pre-4.4 texture sampling syntax without flagging the version risk
+
+### Case 4: Fragment shader LOD strategy
+**Input:** "The fragment shader for the water surface has 8 texture samples and is causing GPU bottlenecks on mid-range hardware."
+**Expected behavior:**
+- Identifies the per-fragment texture sample count as the primary cost driver
+- Proposes an LOD strategy:
+ - Reduce sample count at distance (distance-based shader variant or LOD level)
+ - Pre-bake some texture combinations offline
+ - Use lower-resolution noise textures for distant samples
+- Provides the shader code modification implementing the LOD approach
+- Does NOT change gameplay behavior of the water system
+
+### Case 5: Context pass — Godot 4.6 glow rework
+**Input:** Engine version context: Godot 4.6. Request: "Add a bloom/glow post-processing effect to the scene."
+**Expected behavior:**
+- References the VERSION.md note: Godot 4.6 includes a glow rework
+- Produces glow configuration guidance using the 4.6 WorldEnvironment approach, not the pre-4.6 API
+- Explicitly notes which properties or parameters changed in the 4.6 glow rework
+- Flags any properties that the LLM's training data may have incorrect information about due to the post-cutoff timing
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (Godot shading language, materials, VFX shaders, post-processing)
+- [ ] Redirects gameplay code requests to gameplay-programmer
+- [ ] Produces valid Godot shading language — never HLSL or raw GLSL without a Godot wrapper
+- [ ] Checks engine version reference for post-cutoff shader API changes (4.4 texture types, 4.6 glow rework)
+- [ ] Returns structured output (shader code with uniforms documented, LOD strategies with performance rationale)
+- [ ] Flags any post-cutoff API usage as requiring verification
+
+---
+
+## Coverage Notes
+- Dissolve shader (Case 1) should be paired with a visual test screenshot in `production/qa/evidence/`
+- Texture API flag (Case 3) confirms the agent checks VERSION.md before using APIs that changed post-4.3
+- Glow rework (Case 5) is a Godot 4.6-specific test — verifies the agent applies the most recent migration notes
diff --git a/CCGS Skill Testing Framework/agents/engine/godot/godot-specialist.md b/CCGS Skill Testing Framework/agents/engine/godot/godot-specialist.md
new file mode 100644
index 0000000..bd3b868
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/engine/godot/godot-specialist.md
@@ -0,0 +1,82 @@
+# Agent Test Spec: godot-specialist
+
+## Agent Summary
+Domain: Godot-specific patterns, node/scene architecture, signals, resources, and GDScript vs C# vs GDExtension decisions.
+Does NOT own: actual code authoring in a specific language (delegates to language sub-specialists).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references Godot architecture / node patterns / engine decisions)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition references `docs/engine-reference/godot/VERSION.md` as the authoritative API source
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "When should I use signals vs. direct method calls in Godot?"
+**Expected behavior:**
+- Produces a pattern decision guide with rationale:
+ - Signals: decoupled communication, parent-to-child ignorance, event-driven UI updates, one-to-many notification
+ - Direct calls: tightly-coupled systems where the caller needs a return value, or performance-critical hot paths
+- Provides concrete examples of each pattern in the project's context
+- Does NOT produce raw code for both patterns — refers to godot-gdscript-specialist or godot-csharp-specialist for implementation
+- Notes the "no upward signals" convention (child does not call parent methods directly — uses signals instead)
+
+### Case 2: Wrong-engine redirect
+**Input:** "Write a MonoBehaviour that runs on Start() and subscribes to a UnityEvent."
+**Expected behavior:**
+- Does NOT produce Unity MonoBehaviour code
+- Clearly identifies that this is a Unity pattern, not a Godot pattern
+- Provides the Godot equivalent: a Node script using `_ready()` instead of `Start()`, and Godot signals instead of UnityEvent
+- Confirms the project is Godot-based and redirects the conceptual mapping
+
+### Case 3: Post-cutoff API risk
+**Input:** "Use the new Godot 4.5 @abstract annotation to define an abstract base class."
+**Expected behavior:**
+- Identifies that `@abstract` is a post-cutoff feature (introduced in Godot 4.5, after LLM knowledge cutoff)
+- Flags the version risk: LLM knowledge of this annotation may be incomplete or incorrect
+- Directs the user to verify against `docs/engine-reference/godot/VERSION.md` and the official 4.5 migration guide
+- Provides best-effort guidance based on the migration notes in the version reference while clearly marking it as unverified
+
+### Case 4: Language selection for a hot path
+**Input:** "The physics query loop runs every frame for 500 objects. Should we use GDScript or C# for this?"
+**Expected behavior:**
+- Provides a balanced analysis:
+ - GDScript: simpler, team familiar, but slower for tight loops
+ - C#: faster for CPU-intensive loops, requires .NET runtime, team needs C# knowledge
+- Does NOT make the final decision unilaterally
+- Defers the decision to `lead-programmer` with the analysis as input
+- Notes that GDExtension (C++) is a third option for extreme performance cases and recommends escalating if C# is insufficient
+
+### Case 5: Context pass — engine version 4.6
+**Input:** Engine version context provided: Godot 4.6, Jolt as default physics. Request: "Set up a RigidBody3D for the player character."
+**Expected behavior:**
+- Reads the 4.6 context and applies the Jolt-default knowledge (from VERSION.md migration notes)
+- Recommends RigidBody3D configuration choices that are Jolt-compatible (e.g., notes any GodotPhysics-specific settings that behave differently under Jolt)
+- References the 4.6 migration note about Jolt becoming default rather than relying on LLM training data alone
+- Flags any RigidBody3D properties that changed behavior between GodotPhysics and Jolt
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (Godot architecture decisions, node/scene patterns, language selection)
+- [ ] Redirects language-specific implementation to godot-gdscript-specialist or godot-csharp-specialist
+- [ ] Returns structured findings (decision trees, pattern recommendations with rationale)
+- [ ] Treats `docs/engine-reference/godot/VERSION.md` as authoritative over LLM training data
+- [ ] Flags post-cutoff API usage (4.4, 4.5, 4.6) with verification requirements
+- [ ] Defers language-selection decisions to lead-programmer when trade-offs exist
+
+---
+
+## Coverage Notes
+- Signal vs. direct call guide (Case 1) should be written to `docs/architecture/` as a reusable pattern doc
+- Post-cutoff flag (Case 3) confirms the agent does not confidently use APIs it cannot verify
+- Engine version case (Case 5) verifies the agent applies migration notes from the version reference, not assumptions
diff --git a/CCGS Skill Testing Framework/agents/engine/unity/unity-addressables-specialist.md b/CCGS Skill Testing Framework/agents/engine/unity/unity-addressables-specialist.md
new file mode 100644
index 0000000..44ba34e
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/engine/unity/unity-addressables-specialist.md
@@ -0,0 +1,87 @@
+# Agent Test Spec: unity-addressables-specialist
+
+## Agent Summary
+Domain: Addressable Asset System — groups, async loading/unloading, handle lifecycle management, memory budgeting, content catalogs, and remote content delivery.
+Does NOT own: rendering systems (engine-programmer), game logic that uses the loaded assets (gameplay-programmer).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references Addressables / asset loading / content catalogs / remote delivery)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over rendering systems or gameplay using the loaded assets
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Load a character texture asynchronously and release it when the character is destroyed."
+**Expected behavior:**
+- Produces the `Addressables.LoadAssetAsync()` call pattern
+- Stores the returned `AsyncOperationHandle` in the requesting object
+- On character destruction (`OnDestroy()`), calls `Addressables.Release(handle)` with the stored handle
+- Does NOT use `Resources.Load()` as the loading mechanism
+- Notes that releasing with a null or uninitialized handle causes errors — includes a validity check
+- Notes the difference between releasing the handle vs. releasing the asset (handle release is correct)
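+
+A minimal sketch of the load/release pairing (the address key and class name are illustrative):
+
+```
+using UnityEngine;
+using UnityEngine.AddressableAssets;
+using UnityEngine.ResourceManagement.AsyncOperations;
+
+public class CharacterSkin : MonoBehaviour
+{
+    private AsyncOperationHandle<Texture2D> _handle;
+
+    private void Start()
+    {
+        _handle = Addressables.LoadAssetAsync<Texture2D>("characters/hero_diffuse");
+        _handle.Completed += OnLoaded;
+    }
+
+    private void OnLoaded(AsyncOperationHandle<Texture2D> handle)
+    {
+        if (handle.Status == AsyncOperationStatus.Succeeded)
+        {
+            // Hand handle.Result to the rendering side (engine-programmer's domain).
+        }
+    }
+
+    private void OnDestroy()
+    {
+        // Release the handle, not the asset; guard against an uninitialized handle.
+        if (_handle.IsValid())
+            Addressables.Release(_handle);
+    }
+}
+```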
+
+### Case 2: Out-of-domain redirect
+**Input:** "Implement the rendering system that applies the loaded texture to the character mesh."
+**Expected behavior:**
+- Does NOT produce rendering or mesh material assignment code
+- Explicitly states that rendering system implementation belongs to `engine-programmer`
+- Redirects the request to `engine-programmer`
+- May describe the asset type and API surface it will provide (e.g., `Texture2D` reference once the handle completes) as a handoff spec
+
+### Case 3: Memory leak — un-released handle
+**Input:** "Memory usage keeps climbing after each level load. We use Addressables to load level assets."
+**Expected behavior:**
+- Diagnoses the likely cause: `AsyncOperationHandle` objects not being released after use
+- Identifies the handle leak pattern: loading assets into a local variable, losing reference, never calling `Addressables.Release()`
+- Produces an auditing approach: search for all `LoadAssetAsync` / `LoadSceneAsync` calls and verify matching `Release()` calls
+- Provides a corrected pattern using a tracked handle list (`List<AsyncOperationHandle>`) with a `ReleaseAll()` cleanup method
+- Does NOT assume the leak is elsewhere without evidence
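+
+A sketch of the corrected tracking pattern (the class name is illustrative):
+
+```
+using System.Collections.Generic;
+using UnityEngine.AddressableAssets;
+using UnityEngine.ResourceManagement.AsyncOperations;
+
+public sealed class LevelAssetTracker
+{
+    private readonly List<AsyncOperationHandle> _handles = new();
+
+    public AsyncOperationHandle<T> Load<T>(string key)
+    {
+        var handle = Addressables.LoadAssetAsync<T>(key);
+        _handles.Add(handle); // track every handle so nothing leaks
+        return handle;
+    }
+
+    // Call on level unload so no handle outlives the level.
+    public void ReleaseAll()
+    {
+        foreach (var handle in _handles)
+            if (handle.IsValid())
+                Addressables.Release(handle);
+        _handles.Clear();
+    }
+}
+```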
+
+### Case 4: Remote content delivery — catalog versioning
+**Input:** "We need to support downloadable content updates without requiring a full app re-install."
+**Expected behavior:**
+- Produces the remote catalog update pattern:
+ - `Addressables.CheckForCatalogUpdates()` on startup
+ - `Addressables.UpdateCatalogs()` for detected updates
+ - `Addressables.DownloadDependenciesAsync()` to pre-warm the updated content
+- Notes catalog hash checking for change detection
+- Addresses the edge case of a catalog update arriving while a player is mid-session — defines the behavior (complete the current session on the old catalog, reload on next launch)
+- Does NOT design the server-side CDN infrastructure (defers to devops-engineer)
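+
+A client-side sketch of the startup flow (server/CDN setup is out of scope, per the case):
+
+```
+using System.Collections.Generic;
+using System.Threading.Tasks;
+using UnityEngine.AddressableAssets;
+
+public static class CatalogUpdater
+{
+    // Run once at startup, before any level content is loaded.
+    public static async Task CheckAndUpdateAsync()
+    {
+        List<string> changed = await Addressables.CheckForCatalogUpdates().Task;
+        if (changed != null && changed.Count > 0)
+        {
+            await Addressables.UpdateCatalogs(changed).Task;
+            // Optionally pre-warm updated content with DownloadDependenciesAsync.
+        }
+    }
+}
+```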
+
+### Case 5: Context pass — platform memory constraints
+**Input:** Platform context: Nintendo Switch target, 4GB RAM, practical asset memory ceiling 512MB. Request: "Design the Addressables loading strategy for a large open-world level."
+**Expected behavior:**
+- References the 512MB memory ceiling from the provided context
+- Designs a streaming strategy:
+ - Divide the world into addressable zones loaded/unloaded based on player proximity
+ - Defines a memory budget per active zone (e.g., 128MB, max 4 zones active)
+ - Specifies async pre-load trigger distance and unload distance (hysteresis)
+- Notes Switch-specific constraints: slower load times from SD card, recommends pre-warming adjacent zones
+- Does NOT produce a loading strategy that would exceed the stated 512MB ceiling without flagging it
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (Addressables loading, handle lifecycle, memory, catalogs, remote delivery)
+- [ ] Redirects rendering and gameplay asset-use code to engine-programmer and gameplay-programmer
+- [ ] Returns structured output (loading patterns, handle lifecycle code, streaming zone designs)
+- [ ] Always pairs `LoadAssetAsync` with a corresponding `Release()` — flags handle leaks as a memory bug
+- [ ] Designs loading strategies against provided memory ceilings
+- [ ] Does not design CDN/server infrastructure — defers to devops-engineer for server side
+
+---
+
+## Coverage Notes
+- Handle lifecycle (Case 1) must include a test verifying memory is reclaimed after release
+- Handle leak diagnosis (Case 3) should produce a findings report suitable for a bug ticket
+- Platform memory case (Case 5) verifies the agent applies hard constraints from context, not default assumptions
diff --git a/CCGS Skill Testing Framework/agents/engine/unity/unity-dots-specialist.md b/CCGS Skill Testing Framework/agents/engine/unity/unity-dots-specialist.md
new file mode 100644
index 0000000..006328f
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/engine/unity/unity-dots-specialist.md
@@ -0,0 +1,87 @@
+# Agent Test Spec: unity-dots-specialist
+
+## Agent Summary
+Domain: ECS architecture (IComponentData, ISystem, SystemAPI), Jobs system (IJob, IJobEntity, Burst), Burst compiler constraints, DOTS gameplay systems, and hybrid renderer.
+Does NOT own: MonoBehaviour gameplay code (gameplay-programmer), UI implementation (unity-ui-specialist).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references ECS / Jobs / Burst / IComponentData)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over MonoBehaviour gameplay or UI systems
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Convert the player movement system to ECS."
+**Expected behavior:**
+- Produces:
+ - `PlayerMovementData : IComponentData` struct with velocity, speed, and input vector fields
+ - `PlayerMovementSystem : ISystem` with `OnUpdate()` using `SystemAPI.Query<>` or `IJobEntity`
+ - Bakes the player's initial state from an authoring MonoBehaviour via `IBaker`
+- Uses `RefRW<LocalTransform>` for position updates (not the deprecated `Translation` component)
+- Marks the job `[BurstCompile]` and notes what must be unmanaged for Burst compatibility
+- Does NOT modify the input polling system — reads from an existing `PlayerInputData` component
+
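+A minimal sketch of the shape this case expects (type and field names are illustrative, not project code):
+
+```csharp
+// Illustrative sketch — PlayerMovementData/PlayerMovementSystem are assumed names.
+using Unity.Burst;
+using Unity.Entities;
+using Unity.Mathematics;
+using Unity.Transforms;
+
+public struct PlayerMovementData : IComponentData
+{
+    public float3 Velocity;    // current velocity
+    public float Speed;        // max speed from authoring
+    public float2 InputVector; // read from an existing PlayerInputData elsewhere
+}
+
+[BurstCompile]
+public partial struct PlayerMovementSystem : ISystem
+{
+    [BurstCompile]
+    public void OnUpdate(ref SystemState state)
+    {
+        float dt = SystemAPI.Time.DeltaTime;
+        // RefRW<LocalTransform> replaces the deprecated Translation component.
+        foreach (var (transform, movement) in
+                 SystemAPI.Query<RefRW<LocalTransform>, RefRO<PlayerMovementData>>())
+        {
+            transform.ValueRW.Position += movement.ValueRO.Velocity * dt;
+        }
+    }
+}
+```
+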
+### Case 2: MonoBehaviour push-back
+**Input:** "Just use MonoBehaviour for the player movement — it's simpler."
+**Expected behavior:**
+- Acknowledges the simplicity argument
+- Explains the DOTS trade-off: more setup upfront, but the ECS/Burst approach provides the performance characteristics documented in the project's ADR or requirements
+- Does NOT implement a MonoBehaviour version if the project has committed to DOTS
+- If no commitment exists, flags the architecture decision to `lead-programmer` / `technical-director` for resolution
+- Does not make the MonoBehaviour vs. DOTS decision unilaterally
+
+### Case 3: Burst-incompatible managed memory
+**Input:** "This Burst job accesses a `List` to find the nearest enemy."
+**Expected behavior:**
+- Flags `List` as a managed type that is incompatible with Burst compilation
+- Does NOT approve the Burst job with managed memory access
+- Provides the correct replacement: `NativeArray`, `NativeList`, or `NativeHashMap<>` depending on the use case
+- Notes that `NativeArray` must be disposed explicitly or via `[DeallocateOnJobCompletion]`
+- Produces the corrected job using unmanaged native containers
+
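+A hedged sketch of what the corrected job could look like (the nearest-enemy search and names are illustrative):
+
+```csharp
+// Illustrative sketch — managed List<T> replaced with unmanaged native containers.
+using Unity.Burst;
+using Unity.Collections;
+using Unity.Jobs;
+using Unity.Mathematics;
+
+[BurstCompile]
+public struct NearestEnemyJob : IJob
+{
+    [ReadOnly] public NativeArray<float3> EnemyPositions; // unmanaged, Burst-safe
+    public float3 PlayerPosition;
+    public NativeReference<int> NearestIndex; // result written back to the caller
+
+    public void Execute()
+    {
+        float best = float.MaxValue;
+        for (int i = 0; i < EnemyPositions.Length; i++)
+        {
+            float d = math.distancesq(PlayerPosition, EnemyPositions[i]);
+            if (d < best) { best = d; NearestIndex.Value = i; }
+        }
+    }
+}
+// Caller allocates with Allocator.TempJob and must Dispose() both containers.
+```
+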
+### Case 4: Hybrid access — DOTS system needs MonoBehaviour data
+**Input:** "The DOTS movement system needs to read the camera transform managed by a MonoBehaviour CameraController."
+**Expected behavior:**
+- Identifies this as a hybrid access scenario
+- Provides the correct hybrid pattern: store the camera transform in a singleton `IComponentData` (updated from the MonoBehaviour side each frame via `EntityManager.SetComponentData`)
+- Alternatively suggests the `CompanionComponent` / managed component approach
+- Does NOT access the MonoBehaviour from inside a Burst job — flags that as unsafe
+- Provides the bridge code on both the MonoBehaviour side (writing to ECS) and the DOTS system side (reading from ECS)
+
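+One possible shape of the bridge (the singleton component and class names are assumptions):
+
+```csharp
+// Illustrative sketch — the MonoBehaviour side writes the camera transform into ECS.
+using Unity.Entities;
+using Unity.Mathematics;
+using UnityEngine;
+
+public struct MainCameraData : IComponentData
+{
+    public float3 Position;
+    public quaternion Rotation;
+}
+
+public class CameraBridge : MonoBehaviour
+{
+    EntityManager _em;
+    Entity _singleton;
+
+    void Start()
+    {
+        _em = World.DefaultGameObjectInjectionWorld.EntityManager;
+        _singleton = _em.CreateEntity(typeof(MainCameraData));
+    }
+
+    void LateUpdate()
+    {
+        // Push the managed transform into ECS each frame; DOTS systems then read
+        // the singleton (e.g., via SystemAPI.GetSingleton) safely inside Burst jobs.
+        _em.SetComponentData(_singleton, new MainCameraData
+        {
+            Position = transform.position,
+            Rotation = transform.rotation,
+        });
+    }
+}
+```
+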
+### Case 5: Context pass — performance targets
+**Input:** Technical preferences from context: 60fps target, max 2ms CPU script budget per frame. Request: "Design the ECS chunk layout for 10,000 enemy entities."
+**Expected behavior:**
+- References the 2ms CPU budget explicitly in the design rationale
+- Designs the `IComponentData` chunk layout for cache efficiency:
+  - Groups components that are frequently queried together in the same archetype
+ - Separates rarely-used data into separate components to keep hot data compact
+ - Estimates entity iteration time against the 2ms budget
+- Provides memory layout analysis (bytes per entity, entities per chunk at 16KB chunk size)
+- Does NOT design a layout that will obviously exceed the stated 2ms budget without flagging it
+
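+A back-of-envelope example of the layout analysis this case expects (component sizes are assumptions for illustration):
+
+```csharp
+// Illustrative arithmetic only — hot data kept compact so more entities fit per 16KB chunk.
+using Unity.Entities;
+using Unity.Mathematics;
+
+public struct EnemyHot : IComponentData   // queried every frame
+{
+    public float3 Position; // 12 bytes
+    public float3 Velocity; // 12 bytes
+    public float Health;    // 4 bytes → ~28 bytes of hot data per entity
+}
+
+public struct EnemyCold : IComponentData  // touched rarely; kept out of the hot loop
+{
+    public int LootTableId;
+    public float SpawnTime;
+}
+// ~16KB chunk / ~28 B of hot data per entity ⇒ several hundred entities per chunk,
+// so iterating 10,000 enemies touches only a few dozen chunks of hot data — that
+// iteration cost is what gets estimated against the 2ms budget.
+```
+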
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (ECS, Jobs, Burst, DOTS gameplay systems)
+- [ ] Redirects MonoBehaviour-only gameplay to gameplay-programmer
+- [ ] Returns structured output (IComponentData structs, ISystem implementations, IBaker authoring classes)
+- [ ] Flags managed memory access in Burst jobs as a compile error and provides unmanaged alternatives
+- [ ] Provides hybrid access patterns when DOTS systems need to interact with MonoBehaviour systems
+- [ ] Designs chunk layouts against provided performance budgets
+
+---
+
+## Coverage Notes
+- ECS conversion (Case 1) must include a unit test using the ECS test framework (`World`, `EntityManager`)
+- Burst incompatibility (Case 3) is safety-critical — the agent must catch this before the code is written
+- Chunk layout (Case 5) verifies the agent applies quantitative performance reasoning to architecture decisions
diff --git a/CCGS Skill Testing Framework/agents/engine/unity/unity-shader-specialist.md b/CCGS Skill Testing Framework/agents/engine/unity/unity-shader-specialist.md
new file mode 100644
index 0000000..2032c8b
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/engine/unity/unity-shader-specialist.md
@@ -0,0 +1,83 @@
+# Agent Test Spec: unity-shader-specialist
+
+## Agent Summary
+Domain: Unity Shader Graph, custom HLSL, VFX Graph, URP/HDRP pipeline customization, and post-processing effects.
+Does NOT own: gameplay code, art style direction.
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references Shader Graph / HLSL / VFX Graph / URP / HDRP)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over gameplay code or art direction
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Create an outline effect for characters using Shader Graph in URP."
+**Expected behavior:**
+- Produces a Shader Graph node setup description:
+  - Inverted hull method: offset vertices along their normals in the vertex stage, render the shell with front-face culling (Cull Front)
+ - OR screen-space post-process outline using depth/normal edge detection
+- Recommends a method based on pipeline capabilities (inverted hull works in either pipeline; a post-process outline needs a URP Renderer Feature or an HDRP Custom Pass)
+- Notes URP limitations: no geometry shader support (rules out geometry-shader outline approach)
+- Does NOT produce HDRP-specific nodes without confirming the render pipeline
+
+### Case 2: Out-of-domain redirect
+**Input:** "Implement the character health bar UI in code."
+**Expected behavior:**
+- Does NOT produce UI implementation code
+- Explicitly states that UI implementation belongs to `ui-programmer` (or `unity-ui-specialist`)
+- Redirects the request appropriately
+- May note that a shader-based fill effect for a health bar (e.g., a dissolve/fill gradient) is within its domain if the visual effect itself is shader-driven
+
+### Case 3: HDRP custom pass for outline
+**Input:** "We're on HDRP and want the outline as a post-process effect."
+**Expected behavior:**
+- Produces the HDRP `CustomPassVolume` pattern:
+ - C# class inheriting `CustomPass`
+ - `Execute()` method using `CoreUtils.SetRenderTarget()` and a full-screen shader blit
+ - Depth/normal buffer sampling for edge detection
+- Notes that CustomPass requires HDRP package and does not work in URP
+- Confirms the project is on HDRP before providing HDRP-specific code
+
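+A sketch of the pattern this case checks for (the material/shader names are placeholders; the `CustomPassContext` signature is the modern HDRP API):
+
+```csharp
+// Illustrative HDRP CustomPass sketch — "Hidden/OutlineFullscreen" is a placeholder shader.
+using UnityEngine;
+using UnityEngine.Rendering;
+using UnityEngine.Rendering.HighDefinition;
+
+class OutlinePass : CustomPass
+{
+    Material _outlineMaterial;
+
+    protected override void Setup(ScriptableRenderContext ctx, CommandBuffer cmd)
+    {
+        _outlineMaterial = CoreUtils.CreateEngineMaterial("Hidden/OutlineFullscreen");
+    }
+
+    protected override void Execute(CustomPassContext ctx)
+    {
+        // Full-screen blit; the shader samples depth/normals for edge detection.
+        CoreUtils.SetRenderTarget(ctx.cmd, ctx.cameraColorBuffer);
+        CoreUtils.DrawFullScreen(ctx.cmd, _outlineMaterial);
+    }
+
+    protected override void Cleanup() => CoreUtils.Destroy(_outlineMaterial);
+}
+```
+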
+### Case 4: VFX Graph performance — GPU event batching
+**Input:** "The explosion VFX Graph has 10,000 particles per event and spawning 20 simultaneous explosions is causing GPU frame spikes."
+**Expected behavior:**
+- Identifies GPU particle spawn as the cost driver (200,000 simultaneous particles)
+- Proposes GPU event batching: defer spawn events over multiple frames and stagger particle initialization
+- Recommends a particle budget cap per active explosion (e.g., 3,000 per explosion, queue the excess)
+- Notes the VFX Graph Output Event API (`VFXOutputEventAbstractHandler`) and C#-side `VisualEffect.SendEvent` scheduling as hooks for cross-frame spawn distribution
+- Does NOT change the gameplay event system — proposes a VFX-side budgeting solution
+
+### Case 5: Context pass — render pipeline (URP or HDRP)
+**Input:** Project context: URP render pipeline, Unity 2022.3. Request: "Add depth of field post-processing."
+**Expected behavior:**
+- Uses URP Volume framework: `DepthOfField` Volume Override component
+- Does NOT use HDRP Volume components (e.g., HDRP's `DepthOfField` with different parameter names)
+- Notes URP-specific DOF limitations vs HDRP (e.g., Bokeh quality differences)
+- Produces C# Volume profile setup code compatible with Unity 2022.3 URP package version
+
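+A minimal URP-side sketch of the expected setup (the component and object names are illustrative):
+
+```csharp
+// Illustrative URP sketch — configures Depth of Field through the Volume framework.
+using UnityEngine;
+using UnityEngine.Rendering;
+using UnityEngine.Rendering.Universal; // URP DepthOfField, not the HDRP type
+
+public class DofSetup : MonoBehaviour
+{
+    void Start()
+    {
+        var volume = gameObject.AddComponent<Volume>();
+        volume.isGlobal = true;
+        var profile = ScriptableObject.CreateInstance<VolumeProfile>();
+        volume.profile = profile;
+
+        var dof = profile.Add<DepthOfField>();
+        dof.mode.Override(DepthOfFieldMode.Bokeh);
+        dof.focusDistance.Override(5f); // values are illustrative
+        dof.aperture.Override(4f);
+    }
+}
+```
+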
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (Shader Graph, HLSL, VFX Graph, URP/HDRP customization)
+- [ ] Redirects gameplay and UI code to appropriate agents
+- [ ] Returns structured output (node graph descriptions, HLSL code, CustomPass patterns)
+- [ ] Distinguishes between URP and HDRP approaches — never cross-contaminates pipeline-specific APIs
+- [ ] Flags geometry shader approaches as URP-incompatible when relevant
+- [ ] Produces VFX optimizations that do not change gameplay behavior
+
+---
+
+## Coverage Notes
+- Outline effect (Case 1) should be paired with a visual screenshot test in `production/qa/evidence/`
+- HDRP CustomPass (Case 3) confirms the agent produces the correct Unity pattern, not a generic post-process approach
+- Pipeline separation (Case 5) verifies the agent never assumes the render pipeline without context
diff --git a/CCGS Skill Testing Framework/agents/engine/unity/unity-specialist.md b/CCGS Skill Testing Framework/agents/engine/unity/unity-specialist.md
new file mode 100644
index 0000000..6ad83ea
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/engine/unity/unity-specialist.md
@@ -0,0 +1,83 @@
+# Agent Test Spec: unity-specialist
+
+## Agent Summary
+Domain: Unity-specific architecture patterns, MonoBehaviour vs DOTS decisions, and subsystem selection (Addressables, New Input System, UI Toolkit, Cinemachine, etc.).
+Does NOT own: language-specific deep dives (delegates to unity-dots-specialist, unity-ui-specialist, etc.).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references Unity patterns / MonoBehaviour / subsystem decisions)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition acknowledges the sub-specialist routing table (DOTS, UI, Shader, Addressables)
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Should I use MonoBehaviour or ScriptableObject for storing enemy configuration data?"
+**Expected behavior:**
+- Produces a pattern decision tree covering:
+ - MonoBehaviour: for runtime behavior, needs to be attached to a GameObject, has Update() lifecycle
+ - ScriptableObject: for pure data/configuration, exists as an asset, shared across instances, no scene dependency
+- Recommends ScriptableObject for enemy configuration data (stateless, reusable, designer-friendly)
+- Notes that MonoBehaviour can reference the ScriptableObject for runtime use
+- Provides a concrete example of what the ScriptableObject class definition looks like (does not produce full code — refers to engine-programmer or gameplay-programmer for implementation)
+
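+The kind of class outline Case 1 anticipates (field names are illustrative, not a definitive implementation):
+
+```csharp
+// Illustrative sketch — enemy configuration as a designer-editable asset.
+using UnityEngine;
+
+[CreateAssetMenu(fileName = "EnemyConfig", menuName = "Config/Enemy")]
+public class EnemyConfig : ScriptableObject
+{
+    public string displayName;
+    public float maxHealth = 100f;
+    public float moveSpeed = 3.5f;
+    public int contactDamage = 10;
+}
+
+// A MonoBehaviour references the shared asset for runtime use:
+public class Enemy : MonoBehaviour
+{
+    [SerializeField] EnemyConfig config; // assigned in the Inspector
+    float _currentHealth;
+
+    void Awake() => _currentHealth = config.maxHealth;
+}
+```
+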
+### Case 2: Wrong-engine redirect
+**Input:** "Set up a Node scene tree with signals for this enemy system."
+**Expected behavior:**
+- Does NOT produce Godot Node/signal code
+- Identifies this as a Godot pattern
+- States that in Unity the equivalent is GameObject hierarchy + UnityEvent or C# events
+- Maps the concepts: Godot Node → Unity GameObject + Component (MonoBehaviour), Godot Signal → C# event / UnityEvent
+- Confirms the project is Unity-based before proceeding
+
+### Case 3: Unity version API flag
+**Input:** "Use the new Unity 6 GPU resident drawer for batch rendering."
+**Expected behavior:**
+- Identifies the Unity 6 feature (GPU Resident Drawer)
+- Flags that this API may not be available in earlier Unity versions
+- Asks for or checks the project's Unity version before providing implementation guidance
+- Directs to verify against official Unity 6 documentation
+- Does NOT assume the project is on Unity 6 without confirmation
+
+### Case 4: DOTS vs. MonoBehaviour conflict
+**Input:** "The combat system uses MonoBehaviour for state management, but we want to add a DOTS-based projectile system. Can they coexist?"
+**Expected behavior:**
+- Recognizes this as a hybrid architecture scenario
+- Explains the hybrid approach: MonoBehaviour can interface with DOTS via SystemAPI, IComponentData, and managed components
+- Notes the performance and complexity trade-offs of mixing the two patterns
+- Recommends escalating the architecture decision to `lead-programmer` or `technical-director`
+- Defers to `unity-dots-specialist` for the DOTS-side implementation details
+
+### Case 5: Context pass — Unity version
+**Input:** Project context provided: Unity 2022.3 LTS. Request: "Configure the new Input System for this project."
+**Expected behavior:**
+- Applies the Unity 2022.3 LTS context: uses the Input System package (com.unity.inputsystem)
+- Does NOT produce legacy Input Manager code (`Input.GetKeyDown()`, `Input.GetAxis()`)
+- Notes any 2022.3-specific Input System behaviors or package version constraints
+- References the project version to confirm Burst/Jobs compatibility if the Input System interacts with DOTS
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (Unity architecture decisions, pattern selection, subsystem routing)
+- [ ] Redirects Godot patterns to appropriate Godot specialists or flags them as wrong-engine
+- [ ] Redirects DOTS implementation to unity-dots-specialist
+- [ ] Redirects UI implementation to unity-ui-specialist
+- [ ] Flags Unity version-gated APIs and requires version confirmation before suggesting them
+- [ ] Returns structured pattern decision guides, not freeform opinions
+
+---
+
+## Coverage Notes
+- MonoBehaviour vs. ScriptableObject (Case 1) should be documented as an ADR if it results in a project-level decision
+- Version flag (Case 3) confirms the agent does not assume the latest Unity version without context
+- DOTS hybrid (Case 4) verifies the agent escalates architecture conflicts rather than resolving them unilaterally
diff --git a/CCGS Skill Testing Framework/agents/engine/unity/unity-ui-specialist.md b/CCGS Skill Testing Framework/agents/engine/unity/unity-ui-specialist.md
new file mode 100644
index 0000000..a532441
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/engine/unity/unity-ui-specialist.md
@@ -0,0 +1,81 @@
+# Agent Test Spec: unity-ui-specialist
+
+## Agent Summary
+Domain: Unity UI Toolkit (UXML/USS), UGUI (Canvas), data binding, runtime UI performance, and UI input event handling.
+Does NOT own: UX flow design (ux-designer), visual art style (art-director).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references UI Toolkit / UGUI / Canvas / data binding)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over UX flow design or visual art direction
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Implement an inventory UI screen using Unity UI Toolkit."
+**Expected behavior:**
+- Produces a UXML document defining the inventory panel structure (ListView, item templates, detail panel)
+- Produces USS styles for the inventory layout and item states (default, hover, selected)
+- Provides C# code binding the inventory data model to the UI via `INotifyValueChanged<T>` callbacks or an event-driven view-model
+- Uses `ListView` with `makeItem` / `bindItem` callbacks for the scrollable item list
+- Does NOT produce the UX flow design — implements from a provided spec
+
+### Case 2: Out-of-domain redirect
+**Input:** "Design the UX flow for the inventory — what happens when the player equips vs. drops an item."
+**Expected behavior:**
+- Does NOT produce UX flow design
+- Explicitly states that interaction flow design belongs to `ux-designer`
+- Redirects the request to `ux-designer`
+- Notes it will implement whatever flow the ux-designer specifies
+
+### Case 3: UI Toolkit data binding for dynamic list
+**Input:** "The inventory list needs to update in real time as items are added or removed from the player's bag."
+**Expected behavior:**
+- Produces the `ListView` pattern with a bound `itemsSource` collection and an event-driven refresh approach
+- Uses `ListView.Rebuild()` or `ListView.RefreshItems()` on the backing collection change event
+- Notes the performance considerations for large lists (virtualization via `makeItem`/`bindItem` pattern)
+- Does NOT loop over `Q<T>()`/`Query<T>()` lookups to update individual elements as a list refresh strategy — flags that as a performance antipattern
+
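+A hedged sketch of the virtualized refresh pattern (the string-backed bag model is an assumption for illustration):
+
+```csharp
+// Illustrative sketch — ListView virtualization with event-driven refresh.
+using System.Collections.Generic;
+using UnityEngine.UIElements;
+
+public class InventoryListController
+{
+    readonly List<string> _items; // backing collection owned by the bag model
+
+    public InventoryListController(ListView listView, List<string> items)
+    {
+        _items = items;
+        listView.itemsSource = _items;
+        listView.makeItem = () => new Label();                  // pooled, not per-item
+        listView.bindItem = (ve, i) => ((Label)ve).text = _items[i];
+    }
+
+    // Call from the bag's changed event; rebinds visible rows only.
+    public void OnBagChanged(ListView listView) => listView.RefreshItems();
+}
+```
+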
+### Case 4: Canvas performance — overdraw
+**Input:** "The main menu canvas is causing GPU overdraw warnings; there are many overlapping panels."
+**Expected behavior:**
+- Identifies overdraw causes: multiple stacked canvases, full-screen overlay panels not culled when inactive
+- Recommends:
+ - Separate canvases for world-space, screen-space-overlay, and screen-space-camera layers
+ - Disable/deactivate panels instead of setting alpha to 0 (invisible alpha-0 panels still draw)
+ - Canvas Group + alpha for fade effects, not individual Image alpha
+- Notes UI Toolkit as an alternative if the project is positioned to migrate away from UGUI
+
+### Case 5: Context pass — Unity version
+**Input:** Project context: Unity 2022.3 LTS. Request: "Implement the settings panel with data binding."
+**Expected behavior:**
+- Uses UI Toolkit with the binding patterns available in 2022.3 LTS
+- Notes that the UI Toolkit runtime data binding system ships in Unity 2023.2/Unity 6 — on 2022.3 LTS, bindings are wired manually via `INotifyValueChanged<T>` / `RegisterValueChangedCallback`
+- Does NOT use the Unity 6 enhanced binding API features if they are not available in 2022.3
+- Produces code compatible with the stated Unity version, with version-specific API notes
+
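+A sketch of the manual wiring available on 2022.3 (the control and model names are illustrative):
+
+```csharp
+// Illustrative sketch — manual two-way wiring for a settings slider on 2022.3 LTS.
+using UnityEngine.UIElements;
+
+public static class SettingsBinding
+{
+    public static void BindVolume(Slider slider, SettingsModel model)
+    {
+        slider.value = model.MasterVolume;                 // model → view on open
+        slider.RegisterValueChangedCallback(evt =>         // view → model on change
+            model.MasterVolume = evt.newValue);
+    }
+}
+
+public class SettingsModel { public float MasterVolume = 0.8f; } // assumed model
+```
+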
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (UI Toolkit, UGUI, data binding, UI performance)
+- [ ] Redirects UX flow design to ux-designer
+- [ ] Returns structured output (UXML, USS, C# binding code)
+- [ ] Uses the correct Unity UI framework version for the project's Unity version
+- [ ] Flags Canvas overdraw as a performance antipattern and provides specific remediation
+- [ ] Does not use alpha-0 as a hide/show pattern — uses SetActive() or VisualElement.style.display
+
+---
+
+## Coverage Notes
+- Inventory UI (Case 1) should have a manual walkthrough doc in `production/qa/evidence/`
+- Dynamic list binding (Case 3) should have an integration test or automated interaction test
+- Canvas overdraw (Case 4) verifies the agent knows the correct Unity UI performance patterns
diff --git a/CCGS Skill Testing Framework/agents/engine/unreal/ue-blueprint-specialist.md b/CCGS Skill Testing Framework/agents/engine/unreal/ue-blueprint-specialist.md
new file mode 100644
index 0000000..9b34d23
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/engine/unreal/ue-blueprint-specialist.md
@@ -0,0 +1,80 @@
+# Agent Test Spec: ue-blueprint-specialist
+
+## Agent Summary
+- **Domain**: Blueprint architecture, the Blueprint/C++ boundary, Blueprint graph quality, Blueprint performance optimization, Blueprint Function Library design
+- **Does NOT own**: C++ implementation (engine-programmer or gameplay-programmer), art assets or shaders, UI/UX flow design (ux-designer)
+- **Model tier**: Sonnet
+- **Gate IDs**: None; defers to unreal-specialist or lead-programmer for cross-domain rulings
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references Blueprint architecture and optimization)
+- [ ] `allowed-tools:` list matches the agent's role (Read for Blueprint project files; no server or deployment tools)
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over C++ implementation decisions
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — Blueprint graph performance review
+**Input**: "Review our AI behavior Blueprint. It has tick-based logic running every frame that checks line-of-sight for 30 NPCs simultaneously."
+**Expected behavior**:
+- Identifies tick-heavy logic as a performance problem
+- Recommends switching from EventTick to event-driven patterns (perception system events, timers, or polling on a reduced interval)
+- Flags the per-NPC cost of simultaneous line-of-sight checks
+- Suggests alternatives: AIPerception component events, staggered tick groups, or moving the system to C++ if Blueprint overhead is measured to be significant
+- Output is structured: problem identified, impact estimated, alternatives listed
+
+### Case 2: Out-of-domain request — C++ implementation
+**Input**: "Write the C++ implementation for this ability cooldown system."
+**Expected behavior**:
+- Does not produce C++ implementation code
+- Provides the Blueprint equivalent of the cooldown logic (e.g., using a Timeline or GameplayEffect if GAS is in use)
+- States clearly: "C++ implementation is handled by engine-programmer or gameplay-programmer; I can show the Blueprint approach or describe the boundary where Blueprint calls into C++"
+- Optionally notes when the cooldown complexity warrants a C++ backend
+
+### Case 3: Domain boundary — unsafe raw pointer access in Blueprint
+**Input**: "Our Blueprint calls GetOwner() and then immediately accesses a component on the result without checking if it's valid."
+**Expected behavior**:
+- Flags this as a runtime crash risk: GetOwner() can return null in some lifecycle states
+- Provides the correct Blueprint pattern: IsValid() node before any property/component access
+- Notes that validity checks are mandatory in Blueprint before dereferencing Actor-derived references
+- Does NOT silently fix the code without explaining why the original was unsafe
+
+### Case 4: Blueprint graph complexity — readiness for Function Library refactor
+**Input**: "Our main GameMode Blueprint has 600+ nodes in a single graph with duplicated damage calculation logic in 8 places."
+**Expected behavior**:
+- Diagnoses this as a maintainability and testability problem
+- Recommends extracting duplicated logic into a Blueprint Function Library (BFL)
+- Describes how to structure the BFL: pure functions for calculations, static calls from any Blueprint
+- Notes that if the damage logic is performance-sensitive or shared with C++, it may be a candidate for migration to unreal-specialist review
+- Output is a concrete refactor plan, not a vague recommendation
+
+### Case 5: Context pass — Blueprint complexity budget
+**Input context**: Project conventions specify a maximum of 100 nodes per Blueprint event graph before a mandatory Function Library extraction.
+**Input**: "Here is our inventory Blueprint graph [150 nodes shown]. Is it ready to ship?"
+**Expected behavior**:
+- References the stated 150-node count against the 100-node budget from project conventions
+- Flags the graph as exceeding the complexity threshold
+- Does NOT approve it as-is
+- Produces a list of candidate subgraphs for Function Library extraction to bring the main graph within budget
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (Blueprint architecture, performance, graph quality)
+- [ ] Redirects C++ implementation requests to engine-programmer or gameplay-programmer
+- [ ] Returns structured findings (problem/impact/alternatives format) rather than freeform opinions
+- [ ] Enforces Blueprint safety patterns (null checks, IsValid) proactively
+- [ ] References project conventions when evaluating graph complexity
+
+---
+
+## Coverage Notes
+- Case 3 (null pointer safety) is a safety-critical test — this is a common source of shipping crashes
+- Case 5 requires that project conventions include a stated node budget; if none is configured, the agent should note the absence and recommend setting one
+- No automated runner; review manually or via `/skill-test`
diff --git a/CCGS Skill Testing Framework/agents/engine/unreal/ue-gas-specialist.md b/CCGS Skill Testing Framework/agents/engine/unreal/ue-gas-specialist.md
new file mode 100644
index 0000000..5969645
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/engine/unreal/ue-gas-specialist.md
@@ -0,0 +1,81 @@
+# Agent Test Spec: ue-gas-specialist
+
+## Agent Summary
+- **Domain**: Gameplay Ability System (GAS) — abilities (UGameplayAbility), gameplay effects (UGameplayEffect), attribute sets (UAttributeSet), gameplay tags, ability tasks (UAbilityTask), ability specs (FGameplayAbilitySpec), GAS prediction and latency compensation
+- **Does NOT own**: UI display of ability state (ue-umg-specialist), net replication of GAS data beyond built-in GAS prediction (ue-replication-specialist), art or VFX for ability feedback (vfx-artist)
+- **Model tier**: Sonnet
+- **Gate IDs**: None; defers cross-domain calls to the appropriate specialist
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references GAS, abilities, GameplayEffects, AttributeSets)
+- [ ] `allowed-tools:` list matches the agent's role (Read/Write for GAS source files; no deployment or server tools)
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over UI implementation or low-level net serialization
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — dash ability with cooldown
+**Input**: "Implement a dash ability that moves the player forward 500 units and has a 1.5 second cooldown."
+**Expected behavior**:
+- Produces a GAS AbilitySpec structure or outline: UGameplayAbility subclass with ActivateAbility logic, an AbilityTask for movement (e.g., AbilityTask_ApplyRootMotionMoveToForce or custom root motion), and a UGameplayEffect for the cooldown
+- Cooldown GameplayEffect uses Duration policy with the 1.5s duration and a GameplayTag to block re-activation
+- Tags clearly named following a hierarchy convention (e.g., Ability.Dash, Cooldown.Ability.Dash)
+- Output includes both the ability class outline and the GameplayEffect definition
+
+### Case 2: Out-of-domain request — GAS state replication
+**Input**: "How do I replicate the player's ability cooldown state to all clients so the UI updates correctly?"
+**Expected behavior**:
+- Clarifies that GAS has built-in replication for AbilitySpecs and GameplayEffects via the AbilitySystemComponent's replication mode
+- Explains the three ASC replication modes (Full, Mixed, Minimal) and when to use each
+- For custom replication needs beyond GAS built-ins, explicitly states: "For custom net serialization of GAS data, coordinate with ue-replication-specialist"
+- Does NOT attempt to write custom replication code outside GAS's own systems without flagging the domain boundary
+
+### Case 3: Domain boundary — incorrect GameplayTag hierarchy
+**Input**: "We have an ability that applies a tag called 'Stunned' and another that checks for 'Status.Stunned'. They're not matching."
+**Expected behavior**:
+- Identifies the root cause: tag names must be exact or use hierarchical matching via TagContainer queries
+- Flags the naming inconsistency: 'Stunned' is a root-level tag; 'Status.Stunned' is a child tag under 'Status' — these are different tags
+- Recommends a project tag naming convention: all status effects under Status.*, all abilities under Ability.*
+- Provides the fix: either rename the applied tag to 'Status.Stunned' or update the query to match 'Stunned'
+- Notes where tag definitions should live (DefaultGameplayTags.ini or a DataTable)
+
+### Case 4: Conflict — attribute set conflict between two abilities
+**Input**: "Our Shield ability and our Armor ability both modify a 'DefenseValue' attribute. They're stacking in ways that aren't intended — after both are active, defense goes well above maximum."
+**Expected behavior**:
+- Identifies this as a GameplayEffect stacking and magnitude calculation problem
+- Proposes a resolution using Execution Calculations (UGameplayEffectExecutionCalculation) or Modifier Aggregators to cap the combined result
+- Alternatively recommends using Gameplay Effect stacking policies (AggregateBySource, AggregateByTarget, or None) to prevent unintended additive stacking
+- Produces a concrete resolution: either an Execution Calculation class outline or a change to the Modifier Op (Override instead of Additive for the cap)
+- Does NOT propose removing one of the abilities as the solution
+
+### Case 5: Context pass — designing against an existing attribute set
+**Input context**: Project has an existing AttributeSet with attributes: Health, MaxHealth, Stamina, MaxStamina, Defense, AttackPower.
+**Input**: "Design a Berserker ability that increases AttackPower by 50% when Health drops below 30%."
+**Expected behavior**:
+- Uses the existing Health, MaxHealth, and AttackPower attributes — does NOT invent new attributes
+- Designs a Passive GameplayAbility (or triggered Effect) that fires on Health change, checks Health/MaxHealth ratio via a GameplayEffectExecutionCalculation or Attribute-Based magnitude
+- Uses a Gameplay Cue or Gameplay Tag to track the Berserker active state
+- References the actual attribute names from the provided AttributeSet (AttackPower, not "Damage" or "Strength")
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (GAS: abilities, effects, attributes, tags, ability tasks)
+- [ ] Redirects custom replication requests to ue-replication-specialist with clear explanation of boundary
+- [ ] Returns structured findings (ability outline + GameplayEffect definition) rather than vague descriptions
+- [ ] Enforces tag hierarchy naming conventions proactively
+- [ ] Uses only attributes and tags present in the provided context; does not invent new ones without noting it
+
+---
+
+## Coverage Notes
+- Case 3 (tag hierarchy) is a frequent source of subtle bugs; test whenever tag naming conventions change
+- Case 4 requires knowledge of GAS stacking policies — verify this case if the GAS integration depth changes
+- Case 5 is the most important context-awareness test; failing it means the agent ignores project state
+- No automated runner; review manually or via `/skill-test`
diff --git a/CCGS Skill Testing Framework/agents/engine/unreal/ue-replication-specialist.md b/CCGS Skill Testing Framework/agents/engine/unreal/ue-replication-specialist.md
new file mode 100644
index 0000000..85cc8ed
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/engine/unreal/ue-replication-specialist.md
@@ -0,0 +1,82 @@
+# Agent Test Spec: ue-replication-specialist
+
+## Agent Summary
+- **Domain**: Property replication (UPROPERTY Replicated/ReplicatedUsing), RPCs (Server/Client/NetMulticast), client prediction and reconciliation, net relevancy and always-relevant settings, net serialization (FArchive/NetSerialize), bandwidth optimization and replication frequency tuning
+- **Does NOT own**: Gameplay logic being replicated (gameplay-programmer), server infrastructure and hosting (devops-engineer), GAS-specific prediction (ue-gas-specialist handles GAS net prediction)
+- **Model tier**: Sonnet
+- **Gate IDs**: None; escalates security-relevant replication concerns to lead-programmer
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references replication, RPCs, client prediction, bandwidth)
+- [ ] `allowed-tools:` list matches the agent's role (Read/Write for C++ and Blueprint source files; no infrastructure or deployment tools)
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over server infrastructure, game server architecture, or gameplay logic correctness
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — replicated player health with client prediction
+**Input**: "Set up replicated player health that clients can predict locally (e.g., when taking self-inflicted damage) and have corrected by the server."
+**Expected behavior**:
+- Produces a UPROPERTY(ReplicatedUsing=OnRep_Health) declaration in the appropriate Character or AttributeSet class
+- Describes the OnRep_Health function: apply visual/audio feedback, reconcile predicted value with server-authoritative value
+- Explains the client prediction pattern: local client applies tentative damage immediately, server authoritative value arrives via OnRep and corrects any discrepancy
+- Notes that if GAS is in use, the built-in GAS prediction handles this — recommend coordinating with ue-gas-specialist
+- Output is a concrete code structure (property declaration + OnRep outline), not a conceptual description only
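+
+For reviewer reference, a minimal sketch of the expected output shape (class and member names here are illustrative, not prescribed by the spec):
+
+```cpp
+// MyCharacter.h — replicated health with an OnRep callback (sketch)
+UPROPERTY(ReplicatedUsing = OnRep_Health)
+float Health = 100.0f;
+
+UFUNCTION()
+void OnRep_Health(float OldHealth);
+
+// MyCharacter.cpp
+void AMyCharacter::GetLifetimeReplicatedProps(TArray<FLifetimeProperty>& OutLifetimeProps) const
+{
+    Super::GetLifetimeReplicatedProps(OutLifetimeProps);
+    DOREPLIFETIME(AMyCharacter, Health);
+}
+
+void AMyCharacter::OnRep_Health(float OldHealth)
+{
+    // Server-authoritative value has arrived: reconcile any locally
+    // predicted value, then drive feedback (damage flash, hit SFX)
+    // off the OldHealth -> Health delta.
+}
+```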
+
+### Case 2: Out-of-domain request — game server architecture
+**Input**: "Design our game server infrastructure — how many dedicated servers we need, regional deployment, and matchmaking architecture."
+**Expected behavior**:
+- Does not produce server infrastructure architecture, hosting recommendations, or matchmaking design
+- States clearly: "Server infrastructure and deployment architecture is owned by devops-engineer; I handle the Unreal replication layer within a running game session"
+- Does not conflate in-game replication with server hosting concerns
+
+### Case 3: Domain boundary — RPC without server authority validation
+**Input**: "We have a Server RPC called ServerSpendCurrency that deducts in-game currency. The client calls it and the server just deducts without checking anything."
+**Expected behavior**:
+- Flags this as a critical security vulnerability: unvalidated server RPCs are exploitable by cheaters sending arbitrary RPC calls
+- Provides the required fix: server-side validation before the deduct — check that the player actually has the currency, verify the transaction is valid, reject and log if not
+- Uses the pattern: `if (!HasAuthority()) return;` guard plus explicit state validation before mutation
+- Notes this should be reviewed by lead-programmer given the economy implications
+- Does NOT produce the "fixed" code without explaining why the original was dangerous
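+
+The validated-RPC pattern the agent should produce looks roughly like this (class and property names are illustrative; the `WithValidation` specifier on the `UFUNCTION` declaration is what generates the `_Validate` hook):
+
+```cpp
+// Declaration (sketch): UFUNCTION(Server, Reliable, WithValidation)
+// void ServerSpendCurrency(int64 Amount);
+
+bool AMyPlayerState::ServerSpendCurrency_Validate(int64 Amount)
+{
+    return Amount > 0;  // failing validation disconnects the calling client
+}
+
+void AMyPlayerState::ServerSpendCurrency_Implementation(int64 Amount)
+{
+    if (!HasAuthority()) return;   // belt-and-braces: never mutate on a client
+    if (Currency < Amount)         // explicit state validation before mutation
+    {
+        UE_LOG(LogTemp, Warning, TEXT("Rejected ServerSpendCurrency(%lld)"), Amount);
+        return;
+    }
+    Currency -= Amount;
+}
+```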
+
+### Case 4: Bandwidth optimization — high-frequency movement replication
+**Input**: "Our player movement is replicated using a Vector3 position every tick. With 32 players, we're exceeding our bandwidth budget."
+**Expected behavior**:
+- Identifies tick-rate replication of full-precision Vector3 as bandwidth-expensive
+- Proposes quantized replication: use FVector_NetQuantize or FVector_NetQuantize100 instead of raw FVector to reduce bytes per update
+- Recommends reducing replication frequency via SetNetUpdateFrequency() for non-owning clients
+- Notes that Unreal's built-in Character Movement Component already has optimized movement replication — recommends using or extending it rather than rolling a custom system
+- Produces a concrete bandwidth estimate comparison if possible, or explains the tradeoff
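+
+The expected direction of the fix, in rough numbers (per-update sizes are illustrative assumptions, not exact Unreal serialization costs):
+
+```cpp
+// Raw FVector at tick rate:   ~12 bytes x 60 Hz ≈ 0.70 KB/s per property per client
+// Quantized at reduced rate:  ~6 bytes  x 20 Hz ≈ 0.12 KB/s per property per client
+UPROPERTY(Replicated)
+FVector_NetQuantize ReplicatedPosition;  // quantized net serialization
+
+// In the actor's constructor (accessor name varies by engine version — verify):
+SetNetUpdateFrequency(20.0f);
+```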
+
+### Case 5: Context pass — designing within a network budget
+**Input context**: Project network budget is 64 KB/s per player, with 32 players = 2 MB/s total server outbound. Current movement replication already uses 40 KB/s per player.
+**Input**: "We want to add real-time inventory replication so all clients can see other players' equipment changes immediately."
+**Expected behavior**:
+- Acknowledges the existing 40 KB/s movement cost leaves only 24 KB/s for everything else per player
+- Does NOT design a naive full-inventory replication approach (would exceed budget)
+- Recommends a delta-only or event-driven approach: replicate only changed slots rather than the full inventory array
+- Uses FGameplayItemSlot or equivalent with ReplicatedUsing to trigger targeted updates
+- Explicitly states the proposed approach's bandwidth estimate relative to the remaining 24 KB/s budget
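+
+One concrete shape for the delta-only approach is Unreal's fast array serializer, which replicates only entries that changed (struct names here are illustrative):
+
+```cpp
+USTRUCT()
+struct FInventoryEntry : public FFastArraySerializerItem
+{
+    GENERATED_BODY()
+    UPROPERTY() int32 SlotIndex = 0;
+    UPROPERTY() int32 ItemId = 0;
+};
+
+USTRUCT()
+struct FInventoryList : public FFastArraySerializer
+{
+    GENERATED_BODY()
+    UPROPERTY() TArray<FInventoryEntry> Items;
+
+    bool NetDeltaSerialize(FNetDeltaSerializeInfo& DeltaParms)
+    {
+        // Serializes only entries changed since the last ack, so cost
+        // scales with the change, not the inventory size.
+        return FastArrayDeltaSerialize<FInventoryEntry, FInventoryList>(Items, DeltaParms, *this);
+    }
+};
+```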
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (property replication, RPCs, client prediction, bandwidth)
+- [ ] Redirects server infrastructure requests to devops-engineer without producing infrastructure design
+- [ ] Flags unvalidated server RPCs as security issues and recommends lead-programmer review
+- [ ] Returns structured findings (property declarations, bandwidth estimates, optimization options) not freeform advice
+- [ ] Uses project-provided bandwidth budget numbers when evaluating replication design choices
+
+---
+
+## Coverage Notes
+- Case 3 (RPC security) is a shipping-critical test — unvalidated RPCs are a top-ten multiplayer exploit vector
+- Case 5 is the most important context-awareness test; agent must use actual budget numbers, not generic advice
+- Case 1 GAS branch: if GAS is configured, agent should detect it and defer to ue-gas-specialist for GAS-managed attributes
+- No automated runner; review manually or via `/skill-test`
diff --git a/CCGS Skill Testing Framework/agents/engine/unreal/ue-umg-specialist.md b/CCGS Skill Testing Framework/agents/engine/unreal/ue-umg-specialist.md
new file mode 100644
index 0000000..e0d2306
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/engine/unreal/ue-umg-specialist.md
@@ -0,0 +1,79 @@
+# Agent Test Spec: ue-umg-specialist
+
+## Agent Summary
+- **Domain**: UMG widget hierarchy design, data binding patterns, CommonUI input routing and action tags, widget styling (WidgetStyle assets), UI optimization (widget pooling, ListView, invalidation)
+- **Does NOT own**: UX flow and screen navigation design (ux-designer), gameplay logic (gameplay-programmer), backend data sources (game code), server communication
+- **Model tier**: Sonnet
+- **Gate IDs**: None; defers UX flow decisions to ux-designer
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references UMG, widget hierarchy, CommonUI)
+- [ ] `allowed-tools:` list matches the agent's role (Read/Write for UI assets and Blueprint files; no server or gameplay source tools)
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over UX flow, navigation architecture, or gameplay data logic
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — inventory widget with data binding
+**Input**: "Create an inventory widget that shows a grid of item slots. Each slot should display item icon, quantity, and rarity color. It needs to update when the inventory changes."
+**Expected behavior**:
+- Produces a UMG widget structure: a parent WBP_Inventory containing a UniformGridPanel or TileView, with a child WBP_InventorySlot widget per item
+- Describes data binding approach: either Event Dispatchers on an Inventory Component triggering a refresh, or a ListView with a UObject item data class implementing IUserObjectListEntry
+- Specifies how rarity color is driven: a WidgetStyle asset or a data table lookup, not hardcoded color values
+- Output includes the widget hierarchy, binding pattern, and the refresh trigger mechanism
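+
+A sketch of the ListView half of that output (class names are illustrative):
+
+```cpp
+// Data object handed to the ListView:
+UCLASS()
+class UInventorySlotData : public UObject
+{
+    GENERATED_BODY()
+public:
+    UPROPERTY() int32 Quantity = 0;
+    UPROPERTY() FGameplayTag Rarity;  // drives color via a style or data table lookup
+};
+
+// Entry widget: receives its data object when the ListView assigns it.
+UCLASS()
+class UInventorySlotWidget : public UUserWidget, public IUserObjectListEntry
+{
+    GENERATED_BODY()
+protected:
+    virtual void NativeOnListItemObjectSet(UObject* ListItemObject) override
+    {
+        // Cast to UInventorySlotData and push icon/quantity/rarity into
+        // the bound child widgets; no hardcoded color values here.
+    }
+};
+```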
+
+### Case 2: Out-of-domain request — UX flow design
+**Input**: "Design the full navigation flow for our inventory system — how the player opens it, transitions to character stats, and exits to the pause menu."
+**Expected behavior**:
+- Does not produce a navigation flow or screen transition architecture
+- States clearly: "Navigation flow and screen transition design is owned by ux-designer; I can implement the UMG widget structure once the flow is defined"
+- Does not make UX decisions (back button behavior, transition animations, modal vs. fullscreen) without a UX spec
+
+### Case 3: Domain boundary — CommonUI input action mismatch
+**Input**: "Our inventory widget isn't responding to the controller Back button. We're using CommonUI."
+**Expected behavior**:
+- Identifies the likely cause: the widget's Back input action tag does not match the project's registered CommonUI InputAction data asset
+- Explains the CommonUI input routing model: widgets declare input actions via `CommonUI_InputAction` tags; the CommonActivatableWidget handles routing
+- Provides the fix: verify that the widget's Back action tag matches the registered tag in the project's CommonUI input action data table
+- Distinguishes this from a hardware input binding issue (which would be Enhanced Input territory)
+
+### Case 4: Widget performance issue — many widget instances per frame
+**Input**: "Our leaderboard widget creates 500 individual WBP_LeaderboardRow instances at once. The game hitches for 300ms when opening the leaderboard."
+**Expected behavior**:
+- Identifies the root cause: 500 widget instantiations in a single frame causes a construction hitch
+- Recommends switching to ListView or TileView with virtualization — only visible rows are constructed
+- Explains the IUserObjectListEntry interface requirement for ListView data objects
+- If ListView is not appropriate, recommends pooling: pre-instantiate a fixed number of rows and recycle them with new data
+- Output is a concrete recommendation with the specific UMG component to use, not a vague "optimize it"
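+
+If ListView virtualization genuinely cannot be used, the pooled-row fallback looks roughly like this (names illustrative):
+
+```cpp
+TArray<UUserWidget*> RowPool;  // pre-warmed once, reused thereafter
+
+UUserWidget* AcquireRow(APlayerController* PC, TSubclassOf<UUserWidget> RowClass)
+{
+    if (RowPool.Num() > 0)
+    {
+        return RowPool.Pop();  // reuse instead of constructing mid-frame
+    }
+    return CreateWidget<UUserWidget>(PC, RowClass);
+}
+
+void ReleaseRow(UUserWidget* Row)
+{
+    Row->SetVisibility(ESlateVisibility::Collapsed);  // hide, don't destroy
+    RowPool.Push(Row);
+}
+```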
+
+### Case 5: Context pass — CommonUI setup already configured
+**Input context**: Project uses CommonUI with the following registered InputAction tags: UI.Action.Confirm, UI.Action.Back, UI.Action.Pause, UI.Action.Secondary.
+**Input**: "Add a 'Sort Inventory' button to the inventory widget that works with CommonUI."
+**Expected behavior**:
+- Uses UI.Action.Secondary (or recommends registering a new tag like UI.Action.Sort if Secondary is already allocated)
+- Does NOT invent a new InputAction tag without noting that it must be registered in the CommonUI data table
+- Does NOT use a non-CommonUI input binding approach (e.g., raw key press in Event Graph) when CommonUI is the established pattern
+- References the provided tag list explicitly in the recommendation
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (UMG structure, data binding, CommonUI, widget performance)
+- [ ] Redirects UX flow and navigation design requests to ux-designer
+- [ ] Returns structured findings (widget hierarchy + binding pattern) rather than freeform opinions
+- [ ] Uses existing CommonUI InputAction tags from context; does not invent new ones without flagging registration requirement
+- [ ] Recommends virtualized lists (ListView/TileView) before widget pooling for large collections
+
+---
+
+## Coverage Notes
+- Case 3 (CommonUI input routing) requires the project to have CommonUI configured; the test is skipped if the project does not use CommonUI
+- Case 4 (performance) is a high-impact failure mode — 300ms hitches are shipping-blocking; prioritize this test case
+- Case 5 is the most important context-awareness test for UI pipeline consistency
+- No automated runner; review manually or via `/skill-test`
diff --git a/CCGS Skill Testing Framework/agents/engine/unreal/unreal-specialist.md b/CCGS Skill Testing Framework/agents/engine/unreal/unreal-specialist.md
new file mode 100644
index 0000000..905787c
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/engine/unreal/unreal-specialist.md
@@ -0,0 +1,80 @@
+# Agent Test Spec: unreal-specialist
+
+## Agent Summary
+- **Domain**: Unreal Engine patterns and architecture — Blueprint vs C++ decisions, UE subsystems (GAS, Enhanced Input, Niagara), UE project structure, plugin integration, and engine-level configuration
+- **Does NOT own**: Art style and visual direction (art-director), server infrastructure and deployment (devops-engineer), UI/UX flow design (ux-designer)
+- **Model tier**: Sonnet
+- **Gate IDs**: None; defers gate verdicts to technical-director
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references Unreal Engine)
+- [ ] `allowed-tools:` list matches the agent's role (Read, Write for UE project files; no deployment tools)
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority outside its declared domain (no art, no server infra)
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — Blueprint vs C++ decision criteria
+**Input**: "Should I implement our combo attack system in Blueprint or C++?"
+**Expected behavior**:
+- Provides structured decision criteria: complexity, reuse frequency, team skill, and performance requirements
+- Recommends C++ for systems called every frame or shared across 5+ ability types
+- Recommends Blueprint for designer-tunable values and one-off logic
+- Does NOT render a final verdict without knowing project context — asks clarifying questions if context is absent
+- Output is structured (criteria table or bullet list), not a freeform opinion
+
+### Case 2: Out-of-domain request — Unity C# code
+**Input**: "Write me a C# MonoBehaviour that handles player health and fires a Unity event on death."
+**Expected behavior**:
+- Does not produce Unity C# code
+- States clearly: "This project uses Unreal Engine; the UE equivalent of a MonoBehaviour would be an Actor Component in C++ or a Blueprint Actor Component"
+- Optionally offers to provide the UE equivalent if requested
+- Does not redirect to a Unity specialist (none exists in the framework)
+
+### Case 3: Domain boundary — UE5.4 API requirement
+**Input**: "I need to use the new Motion Matching API introduced in UE5.4."
+**Expected behavior**:
+- Flags that UE5.4 is a specific version with potentially limited LLM training coverage
+- Recommends cross-referencing official Unreal docs or the project's engine-reference directory before trusting any API suggestions
+- Provides best-effort API guidance with explicit uncertainty markers (e.g., "Verify this against UE5.4 release notes")
+- Does NOT silently produce stale or incorrect API signatures without a caveat
+
+### Case 4: Conflict — Blueprint spaghetti in a core system
+**Input**: "Our replication logic is entirely in a deeply nested Blueprint event graph with 300+ nodes and no functions. It's becoming unmaintainable."
+**Expected behavior**:
+- Identifies this as a Blueprint architecture problem, not a minor style issue
+- Recommends migrating core replication logic to a C++ ActorComponent or the Gameplay Ability System
+- Notes the coordination required: changes to replication architecture must involve lead-programmer
+- Does NOT unilaterally declare "migrate to C++" without surfacing the scope of the refactor to the user
+- Produces a concrete migration recommendation, not a vague suggestion
+
+### Case 5: Context pass — version-appropriate API suggestions
+**Input context**: Project engine-reference file states Unreal Engine 5.3.
+**Input**: "How do I set up Enhanced Input actions for a new character?"
+**Expected behavior**:
+- Uses UE5.3-era Enhanced Input API (InputMappingContext, UEnhancedInputComponent::BindAction)
+- Does NOT reference APIs introduced after UE5.3 without flagging them as potentially unavailable
+- References the project's stated engine version in its response
+- Provides concrete, version-anchored code or Blueprint node names
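+
+The UE5.3-era binding the agent is expected to produce, roughly (character class and action properties are illustrative; verify against the project's engine version):
+
+```cpp
+void AMyCharacter::SetupPlayerInputComponent(UInputComponent* PlayerInputComponent)
+{
+    Super::SetupPlayerInputComponent(PlayerInputComponent);
+
+    if (UEnhancedInputComponent* EIC = Cast<UEnhancedInputComponent>(PlayerInputComponent))
+    {
+        EIC->BindAction(MoveAction, ETriggerEvent::Triggered, this, &AMyCharacter::Move);
+        EIC->BindAction(JumpAction, ETriggerEvent::Started, this, &ACharacter::Jump);
+    }
+
+    // The InputMappingContext itself is added via the local player's
+    // UEnhancedInputLocalPlayerSubsystem (AddMappingContext), typically in BeginPlay.
+}
+```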
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (Unreal patterns, Blueprint/C++, UE subsystems)
+- [ ] Redirects Unity or other-engine requests without producing wrong-engine code
+- [ ] Returns structured findings (criteria tables, decision trees, migration plans) rather than freeform opinions
+- [ ] Flags version uncertainty explicitly before producing API suggestions
+- [ ] Coordinates with lead-programmer for architecture-scale refactors rather than deciding unilaterally
+
+---
+
+## Coverage Notes
+- No automated runner exists for agent behavior tests — these are reviewed manually or via `/skill-test`
+- Version-awareness (Case 3, Case 5) is the highest-risk failure mode for this agent; test regularly when engine version changes
+- Case 4 integration with lead-programmer is a coordination test, not a technical correctness test
diff --git a/CCGS Skill Testing Framework/agents/leads/audio-director.md b/CCGS Skill Testing Framework/agents/leads/audio-director.md
new file mode 100644
index 0000000..acff946
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/leads/audio-director.md
@@ -0,0 +1,84 @@
+# Agent Test Spec: audio-director
+
+## Agent Summary
+**Domain owned:** Music direction and palette, sound design philosophy, audio implementation strategy, mix balance, audio aspects of phase gates.
+**Does NOT own:** Visual design (art-director), code implementation (lead-programmer), narrative story content (narrative-director), UX interaction flows (ux-designer).
+**Model tier:** Sonnet (individual system analysis — audio direction and spec review).
+**Gate IDs handled:** AD-VISUAL (the audio aspect of phase gates; may also appear as the audio dimension of AD-PHASE-GATE).
+
+---
+
+## Static Assertions (Structural)
+
+Verified by reading the agent's `.claude/agents/audio-director.md` frontmatter:
+
+- [ ] `description:` field is present and domain-specific (references music direction, sound design, mix, audio implementation — not generic)
+- [ ] `allowed-tools:` list is read-focused; no Bash unless audio asset pipeline checks are justified
+- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
+- [ ] Agent definition does not claim authority over visual design, code implementation, or narrative content
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output format
+**Scenario:** An audio specification document is submitted for the game's "Exploration" music layer. The spec defines a generative ambient system using layered stems that shift based on environmental density, designed to reinforce the pillar "lived-in world." The tone palette (sparse, organic, slightly melancholic) matches the established design pillars.
+**Expected:** Returns `APPROVED` with rationale confirming the stem-based approach supports dynamic responsiveness and the tone palette aligns with the pillar vocabulary.
+**Assertions:**
+- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION
+- [ ] Rationale references the specific pillar ("lived-in world") and how the audio spec supports it
+- [ ] Output stays within audio scope — does not comment on visual design of the environment or UI layout
+- [ ] Verdict is clearly labeled with context (e.g., "Audio Spec Review: APPROVED")
+
+### Case 2: Out-of-domain request — redirects or escalates
+**Scenario:** A developer asks audio-director to evaluate whether the UI flow for the audio settings menu (the sequence of screens and options) is intuitive and well-organized.
+**Expected:** Agent declines to evaluate UI interaction flow and redirects to ux-designer.
+**Assertions:**
+- [ ] Does not make any binding decision about UI flow or information architecture
+- [ ] Explicitly names `ux-designer` as the correct handler
+- [ ] May note audio-specific requirements for the settings menu (e.g., "must include separate master, music, and SFX sliders"), but defers flow and layout decisions to ux-designer
+
+### Case 3: Gate verdict — correct vocabulary
+**Scenario:** A music cue for the final boss encounter is submitted. The cue is an upbeat, major-key orchestral piece with fast tempo. The game pillars and narrative context for this encounter specify "dread, inevitability, and tragic sacrifice." The audio cue's emotional register directly contradicts the intended emotional beat.
+**Expected:** Returns `NEEDS REVISION` with specific citation of the emotional mismatch: the cue's upbeat/major-key/fast-tempo characteristics versus the intended dread/inevitability/sacrifice emotional targets from the pillars and narrative context.
+**Assertions:**
+- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION — not freeform text
+- [ ] Rationale identifies the specific musical characteristics that conflict with the emotional targets
+- [ ] References the specific emotional targets from the game pillars or narrative context
+- [ ] Provides actionable direction for revision (e.g., "shift to minor key, slower tempo, reduce ensemble density")
+
+### Case 4: Conflict escalation — correct parent
+**Scenario:** sound-designer proposes implementing audio occlusion using real-time raycast-based physics queries (technical approach). technical-artist argues this is too expensive and proposes a zone-based trigger system instead. Both agree the occlusion effect is desirable; the conflict is purely about implementation approach.
+**Expected:** audio-director decides on the desired audio behavior (what occlusion should sound like and when it should activate), then defers the implementation approach decision to technical-artist or lead-programmer as the implementation experts. audio-director does not make the technical implementation choice.
+**Assertions:**
+- [ ] Defines the desired audio behavior clearly (what should the player hear and when)
+- [ ] Explicitly defers the implementation approach (raycast vs. zone-trigger) to `lead-programmer` or `technical-artist`
+- [ ] Does not unilaterally choose the technical implementation method
+- [ ] Frames the handoff clearly: "audio-director owns what, technical lead owns how"
+
+### Case 5: Context pass — uses provided context
+**Scenario:** Agent receives a gate context block that includes the game's three pillars: "emergent stories," "meaningful sacrifice," and "lived-in world." A sound design spec for ambient environmental audio is submitted.
+**Expected:** Assessment evaluates the ambient audio spec against all three pillars specifically — how does the audio support (or undermine) each pillar? Uses the pillar vocabulary directly in the rationale.
+**Assertions:**
+- [ ] References all three provided pillars by name in the assessment
+- [ ] Evaluates the audio spec's contribution to each pillar explicitly
+- [ ] Does not generate generic audio direction advice — all feedback is tied to the provided pillar vocabulary
+- [ ] Identifies if any pillar is not supported by the current audio spec and flags it
+
+---
+
+## Protocol Compliance
+
+- [ ] Returns verdicts using APPROVED / NEEDS REVISION vocabulary only
+- [ ] Stays within declared audio domain
+- [ ] Defers implementation approach decisions to technical leads
+- [ ] Uses APPROVED / NEEDS REVISION inline rather than the director-tier gate ID prefix format, while still referencing the gate context
+- [ ] Does not make binding visual design, UX, narrative, or code implementation decisions
+
+---
+
+## Coverage Notes
+- Mix balance review (relative levels between music, SFX, and dialogue) is not covered — a dedicated case should be added.
+- Audio implementation strategy review (middleware choice, streaming approach) is not covered.
+- Interaction between audio-director and the audio specialist agent (if one exists) for implementation delegation is not covered.
+- Localization audio implications (VO recording direction, language-specific music timing) are not covered.
diff --git a/CCGS Skill Testing Framework/agents/leads/game-designer.md b/CCGS Skill Testing Framework/agents/leads/game-designer.md
new file mode 100644
index 0000000..17a1173
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/leads/game-designer.md
@@ -0,0 +1,84 @@
+# Agent Test Spec: game-designer
+
+## Agent Summary
+**Domain owned:** Core loop design, progression systems, combat mechanics rules, economy design, player-facing rules and interactions.
+**Does NOT own:** Code implementation (lead-programmer / gameplay-programmer), visual art (art-director), narrative lore and story (narrative-director — coordinates with), balance formula math (systems-designer — collaborates with).
+**Model tier:** Sonnet (individual system design authoring and review).
+**Gate IDs handled:** Design review verdicts on mechanic specs (no named gate ID prefix — uses APPROVED / NEEDS REVISION vocabulary).
+
+---
+
+## Static Assertions (Structural)
+
+Verified by reading the agent's `.claude/agents/game-designer.md` frontmatter:
+
+- [ ] `description:` field is present and domain-specific (references core loop, progression, combat rules, economy, player-facing design — not generic)
+- [ ] `allowed-tools:` list is read-focused; includes Read for GDDs and design docs; no Bash unless design tooling requires it
+- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
+- [ ] Agent definition does not claim authority over code implementation, visual art style, or standalone narrative lore decisions
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output format
+**Scenario:** A mechanic spec for a "Stamina-Based Dodge" system is submitted for review. The spec defines: the player has a stamina pool (100 units), each dodge costs 25 stamina, stamina regenerates at 20 units/second when not dodging, and the dodge grants 0.3 seconds of invincibility. The core loop interaction is clearly described, rules are unambiguous, and edge cases (stamina at 0, dodge during regen) are addressed.
+**Expected:** Returns `APPROVED` with rationale confirming the core loop clarity, unambiguous rules, and edge case coverage.
+**Assertions:**
+- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION
+- [ ] Rationale references specific design quality criteria (clear rules, edge case coverage, core loop coherence)
+- [ ] Output stays within design scope — does not comment on how to implement it in code or what art assets it requires
+- [ ] Verdict is clearly labeled with context (e.g., "Mechanic Spec Review: APPROVED")
+
+### Case 2: Out-of-domain request — redirects or escalates
+**Scenario:** A team member asks game-designer to write the in-world lore explanation for why the stamina system exists (e.g., the narrative reason characters have stamina limits in the game world).
+**Expected:** Agent declines to write narrative/lore content and redirects to writer or narrative-director.
+**Assertions:**
+- [ ] Does not write narrative or lore content
+- [ ] Explicitly names `writer` or `narrative-director` as the correct handler
+- [ ] May note the design intent that the lore should support (e.g., "the stamina system should reinforce the physical realism theme"), but defers the writing to the narrative team
+
+### Case 3: Gate verdict — correct vocabulary
+**Scenario:** A mechanic spec for "Environmental Hazard Damage" is submitted. The spec defines three hazard types (fire, acid, electricity) but does not specify what happens when a player is simultaneously affected by multiple hazard types, what happens when a hazard is applied during the invincibility window from a dodge, or what the damage frequency is (per-second, per-tick, on-enter).
+**Expected:** Returns `NEEDS REVISION` with specific identification of the undefined edge cases: multi-hazard interaction, hazard-during-invincibility, and damage frequency specification.
+**Assertions:**
+- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION — not freeform text
+- [ ] Rationale identifies the specific missing edge cases by name
+- [ ] Does not reject the entire mechanic — identifies the specific gaps to fill
+- [ ] Provides actionable guidance on what to define (not how to implement it)
+
+### Case 4: Conflict escalation — correct parent
+**Scenario:** systems-designer proposes a damage formula with 6 variables and complex scaling interactions, arguing it produces the best tuning granularity. game-designer believes the formula is too complex for players to intuit and wants a simpler 2-variable version.
+**Expected:** game-designer owns the conceptual rule and player experience intention ("the damage should feel understandable to players"), but defers the formula granularity question to systems-designer. If the disagreement cannot be resolved between them (one wants complex, one wants simple), escalate to creative-director for a player experience ruling.
+**Assertions:**
+- [ ] Clearly states the player experience intention (intuitive damage, player agency)
+- [ ] Defers formula granularity decisions to `systems-designer`
+- [ ] Escalates unresolved disagreement to `creative-director` for a player-experience ruling
+- [ ] Does not unilaterally impose a formula structure on systems-designer
+
+### Case 5: Context pass — uses provided context
+**Scenario:** Agent receives a gate context block that includes the game's three pillars: "player authorship," "consequence permanence," and "world responsiveness." A new mechanic spec for "permadeath with legacy bonuses" is submitted for review.
+**Expected:** Assessment evaluates the mechanic against all three provided pillars — how does permadeath support player authorship, how do legacy bonuses express consequence permanence, and how does the world respond to a player's death? Uses the pillar vocabulary directly in the rationale.
+**Assertions:**
+- [ ] References all three provided pillars by name in the assessment
+- [ ] Evaluates the mechanic's contribution to each pillar explicitly
+- [ ] Does not generate generic game design advice — all feedback is tied to the provided pillar vocabulary
+- [ ] Identifies if any pillar creates a tension with the mechanic and flags it with a specific concern
+
+---
+
+## Protocol Compliance
+
+- [ ] Returns verdicts using APPROVED / NEEDS REVISION vocabulary only
+- [ ] Stays within declared game design domain
+- [ ] Escalates design-vs-formula conflicts to creative-director when unresolved
+- [ ] Does not make binding code implementation, visual art, or standalone lore decisions
+- [ ] Provides actionable design feedback, not implementation prescriptions
+
+---
+
+## Coverage Notes
+- Economy design review (resource sinks, faucets, inflation prevention) is not covered — a dedicated case should be added.
+- Progression system review (XP curves, unlock gates, player power trajectory) is not covered.
+- Core loop validation across multiple interconnected systems (not just a single mechanic) is not covered — deferred to /review-all-gdds integration.
+- Coordination protocol with systems-designer on formula ownership boundary could benefit from additional cases.
diff --git a/CCGS Skill Testing Framework/agents/leads/lead-programmer.md b/CCGS Skill Testing Framework/agents/leads/lead-programmer.md
new file mode 100644
index 0000000..4d41f55
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/leads/lead-programmer.md
@@ -0,0 +1,85 @@
+# Agent Test Spec: lead-programmer
+
+## Agent Summary
+**Domain owned:** Code architecture decisions, LP-FEASIBILITY gate, LP-CODE-REVIEW gate, coding standards enforcement, tech stack decisions within the approved engine.
+**Does NOT own:** Game design decisions (game-designer), creative direction (creative-director), production scheduling (producer), visual art direction (art-director).
+**Model tier:** Sonnet (implementation-level analysis of individual systems).
+**Gate IDs handled:** LP-FEASIBILITY, LP-CODE-REVIEW.
+
+---
+
+## Static Assertions (Structural)
+
+Verified by reading the agent's `.claude/agents/lead-programmer.md` frontmatter:
+
+- [ ] `description:` field is present and domain-specific (references code architecture, feasibility, code review, coding standards — not generic)
+- [ ] `allowed-tools:` list includes Read for source files; Bash may be included for static analysis or test runs; no write access outside `src/` without explicit delegation
+- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
+- [ ] Agent definition does not claim authority over game design, creative direction, or production scheduling
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output format
+**Scenario:** A new `CombatSystem` implementation is submitted for code review. The system uses dependency injection for all external references, has doc comments on all public APIs, follows the project's naming conventions, and includes unit tests for all public methods. Request is tagged LP-CODE-REVIEW.
+**Expected:** Returns `LP-CODE-REVIEW: APPROVED` with rationale confirming dependency injection usage, doc comment coverage, naming convention compliance, and test coverage.
+**Assertions:**
+- [ ] Verdict is exactly one of APPROVED / NEEDS CHANGES
+- [ ] Verdict token is formatted as `LP-CODE-REVIEW: APPROVED`
+- [ ] Rationale references specific coding standards criteria (DI, doc comments, naming, tests)
+- [ ] Output stays within code quality scope — does not comment on whether the mechanic is fun or fits creative vision
+
+### Case 2: Out-of-domain request — redirects or escalates
+**Scenario:** Team member asks lead-programmer to review and approve the balance formula for player damage scaling across levels, checking whether the numbers "feel right."
+**Expected:** Agent declines to evaluate design balance and redirects to systems-designer.
+**Assertions:**
+- [ ] Does not make any binding assessment of formula balance or game feel
+- [ ] Explicitly names `systems-designer` as the correct handler
+- [ ] May note code implementation concerns about the formula (e.g., integer overflow risk at max level), but defers all balance evaluation to systems-designer
+
+### Case 3: Gate verdict — correct vocabulary
+**Scenario:** A proposed pathfinding approach for enemy AI uses a brute-force nearest-neighbor search against all other entities every frame. With expected enemy counts of 200+, this is O(n²) per frame at 60fps. Request is tagged LP-FEASIBILITY.
+**Expected:** Returns `LP-FEASIBILITY: INFEASIBLE` with specific citation of the O(n²) complexity, the entity count threshold, and the resulting per-frame cost against the target frame budget.
+**Assertions:**
+- [ ] Verdict is exactly one of FEASIBLE / CONCERNS / INFEASIBLE — not freeform text
+- [ ] Verdict token is formatted as `LP-FEASIBILITY: INFEASIBLE`
+- [ ] Rationale includes the specific algorithmic complexity and entity count numbers
+- [ ] Suggests at least one alternative approach (e.g., spatial hashing, KD-tree) without mandating a choice
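+
+The infeasibility claim is checkable with quick arithmetic (all figures come from the scenario above; the 60fps target is the stated frame rate):
+
+```
+200 entities × 199 neighbor checks ≈ 39,800 distance checks per frame
+at 60 fps: 39,800 × 60 ≈ 2.4 million checks per second
+doubling enemies to 400 quadruples the per-frame cost (the O(n²) signature)
+```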
+
+### Case 4: Conflict escalation — correct parent
+**Scenario:** game-designer wants a mechanic where every NPC maintains a full simulation of needs, schedule, and memory (similar to a full life-sim AI). lead-programmer calculates this will exceed the frame budget by 3x at target NPC counts. game-designer insists the mechanic is core to the game vision.
+**Expected:** lead-programmer states the specific frame budget violation with numbers, proposes alternative approaches (e.g., LOD-based simulation, simplified need model), but explicitly defers the "is this worth the cost or should the design change" decision to creative-director as the creative arbiter.
+**Assertions:**
+- [ ] States the specific frame budget violation (e.g., 3x over budget at N entities)
+- [ ] Proposes at least one technically viable alternative
+- [ ] Explicitly defers the design priority decision to `creative-director`
+- [ ] Does not unilaterally cut or modify the mechanic design
+
+### Case 5: Context pass — uses provided context
+**Scenario:** Agent receives a gate context block that includes the project's frame budget: 16.67ms total per frame, with 4ms allocated to AI systems. A new AI behavior system is submitted that profiling estimates will consume 7ms per frame under normal conditions.
+**Expected:** Assessment references the specific frame budget allocation from context (4ms AI budget), identifies the 7ms estimate as exceeding the allocation by 3ms, and returns CONCERNS or INFEASIBLE with those specific numbers cited.
+**Assertions:**
+- [ ] References the specific frame budget figures from the provided context (16.67ms total, 4ms AI allocation)
+- [ ] Uses the specific 7ms estimate from the submission in the comparison
+- [ ] Does not give generic "this might be slow" advice — cites concrete numbers
+- [ ] Verdict rationale is traceable to the provided budget constraints
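+
+The expected comparison is simple arithmetic over the provided context figures:
+
+```
+AI allocation: 4ms of the 16.67ms frame (60 fps)
+estimated cost: 7ms, i.e. 3ms over the AI budget (a 75% overrun)
+7ms is also roughly 42% of the entire 16.67ms frame
+```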
+
+---
+
+## Protocol Compliance
+
+- [ ] Returns LP-CODE-REVIEW verdicts using APPROVED / NEEDS CHANGES vocabulary only
+- [ ] Returns LP-FEASIBILITY verdicts using FEASIBLE / CONCERNS / INFEASIBLE vocabulary only
+- [ ] Stays within declared code architecture domain
+- [ ] Defers design priority conflicts to creative-director
+- [ ] Uses gate IDs in output (e.g., `LP-FEASIBILITY: INFEASIBLE`) not inline prose verdicts
+- [ ] Does not make binding game design or creative direction decisions
+
+---
+
+## Coverage Notes
+- Multi-file code review spanning several interdependent systems is not covered — deferred to integration tests.
+- Tech debt assessment and prioritization are not covered here — deferred to /tech-debt skill integration.
+- Coding standards document updates (adding a new forbidden pattern) are not covered.
+- Interaction with qa-lead on what constitutes a testable unit (LP vs QL boundary) is not covered.
diff --git a/CCGS Skill Testing Framework/agents/leads/level-designer.md b/CCGS Skill Testing Framework/agents/leads/level-designer.md
new file mode 100644
index 0000000..8d1e66e
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/leads/level-designer.md
@@ -0,0 +1,85 @@
+# Agent Test Spec: level-designer
+
+## Agent Summary
+**Domain owned:** Level layouts, encounter design, pacing and tension arc, environmental storytelling, spatial puzzles.
+**Does NOT own:** Narrative dialogue (writer / narrative-director), visual art style (art-director), code implementation (lead-programmer / ai-programmer), enemy AI behavior logic (ai-programmer / gameplay-programmer).
+**Model tier:** Sonnet (individual system analysis — level design review and encounter assessment).
+**Gate IDs handled:** None formal — level design review verdicts use APPROVED / REVISION NEEDED vocabulary.
+
+---
+
+## Static Assertions (Structural)
+
+Verified by reading the agent's `.claude/agents/level-designer.md` frontmatter:
+
+- [ ] `description:` field is present and domain-specific (references level layout, encounter design, pacing, environmental storytelling — not generic)
+- [ ] `allowed-tools:` list is read-focused; includes Read for level design documents and GDDs; no Bash unless level tooling requires it
+- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
+- [ ] Agent definition does not claim authority over narrative dialogue, AI behavior code, or visual art style
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output format
+**Scenario:** A level layout document for "The Flooded Tunnels" is submitted for review. The layout includes: a low-intensity exploration opening section, two mid-intensity encounters with visible escape routes, a tension-building narrow passage with environmental hazards, and a high-intensity final encounter room followed by a release/reward area. The pacing follows a classic tension-arc structure.
+**Expected:** Returns `APPROVED` with rationale confirming the pacing follows the tension arc, encounters are varied in intensity, and spatial readability supports player navigation.
+**Assertions:**
+- [ ] Verdict is exactly one of APPROVED / REVISION NEEDED
+- [ ] Rationale references specific pacing arc elements (opening, escalation, climax, release)
+- [ ] Output stays within level design scope — does not comment on visual art style or enemy AI code behavior
+- [ ] Verdict is clearly labeled with context (e.g., "Level Design Review: APPROVED")
+
+### Case 2: Out-of-domain request — redirects or escalates
+**Scenario:** A team member asks level-designer to write the behavior tree code for an enemy patrol AI that navigates the level layout.
+**Expected:** Agent declines to write AI behavior code and redirects to ai-programmer or gameplay-programmer.
+**Assertions:**
+- [ ] Does not write or specify code for AI behavior logic
+- [ ] Explicitly names `ai-programmer` or `gameplay-programmer` as the correct handler
+- [ ] May specify the desired patrol behavior from a level design perspective (e.g., "patrol should cover both chokepoints and create pressure in this zone"), but defers all code implementation to the programmer
+
+### Case 3: Gate verdict — correct vocabulary
+**Scenario:** A level layout for "The Ancient Forge" is submitted. Section 3 of the level introduces a dramatically harder enemy encounter (elite enemy with new attack patterns) with no preceding tutorial moment, no environmental readability cues (no visible cover or safe zones), and no checkpoint nearby. Players are likely to die repeatedly with no clear signal of what to do differently.
+**Expected:** Returns `REVISION NEEDED` with specific identification of the difficulty spike in section 3, the missing readability cue, and the absence of a nearby checkpoint to reduce frustration from repeated deaths.
+**Assertions:**
+- [ ] Verdict is exactly one of APPROVED / REVISION NEEDED — not freeform text
+- [ ] Rationale identifies section 3 specifically as the location of the issue
+- [ ] Identifies the three specific problems: difficulty spike, missing readability cue, missing checkpoint
+- [ ] Provides actionable revision guidance (e.g., "add a visible safe zone, pre-encounter cue object, or reduce elite's health for first introduction")
+
+### Case 4: Conflict escalation — correct parent
+**Scenario:** game-designer wants higher encounter density throughout the level (more enemies in each room) to increase combat challenge. level-designer believes this density undermines the pacing arc by eliminating rest periods and making the level feel relentless without reward.
+**Expected:** level-designer clearly articulates the pacing concern (eliminating rest periods removes the tension-release rhythm), acknowledges game-designer's challenge goal, and escalates to creative-director for a design arbiter ruling on whether challenge density or pacing rhythm takes precedence for this level.
+**Assertions:**
+- [ ] Articulates the specific pacing impact of increased encounter density
+- [ ] Escalates to `creative-director` as the design arbiter
+- [ ] Does not unilaterally override game-designer's challenge density request
+- [ ] Frames the conflict clearly: "challenge density vs. pacing rhythm — which takes precedence here?"
+
+### Case 5: Context pass — uses provided context
+**Scenario:** Agent receives a gate context block that includes game-feel notes specifying: "exploration sections should feel vast and lonely," "combat sections should feel urgent and claustrophobic," and "reward rooms should feel safe and visually distinct." A new level layout is submitted for review.
+**Expected:** Assessment evaluates each section type (exploration, combat, reward) against the specific feel targets from the provided context. Uses the exact vocabulary from the feel notes ("vast and lonely," "urgent and claustrophobic," "safe and visually distinct") in the rationale.
+**Assertions:**
+- [ ] References all three feel targets from the provided context by their exact vocabulary
+- [ ] Evaluates each relevant section of the submitted layout against its corresponding feel target
+- [ ] Does not generate generic pacing advice — all feedback is tied to the provided feel targets
+- [ ] Identifies any section where the layout conflicts with its assigned feel target
+
+---
+
+## Protocol Compliance
+
+- [ ] Returns verdicts using APPROVED / REVISION NEEDED vocabulary only
+- [ ] Stays within declared level design domain
+- [ ] Escalates challenge-density vs. pacing conflicts to creative-director
+- [ ] Does not make binding narrative dialogue, AI code implementation, or visual art style decisions
+- [ ] Provides actionable level design feedback with spatial specifics, not abstract design opinions
+
+---
+
+## Coverage Notes
+- Environmental storytelling review (using spatial elements to convey narrative without dialogue) could benefit from a dedicated case.
+- Spatial puzzle design review is not covered — a dedicated case should be added when puzzle mechanics are defined.
+- Multi-level pacing review (arc across an entire act or world map) is not covered — deferred to milestone-level design review.
+- Interaction between level-designer and narrative-director for environmental lore placement is not covered.
+- Accessibility review of level layouts (colorblind indicators, difficulty options for spatial challenges) is not covered.
diff --git a/CCGS Skill Testing Framework/agents/leads/narrative-director.md b/CCGS Skill Testing Framework/agents/leads/narrative-director.md
new file mode 100644
index 0000000..4e77444
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/leads/narrative-director.md
@@ -0,0 +1,84 @@
+# Agent Test Spec: narrative-director
+
+## Agent Summary
+**Domain owned:** Story architecture, character design direction, world-building oversight, ND-CONSISTENCY gate, dialogue quality review.
+**Does NOT own:** Visual art style (art-director), technical systems or code (lead-programmer), production scheduling (producer), game mechanics rules (game-designer).
+**Model tier:** Sonnet (individual system analysis — narrative consistency and lore review).
+**Gate IDs handled:** ND-CONSISTENCY.
+
+---
+
+## Static Assertions (Structural)
+
+Verified by reading the agent's `.claude/agents/narrative-director.md` frontmatter:
+
+- [ ] `description:` field is present and domain-specific (references story, character, world-building, consistency — not generic)
+- [ ] `allowed-tools:` list is read-focused; includes Read for lore documents, GDDs, and narrative docs; no Bash unless justified
+- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
+- [ ] Agent definition does not claim authority over visual style, technical systems, or production scheduling
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output format
+**Scenario:** A new lore document for "The Sunken Archive" location is submitted. The document establishes that the Archive was flooded 200 years ago during the Great Collapse, consistent with the established timeline in the world-bible. All named characters referenced are consistent with their established backstories. Request is tagged ND-CONSISTENCY.
+**Expected:** Returns `ND-CONSISTENCY: CONSISTENT` with rationale confirming the timeline alignment and character reference accuracy.
+**Assertions:**
+- [ ] Verdict is exactly one of CONSISTENT / INCONSISTENT
+- [ ] Verdict token is formatted as `ND-CONSISTENCY: CONSISTENT`
+- [ ] Rationale references specific established facts verified (the 200-year timeline, the Great Collapse event)
+- [ ] Output stays within narrative scope — does not comment on visual design of the location or its technical implementation
+
+### Case 2: Out-of-domain request — redirects or escalates
+**Scenario:** A developer asks narrative-director to review and optimize the shader code used for the "ancient glow" visual effect on Archive artifacts.
+**Expected:** Agent declines to evaluate shader code and redirects to the appropriate engine specialist (godot-gdscript-specialist or equivalent shader specialist).
+**Assertions:**
+- [ ] Does not make any binding decision about shader code or visual implementation
+- [ ] Explicitly names the appropriate engine or shader specialist as the correct handler
+- [ ] May note the intended narrative mood the effect should convey (e.g., "should feel ancient and sacred, not technological"), but defers all technical visual implementation
+
+### Case 3: Gate verdict — correct vocabulary
+**Scenario:** A new character backstory document is submitted for the character "Aldric Vorne." The document states Aldric was born in the Capital 150 years ago and witnessed the Great Collapse firsthand. However, the established world-bible states Aldric was born 50 years after the Great Collapse in a provincial town, not the Capital. Request is tagged ND-CONSISTENCY.
+**Expected:** Returns `ND-CONSISTENCY: INCONSISTENT` with specific citation of the two contradicting facts: the birth timing (150 years ago vs. 50 years post-Collapse) and the birth location (Capital vs. provincial town).
+**Assertions:**
+- [ ] Verdict is exactly one of CONSISTENT / INCONSISTENT — not freeform text
+- [ ] Verdict token is formatted as `ND-CONSISTENCY: INCONSISTENT`
+- [ ] Rationale cites both contradictions specifically, not just "doesn't match lore"
+- [ ] References the authoritative source (world-bible) for the established facts
+
+### Case 4: Conflict escalation — correct parent
+**Scenario:** A writer has established in their latest dialogue that the ancient civilization "spoke only in song." The world-builder's existing lore entries describe the same civilization communicating through written glyphs. Both are in the narrative domain, and the two creators disagree on which is canonical.
+**Expected:** narrative-director makes a binding canonical decision within their domain. They do not need to escalate to a higher authority for intra-narrative conflicts — this is within their declared domain authority. They issue a ruling (e.g., "glyph-writing is the canonical primary communication; song may be ritual/ceremonial") and direct both writer and world-builder to align their work to the ruling.
+**Assertions:**
+- [ ] Makes a binding canonical decision — does not defer this intra-narrative conflict to creative-director
+- [ ] Decision is clearly stated and provides a path to reconciliation for both parties
+- [ ] Directs both parties (writer and world-builder) to update their respective documents to align
+- [ ] Notes the decision in a way that can be added to the world-bible as a canonical fact
+
+### Case 5: Context pass — uses provided context
+**Scenario:** Agent receives a gate context block that includes three existing lore documents: the world-bible (establishes the Great Collapse timeline and causes), the character registry (lists canonical character ages, origins, and allegiances), and a faction document (describes the Sunken Archive Keepers). A new story chapter is submitted that introduces a previously unregistered character.
+**Expected:** Assessment cross-references the new character against the character registry (no conflict), checks the chapter's timeline references against the world-bible, and evaluates the chapter's portrayal of the Archive Keepers against the faction document. Uses specific facts from all three provided documents in the assessment.
+**Assertions:**
+- [ ] Cross-references the new character against the provided character registry
+- [ ] Checks timeline references against the provided world-bible facts
+- [ ] Evaluates faction portrayal against the provided faction document
+- [ ] Does not generate generic narrative feedback — all assertions are traceable to the provided documents
+
+---
+
+## Protocol Compliance
+
+- [ ] Returns verdicts using CONSISTENT / INCONSISTENT vocabulary only
+- [ ] Stays within declared narrative domain
+- [ ] Makes binding decisions for intra-narrative conflicts without unnecessary escalation
+- [ ] Uses gate IDs in output (e.g., `ND-CONSISTENCY: INCONSISTENT`) not inline prose verdicts
+- [ ] Does not make binding visual design, technical, or production decisions
+
+---
+
+## Coverage Notes
+- Dialogue quality review (distinct from world-building consistency) is not covered — a dedicated case should be added.
+- Multi-document consistency check across a full chapter set is not covered — deferred to /review-all-gdds integration.
+- Narrative impact of mechanical changes (e.g., a game mechanic that undermines story tension) requires coordination with game-designer and is not covered here.
+- Character arc review (progression, motivation coherence over time) is not covered.
diff --git a/CCGS Skill Testing Framework/agents/leads/qa-lead.md b/CCGS Skill Testing Framework/agents/leads/qa-lead.md
new file mode 100644
index 0000000..e4325b3
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/leads/qa-lead.md
@@ -0,0 +1,85 @@
+# Agent Test Spec: qa-lead
+
+## Agent Summary
+**Domain owned:** Test strategy, QL-STORY-READY gate, QL-TEST-COVERAGE gate, bug severity triage, release quality gates.
+**Does NOT own:** Feature implementation (programmers), game design decisions, creative direction, production scheduling.
+**Model tier:** Sonnet (individual system analysis — story readiness and coverage assessment).
+**Gate IDs handled:** QL-STORY-READY, QL-TEST-COVERAGE.
+
+---
+
+## Static Assertions (Structural)
+
+Verified by reading the agent's `.claude/agents/qa-lead.md` frontmatter:
+
+- [ ] `description:` field is present and domain-specific (references test strategy, story readiness, coverage, bug triage — not generic)
+- [ ] `allowed-tools:` list is read-focused; may include Read for story files, test files, and coding-standards; Bash only if running test commands is required
+- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
+- [ ] Agent definition does not claim authority over implementation decisions or game design
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output format
+**Scenario:** A story for "Player takes damage from hazard tiles" is submitted for readiness check. The story has three acceptance criteria: (1) Player health decreases by the hazard's damage value, (2) A damage visual feedback plays, (3) Player cannot take damage again for 0.5 seconds (invincibility window). All three ACs are measurable and specific. Request is tagged QL-STORY-READY.
+**Expected:** Returns `QL-STORY-READY: ADEQUATE` with rationale confirming that all three ACs are present, specific, and testable.
+**Assertions:**
+- [ ] Verdict is exactly one of ADEQUATE / INADEQUATE
+- [ ] Verdict token is formatted as `QL-STORY-READY: ADEQUATE`
+- [ ] Rationale references the specific number of ACs (3) and confirms each is measurable
+- [ ] Output stays within QA scope — does not comment on whether the mechanic is designed well
+
+### Case 2: Out-of-domain request — redirects or escalates
+**Scenario:** A developer asks qa-lead to implement the automated test harness for the new physics system.
+**Expected:** Agent declines to implement the test code and redirects to the appropriate programmer (gameplay-programmer or lead-programmer).
+**Assertions:**
+- [ ] Does not write or propose code implementation
+- [ ] Explicitly names `lead-programmer` or `gameplay-programmer` as the correct handler for implementation
+- [ ] May define what the test should verify (test strategy), but defers the code writing to programmers
+
+### Case 3: Gate verdict — correct vocabulary
+**Scenario:** A story for "Combat feels responsive and punchy" is submitted for readiness check. The single acceptance criterion reads: "Combat should feel good to the player." This is subjective and unmeasurable. Request is tagged QL-STORY-READY.
+**Expected:** Returns `QL-STORY-READY: INADEQUATE` with specific identification of the unmeasurable AC and guidance on what would make it testable (e.g., "input-to-hit-feedback latency ≤ 100ms").
+**Assertions:**
+- [ ] Verdict is exactly one of ADEQUATE / INADEQUATE — not freeform text
+- [ ] Verdict token is formatted as `QL-STORY-READY: INADEQUATE`
+- [ ] Rationale identifies the specific AC that fails the measurability requirement
+- [ ] Provides actionable guidance on how to rewrite the AC to be testable
+
+### Case 4: Conflict escalation — correct parent
+**Scenario:** gameplay-programmer and qa-lead disagree on whether a test that asserts "enemy patrol path visits all waypoints within 5 seconds" is deterministic enough to be a valid automated test. gameplay-programmer argues timing variability makes it flaky; qa-lead believes it is acceptable.
+**Expected:** qa-lead acknowledges the technical flakiness concern and escalates to lead-programmer for a technical ruling on what constitutes an acceptable determinism standard for automated tests.
+**Assertions:**
+- [ ] Escalates to `lead-programmer` for the technical ruling on determinism standards
+- [ ] Does not unilaterally override the gameplay-programmer's flakiness concern
+- [ ] Frames the escalation clearly: "this is a technical standards question, not a QA coverage question"
+- [ ] Does not abandon the coverage requirement — asks for a deterministic alternative if the current approach is ruled flaky
+
+### Case 5: Context pass — uses provided context
+**Scenario:** Agent receives a gate context block that includes the coding-standards.md testing standards section, which specifies: Logic stories require blocking automated unit tests; Visual/Feel stories require screenshots + lead sign-off (advisory); Config/Data stories require a smoke-check pass (advisory). A story classified as "Logic" type is submitted with only a manual walkthrough document as evidence.
+**Expected:** Assessment references the specific test evidence requirements from coding-standards.md, identifies that a "Logic" story requires an automated unit test (not just a manual walkthrough), and returns INADEQUATE with the specific requirement cited.
+**Assertions:**
+- [ ] References the specific story type classification ("Logic") from the provided context
+- [ ] Cites the specific evidence requirement for Logic stories (automated unit test) from coding-standards.md
+- [ ] Identifies the submitted evidence type (manual walkthrough) as insufficient for this story type
+- [ ] Does not apply advisory-level requirements as blocking requirements
+
+---
+
+## Protocol Compliance
+
+- [ ] Returns QL-STORY-READY verdicts using ADEQUATE / INADEQUATE vocabulary only
+- [ ] Returns QL-TEST-COVERAGE verdicts using ADEQUATE / INADEQUATE vocabulary only (or PASS / FAIL for release gates)
+- [ ] Stays within declared QA and test strategy domain
+- [ ] Escalates technical standards disputes to lead-programmer
+- [ ] Uses gate IDs in output (e.g., `QL-STORY-READY: INADEQUATE`) not inline prose verdicts
+- [ ] Does not make binding implementation or game design decisions
+
+---
+
+## Coverage Notes
+- QL-TEST-COVERAGE (overall coverage assessment for a sprint or milestone) is not covered — a dedicated case should be added when coverage reports are available.
+- Bug severity triage (P0/P1/P2 classification) is not covered here — deferred to /bug-triage skill integration.
+- Release quality gate behavior (PASS / FAIL vocabulary variant) is not covered.
+- Interaction between QL-STORY-READY and story Done criteria (/story-done skill) is not covered.
diff --git a/CCGS Skill Testing Framework/agents/leads/systems-designer.md b/CCGS Skill Testing Framework/agents/leads/systems-designer.md
new file mode 100644
index 0000000..6421203
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/leads/systems-designer.md
@@ -0,0 +1,84 @@
+# Agent Test Spec: systems-designer
+
+## Agent Summary
+**Domain owned:** Combat formulas, progression curves, crafting recipes, status effect interactions, economy math, numerical balance.
+**Does NOT own:** Narrative and lore (narrative-director), visual design (art-director), code implementation (lead-programmer), conceptual mechanic rules (game-designer — collaborates with).
+**Model tier:** Sonnet (individual system analysis — formula review and balance math).
+**Gate IDs handled:** None formal — systems review verdicts on formulas and balance specs use APPROVED / NEEDS REVISION vocabulary.
+
+---
+
+## Static Assertions (Structural)
+
+Verified by reading the agent's `.claude/agents/systems-designer.md` frontmatter:
+
+- [ ] `description:` field is present and domain-specific (references formulas, progression curves, balance math, economy — not generic)
+- [ ] `allowed-tools:` list is read-focused; may include Bash for formula evaluation scripts if the project uses them; no write access outside `design/balance/` without delegation
+- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
+- [ ] Agent definition does not claim authority over narrative, visual design, or conceptual mechanic rule ownership
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output format
+**Scenario:** A damage formula is submitted for review: `damage = base_attack * (1 + strength_modifier * 0.1) - defense * 0.1`, with defined ranges: base_attack [10–100], strength_modifier [0–20], defense [0–50]. The formula produces positive damage across all valid input ranges (worst case: 10 × 1 − 50 × 0.1 = 5), scales smoothly, and has no division-by-zero or overflow risk within the defined value bounds.
+**Expected:** Returns `APPROVED` with rationale confirming the formula is balanced within the design parameters, produces valid output across the full input range, and has no degenerate cases.
+**Assertions:**
+- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION
+- [ ] Rationale demonstrates verification across the input range (min/max cases checked)
+- [ ] Output stays within systems domain — does not comment on whether the mechanic is fun or how to implement it
+- [ ] Verdict is clearly labeled with context (e.g., "Formula Review: APPROVED")
+
+### Case 2: Out-of-domain request — redirects or escalates
+**Scenario:** A writer asks systems-designer to draft the quest script for a side quest that rewards the player with a rare crafting ingredient.
+**Expected:** Agent declines to write quest script content and redirects to writer or narrative-director.
+**Assertions:**
+- [ ] Does not write quest narrative content or dialogue
+- [ ] Explicitly names `writer` or `narrative-director` as the correct handler
+- [ ] May note the systems implications of the reward (e.g., "this ingredient should be rare enough to matter per the crafting economy model"), but defers all script writing to the narrative team
+
+### Case 3: Gate verdict — correct vocabulary
+**Scenario:** A damage scaling formula is submitted: `damage = base_attack * level_multiplier`, where `level_multiplier = (player_level / enemy_level) ^ 2`. At max player level (50) against a min-level enemy (1), the multiplier is 2500x — producing 25,000+ damage from a 10-base-attack weapon, far exceeding any meaningful balance. This is a degenerate case at max level.
+**Expected:** Returns `NEEDS REVISION` with specific identification of the degenerate case: at max level vs. min enemy, the formula produces a 2500x multiplier that destroys any balance ceiling.
+**Assertions:**
+- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION — not freeform text
+- [ ] Rationale includes the specific degenerate input values (player level 50, enemy level 1) and the resulting output (2500x multiplier)
+- [ ] Identifies the specific formula component causing the issue (the squared ratio)
+- [ ] Suggests at least one revision approach (e.g., clamping the ratio, using a log scale) without mandating a choice
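+
+The degenerate case, and the effect of one suggested fix, work out as follows (the clamp bounds are illustrative, not mandated by the spec):
+
+```
+multiplier = (50 / 1)² = 2,500
+damage     = 10 × 2,500 = 25,000
+ratio clamped to [0.5, 2]: multiplier ≤ 2² = 4, so damage ≤ 40
+```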
+
+### Case 4: Conflict escalation — correct parent
+**Scenario:** game-designer wants a simple, 2-variable damage formula for player intuitiveness. systems-designer argues that a 6-variable formula with elemental interactions is necessary for the depth of the combat system. Neither can agree on the right level of complexity.
+**Expected:** systems-designer presents the trade-offs clearly — the tuning granularity of the 6-variable system versus the player legibility of the 2-variable system — and escalates to creative-director for a player experience ruling. The question of "how complex should the formula be for players" is a player experience question, not a pure math question.
+**Assertions:**
+- [ ] Presents the trade-offs between both approaches with specific examples
+- [ ] Escalates to `creative-director` for the player experience ruling
+- [ ] Does not unilaterally impose the 6-variable formula over game-designer's objection
+- [ ] Remains available to implement whichever complexity level is approved
+
+### Case 5: Context pass — uses provided context
+**Scenario:** Agent receives a gate context block that includes current balance data: enemy HP values range from 100 to 10,000; player attack values range from 15 to 150; target time-to-kill is 8–12 seconds at balanced matchups; the current formula is under review. A proposed revised formula is submitted.
+**Expected:** Assessment runs the proposed formula against the provided balance data (minimum and maximum input pairs, balanced matchup scenario) and verifies the time-to-kill falls within the 8–12 second target window. References specific numbers from the provided data.
+**Assertions:**
+- [ ] Uses the specific HP and attack value ranges from the provided balance data
+- [ ] Calculates or estimates time-to-kill for at minimum a balanced matchup scenario
+- [ ] Verifies the result against the provided 8–12 second target window
+- [ ] Does not give generic balance advice — all assertions use the provided numbers
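
Case 5's verification step reduces to a small check. A minimal sketch, assuming damage per second is derived separately from the formula under review (the DPS figure is not given in the case context):

```python
def time_to_kill(enemy_hp: float, player_dps: float) -> float:
    """Seconds to defeat an enemy at a constant damage rate."""
    return enemy_hp / player_dps

def in_ttk_window(enemy_hp: float, player_dps: float,
                  low: float = 8.0, high: float = 12.0) -> bool:
    """True if the matchup lands inside the 8-12 second target window."""
    return low <= time_to_kill(enemy_hp, player_dps) <= high
```

An assessment that passes this case would run `in_ttk_window` over the minimum, maximum, and balanced input pairs from the provided balance data.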
+
+---
+
+## Protocol Compliance
+
+- [ ] Returns verdicts using APPROVED / NEEDS REVISION vocabulary only
+- [ ] Stays within declared systems and formula domain
+- [ ] Escalates player-experience complexity trade-offs to creative-director
+- [ ] Does not make binding narrative, visual, code implementation, or conceptual mechanic decisions
+- [ ] Provides concrete formula analysis, not subjective design opinions
+
+---
+
+## Coverage Notes
+- Progression curve review (XP curves, level-up scaling) is not covered — a dedicated case should be added.
+- Economy model review (resource generation and sink rates, inflation prevention) is not covered.
+- Status effect interaction matrix (stacking rules, priority, immunity interactions) is not covered.
+- Cross-system formula dependency review (e.g., crafting formula that feeds into combat formula) is not covered — deferred to integration tests.
diff --git a/CCGS Skill Testing Framework/agents/operations/analytics-engineer.md b/CCGS Skill Testing Framework/agents/operations/analytics-engineer.md
new file mode 100644
index 0000000..b65c507
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/operations/analytics-engineer.md
@@ -0,0 +1,83 @@
+# Agent Test Spec: analytics-engineer
+
+## Agent Summary
+- **Domain**: Telemetry architecture and event schema design, A/B test framework design, player behavior analysis methodology, analytics dashboard specification, event naming conventions, data pipeline design (schema → ingestion → dashboard)
+- **Does NOT own**: Game implementation of event tracking (appropriate programmer), economy design decisions informed by analytics (economy-designer), live ops event design (live-ops-designer)
+- **Model tier**: Sonnet
+- **Gate IDs**: None; produces schemas and test designs; defers implementation to programmers
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references telemetry, A/B testing, event tracking, analytics)
+- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/analytics/ and documentation; no game source or CI tools)
+- [ ] Model tier is Sonnet (default for operations specialists)
+- [ ] Agent definition does not claim authority over game implementation, economy design, or live ops scheduling
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — tutorial event tracking design
+**Input**: "Design the analytics event tracking for our tutorial. We want to know where players drop off and which steps they complete."
+**Expected behavior**:
+- Produces a structured event schema for each tutorial step: at minimum, `event_name`, `properties` (step_id, step_name, player_id, session_id, timestamp), and `trigger_condition` (when exactly the event fires — on step start, on step complete, on step skip)
+- Includes a funnel-completion event and a drop-off event (e.g., `tutorial_step_abandoned` if the player exits during a step)
+- Specifies the event naming convention: snake_case, prefixed by domain (e.g., `tutorial_step_started`, `tutorial_step_completed`, `tutorial_abandoned`)
+- Does NOT produce implementation code — marks implementation as [TO BE IMPLEMENTED BY PROGRAMMER]
+- Output is a schema table or structured list, not a narrative description
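
The schema structure Case 1 expects can be sketched as data. Event names, fields, and trigger wording below are illustrative assumptions, not the project's actual schema:

```python
import re

# Illustrative tutorial event schema: each event declares its name,
# properties, and the exact condition that fires it.
TUTORIAL_EVENTS = [
    {
        "event_name": "tutorial_step_started",
        "properties": ["step_id", "step_name", "player_id", "session_id", "timestamp"],
        "trigger_condition": "fires once when the step's first instruction is shown",
    },
    {
        "event_name": "tutorial_step_completed",
        "properties": ["step_id", "step_name", "player_id", "session_id", "timestamp"],
        "trigger_condition": "fires once when the step's success condition is met",
    },
    {
        "event_name": "tutorial_abandoned",
        "properties": ["step_id", "player_id", "session_id", "timestamp"],
        "trigger_condition": "fires when the player exits the game mid-step",
    },
]

# Naming convention from the case: snake_case, prefixed by domain.
NAME_PATTERN = re.compile(r"^tutorial(_[a-z]+)+$")

def schema_is_valid(events) -> bool:
    """Check every event follows the naming convention and carries core fields."""
    core = {"player_id", "session_id", "timestamp"}
    return all(
        NAME_PATTERN.match(e["event_name"]) and core <= set(e["properties"])
        for e in events
    )
```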
+
+### Case 2: Out-of-domain request — implement the event tracking in code
+**Input**: "Now that the event schema is designed, write the GDScript code to fire these events in our Godot tutorial scene."
+**Expected behavior**:
+- Does not produce GDScript or any implementation code
+- States clearly: "Telemetry implementation in game code is handled by the appropriate programmer (gameplay-programmer or systems-programmer); I provide the event schema and integration requirements"
+- Optionally produces an integration spec: what the programmer needs to know to implement correctly (event name, properties, when to fire, what analytics SDK or endpoint to use)
+
+### Case 3: Domain boundary — A/B test design for a UI change
+**Input**: "We want to A/B test two versions of our HUD: the current version and a minimal version with only a health bar. Design the test."
+**Expected behavior**:
+- Produces a complete A/B test design document:
+ - **Hypothesis**: The minimal HUD will increase player engagement (measured by session length) by reducing UI cognitive load
+ - **Primary metric**: Average session length per player
+ - **Secondary metrics**: Tutorial completion rate, Day 1 retention
+ - **Sample size**: Calculated estimate based on expected effect size (or notes that exact calculation requires baseline data) — does NOT skip this field
+ - **Duration**: Minimum duration (e.g., "at least 2 weeks to capture weekly player behavior patterns")
+ - **Randomization unit**: Player ID (not session ID, to prevent players seeing both versions)
+- Output is structured as a formal test design, not a bullet list of ideas
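
The "Sample size" field above can be estimated with the standard two-proportion normal-approximation formula. The baseline completion rate and target lift below are placeholders, since the case itself notes that an exact calculation needs baseline data:

```python
from math import ceil

def sample_size_per_arm(p_control: float, p_variant: float,
                        z_alpha: float = 1.96, z_beta: float = 0.8416) -> int:
    """Players needed per arm to detect the difference between two rates.

    z_alpha=1.96 corresponds to a two-sided alpha of 0.05;
    z_beta=0.8416 corresponds to 80% power.
    """
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    delta = p_variant - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

# Hypothetical baseline: 40% tutorial completion, hoping to detect a lift to 44%.
n = sample_size_per_arm(0.40, 0.44)
```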
+
+### Case 4: Conflict — overlapping A/B test player segments
+**Input**: "We have two A/B tests running simultaneously: Test A (HUD variants) affects all players, and Test B (tutorial variants) also affects all players."
+**Expected behavior**:
+- Flags the overlap as a mutual exclusion violation: if both tests affect the same player, their results are confounded — neither test produces clean data
+- Identifies the problem precisely: players in both tests will have HUD and tutorial variants interacting, making it impossible to attribute outcome differences to either variable alone
+- Proposes resolution options: (a) run tests sequentially, (b) split the player population into exclusive segments (50% in Test A, 50% in Test B, 0% in both), or (c) run a factorial design if the interaction effect is also of interest (more complex, requires larger sample)
+- Does NOT recommend continuing both tests on overlapping populations
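
Resolution option (b), exclusive segments, can be sketched as a deterministic assignment. Test names here are hypothetical:

```python
import hashlib

def assign_exclusive_segment(player_id: str,
                             tests=("hud_test", "tutorial_test")) -> str:
    """Place each player in exactly one test.

    Hashing the player ID (not the session ID) keeps the assignment stable
    across sessions and guarantees the two populations never overlap.
    """
    bucket = int(hashlib.sha256(player_id.encode()).hexdigest(), 16) % len(tests)
    return tests[bucket]
```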
+
+### Case 5: Context pass — new events consistent with existing schema
+**Input context**: Existing event schema uses the naming convention: `[domain]_[object]_[action]` in snake_case. Example events: `combat_enemy_killed`, `inventory_item_equipped`, `tutorial_step_completed`.
+**Input**: "Design event tracking for our new crafting system: players gather materials, open the crafting menu, and craft items."
+**Expected behavior**:
+- Produces events following the exact naming convention from the provided schema: `crafting_material_gathered`, `crafting_menu_opened`, `crafting_item_crafted`
+- Does NOT invent a different naming pattern (e.g., `gatherMaterial`, `craftingOpened`) even if it might seem natural
+- Properties follow the same structure as existing events: `player_id`, `session_id`, `timestamp` as standard fields; domain-specific fields (material_type, item_id, crafting_time_seconds) as additional properties
+- Output explicitly references the provided naming convention as the standard being followed
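
The convention check Case 5 performs amounts to a pattern match on `[domain]_[object]_[action]`:

```python
import re

# Convention from the provided schema: three snake_case segments.
CONVENTION = re.compile(r"^[a-z]+_[a-z]+_[a-z]+$")

def follows_convention(event_name: str) -> bool:
    return bool(CONVENTION.match(event_name))

# Proposed crafting events from the expected behavior above.
PROPOSED = ["crafting_material_gathered", "crafting_menu_opened", "crafting_item_crafted"]
```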
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (event schema design, A/B test design, analytics methodology)
+- [ ] Redirects implementation requests to appropriate programmers with an integration spec, not code
+- [ ] Produces complete A/B test designs (hypothesis, metric, sample size, duration, randomization unit) — never partial
+- [ ] Flags mutual exclusion violations in overlapping A/B tests as data quality blockers
+- [ ] Follows provided naming conventions exactly; does not invent alternative conventions
+
+---
+
+## Coverage Notes
+- Case 3 (A/B test design completeness) is a quality gate — an incomplete test design wastes experiment budget
+- Case 4 (mutual exclusion) is a data integrity test — overlapping tests produce unusable results; this must be caught
+- Case 5 is the most important context-awareness test; naming convention drift across schemas causes dashboard breakage
+- No automated runner; review manually or via `/skill-test`
diff --git a/CCGS Skill Testing Framework/agents/operations/community-manager.md b/CCGS Skill Testing Framework/agents/operations/community-manager.md
new file mode 100644
index 0000000..9e79731
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/operations/community-manager.md
@@ -0,0 +1,81 @@
+# Agent Test Spec: community-manager
+
+## Agent Summary
+- **Domain**: Player-facing communications — patch notes text (player-friendly), social media post drafts, community update announcements, crisis communication response plans, bug triage and routing from player reports (not fixing)
+- **Does NOT own**: Technical patch content (devops-engineer), QA verification and test execution (qa-lead), bug fixes (programmers), brand strategy direction (creative-director)
+- **Model tier**: Sonnet
+- **Gate IDs**: None; escalates brand voice conflicts to creative-director
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references player communication, patch notes, community management)
+- [ ] `allowed-tools:` list matches the agent's role (Read/Write for production/releases/patch-notes/ and communication drafts; no code or build tools)
+- [ ] Model tier is Sonnet (default for operations specialists)
+- [ ] Agent definition does not claim authority over technical content, QA strategy, or bug fixing
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — patch notes for a bug fix
+**Input**: "Write player-facing patch notes for this fix: 'JIRA-4821: Fixed NullReferenceException in InventoryManager.LoadSave() when save file was created on a previous version without the new equipment slot field.'"
+**Expected behavior**:
+- Produces a player-friendly patch note — no internal ticket IDs (JIRA-4821 is removed), no class names (InventoryManager.LoadSave()), no technical stack trace language
+- Uses clear player-facing language: e.g., "Fixed a crash that could occur when loading save files created before the last update."
+- Conveys the user impact (game crashed on load) without exposing internal implementation details
+- Output is formatted for the project's patch notes style (bulleted or numbered, depending on the established format)
+
+### Case 2: Out-of-domain request — fixing a reported bug
+**Input**: "A player reported that their save file is corrupted. Can you fix the save system?"
+**Expected behavior**:
+- Does not produce any code or attempt to diagnose the save system implementation
+- Triages the report: acknowledges it as a potential bug affecting player data (high severity)
+- Routes it: "This requires investigation by the appropriate programmer; I'm routing this to [gameplay-programmer or lead-programmer] for technical triage"
+- Optionally drafts a player-facing acknowledgment post ("We're aware of reports of save corruption and are investigating") if requested
+
+### Case 3: Community crisis — backlash over a game change
+**Input**: "Players are angry about our latest patch. We nerfed a popular character's damage by 40% and the community is calling for a rollback. Forum posts, tweets, and Discord are all very negative."
+**Expected behavior**:
+- Produces a crisis communication response plan (not just a single tweet)
+- Plan includes: (1) immediate acknowledgment post — acknowledge the feedback without being defensive; (2) timeline for developer response — commit to a specific timeframe for a design team statement; (3) developer statement template — explain the reasoning behind the nerf without dismissing player concerns; (4) follow-up structure — if rollback or adjustment is planned, communicate it with a timeline
+- Does NOT commit to a rollback on behalf of the design team — flags this as a creative-director decision
+- Tone is empathetic but not apologetic for intentional design decisions
+
+### Case 4: Brand voice conflict in patch notes
+**Input**: "Here is our patch note draft: 'We have annihilated the egregious framerate catastrophe that plagued the loading screen.' Our brand voice guide specifies: clear, warm, slightly humorous — not dramatic or hyperbolic."
+**Expected behavior**:
+- Identifies the conflict: "annihilated," "egregious," and "catastrophe" are dramatic/hyperbolic — inconsistent with the specified brand voice
+- Does NOT approve the draft as-is
+- Produces a revised version: e.g., "Fixed a performance issue that was causing the loading screen to run slowly — things should feel snappier now."
+- Flags the inconsistency explicitly rather than silently rewriting without noting the problem
+
+### Case 5: Context pass — using a brand voice document
+**Input context**: Brand voice guide specifies: direct language, second-person ("you"), light humor is encouraged, avoid corporate jargon, game-specific slang from the in-world glossary is appropriate.
+**Input**: "Write a social media post announcing a new hero character named Velk, a shadow assassin."
+**Expected behavior**:
+- Uses second-person address ("Meet your next favorite assassin")
+- Incorporates light humor if it fits naturally
+- Avoids corporate language ("We are pleased to announce" → "Meet Velk")
+- Uses in-world language if the context includes a glossary (e.g., if assassins are called "Shadowwalkers" in-world, uses that term)
+- Output matches the specified tone — not a generic press-release announcement
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (player-facing communication, patch note text, crisis response, bug routing)
+- [ ] Strips internal IDs, class names, and technical jargon from all player-facing output
+- [ ] Redirects bug fix requests to appropriate programmers rather than attempting technical solutions
+- [ ] Does NOT commit to design rollbacks without creative-director authority
+- [ ] Applies brand voice specifications from context; flags violations rather than silently accepting them
+
+---
+
+## Coverage Notes
+- Case 1 (patch note sanitization) is the most frequently used behavior — test on every new patch cycle
+- Case 3 (crisis communication) is a brand-safety test — verify the agent de-escalates rather than inflames
+- Case 4 requires a brand voice document to be in context; test is incomplete without it
+- Case 5 is the most important context-awareness test for tone consistency
+- No automated runner; review manually or via `/skill-test`
diff --git a/CCGS Skill Testing Framework/agents/operations/devops-engineer.md b/CCGS Skill Testing Framework/agents/operations/devops-engineer.md
new file mode 100644
index 0000000..1abd254
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/operations/devops-engineer.md
@@ -0,0 +1,80 @@
+# Agent Test Spec: devops-engineer
+
+## Agent Summary
+- **Domain**: CI/CD pipeline configuration, build scripts, version control workflow enforcement, deployment infrastructure, branching strategy, environment management, automated test integration in CI
+- **Does NOT own**: Game logic or gameplay systems, security audits (security-engineer), QA test strategy (qa-lead), game networking logic (network-programmer)
+- **Model tier**: Sonnet
+- **Gate IDs**: None; escalates deployment blockers to producer
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references CI/CD, build, deployment, version control)
+- [ ] `allowed-tools:` list matches the agent's role (Read/Write for pipeline config files, shell scripts, YAML; no game source editing tools)
+- [ ] Model tier is Sonnet (default for operations specialists)
+- [ ] Agent definition does not claim authority over game logic, security audits, or QA test design
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — CI setup for a Godot project
+**Input**: "Set up a CI pipeline for our Godot 4 project. It should run tests on every push to main and every pull request, and fail the build if tests fail."
+**Expected behavior**:
+- Produces a GitHub Actions workflow YAML (`.github/workflows/ci.yml` or equivalent)
+- Uses the Godot headless test runner command from `coding-standards.md`: `godot --headless --script tests/gdunit4_runner.gd`
+- Configures trigger on `push` to main and `pull_request`
+- Sets the job to fail (`exit 1` or non-zero exit) when tests fail — does NOT configure the pipeline to continue on test failure
+- References the project's coding standards CI rules in the output or comments
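
A minimal sketch of the workflow Case 1 expects, assuming GitHub Actions and a runner with Godot 4 already on its PATH (the setup step is project-specific and omitted):

```yaml
# .github/workflows/ci.yml — illustrative sketch, not a verified pipeline.
name: ci
on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Headless test runner command from coding-standards.md; a non-zero
      # exit code from this step fails the build automatically.
      - run: godot --headless --script tests/gdunit4_runner.gd
```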
+
+### Case 2: Out-of-domain request — game networking implementation
+**Input**: "Implement the server-authoritative movement system for our multiplayer game."
+**Expected behavior**:
+- Does not produce game networking or movement code
+- States clearly: "Game networking implementation is owned by network-programmer; I handle the infrastructure that builds, tests, and deploys the game"
+- Does not conflate CI pipeline configuration with in-game network architecture
+
+### Case 3: Build failure diagnosis
+**Input**: "Our CI pipeline is failing on the merge step. The error is: 'Asset import failed: texture compression format unsupported in headless mode.'"
+**Expected behavior**:
+- Diagnoses the root cause: headless CI environment does not support GPU-dependent texture compression
+- Proposes a concrete fix: either pre-import assets locally before CI runs (commit .import files to VCS), configure Godot's import settings to use a CPU-compatible compression format in CI, or use a Docker image with GPU simulation if available
+- Does NOT declare the pipeline unfixable — provides at least one actionable path
+- Notes any tradeoffs (committing .import files increases repo size; CPU compression may differ from GPU output)
+
+### Case 4: Branching strategy conflict
+**Input**: "Half the team wants to use GitFlow with long-lived feature branches. The other half wants trunk-based development. How should we set this up?"
+**Expected behavior**:
+- Recommends trunk-based development per project conventions (CLAUDE.md / coordination-rules.md specify Git with trunk-based development)
+- Provides concrete rationale for the recommendation in this project's context: smaller team, fewer integration conflicts, faster CI feedback
+- Does NOT present this as a 50/50 choice if the project has an established convention
+- Explains how to implement trunk-based development with short-lived feature branches and feature flags if needed
+- Does NOT override the project convention without flagging that doing so requires updating CLAUDE.md
+
+### Case 5: Context pass — platform-specific build matrix
+**Input context**: Project targets PC (Windows, Linux), Nintendo Switch, and PlayStation 5.
+**Input**: "Set up our CI build matrix so we get a build artifact for each target platform on every release branch push."
+**Expected behavior**:
+- Produces a build matrix configuration with four platform entries: Windows, Linux, Switch, and PS5
+- Applies platform-appropriate build steps: PC uses standard Godot export templates; Switch and PS5 require platform-specific export templates (notes that console templates require licensed SDK access and are not publicly distributed)
+- Does NOT assume all platforms can use the same build runner — flags that console builds may require self-hosted runners with licensed SDKs
+- Organizes artifacts by platform name in the pipeline output
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (CI/CD, build scripts, version control, deployment)
+- [ ] Redirects game logic and networking requests to appropriate programmers
+- [ ] Recommends trunk-based development when branching strategy is contested, per project conventions
+- [ ] Returns structured pipeline configurations (YAML, scripts) not freeform advice
+- [ ] Flags platform SDK licensing constraints for console builds rather than silently producing incorrect configs
+
+---
+
+## Coverage Notes
+- Case 1 (Godot CI) references `coding-standards.md` CI rules — verify this file is present and current before running this test
+- Case 4 (branching strategy) is a convention-enforcement test — agent must know the project convention, not just give neutral advice
+- Case 5 requires that project's target platforms are documented (in `technical-preferences.md` or equivalent)
+- No automated runner; review manually or via `/skill-test`
diff --git a/CCGS Skill Testing Framework/agents/operations/economy-designer.md b/CCGS Skill Testing Framework/agents/operations/economy-designer.md
new file mode 100644
index 0000000..6dc7ec6
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/operations/economy-designer.md
@@ -0,0 +1,80 @@
+# Agent Test Spec: economy-designer
+
+## Agent Summary
+- **Domain**: Resource economy design, loot table design, progression curves (XP, level, unlock), in-game market and shop design, economic balance analysis, sink and faucet mechanics, inflation/deflation risk assessment
+- **Does NOT own**: Live ops event scheduling and structure (live-ops-designer), code implementation, analytics tracking design (analytics-engineer), narrative justification for economy systems (writer)
+- **Model tier**: Sonnet
+- **Gate IDs**: None; escalates economy-breaking design conflicts to creative-director or producer
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references economy, loot tables, progression curves, balance)
+- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/balance/ documents; no code or analytics tools)
+- [ ] Model tier is Sonnet (default for design specialists)
+- [ ] Agent definition does not claim authority over live ops scheduling, code, or narrative
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — loot table design for a chest
+**Input**: "Design the loot table for a standard treasure chest in our dungeon game."
+**Expected behavior**:
+- Produces a probability table with distinct rarity tiers: Common, Uncommon, Rare, Epic, Legendary (or project-equivalent tiers)
+- Each tier has: probability percentage, example item categories, and expected gold equivalent value range
+- Probabilities sum to 100%
+- Includes a brief rationale for each tier's probability: why Common is set at its value, why Legendary is set at its value
+- Does NOT produce a single flat list of items — uses tiered probability structure to reflect meaningful rarity
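
The tiered probability structure can be sketched directly; the percentages and value ranges below are placeholders to be tuned, not project numbers:

```python
import random

# (tier, probability in percent, gold-equivalent value range) — illustrative.
LOOT_TABLE = [
    ("Common",    60, (5, 20)),
    ("Uncommon",  25, (20, 60)),
    ("Rare",      10, (60, 200)),
    ("Epic",       4, (200, 600)),
    ("Legendary",  1, (600, 2000)),
]

def roll_chest(rng=random) -> str:
    """Weighted tier roll for one chest open."""
    tiers = [t for t, _, _ in LOOT_TABLE]
    weights = [w for _, w, _ in LOOT_TABLE]
    return rng.choices(tiers, weights=weights, k=1)[0]
```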
+
+### Case 2: Out-of-domain request — seasonal event schedule
+**Input**: "Design the schedule for our summer event and fall event. When should they run and how long should each last?"
+**Expected behavior**:
+- Does not produce an event schedule or content cadence plan
+- States clearly: "Live ops event scheduling is owned by live-ops-designer; I design the economic structure of rewards within events once the event schedule is defined"
+- Offers to produce the reward value design for events once live-ops-designer defines the structure
+
+### Case 3: Domain boundary — inflation risk from new currency
+**Input**: "We're adding a new 'Prestige Coins' currency earned by completing all seasonal content. Players can spend them in a Prestige Shop."
+**Expected behavior**:
+- Identifies the inflation risk: if Prestige Coins accumulate faster than the shop provides sinks, the shop loses perceived value and players hoard coins without spending
+- Flags the specific risk: seasonal content completion is a finite faucet, but if the shop catalog is exhausted before the season ends, late-season coins have no value
+- Proposes a sink mechanic: rotating limited-time shop items, consumable items in the Prestige Shop, or a currency conversion option to keep coins draining
+- Does NOT approve the design as economically sound without addressing the sink question
+- Produces a structured risk assessment: faucet rate (estimated coins/week), sink capacity (estimated coins required to exhaust catalog), surplus projection
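
The structured risk assessment above is simple arithmetic once the faucet and sink are estimated. All numbers below are illustrative assumptions, not project data:

```python
# Faucet vs. sink projection for Prestige Coins.
COINS_PER_WEEK = 300   # estimated faucet: coins earned per active week
SEASON_WEEKS = 12
CATALOG_COST = 2400    # sink capacity: coins needed to buy out the shop

def surplus_at_season_end(coins_per_week: int = COINS_PER_WEEK,
                          season_weeks: int = SEASON_WEEKS,
                          catalog_cost: int = CATALOG_COST) -> int:
    """Positive surplus means late-season coins have nothing to drain into."""
    earned = coins_per_week * season_weeks
    return earned - catalog_cost
```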
+
+### Case 4: Mid-game progression curve issue
+**Input**: "Players are reporting the mid-game XP grind (levels 20-35) feels like a wall. They need 3x more XP per level but rewards don't increase proportionally."
+**Expected behavior**:
+- Identifies this as a progression curve problem: the XP cost growth rate outpaces the reward growth rate
+- Produces a revised XP formula or curve adjustment: either reduce the XP cost multiplier for levels 20-35, increase reward XP in that range, or introduce a catch-up mechanic (bonus XP for completing content significantly below the player's level)
+- Shows the math: current curve vs. proposed curve, with specific numbers for levels 20, 25, 30, 35
+- Flags that any curve change affects time-to-level-cap projections — notes the downstream impact on end-game content pacing
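
"Shows the math" can be checked mechanically. Both formulas below are hypothetical stand-ins that illustrate the shape of the analysis, not the project's real curve:

```python
def current_xp_cost(level: int) -> int:
    """Hypothetical current curve: steep exponential growth."""
    return int(100 * 1.15 ** level)

def proposed_xp_cost(level: int) -> int:
    """Hypothetical fix: flat cost reduction in the 20-35 band only."""
    cost = current_xp_cost(level)
    return int(cost * 0.6) if 20 <= level <= 35 else cost

# Side-by-side numbers for the levels the case calls out.
side_by_side = {lvl: (current_xp_cost(lvl), proposed_xp_cost(lvl))
                for lvl in (20, 25, 30, 35)}
```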
+
+### Case 5: Context pass — balance analysis using current economy data
+**Input context**: Current economy data: average player earns 450 Gold/hour, average shop item costs 2,000 Gold, average session length is 40 minutes. Premium items cost 5,000 Gold.
+**Input**: "Is our current Gold economy healthy? Should we adjust prices or earn rates?"
+**Expected behavior**:
+- Uses the specific numbers provided: 450 Gold/hour = 300 Gold per 40-minute session; the 2,000 Gold item takes ~6.7 sessions (~4.4 hours) to afford; the 5,000 Gold premium item takes ~16.7 sessions (~11.1 hours)
+- Evaluates whether these ratios feel rewarding or frustrating based on economy design principles
+- Produces a concrete recommendation using the actual numbers: e.g., "At current earn rates, premium items take ~11.1 hours of play to afford — this is at the high end of acceptable; consider either increasing earn rate to 550 Gold/hour or reducing premium item cost to 4,000 Gold"
+- Does NOT produce generic advice ("prices may be too high") without anchoring to the provided data
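
The affordability ratios fall directly out of the provided data:

```python
# Numbers from the provided economy context.
GOLD_PER_HOUR = 450
SESSION_MINUTES = 40
ITEM_COST = 2000
PREMIUM_COST = 5000

gold_per_session = GOLD_PER_HOUR * SESSION_MINUTES / 60   # 300 Gold/session
sessions_for_item = ITEM_COST / gold_per_session          # ~6.7 sessions
hours_for_premium = PREMIUM_COST / GOLD_PER_HOUR          # ~11.1 hours
sessions_for_premium = PREMIUM_COST / gold_per_session    # ~16.7 sessions
```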
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (loot tables, progression curves, resource economy, inflation/deflation analysis)
+- [ ] Redirects live ops scheduling requests to live-ops-designer without producing schedules
+- [ ] Flags inflation/deflation risks proactively with quantified sink/faucet analysis
+- [ ] Produces explicit math for progression curves — no vague curve adjustments without numbers
+- [ ] Uses actual economy data from context; does not produce generic benchmarks when specifics are provided
+
+---
+
+## Coverage Notes
+- Case 3 (inflation risk) is an economic health test — missed inflation risks cause long-term economy damage in live games
+- Case 4 requires the agent to produce actual numbers, not curve shapes — verify math is present, not just a narrative
+- Case 5 is the most important context-awareness test; agent must use provided data, not placeholder values
+- No automated runner; review manually or via `/skill-test`
diff --git a/CCGS Skill Testing Framework/agents/operations/live-ops-designer.md b/CCGS Skill Testing Framework/agents/operations/live-ops-designer.md
new file mode 100644
index 0000000..a43cac2
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/operations/live-ops-designer.md
@@ -0,0 +1,81 @@
+# Agent Test Spec: live-ops-designer
+
+## Agent Summary
+- **Domain**: Post-launch content strategy, seasonal events (design and structure), battle pass design, content cadence planning, player retention mechanic design, live service feature roadmaps
+- **Does NOT own**: Economy math and reward value calculations (economy-designer), analytics tracking implementation (analytics-engineer), narrative content within events (writer), code implementation
+- **Model tier**: Sonnet
+- **Gate IDs**: None; escalates monetization concerns to creative-director for brand/ethics review
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references live ops, seasonal events, battle pass, retention)
+- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/live-ops/ documents; no code or analytics tools)
+- [ ] Model tier is Sonnet (default for design specialists)
+- [ ] Agent definition does not claim authority over economy math, analytics pipelines, or narrative direction
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — summer event design
+**Input**: "Design a summer event for our game. It should run for 3 weeks and give players reasons to log in daily."
+**Expected behavior**:
+- Produces an event structure document covering: event duration (3 weeks, with start/end dates if context provides the current date), daily login retention hooks (daily missions, login streaks, time-limited rewards), progression gates (weekly milestones that reward continued engagement), and reward categories (cosmetic, functional, or currency — flagged for economy-designer to value)
+- Does NOT assign specific reward values or currency amounts — marks these as [TO BE BALANCED BY ECONOMY-DESIGNER]
+- Identifies the core player loop for the event separate from the base game loop
+- Output is a structured event brief: overview, schedule, progression structure, reward categories
+
+### Case 2: Out-of-domain request — reward value calculation
+**Input**: "How much premium currency should we give out in this event? What's the fair value of each cosmetic reward tier?"
+**Expected behavior**:
+- Does not produce currency amounts or reward valuation
+- States clearly: "Reward values and currency amounts are owned by economy-designer; I design the event structure and define what rewards exist, then economy-designer assigns their values"
+- Offers to produce the reward structure (tiers, unlock gates, cosmetic categories) so economy-designer has something concrete to value
+
+### Case 3: Domain boundary — predatory monetization concern
+**Input**: "Let's design the battle pass so that players need to spend premium currency on top of the pass price to complete all tiers within the season."
+**Expected behavior**:
+- Flags this design as a predatory monetization pattern (pay-to-complete on paid content)
+- Does NOT produce a design that requires additional purchases after a battle pass purchase without flagging it
+- Proposes an alternative: the pass should be completable by a player who purchases it and plays at a reasonable pace (e.g., 45 minutes/day for 5 days/week)
+- Notes that this decision has brand and ethics implications — escalates to creative-director for approval before proceeding
+- Does not refuse to continue entirely — offers the ethical alternative design and awaits direction
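
The "completable at a reasonable pace" claim can be sanity-checked with arithmetic. Tier counts and XP rates below are illustrative assumptions:

```python
# Can a pass buyer finish all tiers at 45 min/day, 5 days/week?
MINUTES_PER_DAY = 45
DAYS_PER_WEEK = 5
SEASON_WEEKS = 10
XP_PER_HOUR = 400   # assumed average earn rate
TIERS = 50
XP_PER_TIER = 250   # assumed cost per tier

available_xp = (MINUTES_PER_DAY / 60) * DAYS_PER_WEEK * SEASON_WEEKS * XP_PER_HOUR
required_xp = TIERS * XP_PER_TIER

def pass_is_completable(available: float = available_xp,
                        required: float = required_xp) -> bool:
    """True if the reasonable-pace player earns enough XP for every tier."""
    return available >= required
```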
+
+### Case 4: Conflict — event schedule vs. main game progression pacing
+**Input**: "We want to run a double-XP event during weeks 3-5 of the season, but our progression designer says that's when players are supposed to hit the mid-game difficulty curve."
+**Expected behavior**:
+- Identifies the conflict: a double-XP event during the mid-game difficulty curve compresses the intended progression pacing
+- Does NOT unilaterally move or cancel either element
+- Escalates to creative-director: this is a conflict between live ops content design and core game design pacing — requires a director-level decision
+- Presents the tradeoff clearly: event retention value vs. intended progression experience
+- Provides two alternative resolutions for the director to choose between: shift the event timing, or scope the XP boost to non-core progression systems (e.g., cosmetic grind only)
+
+### Case 5: Context pass — designing to address a player retention drop-off
+**Input context**: Analytics show a 40% player drop-off at Day 7, attributed to players completing the tutorial but finding no mid-term goal to pursue.
+**Input**: "Design a live ops feature to address the Day 7 drop-off."
+**Expected behavior**:
+- Designs specifically for the Day 7 cohort — not a generic retention feature
+- Proposes a mid-term goal structure: a 2-week "Explorer Challenge" that unlocks at Day 5-7 and provides a visible progression track with rewards at Day 10, 14, and 21
+- Connects the design explicitly to the identified drop-off point: the feature must be visible and activating before or at Day 7
+- Does NOT design a feature for Day 1 retention or Day 30 monetization when the data points to Day 7 as the target
+- Notes that specific reward values are [TO BE DEFINED BY ECONOMY-DESIGNER] using the actual retention data
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (event structure, content cadence, retention design, battle pass design)
+- [ ] Redirects reward value and economy math requests to economy-designer
+- [ ] Flags predatory monetization patterns and escalates to creative-director rather than implementing them silently
+- [ ] Escalates event/core-progression conflicts to creative-director rather than resolving unilaterally
+- [ ] Uses provided retention data to target specific player cohorts, not generic engagement strategies
+
+---
+
+## Coverage Notes
+- Case 3 (monetization ethics) is a brand-safety test — failure here could result in harmful live ops designs shipping
+- Case 4 (escalation behavior) is a coordination test — verify the agent actually escalates rather than deciding independently
+- Case 5 is the most important context-awareness test; agent must target the specific drop-off point, not a generic solution
+- No automated runner; review manually or via `/skill-test`
diff --git a/CCGS Skill Testing Framework/agents/operations/localization-lead.md b/CCGS Skill Testing Framework/agents/operations/localization-lead.md
new file mode 100644
index 0000000..6ff0ef3
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/operations/localization-lead.md
@@ -0,0 +1,81 @@
+# Agent Test Spec: localization-lead
+
+## Agent Summary
+- **Domain**: Internationalization (i18n) architecture, string extraction workflows and tooling configuration, locale testing methodology, translation pipeline design (extraction → TMS → import), string quality standards, locale-specific formatting rules (plurals, RTL, date/number formats)
+- **Does NOT own**: Game narrative content and dialogue writing (writer), code implementation of i18n calls (gameplay-programmer), translation work itself (external translators)
+- **Model tier**: Sonnet
+- **Gate IDs**: None; escalates pipeline architecture decisions to technical-director when they affect build systems
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references i18n, string extraction, locale pipeline, localization)
+- [ ] `allowed-tools:` list matches the agent's role (Read/Write for localization config, pipeline docs, string tables; no game source editing or deployment tools)
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over narrative content, game code implementation, or translation quality
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — string extraction pipeline for a Unity project
+**Input**: "Set up a string extraction pipeline for our Unity game. We need to get all localizable strings into a format translators can work with."
+**Expected behavior**:
+- Produces a concrete extraction configuration covering: which string types to extract (UI labels, dialogue, item descriptions — not debug strings), the tool to use (e.g., Unity Localization package string tables, or a custom extraction script targeting specific component types), and the output format (CSV, XLIFF, or TMX — notes which formats are compatible with common TMS tools like Crowdin or Lokalise)
+- Specifies the folder structure: e.g., `assets/localization/en/` as the source locale, `assets/localization/{locale}/` for translated files
+- Notes that string keys must be stable (do not use index-based keys) — key changes break all existing translations
+- Does NOT produce Unity C# code for the i18n implementation — marks as [TO BE IMPLEMENTED BY PROGRAMMER]
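
The stable-key and folder-structure requirements above can be illustrated with a minimal extraction sketch. This is a hypothetical illustration in Python, not the Unity tooling itself; the keys, strings, and paths are invented for the example:

```python
import csv
from pathlib import Path

# Hypothetical source-locale string table. Keys name the string's role,
# never its position in a scene or list: reordering UI elements must not
# change any key, or existing translations break.
STRINGS_EN = {
    "ui.button.confirm": "Confirm",
    "ui.button.cancel": "Cancel",
    "item.potion.description": "Restores 50 HP.",
}

def export_for_translators(strings: dict[str, str], out_path: Path) -> None:
    """Write a key/source CSV in a shape common TMS tools can import."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["key", "source_en"])
        for key in sorted(strings):
            writer.writerow([key, strings[key]])

# English is the source locale; translated files land in sibling
# assets/localization/{locale}/ folders.
export_for_translators(STRINGS_EN, Path("assets/localization/en/strings.csv"))
```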
+
+### Case 2: Out-of-domain request — translate game dialogue
+**Input**: "Translate the following English dialogue into French: 'Well met, traveler. The road ahead is treacherous.'"
+**Expected behavior**:
+- Does not produce a French translation
+- States clearly: "localization-lead owns the pipeline, quality standards, and workflow; actual translation work is performed by human translators or approved translation vendors — I am not a translator"
+- Optionally notes what information a translator would need: context (who is speaking, to whom, game genre/tone), character limit constraints if any, glossary terms (e.g., if "traveler" has a game-specific translation)
+
+### Case 3: Domain boundary — missing plural forms in Russian locale
+**Input**: "Our Russian locale files only have a singular form for item quantity strings. Russian requires multiple plural forms (1 item, 2-4 items, 5+ items use different forms)."
+**Expected behavior**:
+- Identifies this as a locale-specific plural form gap: Russian has three integer plural categories (one, few, many; CLDR also defines `other` for fractions) per CLDR/Unicode plural rules — a single string is insufficient
+- Flags it as a localization quality bug, not a minor style issue — incorrect plural forms are grammatically wrong and visible to players
+- Recommends the fix: update the string extraction format to support CLDR plural categories (one/few/many/other), and flag to the translation vendor that Russian strings need all plural forms
+- Notes which other languages in the pipeline also require plural form support (e.g., Polish, Czech, Arabic)
+- Does NOT suggest using a numeric threshold workaround as a substitute for proper CLDR plural support
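
A minimal sketch of what proper plural-category support looks like, using simplified CLDR cardinal rules for Russian integers (fractions omitted; the Russian strings are illustrative, not vendor-approved translations):

```python
def russian_plural_category(n: int) -> str:
    """Simplified CLDR cardinal rules for Russian integers:
    one:  n % 10 == 1 and n % 100 != 11          (1, 21, 31, ...)
    few:  n % 10 in 2..4 and n % 100 not in 12..14  (2-4, 22-24, ...)
    many: everything else                         (0, 5-20, 11, 25, ...)
    """
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"

# The string table must carry one form per category the locale uses,
# instead of a single singular string.
ITEM_COUNT_RU = {
    "one": "{n} предмет",
    "few": "{n} предмета",
    "many": "{n} предметов",
}

def format_item_count(n: int) -> str:
    return ITEM_COUNT_RU[russian_plural_category(n)].format(n=n)
```

Note how 11 falls into `many`, not `one`: a numeric-threshold workaround (e.g., "1 uses singular, everything else plural") gets exactly these cases wrong, which is why the spec forbids it.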
+
+### Case 4: String key naming conflict between two systems
+**Input**: "Our UI system uses keys like 'button_confirm' and 'button_cancel'. Our dialogue system uses 'confirm' and 'cancel' for the same concepts. Translators are confused about which to use."
+**Expected behavior**:
+- Identifies the conflict: two systems use different key naming conventions for semantically identical strings, creating duplicate translation work and translator confusion
+- Produces a naming convention resolution: domain-prefixed keys with a consistent separator (e.g., `ui.button.confirm`, `ui.button.cancel`) — all systems use the same key for shared concepts
+- Recommends that shared UI primitives (Confirm, Cancel, Back, OK) use a single canonical key in a shared namespace, referenced by both systems
+- Provides a migration path: map old keys to new keys, update all string references in both systems, deprecate old keys after one release cycle
+- Does NOT recommend maintaining two separate keys for the same concept
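
The migration path can be sketched as an alias map that keeps deprecated keys resolving for one release cycle. The key names are hypothetical, matching the example above:

```python
# Old-to-canonical key map for the migration: both legacy conventions
# resolve to a single canonical key in the shared ui namespace.
KEY_MIGRATION = {
    "button_confirm": "ui.button.confirm",  # legacy UI-system key
    "confirm": "ui.button.confirm",         # legacy dialogue-system key
    "button_cancel": "ui.button.cancel",
    "cancel": "ui.button.cancel",
}

def resolve_key(key: str) -> str:
    """Follow deprecated aliases to the canonical key.

    Deprecated keys keep working for one release cycle (with a warning
    so references get updated), then the alias map is deleted.
    """
    canonical = KEY_MIGRATION.get(key, key)
    if canonical != key:
        print(f"WARN: deprecated string key '{key}' -> '{canonical}'")
    return canonical
```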
+
+### Case 5: Context pass — pipeline accommodates RTL languages
+**Input context**: Target locales include English (en), French (fr), German (de), Arabic (ar), and Hebrew (he).
+**Input**: "Design the localization pipeline for this project."
+**Expected behavior**:
+- Identifies Arabic and Hebrew as RTL languages — explicitly calls this out as a pipeline requirement
+- Designs the pipeline to include: RTL text rendering support (flag for programmer: UI must support RTL layout mirroring), bidirectional (bidi) text handling in string tables, locale-specific testing checklist entry for RTL layout
+- Does NOT design a pipeline that only accounts for LTR languages when RTL locales are specified
+- Notes that Arabic also requires a different plural form structure (6 plural categories in CLDR) — flags for translation vendor
+- Output includes all five locales in the pipeline architecture, not just the default (en)
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (pipeline, extraction, string quality, locale formats, i18n architecture)
+- [ ] Does not produce translations — redirects translation work to human translators/vendors
+- [ ] Flags locale-specific gaps (plural forms, RTL) as quality bugs requiring pipeline changes
+- [ ] Produces a unified key naming convention when conflicts arise — does not accept dual conventions
+- [ ] Incorporates all provided target locales, including RTL languages, into pipeline design
+
+---
+
+## Coverage Notes
+- Case 3 (plural forms) and Case 5 (RTL) are locale-correctness tests — these affect shipping quality in non-English markets
+- Case 4 (key naming conflict) is a pipeline hygiene test — duplicate keys cause ongoing translator confusion and cost
+- Case 5 requires the target locale list to be in context; if not provided, agent should ask before designing the pipeline
+- No automated runner; review manually or via `/skill-test`
diff --git a/CCGS Skill Testing Framework/agents/operations/release-manager.md b/CCGS Skill Testing Framework/agents/operations/release-manager.md
new file mode 100644
index 0000000..716a1ab
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/operations/release-manager.md
@@ -0,0 +1,80 @@
+# Agent Test Spec: release-manager
+
+## Agent Summary
+- **Domain**: Release pipeline management, platform certification checklists (Nintendo, Sony, Microsoft, Apple, Google), store submission workflows, platform technical requirements compliance, semantic version numbering, release branch management
+- **Does NOT own**: Game design decisions, QA test strategy or test case design (qa-lead), QA test execution (qa-tester), build infrastructure (devops-engineer)
+- **Model tier**: Sonnet
+- **Gate IDs**: May be invoked by `/gate-check` during Release phase; LAUNCH BLOCKED verdict is release-manager's primary escalation output
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references release pipeline, certification, store submission)
+- [ ] `allowed-tools:` list matches the agent's role (Read/Write for production/releases/ directory; no game source or test tools)
+- [ ] Model tier is Sonnet (default for operations specialists)
+- [ ] Agent definition does not claim authority over QA strategy, game design, or build infrastructure
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — platform certification checklist for Nintendo Switch
+**Input**: "Generate the certification checklist for our Nintendo Switch submission."
+**Expected behavior**:
+- Produces a structured checklist covering Nintendo Lotcheck requirements relevant to the game type
+- Includes categories: content rating (CERO/PEGI/ESRB as applicable), save data handling, offline mode compliance, error handling (lost connectivity, storage full), controller requirements (Joy-Con, Pro Controller support), sleep/wake behavior, screenshot/video capture compliance
+- Formats output as a numbered checklist with pass/fail columns
+- Notes that Nintendo's full Lotcheck guidelines are accessible only to licensed developers, and flags any items that require manual verification against the current guidelines document
+- Does NOT produce fabricated requirement IDs — uses known public requirements or clearly marks uncertainty
+
+### Case 2: Out-of-domain request — design test cases
+**Input**: "Write test cases for our save system to make sure it passes certification."
+**Expected behavior**:
+- Does not produce test case specifications
+- States clearly: "Test case design is owned by qa-lead (strategy) and qa-tester (execution); I can provide the certification requirements that the save system must meet, which qa-lead can then use to design tests"
+- Optionally offers to list the save-system-relevant certification requirements
+
+### Case 3: Domain boundary — certification failure (rating issue)
+**Input**: "Our build was rejected by the ESRB. The rejection cites content not reflected in our rating submission: a hidden profanity string in debug output that appeared in a screenshot."
+**Expected behavior**:
+- Issues a LAUNCH BLOCKED verdict with the specific platform requirement referenced (ESRB submission accuracy requirement)
+- Identifies the immediate action required: locate and remove all debug output containing inappropriate content before resubmission
+- Notes the resubmission process: corrected build must be resubmitted with updated content descriptor if needed
+- Does NOT minimize the issue — a certification rejection is a blocking event, not an advisory
+- Escalates to producer: documents the delay impact on release timeline
+
+### Case 4: Version numbering conflict — hotfix vs. release branch
+**Input**: "Our release branch is at v1.2.0. A hotfix was applied directly on main and tagged v1.2.1. Now the release branch also has changes that need to ship as v1.2.1 but they're different changes."
+**Expected behavior**:
+- Identifies the conflict: two different changesets have been assigned the same version tag
+- Applies semantic versioning resolution: one must be re-tagged — the release branch changes should become v1.2.2 if v1.2.1 is already published; if v1.2.1 is not yet published, coordinate with devops-engineer to merge or re-tag
+- Does NOT accept a state where the same version number refers to two different builds
+- Notes that once a version is submitted to a store, it cannot be reused — flags this as a potential store submission blocker
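
The resolution rule in Case 4 can be sketched as "pick the lowest unpublished patch version at or above the wanted one" (a simplified model; real resolution also involves re-tagging the branch with devops-engineer):

```python
def next_patch(version: str) -> str:
    """Bump the PATCH component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(p) for p in version.split("."))
    return f"{major}.{minor}.{patch + 1}"

def resolve_tag_conflict(wanted: str, published: set[str]) -> str:
    """Return the lowest unpublished patch version >= `wanted`.

    A version number already submitted to a store can never be reused,
    so published versions are permanently consumed.
    """
    candidate = wanted
    while candidate in published:
        candidate = next_patch(candidate)
    return candidate

# The hotfix already shipped as 1.2.1, so the release branch moves on.
print(resolve_tag_conflict("1.2.1", published={"1.2.0", "1.2.1"}))  # 1.2.2
```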
+
+### Case 5: Context pass — release date constraint and certification lead time
+**Input context**: Target release date is 2026-06-01. Current date is 2026-04-06. Nintendo Lotcheck typically takes 4-6 weeks.
+**Input**: "What should we prioritize on the certification checklist given our timeline?"
+**Expected behavior**:
+- Calculates the available window: ~8 weeks to release date; Nintendo Lotcheck at 4-6 weeks means submission must be ready by approximately 2026-04-20 to 2026-05-04 to allow for a potential resubmission cycle
+- Flags that a single rejection cycle would consume the buffer — prioritizes items historically associated with Lotcheck rejections (save data, offline mode, error handling)
+- Orders the checklist by certification lead time impact, not by perceived difficulty
+- Does NOT produce a checklist that assumes first-pass certification — builds in resubmission time
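
The window calculation in Case 5 is plain date arithmetic over the dates given in the context:

```python
from datetime import date, timedelta

today = date(2026, 4, 6)
release = date(2026, 6, 1)

window = release - today                    # 56 days, roughly 8 weeks
submit_by_worst = release - timedelta(weeks=6)   # 2026-04-20
submit_by_best = release - timedelta(weeks=4)    # 2026-05-04

print(window.days)       # 56
print(submit_by_worst)   # 2026-04-20: submit here if Lotcheck takes 6 weeks
print(submit_by_best)    # 2026-05-04: absolute latest, leaves no rejection buffer
```

Note that neither submission date leaves room for a 4-to-6-week resubmission cycle after a rejection, which is exactly why the checklist must prioritize historically rejection-prone items.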
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (release pipeline, certification checklists, version numbering, store submission)
+- [ ] Redirects test case design requests to qa-lead/qa-tester without producing test specs
+- [ ] Issues LAUNCH BLOCKED verdicts for certification failures — does not downgrade to advisory
+- [ ] Applies semantic versioning correctly and flags version conflicts as store-blocking issues
+- [ ] Uses provided timeline data to prioritize checklist items by certification lead time
+
+---
+
+## Coverage Notes
+- Case 3 (LAUNCH BLOCKED verdict) is the most critical test — this agent's primary safety output is blocking bad launches
+- Case 5 requires current date and release date context; verify the agent uses actual dates, not placeholder estimates
+- Certification requirements change over time — flag if the agent produces specific requirement IDs that may be outdated
+- No automated runner; review manually or via `/skill-test`
diff --git a/CCGS Skill Testing Framework/agents/qa/accessibility-specialist.md b/CCGS Skill Testing Framework/agents/qa/accessibility-specialist.md
new file mode 100644
index 0000000..9c4625d
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/qa/accessibility-specialist.md
@@ -0,0 +1,81 @@
+# Agent Test Spec: accessibility-specialist
+
+## Agent Summary
+Domain: Input remapping, text scaling, colorblind modes, screen reader support, and accessibility standards compliance (WCAG, platform certifications).
+Does NOT own: overall UX flow design (ux-designer), visual art style direction (art-director).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references accessibility / inclusive design / WCAG)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over UX flow or visual art style
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Review the player HUD for accessibility."
+**Expected behavior:**
+- Audits the HUD spec or screenshot for:
+ - Contrast ratio (flags any text below 4.5:1 for AA or 7:1 for AAA)
+ - Alternative representation for color-coded information (e.g., enemy health bars use only color, no shape distinction)
+ - Text size (flags any text below 16px equivalent at 1080p)
+ - Screen reader or TTS annotation availability for key status elements
+- Produces a prioritized finding list with specific element names and the criteria they fail
+- Does NOT redesign the HUD — produces findings for ux-designer and ui-programmer to act on
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Design the overall game flow: main menu → character select → loading → gameplay → pause → results."
+**Expected behavior:**
+- Does NOT produce UX flow architecture
+- Explicitly states that overall game flow design belongs to `ux-designer`
+- Redirects the request to `ux-designer`
+- May note it can review the flow for accessibility concerns (e.g., time limits, cognitive load) once the flow is designed
+
+### Case 3: Colorblind mode conflict
+**Input:** "The proposed colorblind mode for deuteranopia replaces the enemy red health bars with orange, but the art palette already uses orange for friendly units."
+**Expected behavior:**
+- Identifies the conflict: orange collision between colorblind mode and the established friendly-unit palette
+- Does NOT unilaterally change the art palette (that belongs to art-director)
+- Flags the conflict to `art-director` with the specific visual overlap described
+- Proposes alternative differentiation strategies that don't require palette changes (e.g., shape or icon overlays, pattern fills, outline treatments)
+
+### Case 4: UI state requirement for accessibility feature
+**Input:** "Screen reader support for the inventory requires the system to expose item names and quantities as accessible text nodes."
+**Expected behavior:**
+- Produces an accessibility requirements spec defining the required accessible text properties for each inventory element
+- Identifies that implementing accessible text nodes requires UI system changes
+- Coordinates with `ui-programmer` to implement the required accessible text node exposure
+- Does NOT implement the UI system changes itself
+
+### Case 5: Context pass — WCAG 2.1 targets
+**Input:** Project accessibility target provided in context: WCAG 2.1 AA compliance. Request: "Review the dialogue system for accessibility."
+**Expected behavior:**
+- References specific WCAG 2.1 AA success criteria relevant to dialogue (e.g., 1.4.3 Contrast (Minimum), 1.4.4 Resize Text, 2.2.1 Timing Adjustable for auto-advancing dialogue)
+- Uses exact criterion numbers and names from the standard, not paraphrases
+- Flags each finding with the specific criterion it fails
+- Notes which criteria are out of scope for AA (AAA-only) so they are not incorrectly flagged as failures
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (remapping, text scaling, colorblind modes, screen reader, standards compliance)
+- [ ] Redirects UX flow design to ux-designer, art palette decisions to art-director
+- [ ] Returns structured findings with specific element names, contrast ratios, and criterion references
+- [ ] Does not implement UI changes — coordinates with ui-programmer for implementation
+- [ ] References specific WCAG criteria by number when compliance target is provided
+- [ ] Flags conflicts between accessibility requirements and art decisions to art-director
+
+---
+
+## Coverage Notes
+- HUD audit (Case 1) should produce findings trackable as accessibility stories in the sprint backlog
+- Colorblind conflict (Case 3) confirms the agent respects art-director's authority over the palette
+- WCAG criteria (Case 5) verifies the agent uses standards precisely, not generically
diff --git a/CCGS Skill Testing Framework/agents/qa/qa-tester.md b/CCGS Skill Testing Framework/agents/qa/qa-tester.md
new file mode 100644
index 0000000..39fc02e
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/qa/qa-tester.md
@@ -0,0 +1,87 @@
+# Agent Test Spec: qa-tester
+
+## Agent Summary
+- **Domain**: Detailed test case authoring, bug reports (structured format), test execution documentation, regression checklists, smoke check execution docs, test evidence recording per the project's coding standards
+- **Does NOT own**: Test strategy and test plan design (qa-lead), implementation fixes for found bugs (appropriate programmer), QA process architecture (qa-lead)
+- **Category**: qa
+- **Model tier**: Sonnet
+- **Gate IDs**: None; flags ambiguous acceptance criteria to qa-lead rather than resolving independently
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references test cases, bug reports, test execution, regression testing)
+- [ ] `allowed-tools:` list matches the agent's role (Read/Write for tests/ and production/qa/evidence/; no source code editing tools)
+- [ ] Model tier is Sonnet (default for QA specialists)
+- [ ] Agent definition does not claim authority over test strategy, fix implementation, or acceptance criterion definition
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — test cases for a save system
+**Input**: "Write test cases for our save system. It must save and load player position, inventory, and quest state."
+**Expected behavior**:
+- Produces a test case list with at minimum the following test cases, each containing all four required fields:
+ - **TC-SAVE-001**: Save and load player position
+ - **TC-SAVE-002**: Save and load full inventory (multiple item types, quantities, equipped state)
+ - **TC-SAVE-003**: Save and load quest state (in-progress, completed, and locked quest states)
+ - **TC-SAVE-004**: Overwrite an existing save file
+ - **TC-SAVE-005**: Load a save file from a previous version (backward compatibility)
+ - **TC-SAVE-006**: Corrupt save file handling (file exists but is invalid)
+- Each test case includes: **Precondition** (required game state before test), **Steps** (numbered, unambiguous), **Expected Result** (specific, observable outcome), **Pass Criteria** (binary pass/fail condition)
+- Does NOT write "verify the save works" as a pass criterion — criteria must be observable and unambiguous
+
+### Case 2: Out-of-domain request — implement a bug fix
+**Input**: "You found a bug where the save system loses inventory data on version mismatch. Please fix it."
+**Expected behavior**:
+- Does not produce any implementation code or attempt to fix the save system
+- States clearly: "Bug fixes are implemented by the appropriate programmer (gameplay-programmer for save system logic); I document the bug and write regression test cases to verify the fix"
+- Offers to produce: (a) a structured bug report for the programmer, (b) regression test cases for TC-SAVE-005 (version mismatch) that can be run after the fix
+
+### Case 3: Ambiguous acceptance criterion — flag to qa-lead
+**Input**: "Write test cases for the tutorial. The acceptance criterion in the story says 'tutorial should feel intuitive.'"
+**Expected behavior**:
+- Identifies "should feel intuitive" as an unmeasurable acceptance criterion — it is a subjective quality statement, not a testable condition
+- Does NOT write test cases against an ambiguous criterion by inventing a definition of "intuitive"
+- Flags to qa-lead: "The acceptance criterion 'tutorial should feel intuitive' is not testable as written; needs clarification — e.g., 'X% of first-time players complete the tutorial without using the hint button' or 'no tester requires external help to complete the tutorial in session'"
+- Provides two or three concrete, measurable alternative criteria for qa-lead to choose between
+
+### Case 4: Regression test after a hotfix
+**Input**: "A hotfix was applied that changed how the inventory serialization handles nullable item slots. Write a targeted regression checklist for the affected systems."
+**Expected behavior**:
+- Identifies the affected systems: inventory save/load, any UI that reads inventory state, any quest system that checks inventory contents, any crafting system that reads inventory slots
+- Produces a regression checklist focused on those systems only — not a full game regression
+- Checklist items target the specific change: null item slot handling (empty slots, mixed full/empty slot arrays, slot count boundary conditions)
+- Each checklist item specifies: what to test, how to verify pass, and what a failure looks like
+- Does NOT produce a generic "test everything" checklist — the value of a targeted regression is specificity
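
What a targeted null-slot regression case looks like in executable form, using a hypothetical minimal model of the patched serializer (empty slots represented as `None`):

```python
# Hypothetical stand-in for the patched inventory serializer: the real
# one lives in game code; this models only the nullable-slot contract.
def serialize_slots(slots: list) -> list:
    return [None if s is None else dict(s) for s in slots]

def deserialize_slots(data: list) -> list:
    return [None if s is None else dict(s) for s in data]

def roundtrip(slots: list) -> list:
    return deserialize_slots(serialize_slots(slots))

# Targeted cases from the checklist: empty array, all-empty slots,
# mixed full/empty, boundary slot count. Each has a binary pass condition.
assert roundtrip([]) == []
assert roundtrip([None, None]) == [None, None]
mixed = [{"id": "sword", "qty": 1}, None, {"id": "potion", "qty": 3}]
assert roundtrip(mixed) == mixed
assert roundtrip([None] * 40 + [{"id": "gem", "qty": 1}])[40] == {"id": "gem", "qty": 1}
```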
+
+### Case 5: Context pass — test evidence format from coding-standards.md
+**Input context**: coding-standards.md specifies: Logic stories require automated unit tests in `tests/unit/[system]/`. Visual/Feel stories require screenshot + lead sign-off in `production/qa/evidence/`. UI stories require manual walkthrough doc in `production/qa/evidence/`.
+**Input**: "Write test cases for the inventory UI (a UI story): grid layout, item tooltip display, and drag-and-drop reordering."
+**Expected behavior**:
+- Classifies this correctly as a UI story per the provided standards
+- Produces a manual walkthrough test document (not automated unit tests) — because the coding standard specifies manual walkthrough for UI stories
+- Specifies the output location: `production/qa/evidence/` (not `tests/unit/`)
+- Test cases include: grid layout verification (all items appear, no overflow), tooltip display (correct item name, stats, description appear on hover/focus), and drag-and-drop (item moves to target slot, original slot becomes empty, slot limits respected)
+- Notes that this is ADVISORY evidence level per the coding standards, not BLOCKING — explicitly states this so the team knows the gate level
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (test case authoring, bug reports, test execution documentation, regression checklists)
+- [ ] Redirects bug fix requests to appropriate programmers and offers to document the bug and write regression tests
+- [ ] Flags ambiguous acceptance criteria to qa-lead rather than inventing a testable interpretation
+- [ ] Produces targeted regression checklists (system-specific) not full-game regression passes
+- [ ] Uses the correct test evidence format and output location per coding-standards.md
+
+---
+
+## Coverage Notes
+- Case 1 (test case completeness) is the foundational quality test — missing fields (precondition, steps, expected result, pass criteria) are a failure
+- Case 3 (ambiguous criterion) is a coordination test — qa-tester must not silently accept untestable criteria
+- Case 5 requires coding-standards.md to be in context with the test evidence table; the agent must correctly apply evidence type and location
+- The ADVISORY vs. BLOCKING gate level (Case 5) is a detail that affects story completion — verify the agent reports it
+- No automated runner; review manually or via `/skill-test`
diff --git a/CCGS Skill Testing Framework/agents/qa/security-engineer.md b/CCGS Skill Testing Framework/agents/qa/security-engineer.md
new file mode 100644
index 0000000..a058eee
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/qa/security-engineer.md
@@ -0,0 +1,79 @@
+# Agent Test Spec: security-engineer
+
+## Agent Summary
+Domain: Anti-cheat systems, save data security, network security, vulnerability assessment, and data privacy compliance.
+Does NOT own: game logic design (gameplay-programmer), server infrastructure (devops-engineer).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references anti-cheat / security / vulnerability assessment)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over game logic design or server deployment
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Review the save data system for security issues."
+**Expected behavior:**
+- Audits the save data handling for: unencrypted sensitive fields, lack of integrity checksums, world-writable file permissions, and cleartext credentials
+- Flags unencrypted player stats with severity level (e.g., MEDIUM — enables offline stat manipulation)
+- Recommends: AES-256 encryption for sensitive fields, HMAC checksum for tamper detection
+- Produces a prioritized finding list (CRITICAL / HIGH / MEDIUM / LOW)
+- Does NOT change the save system code directly — produces findings for gameplay-programmer or engine-programmer to act on
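
The HMAC tamper-detection recommendation can be sketched as follows. This covers only the integrity check, not field encryption, and the literal key is a placeholder: in production the key comes from secure storage, never from source code:

```python
import hashlib
import hmac
import json

SAVE_KEY = b"example-key-do-not-ship"  # placeholder; load from secure storage

def write_save(data: dict) -> bytes:
    """Serialize save data with an HMAC-SHA256 tag prepended."""
    payload = json.dumps(data, sort_keys=True).encode()
    tag = hmac.new(SAVE_KEY, payload, hashlib.sha256).hexdigest().encode()
    return tag + b"\n" + payload

def read_save(blob: bytes) -> dict:
    """Reject any save file whose payload does not match its tag."""
    tag, payload = blob.split(b"\n", 1)
    expected = hmac.new(SAVE_KEY, payload, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("save file failed integrity check")
    return json.loads(payload)

blob = write_save({"gold": 100, "level": 7})
assert read_save(blob) == {"gold": 100, "level": 7}

# Offline stat manipulation: editing the payload invalidates the tag.
tampered = blob.replace(b'"gold": 100', b'"gold": 999')
try:
    read_save(tampered)
except ValueError as e:
    print(e)  # save file failed integrity check
```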
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Design the matchmaking algorithm to pair players by skill rating."
+**Expected behavior:**
+- Does NOT produce matchmaking algorithm design
+- Explicitly states that matchmaking design belongs to `network-programmer`
+- Redirects the request to `network-programmer`
+- May note it can review the matchmaking system for security vulnerabilities (e.g., rating manipulation) once the design is complete
+
+### Case 3: Critical vulnerability — SQL injection
+**Input:** (Hypothetical) "Review this server-side query handler: `query = 'SELECT * FROM users WHERE id=' + user_input`"
+**Expected behavior:**
+- Flags this as a CRITICAL vulnerability (SQL injection via unsanitized user input)
+- Provides immediate remediation: parameterized queries / prepared statements
+- Recommends a security review of all other query-construction code in the codebase
+- Escalates to `technical-director` given CRITICAL severity — does not leave the finding unescalated
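
The remediation can be shown concretely with Python's `sqlite3` (the users table is a hypothetical stand-in for the server's schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

# Classic injection payload a user might submit.
user_input = "alice' OR '1'='1"

# VULNERABLE pattern from the finding: input concatenated into SQL text.
#   query = "SELECT * FROM users WHERE name='" + user_input + "'"

# FIXED: the value travels as a bound parameter, never as SQL.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # [] -- the payload is treated as a (non-matching) name, not as SQL
```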
+
+### Case 4: Security vs. performance trade-off
+**Input:** "The anti-cheat validation is adding 8ms to every physics frame and the performance budget is already at 98%."
+**Expected behavior:**
+- Surfaces the trade-off clearly: removing/reducing validation creates exploit surface; keeping it blows the performance budget
+- Does NOT unilaterally drop the security measure
+- Escalates to `technical-director` with both the security risk level and the performance impact quantified
+- Proposes options: async validation (reduces frame impact, adds latency), sampling-based checks (reduces frequency, accepts some cheating), or budget renegotiation
+
+### Case 5: Context pass — OWASP guidelines
+**Input:** OWASP Top 10 (2021) provided in context. Request: "Audit the game's login and account system."
+**Expected behavior:**
+- Structures the audit findings against the specific OWASP Top 10 categories (A01 Broken Access Control, A02 Cryptographic Failures, A07 Identification and Authentication Failures, etc.)
+- References specific control IDs from the provided list rather than generic advice
+- Flags each finding with the relevant OWASP category
+- Produces a compliance gap list: which controls are met, which are missing, which are partial
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (anti-cheat, save security, network security, vulnerability assessment)
+- [ ] Redirects matchmaking / game logic requests to appropriate agents
+- [ ] Returns structured findings with severity classification (CRITICAL / HIGH / MEDIUM / LOW)
+- [ ] Does not implement fixes unilaterally — produces findings for the responsible programmer
+- [ ] Escalates CRITICAL findings to technical-director immediately
+- [ ] References specific standards (OWASP, GDPR, etc.) when provided in context
+
+---
+
+## Coverage Notes
+- Save data audit (Case 1) confirms the agent produces actionable, prioritized findings not generic advice
+- CRITICAL vulnerability escalation (Case 3) verifies the agent's severity classification and escalation path
+- Performance trade-off (Case 4) confirms the agent does not silently drop security measures to hit a budget
diff --git a/CCGS Skill Testing Framework/agents/specialists/ai-programmer.md b/CCGS Skill Testing Framework/agents/specialists/ai-programmer.md
new file mode 100644
index 0000000..05caa0c
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/specialists/ai-programmer.md
@@ -0,0 +1,79 @@
+# Agent Test Spec: ai-programmer
+
+## Agent Summary
+Domain: NPC behavior, state machines, pathfinding, perception systems, and AI decision-making.
+Does NOT own: player mechanics (gameplay-programmer), rendering or engine internals (engine-programmer).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references NPC behavior / AI systems)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over player mechanics or engine rendering
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Implement a patrol-and-alert behavior tree for a guard NPC: patrol between waypoints, detect the player within 10 units, then enter an alert state and pursue."
+**Expected behavior:**
+- Produces a behavior tree spec (nodes: Selector, Sequence, Leaf actions) plus corresponding code scaffold
+- Defines clearly named states: Patrol, Alert, Pursue
+- Uses a perception/detection check as a condition node, not inline in movement code
+- Waypoints are data-driven (passed as a resource or export), not hardcoded positions
+- Output includes doc comments on public API
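
The node semantics this case expects can be sketched minimally. This is a Python stand-in for illustration only — the class names, `Status` values, and blackboard keys are assumptions, not the project's actual API:

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3

class Selector:
    """Ticks children in order; returns on the first non-FAILURE result."""
    def __init__(self, *children): self.children = children
    def tick(self, blackboard):
        for child in self.children:
            status = child.tick(blackboard)
            if status != Status.FAILURE:
                return status
        return Status.FAILURE

class Sequence:
    """Ticks children in order; returns on the first non-SUCCESS result."""
    def __init__(self, *children): self.children = children
    def tick(self, blackboard):
        for child in self.children:
            status = child.tick(blackboard)
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS

class Leaf:
    """Wraps a callable(blackboard) -> Status."""
    def __init__(self, fn): self.fn = fn
    def tick(self, blackboard): return self.fn(blackboard)

# Perception check as a condition node — not inline in movement code.
def player_detected(bb):
    return Status.SUCCESS if bb["player_distance"] <= bb["detect_radius"] else Status.FAILURE

def pursue(bb):
    bb["state"] = "Pursue"
    return Status.RUNNING

def patrol(bb):
    bb["state"] = "Patrol"
    return Status.RUNNING

guard = Selector(
    Sequence(Leaf(player_detected), Leaf(pursue)),  # alert branch wins when detection succeeds
    Leaf(patrol),                                   # fallback: keep patrolling waypoints
)

# Waypoints and detection radius are data-driven, passed in via the blackboard.
bb = {"player_distance": 25.0, "detect_radius": 10.0, "waypoints": [(0, 0), (8, 0)], "state": None}
guard.tick(bb)          # player out of range -> state becomes "Patrol"
bb["player_distance"] = 6.0
guard.tick(bb)          # player within 10 units -> state becomes "Pursue"
```

A passing output would show the state flipping from Patrol to Pursue purely through the condition node, with no detection logic embedded in the movement leaves.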
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Implement player input handling for the WASD movement and dash ability."
+**Expected behavior:**
+- Does NOT produce player input or movement code
+- Explicitly states this is outside its domain (player mechanics belong to gameplay-programmer)
+- Redirects the request to `gameplay-programmer`
+- May note that once player position is available via API, AI perception can reference it
+
+### Case 3: Cross-domain coordination — level constraints
+**Input:** "Design pathfinding for the warehouse level, but the level has narrow corridors that confuse the navmesh."
+**Expected behavior:**
+- Does NOT unilaterally modify level layout or navmesh assets
+- Coordinates with `level-designer` to clarify navmesh requirements and corridor dimensions
+- Proposes a pathfinding approach (e.g., navmesh with agent radius tuning, flow fields) conditional on level geometry
+- Documents assumptions and flags blockers clearly
+
+### Case 4: Performance escalation — custom data structures
+**Input:** "The pathfinding priority queue is the bottleneck; I need a custom binary heap implementation for performance."
+**Expected behavior:**
+- Recognizes that a low-level, engine-integrated data structure is within engine-programmer's domain
+- Escalates to `engine-programmer` with a clear description of the bottleneck and required interface
+- May provide the algorithmic spec (binary heap interface, expected operations) to guide the engine-programmer
+- Does NOT implement the low-level structure unilaterally if it requires engine memory management
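
The "algorithmic spec" deliverable might look like the following reference sketch, using Python's `heapq` as a stand-in (the `OpenSet` name and interface are hypothetical; the real engine-level version would swap in pre-allocated storage under engine memory management):

```python
import heapq

class OpenSet:
    """Min-heap keyed on f-score, as consumed by A* node expansion."""
    def __init__(self):
        self._heap = []

    def push(self, f_score, node_id):
        heapq.heappush(self._heap, (f_score, node_id))   # O(log n)

    def pop_min(self):
        return heapq.heappop(self._heap)[1]              # O(log n)

    def __len__(self):
        return len(self._heap)

open_set = OpenSet()
for f, node in [(7.5, "B"), (3.2, "A"), (9.1, "C")]:
    open_set.push(f, node)
open_set.pop_min()   # -> "A" (lowest f-score expands first)
```

The spec hands engine-programmer the required operations and complexity targets without dictating the memory layout.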
+
+### Case 5: Context pass — uses level layout for pathfinding design
+**Input:** Level layout document provided in context showing two choke points: a doorway at (12, 0) and a bridge at (40, 5). Request: "Design the patrol route and threat response for enemies in this level."
+**Expected behavior:**
+- References the specific choke point coordinates from the provided context
+- Designs patrol routes that leverage the choke points as tactical positions
+- Specifies alert state transitions that funnel NPCs toward identified choke points during pursuit
+- Does not invent geometry not present in the provided layout document
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (NPC behavior, pathfinding, perception, state machines)
+- [ ] Redirects out-of-domain requests to correct agent (gameplay-programmer, engine-programmer, level-designer)
+- [ ] Returns structured findings (behavior tree specs, state machine diagrams, code scaffolds)
+- [ ] Does not modify player mechanics files without explicit delegation
+- [ ] Escalates performance-critical low-level structures to engine-programmer
+- [ ] Uses data-driven NPC configuration (waypoints, detection radii) not hardcoded values
+
+---
+
+## Coverage Notes
+- Behavior tree output (Case 1) should be validated by a unit test in `tests/unit/ai/`
+- Level-layout context (Case 5) verifies the agent reads and applies provided documents rather than inventing
+- Performance escalation (Case 4) confirms the agent recognizes the engine-programmer boundary
diff --git a/CCGS Skill Testing Framework/agents/specialists/engine-programmer.md b/CCGS Skill Testing Framework/agents/specialists/engine-programmer.md
new file mode 100644
index 0000000..4cf84b6
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/specialists/engine-programmer.md
@@ -0,0 +1,79 @@
+# Agent Test Spec: engine-programmer
+
+## Agent Summary
+Domain: Rendering pipeline, physics integration, memory management, resource loading, and core engine framework.
+Does NOT own: gameplay mechanics (gameplay-programmer), editor/debug tool UI (tools-programmer).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references rendering / memory / engine core)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over gameplay mechanics or tool UI
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Implement a custom object pool for projectiles to avoid per-frame allocation."
+**Expected behavior:**
+- Produces an engine-level object pool implementation with acquire/release interface
+- Pool is typed to the projectile object type, uses pre-allocated fixed-size storage
+- Provides thread-safety notes (or clearly marks as single-threaded-only with rationale)
+- Includes doc comments on the public API per coding standards
+- Output is compatible with the project's configured engine and language
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Add a pause menu screen with volume sliders and a 'back to main menu' button."
+**Expected behavior:**
+- Does NOT produce UI screen code
+- Explicitly states that menu screens belong to `ui-programmer`
+- Redirects the request to `ui-programmer`
+- May note it can provide engine-level audio volume API endpoints for the ui-programmer to call
+
+### Case 3: Memory leak diagnosis
+**Input:** "Memory usage grows by ~50MB per level load and never releases. We suspect the resource loading system."
+**Expected behavior:**
+- Produces a systematic diagnosis approach: reference counting audit, resource handle lifecycle check, cache invalidation review
+- Identifies likely causes (orphaned resource handles, circular references, cache that never evicts)
+- Produces a concrete fix for the identified leak pattern
+- Provides a test to verify the fix (memory baseline before load, measure after unload, confirm return to baseline)
+
+### Case 4: Cross-domain coordination — shared system optimization
+**Input:** "I need to optimize the physics broadphase, but the gameplay system is tightly coupled to the physics query API."
+**Expected behavior:**
+- Does NOT unilaterally change the physics query API surface (would break gameplay-programmer's code)
+- Coordinates with `lead-programmer` to plan the change safely
+- Proposes a migration path: new optimized API alongside old API, with a deprecation period
+- Documents the coordination requirement before proceeding
+
+### Case 5: Context pass — checks engine version reference
+**Input:** Engine version reference (Godot 4.6) provided in context. Request: "Set up the default physics engine for the project."
+**Expected behavior:**
+- Reads the engine version reference and notes Godot 4.6 change: Jolt physics is now the default
+- Produces configuration guidance that accounts for the Jolt-as-default change (4.6 migration note)
+- Flags any API differences between GodotPhysics and Jolt that could affect existing code
+- Does NOT suggest deprecated or pre-4.6 physics setup steps without noting they apply to older versions
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (rendering, physics, memory, resource loading, core framework)
+- [ ] Redirects UI/menu requests to ui-programmer
+- [ ] Returns structured findings (implementation code, diagnosis steps, migration plans)
+- [ ] Coordinates with lead-programmer before changing shared API surfaces
+- [ ] Checks engine version reference before suggesting engine-specific APIs
+- [ ] Provides test evidence for fixes (memory before/after, performance measurements)
+
+---
+
+## Coverage Notes
+- Object pool (Case 1) must include a unit test in `tests/unit/engine/`
+- Memory leak diagnosis (Case 3) should produce evidence artifacts in `production/qa/evidence/`
+- Engine version check (Case 5) confirms the agent treats VERSION.md as authoritative, not LLM training data
diff --git a/CCGS Skill Testing Framework/agents/specialists/gameplay-programmer.md b/CCGS Skill Testing Framework/agents/specialists/gameplay-programmer.md
new file mode 100644
index 0000000..bb78655
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/specialists/gameplay-programmer.md
@@ -0,0 +1,80 @@
+# Agent Test Spec: gameplay-programmer
+
+## Agent Summary
+Domain: Game mechanics code, player systems, combat implementation, and interactive features.
+Does NOT own: UI implementation (ui-programmer), AI behavior trees (ai-programmer), engine/rendering systems (engine-programmer).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references game mechanics / player systems)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep — excludes tools only needed by orchestration agents
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over UI, AI behavior, or engine/rendering code
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Implement a melee combo system where three consecutive light attacks chain into a finisher."
+**Expected behavior:**
+- Produces code or a code scaffold following the project's language (GDScript/C#) and coding standards
+- Defines combo state tracking, input window timing, and finisher trigger logic as separate, testable methods
+- References the relevant GDD section if one is provided in context
+- Does NOT implement UI feedback (delegates to ui-programmer) or AI reaction (delegates to ai-programmer)
+- Output includes doc comments on all public methods per coding standards
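
The separation of combo tracking, input-window timing, and finisher triggering might be sketched as follows (Python stand-in; `COMBO_CONFIG` keys and timing values are illustrative assumptions, shown data-driven as the case requires):

```python
COMBO_CONFIG = {"chain_length": 3, "input_window_s": 0.6}  # loaded from data in production

class ComboTracker:
    """Tracks light-attack chains and triggers the finisher on the Nth hit."""
    def __init__(self, config):
        self.chain_length = config["chain_length"]
        self.input_window = config["input_window_s"]
        self.count = 0
        self.last_attack_time = None

    def register_attack(self, now):
        """Returns 'light' or 'finisher'. Resets the chain if the input window lapsed."""
        if self.last_attack_time is not None and now - self.last_attack_time > self.input_window:
            self.count = 0                      # window missed: chain resets
        self.count += 1
        self.last_attack_time = now
        if self.count >= self.chain_length:
            self.count = 0
            return "finisher"
        return "light"

combo = ComboTracker(COMBO_CONFIG)
combo.register_attack(0.0)   # -> "light"
combo.register_attack(0.4)   # -> "light"
combo.register_attack(0.8)   # -> "finisher" (third hit inside the window)
```

Each method is independently testable, which is what the unit test in `tests/unit/gameplay/` would exercise.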
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Build the main menu screen with pause and settings panels."
+**Expected behavior:**
+- Does NOT produce menu implementation code
+- Explicitly states this is outside its domain
+- Redirects the request to `ui-programmer`
+- May note that, if the pause menu needs to read gameplay state, it can expose the required state API surface
+
+### Case 3: Domain boundary — threading flag
+**Input:** "The combo system is causing frame stutters; can you add threading to spread the input processing?"
+**Expected behavior:**
+- Does NOT unilaterally implement threading or async systems
+- Flags the threading concern to `engine-programmer` with a clear description of the hot path
+- May produce a non-threaded refactor to reduce work per frame as a safe interim step
+- Documents the escalation so lead-programmer is aware
+
+### Case 4: Conflict with an Accepted ADR
+**Input:** "Change the damage calculation to use floating-point accumulation directly instead of the fixed-point formula in ADR-003."
+**Expected behavior:**
+- Identifies that the proposed change violates ADR-003 (Accepted status)
+- Does NOT silently implement the violation
+- Flags the conflict to `lead-programmer` with the ADR reference and the trade-off described
+- Will implement only after explicit override decision from lead-programmer or technical-director
+
+### Case 5: Context pass — implements to GDD spec
+**Input:** GDD for "PlayerCombat" provided in context. Request: "Implement the stamina drain formula from the combat GDD."
+**Expected behavior:**
+- Reads the formula section of the provided GDD
+- Implements the exact formula as written — does NOT invent new variables or adjust coefficients
+- Makes stamina drain a data-driven value (external config), not a hardcoded constant
+- Notes any edge cases from the GDD's edge-cases section and handles them in code
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (mechanics, player systems, combat)
+- [ ] Redirects out-of-domain requests to correct agent (ui-programmer, ai-programmer, engine-programmer)
+- [ ] Returns structured findings (code scaffold, method signatures, inline comments) not freeform opinions
+- [ ] Does not modify files outside `src/gameplay/` or `src/core/` without explicit delegation
+- [ ] Flags ADR violations rather than overriding them silently
+- [ ] Makes gameplay values data-driven, never hardcoded
+
+---
+
+## Coverage Notes
+- Combo system test (Case 1) should be validated with a unit test in `tests/unit/gameplay/`
+- Threading escalation (Case 3) verifies the agent does not over-reach into engine territory
+- ADR conflict (Case 4) confirms the agent respects the architecture governance process
+- Cases 1 and 5 together verify the agent implements to spec rather than improvising
diff --git a/CCGS Skill Testing Framework/agents/specialists/network-programmer.md b/CCGS Skill Testing Framework/agents/specialists/network-programmer.md
new file mode 100644
index 0000000..082fdee
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/specialists/network-programmer.md
@@ -0,0 +1,81 @@
+# Agent Test Spec: network-programmer
+
+## Agent Summary
+Domain: Multiplayer networking, state replication, lag compensation, matchmaking protocol design, and network message schemas.
+Does NOT own: gameplay logic itself (it owns only the networking of that logic), server infrastructure and deployment (devops-engineer).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references multiplayer / replication / networking)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over gameplay logic or server deployment infrastructure
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Design state replication for player position in a 4-player co-op game."
+**Expected behavior:**
+- Produces a sync strategy document covering:
+ - Replication frequency (e.g., 20Hz with delta compression)
+ - Priority tier (e.g., own-player high priority, other players medium)
+ - Interpolation approach for remote players (e.g., linear interpolation with 100ms buffer)
+ - Bandwidth estimate per player per second
+- Does NOT implement the player movement logic itself (defers to gameplay-programmer)
+- Proposes dead-reckoning or prediction strategy to reduce visible lag
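
The bandwidth estimate line item could be as simple as the back-of-envelope below (all byte sizes and overheads are illustrative assumptions, not measured values):

```python
TICK_RATE_HZ = 20                 # replication frequency from the sync strategy
BYTES_PER_SNAPSHOT = 3 * 4 + 2    # position (3 floats) + quantized heading (2 bytes) = 14 bytes
HEADER_OVERHEAD = 8               # sequence number + timestamp per packet
REMOTE_PLAYERS = 3                # each client receives the other 3 players in 4-player co-op

payload_per_tick = REMOTE_PLAYERS * BYTES_PER_SNAPSHOT + HEADER_OVERHEAD
downstream_bps = payload_per_tick * TICK_RATE_HZ
print(f"{downstream_bps} B/s ≈ {downstream_bps / 1024:.1f} KiB/s per client")
# -> 1000 B/s ≈ 1.0 KiB/s per client (before delta compression)
```

Showing the arithmetic makes the estimate reviewable by technical-director rather than a bare number.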
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Deploy our game server to AWS EC2 and set up auto-scaling."
+**Expected behavior:**
+- Does NOT produce server deployment configuration, Terraform, or AWS setup scripts
+- Explicitly states that server infrastructure belongs to `devops-engineer`
+- Redirects the request to `devops-engineer`
+- May note it can provide the network protocol spec the server needs to implement once infrastructure is set up
+
+### Case 3: State divergence — rollback/reconciliation
+**Input:** "Under high latency, clients are diverging from the authoritative server state for physics objects."
+**Expected behavior:**
+- Proposes a rollback-and-reconciliation approach (client-side prediction + server authoritative correction)
+- Specifies the state snapshot format, reconciliation trigger threshold (e.g., >5 units position error), and correction interpolation speed
+- Notes the input buffer pattern for deterministic replay
+- Does NOT change the physics simulation itself — documents the interface contract for engine-programmer
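
The reconciliation trigger and correction interpolation might be sketched like this (Python for illustration; the threshold comes from the case, while `CORRECTION_SPEED` and the blend approach are assumptions):

```python
import math

RECONCILE_THRESHOLD = 5.0    # units of position error before correcting
CORRECTION_SPEED = 0.2       # fraction of the error corrected per tick (smooth, not a snap)

def reconcile(predicted, authoritative):
    """Returns the corrected client position for this tick."""
    error = math.dist(predicted, authoritative)
    if error <= RECONCILE_THRESHOLD:
        return predicted                         # within tolerance: trust the prediction
    # Blend toward the server state instead of teleporting the object.
    return tuple(p + (a - p) * CORRECTION_SPEED for p, a in zip(predicted, authoritative))

reconcile((0.0, 0.0), (3.0, 0.0))    # error 3.0 <= 5.0 -> prediction kept
reconcile((0.0, 0.0), (10.0, 0.0))   # error 10.0 > 5.0 -> (2.0, 0.0), eased toward server
```

The engine-programmer interface contract is exactly the two inputs here: the client's predicted state and the server's authoritative snapshot.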
+
+### Case 4: Anti-cheat conflict
+**Input:** "We want client-authoritative position for smooth movement, but anti-cheat requires server validation."
+**Expected behavior:**
+- Surfaces the direct conflict: client-authority is fast but exploitable; server-authority is secure but requires latency compensation
+- Coordinates with `security-engineer` to agree on the validation boundary
+- Proposes a compromise (server validates position within a tolerance band, flags outliers) rather than unilaterally deciding
+- Documents the trade-off and escalates the final decision to `technical-director` if security-engineer and network-programmer cannot agree
+
+### Case 5: Context pass — latency budget
+**Input:** Technical preferences provided in context: target latency 80ms RTT for 95th percentile players. Request: "Design the input replication scheme for a fighting game."
+**Expected behavior:**
+- References the 80ms RTT budget explicitly in the design
+- Selects replication approach calibrated to that budget (e.g., rollback netcode is preferred for fighting games at this latency)
+- Specifies input delay frames calculated from the 80ms budget (e.g., 2 frames at 60fps = 33ms buffer)
+- Flags that rollback netcode requires gameplay-programmer to implement deterministic simulation
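
The input-delay calculation the case expects can be shown as a worked example (the split between fixed delay and rollback absorption is one plausible sizing rule, not the only one):

```python
import math

RTT_BUDGET_MS = 80.0         # 95th-percentile round trip from technical preferences
FRAME_MS = 1000.0 / 60.0     # 16.67ms per frame at 60fps
ONE_WAY_MS = RTT_BUDGET_MS / 2.0

# Size the fixed input delay to cover half the one-way latency and let
# rollback absorb the remainder as mispredicted frames.
delay_frames = math.ceil((ONE_WAY_MS / 2.0) / FRAME_MS)
buffer_ms = delay_frames * FRAME_MS
print(f"{delay_frames} frames = {buffer_ms:.0f}ms input buffer")
# -> 2 frames = 33ms input buffer
```

The point is that the frame count is derived from the stated budget, not picked by feel.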
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (replication, lag compensation, protocol design, matchmaking)
+- [ ] Redirects server deployment to devops-engineer
+- [ ] Returns structured findings (sync strategies, protocol specs, bandwidth estimates)
+- [ ] Does not implement gameplay logic — only specifies the network contract for it
+- [ ] Coordinates with security-engineer on anti-cheat boundaries
+- [ ] Designs to explicit latency targets from provided context
+
+---
+
+## Coverage Notes
+- Replication strategy (Case 1) should include a bandwidth calculation reviewable by technical-director
+- Rollback/reconciliation (Case 3) must document the engine-programmer interface contract clearly
+- Anti-cheat conflict (Case 4) confirms the agent escalates rather than unilaterally deciding security trade-offs
diff --git a/CCGS Skill Testing Framework/agents/specialists/performance-analyst.md b/CCGS Skill Testing Framework/agents/specialists/performance-analyst.md
new file mode 100644
index 0000000..c442a6a
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/specialists/performance-analyst.md
@@ -0,0 +1,82 @@
+# Agent Test Spec: performance-analyst
+
+## Agent Summary
+Domain: Profiling, bottleneck identification, performance metrics tracking, and optimization recommendations.
+Does NOT own: implementing optimizations (belongs to the appropriate programmer for that domain).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references profiling / bottleneck analysis / performance metrics)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over implementing any optimization — explicitly identifies itself as analysis/recommendation only
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Analyze this frame time data: CPU 14ms, GPU 8ms, physics 6ms, draw calls 420, scripts 3ms."
+**Expected behavior:**
+- Identifies the primary bottleneck: the frame is CPU-bound (14ms CPU vs 8ms GPU), leaving under 3ms of headroom against the 16.67ms (60fps) budget
+- Breaks down contributors: physics (6ms, 43% of CPU time) is the top culprit
+- Flags draw calls (420) as a secondary concern if they exceed the configured budget (e.g., 200 draw calls per technical-preferences.md)
+- Produces a prioritized bottleneck report:
+ 1. Physics — 6ms, reduce simulation frequency or switch broadphase algorithm
+ 2. Draw calls — 420, implement batching or LOD
+ 3. Scripts — 3ms, profile hot paths
+- Does NOT implement any of these optimizations
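
The prioritization in that report amounts to a share-of-budget calculation, sketched below (the CPU breakdown values come from this case; the "other" bucket is an assumption to make the totals reconcile):

```python
FRAME_BUDGET_MS = 1000.0 / 60.0   # 16.67ms at 60fps

cpu_breakdown = {"physics": 6.0, "scripts": 3.0, "other": 5.0}  # sums to the 14ms CPU time
cpu_total = sum(cpu_breakdown.values())

report = sorted(
    ((name, ms, ms / cpu_total) for name, ms in cpu_breakdown.items()),
    key=lambda row: row[1],
    reverse=True,
)
for name, ms, share in report:
    print(f"{name}: {ms}ms ({share:.0%} of CPU time)")
# physics tops the list at 43% of CPU time -- the report's #1 priority
```

Ranking by measured share, not by hunch, is what separates an actionable report from generic advice.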
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Implement the batching optimization to reduce draw calls from 420 to under 200."
+**Expected behavior:**
+- Does NOT produce implementation code for batching
+- Explicitly states that implementing optimizations belongs to the appropriate programmer (engine-programmer for rendering batching)
+- Redirects the implementation to `engine-programmer` with the recommendation context attached
+- May produce a requirements brief for the optimization so engine-programmer has a clear target
+
+### Case 3: Regression identification
+**Input:** "Performance dropped significantly after last week's commits. Frame time went from 10ms to 18ms."
+**Expected behavior:**
+- Proposes a bisection strategy to identify the offending commit range
+- Requests or reviews the diff of commits in the window to narrow the likely cause
+- Identifies affected systems based on what changed (e.g., if physics code was modified, points to physics as the primary suspect)
+- Produces a regression report naming the probable commit, the affected system, and the measured delta
+
+### Case 4: Recommendation vs. code quality trade-off
+**Input:** "The fastest optimization for the script bottleneck would be to inline all calls and remove abstraction layers."
+**Expected behavior:**
+- Surfaces the trade-off: inlining improves performance but reduces testability and violates the coding standard requiring unit-testable public methods
+- Does NOT recommend the optimization without noting the code quality cost
+- Escalates the trade-off to `lead-programmer` for a decision
+- May propose a middle path (e.g., profile-guided inlining of only the hottest 2–3 methods) that preserves testability
+
+### Case 5: Context pass — technical-preferences.md budget
+**Input:** Technical preferences from context: Target 60fps, frame budget 16.67ms, draw calls max 200, memory ceiling 512MB. Request: "Review the current build profile."
+**Expected behavior:**
+- References the specific values from the provided context: 16.67ms, 200 draw calls, 512MB
+- Compares current measurements against each threshold explicitly
+- Labels each metric as WITHIN BUDGET / AT RISK / OVER BUDGET based on the provided numbers
+- Does NOT use different budget numbers than those provided in the context
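
The three-label scheme might be implemented as below (budget values come from this case's context; the 10% AT RISK margin is an assumption the analyst would state explicitly):

```python
BUDGETS = {"frame_ms": 16.67, "draw_calls": 200, "memory_mb": 512}
AT_RISK_MARGIN = 0.10   # within 10% of the ceiling counts as AT RISK

def label(metric, measured):
    """Classify a measurement against its budget from technical-preferences.md."""
    budget = BUDGETS[metric]
    if measured > budget:
        return "OVER BUDGET"
    if measured >= budget * (1 - AT_RISK_MARGIN):
        return "AT RISK"
    return "WITHIN BUDGET"

label("draw_calls", 420)   # -> "OVER BUDGET"
label("frame_ms", 15.5)    # -> "AT RISK" (within 10% of 16.67ms)
label("memory_mb", 300)    # -> "WITHIN BUDGET"
```

Because the thresholds are read from the provided budget table, changing the context changes the labels — which is exactly what Case 5 verifies.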
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (profiling, analysis, recommendations — not implementation)
+- [ ] Redirects optimization implementation to the correct programmer domain agent
+- [ ] Returns structured findings (bottleneck report with severity, measured values, and recommended action owner)
+- [ ] Escalates code-quality trade-offs to lead-programmer rather than deciding unilaterally
+- [ ] Applies budget thresholds from provided context rather than assumed defaults
+- [ ] Labels all findings with a specific action owner (who should implement the fix)
+
+---
+
+## Coverage Notes
+- Frame time analysis (Case 1) output should be structured as a report filed in `production/qa/evidence/`
+- Regression case (Case 3) confirms the agent investigates cause, not just measures symptoms
+- Code quality trade-off (Case 4) verifies the agent does not recommend optimizations that violate coding standards without flagging the conflict
diff --git a/CCGS Skill Testing Framework/agents/specialists/prototyper.md b/CCGS Skill Testing Framework/agents/specialists/prototyper.md
new file mode 100644
index 0000000..11bba80
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/specialists/prototyper.md
@@ -0,0 +1,82 @@
+# Agent Test Spec: prototyper
+
+## Agent Summary
+- **Domain**: Rapid throwaway prototypes in the `prototypes/` directory, concept validation experiments, mechanical feasibility tests. Standards intentionally relaxed for speed — prototypes are not production code.
+- **Does NOT own**: Production source code in `src/` (gameplay-programmer), design documents (game-designer), production-grade architecture decisions (lead-programmer / technical-director)
+- **Model tier**: Sonnet
+- **Gate IDs**: None; produces recommendation docs after prototype conclusion; does not participate in phase gates
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references rapid prototyping, prototypes/ directory, throwaway code)
+- [ ] `allowed-tools:` list matches the agent's role (Read/Write scoped to prototypes/ directory; no production src/ write access)
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition explicitly states that prototype code is not production code and must not be copied to src/
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — prototype a card-drawing mechanic
+**Input**: "Prototype a card-drawing mechanic in 2 hours. The core question: does drawing 3 cards per turn with hand-size limit of 7 feel good? I need something to test in a playtest today."
+**Expected behavior**:
+- Produces a minimal working prototype written in the project's engine scripting language, scoped to `prototypes/card-draw-mechanic/`
+- Code prioritizes speed over correctness: no unit tests, no doc comments required, global state is acceptable for a prototype
+- Implements the minimal viable mechanic: a deck, a draw function (draw N cards), a hand container with a size limit, and a simple UI or debug print to verify state
+- Does NOT implement production patterns (dependency injection, signals, data-driven config) unless they take less time than not using them
+- Includes a `README.md` in the prototype folder: hypothesis being tested, how to run, what to observe in the playtest
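
For flavor, the kind of throwaway sketch this case expects looks something like the following (Python stand-in; deliberately quick and dirty, with global state and debug prints, since prototype standards are relaxed):

```python
import random

DRAW_PER_TURN = 3
HAND_LIMIT = 7

deck = list(range(40))       # 40 placeholder cards; real card data doesn't matter yet
random.shuffle(deck)
hand = []

def draw_turn():
    for _ in range(DRAW_PER_TURN):
        if not deck:
            break
        if len(hand) >= HAND_LIMIT:
            print("hand full -- card burned:", deck.pop())   # a design question to playtest!
        else:
            hand.append(deck.pop())
    print(f"hand size: {len(hand)}, deck left: {len(deck)}")

for turn in range(4):
    draw_turn()
# turns 1-2 fill the hand (3, then 6 cards); turn 3 hits the 7-card limit
```

Note what is absent on purpose: no classes, no tests, no data-driven config. The prototype exists only to answer "does the 3-draw / 7-limit loop feel good?".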
+
+### Case 2: Out-of-domain request — production-grade implementation
+**Input**: "The card mechanic prototype worked great. Now write the production implementation of the card system for src/gameplay/cards/."
+**Expected behavior**:
+- Does not write production code to `src/`
+- States clearly: "Prototyper produces throwaway code in prototypes/ to validate concepts; production implementation of validated mechanics is handled by gameplay-programmer"
+- Offers to produce a transition document: what the prototype proved, what the production implementation should preserve (the mechanic), and what it should discard (the throwaway implementation patterns)
+- Does NOT copy the prototype code into src/ or suggest it as a starting point without warning about its non-production quality
+
+### Case 3: Prototype validates the mechanic — recommendation output
+**Input**: "The card-draw prototype playtested well. Three sessions all enjoyed drawing 3 cards/turn with hand limit 7. No confusion observed. What's next?"
+**Expected behavior**:
+- Produces a prototype conclusion document in `prototypes/card-draw-mechanic/conclusion.md` (or equivalent)
+- Document includes:
+  - Hypothesis that was tested
+  - Playtest method (sessions, duration, observer notes)
+  - Result verdict (VALIDATED)
+  - Key findings (what worked, any minor issues observed)
+  - Recommendation for production (specific mechanic parameters to preserve: 3 cards/turn, hand limit 7)
+  - A flag to route the production implementation request to gameplay-programmer
+- Does NOT begin writing production code
+- Output is structured as a decision-ready recommendation, not a narrative summary
+
+### Case 4: Prototype reveals the mechanic is unworkable — abandonment note
+**Input**: "The prototype for the physics-based lock-picking mechanic is done. After 4 playtest sessions, all testers found it frustrating — too much precision required, not fun. One tester rage-quit."
+**Expected behavior**:
+- Produces a prototype abandonment note in `prototypes/lock-picking-physics/conclusion.md`
+- Document includes:
+  - Hypothesis that was tested
+  - Result verdict (ABANDONED)
+  - Specific reasons (precision barrier too high, negative emotional response, rage-quit incident as evidence)
+  - A recommendation for alternative approaches to explore (simplified key-tumbler mechanic, rhythm-based alternative, removal of the mechanic entirely)
+- Does NOT recommend persisting with the prototype mechanic because of sunk cost
+- Does NOT mark the result as inconclusive — after 4 sessions with consistent negative responses, abandonment is the correct verdict
+
+### Case 5: Context pass — using the project's engine scripting language
+**Input context**: Project uses Godot 4.6 with GDScript (configured in technical-preferences.md).
+**Input**: "Prototype a basic grid movement system — player clicks a tile and the character moves to it."
+**Expected behavior**:
+- Produces the prototype in GDScript — not Python, C#, or pseudocode
+- Uses Godot 4.6 node types appropriate for a grid: TileMap or a custom grid manager node, CharacterBody2D or Node2D for the player
+- Does NOT apply production coding standards (no required test coverage, no doc comments, global state acceptable)
+- Writes the output to `prototypes/grid-movement/` not to `src/`
+- If a Godot 4.6 API is uncertain (given the LLM knowledge cutoff noted in VERSION.md), flags the specific API with a note to verify against the Godot 4.6 docs
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (prototypes/ directory only; throwaway code for concept validation)
+- [ ] Redirects production implementation requests to gameplay-programmer with a transition document offer
+- [ ] Produces structured conclusion documents (VALIDATED or ABANDONED verdict) after prototype evaluation
+- [ ] Does not recommend preserving prototype code in production form without explicit warnings
+- [ ] Uses the project's configured engine and scripting language; flags version uncertainty
+
+---
+
+## Coverage Notes
+- Case 2 (production redirect) is critical — prototype code leaking into src/ is a common quality problem
+- Case 4 (abandonment honesty) tests whether the agent avoids sunk-cost bias — prototypes that fail should be cleanly abandoned
+- Case 5 requires that technical-preferences.md has the engine and language configured; test is incomplete if not configured
+- The intentional relaxation of coding standards is a feature, not a gap — do not flag missing tests or doc comments as failures in prototype output
+- No automated runner; review manually or via `/skill-test`
diff --git a/CCGS Skill Testing Framework/agents/specialists/sound-designer.md b/CCGS Skill Testing Framework/agents/specialists/sound-designer.md
new file mode 100644
index 0000000..4ae0b15
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/specialists/sound-designer.md
@@ -0,0 +1,84 @@
+# Agent Test Spec: sound-designer
+
+## Agent Summary
+Domain: SFX specs, audio events, mixing parameters, and sound category definitions.
+Does NOT own: music composition direction (audio-director), code implementation of audio systems.
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references SFX / audio events / mixing)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Glob, Grep — does NOT include engine code execution tools
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over music direction or audio code implementation
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Create an SFX spec for a sword swing attack."
+**Expected behavior:**
+- Produces a complete audio event spec including:
+ - Event name (e.g., `sfx_combat_sword_swing`)
+ - Variation count (minimum 3 to avoid repetition fatigue)
+ - Pitch range (e.g., ±8% randomization)
+ - Volume range and normalization target (e.g., -12 dBFS)
+ - Sound category (e.g., `combat_sfx`)
+ - Suggested layering notes (whoosh layer + impact transient)
+- Output follows the project audio naming convention if one is established
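+
+A minimal sketch of such a spec, assuming a flat key/value format — field names and values here are illustrative, not a required schema:
+
+```yaml
+# Hypothetical SFX event spec — names and numbers are illustrative
+event: sfx_combat_sword_swing
+category: combat_sfx
+variations: 3                  # minimum to avoid repetition fatigue
+pitch_randomization_pct: 8     # ±8%
+volume_target_dbfs: -12
+layers:
+  - whoosh     # broadband air-movement layer
+  - impact     # transient layer on contact
+```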
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Compose a looping ambient music track for the forest level."
+**Expected behavior:**
+- Does NOT produce music composition direction or a music brief
+- Explicitly states that music direction belongs to `audio-director`
+- Redirects the request to `audio-director`
+- May note it can provide an SFX ambience layer spec (wind, wildlife) to complement the music once the music direction is set
+
+### Case 3: Dynamic parameter — falloff curve spec
+**Input:** "The sword swing SFX needs distance falloff so it sounds different across the arena."
+**Expected behavior:**
+- Produces a spec for the dynamic parameter including:
+ - Parameter name (e.g., `distance` or `listener_distance`)
+ - Falloff curve type (e.g., logarithmic, linear, custom)
+ - Near/far distance thresholds with corresponding volume and high-frequency attenuation values
+ - Occlusion override behavior if applicable
+- Does NOT write the audio engine integration code (defers to the appropriate programmer)
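+
+The falloff spec could be sketched as a parameter block — thresholds and values are illustrative assumptions, not project numbers:
+
+```yaml
+# Hypothetical distance-falloff spec — all values illustrative
+parameter: listener_distance
+curve: logarithmic
+near:
+  distance_m: 2
+  volume_db: 0
+  lowpass_hz: 20000   # no high-frequency attenuation up close
+far:
+  distance_m: 40
+  volume_db: -24
+  lowpass_hz: 4000    # distant swings sound duller
+occlusion_override: none
+```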
+
+### Case 4: Naming convention conflict
+**Input:** "Add a new SFX event called `SWORD_HIT_1` for the melee system."
+**Expected behavior:**
+- Identifies that `SWORD_HIT_1` conflicts with the established event naming convention (snake_case with category prefix, e.g., `sfx_combat_sword_hit`)
+- Does NOT silently register the non-conforming name
+- Flags the conflict to `audio-director` with the proposed compliant alternative
+- Will proceed with the corrected name once confirmed by audio-director
+
+### Case 5: Context pass — uses audio style guide
+**Input:** Audio style guide provided in context specifying: "gritty, grounded, no reverb tails over 1.5s, reference: The Witcher 3 combat audio." Request: "Create SFX specs for the full melee combat suite."
+**Expected behavior:**
+- References the "gritty, grounded" tone descriptor in the spec rationale
+- Caps all reverb tail specifications at 1.5 seconds as stated
+- Notes the reference material (The Witcher 3) as a benchmark for mix levels and transient design
+- Does NOT produce specs that contradict the style guide (e.g., no ethereal or heavily reverb-processed specs)
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (SFX specs, event definitions, mixing parameters)
+- [ ] Redirects music direction requests to audio-director
+- [ ] Returns structured audio event specs (event name, variations, pitch, volume, category)
+- [ ] Does not produce code for audio system implementation
+- [ ] Flags naming convention violations rather than silently accepting non-conforming names
+- [ ] References provided style guides and constraints in all spec output
+
+---
+
+## Coverage Notes
+- SFX spec format (Case 1) should match whatever event schema the audio middleware (Wwise/FMOD/built-in) requires
+- Falloff curve (Case 3) verifies the agent produces implementation-ready parameter specs
+- Style guide compliance (Case 5) confirms the agent reads provided context and constrains output accordingly
diff --git a/CCGS Skill Testing Framework/agents/specialists/technical-artist.md b/CCGS Skill Testing Framework/agents/specialists/technical-artist.md
new file mode 100644
index 0000000..4f075bb
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/specialists/technical-artist.md
@@ -0,0 +1,79 @@
+# Agent Test Spec: technical-artist
+
+## Agent Summary
+Domain: Shaders, VFX, rendering optimization, art pipeline tools, and visual performance.
+Does NOT own: art style decisions or color palette (art-director), gameplay code (gameplay-programmer).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references shaders / VFX / rendering)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over art style direction or gameplay logic
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Create a dissolve effect shader for enemy death sequences."
+**Expected behavior:**
+- Produces shader code or a Shader Graph node spec appropriate to the configured engine (Godot shading language / Unity Shader Graph / Unreal Material Blueprint)
+- Defines a `dissolve_amount` uniform (0.0–1.0) as the animation driver
+- Uses a noise texture sample to determine the dissolve threshold
+- Notes edge-lighting technique as an optional enhancement
+- Output is engine-version-aware (checks version reference if post-cutoff APIs are needed)
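+
+The expected output might resemble the following parameter spec, assuming a noise-threshold dissolve; the uniform and texture names are illustrative, not engine API:
+
+```yaml
+# Hypothetical dissolve-effect spec — names are illustrative
+uniform: dissolve_amount        # 0.0 = intact, 1.0 = fully dissolved
+noise_texture: tex_noise_perlin_512
+threshold_rule: "discard fragment where noise_sample < dissolve_amount"
+edge_glow:                      # optional enhancement
+  width: 0.05
+  color: emissive_orange
+```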
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Define the art bible color palette: primary, secondary, and accent colors for the UI."
+**Expected behavior:**
+- Does NOT produce color palette decisions or art direction documents
+- Explicitly states that art style decisions belong to `art-director`
+- Redirects the request to `art-director`
+- May note it can later implement a color-grading or palette LUT shader once the palette is decided
+
+### Case 3: Performance warning — GPU particle count
+**Input:** "The VFX system is triggering a GPU particle count warning at 50,000 particles in the explosion pool."
+**Expected behavior:**
+- Produces an optimization spec addressing the specific warning
+- Proposes concrete strategies: particle budget caps per emitter, LOD-based particle reduction, GPU instancing, or switching to mesh-based VFX for distant effects
+- Provides before/after GPU cost estimates where calculable
+- Does NOT change gameplay behavior of the explosion (delegates any gameplay impact to gameplay-programmer)
+
+### Case 4: Engine version compatibility
+**Input:** "Use the new texture sampler API for the water shader."
+**Expected behavior:**
+- Checks the engine version reference (e.g., `docs/engine-reference/godot/VERSION.md`) before suggesting any API
+- Flags if the requested API is post-cutoff (e.g., Godot 4.4+ texture type changes)
+- Provides the correct syntax for the project's pinned engine version
+- If uncertain about post-cutoff behavior, explicitly states the uncertainty and directs to verified docs
+
+### Case 5: Context pass — uses performance budget
+**Input:** Performance budget from `technical-preferences.md` provided in context: 2ms GPU frame budget, max 200 draw calls. Request: "Optimize the forest rendering system."
+**Expected behavior:**
+- References the specific 2ms GPU budget and 200 draw call limit from the provided context
+- Proposes optimizations calibrated to those exact targets (e.g., "batching reduces draw calls from 340 to ~180, within the 200 limit")
+- Does NOT propose optimizations that would exceed the stated budgets in other dimensions
+- Produces a ranked list of optimizations by expected impact vs. implementation cost
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (shaders, VFX, rendering optimization, art pipeline)
+- [ ] Redirects art style decisions to art-director
+- [ ] Returns structured findings (shader code, optimization specs with metrics, node graphs)
+- [ ] Does not modify gameplay code files without explicit delegation
+- [ ] Checks engine version reference before suggesting post-cutoff APIs
+- [ ] Quantifies performance changes against stated budgets
+
+---
+
+## Coverage Notes
+- Dissolve shader (Case 1) should include a visual test reference in `production/qa/evidence/`
+- Engine version check (Case 4) confirms the agent treats VERSION.md as authoritative
+- Performance budget case (Case 5) verifies the agent reads and applies provided context numbers
diff --git a/CCGS Skill Testing Framework/agents/specialists/tools-programmer.md b/CCGS Skill Testing Framework/agents/specialists/tools-programmer.md
new file mode 100644
index 0000000..d16d3ce
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/specialists/tools-programmer.md
@@ -0,0 +1,79 @@
+# Agent Test Spec: tools-programmer
+
+## Agent Summary
+Domain: Editor extensions, content authoring tools, debug utilities, and pipeline automation scripts.
+Does NOT own: game code (gameplay-programmer, ui-programmer, etc.), engine core systems (engine-programmer).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references editor tools / pipeline / debug utilities)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over game source code or engine internals
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Create a custom editor tool for placing enemy patrol waypoints in the level."
+**Expected behavior:**
+- Produces an editor extension spec and code scaffold for the configured engine (e.g., Godot EditorPlugin, Unity Editor window, Unreal Detail Customization)
+- Tool allows designer to click-place waypoints in the scene/viewport
+- Waypoints are serialized as an engine-native resource (not hardcoded) so level-designer can edit them without code
+- Includes undo/redo support per editor plugin best practices
+- Does NOT modify the AI pathfinding runtime code (that belongs to ai-programmer)
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Implement the enemy melee combo system in code."
+**Expected behavior:**
+- Does NOT produce gameplay mechanic code
+- Explicitly states that combat system implementation belongs to `gameplay-programmer`
+- Redirects the request to `gameplay-programmer`
+- May note it can build a debug overlay tool to visualize combo state if useful during development
+
+### Case 3: Runtime data access — coordination required
+**Input:** "The waypoint editor tool needs to read game data at runtime to validate patrol routes against the AI budget."
+**Expected behavior:**
+- Identifies that runtime data access from an editor plugin requires a defined, safe interface to the game's runtime systems
+- Coordinates with `engine-programmer` to establish a read-only data access pattern (e.g., a resource validation API)
+- Does NOT directly read internal engine or game memory structures without an agreed interface
+- Documents the required interface before implementing the tool
+
+### Case 4: Engine version breakage
+**Input:** "After the engine upgrade, the waypoint editor tool crashes on startup."
+**Expected behavior:**
+- Checks the engine version reference (`docs/engine-reference/`) for breaking changes in editor plugin APIs
+- Identifies the specific API or signal that changed in the new version
+- Produces a targeted fix for the breaking change
+- Notes any other tools that may be affected by the same API change
+
+### Case 5: Context pass — art pipeline requirements
+**Input:** Art pipeline requirements provided in context: "All texture imports must set compression to VRAM Compressed, generate mipmaps, and tag with a LOD group." Request: "Build an asset import tool that enforces these settings."
+**Expected behavior:**
+- References all three requirements from the context: VRAM compression, mipmap generation, LOD group tagging
+- Produces an import tool that validates and applies all three settings on import
+- Adds a warning or error report for assets that fail to meet the specified settings
+- Does NOT change the art pipeline requirements themselves (those belong to art-director / technical-artist)
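+
+An import-rule config enforcing the three requirements might look like this sketch — the keys are illustrative, not a real importer schema:
+
+```yaml
+# Hypothetical import-rule config mirroring the three stated requirements
+texture_import_rules:
+  compression: vram_compressed
+  generate_mipmaps: true
+  lod_group: required     # asset is flagged if untagged
+on_violation: report      # warn/error in the import report, never silently pass
+```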
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (editor tools, pipeline scripts, debug utilities)
+- [ ] Redirects game code requests to appropriate programmer agents
+- [ ] Returns structured findings (tool specs, editor extension code, pipeline scripts)
+- [ ] Coordinates with engine-programmer before accessing runtime data from editor context
+- [ ] Checks engine version reference before using editor plugin APIs
+- [ ] Builds tools to enforce requirements, does not author the requirements themselves
+
+---
+
+## Coverage Notes
+- Waypoint editor tool (Case 1) should have a smoke test verifying it loads without errors in the editor
+- Runtime data access (Case 3) confirms the agent respects the engine-programmer's ownership of core APIs
+- Art pipeline context (Case 5) verifies the agent builds to match provided specs rather than inventing requirements
diff --git a/CCGS Skill Testing Framework/agents/specialists/ui-programmer.md b/CCGS Skill Testing Framework/agents/specialists/ui-programmer.md
new file mode 100644
index 0000000..78f6018
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/specialists/ui-programmer.md
@@ -0,0 +1,79 @@
+# Agent Test Spec: ui-programmer
+
+## Agent Summary
+Domain: Menu screens, HUDs, inventory screens, dialogue boxes, UI framework code, and data binding.
+Does NOT own: UX flow design (ux-designer), visual style direction (art-director / technical-artist).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references menus / HUDs / UI framework / data binding)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over UX flow design or visual art direction
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Implement the inventory screen from the UX spec in `design/ux/inventory-flow.md`."
+**Expected behavior:**
+- Reads the UX spec before producing any code
+- Produces implementation using the project's configured UI framework (UI Toolkit, UGUI, UMG, or Godot Control nodes)
+- Implements all states defined in the spec (default, hover, selected, empty-slot, locked-slot)
+- Binds inventory data to UI elements via the project's data model, not hardcoded values
+- Includes doc comments on public UI API per coding standards
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Design the inventory interaction flow — what happens when the player equips, drops, or combines items."
+**Expected behavior:**
+- Does NOT produce interaction flow design or user flow diagrams
+- Explicitly states that UX flow design belongs to `ux-designer`
+- Redirects the request to `ux-designer`
+- Notes that once the flow spec is ready, it can implement it
+
+### Case 3: Custom animation coordination
+**Input:** "The item selection in the inventory needs a custom bounce animation when selected."
+**Expected behavior:**
+- Recognizes that defining the animation curve and feel is within technical-artist territory
+- Does NOT invent animation parameters (timing, easing) without a spec
+- Coordinates with `technical-artist` for an animation spec (duration, easing curve, overshoot amount)
+- Once the spec is provided, produces the implementation binding the animation to the selection state
+
+### Case 4: Ambiguous UX spec — flags back
+**Input:** The UX spec states "show item details on selection" but does not define what happens when an empty slot is selected.
+**Expected behavior:**
+- Identifies the ambiguity in the spec (empty slot selection state is undefined)
+- Does NOT make an arbitrary implementation decision for the undefined state
+- Flags the ambiguity back to `ux-designer` with the specific question: "What should the detail panel show when an empty inventory slot is selected?"
+- May propose two common options (hide panel / show placeholder) to help ux-designer decide quickly
+
+### Case 5: Context pass — engine UI toolkit
+**Input:** Engine context provided: project uses Godot 4.6 with Control node UI. Request: "Implement a scrollable item list for the inventory."
+**Expected behavior:**
+- Uses Godot's `ScrollContainer` + `VBoxContainer` pattern (or the built-in `ItemList`, which scrolls on its own), not Canvas or UGUI
+- Does NOT produce Unity UGUI or Unreal UMG code for a Godot project
+- Checks the engine version reference (4.6) for any Control node API changes from 4.4/4.5 before using specific APIs
+- Produces GDScript or C# code consistent with the project's configured language
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (menus, HUDs, UI framework, data binding)
+- [ ] Redirects UX flow design to ux-designer
+- [ ] Coordinates with technical-artist for animation specs before implementing animations
+- [ ] Flags ambiguous UX specs back to ux-designer rather than making arbitrary implementation decisions
+- [ ] Returns structured output (implementation code, data binding patterns, state machine for UI states)
+- [ ] Uses the correct engine UI toolkit for the project — never cross-engine code
+
+---
+
+## Coverage Notes
+- Inventory implementation (Case 1) should have a UI interaction test or manual walkthrough doc in `production/qa/evidence/`
+- Animation coordination (Case 3) confirms the agent does not invent feel parameters without a spec
+- Ambiguous spec (Case 4) verifies the agent routes spec gaps back to the authoring agent rather than guessing
diff --git a/CCGS Skill Testing Framework/agents/specialists/ux-designer.md b/CCGS Skill Testing Framework/agents/specialists/ux-designer.md
new file mode 100644
index 0000000..b876154
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/specialists/ux-designer.md
@@ -0,0 +1,79 @@
+# Agent Test Spec: ux-designer
+
+## Agent Summary
+Domain: User experience flows, interaction design, information architecture, input handling design, and onboarding UX.
+Does NOT own: visual art style (art-director), UI implementation code (ui-programmer).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references UX flows / interaction design / information architecture)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over visual art direction or UI implementation code
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Design the inventory management flow for a survival game."
+**Expected behavior:**
+- Produces a user flow diagram (states and transitions) for the inventory: open, browse, select item, sub-actions (equip/drop/combine), close
+- Defines all interaction states (default, hover, selected, empty-slot, locked-slot)
+- Specifies input mappings for each action (keyboard, gamepad if applicable)
+- Notes cognitive load considerations (e.g., maximum items visible without scrolling)
+- Does NOT produce visual design (colors, icons) or implementation code
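+
+One state of such a flow spec could be sketched as follows — state names and input bindings are illustrative:
+
+```yaml
+# Hypothetical flow-state fragment — names and inputs are illustrative
+state: item_selected
+inputs:
+  confirm: {keyboard: E, gamepad: A}    # opens the sub-action menu
+  cancel: {keyboard: Esc, gamepad: B}   # returns to browse
+transitions:
+  - on: confirm
+    to: sub_action_menu    # equip / drop / combine
+  - on: cancel
+    to: browse
+```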
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Implement the inventory screen in GDScript with drag-and-drop support."
+**Expected behavior:**
+- Does NOT produce implementation code
+- Explicitly states that UI code implementation belongs to `ui-programmer`
+- Redirects the request to `ui-programmer`
+- Notes that the UX flow spec should be provided to ui-programmer as the implementation reference
+
+### Case 3: Flow depth conflict — simplification
+**Input:** "The lead designer says the current 5-step crafting flow is too deep; maximum 3 steps allowed."
+**Expected behavior:**
+- Produces a revised 3-step flow that collapses the original 5-step sequence
+- Shows clearly what was merged or removed and why each collapse is safe from a usability standpoint
+- Does NOT simply remove steps without addressing the user's goal at each removed step
+- Flags if the 3-step constraint makes any required use case impossible and proposes an alternative
+
+### Case 4: Accessibility conflict
+**Input:** "The onboarding flow uses a timed prompt (auto-advances after 3 seconds) to keep pace, but this conflicts with accessibility requirements for user-controlled timing."
+**Expected behavior:**
+- Identifies the conflict with WCAG 2.1 Success Criterion 2.2.1 (Timing Adjustable)
+- Does NOT override the accessibility requirement to preserve pace
+- Coordinates with `accessibility-specialist` to agree on a compliant solution
+- Proposes alternatives: pause-on-hover, skip button, settings option to disable auto-advance
+
+### Case 5: Context pass — player mental model research
+**Input:** Playtest research provided in context: "Players consistently expected the 'Crafting' option to be inside the Inventory screen, not in a separate top-level menu." Request: "Redesign the navigation IA for crafting."
+**Expected behavior:**
+- References the specific player expectation from the research (crafting expected inside inventory)
+- Restructures the information architecture to place crafting as a tab or panel within the inventory screen
+- Does NOT produce a design that contradicts the stated player mental model without explicit justification
+- Notes the research source in the rationale for the design decision
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (UX flows, interaction design, IA, onboarding)
+- [ ] Redirects code implementation to ui-programmer, visual style to art-director
+- [ ] Returns structured findings (state diagrams, flow steps, input mappings) not freeform opinions
+- [ ] Coordinates with accessibility-specialist when flows have timing or cognitive load constraints
+- [ ] Designs flows based on provided user research, not assumed behavior
+- [ ] Documents rationale for flow decisions against user goals
+
+---
+
+## Coverage Notes
+- Inventory flow (Case 1) should be written to `design/ux/` as a spec for ui-programmer to implement against
+- Mental model case (Case 5) verifies the agent applies research evidence, not intuition
+- Accessibility coordination (Case 4) confirms the agent does not override accessibility requirements for UX aesthetics
diff --git a/CCGS Skill Testing Framework/agents/specialists/world-builder.md b/CCGS Skill Testing Framework/agents/specialists/world-builder.md
new file mode 100644
index 0000000..caaa2ee
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/specialists/world-builder.md
@@ -0,0 +1,80 @@
+# Agent Test Spec: world-builder
+
+## Agent Summary
+- **Domain**: World lore architecture — factions and their cultures/governments/motivations, world history, geography and ecology, cosmology and metaphysics, world rules (how magic works, what is and is not possible), internal consistency enforcement across the world document
+- **Does NOT own**: Specific NPC or quest dialogue (writer), game mechanics rules derived from world rules (game-designer/systems-designer), narrative story structure and arc design (narrative-director)
+- **Model tier**: Sonnet
+- **Gate IDs**: None; escalates world rule/mechanic conflicts to narrative-director and game-designer jointly
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references world lore, factions, history, world rules, ecology)
+- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/narrative/world/ documents; no game source, mechanic design, or dialogue files)
+- [ ] Model tier is Sonnet (default for creative specialists)
+- [ ] Agent definition does not claim authority over dialogue writing, mechanic design, or narrative arc structure
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — faction culture and government design
+**Input**: "Design the Ironveil Merchant Consortium — a powerful trading faction in our world. I need their culture, government structure, and internal motivations."
+**Expected behavior**:
+- Produces a faction profile document with: cultural values and norms, government structure (how decisions are made, who holds power, succession or appointment process), internal factions or tensions within the consortium, relationship to other factions (allies, rivals, neutral parties), and primary motivations (what they want and why)
+- The faction is internally consistent: a merchant consortium's government is driven by economic logic, not feudal or religious logic, unless a deliberate hybrid is specified
+- Output includes at least one internal tension or contradiction within the faction — factions without internal complexity are flat
+- Formatted as a structured faction profile, not a narrative essay
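+
+A faction profile skeleton consistent with these expectations — every name and value below is invented for illustration:
+
+```yaml
+# Hypothetical faction profile skeleton — all entries illustrative
+faction: ironveil_merchant_consortium
+government: elected_trade_council     # economic logic, not feudal succession
+values: [contract_sanctity, profit, reputation]
+internal_tension: old trading houses vs. upstart caravan guilds
+relations:
+  rivals: [crown_customs_office]      # placeholder faction name
+motivations:
+  - control of the overland trade routes
+```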
+
+### Case 2: Out-of-domain request — dialogue writing
+**Input**: "Write the dialogue for an Ironveil Consortium merchant NPC that the player meets at the city gates."
+**Expected behavior**:
+- Does not produce NPC dialogue
+- States clearly: "Dialogue writing is owned by writer; I provide the world and faction context that informs the dialogue, including the faction's culture, tone, and speaking style"
+- Offers to produce the faction's speaking style notes and cultural context that writer would need to write consistent dialogue
+
+### Case 3: New lore entry contradicts established history — conflict flagging
+**Input**: "Add a lore entry stating the Ironveil Consortium was founded 50 years ago by a single merchant family." [Context includes existing lore: the Consortium has existed for 300 years and was founded as a collective by 12 rival trading houses.]
+**Expected behavior**:
+- Identifies the contradiction: existing lore states 300-year history and a founding coalition of 12 houses; the new entry claims 50 years and a single founding family
+- Does NOT write the new entry as requested
+- Flags the conflict: states both versions, identifies which is established and which is the proposed change
+- Proposes resolution options: (a) the new entry is wrong and should be corrected; (b) the existing lore should be updated if the new version is the intended canon; (c) there is an in-world explanation (the current family claims founding credit despite the collective origin — a deliberate unreliable-narrator device)
+- Routes the resolution to narrative-director if no clear answer exists
+
+### Case 4: World rule has gameplay implications — coordination with game-designer
+**Input**: "I want to establish a world rule: magic users who cast spells near iron ore are weakened. Iron disrupts arcane energy."
+**Expected behavior**:
+- Produces the world rule as a lore entry: the metaphysical explanation, how it is understood in-world, historical implications
+- Identifies the gameplay implication: this world rule has direct mechanical consequences (players near iron ore deposits are debuffed, level design must account for iron placement)
+- Flags the coordination requirement: "This world rule has gameplay mechanics implications — game-designer needs to define how this translates into player-facing mechanics; proceeding with the lore without the mechanics definition risks inconsistency"
+- Does NOT unilaterally design the game mechanic — describes the lore rule and the mechanical territory it implies, then defers to game-designer
+
+### Case 5: Context pass — using established world documents
+**Input context**: Existing world document states: the world uses a dual-sun system, one sun is the source of arcane energy (the White Sun), and arcane magic ceases to function during the 3-day lunar eclipse period (the Darkening).
+**Input**: "Add a lore entry about the Mages' College and how they prepare for the Darkening."
+**Expected behavior**:
+- Uses the established dual-sun cosmology: references the White Sun as the source of arcane energy
+- Uses the established Darkening event: 3-day eclipse, magic ceases
+- Does NOT invent a different eclipse mechanism, duration, or name
+- Produces a lore entry where the Mages' College's Darkening preparations are consistent with the established rules: they cannot cast during the Darkening, so preparations are practical (stockpiling non-magical supplies, scheduling, shutting down ongoing magical processes)
+- Does not contradict any established fact from the context document
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (factions, world history, geography, ecology, world rules, cosmology)
+- [ ] Redirects dialogue writing requests to writer with contextual faction notes
+- [ ] Flags lore contradictions with both versions stated and resolution options offered — does not silently overwrite established lore
+- [ ] Identifies gameplay implications of world rules and flags coordination with game-designer
+- [ ] Uses all established world facts from context; does not invent alternatives to stated lore
+
+---
+
+## Coverage Notes
+- Case 3 (contradiction detection) requires existing lore to be in context — this is the most important consistency test
+- Case 4 (world rule/mechanic coordination) tests cross-domain awareness; verify the agent identifies the mechanic boundary without crossing it
+- Case 5 is the most important context-awareness test; the agent must use established facts, not creative alternatives
+- No automated runner; review manually or via `/skill-test`
diff --git a/CCGS Skill Testing Framework/agents/specialists/writer.md b/CCGS Skill Testing Framework/agents/specialists/writer.md
new file mode 100644
index 0000000..365dc9f
--- /dev/null
+++ b/CCGS Skill Testing Framework/agents/specialists/writer.md
@@ -0,0 +1,81 @@
+# Agent Test Spec: writer
+
+## Agent Summary
+- **Domain**: In-game written content — NPC dialogue (including branching trees), lore codex entries, item and ability descriptions, environmental text (signs, books, notes), quest text, tutorial text, in-world written documents
+- **Does NOT own**: Story architecture and narrative structure (narrative-director), world lore and world rules (world-builder), UX copy and UI labels (ux-designer), patch notes (community-manager)
+- **Model tier**: Sonnet
+- **Gate IDs**: None; flags lore inconsistencies to narrative-director rather than resolving them autonomously
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references dialogue, lore entries, item descriptions, in-game text)
+- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/narrative/ and assets/data/dialogue/; no code or world-building architecture files)
+- [ ] Model tier is Sonnet (default for creative specialists)
+- [ ] Agent definition does not claim authority over narrative structure, world rules, or UX copy direction
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — NPC merchant dialogue
+**Input**: "Write dialogue for Mira, a traveling merchant NPC. She sells general supplies. Players can ask her about her wares, the road ahead, and rumors."
+**Expected behavior**:
+- Produces a dialogue tree with at least three top-level conversation options: [Wares], [The Road Ahead], [Rumors]
+- Each branch has a distinct conversational response in Mira's voice — not generic merchant filler
+- Includes at least one response that has a follow-up branch (showing tree structure, not just flat responses)
+- Mira's voice is consistent across branches: if she's warm and chatty in one branch, she's not brusque in another without reason
+- Output is formatted as a structured dialogue tree: node label, NPC line, player options, next node
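+
+One node of such a tree could be sketched as follows — node labels and lines are illustrative, not canon:
+
+```yaml
+# Hypothetical dialogue-tree node — labels and lines are illustrative
+node: mira_root
+npc_line: "Fresh off the eastern road! Good wares, these."
+options:
+  - label: "[Wares]"
+    next: mira_wares
+  - label: "[The Road Ahead]"
+    next: mira_road
+  - label: "[Rumors]"
+    next: mira_rumors    # branches further into mira_rumors_detail
+```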
+
+### Case 2: Out-of-domain request — world history design
+**Input**: "Design the history of the world — when the first kingdom was founded, what the great wars were, and why magic was banned."
+**Expected behavior**:
+- Does not produce world history, lore architecture, or world rules
+- States clearly: "World history, lore, and world rules are owned by world-builder; once the history is established, I can write in-game texts, books, and dialogue that reference those events"
+- Does not produce even partial world history as a "placeholder"
+
+### Case 3: Dialogue contradicts established lore — flag to narrative-director
+**Input**: "Write Mira's dialogue line where she mentions that dragons have been extinct for 200 years." [Context includes existing lore: dragons are alive and revered in the northern provinces, not extinct.]
+**Expected behavior**:
+- Identifies the contradiction: established lore states dragons are alive and revered; dialogue stating they're extinct directly conflicts
+- Does NOT write the requested line as given
+- Flags the inconsistency to narrative-director: "Mira's dialogue as requested contradicts established lore (dragons are alive per world-builder's document); requires narrative-director resolution before I can write this line"
+- Offers an alternative: a line that references dragons in a way consistent with the established lore (e.g., Mira expresses awe about a dragon sighting in the north)
+
+### Case 4: Item description references an undesigned mechanic
+**Input**: "Write a description for the 'Berserker's Chalice' — a consumable that triggers the Berserker state when drunk."
+**Expected behavior**:
+- Identifies the dependency gap: "Berserker state" is not defined in any provided game design document
+- Flags the missing dependency: "This description references a 'Berserker state' mechanic that has no GDD entry — I cannot write accurate flavor text for a mechanic whose rules are undefined, as the description may create incorrect player expectations"
+- Does NOT write a description that invents mechanic details (duration, effects) that may conflict with the eventual design
+- Offers two paths: (a) write a vague, non-mechanical description that creates no false expectations, flagged as temporary; (b) wait for game-designer to define the Berserker state first
+
+### Case 5: Context pass — character voice guide
+**Input context**: Character voice guide for Mira: She speaks in short, energetic sentences. Uses merchant slang ("a fine bargain," "coin well spent"). Drops pronouns occasionally ("Good wares, these."). Never uses contractions — always "I will" not "I'll". Warm but slightly mercenary.
+**Input**: "Write Mira's response when a player asks if she has healing potions."
+**Expected behavior**:
+- Short, energetic sentences — no long monologues
+- Uses merchant slang: "a fine bargain," "coin well spent," or similar
+- Drops pronouns where natural: "Fine stock, these potions."
+- No contractions: "I will" not "I'll," "do not" not "don't"
+- Warm tone with a mercenary undertone: she's happy to help because you're a paying customer
+- Does NOT produce dialogue that violates any voice guide rule — check each rule explicitly
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (dialogue, lore entries, item descriptions, in-game text)
+- [ ] Redirects world history and world rule requests to world-builder without producing unauthorized lore
+- [ ] Flags lore contradictions to narrative-director rather than silently writing inconsistent content
+- [ ] Identifies mechanic dependency gaps before writing item descriptions that could create false player expectations
+- [ ] Applies all rules from a provided character voice guide — no partial compliance
+
+---
+
+## Coverage Notes
+- Case 3 (lore contradiction detection) requires that existing lore is in the conversation context — test is only valid when context is provided
+- Case 4 (dependency gap) tests whether the agent writes descriptions that could set wrong player expectations — a subtle but important quality issue
+- Case 5 is the most important context-awareness test; voice guide compliance must be checked rule-by-rule, not holistically
+- No automated runner; review manually or via `/skill-test`
diff --git a/CCGS Skill Testing Framework/catalog.yaml b/CCGS Skill Testing Framework/catalog.yaml
new file mode 100644
index 0000000..677f378
--- /dev/null
+++ b/CCGS Skill Testing Framework/catalog.yaml
@@ -0,0 +1,1199 @@
+version: 2
+last_updated: "2026-04-06"
+skills:
+ # Critical — gate skills that control phase transitions
+ - name: gate-check
+ spec: CCGS Skill Testing Framework/skills/gate/gate-check.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: critical
+ category: gate
+
+ - name: design-review
+ spec: CCGS Skill Testing Framework/skills/review/design-review.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: critical
+ category: review
+
+ - name: story-readiness
+ spec: CCGS Skill Testing Framework/skills/readiness/story-readiness.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: critical
+ category: readiness
+
+ - name: story-done
+ spec: CCGS Skill Testing Framework/skills/readiness/story-done.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: critical
+ category: readiness
+
+ - name: review-all-gdds
+ spec: CCGS Skill Testing Framework/skills/review/review-all-gdds.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: critical
+ category: review
+
+ - name: architecture-review
+ spec: CCGS Skill Testing Framework/skills/review/architecture-review.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: critical
+ category: review
+
+ # High — pipeline-critical skills
+ - name: create-epics
+ spec: CCGS Skill Testing Framework/skills/pipeline/create-epics.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: high
+ category: pipeline
+
+ - name: create-stories
+ spec: CCGS Skill Testing Framework/skills/pipeline/create-stories.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: high
+ category: pipeline
+
+ - name: dev-story
+ spec: CCGS Skill Testing Framework/skills/pipeline/dev-story.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: high
+ category: pipeline
+
+ - name: create-control-manifest
+ spec: CCGS Skill Testing Framework/skills/pipeline/create-control-manifest.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: high
+ category: pipeline
+
+ - name: propagate-design-change
+ spec: CCGS Skill Testing Framework/skills/pipeline/propagate-design-change.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: high
+ category: pipeline
+
+ - name: architecture-decision
+ spec: CCGS Skill Testing Framework/skills/authoring/architecture-decision.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: high
+ category: authoring
+
+ - name: map-systems
+ spec: CCGS Skill Testing Framework/skills/pipeline/map-systems.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: high
+ category: pipeline
+
+ - name: design-system
+ spec: CCGS Skill Testing Framework/skills/authoring/design-system.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: high
+ category: authoring
+
+ - name: consistency-check
+ spec: CCGS Skill Testing Framework/skills/analysis/consistency-check.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: high
+ category: analysis
+
+ # Medium — team and sprint management skills
+ - name: sprint-plan
+ spec: CCGS Skill Testing Framework/skills/sprint/sprint-plan.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: medium
+ category: sprint
+
+ - name: sprint-status
+ spec: CCGS Skill Testing Framework/skills/sprint/sprint-status.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: medium
+ category: sprint
+
+ - name: team-ui
+ spec: CCGS Skill Testing Framework/skills/team/team-ui.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: medium
+ category: team
+
+ - name: team-combat
+ spec: CCGS Skill Testing Framework/skills/team/team-combat.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: medium
+ category: team
+
+ - name: team-narrative
+ spec: CCGS Skill Testing Framework/skills/team/team-narrative.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: medium
+ category: team
+
+ - name: team-audio
+ spec: CCGS Skill Testing Framework/skills/team/team-audio.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: medium
+ category: team
+
+ - name: team-level
+ spec: CCGS Skill Testing Framework/skills/team/team-level.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: medium
+ category: team
+
+ - name: team-polish
+ spec: CCGS Skill Testing Framework/skills/team/team-polish.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: medium
+ category: team
+
+ - name: team-release
+ spec: CCGS Skill Testing Framework/skills/team/team-release.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: medium
+ category: team
+
+ - name: team-live-ops
+ spec: CCGS Skill Testing Framework/skills/team/team-live-ops.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: medium
+ category: team
+
+ - name: team-qa
+ spec: CCGS Skill Testing Framework/skills/team/team-qa.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: medium
+ category: team
+
+  # Low/medium — analysis, reporting, utility skills
+ - name: skill-test
+ spec: CCGS Skill Testing Framework/skills/utility/skill-test.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: medium
+ category: utility
+
+ - name: skill-improve
+ spec: CCGS Skill Testing Framework/skills/utility/skill-improve.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: start
+ spec: CCGS Skill Testing Framework/skills/utility/start.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: help
+ spec: CCGS Skill Testing Framework/skills/utility/help.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: brainstorm
+ spec: CCGS Skill Testing Framework/skills/utility/brainstorm.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: project-stage-detect
+ spec: CCGS Skill Testing Framework/skills/utility/project-stage-detect.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: setup-engine
+ spec: CCGS Skill Testing Framework/skills/utility/setup-engine.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: quick-design
+ spec: CCGS Skill Testing Framework/skills/authoring/quick-design.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: authoring
+
+ - name: ux-design
+ spec: CCGS Skill Testing Framework/skills/authoring/ux-design.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: authoring
+
+ - name: ux-review
+ spec: CCGS Skill Testing Framework/skills/authoring/ux-review.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: authoring
+
+ - name: art-bible
+ spec: CCGS Skill Testing Framework/skills/authoring/art-bible.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: authoring
+
+ - name: create-architecture
+ spec: CCGS Skill Testing Framework/skills/authoring/create-architecture.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: authoring
+
+ - name: code-review
+ spec: CCGS Skill Testing Framework/skills/analysis/code-review.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: analysis
+
+ - name: balance-check
+ spec: CCGS Skill Testing Framework/skills/analysis/balance-check.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: analysis
+
+ - name: asset-audit
+ spec: CCGS Skill Testing Framework/skills/analysis/asset-audit.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: analysis
+
+ - name: content-audit
+ spec: CCGS Skill Testing Framework/skills/analysis/content-audit.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: analysis
+
+ - name: tech-debt
+ spec: CCGS Skill Testing Framework/skills/analysis/tech-debt.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: analysis
+
+ - name: scope-check
+ spec: CCGS Skill Testing Framework/skills/analysis/scope-check.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: analysis
+
+ - name: estimate
+ spec: CCGS Skill Testing Framework/skills/analysis/estimate.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: analysis
+
+ - name: perf-profile
+ spec: CCGS Skill Testing Framework/skills/analysis/perf-profile.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: analysis
+
+ - name: security-audit
+ spec: CCGS Skill Testing Framework/skills/analysis/security-audit.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: analysis
+
+ - name: test-evidence-review
+ spec: CCGS Skill Testing Framework/skills/analysis/test-evidence-review.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: analysis
+
+ - name: test-flakiness
+ spec: CCGS Skill Testing Framework/skills/analysis/test-flakiness.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: analysis
+
+ - name: reverse-document
+ spec: CCGS Skill Testing Framework/skills/utility/reverse-document.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: bug-report
+ spec: CCGS Skill Testing Framework/skills/utility/bug-report.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: hotfix
+ spec: CCGS Skill Testing Framework/skills/utility/hotfix.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: prototype
+ spec: CCGS Skill Testing Framework/skills/utility/prototype.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: playtest-report
+ spec: CCGS Skill Testing Framework/skills/utility/playtest-report.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: milestone-review
+ spec: CCGS Skill Testing Framework/skills/sprint/milestone-review.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: sprint
+
+ - name: retrospective
+ spec: CCGS Skill Testing Framework/skills/sprint/retrospective.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: sprint
+
+ - name: changelog
+ spec: CCGS Skill Testing Framework/skills/sprint/changelog.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: sprint
+
+ - name: patch-notes
+ spec: CCGS Skill Testing Framework/skills/sprint/patch-notes.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: sprint
+
+ - name: onboard
+ spec: CCGS Skill Testing Framework/skills/utility/onboard.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: localize
+ spec: CCGS Skill Testing Framework/skills/utility/localize.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: launch-checklist
+ spec: CCGS Skill Testing Framework/skills/utility/launch-checklist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: release-checklist
+ spec: CCGS Skill Testing Framework/skills/utility/release-checklist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: adopt
+ spec: CCGS Skill Testing Framework/skills/utility/adopt.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: smoke-check
+ spec: CCGS Skill Testing Framework/skills/utility/smoke-check.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: soak-test
+ spec: CCGS Skill Testing Framework/skills/utility/soak-test.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: test-setup
+ spec: CCGS Skill Testing Framework/skills/utility/test-setup.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: test-helpers
+ spec: CCGS Skill Testing Framework/skills/utility/test-helpers.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: regression-suite
+ spec: CCGS Skill Testing Framework/skills/utility/regression-suite.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: qa-plan
+ spec: CCGS Skill Testing Framework/skills/utility/qa-plan.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: bug-triage
+ spec: CCGS Skill Testing Framework/skills/utility/bug-triage.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: asset-spec
+ spec: CCGS Skill Testing Framework/skills/utility/asset-spec.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+ - name: day-one-patch
+ spec: CCGS Skill Testing Framework/skills/utility/day-one-patch.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ last_category: ""
+ last_category_result: ""
+ priority: low
+ category: utility
+
+agents:
+ # Tier 1 Directors (Opus)
+ - name: creative-director
+ spec: CCGS Skill Testing Framework/agents/directors/creative-director.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: director
+
+ - name: technical-director
+ spec: CCGS Skill Testing Framework/agents/directors/technical-director.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: director
+
+ - name: producer
+ spec: CCGS Skill Testing Framework/agents/directors/producer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: director
+
+ - name: art-director
+ spec: CCGS Skill Testing Framework/agents/directors/art-director.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: director
+
+ # Tier 2 Leads (Sonnet)
+ - name: lead-programmer
+ spec: CCGS Skill Testing Framework/agents/leads/lead-programmer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: lead
+
+ - name: qa-lead
+ spec: CCGS Skill Testing Framework/agents/leads/qa-lead.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: lead
+
+ - name: narrative-director
+ spec: CCGS Skill Testing Framework/agents/leads/narrative-director.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: lead
+
+ - name: audio-director
+ spec: CCGS Skill Testing Framework/agents/leads/audio-director.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: lead
+
+ - name: game-designer
+ spec: CCGS Skill Testing Framework/agents/leads/game-designer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: lead
+
+ - name: systems-designer
+ spec: CCGS Skill Testing Framework/agents/leads/systems-designer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: lead
+
+ - name: level-designer
+ spec: CCGS Skill Testing Framework/agents/leads/level-designer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: lead
+
+ # Core Specialists (Sonnet)
+ - name: gameplay-programmer
+ spec: CCGS Skill Testing Framework/agents/specialists/gameplay-programmer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: specialist
+
+ - name: ai-programmer
+ spec: CCGS Skill Testing Framework/agents/specialists/ai-programmer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: specialist
+
+ - name: technical-artist
+ spec: CCGS Skill Testing Framework/agents/specialists/technical-artist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: specialist
+
+ - name: sound-designer
+ spec: CCGS Skill Testing Framework/agents/specialists/sound-designer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: specialist
+
+ - name: engine-programmer
+ spec: CCGS Skill Testing Framework/agents/specialists/engine-programmer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: specialist
+
+ - name: tools-programmer
+ spec: CCGS Skill Testing Framework/agents/specialists/tools-programmer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: specialist
+
+ - name: network-programmer
+ spec: CCGS Skill Testing Framework/agents/specialists/network-programmer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: specialist
+
+ - name: security-engineer
+ spec: CCGS Skill Testing Framework/agents/qa/security-engineer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: qa
+
+ - name: accessibility-specialist
+ spec: CCGS Skill Testing Framework/agents/qa/accessibility-specialist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: qa
+
+ - name: ux-designer
+ spec: CCGS Skill Testing Framework/agents/specialists/ux-designer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: specialist
+
+ - name: ui-programmer
+ spec: CCGS Skill Testing Framework/agents/specialists/ui-programmer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: specialist
+
+ - name: performance-analyst
+ spec: CCGS Skill Testing Framework/agents/specialists/performance-analyst.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: specialist
+
+ # Engine Specialists — Godot
+ - name: godot-specialist
+ spec: CCGS Skill Testing Framework/agents/engine/godot/godot-specialist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: engine
+
+ - name: godot-gdscript-specialist
+ spec: CCGS Skill Testing Framework/agents/engine/godot/godot-gdscript-specialist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: engine
+
+ - name: godot-csharp-specialist
+ spec: CCGS Skill Testing Framework/agents/engine/godot/godot-csharp-specialist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: engine
+
+ - name: godot-shader-specialist
+ spec: CCGS Skill Testing Framework/agents/engine/godot/godot-shader-specialist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: engine
+
+ - name: godot-gdextension-specialist
+ spec: CCGS Skill Testing Framework/agents/engine/godot/godot-gdextension-specialist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: engine
+
+ # Engine Specialists — Unity
+ - name: unity-specialist
+ spec: CCGS Skill Testing Framework/agents/engine/unity/unity-specialist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: engine
+
+ - name: unity-ui-specialist
+ spec: CCGS Skill Testing Framework/agents/engine/unity/unity-ui-specialist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: engine
+
+ - name: unity-shader-specialist
+ spec: CCGS Skill Testing Framework/agents/engine/unity/unity-shader-specialist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: engine
+
+ - name: unity-dots-specialist
+ spec: CCGS Skill Testing Framework/agents/engine/unity/unity-dots-specialist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: engine
+
+ - name: unity-addressables-specialist
+ spec: CCGS Skill Testing Framework/agents/engine/unity/unity-addressables-specialist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: engine
+
+ # Engine Specialists — Unreal
+ - name: unreal-specialist
+ spec: CCGS Skill Testing Framework/agents/engine/unreal/unreal-specialist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: engine
+
+ - name: ue-blueprint-specialist
+ spec: CCGS Skill Testing Framework/agents/engine/unreal/ue-blueprint-specialist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: engine
+
+ - name: ue-gas-specialist
+ spec: CCGS Skill Testing Framework/agents/engine/unreal/ue-gas-specialist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: engine
+
+ - name: ue-umg-specialist
+ spec: CCGS Skill Testing Framework/agents/engine/unreal/ue-umg-specialist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: engine
+
+ - name: ue-replication-specialist
+ spec: CCGS Skill Testing Framework/agents/engine/unreal/ue-replication-specialist.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: engine
+
+ # Operations
+ - name: devops-engineer
+ spec: CCGS Skill Testing Framework/agents/operations/devops-engineer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: operations
+
+ - name: release-manager
+ spec: CCGS Skill Testing Framework/agents/operations/release-manager.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: operations
+
+ - name: live-ops-designer
+ spec: CCGS Skill Testing Framework/agents/operations/live-ops-designer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: operations
+
+ - name: community-manager
+ spec: CCGS Skill Testing Framework/agents/operations/community-manager.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: operations
+
+ - name: analytics-engineer
+ spec: CCGS Skill Testing Framework/agents/operations/analytics-engineer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: operations
+
+ - name: economy-designer
+ spec: CCGS Skill Testing Framework/agents/operations/economy-designer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: operations
+
+ - name: localization-lead
+ spec: CCGS Skill Testing Framework/agents/operations/localization-lead.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: operations
+
+ # QA & Creative
+ - name: qa-tester
+ spec: CCGS Skill Testing Framework/agents/qa/qa-tester.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: qa
+
+ - name: prototyper
+ spec: CCGS Skill Testing Framework/agents/specialists/prototyper.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: specialist
+
+ - name: writer
+ spec: CCGS Skill Testing Framework/agents/specialists/writer.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: specialist
+
+ - name: world-builder
+ spec: CCGS Skill Testing Framework/agents/specialists/world-builder.md
+ last_static: ""
+ last_static_result: ""
+ last_spec: ""
+ last_spec_result: ""
+ category: specialist
diff --git a/CCGS Skill Testing Framework/quality-rubric.md b/CCGS Skill Testing Framework/quality-rubric.md
new file mode 100644
index 0000000..b767e2e
--- /dev/null
+++ b/CCGS Skill Testing Framework/quality-rubric.md
@@ -0,0 +1,240 @@
+# Skill Quality Rubric
+
+Used by `/skill-test category [name|all]` to evaluate skills beyond structural compliance.
+Each category defines 4–5 metrics specific to the skill's job, each scored PASS, FAIL, or WARN.
+
+A metric is PASS when the skill's written instructions clearly satisfy the criterion.
+A metric is FAIL when the instructions are absent, ambiguous, or contradictory.
+A metric is WARN when the instructions partially address the criterion.
+
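The scoring rule above can be sketched as a small function (the coverage labels used as input are hypothetical — the rubric defines the outcomes, not the input vocabulary):

```python
def score_metric(coverage: str) -> str:
    """Map how well a skill's instructions cover a criterion to a result.

    "clear" -> PASS, "partial" -> WARN, anything else (absent,
    ambiguous, contradictory) -> FAIL, per the rubric above.
    """
    return {"clear": "PASS", "partial": "WARN"}.get(coverage, "FAIL")
```
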
+---
+
+## Skill Categories
+
+### `gate`
+
+**Skills**: gate-check
+
+Gate skills control phase transitions. They must enforce correctness without
+auto-advancing the stage and must respect the three review modes.
+
+| Metric | PASS criteria |
+|---|---|
+| **G1 — Review mode read** | Skill reads `production/review-mode.txt` (or equivalent) before deciding which directors to spawn |
+| **G2 — Full mode: all 4 directors spawn** | In `full` mode, all 4 Tier-1 directors (CD, TD, PR, AD) PHASE-GATE prompts are invoked in parallel |
+| **G3 — Lean mode: PHASE-GATE only** | In `lean` mode, only `*-PHASE-GATE` gates run; inline gates (CD-PILLARS, TD-ARCHITECTURE, etc.) are skipped |
+| **G4 — Solo mode: no directors** | In `solo` mode, no director gates spawn; each is noted as "skipped — Solo mode" |
+| **G5 — No auto-advance** | Skill never writes `production/stage.txt` without explicit user confirmation via "May I write" |
+
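Metrics G2–G4 reduce to one decision rule; a minimal sketch, assuming gate IDs follow the `*-PHASE-GATE` naming used above:

```python
def gate_runs(mode: str, gate_id: str) -> bool:
    """Whether a director gate spawns under a given review mode.

    full -> every gate (G2); lean -> PHASE-GATEs only (G3);
    solo -> none (G4).
    """
    if mode == "solo":
        return False
    if mode == "lean":
        return gate_id.endswith("-PHASE-GATE")
    return True  # full: all gates active
```
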
+---
+
+### `review`
+
+**Skills**: design-review, architecture-review, review-all-gdds
+
+Review skills read documents and produce structured verdicts. They are strictly
+read-only and must never trigger director gates themselves.
+
+| Metric | PASS criteria |
+|---|---|
+| **R1 — Read-only enforcement** | `allowed-tools` does not include `Write` or `Edit`; skill never offers to modify the reviewed document |
+| **R2 — 8-section check** | Skill evaluates all 8 required GDD sections (or equivalent architectural sections) explicitly |
+| **R3 — Correct verdict vocabulary** | Verdict is exactly one of: APPROVED / NEEDS REVISION / MAJOR REVISION NEEDED (design) or PASS / CONCERNS / FAIL (architecture) |
+| **R4 — No director gates** | Skill does not spawn any director gate regardless of review mode; it IS the review |
+| **R5 — Structured findings** | Output contains a per-section status table or checklist before the final verdict |
+
+---
+
+### `authoring`
+
+**Skills**: design-system, quick-design, architecture-decision, ux-design, ux-review, art-bible, create-architecture
+
+Authoring skills create or update design documents section by section. They must
+collaborate before writing and handle both new and existing (retrofit) documents.
+
+| Metric | PASS criteria |
+|---|---|
+| **A1 — Section-by-section cycle** | Skill authors one section at a time, presenting content for approval before proceeding to the next |
+| **A2 — May-I-write per section** | Skill asks "May I write this to [filepath]?" before each section write, not just once at the end |
+| **A3 — Retrofit mode** | Skill detects if the target file already exists and offers to update specific sections rather than overwriting the whole document |
+| **A4 — Director gate at correct tier** | If a director gate is defined for this skill (e.g., CD-GDD-ALIGN, TD-ADR), it runs at the correct mode threshold (full/lean) — NOT in solo |
+| **A5 — Skeleton-first** | Skill creates a file skeleton with all section headers before filling content, to preserve progress on session interruption |
+
+---
+
+### `readiness`
+
+**Skills**: story-readiness, story-done
+
+Readiness skills validate stories before or after implementation. They must produce
+multi-dimensional verdicts and integrate correctly with director gate mode.
+
+| Metric | PASS criteria |
+|---|---|
+| **RD1 — Multi-dimensional check** | Skill checks ≥3 independent dimensions (e.g., Design, Architecture, Scope, DoD) and reports each separately |
+| **RD2 — Three verdict levels** | Verdict hierarchy is clearly defined: READY/COMPLETE > NEEDS WORK/COMPLETE WITH NOTES > BLOCKED |
+| **RD3 — BLOCKED requires external action** | BLOCKED verdict is reserved for issues that cannot be fixed by the story author alone (e.g., Proposed ADR, unresolvable dependency) |
+| **RD4 — Director gate at correct mode** | QL-STORY-READY or LP-CODE-REVIEW gate spawns in `full` mode, skips in `lean`/`solo` with a noted skip message |
+| **RD5 — Next-story handoff** | After completion, skill surfaces the next READY story from the active sprint |
+
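The RD2 hierarchy implies a simple roll-up across dimensions; an illustrative sketch (the spec fixes the ordering, not this exact aggregation rule):

```python
VERDICT_ORDER = ["READY", "NEEDS WORK", "BLOCKED"]  # weakest to strongest

def overall_verdict(dimensions: dict[str, str]) -> str:
    """Worst per-dimension result wins, per the RD2 hierarchy."""
    return max(dimensions.values(), key=VERDICT_ORDER.index)
```
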
+---
+
+### `pipeline`
+
+**Skills**: create-epics, create-stories, dev-story, create-control-manifest, propagate-design-change, map-systems
+
+Pipeline skills produce artifacts that other skills consume. They must write files
+with correct schema, respect layer/priority ordering, and gate before writing.
+
+| Metric | PASS criteria |
+|---|---|
+| **P1 — Correct output schema** | Each produced file follows the project template (EPIC.md, story frontmatter, etc.); skill references the template path |
+| **P2 — Layer/priority ordering** | Skills that produce epics or stories respect layer ordering (core → extended → meta) and priority fields |
+| **P3 — May-I-write before each artifact** | Skill asks "May I write [artifact]?" before creating each output file, not batch-approving all files at once |
+| **P4 — Director gate at correct tier** | In-scope gates (PR-EPIC, QL-STORY-READY, LP-CODE-REVIEW, etc.) run in `full`, skip in `lean`/`solo` with noted skip |
+| **P5 — Reads before writes** | Skill reads the relevant GDD/ADR/manifest before producing artifacts to ensure alignment |
+
+---
+
+### `analysis`
+
+**Skills**: consistency-check, balance-check, content-audit, code-review, tech-debt,
+scope-check, estimate, perf-profile, asset-audit, security-audit, test-evidence-review, test-flakiness
+
+Analysis skills scan the project and surface findings. They are read-only during
+analysis and must ask before recommending any file writes.
+
+| Metric | PASS criteria |
+|---|---|
+| **AN1 — Read-only scan** | Analysis phase uses only Read/Glob/Grep tools; no Write or Edit during the scan itself |
+| **AN2 — Structured findings table** | Output includes a findings table or checklist (not prose only) with severity/priority per finding |
+| **AN3 — No auto-write** | Any suggested file writes (e.g., tech-debt register, fix patches) are gated behind "May I write" |
+| **AN4 — No director gates during analysis** | Analysis skills do not spawn director gates; they produce findings for human review |
+
+---
+
+### `team`
+
+**Skills**: team-combat, team-narrative, team-audio, team-level, team-ui, team-qa,
+team-release, team-polish, team-live-ops
+
+Team skills orchestrate multiple specialist agents for a department. They must
+spawn the right agents, run independent ones in parallel, and surface blocks immediately.
+
+| Metric | PASS criteria |
+|---|---|
+| **T1 — Named agent list** | Skill explicitly names which agents it spawns and in what order |
+| **T2 — Parallel where independent** | Agents whose inputs don't depend on each other are spawned in parallel (single message, multiple Task calls) |
+| **T3 — BLOCKED surfacing** | If any spawned agent returns BLOCKED or fails, skill surfaces it immediately and halts dependent work — never silently skips |
+| **T4 — Collect all verdicts before proceeding** | Dependent phases wait for all parallel agents to complete before proceeding |
+| **T5 — Usage error on no argument** | If required argument (e.g., feature name) is missing, skill outputs usage hint and stops without spawning agents |
+
+---
+
+### `sprint`
+
+**Skills**: sprint-plan, sprint-status, milestone-review, retrospective, changelog, patch-notes
+
+Sprint skills read production state and produce reports or planning artifacts.
+They have a PR-SPRINT or PR-MILESTONE gate at specific mode thresholds.
+
+| Metric | PASS criteria |
+|---|---|
+| **SP1 — Reads sprint/milestone state** | Skill reads `production/sprints/` or `production/milestones/` before producing output |
+| **SP2 — Correct sprint gate** | PR-SPRINT (for planning) or PR-MILESTONE (for milestone review) gate runs in `full` mode, skips in `lean`/`solo` |
+| **SP3 — Structured output** | Output uses a consistent structure (velocity table, risk list, action items) rather than free prose |
+| **SP4 — No auto-commit** | Skill never writes sprint files or milestone records without "May I write" |
+
+---
+
+### `utility`
+
+**Skills**: start, help, brainstorm, onboard, adopt, hotfix, prototype, localize,
+launch-checklist, release-checklist, smoke-check, soak-test, test-setup, test-helpers,
+regression-suite, qa-plan, bug-triage, bug-report, playtest-report, asset-spec,
+reverse-document, project-stage-detect, setup-engine, skill-test, skill-improve,
+day-one-patch, and any other skills not in categories above
+
+Utility skills must pass the 7 standard static checks. If a utility skill spawns
+director gates, its gate-mode logic must also be correct.
+
+| Metric | PASS criteria |
+|---|---|
+| **U1 — Passes all 7 static checks** | `/skill-test static [name]` returns COMPLIANT with 0 FAILs |
+| **U2 — Gate mode correct (if applicable)** | If the skill spawns any director gate, it reads review-mode and applies full/lean/solo logic correctly |
+
+---
+
+## Agent Categories
+
+Used to validate agent spec files in `tests/agents/`.
+
+### `director`
+
+**Agents**: creative-director, technical-director, art-director, producer
+
+| Metric | PASS criteria |
+|---|---|
+| **D1 — Correct verdict vocabulary** | Returns APPROVE / CONCERNS / REJECT (or domain equivalent: REALISTIC/CONCERNS/UNREALISTIC for producer) |
+| **D2 — Domain boundary respected** | Does not make binding decisions outside its declared domain |
+| **D3 — Conflict escalation** | When two departments conflict, escalates to correct parent (creative-director or technical-director) rather than unilaterally deciding |
+| **D4 — Opus model tier** | Agent is assigned Opus model per coordination-rules.md |
+
+### `lead`
+
+**Agents**: lead-programmer, qa-lead, narrative-director, audio-director, game-designer,
+systems-designer, level-designer
+
+| Metric | PASS criteria |
+|---|---|
+| **L1 — Domain verdict** | Returns a domain-specific verdict (e.g., FEASIBLE/INFEASIBLE for lead-programmer, PASS/FAIL for qa-lead) |
+| **L2 — Escalates to shared parent** | Out-of-domain conflicts escalate to creative-director (design) or technical-director (tech) |
+| **L3 — Sonnet model tier** | Agent is assigned Sonnet model (default) per coordination-rules.md |
+
+### `specialist`
+
+**Agents**: gameplay-programmer, ai-programmer, technical-artist, sound-designer,
+engine-programmer, tools-programmer, network-programmer, security-engineer,
+accessibility-specialist, ux-designer, ui-programmer, performance-analyst, prototyper,
+qa-tester, writer, world-builder
+
+| Metric | PASS criteria |
+|---|---|
+| **S1 — Stays in domain** | Explicitly scopes itself to its declared domain; defers out-of-domain requests |
+| **S2 — No binding cross-domain decisions** | Does not unilaterally decide matters owned by another specialist |
+| **S3 — Defers correctly** | Out-of-domain requests are redirected to the correct agent, not refused silently |
+
+### `engine`
+
+**Agents**: godot-specialist, godot-gdscript-specialist, godot-csharp-specialist,
+godot-shader-specialist, godot-gdextension-specialist, unity-specialist, unity-ui-specialist,
+unity-shader-specialist, unity-dots-specialist, unity-addressables-specialist,
+unreal-specialist, ue-blueprint-specialist, ue-gas-specialist, ue-umg-specialist,
+ue-replication-specialist
+
+| Metric | PASS criteria |
+|---|---|
+| **E1 — Version-aware** | References engine version from `docs/engine-reference/` before suggesting API calls; flags post-cutoff risk |
+| **E2 — File routing** | Routes file types to the correct sub-specialist (e.g., `.gdshader` → godot-shader-specialist, not godot-gdscript-specialist) |
+| **E3 — Engine-specific patterns** | Enforces engine-specific idioms (e.g., GDScript static typing, C# attribute exports, Blueprint function libraries) |
+
+### `qa`
+
+**Agents**: qa-tester, qa-lead, security-engineer, accessibility-specialist
+
+| Metric | PASS criteria |
+|---|---|
+| **Q1 — Produces artifacts, not code** | Primary output is test cases, bug reports, or coverage gaps — not implementation code |
+| **Q2 — Evidence format** | Test cases follow the project's test evidence format (unit/integration/visual/UI per coding-standards.md) |
+| **Q3 — No scope creep** | Does not propose new features; flags gaps for humans to decide |
+
+### `operations`
+
+**Agents**: devops-engineer, release-manager, live-ops-designer, community-manager,
+analytics-engineer, economy-designer, localization-lead
+
+| Metric | PASS criteria |
+|---|---|
+| **O1 — Domain ownership clear** | Agent description clearly states what it owns (pipeline, releases, economy, etc.) |
+| **O2 — Defers implementation** | Does not write game logic or engine code; delegates to appropriate specialist |
+| **O3 — Toolset matches role** | `allowed-tools` in frontmatter matches the operational (not coding) nature of the role |
diff --git a/CCGS Skill Testing Framework/skills/analysis/asset-audit.md b/CCGS Skill Testing Framework/skills/analysis/asset-audit.md
new file mode 100644
index 0000000..458a8f6
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/analysis/asset-audit.md
@@ -0,0 +1,170 @@
+# Skill Test Spec: /asset-audit
+
+## Skill Summary
+
+`/asset-audit` audits the `assets/` directory for naming convention compliance,
+missing metadata, and format/size issues. It reads asset files against the
+conventions and budgets defined in `technical-preferences.md`. No director gates
+are invoked. The skill does not write without user approval. Verdicts: COMPLIANT,
+WARNINGS, or NON-COMPLIANT.
+
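The naming check exercised in Cases 1 and 5 might reduce to a pattern like this (the actual convention comes from `technical-preferences.md`, so the regex is an assumption):

```python
import re

# lowercase words joined by underscores, then an extension
SNAKE_CASE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*\.[a-z0-9]+")

def check_name(filename: str) -> str:
    """PASS if the bare file name follows snake_case, else FAIL."""
    return "PASS" if SNAKE_CASE.fullmatch(filename) else "FAIL"
```
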
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: COMPLIANT, WARNINGS, NON-COMPLIANT
+- [ ] Does NOT require "May I write" language for the scan itself (the optional report write still asks before writing)
+- [ ] Has a next-step handoff (what to do after audit results)
+
+---
+
+## Director Gate Checks
+
+None. Asset auditing is a read-only analysis skill; no gates are invoked.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — All assets follow naming conventions
+
+**Fixture:**
+- `technical-preferences.md` specifies naming convention: `snake_case`, e.g., `enemy_grunt_idle.png`
+- `assets/art/characters/` contains: `enemy_grunt_idle.png`, `enemy_sniper_run.png`
+- `assets/audio/sfx/` contains: `sfx_jump_land.ogg`, `sfx_item_pickup.ogg`
+- All files are within size budget (textures ≤2MB, audio ≤500KB)
+
+**Input:** `/asset-audit`
+
+**Expected behavior:**
+1. Skill reads naming conventions and size budgets from `technical-preferences.md`
+2. Skill scans `assets/` recursively
+3. All files match `snake_case` convention; all within budget
+4. Audit table shows all rows PASS
+5. Verdict is COMPLIANT
+
+**Assertions:**
+- [ ] Audit covers both art and audio asset directories
+- [ ] Each file is checked against naming convention and size budget
+- [ ] All rows show PASS when compliant
+- [ ] Verdict is COMPLIANT
+- [ ] No files are written
+
+---
+
+### Case 2: Non-Compliant — Textures exceed size budget
+
+**Fixture:**
+- `assets/art/environment/` contains 5 texture files
+- 3 texture files are 4MB each (budget: ≤2MB)
+- 2 texture files are within budget
+
+**Input:** `/asset-audit`
+
+**Expected behavior:**
+1. Skill reads size budget from `technical-preferences.md` (2MB for textures)
+2. Skill scans `assets/art/environment/` — finds 3 oversized textures
+3. Audit table lists each oversized file with actual size and budget
+4. Verdict is NON-COMPLIANT
+5. Skill recommends compression or resolution reduction for flagged files
+
+**Assertions:**
+- [ ] All 3 oversized files are listed by name with actual size and budget size
+- [ ] Verdict is NON-COMPLIANT when any file exceeds its budget
+- [ ] Optimization recommendation is given for oversized files
+- [ ] Within-budget files are also listed (showing PASS) for completeness
+
+---
+
+### Case 3: Format Issue — Audio in wrong format
+
+**Fixture:**
+- `technical-preferences.md` specifies audio format: OGG
+- `assets/audio/music/theme_main.wav` exists (WAV format)
+- `assets/audio/sfx/sfx_footstep.ogg` exists (correct OGG format)
+
+**Input:** `/asset-audit`
+
+**Expected behavior:**
+1. Skill reads audio format requirement: OGG
+2. Skill scans `assets/audio/` — finds `theme_main.wav` in wrong format
+3. Audit table flags `theme_main.wav` as FORMAT ISSUE (expected OGG, found WAV)
+4. `sfx_footstep.ogg` shows PASS
+5. Verdict is WARNINGS (format issues are correctable)
+
+**Assertions:**
+- [ ] `theme_main.wav` is flagged as FORMAT ISSUE with expected and actual format noted
+- [ ] Verdict is WARNINGS (not NON-COMPLIANT) for format issues, which are correctable
+- [ ] Correct-format assets are shown as PASS
+- [ ] Skill does not modify or convert any asset files
+
+---
+
+### Case 4: Missing Asset — Asset referenced by GDD but absent from assets/
+
+**Fixture:**
+- `design/gdd/enemies.md` references `enemy_boss_idle.png`
+- `assets/art/characters/boss/` directory is empty — file does not exist
+
+**Input:** `/asset-audit`
+
+**Expected behavior:**
+1. Skill reads GDD references to find expected assets (deliberately overlapping with `/content-audit` scope)
+2. Skill scans `assets/art/characters/boss/` — file not found
+3. Audit table flags `enemy_boss_idle.png` as MISSING ASSET
+4. Verdict is NON-COMPLIANT (missing critical art asset)
+
+**Assertions:**
+- [ ] Skill checks GDD references to identify expected assets
+- [ ] Missing assets are flagged as MISSING ASSET with the GDD reference noted
+- [ ] Verdict is NON-COMPLIANT when critical assets are missing
+- [ ] Skill does not create or add placeholder assets
+
+---
+
+### Case 5: Gate Compliance — No gate; technical-artist may be consulted separately
+
+**Fixture:**
+- 2 files have naming convention violations (CamelCase instead of snake_case)
+- `review-mode.txt` contains `full`
+
+**Input:** `/asset-audit`
+
+**Expected behavior:**
+1. Skill scans assets and finds 2 naming violations
+2. No director gate is invoked regardless of review mode
+3. Verdict is WARNINGS
+4. Output notes: "Consider having a Technical Artist review naming conventions"
+5. Skill presents findings; offers optional audit report write
+6. If user opts in: "May I write to `production/qa/asset-audit-[date].md`?"
+
+**Assertions:**
+- [ ] No director gate is invoked in any review mode
+- [ ] Technical artist consultation is suggested (not mandated)
+- [ ] Findings table is presented before any write prompt
+- [ ] Optional audit report write asks "May I write" before writing
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads `technical-preferences.md` for naming conventions, formats, and size budgets
+- [ ] Scans `assets/` directory recursively
+- [ ] Audit table shows file name, check type, expected value, actual value, and result
+- [ ] Does not modify any asset files
+- [ ] No director gates are invoked
+- [ ] Verdict is one of: COMPLIANT, WARNINGS, NON-COMPLIANT
+
+---
+
+## Coverage Notes
+
+- Metadata checks (e.g., missing texture import settings in Godot `.import` files)
+ are not explicitly tested here; they follow the same FORMAT ISSUE flagging pattern.
+- The interaction between `/asset-audit` and `/content-audit` (both check GDD
+ references vs. assets) is intentional overlap; `/asset-audit` focuses on
+ compliance while `/content-audit` focuses on completeness.
diff --git a/CCGS Skill Testing Framework/skills/analysis/balance-check.md b/CCGS Skill Testing Framework/skills/analysis/balance-check.md
new file mode 100644
index 0000000..9ea190c
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/analysis/balance-check.md
@@ -0,0 +1,172 @@
+# Skill Test Spec: /balance-check
+
+## Skill Summary
+
+`/balance-check` reads balance data files (JSON or YAML in `assets/data/`) and
+checks each value against the design formulas defined in GDDs under `design/gdd/`.
+It produces a findings table with columns: Value → Formula → Deviation → Severity.
+No director gates are invoked (read-only analysis). The skill may optionally write
+a balance report but asks "May I write" before doing so. Verdicts: BALANCED,
+CONCERNS, or OUT OF BALANCE.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: BALANCED, CONCERNS, OUT OF BALANCE
+- [ ] Contains "May I write" language (optional report write)
+- [ ] Has a next-step handoff (what to do after findings are reviewed)
+
+---
+
+## Director Gate Checks
+
+None. Balance check is a read-only analysis skill; no gates are invoked.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — All balance values within formula tolerances
+
+**Fixture:**
+- `assets/data/combat-balance.json` exists with 6 stat values
+- `design/gdd/combat-system.md` contains formulas for all 6 stats with ±10% tolerance
+- All 6 values fall within tolerance
+
+**Input:** `/balance-check`
+
+**Expected behavior:**
+1. Skill reads all balance data files in `assets/data/`
+2. Skill reads GDD formulas from `design/gdd/`
+3. Skill computes deviation for each value against its formula
+4. All deviations are within ±10% tolerance
+5. Skill outputs findings table with all rows showing PASS
+6. Verdict is BALANCED
+
+**Assertions:**
+- [ ] Findings table is shown for all checked values
+- [ ] Each row shows: stat name, formula target, actual value, deviation percentage
+- [ ] All rows show PASS or equivalent when within tolerance
+- [ ] Verdict is BALANCED
+- [ ] No files are written without user approval
+
+---
+
+### Case 2: Out of Balance — Player damage 40% above formula target
+
+**Fixture:**
+- `assets/data/combat-balance.json` has `player_damage_base: 140`
+- `design/gdd/combat-system.md` formula specifies `player_damage_base = 100` (±10%)
+- All other stats are within tolerance
+
+**Input:** `/balance-check`
+
+**Expected behavior:**
+1. Skill reads combat-balance.json and computes deviation for `player_damage_base`
+2. Deviation is +40% — far outside ±10% tolerance
+3. Skill flags this row as severity HIGH in the findings table
+4. Verdict is OUT OF BALANCE
+5. Skill surfaces the HIGH severity item prominently before the table
+
+**Assertions:**
+- [ ] `player_damage_base` row shows deviation of +40%
+- [ ] Severity is HIGH for deviations exceeding tolerance by more than 2×
+- [ ] Verdict is OUT OF BALANCE when any stat has HIGH severity deviation
+- [ ] The HIGH severity item is called out explicitly, not buried in table rows
+
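The deviation math behind Cases 1, 2, and 5 can be sketched as follows (the ±10% tolerance and the 2× HIGH threshold mirror this spec's fixtures, not a fixed rule of the skill):

```python
def classify(actual: float, target: float, tolerance: float = 0.10) -> str:
    """Severity for one stat: PASS within tolerance, CONCERNS above it,
    HIGH when the deviation exceeds the tolerance by more than 2x."""
    deviation = abs(actual - target) / target
    if deviation <= tolerance:
        return "PASS"
    if deviation <= 2 * tolerance:
        return "CONCERNS"
    return "HIGH"
```
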
+---
+
+### Case 3: No GDD Formulas — Cannot validate, guidance given
+
+**Fixture:**
+- `assets/data/economy-balance.yaml` exists with 10 stat values
+- No GDD in `design/gdd/` contains formula definitions for economy stats
+
+**Input:** `/balance-check`
+
+**Expected behavior:**
+1. Skill reads balance data files
+2. Skill searches GDDs for formula definitions — finds none for economy stats
+3. Skill outputs: "Cannot validate economy stats — no formulas defined. Run /design-system first."
+4. No findings table is generated for the economy stats
+5. Verdict is CONCERNS (data exists but cannot be validated)
+
+**Assertions:**
+- [ ] Skill does not fabricate formula targets when none exist in GDDs
+- [ ] Output explicitly names the missing formula source
+- [ ] Output recommends running `/design-system` to define formulas
+- [ ] Verdict is CONCERNS (not BALANCED, since validation was impossible)
+
+---
+
+### Case 4: Orphan Reference — Balance file references an undefined stat
+
+**Fixture:**
+- `assets/data/combat-balance.json` contains a stat `legacy_armor_mult: 1.5`
+- `design/gdd/combat-system.md` has no formula for `legacy_armor_mult`
+- All other stats have formula definitions and pass validation
+
+**Input:** `/balance-check`
+
+**Expected behavior:**
+1. Skill reads all stats from combat-balance.json
+2. Skill cannot find a formula for `legacy_armor_mult` in any GDD
+3. Skill flags `legacy_armor_mult` as ORPHAN REFERENCE in the findings table
+4. Other stats are evaluated normally; those within tolerance show PASS
+5. Verdict is CONCERNS (orphan reference prevents full validation)
+
+**Assertions:**
+- [ ] `legacy_armor_mult` appears in findings table with status ORPHAN REFERENCE
+- [ ] Orphan references are distinguished from formula deviations in the table
+- [ ] Verdict is CONCERNS when any orphan references are found
+- [ ] Skill does not skip orphan stats silently
+
+---
+
+### Case 5: Gate Compliance — Read-only; no gate; optional report requires approval
+
+**Fixture:**
+- Balance data and GDD formulas exist; 1 stat has CONCERNS-level deviation (15% above target)
+- `review-mode.txt` contains `full`
+
+**Input:** `/balance-check`
+
+**Expected behavior:**
+1. Skill reads data and GDDs; generates findings table
+2. Verdict is CONCERNS (one stat slightly out of range)
+3. No director gate is invoked
+4. Skill presents findings table to user
+5. Skill offers to write an optional balance report
+6. If user says yes: skill asks "May I write to `production/qa/balance-report-[date].md`?"
+7. If user says no: skill ends without writing
+
+**Assertions:**
+- [ ] No director gate is invoked in any review mode
+- [ ] Findings table is presented without writing anything automatically
+- [ ] Optional report write is offered but not forced
+- [ ] "May I write" prompt appears only if user opts in to the report
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads both balance data files and GDD formulas before analysis
+- [ ] Findings table shows Value, Formula, Deviation, and Severity columns
+- [ ] Does not write any files without explicit user approval
+- [ ] No director gates are invoked
+- [ ] Verdict is one of: BALANCED, CONCERNS, OUT OF BALANCE
+
+---
+
+## Coverage Notes
+
+- The case where `assets/data/` is entirely empty is not tested; behavior
+ follows the CONCERNS pattern with a message that no data files were found.
+- Tolerance thresholds (±10%, ±20%) are implementation details of the skill;
+ the tests verify that deviations are detected and classified, not the
+ exact threshold values.
diff --git a/CCGS Skill Testing Framework/skills/analysis/code-review.md b/CCGS Skill Testing Framework/skills/analysis/code-review.md
new file mode 100644
index 0000000..26276bc
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/analysis/code-review.md
@@ -0,0 +1,172 @@
+# Skill Test Spec: /code-review
+
+## Skill Summary
+
+`/code-review` performs an architectural code review of source files in `src/`,
+checking coding standards from `CLAUDE.md` (doc comments on public APIs,
+dependency injection over singletons, data-driven values, testability). Findings
+are advisory. No director gates are invoked. No code edits are made. Verdicts:
+APPROVED, CONCERNS, or NEEDS CHANGES.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: APPROVED, CONCERNS, NEEDS CHANGES
+- [ ] Does NOT require "May I write" language (read-only; findings are advisory output)
+- [ ] Has a next-step handoff (what to do with findings)
+
+---
+
+## Director Gate Checks
+
+None. Code review is a read-only advisory skill; no gates are invoked.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Source file follows all coding standards
+
+**Fixture:**
+- `src/gameplay/health_component.gd` exists with:
+ - All public methods have doc comments (`##` notation)
+ - No singletons used; dependencies injected via constructor
+ - No hardcoded values; all constants reference `assets/data/`
+ - ADR reference in file header: `# ADR: docs/architecture/adr-004-health.md`
+ - Referenced ADR has `Status: Accepted`
+
+**Input:** `/code-review src/gameplay/health_component.gd`
+
+**Expected behavior:**
+1. Skill reads the source file
+2. Skill checks all coding standards: doc comments, DI, data-driven, ADR status
+3. All checks pass
+4. Skill outputs findings summary with all checks PASS
+5. Verdict is APPROVED
+
+**Assertions:**
+- [ ] Each coding standard check is listed in the output
+- [ ] All checks show PASS when standards are met
+- [ ] Skill reads referenced ADR to confirm its status
+- [ ] Verdict is APPROVED
+- [ ] No edits are made to any file
+
+---
+
+### Case 2: Needs Changes — Missing doc comment and singleton usage
+
+**Fixture:**
+- `src/ui/inventory_ui.gd` has:
+ - 2 public methods without doc comments
+ - Uses `GameManager.instance` (singleton pattern)
+ - All other standards met
+
+**Input:** `/code-review src/ui/inventory_ui.gd`
+
+**Expected behavior:**
+1. Skill reads the source file
+2. Skill detects: 2 missing doc comments on public methods
+3. Skill detects: singleton usage at specific lines (e.g., line 42, line 87)
+4. Findings list the exact method names and line numbers
+5. Verdict is NEEDS CHANGES
+
+**Assertions:**
+- [ ] Missing doc comments are listed with method names
+- [ ] Singleton usage is flagged with file and line number
+- [ ] Verdict is NEEDS CHANGES when BLOCKING-level standard violations exist
+- [ ] Skill does not edit the file — findings are for the developer to act on
+- [ ] Output suggests replacing singleton with dependency injection
+
+---
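Case 2's line-level findings could come from a scan along these lines (using `.instance` as the singleton marker is a simplifying assumption):

```python
def find_singleton_uses(source: str, marker: str = ".instance") -> list[int]:
    """Return 1-based line numbers where the singleton marker appears."""
    return [n for n, line in enumerate(source.splitlines(), start=1)
            if marker in line]
```
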
+
+### Case 3: Architecture Risk — ADR reference is Proposed, not Accepted
+
+**Fixture:**
+- `src/core/save_system.gd` has a header comment: `# ADR: docs/architecture/adr-010-save.md`
+- `adr-010-save.md` exists but has `Status: Proposed`
+- Code itself follows all other coding standards
+
+**Input:** `/code-review src/core/save_system.gd`
+
+**Expected behavior:**
+1. Skill reads the source file
+2. Skill reads referenced ADR — finds `Status: Proposed`
+3. Skill flags this as ARCHITECTURE RISK (code is implementing an unaccepted ADR)
+4. Other coding standard checks pass
+5. Verdict is CONCERNS (risk flag is advisory, not a hard NEEDS CHANGES)
+
+**Assertions:**
+- [ ] Skill reads referenced ADR file to check its status
+- [ ] ARCHITECTURE RISK is flagged when ADR status is Proposed
+- [ ] Verdict is CONCERNS (not NEEDS CHANGES) for ADR risk — advisory severity
+- [ ] Output recommends resolving the ADR before the code goes to production
+
+---
+
+### Case 4: Edge Case — No source files found at specified path
+
+**Fixture:**
+- User calls `/code-review src/networking/`
+- `src/networking/` directory does not exist
+
+**Input:** `/code-review src/networking/`
+
+**Expected behavior:**
+1. Skill attempts to read files in `src/networking/`
+2. Directory or files not found
+3. Skill outputs an error: "No source files found at `src/networking/`"
+4. Skill suggests checking `src/` for valid directories
+5. No verdict is emitted (nothing was reviewed)
+
+**Assertions:**
+- [ ] Skill does not crash when path does not exist
+- [ ] Output names the attempted path in the error message
+- [ ] Output suggests checking `src/` for valid file paths
+- [ ] No verdict is emitted when there is nothing to review
+
+---
+
+### Case 5: Gate Compliance — No gate; LP may be consulted separately
+
+**Fixture:**
+- Source file follows most standards but has 1 CONCERNS-level finding (a magic number)
+- `review-mode.txt` contains `full`
+
+**Input:** `/code-review src/gameplay/loot_system.gd`
+
+**Expected behavior:**
+1. Skill reads and reviews the source file
+2. No director gate is invoked (code review findings are advisory)
+3. Skill presents findings with the CONCERNS verdict
+4. Output notes: "Consider requesting a Lead Programmer review for architecture concerns"
+5. Skill does not invoke any agent automatically
+
+**Assertions:**
+- [ ] No director gate is invoked in any review mode
+- [ ] LP consultation is suggested (not mandated) in the output
+- [ ] No code edits are made
+- [ ] Verdict is CONCERNS for advisory-level findings
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads source file(s) and coding standards before reviewing
+- [ ] Lists each coding standard check in findings output
+- [ ] Does not edit any source files (read-only skill)
+- [ ] No director gates are invoked
+- [ ] Verdict is one of: APPROVED, CONCERNS, NEEDS CHANGES
+
+---
+
+## Coverage Notes
+
+- Batch review of all files in a directory is not explicitly tested; the skill
+  is assumed to apply the same checks file by file and aggregate the verdicts.
+- Test coverage checks (verifying corresponding test files exist) are a stretch
+ goal not tested here; that is primarily the domain of `/test-evidence-review`.
diff --git a/CCGS Skill Testing Framework/skills/analysis/consistency-check.md b/CCGS Skill Testing Framework/skills/analysis/consistency-check.md
new file mode 100644
index 0000000..c978c1f
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/analysis/consistency-check.md
@@ -0,0 +1,176 @@
+# Skill Test Spec: /consistency-check
+
+## Skill Summary
+
+`/consistency-check` scans all GDDs in `design/gdd/` and checks for internal
+conflicts across documents. It produces a structured findings table with columns:
+System A vs System B, Conflict Type, Severity (HIGH / MEDIUM / LOW). Conflict
+types include: formula mismatch, competing ownership, stale reference, and
+dependency gap.
+
+The skill is read-only during analysis. It has no director gates. An optional
+consistency report can be written to `design/consistency-report-[date].md` if the
+user requests it, but the skill asks "May I write" before doing so.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: CONSISTENT, CONFLICTS FOUND, DEPENDENCY GAP
+- [ ] Does NOT require "May I write" language during analysis (read-only scan)
+- [ ] Has a next-step handoff at the end
+- [ ] Documents that report writing is optional and requires approval
+
+---
+
+## Director Gate Checks
+
+No director gates — this skill spawns no director gate agents. Consistency
+checking is a mechanical scan; no creative or technical director review is
+required as part of the scan itself.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — 4 GDDs with no conflicts
+
+**Fixture:**
+- `design/gdd/` contains exactly 4 system GDDs
+- All GDDs have consistent formulas (no overlapping variables with different values)
+- No two GDDs claim ownership of the same game entity or mechanic
+- All dependency references point to GDDs that exist
+
+**Input:** `/consistency-check`
+
+**Expected behavior:**
+1. Skill reads all 4 GDDs in `design/gdd/`
+2. Runs cross-GDD consistency checks (formulas, ownership, references)
+3. No conflicts found
+4. Outputs structured findings table showing 0 issues
+5. Verdict: CONSISTENT
+
+**Assertions:**
+- [ ] All 4 GDDs are read before producing output
+- [ ] Findings table is present (even if empty — shows "No conflicts found")
+- [ ] Verdict is CONSISTENT when no conflicts exist
+- [ ] Skill does NOT write any files without user approval
+- [ ] Next-step handoff is present
+
+---
+
+### Case 2: Failure Path — Two GDDs with conflicting damage formulas
+
+**Fixture:**
+- GDD-A defines damage formula: `damage = attack * 1.5`
+- GDD-B defines damage formula: `damage = attack * 2.0` for the same entity type
+- Both GDDs refer to the same "attack" variable
+
+**Input:** `/consistency-check`
+
+**Expected behavior:**
+1. Skill reads all GDDs and detects the formula mismatch
+2. Findings table includes an entry: GDD-A vs GDD-B | Formula Mismatch | HIGH
+3. Specific conflicting formulas are shown (not just "formula conflict exists")
+4. Verdict: CONFLICTS FOUND
+
+**Assertions:**
+- [ ] Verdict is CONFLICTS FOUND (not CONSISTENT)
+- [ ] Conflict entry names both GDD filenames
+- [ ] Conflict type is "Formula Mismatch"
+- [ ] Severity is HIGH for a direct formula contradiction
+- [ ] Both conflicting formulas are shown in the findings table
+- [ ] Skill does NOT auto-resolve the conflict
+
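The mismatch detection in this case can be sketched as follows. This is a minimal illustration only, assuming formulas appear in backticked `name = expr` form; the skill's actual parsing rules are defined in its body, and the GDD filenames are hypothetical.

```python
import re
from collections import defaultdict

# Collect backticked `name = expr` formula lines from each GDD body and flag
# any variable that is defined with different expressions in different docs.
FORMULA_RE = re.compile(r"`(\w+)\s*=\s*([^`]+)`")

def find_formula_mismatches(gdds):
    """gdds: dict of filename -> markdown body. Returns HIGH-severity conflicts."""
    definitions = defaultdict(list)  # variable -> [(filename, expression)]
    for name, body in gdds.items():
        for var, expr in FORMULA_RE.findall(body):
            definitions[var].append((name, expr.strip()))
    conflicts = []
    for var, defs in definitions.items():
        if len({expr for _, expr in defs}) > 1:
            conflicts.append({
                "systems": sorted(name for name, _ in defs),
                "type": "Formula Mismatch",
                "severity": "HIGH",
                "detail": {name: expr for name, expr in defs},
            })
    return conflicts
```

A conforming run over this case's fixture would produce one HIGH entry naming both GDDs and showing both conflicting expressions, matching the assertions above.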
+---
+
+### Case 3: Partial Path — GDD references a system with no GDD
+
+**Fixture:**
+- GDD-A's Dependencies section lists "system-B" as a dependency
+- No GDD for system-B exists in `design/gdd/`
+- All other GDDs are consistent
+
+**Input:** `/consistency-check`
+
+**Expected behavior:**
+1. Skill reads all GDDs and checks dependency references
+2. GDD-A's reference to "system-B" cannot be resolved — no GDD exists for it
+3. Findings table includes: GDD-A vs (missing) | Dependency Gap | MEDIUM
+4. Verdict: DEPENDENCY GAP (not CONSISTENT, not CONFLICTS FOUND)
+
+**Assertions:**
+- [ ] Verdict is DEPENDENCY GAP (distinct from CONSISTENT and CONFLICTS FOUND)
+- [ ] Findings entry names GDD-A and the missing system-B
+- [ ] Severity is MEDIUM for an unresolved dependency reference
+- [ ] Skill suggests running `/design-system system-B` to create the missing GDD
+
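The dependency-gap check can be sketched roughly like this: a dependency listed in any GDD must correspond to a GDD file in `design/gdd/`. The function and fixture names below are illustrative assumptions, not taken from the skill body.

```python
from pathlib import PurePosixPath

def find_dependency_gaps(dependencies_by_gdd, existing_gdds):
    """dependencies_by_gdd: dict of GDD filename -> list of system names.
    existing_gdds: set of GDD filenames present in design/gdd/."""
    known_systems = {PurePosixPath(f).stem for f in existing_gdds}
    gaps = []
    for gdd, deps in dependencies_by_gdd.items():
        for dep in deps:
            if dep not in known_systems:
                # Unresolvable reference: MEDIUM severity per this spec's rubric
                gaps.append({
                    "systems": f"{gdd} vs (missing)",
                    "type": "Dependency Gap",
                    "severity": "MEDIUM",
                    "missing": dep,
                })
    return gaps
```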
+---
+
+### Case 4: Edge Case — No GDDs found
+
+**Fixture:**
+- `design/gdd/` directory is empty or does not exist
+
+**Input:** `/consistency-check`
+
+**Expected behavior:**
+1. Skill attempts to read files in `design/gdd/`
+2. No GDD files found
+3. Skill outputs an error: "No GDDs found in `design/gdd/`. Run `/design-system` to create GDDs first."
+4. No findings table is produced
+5. No verdict is issued
+
+**Assertions:**
+- [ ] Skill outputs a clear error message when no GDDs are found
+- [ ] No verdict is produced (CONSISTENT / CONFLICTS FOUND / DEPENDENCY GAP)
+- [ ] Skill recommends the correct next action (`/design-system`)
+- [ ] Skill does NOT crash or produce a partial report
+
+---
+
+### Case 5: Director Gate — No gate spawned; no review-mode.txt read
+
+**Fixture:**
+- `design/gdd/` contains ≥2 GDDs
+- `production/session-state/review-mode.txt` exists with `full`
+
+**Input:** `/consistency-check`
+
+**Expected behavior:**
+1. Skill reads all GDDs and runs the consistency scan
+2. Skill does NOT read `production/session-state/review-mode.txt`
+3. No director gate agents are spawned at any point
+4. Findings table and verdict are produced normally
+
+**Assertions:**
+- [ ] No director gate agents are spawned (no CD-, TD-, PR-, AD- prefixed gates)
+- [ ] Skill does NOT read `production/session-state/review-mode.txt`
+- [ ] Output contains no "Gate: [GATE-ID]" or gate-skipped entries
+- [ ] Review mode has no effect on this skill's behavior
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads all GDDs before producing the findings table
+- [ ] Findings table shown in full before any write ask (if report is requested)
+- [ ] Verdict is one of exactly: CONSISTENT, CONFLICTS FOUND, DEPENDENCY GAP
+- [ ] No director gates — no review-mode.txt read
+- [ ] Report writing (if requested) gated by "May I write" approval
+- [ ] Ends with next-step handoff appropriate to verdict
+
+---
+
+## Coverage Notes
+
+- This skill checks for structural consistency between GDDs. Deep design theory
+ analysis (pillar drift, dominant strategies) is handled by `/review-all-gdds`.
+- Formula conflict detection relies on consistent formula notation across GDDs —
+ informal descriptions of the same mechanic may not be detected.
+- The conflict severity rubric (HIGH / MEDIUM / LOW) is defined in the skill body
+ and not re-enumerated here.
diff --git a/CCGS Skill Testing Framework/skills/analysis/content-audit.md b/CCGS Skill Testing Framework/skills/analysis/content-audit.md
new file mode 100644
index 0000000..1240964
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/analysis/content-audit.md
@@ -0,0 +1,164 @@
+# Skill Test Spec: /content-audit
+
+## Skill Summary
+
+`/content-audit` reads GDDs in `design/gdd/` and checks whether all content
+items specified there (enemies, items, levels, etc.) are accounted for in
+`assets/`. It produces a gap table: Content Type → Specified Count → Found Count
+→ Missing Items. No director gates are invoked. The skill does not write without
+user approval. Verdicts: COMPLETE, GAPS FOUND, or MISSING CRITICAL CONTENT.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: COMPLETE, GAPS FOUND, MISSING CRITICAL CONTENT
+- [ ] Does NOT require "May I write" language (read-only output; write is optional report)
+- [ ] Has a next-step handoff (what to do after gap table is reviewed)
+
+---
+
+## Director Gate Checks
+
+None. Content audit is a read-only analysis skill; no gates are invoked.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — All specified content present
+
+**Fixture:**
+- `design/gdd/enemies.md` specifies 4 enemy types: Grunt, Sniper, Tank, Boss
+- `assets/art/characters/` contains folders: `grunt/`, `sniper/`, `tank/`, `boss/`
+- `design/gdd/items.md` specifies 3 item types; all 3 found in `assets/data/items/`
+
+**Input:** `/content-audit`
+
+**Expected behavior:**
+1. Skill reads all GDDs in `design/gdd/`
+2. Skill scans `assets/` for each specified content item
+3. All 4 enemy types and 3 item types are found
+4. Gap table shows: all rows have Found Count = Specified Count, no missing items
+5. Verdict is COMPLETE
+
+**Assertions:**
+- [ ] Gap table covers all content types found in GDDs
+- [ ] Each row shows Specified Count and Found Count
+- [ ] No missing items when counts match
+- [ ] Verdict is COMPLETE
+- [ ] No files are written
+
+---
+
+### Case 2: Gaps Found — Enemy type missing from assets
+
+**Fixture:**
+- `design/gdd/enemies.md` specifies 3 enemy types: Grunt, Sniper, Boss
+- `assets/art/characters/` contains: `grunt/`, `sniper/` only (Boss folder missing)
+
+**Input:** `/content-audit`
+
+**Expected behavior:**
+1. Skill reads GDD — finds 3 enemy types specified
+2. Skill scans `assets/art/characters/` — finds only 2
+3. Gap table row for enemies: Specified 3, Found 2, Missing: Boss
+4. Verdict is GAPS FOUND
+
+**Assertions:**
+- [ ] Gap table row identifies "Boss" as the missing item by name
+- [ ] Specified Count (3) and Found Count (2) are both shown
+- [ ] Verdict is GAPS FOUND when any content item is missing
+- [ ] Skill does not assume the asset will be added later — it flags it now
+
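The gap-table computation for a row like this one can be sketched as follows (an illustrative sketch; matching is assumed case-insensitive by name, and the verdict escalation to MISSING CRITICAL CONTENT is omitted here):

```python
def gap_row(content_type, specified, found):
    """specified/found: lists of item names; match is case-insensitive."""
    found_lower = {f.lower() for f in found}
    missing = [item for item in specified if item.lower() not in found_lower]
    return {
        "content_type": content_type,
        "specified_count": len(specified),
        "found_count": len(specified) - len(missing),
        "missing": missing,
    }

def audit_verdict(rows):
    # COMPLETE only when every row has no missing items
    return "GAPS FOUND" if any(r["missing"] for r in rows) else "COMPLETE"
```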
+---
+
+### Case 3: No GDD Content Specs Found — Guidance given
+
+**Fixture:**
+- `design/gdd/` contains only `core-loop.md` which has no content inventory section
+- No other GDDs exist with content specifications
+
+**Input:** `/content-audit`
+
+**Expected behavior:**
+1. Skill reads all GDDs — finds no content inventory sections
+2. Skill outputs: "No content specifications found in GDDs — run /design-system first to define content lists"
+3. No gap table is produced
+4. Verdict is GAPS FOUND (cannot confirm completeness without specs)
+
+**Assertions:**
+- [ ] Skill does not produce a gap table when no GDD content specs exist
+- [ ] Output recommends running `/design-system`
+- [ ] Verdict reflects inability to confirm completeness
+
+---
+
+### Case 4: Edge Case — Asset in wrong format for target platform
+
+**Fixture:**
+- `design/gdd/audio.md` specifies audio assets as OGG format
+- `assets/audio/sfx/jump.wav` exists (WAV format, not OGG)
+- `assets/audio/sfx/land.ogg` exists (correct format)
+- `technical-preferences.md` specifies audio format: OGG
+
+**Input:** `/content-audit`
+
+**Expected behavior:**
+1. Skill reads GDD audio spec and technical preferences for format requirements
+2. Skill finds `jump.wav` — present but in wrong format
+3. Gap table row for audio: Specified 2, Found 2 (by name), but `jump.wav` flagged as FORMAT ISSUE
+4. Verdict is GAPS FOUND (format compliance is part of content completeness)
+
+**Assertions:**
+- [ ] Skill checks asset format against GDD or technical preferences when format is specified
+- [ ] `jump.wav` is flagged as FORMAT ISSUE with expected format (OGG) noted
+- [ ] Format issues are distinct from missing content in the gap table
+- [ ] Verdict is GAPS FOUND when format issues exist
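The format check in this case can be sketched like so. This assumes the required extension comes from `technical-preferences.md`; the asset paths are this fixture's examples.

```python
from pathlib import PurePosixPath

def format_issues(asset_paths, required_ext):
    """Flag assets present by name but in the wrong format (distinct from missing)."""
    issues = []
    for path in asset_paths:
        ext = PurePosixPath(path).suffix.lstrip(".").lower()
        if ext != required_ext.lower():
            issues.append({"asset": path, "found": ext,
                           "expected": required_ext, "flag": "FORMAT ISSUE"})
    return issues
```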
+
+---
+
+### Case 5: Gate Compliance — Read-only; no gate; gap table for human review
+
+**Fixture:**
+- GDDs specify 10 content items; 9 are found in assets; 1 is missing
+- `review-mode.txt` contains `full`
+
+**Input:** `/content-audit`
+
+**Expected behavior:**
+1. Skill reads GDDs and scans assets; produces gap table
+2. No director gate is invoked regardless of review mode
+3. Skill presents gap table to user as read-only output
+4. Verdict is GAPS FOUND
+5. Skill offers to write an audit report but does not write automatically
+
+**Assertions:**
+- [ ] No director gate is invoked in any review mode
+- [ ] Gap table is presented without auto-writing any file
+- [ ] Optional report write is offered but not forced
+- [ ] Skill does not modify any asset files
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads GDDs and asset directory before producing gap table
+- [ ] Gap table shows Content Type, Specified Count, Found Count, Missing Items
+- [ ] Does not write files without explicit user approval
+- [ ] No director gates are invoked
+- [ ] Verdict is one of: COMPLETE, GAPS FOUND, MISSING CRITICAL CONTENT
+
+---
+
+## Coverage Notes
+
+- MISSING CRITICAL CONTENT verdict (vs. GAPS FOUND) is triggered when the
+ missing item is tagged as critical in the GDD; this is not explicitly tested
+ but follows the same detection path.
+- The case where `assets/` directory does not exist is not tested; the skill
+ would produce a MISSING CRITICAL CONTENT verdict for all specified items.
diff --git a/CCGS Skill Testing Framework/skills/analysis/estimate.md b/CCGS Skill Testing Framework/skills/analysis/estimate.md
new file mode 100644
index 0000000..d9a3259
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/analysis/estimate.md
@@ -0,0 +1,168 @@
+# Skill Test Spec: /estimate
+
+## Skill Summary
+
+`/estimate` estimates task or story effort using a relative-size scale (S / M /
+L / XL) based on story complexity, acceptance criteria count, and historical
+sprint velocity from past sprint files. Estimates are advisory and are never
+written automatically. No director gates are invoked. Verdicts are effort ranges,
+not pass/fail — every run produces an estimate.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains size labels: S, M, L, XL (the "verdict" equivalents for this skill)
+- [ ] Does NOT require "May I write" language (advisory output only)
+- [ ] Has a next-step handoff (how to use the estimate in sprint planning)
+
+---
+
+## Director Gate Checks
+
+None. Estimation is an advisory informational skill; no gates are invoked.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Clear story with known tech stack
+
+**Fixture:**
+- `production/epics/combat/story-hitbox-detection.md` exists with:
+ - 4 clear Acceptance Criteria
+ - ADR reference (Accepted status)
+ - No "unknown" or "TBD" language in story body
+- `production/sprints/sprint-003.md` through `sprint-005.md` exist with velocity data
+- Tech stack is GDScript (well-understood by team per sprint history)
+
+**Input:** `/estimate production/epics/combat/story-hitbox-detection.md`
+
+**Expected behavior:**
+1. Skill reads the story file — assesses clarity, AC count, tech stack
+2. Skill reads sprint history to determine average velocity
+3. Skill outputs estimate: M (1–2 days) with reasoning
+4. No files are written
+
+**Assertions:**
+- [ ] Estimate is M for a clear, well-scoped story with known tech
+- [ ] Reasoning references AC count, tech stack familiarity, and velocity data
+- [ ] Estimate is presented as a range (e.g., "1–2 days"), not a single point
+- [ ] No files are written
+
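The shape of the sizing decision can be sketched as below. The scoring weights, thresholds, and day ranges are assumptions for illustration only; the skill's actual rubric weighs clarity, AC count, and velocity history.

```python
# Assumed calibration, not the skill's real rubric
SIZE_RANGES = {"S": "0.5-1 days", "M": "1-2 days", "L": "3-5 days", "XL": "5+ days"}

def size_story(ac_count, has_accepted_adr, unknown_terms):
    """unknown_terms: count of 'TBD'/'unknown' markers in the story body."""
    score = ac_count
    if not has_accepted_adr:
        score += 3       # architectural uncertainty widens the estimate
    score += 2 * unknown_terms
    if score <= 2:
        label = "S"
    elif score <= 5:
        label = "M"
    elif score <= 9:
        label = "L"
    else:
        label = "XL"
    return label, SIZE_RANGES[label]
```

With these assumed weights, this case's fixture (4 clear ACs, Accepted ADR, no unknowns) lands on M, and Case 2's fixture (vague ACs, no ADR) lands in the L/XL band.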
+---
+
+### Case 2: High Uncertainty — Unknown system, no ADR yet
+
+**Fixture:**
+- `production/epics/online/story-lobby-matchmaking.md` exists with:
+ - 2 vague Acceptance Criteria (using "should" and "TBD")
+ - No ADR reference — matchmaking architecture not yet decided
+ - References new subsystem ("online/matchmaking") with no existing source files
+
+**Input:** `/estimate production/epics/online/story-lobby-matchmaking.md`
+
+**Expected behavior:**
+1. Skill reads story — finds vague AC, no ADR, no existing source
+2. Skill flags multiple uncertainty factors
+3. Estimate is L–XL with an explicit risk note: "Estimate range is wide due to architectural unknowns"
+4. Skill recommends creating an ADR before development begins
+
+**Assertions:**
+- [ ] Estimate is L or XL (not S or M) when significant unknowns exist
+- [ ] Risk note explains the specific unknowns driving the wide range
+- [ ] Output recommends resolving architectural questions first
+- [ ] No files are written
+
+---
+
+### Case 3: No Sprint Velocity Data — Conservative defaults used
+
+**Fixture:**
+- Story file exists and is well-defined
+- `production/sprints/` is empty — no historical sprints
+
+**Input:** `/estimate production/epics/core/story-save-load.md`
+
+**Expected behavior:**
+1. Skill reads story — assesses complexity
+2. Skill attempts to read sprint velocity data — finds none
+3. Skill notes: "No sprint history found — using conservative defaults for velocity"
+4. Estimate is produced using default assumptions (e.g., 1 story point = 1 day)
+5. No files are written
+
+**Assertions:**
+- [ ] Skill does not error when no sprint history exists
+- [ ] Output explicitly notes that conservative defaults are being used
+- [ ] Estimate is still produced (not blocked by missing velocity)
+- [ ] Conservative defaults produce a higher (not lower) estimate range
+
+---
+
+### Case 4: Multiple Stories — Each estimated individually plus sprint total
+
+**Fixture:**
+- User provides a sprint file: `production/sprints/sprint-007.md` with 4 stories
+- Sprint history exists (3 previous sprints)
+
+**Input:** `/estimate production/sprints/sprint-007.md`
+
+**Expected behavior:**
+1. Skill reads sprint file — identifies 4 stories
+2. Skill estimates each story individually: S, M, M, L
+3. Skill computes sprint total: approximately 6–8 story points
+4. Skill presents per-story estimates followed by sprint total
+5. No files are written
+
+**Assertions:**
+- [ ] Each story receives its own estimate label
+- [ ] Sprint total is presented after individual estimates
+- [ ] Total is a sum range derived from individual ranges
+- [ ] Skill handles sprint files (not just single story files) as input
+
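The sprint-total step can be sketched as a sum of per-story point ranges. The calibration table below is an assumption chosen to reproduce this fixture's numbers; in the skill, calibration comes from sprint history, not a fixed mapping.

```python
# Assumed size-to-points calibration (illustrative only)
POINT_RANGES = {"S": (1, 1), "M": (1, 2), "L": (3, 3), "XL": (5, 8)}

def sprint_total(story_labels):
    """Sum the low and high bounds of each story's point range."""
    low = sum(POINT_RANGES[label][0] for label in story_labels)
    high = sum(POINT_RANGES[label][1] for label in story_labels)
    return low, high
```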
+---
+
+### Case 5: Gate Compliance — No gate; estimates are informational
+
+**Fixture:**
+- Story file exists with medium complexity
+- `review-mode.txt` contains `full`
+
+**Input:** `/estimate production/epics/core/story-item-pickup.md`
+
+**Expected behavior:**
+1. Skill reads story and sprint history; computes estimate
+2. No director gate is invoked in any review mode
+3. Estimate is presented as advisory output only
+4. Skill notes: "Use this estimate in /sprint-plan when selecting stories for the next sprint"
+
+**Assertions:**
+- [ ] No director gate is invoked regardless of review mode
+- [ ] Output is purely informational — no approval or write prompt
+- [ ] Next-step recommendation references `/sprint-plan`
+- [ ] Estimate does not change based on review mode
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads story file before estimating
+- [ ] Reads sprint velocity history when available
+- [ ] Produces effort range (S/M/L/XL), not a single number
+- [ ] Does not write any files
+- [ ] No director gates are invoked
+- [ ] Always produces an estimate (never blocked by missing data; uses defaults instead)
+
+---
+
+## Coverage Notes
+
+- The skill does not produce PASS/FAIL verdicts; the "verdict" here is the
+ effort range itself. Test assertions focus on the accuracy of the range
+ and the quality of the reasoning, not a binary outcome.
+- Team-specific velocity calibration (what "M" means for this team) is an
+ implementation detail not tested here; it is configured via sprint history.
diff --git a/CCGS Skill Testing Framework/skills/analysis/perf-profile.md b/CCGS Skill Testing Framework/skills/analysis/perf-profile.md
new file mode 100644
index 0000000..171c526
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/analysis/perf-profile.md
@@ -0,0 +1,171 @@
+# Skill Test Spec: /perf-profile
+
+## Skill Summary
+
+`/perf-profile` is a structured performance profiling workflow that identifies
+bottlenecks and recommends optimizations. If profiler data or performance logs
+are provided, it analyzes them directly. If not, it guides the user through a
+manual profiling checklist. No director gates are invoked. The skill asks
+"May I write to `production/qa/perf-[date].md`?" before persisting a report.
+Verdicts: WITHIN BUDGET, CONCERNS, or OVER BUDGET.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: WITHIN BUDGET, CONCERNS, OVER BUDGET
+- [ ] Contains "May I write" language (skill writes perf report)
+- [ ] Has a next-step handoff (what to do after performance findings are reviewed)
+
+---
+
+## Director Gate Checks
+
+None. Performance profiling is an advisory analysis skill; no gates are invoked.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Frame data provided, draw call spike found
+
+**Fixture:**
+- User provides `production/qa/profiler-export-2026-03-15.json` with frame time data
+- Data shows: average frame time 14ms (within 16.6ms budget), but frames 42–48 spike to 28ms
+- Spike correlates with a scene with 450 draw calls (budget: 200)
+
+**Input:** `/perf-profile production/qa/profiler-export-2026-03-15.json`
+
+**Expected behavior:**
+1. Skill reads profiler data
+2. Skill identifies average frame time is within budget
+3. Skill identifies draw call spike on frames 42–48 (450 calls vs 200 budget)
+4. Verdict is CONCERNS (average OK, but spikes indicate an issue)
+5. Skill recommends batching or culling for the identified scene
+6. Skill asks "May I write to `production/qa/perf-2026-04-06.md`?"
+
+**Assertions:**
+- [ ] Spike frames are identified by frame number
+- [ ] Draw call count and budget are compared explicitly
+- [ ] Verdict is CONCERNS when spikes exceed budget even if average is OK
+- [ ] At least one specific optimization recommendation is given
+- [ ] "May I write" prompt appears before writing report
+
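The frame-time classification behind this verdict can be sketched as follows. The 50% threshold separating a systemic OVER BUDGET result from spike-driven CONCERNS is an assumption for illustration; the budget itself would come from `technical-preferences.md`.

```python
def frame_verdict(frame_times_ms, budget_ms):
    """Classify a capture: WITHIN BUDGET, CONCERNS (isolated spikes), OVER BUDGET."""
    over = [i for i, t in enumerate(frame_times_ms) if t > budget_ms]
    if not over:
        return "WITHIN BUDGET", over
    if len(over) / len(frame_times_ms) > 0.5:
        return "OVER BUDGET", over      # systemic: most frames exceed budget
    return "CONCERNS", over             # average may be fine, but spikes exist
```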
+---
+
+### Case 2: No Profiler Data — Manual checklist output
+
+**Fixture:**
+- User runs `/perf-profile` with no arguments
+- No profiler data files exist in `production/qa/`
+
+**Input:** `/perf-profile`
+
+**Expected behavior:**
+1. Skill finds no profiler data
+2. Skill outputs a manual profiling checklist for the user to work through:
+ - Enable Godot profiler or target engine's profiler
+ - Record a 60-second play session
+ - Export frame time data
+ - Note any dropped frames or hitches
+3. Skill asks user to provide data once collected before running analysis
+
+**Assertions:**
+- [ ] Skill does not crash or emit a verdict when no data is provided
+- [ ] Manual profiling checklist is output (actionable steps, not just an error)
+- [ ] No verdict is emitted (there is nothing to assess yet)
+- [ ] No files are written
+
+---
+
+### Case 3: Over Budget — Frame budget exceeded for target platform
+
+**Fixture:**
+- Profiler data shows consistent 22ms frame times (target: 16.6ms for 60fps)
+- All frames exceed budget; no single spike — systemic issue
+- `technical-preferences.md` specifies target platform: PC, 60fps
+
+**Input:** `/perf-profile production/qa/profiler-export-2026-03-20.json`
+
+**Expected behavior:**
+1. Skill reads profiler data and technical preferences for performance budget
+2. All frames are over the 16.6ms budget
+3. Verdict is OVER BUDGET
+4. Skill outputs a prioritized optimization list (e.g., LOD system, shader complexity, physics tick rate)
+5. Skill asks "May I write" before writing report
+
+**Assertions:**
+- [ ] Verdict is OVER BUDGET when all or most frames exceed budget
+- [ ] Target frame budget is read from `technical-preferences.md` (not hardcoded)
+- [ ] Optimization priority list is provided, not just the raw verdict
+- [ ] "May I write" prompt appears before report write
+
+---
+
+### Case 4: Previous Perf Report Exists — Delta comparison
+
+**Fixture:**
+- `production/qa/perf-2026-03-28.md` exists with prior results (avg 15ms, max 19ms)
+- New profiler export shows: avg 13ms, max 17ms
+- Both reports are for the same scene
+
+**Input:** `/perf-profile production/qa/profiler-export-2026-04-05.json`
+
+**Expected behavior:**
+1. Skill reads new profiler data
+2. Skill detects prior report for the same scene
+3. Skill computes deltas: avg improved 2ms, max improved 2ms
+4. Skill presents regression check: no regressions detected
+5. Verdict is WITHIN BUDGET; report notes improvement since last profile
+
+**Assertions:**
+- [ ] Skill checks `production/qa/` for prior perf reports before writing
+- [ ] Delta comparison is shown (prior vs. current for key metrics)
+- [ ] Verdict is WITHIN BUDGET when current metrics are within budget
+- [ ] Improvement trend is noted positively in the report
+
+---
+
+### Case 5: Gate Compliance — No gate; performance-analyst separate
+
+**Fixture:**
+- Profiler data shows CONCERNS-level findings (some spikes)
+- `review-mode.txt` contains `full`
+
+**Input:** `/perf-profile production/qa/profiler-export-2026-04-01.json`
+
+**Expected behavior:**
+1. Skill analyzes profiler data; verdict is CONCERNS
+2. No director gate is invoked regardless of review mode
+3. Output notes: "For in-depth analysis, consider running `/perf-profile` with the performance-analyst agent"
+4. Skill asks "May I write" and writes report on user approval
+
+**Assertions:**
+- [ ] No director gate is invoked in any review mode
+- [ ] Performance-analyst consultation is suggested (not mandated)
+- [ ] "May I write" prompt appears before report write
+- [ ] Verdict is CONCERNS for spike-based findings
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads profiler data when provided; outputs checklist when not
+- [ ] Reads `technical-preferences.md` for target platform frame budget
+- [ ] Checks for prior perf reports to enable delta comparison
+- [ ] Always asks "May I write" before writing report
+- [ ] No director gates are invoked
+- [ ] Verdict is one of: WITHIN BUDGET, CONCERNS, OVER BUDGET
+
+---
+
+## Coverage Notes
+
+- Platform-specific profiling workflows (console, mobile) are not tested here;
+ the checklist output in Case 2 would be platform-specific in practice.
+- The delta comparison in Case 4 assumes reports cover the same scene; cross-scene
+ comparisons are not explicitly handled.
diff --git a/CCGS Skill Testing Framework/skills/analysis/scope-check.md b/CCGS Skill Testing Framework/skills/analysis/scope-check.md
new file mode 100644
index 0000000..79cf229
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/analysis/scope-check.md
@@ -0,0 +1,168 @@
+# Skill Test Spec: /scope-check
+
+## Skill Summary
+
+`/scope-check` is a Haiku-tier read-only skill that analyzes a feature, sprint,
+or story for scope creep risk. It reads sprint and story files and compares them
+against the active milestone goals. It is designed for fast, low-cost checks
+before or during planning. No director gates are invoked. No files are written.
+Verdicts: ON SCOPE, CONCERNS, or SCOPE CREEP DETECTED.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: ON SCOPE, CONCERNS, SCOPE CREEP DETECTED
+- [ ] Does NOT require "May I write" language (read-only skill)
+- [ ] Has a next-step handoff (what to do based on verdict)
+
+---
+
+## Director Gate Checks
+
+None. Scope check is a read-only advisory skill; no gates are invoked.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Sprint stories align with milestone goals
+
+**Fixture:**
+- `production/milestones/milestone-03.md` lists 3 goals: combat system, enemy AI, level loading
+- `production/sprints/sprint-006.md` contains 5 stories, all tagged to one of the 3 goals
+- `production/session-state/active.md` references milestone-03 as the active milestone
+
+**Input:** `/scope-check`
+
+**Expected behavior:**
+1. Skill reads active milestone goals from milestone-03
+2. Skill reads sprint-006 stories and checks each against milestone goals
+3. All 5 stories map to one of the 3 goals
+4. Skill outputs a mapping table: story → milestone goal
+5. Verdict is ON SCOPE
+
+**Assertions:**
+- [ ] Each story is mapped to a milestone goal in the output
+- [ ] Verdict is ON SCOPE when all stories map to milestone goals
+- [ ] No files are written
+- [ ] Skill does not modify sprint or milestone files
+
+---
+
+### Case 2: Scope Creep Detected — Stories introducing systems not in milestone
+
+**Fixture:**
+- `production/milestones/milestone-03.md` goals: combat, enemy AI, level loading
+- `production/sprints/sprint-006.md` contains 5 stories:
+ - 3 stories map to milestone goals
+ - 2 stories reference "online leaderboard" and "achievement system" (not in milestone-03)
+
+**Input:** `/scope-check`
+
+**Expected behavior:**
+1. Skill reads milestone goals and sprint stories
+2. Skill identifies 2 stories with no matching milestone goal
+3. Skill names the out-of-scope stories: "Online Leaderboard Feature", "Achievement System Setup"
+4. Verdict is SCOPE CREEP DETECTED
+
+**Assertions:**
+- [ ] Out-of-scope stories are named explicitly in the output
+- [ ] Verdict is SCOPE CREEP DETECTED when any story has no milestone goal match
+- [ ] Skill does not automatically remove the stories — findings are advisory
+- [ ] Output recommends deferring the out-of-scope stories to a later milestone
+
+---
+
+### Case 3: No Milestone Defined — CONCERNS; scope cannot be validated
+
+**Fixture:**
+- `production/session-state/active.md` has no milestone reference
+- `production/milestones/` directory exists but is empty
+- `production/sprints/sprint-006.md` has 4 stories
+
+**Input:** `/scope-check`
+
+**Expected behavior:**
+1. Skill reads active.md — finds no milestone reference
+2. Skill checks `production/milestones/` — no milestone files found
+3. Skill outputs: "No active milestone defined — scope cannot be validated"
+4. Verdict is CONCERNS
+
+**Assertions:**
+- [ ] Skill does not error when no milestone is defined
+- [ ] Output explicitly states that scope validation requires a milestone reference
+- [ ] Verdict is CONCERNS (not ON SCOPE or SCOPE CREEP DETECTED without data)
+- [ ] Output suggests running `/milestone-review` or creating a milestone
+
+---
+
+### Case 4: Single Story Check — Evaluated against its parent epic
+
+**Fixture:**
+- User targets a single story: `production/epics/combat/story-parry-timing.md`
+- Story references parent epic: `epic-combat.md`
+- `production/epics/combat/epic-combat.md` has scope: "melee combat mechanics"
+- Story title: "Implement parry timing window" — matches epic scope
+
+**Input:** `/scope-check production/epics/combat/story-parry-timing.md`
+
+**Expected behavior:**
+1. Skill reads the specified story file
+2. Skill reads the parent epic to get scope definition
+3. Skill evaluates story against epic scope — "parry timing" matches "melee combat"
+4. Verdict is ON SCOPE
+
+**Assertions:**
+- [ ] Single-file argument is accepted (story path, not sprint)
+- [ ] Skill reads the parent epic referenced in the story file
+- [ ] Story is evaluated against epic scope (not milestone scope) in single-story mode
+- [ ] Verdict is ON SCOPE when story matches epic scope
+
+---
+
+### Case 5: Gate Compliance — No gate; PR may be consulted separately
+
+**Fixture:**
+- Sprint has 2 SCOPE CREEP stories and 3 ON SCOPE stories
+- `review-mode.txt` contains `full`
+
+**Input:** `/scope-check`
+
+**Expected behavior:**
+1. Skill reads milestone and sprint; identifies 2 scope creep items
+2. No director gate is invoked regardless of review mode
+3. Skill presents findings with SCOPE CREEP DETECTED verdict
+4. Output notes: "Consider raising scope concerns with the Producer before sprint begins"
+5. Skill ends without writing any files
+
+**Assertions:**
+- [ ] No director gate is invoked in any review mode
+- [ ] Producer consultation is suggested (not mandated)
+- [ ] No files are written
+- [ ] Verdict is SCOPE CREEP DETECTED
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads milestone goals and sprint/story files before analysis
+- [ ] Maps each story to a milestone goal (or flags as unmapped)
+- [ ] Does not write any files
+- [ ] No director gates are invoked
+- [ ] Runs on Haiku model tier (fast, low-cost)
+- [ ] Verdict is one of: ON SCOPE, CONCERNS, SCOPE CREEP DETECTED
+
+---
+
+## Coverage Notes
+
+- The case where the sprint file itself does not exist is not tested; the
+ skill would output a CONCERNS verdict with a message about missing sprint data.
+- Partial scope overlap (story touches a milestone goal but also introduces
+ new scope) is not explicitly tested; implementation may classify this as
+ CONCERNS rather than SCOPE CREEP DETECTED.
diff --git a/CCGS Skill Testing Framework/skills/analysis/security-audit.md b/CCGS Skill Testing Framework/skills/analysis/security-audit.md
new file mode 100644
index 0000000..1dcb85f
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/analysis/security-audit.md
@@ -0,0 +1,167 @@
+# Skill Test Spec: /security-audit
+
+## Skill Summary
+
+`/security-audit` audits the game for security risks, including save data
+integrity, network communication, anti-cheat exposure, and data privacy. It
+scans source files in `src/` for insecure patterns and checks whether sensitive
+data is handled correctly. No director gates are invoked. The skill does not
+write files (findings report only). Verdicts: SECURE, CONCERNS, or
+VULNERABILITIES FOUND.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: SECURE, CONCERNS, VULNERABILITIES FOUND
+- [ ] Does NOT require "May I write" language (read-only; findings report only)
+- [ ] Has a next-step handoff (what to do with findings)
+
+---
+
+## Director Gate Checks
+
+None. Security audit is a read-only advisory skill; no gates are invoked.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Save data encrypted, no hardcoded credentials
+
+**Fixture:**
+- `src/core/save_system.gd` uses `Crypto` class to encrypt save data before writing
+- No hardcoded API keys, passwords, or credentials in any `src/` file
+- No version numbers or internal build IDs exposed in client-facing output
+
+**Input:** `/security-audit`
+
+**Expected behavior:**
+1. Skill scans `src/` for security patterns: encryption usage, hardcoded credentials, exposed internals
+2. All checks pass: save data encrypted, no credentials found, no exposed internals
+3. Findings report shows all checks PASS
+4. Verdict is SECURE
+
+**Assertions:**
+- [ ] Skill checks save data handling for encryption usage
+- [ ] Skill scans for hardcoded credentials (API keys, passwords, tokens)
+- [ ] Skill checks for version/build numbers exposed to players
+- [ ] All checks shown in findings report
+- [ ] Verdict is SECURE when all checks pass
+
+---
+
+### Case 2: Vulnerabilities Found — Unencrypted save data and exposed version
+
+**Fixture:**
+- `src/core/save_system.gd` writes save data as plain JSON (no encryption)
+- `src/ui/debug_overlay.gd` contains: `label.text = "Build: " + ProjectSettings.get("application/config/version")`
+ (exposes internal build version to player)
+
+**Input:** `/security-audit`
+
+**Expected behavior:**
+1. Skill scans `src/` — finds unencrypted save write in `save_system.gd`
+2. Skill finds exposed version string in `debug_overlay.gd`
+3. Both findings are flagged as VULNERABILITIES
+4. Verdict is VULNERABILITIES FOUND
+5. Skill provides remediation recommendations for each vulnerability
+
+**Assertions:**
+- [ ] Unencrypted save data is flagged as a vulnerability with file and approximate line
+- [ ] Exposed version string is flagged as a vulnerability
+- [ ] Remediation suggestion is given for each vulnerability
+- [ ] Verdict is VULNERABILITIES FOUND when any vulnerability is detected
+- [ ] No files are written or modified
+
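+A remediation along the lines the audit suggests for the unencrypted save can
+be sketched as follows. This is a hypothetical illustration, not the skill's
+mandated fix: ECB mode and zero-padding are used only for brevity, and a real
+implementation should prefer an IV-based mode and proper key storage.
+
+```gdscript
+# Hypothetical sketch: encrypt save JSON before writing (Godot 4 AESContext).
+var aes := AESContext.new()
+
+func save_encrypted(data: Dictionary, key: PackedByteArray, path: String) -> void:
+    var plain := JSON.stringify(data).to_utf8_buffer()
+    while plain.size() % 16 != 0:
+        plain.append(0)  # pad to the AES 16-byte block size (simplified)
+    aes.start(AESContext.MODE_ECB_ENCRYPT, key)
+    var cipher := aes.update(plain)
+    aes.finish()
+    FileAccess.open(path, FileAccess.WRITE).store_buffer(cipher)
+```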
+---
+
+### Case 3: Online Features Without Authentication — CONCERNS
+
+**Fixture:**
+- `src/networking/lobby.gd` exists with functions: `join_lobby()`, `send_chat()`
+- No authentication check is found before `send_chat()` — players can call it without being verified
+- Game has online multiplayer features (inferred from file presence)
+
+**Input:** `/security-audit`
+
+**Expected behavior:**
+1. Skill scans `src/networking/` — detects online feature code
+2. Skill checks for authentication guard before network calls — finds none on `send_chat()`
+3. Flags: "Online feature without authentication check — CONCERNS"
+4. Verdict is CONCERNS (not VULNERABILITIES FOUND, as this is a missing control, not an exploit)
+
+**Assertions:**
+- [ ] Skill detects online features by scanning for networking source files
+- [ ] Missing authentication checks before network operations are flagged
+- [ ] Verdict is CONCERNS (advisory severity) for missing authentication guards
+- [ ] Output recommends adding authentication before network calls
+
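+The missing control in Case 3 can be illustrated with a hypothetical guard
+(`session` and `_broadcast_chat` are assumed names, not part of the fixture):
+
+```gdscript
+# Hypothetical authentication guard before a network operation.
+func send_chat(message: String) -> void:
+    if not session.is_authenticated():
+        push_warning("send_chat rejected: unauthenticated peer")
+        return
+    _broadcast_chat(message)
+```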
+---
+
+### Case 4: Edge Case — No Source Files to Analyze
+
+**Fixture:**
+- `src/` directory does not exist or is completely empty
+
+**Input:** `/security-audit`
+
+**Expected behavior:**
+1. Skill attempts to scan `src/` — no files found
+2. Skill outputs an error: "No source files found in `src/` — nothing to audit"
+3. No findings report is generated
+4. No verdict is emitted
+
+**Assertions:**
+- [ ] Skill does not crash when `src/` is empty or absent
+- [ ] Output clearly states that no source files were found
+- [ ] No verdict is emitted (there is nothing to assess)
+- [ ] Skill suggests verifying the `src/` directory path
+
+---
+
+### Case 5: Gate Compliance — No gate; security-engineer invoked separately
+
+**Fixture:**
+- Source files exist; 1 CONCERNS-level finding detected (debug logging enabled in release build)
+- `review-mode.txt` contains `full`
+
+**Input:** `/security-audit`
+
+**Expected behavior:**
+1. Skill scans source; finds debug logging active in release path
+2. No director gate is invoked regardless of review mode
+3. Verdict is CONCERNS
+4. Output notes: "For formal security review, consider engaging a security-engineer agent"
+5. Findings are presented as a read-only report; no files written
+
+**Assertions:**
+- [ ] No director gate is invoked in any review mode
+- [ ] Security-engineer consultation is suggested (not mandated)
+- [ ] No files are written
+- [ ] Verdict is CONCERNS for advisory-level security findings
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads source files in `src/` before auditing
+- [ ] Checks save data encryption, hardcoded credentials, exposed internals, auth guards
+- [ ] Provides remediation recommendations for each finding
+- [ ] Does not write any files (read-only skill)
+- [ ] No director gates are invoked
+- [ ] Verdict is one of: SECURE, CONCERNS, VULNERABILITIES FOUND
+
+---
+
+## Coverage Notes
+
+- Anti-cheat analysis (client-side value validation, server authority) is not
+ explicitly tested here; it follows the CONCERNS or VULNERABILITIES pattern
+ depending on severity.
+- Data privacy compliance (GDPR, COPPA) is out of scope for this spec; those
+ require legal review beyond code scanning.
diff --git a/CCGS Skill Testing Framework/skills/analysis/tech-debt.md b/CCGS Skill Testing Framework/skills/analysis/tech-debt.md
new file mode 100644
index 0000000..d8caff2
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/analysis/tech-debt.md
@@ -0,0 +1,171 @@
+# Skill Test Spec: /tech-debt
+
+## Skill Summary
+
+`/tech-debt` tracks, categorizes, and prioritizes technical debt across the
+codebase. It reads `docs/tech-debt-register.md` for the existing debt register
+and scans source files in `src/` for inline `TODO` and `FIXME` comments. It
+merges and sorts items by severity. No director gates are invoked. The skill
+asks "May I write to `docs/tech-debt-register.md`?" before updating. Verdicts:
+REGISTER UPDATED or NO NEW DEBT FOUND.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: REGISTER UPDATED, NO NEW DEBT FOUND
+- [ ] Contains "May I write" language (skill writes to debt register)
+- [ ] Has a next-step handoff (what to do after register is updated)
+
+---
+
+## Director Gate Checks
+
+None. Tech debt tracking is an internal codebase analysis skill; no gates are
+invoked.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Inline TODOs plus existing register items merged
+
+**Fixture:**
+- `docs/tech-debt-register.md` exists with 2 items (LOW and MEDIUM severity)
+- `src/gameplay/combat.gd` has 2 `# TODO` comments and 1 `# FIXME` comment
+- `src/ui/hud.gd` has 0 inline debt comments
+
+**Input:** `/tech-debt`
+
+**Expected behavior:**
+1. Skill reads `docs/tech-debt-register.md` — finds 2 existing items
+2. Skill scans `src/` — finds 3 inline comments (2 TODOs, 1 FIXME)
+3. Skill checks whether inline comments already exist in the register (deduplication)
+4. Skill presents combined list sorted by severity (FIXME before TODO by default)
+5. Skill asks "May I write to `docs/tech-debt-register.md`?"
+6. User approves; register updated; verdict REGISTER UPDATED
+
+**Assertions:**
+- [ ] Inline comments are found by scanning `src/` recursively
+- [ ] Existing register items are not duplicated
+- [ ] Combined list is sorted by severity
+- [ ] "May I write" prompt appears before any write
+- [ ] Verdict is REGISTER UPDATED
+
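+The merged, severity-sorted register described above might take a shape like
+this (item names, sources, and severities are illustrative only):
+
+```
+| Severity | Item                         | Source                   | Status |
+|----------|------------------------------|--------------------------|--------|
+| HIGH     | FIXME: crit ignores armor    | src/gameplay/combat.gd   | Open   |
+| MEDIUM   | Input remap lacks validation | (register, pre-existing) | Open   |
+| LOW      | TODO: extract HUD constants  | src/gameplay/combat.gd   | Open   |
+```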
+---
+
+### Case 2: Register Doesn't Exist — Offered to create it
+
+**Fixture:**
+- `docs/tech-debt-register.md` does NOT exist
+- `src/` contains 4 inline TODO/FIXME comments
+
+**Input:** `/tech-debt`
+
+**Expected behavior:**
+1. Skill attempts to read `docs/tech-debt-register.md` — not found
+2. Skill informs user: "No tech-debt-register.md found"
+3. Skill offers to create the register with the inline items it found
+4. Skill asks "May I write to `docs/tech-debt-register.md`?" (create)
+5. User approves; register created with 4 items; verdict REGISTER UPDATED
+
+**Assertions:**
+- [ ] Skill does not crash when register file is absent
+- [ ] User is offered register creation (not silently skipping)
+- [ ] "May I write" prompt reflects file creation (not update)
+- [ ] Verdict is REGISTER UPDATED after creation
+
+---
+
+### Case 3: Resolved Item Detected — Marked resolved in register
+
+**Fixture:**
+- `docs/tech-debt-register.md` has 3 items; one references `src/gameplay/legacy_input.gd`
+- `src/gameplay/legacy_input.gd` has been deleted (refactored away)
+- The referenced TODO comment no longer exists in source
+
+**Input:** `/tech-debt`
+
+**Expected behavior:**
+1. Skill reads register — finds 3 items
+2. Skill scans `src/` — does not find the source location referenced by item 2
+3. Skill flags item 2 as RESOLVED (source is gone)
+4. Skill presents the resolved item to user for confirmation
+5. On approval, register is updated with item 2 marked `Status: Resolved`
+
+**Assertions:**
+- [ ] Skill checks whether each register item's source reference still exists
+- [ ] Missing source locations result in items being flagged as RESOLVED
+- [ ] User confirms before resolved items are written
+- [ ] RESOLVED items are kept in the register (not deleted) for audit history
+
+---
+
+### Case 4: Edge Case — CRITICAL debt item surfaces prominently
+
+**Fixture:**
+- `src/core/network_sync.gd` has a comment: `# FIXME(CRITICAL): race condition in sync buffer — can corrupt save data`
+- `docs/tech-debt-register.md` exists with 5 lower-severity items
+
+**Input:** `/tech-debt`
+
+**Expected behavior:**
+1. Skill scans source and finds the CRITICAL-tagged FIXME
+2. Skill presents the CRITICAL item at the top of the output — before the full table
+3. Skill asks user to acknowledge the critical item before proceeding
+4. After acknowledgment, skill presents full debt table and asks to write
+5. Register is updated with CRITICAL item at top; verdict REGISTER UPDATED
+
+**Assertions:**
+- [ ] CRITICAL items appear at the top of the output, not buried in the table
+- [ ] Skill surfaces CRITICAL items before asking to write
+- [ ] User acknowledgment of the CRITICAL item is requested
+- [ ] CRITICAL severity is preserved in the written register entry
+
+---
+
+### Case 5: Gate Compliance — No gate; register updated only with approval
+
+**Fixture:**
+- Inline scan finds 2 new TODOs; register has 3 existing items
+- `review-mode.txt` contains `full`
+
+**Input:** `/tech-debt`
+
+**Expected behavior:**
+1. Skill scans source and reads register; compiles combined debt list
+2. No director gate is invoked regardless of review mode
+3. Skill presents sorted debt table to user
+4. Skill asks "May I write to `docs/tech-debt-register.md`?"
+5. User approves; register updated; verdict REGISTER UPDATED
+
+**Assertions:**
+- [ ] No director gate is invoked in any review mode
+- [ ] Debt table is presented before any write prompt
+- [ ] "May I write" prompt appears before file update
+- [ ] Write only occurs with explicit user approval
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads `docs/tech-debt-register.md` and scans `src/` before compiling
+- [ ] Deduplicates inline comments against existing register items
+- [ ] Sorts combined list by severity
+- [ ] Always asks "May I write" before updating register
+- [ ] No director gates are invoked
+- [ ] Verdict is REGISTER UPDATED or NO NEW DEBT FOUND
+
+---
+
+## Coverage Notes
+
+- The case where `src/` is empty or absent is not tested; behavior follows
+ the NO NEW DEBT FOUND path for the inline scan, but register items would
+ still be read and presented.
+- TODO comments without severity tags are treated as LOW severity by default;
+ this classification detail is an implementation concern, not tested here.
diff --git a/CCGS Skill Testing Framework/skills/analysis/test-evidence-review.md b/CCGS Skill Testing Framework/skills/analysis/test-evidence-review.md
new file mode 100644
index 0000000..2cfdad3
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/analysis/test-evidence-review.md
@@ -0,0 +1,175 @@
+# Skill Test Spec: /test-evidence-review
+
+## Skill Summary
+
+`/test-evidence-review` performs a quality review of test files in `tests/`,
+checking test naming conventions, Arrange/Act/Assert structure, determinism,
+isolation, and absence of hardcoded magic numbers — all against the project's
+test standards defined in `coding-standards.md`. Findings may be flagged for
+qa-lead review. No director
+gates are invoked. The skill does not write without user approval. Verdicts:
+PASS, WARNINGS, or FAIL.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: PASS, WARNINGS, FAIL
+- [ ] Does NOT require "May I write" language (read-only; write is optional flagging report)
+- [ ] Has a next-step handoff (what to do after findings are reviewed)
+
+---
+
+## Director Gate Checks
+
+None. Test evidence review is an advisory quality skill; the QL-TEST-COVERAGE
+gate is a separate skill invocation and is NOT triggered here.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Tests follow all standards
+
+**Fixture:**
+- `tests/unit/combat/health_system_take_damage_test.gd` exists with:
+ - Naming: `test_health_system_take_damage_reduces_health()` (follows `test_[system]_[scenario]_[expected]`)
+ - Arrange/Act/Assert structure present
+ - No `sleep()`, `await` with time values, or random seeds
+ - No calls to external APIs or file I/O
+ - No inline magic numbers (uses constants from `tests/unit/combat/fixtures/`)
+
+**Input:** `/test-evidence-review tests/unit/combat/`
+
+**Expected behavior:**
+1. Skill reads test standards from `coding-standards.md`
+2. Skill reads the test file; checks all 5 standards
+3. All checks pass: naming, structure, determinism, isolation, no hardcoded data
+4. Verdict is PASS
+
+**Assertions:**
+- [ ] Each of the 5 test standards is checked and reported
+- [ ] All checks show PASS when standards are met
+- [ ] Verdict is PASS
+- [ ] No files are written
+
+---
+
+### Case 2: Fail — Timing dependency detected
+
+**Fixture:**
+- `tests/unit/ui/hud_update_test.gd` contains:
+ ```gdscript
+ await get_tree().create_timer(1.0).timeout
+ assert_eq(label.text, "Ready")
+ ```
+- A real-time wait of 1 second is used instead of a mock or a signal-based assertion
+
+**Input:** `/test-evidence-review tests/unit/ui/hud_update_test.gd`
+
+**Expected behavior:**
+1. Skill reads the test file
+2. Skill detects real-time wait (`create_timer(1.0)`) — non-deterministic timing dependency
+3. Skill flags this as a FAIL-level finding
+4. Verdict is FAIL
+5. Skill recommends replacing the timer with a signal-based assertion or mock
+
+**Assertions:**
+- [ ] Real-time wait usage is detected as a non-deterministic timing dependency
+- [ ] Finding is classified as FAIL severity (blocking — violates determinism standard)
+- [ ] Verdict is FAIL
+- [ ] Remediation suggestion references signal-based or mock-based approach
+- [ ] Skill does not edit the test file
+
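+The recommended signal-based rewrite can be sketched as follows
+(`label_updated` and `game_state` are hypothetical names used for
+illustration; the fixture does not define them):
+
+```gdscript
+# Hypothetical deterministic rewrite: await the event, not wall-clock time.
+func test_hud_shows_ready_when_state_changes() -> void:
+    var hud := preload("res://src/ui/hud.tscn").instantiate()
+    add_child(hud)
+    game_state.ready_changed.emit()  # drive the state change directly
+    await hud.label_updated          # resolves when the HUD updates its label
+    assert_eq(hud.label.text, "Ready")
+```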
+---
+
+### Case 3: Fail — Test calls external API directly
+
+**Fixture:**
+- `tests/unit/networking/auth_test.gd` contains:
+ ```gdscript
+ var result = HTTPRequest.new().request("https://api.example.com/auth")
+ ```
+- Direct HTTP call to external API without a mock
+
+**Input:** `/test-evidence-review tests/unit/networking/auth_test.gd`
+
+**Expected behavior:**
+1. Skill reads the test file
+2. Skill detects direct external API call (HTTPRequest to live URL)
+3. Skill flags this as a FAIL-level finding — violates isolation standard
+4. Verdict is FAIL
+5. Skill recommends injecting a mock HTTP client
+
+**Assertions:**
+- [ ] Direct external API call is detected and flagged
+- [ ] Finding is classified as FAIL severity (violates isolation standard)
+- [ ] Verdict is FAIL
+- [ ] Remediation references dependency injection with a mock HTTP client
+- [ ] Skill does not modify the test file
+
+---
+
+### Case 4: Edge Case — No Test Files Found
+
+**Fixture:**
+- User calls `/test-evidence-review tests/unit/audio/`
+- `tests/unit/audio/` directory does not exist
+
+**Input:** `/test-evidence-review tests/unit/audio/`
+
+**Expected behavior:**
+1. Skill attempts to read files in `tests/unit/audio/` — not found
+2. Skill outputs: "No test files found at `tests/unit/audio/` — run `/test-setup` to scaffold test directories"
+3. No verdict is emitted
+
+**Assertions:**
+- [ ] Skill does not crash when path does not exist
+- [ ] Output names the attempted path in the message
+- [ ] Output recommends `/test-setup` for scaffolding
+- [ ] No verdict is emitted when there is nothing to review
+
+---
+
+### Case 5: Gate Compliance — No gate; QL-TEST-COVERAGE is a separate skill
+
+**Fixture:**
+- Test file has 1 WARNINGS-level finding (magic number in a non-boundary test)
+- `review-mode.txt` contains `full`
+
+**Input:** `/test-evidence-review tests/unit/combat/`
+
+**Expected behavior:**
+1. Skill reviews tests; finds 1 WARNINGS-level finding
+2. No director gate is invoked (QL-TEST-COVERAGE is invoked separately, not here)
+3. Verdict is WARNINGS
+4. Output notes: "For full test coverage gate, run `/gate-check` which invokes QL-TEST-COVERAGE"
+5. Skill offers optional report write; asks "May I write" if user opts in
+
+**Assertions:**
+- [ ] No director gate is invoked in any review mode
+- [ ] Output distinguishes this skill from the QL-TEST-COVERAGE gate invocation
+- [ ] Optional report requires "May I write" before writing
+- [ ] Verdict is WARNINGS for advisory-level test quality issues
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads `coding-standards.md` test standards before reviewing test files
+- [ ] Checks naming, Arrange/Act/Assert structure, determinism, isolation, no hardcoded data
+- [ ] Does not edit any test files (read-only skill)
+- [ ] No director gates are invoked
+- [ ] Verdict is one of: PASS, WARNINGS, FAIL
+
+---
+
+## Coverage Notes
+
+- Batch review of all test files in `tests/` is not explicitly tested; behavior
+ is assumed to apply the same checks file by file and aggregate the verdict.
+- The QL-TEST-COVERAGE director gate (which checks test coverage percentage) is
+ a separate concern and is intentionally NOT invoked by this skill.
diff --git a/CCGS Skill Testing Framework/skills/analysis/test-flakiness.md b/CCGS Skill Testing Framework/skills/analysis/test-flakiness.md
new file mode 100644
index 0000000..0e67623
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/analysis/test-flakiness.md
@@ -0,0 +1,177 @@
+# Skill Test Spec: /test-flakiness
+
+## Skill Summary
+
+`/test-flakiness` detects non-deterministic tests by analyzing test history logs
+(if available) or scanning test source code for common flakiness patterns (random
+numbers without seeds, real-time waits, external I/O). No director gates are
+invoked. The skill does not write without user approval. Verdicts: NO FLAKINESS,
+SUSPECT TESTS FOUND, or CONFIRMED FLAKY.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: NO FLAKINESS, SUSPECT TESTS FOUND, CONFIRMED FLAKY
+- [ ] Does NOT require "May I write" language (read-only; optional report requires approval)
+- [ ] Has a next-step handoff (what to do after flakiness findings)
+
+---
+
+## Director Gate Checks
+
+None. Flakiness detection is an advisory quality skill for the QA lead; no gates
+are invoked.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Clean test history, no flakiness
+
+**Fixture:**
+- `production/qa/test-history/` contains logs for 10 test runs
+- All tests pass consistently across all 10 runs (100% pass rate per test)
+- No test has a failure pattern
+
+**Input:** `/test-flakiness`
+
+**Expected behavior:**
+1. Skill reads test history logs from `production/qa/test-history/`
+2. Skill computes per-test pass rate across 10 runs
+3. All tests pass all 10 runs — no inconsistency detected
+4. Verdict is NO FLAKINESS
+
+**Assertions:**
+- [ ] Skill reads test history logs when available
+- [ ] Per-test pass rate is computed across all available runs
+- [ ] Verdict is NO FLAKINESS when all tests pass consistently
+- [ ] No files are written
+
+---
+
+### Case 2: Suspect Tests Found — Test fails intermittently in history
+
+**Fixture:**
+- `production/qa/test-history/` contains logs for 10 test runs
+- `test_combat_damage_applies_crit_multiplier` passes 7 times, fails 3 times
+- Failure messages differ (sometimes timeout, sometimes wrong value)
+
+**Input:** `/test-flakiness`
+
+**Expected behavior:**
+1. Skill reads test history logs — computes pass rates
+2. `test_combat_damage_applies_crit_multiplier` has 70% pass rate (threshold: 95%)
+3. Skill flags it as SUSPECT with pass rate (7/10) and failure pattern noted
+4. Verdict is SUSPECT TESTS FOUND
+5. Skill recommends investigating the test for timing or state dependencies
+
+**Assertions:**
+- [ ] Tests below the pass-rate threshold are flagged by name
+- [ ] Pass rate (fraction and percentage) is shown for each suspect test
+- [ ] Failure pattern (e.g., inconsistent error messages) is noted if detectable
+- [ ] Verdict is SUSPECT TESTS FOUND
+- [ ] Skill recommends investigation steps
+
+---
+
+### Case 3: Source Pattern — Random number used without seed
+
+**Fixture:**
+- No test history logs exist
+- `tests/unit/loot/loot_drop_test.gd` contains:
+ ```gdscript
+ var roll = randf() # unseeded random — non-deterministic
+ assert_gt(roll, 0.5, "Loot should drop above 50%")
+ ```
+
+**Input:** `/test-flakiness`
+
+**Expected behavior:**
+1. Skill finds no test history logs
+2. Skill falls back to source code analysis
+3. Skill detects `randf()` call without a preceding `seed()` call
+4. Skill flags the test as FLAKINESS RISK (source pattern, not confirmed)
+5. Verdict is SUSPECT TESTS FOUND (pattern detected, not confirmed by history)
+6. Skill recommends seeding random before the call or mocking the random function
+
+**Assertions:**
+- [ ] Source code analysis is used as fallback when no history logs exist
+- [ ] Unseeded random number usage is detected as a flakiness risk
+- [ ] Verdict is SUSPECT TESTS FOUND (not CONFIRMED FLAKY — no history to confirm)
+- [ ] Remediation recommends seeding or mocking
+
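+The seeding remediation can be sketched as a deterministic variant of the
+fixture test (a hedged illustration; the exact assertion strategy is an
+implementation choice):
+
+```gdscript
+# Hypothetical deterministic rewrite: a fixed seed reproduces the same roll.
+func test_loot_roll_is_deterministic_with_seed() -> void:
+    seed(12345)
+    var first := randf()
+    seed(12345)
+    assert_eq(randf(), first, "Same seed must reproduce the same roll")
+```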
+---
+
+### Case 4: No Test History — Source-only analysis with common patterns
+
+**Fixture:**
+- `production/qa/test-history/` does not exist
+- `tests/` contains 15 test files
+- Scan finds 2 tests using `OS.get_ticks_msec()` for timing assertions
+- No other flakiness patterns found
+
+**Input:** `/test-flakiness`
+
+**Expected behavior:**
+1. Skill checks for test history — not found
+2. Skill notes: "No test history available — analyzing source code for flakiness patterns only"
+3. Skill scans all test files for known patterns: unseeded random, real-time waits, system clock usage
+4. Finds 2 tests using `OS.get_ticks_msec()` — flags as FLAKINESS RISK
+5. Verdict is SUSPECT TESTS FOUND
+
+**Assertions:**
+- [ ] Skill notes clearly that source-only analysis is being performed (no history)
+- [ ] Common flakiness patterns are scanned: random, time-based assertions, external I/O
+- [ ] `OS.get_ticks_msec()` usage for assertions is flagged as a flakiness risk
+- [ ] Verdict is SUSPECT TESTS FOUND when source patterns are found
+
+---
+
+### Case 5: Gate Compliance — No gate; flakiness report is advisory
+
+**Fixture:**
+- Test history shows 1 CONFIRMED FLAKY test (fails 6 out of 10 runs)
+- `review-mode.txt` contains `full`
+
+**Input:** `/test-flakiness`
+
+**Expected behavior:**
+1. Skill analyzes test history; identifies 1 confirmed flaky test
+2. No director gate is invoked regardless of review mode
+3. Verdict is CONFIRMED FLAKY
+4. Skill presents findings and offers optional written report
+5. If user opts in: "May I write to `production/qa/flakiness-report-[date].md`?"
+
+**Assertions:**
+- [ ] No director gate is invoked in any review mode
+- [ ] CONFIRMED FLAKY verdict requires history-based evidence (not just source patterns)
+- [ ] Optional report requires "May I write" before writing
+- [ ] Flakiness report is advisory for qa-lead; skill does not auto-disable tests
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads test history logs when available; falls back to source analysis when not
+- [ ] Notes clearly which analysis mode is being used (history vs. source-only)
+- [ ] Flakiness threshold (e.g., 95% pass rate) is used for SUSPECT classification
+- [ ] CONFIRMED FLAKY requires history evidence; SUSPECT covers source patterns only
+- [ ] Does not disable or modify any test files
+- [ ] No director gates are invoked
+- [ ] Verdict is one of: NO FLAKINESS, SUSPECT TESTS FOUND, CONFIRMED FLAKY
+
+---
+
+## Coverage Notes
+
+- The pass-rate threshold for SUSPECT classification (95% suggested above) is an
+ implementation detail; the tests verify that intermittent failures are flagged,
+ not the exact threshold value.
+- Tests that fail due to environment issues (missing assets, wrong platform) are
+ not flakiness — the skill distinguishes environment failures from non-determinism
+ in the test itself; this distinction is not explicitly tested here.
diff --git a/CCGS Skill Testing Framework/skills/authoring/architecture-decision.md b/CCGS Skill Testing Framework/skills/authoring/architecture-decision.md
new file mode 100644
index 0000000..db1cf21
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/authoring/architecture-decision.md
@@ -0,0 +1,197 @@
+# Skill Test Spec: /architecture-decision
+
+## Skill Summary
+
+`/architecture-decision` guides the user through section-by-section authoring of
+a new Architecture Decision Record (ADR). Required sections are: Status, Context,
+Decision, Consequences, Alternatives, and Related ADRs. The skill also stamps the
+engine version reference from `docs/engine-reference/` into the ADR for traceability.
+
+In `full` review mode, TD-ADR (technical-director) and LP-FEASIBILITY
+(lead-programmer) gate agents spawn after the draft is complete. If both gates
+return APPROVED, the ADR status is set to Accepted. In `lean` or `solo` mode,
+both gates are skipped and the ADR is written with Status: Proposed. The skill
+asks "May I write" per section during authoring. ADRs are written to
+`docs/architecture/adr-NNN-[name].md`.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: ACCEPTED, PROPOSED, CONCERNS
+- [ ] Contains "May I write" collaborative protocol language (per-section approval)
+- [ ] Has a next-step handoff at the end
+- [ ] Documents gate behavior: TD-ADR + LP-FEASIBILITY in full mode; skipped in lean/solo
+- [ ] Documents that ADR status is Accepted (full, gates approve) or Proposed (otherwise)
+- [ ] Mentions engine version stamp from `docs/engine-reference/`
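A rough sketch of how the frontmatter assertion could be automated. This is a hypothetical helper, assuming a plain `key: value` YAML block; `/skill-test static` may implement the check differently:

```python
import re

REQUIRED_FIELDS = ("name", "description", "argument-hint",
                   "user-invocable", "allowed-tools")

def missing_frontmatter_fields(skill_md):
    """Return the required frontmatter fields absent from a skill file."""
    match = re.match(r"---\n(.*?)\n---", skill_md, flags=re.DOTALL)
    if not match:
        return list(REQUIRED_FIELDS)  # no frontmatter block at all
    keys = {line.split(":", 1)[0].strip()
            for line in match.group(1).splitlines() if ":" in line}
    return [field for field in REQUIRED_FIELDS if field not in keys]
```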
+
+---
+
+## Director Gate Checks
+
+In `full` mode: TD-ADR (technical-director) and LP-FEASIBILITY (lead-programmer)
+spawn after the ADR draft is complete. If both return APPROVED, ADR Status is set
+to Accepted. If either returns CONCERNS or FAIL, ADR stays Proposed.
+
+In `lean` mode: both gates are skipped. ADR is written with Status: Proposed.
+Output notes: "TD-ADR skipped — lean mode" and "LP-FEASIBILITY skipped — lean mode".
+
+In `solo` mode: both gates are skipped. ADR is written with Status: Proposed.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — New ADR for rendering approach, full mode, gates approve
+
+**Fixture:**
+- `docs/architecture/` exists with no existing ADR for rendering
+- `docs/engine-reference/[engine]/VERSION.md` exists
+- `production/session-state/review-mode.txt` contains `full`
+
+**Input:** `/architecture-decision rendering-approach`
+
+**Expected behavior:**
+1. Skill guides user through each required section (Status, Context, Decision, Consequences, Alternatives, Related ADRs)
+2. Engine version is stamped into the ADR from `docs/engine-reference/`
+3. For each section: draft shown, "May I write this section?" asked, approved
+4. After all sections: TD-ADR and LP-FEASIBILITY gates spawn in parallel
+5. Both gates return APPROVED
+6. ADR Status is set to Accepted
+7. Skill writes `docs/architecture/adr-NNN-rendering-approach.md`
+8. `docs/architecture/tr-registry.yaml` updated if new TR-IDs are defined
+
+**Assertions:**
+- [ ] All 6 required sections are authored and written
+- [ ] Engine version reference is stamped in the ADR
+- [ ] TD-ADR and LP-FEASIBILITY spawn in parallel (not sequentially)
+- [ ] ADR Status is Accepted when both gates return APPROVED in full mode
+- [ ] "May I write" is asked per section during authoring
+- [ ] File is written to `docs/architecture/adr-NNN-[name].md`
+
+---
+
+### Case 2: Failure Path — TD-ADR returns CONCERNS
+
+**Fixture:**
+- ADR draft is complete (all sections filled)
+- `production/session-state/review-mode.txt` contains `full`
+- TD-ADR gate returns CONCERNS: "The decision does not address [specific concern]"
+
+**Input:** `/architecture-decision [topic]`
+
+**Expected behavior:**
+1. TD-ADR gate spawns and returns CONCERNS with specific feedback
+2. Skill surfaces the concerns to the user
+3. ADR Status remains Proposed (not Accepted)
+4. User is asked: revise the decision to address concerns, or accept as Proposed
+5. ADR is written with Status: Proposed if concerns are not resolved
+
+**Assertions:**
+- [ ] TD-ADR concerns are shown to the user verbatim
+- [ ] ADR Status is Proposed (not Accepted) when TD-ADR returns CONCERNS
+- [ ] Skill does NOT set Status: Accepted while CONCERNS are unresolved
+- [ ] User is given the option to revise and re-run the gate
+
+---
+
+### Case 3: Lean Mode — Both gates skipped; ADR written as Proposed
+
+**Fixture:**
+- `production/session-state/review-mode.txt` contains `lean`
+- ADR draft is authored for a new technical decision
+
+**Input:** `/architecture-decision [topic]`
+
+**Expected behavior:**
+1. Skill guides user through all 6 sections
+2. After draft is complete: both TD-ADR and LP-FEASIBILITY are skipped
+3. Output notes: "TD-ADR skipped — lean mode" and "LP-FEASIBILITY skipped — lean mode"
+4. ADR is written with Status: Proposed (not Accepted, since gates did not approve)
+5. "May I write" is still asked before the final file write
+
+**Assertions:**
+- [ ] Both gate skip notes appear in output
+- [ ] ADR Status is Proposed (not Accepted) in lean mode
+- [ ] "May I write" is still asked before writing the file
+- [ ] Skill writes the ADR after user approval
+
+---
+
+### Case 4: Edge Case — ADR already exists for this topic
+
+**Fixture:**
+- `docs/architecture/` contains an existing ADR covering the same topic
+- The existing ADR has Status: Accepted
+
+**Input:** `/architecture-decision [same-topic]`
+
+**Expected behavior:**
+1. Skill detects an existing ADR covering the same topic
+2. Skill asks: "An ADR for [topic] already exists ([filename]). Update it, or create a new superseding ADR?"
+3. User selects update or supersede
+4. Skill does NOT silently create a duplicate ADR
+
+**Assertions:**
+- [ ] Skill detects the existing ADR before authoring begins
+- [ ] User is offered update or supersede options — no silent duplicate
+- [ ] If update: skill opens the existing ADR for section-by-section revision
+- [ ] If supersede: new ADR references the superseded one in Related ADRs section
+
+---
+
+### Case 5: Director Gate — Status set correctly based on mode and gate outcome
+
+**Fixture:**
+- ADR draft is complete
+- Two scenarios: (a) full mode, both gates APPROVED; (b) full mode, one gate CONCERNS
+
+**Full mode, both APPROVED:**
+- ADR Status is set to Accepted
+
+**Assertions (both approved):**
+- [ ] ADR frontmatter/header shows `Status: Accepted`
+- [ ] Both TD-ADR and LP-FEASIBILITY appear as APPROVED in output
+
+**Full mode, one gate returns CONCERNS:**
+- ADR Status stays Proposed
+
+**Assertions (CONCERNS):**
+- [ ] ADR frontmatter/header shows `Status: Proposed`
+- [ ] Concerns are listed in output
+- [ ] Skill does NOT set Status: Accepted when any gate returns CONCERNS
+
+**Lean/solo mode:**
+- ADR Status is always Proposed regardless of content quality
+
+**Assertions (lean/solo):**
+- [ ] ADR Status is Proposed in lean mode
+- [ ] ADR Status is Proposed in solo mode
+- [ ] No gate verdicts appear in lean or solo mode (skip notes only)
+
+---
+
+## Protocol Compliance
+
+- [ ] All 6 required sections authored before gate review
+- [ ] Engine version stamped in ADR from `docs/engine-reference/`
+- [ ] "May I write" asked per section during authoring
+- [ ] TD-ADR and LP-FEASIBILITY spawn in parallel in full mode
+- [ ] Skipped gates noted by name and mode in lean/solo output
+- [ ] ADR Status: Accepted only when full mode AND both gates APPROVED
+- [ ] Ends with next-step handoff: `/architecture-review` or `/create-control-manifest`
+
+---
+
+## Coverage Notes
+
+- ADR numbering (auto-incrementing NNN) is not independently fixture-tested —
+ the skill reads existing ADR filenames to assign the next number.
+- Related ADRs section linking (supersedes / related-to) is tested structurally
+ via Case 4 but not all link types are individually verified.
+- The TR-registry update (when new TR-IDs are defined in the ADR) is part of the
+ write phase — tested implicitly via Case 1.
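The numbering rule in the first note (read existing filenames, assign the next NNN) can be sketched as follows. The helper name, zero-padding, and regex are assumptions, not the skill's verified implementation:

```python
import re

def next_adr_number(existing_filenames):
    """Return the next ADR number given the filenames in docs/architecture/."""
    nums = [int(m.group(1)) for name in existing_filenames
            if (m := re.match(r"adr-(\d+)-", name))]
    return max(nums, default=0) + 1
```

A caller would then format the target path as something like `f"adr-{next_adr_number(names):03d}-rendering-approach.md"`.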
diff --git a/CCGS Skill Testing Framework/skills/authoring/art-bible.md b/CCGS Skill Testing Framework/skills/authoring/art-bible.md
new file mode 100644
index 0000000..dae2efe
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/authoring/art-bible.md
@@ -0,0 +1,185 @@
+# Skill Test Spec: /art-bible
+
+## Skill Summary
+
+`/art-bible` is a guided, section-by-section art bible authoring skill. It
+produces a comprehensive visual direction document covering: Visual Style overview,
+Color Palette, Typography, Character Design Rules, Environment Style, and UI
+Visual Language. The skill follows the skeleton-first pattern: creates the file
+with all section headers immediately, then fills each section through discussion
+and writes each to disk after user approval.
+
+In `full` review mode, the AD-ART-BIBLE director gate (art director) runs after
+the draft is complete and before any section is written. In `lean` and `solo`
+modes, AD-ART-BIBLE is skipped and only user approval is required. The verdict
+is COMPLETE when all sections are written.
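The skeleton-first step can be sketched as below. The exact header strings are assumptions drawn from the section list above; the skill's template may word them differently:

```python
ART_BIBLE_SECTIONS = [
    "Visual Style",
    "Color Palette",
    "Typography",
    "Character Design Rules",
    "Environment Style",
    "UI Visual Language",
]

def skeleton(title="Art Bible"):
    """Render the initial file: a title plus one empty '## ' header per section."""
    return f"# {title}\n\n" + "\n\n".join(f"## {s}" for s in ART_BIBLE_SECTIONS) + "\n"
```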
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keyword: COMPLETE
+- [ ] Contains "May I write" language per section
+- [ ] Documents the AD-ART-BIBLE director gate and its mode behavior
+- [ ] Has a next-step handoff (e.g., `/asset-spec` or `/design-system`)
+
+---
+
+## Director Gate Checks
+
+| Gate ID | Trigger condition | Mode guard |
+|--------------|--------------------------------|-----------------------|
+| AD-ART-BIBLE | After draft is complete | full only (not lean/solo) |
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Full mode, art bible drafted, AD-ART-BIBLE approves
+
+**Fixture:**
+- No existing `design/art-bible.md`
+- `production/session-state/review-mode.txt` contains `full`
+- `design/gdd/game-concept.md` exists with visual tone described
+
+**Input:** `/art-bible`
+
+**Expected behavior:**
+1. Skill creates skeleton `design/art-bible.md` with all section headers
+2. Skill discusses and drafts each section with user collaboration
+3. After all sections are drafted, AD-ART-BIBLE gate is invoked (art director review)
+4. AD-ART-BIBLE returns APPROVED
+5. Skill asks "May I write section [N] to `design/art-bible.md`?" per section
+6. All sections written after approval; verdict is COMPLETE
+
+**Assertions:**
+- [ ] Skeleton file is created first (before any section content is written)
+- [ ] AD-ART-BIBLE gate is invoked in full mode after draft is complete
+- [ ] Gate approval precedes the "May I write" section asks
+- [ ] All sections are present in the final file
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 2: AD-ART-BIBLE Returns CONCERNS — Section revised before writing
+
+**Fixture:**
+- Art bible draft complete
+- `production/session-state/review-mode.txt` contains `full`
+- AD-ART-BIBLE gate returns CONCERNS: "Color palette clashes with the dark
+ atmospheric tone described in the game concept"
+
+**Input:** `/art-bible`
+
+**Expected behavior:**
+1. AD-ART-BIBLE gate returns CONCERNS with specific feedback about palette
+2. Skill surfaces feedback to user: "Art director has concerns about the color palette"
+3. Skill returns to the Color Palette section for revision
+4. User and skill revise the palette to align with game concept tone
+5. AD-ART-BIBLE is not re-invoked (user decides to proceed after revision)
+6. Revised section is written after "May I write" approval; verdict is COMPLETE
+
+**Assertions:**
+- [ ] CONCERNS are shown to user before any section is written
+- [ ] Skill returns to the affected section for revision (not all sections)
+- [ ] Revised content (not original) is written to file
+- [ ] Verdict is COMPLETE after revision and approval
+
+---
+
+### Case 3: Lean Mode — AD-ART-BIBLE Skipped, Written With User Approval Only
+
+**Fixture:**
+- No existing art bible
+- `production/session-state/review-mode.txt` contains `lean`
+
+**Input:** `/art-bible`
+
+**Expected behavior:**
+1. Skill reads review mode — determines `lean`
+2. Skill drafts all sections with user collaboration
+3. AD-ART-BIBLE gate is skipped: output notes "[AD-ART-BIBLE] skipped — lean mode"
+4. Skill asks user for direct approval of each section
+5. Sections are written after user confirmation; verdict is COMPLETE
+
+**Assertions:**
+- [ ] AD-ART-BIBLE gate is NOT invoked in lean mode
+- [ ] Skip is explicitly noted: "[AD-ART-BIBLE] skipped — lean mode"
+- [ ] User approval is still required per section (gate skip ≠ approval skip)
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 4: Existing Art Bible — Retrofit Mode
+
+**Fixture:**
+- `design/art-bible.md` already exists with all sections populated
+- User wants to update the Character Design Rules section
+
+**Input:** `/art-bible`
+
+**Expected behavior:**
+1. Skill reads existing art bible and detects all sections populated
+2. Skill offers retrofit: "Art bible exists — which section would you like to update?"
+3. User selects Character Design Rules
+4. Skill drafts updated content; in full mode, AD-ART-BIBLE is invoked for the
+ revised section before writing
+5. Skill asks "May I write Character Design Rules to `design/art-bible.md`?"
+6. Only that section is updated; other sections preserved; verdict is COMPLETE
+
+**Assertions:**
+- [ ] Existing art bible is detected and retrofit is offered
+- [ ] Only the selected section is updated
+- [ ] In full mode: AD-ART-BIBLE gate runs even for single-section retrofit
+- [ ] Other sections are preserved
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 5: Solo Mode — AD-ART-BIBLE Skipped, Noted in Output
+
+**Fixture:**
+- No existing art bible
+- `production/session-state/review-mode.txt` contains `solo`
+
+**Input:** `/art-bible`
+
+**Expected behavior:**
+1. Skill reads review mode — determines `solo`
+2. Art bible is drafted and written with only user approval
+3. AD-ART-BIBLE gate is skipped: output notes "[AD-ART-BIBLE] skipped — solo mode"
+4. No director agents are spawned
+5. Verdict is COMPLETE
+
+**Assertions:**
+- [ ] AD-ART-BIBLE gate is NOT invoked in solo mode
+- [ ] Skip is explicitly noted with "solo mode" label
+- [ ] No director agents of any kind are spawned
+- [ ] Verdict is COMPLETE
+
+---
+
+## Protocol Compliance
+
+- [ ] Creates skeleton file immediately with all section headers
+- [ ] Discusses and drafts one section at a time
+- [ ] AD-ART-BIBLE gate runs in full mode after all sections are drafted
+- [ ] AD-ART-BIBLE is skipped in lean and solo modes — noted by name
+- [ ] Asks "May I write section [N]" per section
+- [ ] Verdict is COMPLETE when all sections are written
+
+---
+
+## Coverage Notes
+
+- The case where AD-ART-BIBLE returns REJECT (not just CONCERNS) is not
+ separately tested; the skill would block writing and ask the user how to
+ proceed (revise or override).
+- The Typography section is listed as a required art bible section but its
+ specific content requirements are not assertion-tested here.
+- The art bible feeds into `/asset-spec` — this relationship is noted in the
+ handoff but not tested as part of this skill's spec.
diff --git a/CCGS Skill Testing Framework/skills/authoring/create-architecture.md b/CCGS Skill Testing Framework/skills/authoring/create-architecture.md
new file mode 100644
index 0000000..f907943
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/authoring/create-architecture.md
@@ -0,0 +1,187 @@
+# Skill Test Spec: /create-architecture
+
+## Skill Summary
+
+`/create-architecture` guides the user through section-by-section authoring of a
+technical architecture document. It uses a skeleton-first approach — the file is
+created with all required section headers before any content is filled. Each
+section is discussed, drafted, and written individually after user approval. If an
+architecture document already exists, the skill offers retrofit mode to update
+specific sections.
+
+In `full` review mode, TD-ARCHITECTURE (technical-director) and LP-FEASIBILITY
+(lead-programmer) spawn after the complete draft is finished. In `lean` or `solo`
+mode, both gates are skipped. The skill writes to `docs/architecture/architecture.md`.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION
+- [ ] Contains "May I write" collaborative protocol language (per-section approval)
+- [ ] Has a next-step handoff at the end (`/architecture-review` or `/create-control-manifest`)
+- [ ] Documents skeleton-first approach
+- [ ] Documents gate behavior: TD-ARCHITECTURE + LP-FEASIBILITY in full mode; skipped in lean/solo
+- [ ] Documents retrofit mode for existing architecture documents
+
+---
+
+## Director Gate Checks
+
+In `full` mode: TD-ARCHITECTURE (technical-director) and LP-FEASIBILITY
+(lead-programmer) spawn in parallel after all sections are drafted and before
+any final approval write.
+
+In `lean` mode: both gates are skipped. Output notes:
+"TD-ARCHITECTURE skipped — lean mode" and "LP-FEASIBILITY skipped — lean mode".
+
+In `solo` mode: both gates are skipped with equivalent notes.
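This skip rule follows the general mode table in `.claude/docs/director-gates.md`. A minimal sketch, where the `-PHASE-GATE` suffix check is an assumption about gate naming:

```python
def should_spawn(gate_id, mode):
    """Decide whether a director gate spawns under the resolved review mode.

    solo skips everything; lean keeps only PHASE-GATEs, so per-skill
    gates such as TD-ARCHITECTURE and LP-FEASIBILITY run in full only.
    """
    if mode == "solo":
        return False
    if mode == "lean":
        return gate_id.endswith("-PHASE-GATE")
    return True  # full
```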
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — New architecture doc, skeleton-first, full mode gates approve
+
+**Fixture:**
+- No existing `docs/architecture/architecture.md`
+- `docs/architecture/` contains Accepted ADRs for reference
+- `production/session-state/review-mode.txt` contains `full`
+
+**Input:** `/create-architecture`
+
+**Expected behavior:**
+1. Skill creates skeleton `docs/architecture/architecture.md` with all required section headers
+2. For each section: drafts content, shows draft, asks "May I write [section]?", writes after approval
+3. After all sections are drafted: TD-ARCHITECTURE and LP-FEASIBILITY spawn in parallel
+4. Both gates return APPROVED
+5. Final "May I confirm architecture is complete?" asked
+6. Session state updated
+
+**Assertions:**
+- [ ] Skeleton file is created with all section headers before any content is written
+- [ ] "May I write [section]?" asked per section during authoring
+- [ ] TD-ARCHITECTURE and LP-FEASIBILITY spawn in parallel (not sequentially)
+- [ ] Both gates complete before the final completion confirmation
+- [ ] Verdict is APPROVED when both gates return APPROVED
+- [ ] Next-step handoff to `/architecture-review` or `/create-control-manifest` is present
+
+---
+
+### Case 2: Failure Path — TD-ARCHITECTURE returns MAJOR REVISION
+
+**Fixture:**
+- Architecture doc is fully drafted (all sections)
+- `production/session-state/review-mode.txt` contains `full`
+- TD-ARCHITECTURE gate returns MAJOR REVISION: "[specific structural issue]"
+
+**Input:** `/create-architecture`
+
+**Expected behavior:**
+1. All sections are drafted and written
+2. TD-ARCHITECTURE gate runs and returns MAJOR REVISION with specific feedback
+3. Skill surfaces the feedback to the user
+4. Architecture is NOT marked as finalized
+5. User is asked: revise the flagged sections, or accept the document as a draft
+
+**Assertions:**
+- [ ] Architecture is NOT marked finalized when TD-ARCHITECTURE returns MAJOR REVISION
+- [ ] Gate feedback is shown to the user with specific issue descriptions
+- [ ] User is given the option to revise specific sections
+- [ ] Skill does NOT auto-finalize despite MAJOR REVISION feedback
+
+---
+
+### Case 3: Lean Mode — Both gates skipped; architecture written with user approval only
+
+**Fixture:**
+- No existing architecture doc
+- `production/session-state/review-mode.txt` contains `lean`
+
+**Input:** `/create-architecture`
+
+**Expected behavior:**
+1. Skeleton file is created
+2. All sections are authored and written per-section with user approval
+3. After completion: TD-ARCHITECTURE and LP-FEASIBILITY are skipped
+4. Output notes: "TD-ARCHITECTURE skipped — lean mode" and "LP-FEASIBILITY skipped — lean mode"
+5. Architecture is considered complete based on user approval alone
+
+**Assertions:**
+- [ ] Both gate skip notes appear in output
+- [ ] Architecture document is written with only user approval in lean mode
+- [ ] Skill does NOT block completion because gates were skipped
+- [ ] Next-step handoff is still present
+
+---
+
+### Case 4: Retrofit Mode — Existing architecture doc, user updates a section
+
+**Fixture:**
+- `docs/architecture/architecture.md` already exists with all sections populated
+
+**Input:** `/create-architecture`
+
+**Expected behavior:**
+1. Skill detects existing architecture doc and reads its current content
+2. Skill offers retrofit mode: "Architecture doc already exists. Which section would you like to update?"
+3. User selects a section
+4. Skill authors only that section, asks "May I write [section]?"
+5. Only the selected section is updated — other sections unchanged
+
+**Assertions:**
+- [ ] Skill detects and reads the existing architecture doc before offering retrofit
+- [ ] User is asked which section to update — not asked to rewrite the whole document
+- [ ] Only the selected section is updated
+- [ ] Other sections are not modified during a retrofit session
+
+---
+
+### Case 5: Director Gate — Architecture references a Proposed ADR; flagged as risk
+
+**Fixture:**
+- Architecture doc is being authored
+- One section references or depends on an ADR that has `Status: Proposed`
+- `production/session-state/review-mode.txt` contains `full`
+
+**Input:** `/create-architecture`
+
+**Expected behavior:**
+1. Skill authors all sections
+2. During authoring, skill detects a reference to a Proposed ADR
+3. Skill flags: "Note: [section] references ADR-NNN which is Proposed — this is a risk until the ADR is accepted"
+4. Risk flag is embedded in the relevant section's content
+5. TD-ARCHITECTURE and LP-FEASIBILITY still run — they are informed of the Proposed ADR risk
+
+**Assertions:**
+- [ ] Proposed ADR reference is detected and flagged during section authoring
+- [ ] Risk note is embedded in the architecture document section
+- [ ] TD-ARCHITECTURE and LP-FEASIBILITY still spawn (the risk does not block the gates)
+- [ ] Risk flag names the specific ADR number and title
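The detection step in this case can be sketched as below. This is a hypothetical helper; the `ADR-NNN` reference format and the status map are assumptions taken from this spec:

```python
import re

def proposed_adr_risks(section_text, adr_status):
    """Flag every ADR referenced in a section whose status is still Proposed.

    adr_status maps an ID like "ADR-007" to "Accepted" or "Proposed".
    """
    refs = set(re.findall(r"ADR-\d+", section_text))
    return sorted(ref for ref in refs if adr_status.get(ref) == "Proposed")
```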
+
+---
+
+## Protocol Compliance
+
+- [ ] Skeleton file created with all section headers before any content is written
+- [ ] "May I write [section]?" asked per section during authoring
+- [ ] TD-ARCHITECTURE and LP-FEASIBILITY spawn in parallel in full mode
+- [ ] Skipped gates noted by name and mode in lean/solo output
+- [ ] Proposed ADR references flagged as risks in the document
+- [ ] Ends with next-step handoff: `/architecture-review` or `/create-control-manifest`
+
+---
+
+## Coverage Notes
+
+- The required section list for architecture documents is defined in the skill
+ body and in the `/architecture-review` skill — not re-enumerated here.
+- Engine version stamping in the architecture doc (parallel to ADR stamping)
+ is part of the authoring workflow — tested implicitly via Case 1.
+- The retrofit mode for updating multiple sections in one session follows the
+ same per-section approval pattern — not independently tested for multi-section
+ retrofits.
diff --git a/CCGS Skill Testing Framework/skills/authoring/design-system.md b/CCGS Skill Testing Framework/skills/authoring/design-system.md
new file mode 100644
index 0000000..923525e
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/authoring/design-system.md
@@ -0,0 +1,192 @@
+# Skill Test Spec: /design-system
+
+## Skill Summary
+
+`/design-system` guides the user through section-by-section authoring of a Game
+Design Document (GDD) for a single game system. All 8 required sections must be
+authored: Overview, Player Fantasy, Detailed Rules, Formulas, Edge Cases,
+Dependencies, Tuning Knobs, and Acceptance Criteria. The skill uses a
+skeleton-first approach — it creates the GDD file with all 8 section headers
+before filling any content — and writes each section individually after approval.
+
+The CD-GDD-ALIGN gate (creative-director) runs in both `full` AND `lean` modes.
+It is only skipped in `solo` mode. If an existing GDD file is found, the skill
+offers a retrofit mode to update specific sections rather than rewriting the whole
+document.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION
+- [ ] Contains "May I write" collaborative protocol language (per-section approval)
+- [ ] Has a next-step handoff at the end
+- [ ] Documents skeleton-first approach (file created with headers before content)
+- [ ] Documents CD-GDD-ALIGN gate: active in full AND lean mode; skipped in solo only
+- [ ] Documents retrofit mode for existing GDD files
+
+---
+
+## Director Gate Checks
+
+In `full` mode: CD-GDD-ALIGN (creative-director) gate runs after each section is
+drafted, before writing. If MAJOR REVISION is returned, the section must be
+rewritten before proceeding.
+
+In `lean` mode: CD-GDD-ALIGN still runs; unlike most per-skill gates, it is
+active in both full and lean. Only solo mode skips it.
+
+In `solo` mode: CD-GDD-ALIGN is skipped. Output notes:
+"CD-GDD-ALIGN skipped — solo mode". Sections are written with only user approval.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — New GDD, skeleton-first, CD-GDD-ALIGN in lean mode
+
+**Fixture:**
+- No existing GDD for the target system in `design/gdd/`
+- `production/session-state/review-mode.txt` contains `lean`
+
+**Input:** `/design-system [system-name]`
+
+**Expected behavior:**
+1. Skill creates skeleton file `design/gdd/[system-name].md` with all 8 section headers (empty bodies)
+2. For each section: discusses with user, drafts content, shows draft
+3. CD-GDD-ALIGN gate runs on each section draft (lean mode — gate is active)
+4. Gate returns APPROVED for each section
+5. "May I write [section]?" asked after gate approval
+6. Section written to file after user approval
+7. Process repeats for all 8 sections
+
+**Assertions:**
+- [ ] Skeleton file is created with all 8 section headers before any content is written
+- [ ] CD-GDD-ALIGN runs on each section in lean mode (not skipped)
+- [ ] "May I write" is asked per section (not once for all sections)
+- [ ] Each section is written individually after gate + user approval
+- [ ] All 8 sections are present in the final GDD file
+
+---
+
+### Case 2: Retrofit Mode — Existing GDD, update specific section
+
+**Fixture:**
+- `design/gdd/[system-name].md` already exists with all 8 sections populated
+
+**Input:** `/design-system [system-name]`
+
+**Expected behavior:**
+1. Skill detects existing GDD file and reads its current content
+2. Skill offers retrofit mode: "GDD already exists. Which section would you like to update?"
+3. User selects a specific section (e.g., Formulas)
+4. Skill authors only that section, runs CD-GDD-ALIGN, asks "May I write?"
+5. Only the selected section is updated — other sections are not modified
+
+**Assertions:**
+- [ ] Skill detects and reads existing GDD before offering retrofit mode
+- [ ] User is asked which section to update — not asked to rewrite the whole document
+- [ ] Only the selected section is rewritten — others remain unchanged
+- [ ] CD-GDD-ALIGN still runs on the updated section
+- [ ] "May I write" is asked before updating the section
+
+---
+
+### Case 3: Director Gate — CD-GDD-ALIGN returns MAJOR REVISION
+
+**Fixture:**
+- New GDD being authored
+- `production/session-state/review-mode.txt` contains `lean`
+- CD-GDD-ALIGN gate returns MAJOR REVISION on the Player Fantasy section
+
+**Input:** `/design-system [system-name]`
+
+**Expected behavior:**
+1. Player Fantasy section is drafted
+2. CD-GDD-ALIGN gate runs and returns MAJOR REVISION with specific feedback
+3. Skill surfaces the feedback to the user
+4. Section is NOT written to file while MAJOR REVISION is unresolved
+5. User rewrites the section in collaboration with the skill
+6. CD-GDD-ALIGN runs again on the revised section
+7. If revised section passes, "May I write?" is asked and section is written
+
+**Assertions:**
+- [ ] Section is NOT written when CD-GDD-ALIGN returns MAJOR REVISION
+- [ ] Gate feedback is shown to the user before requesting revision
+- [ ] CD-GDD-ALIGN runs again after the section is revised
+- [ ] Skill does NOT auto-proceed to the next section while MAJOR REVISION is unresolved
+
+---
+
+### Case 4: Solo Mode — CD-GDD-ALIGN skipped; sections written with user approval only
+
+**Fixture:**
+- New GDD being authored
+- `production/session-state/review-mode.txt` contains `solo`
+
+**Input:** `/design-system [system-name]`
+
+**Expected behavior:**
+1. Skeleton file is created with 8 section headers
+2. For each section: drafted, shown to user
+3. CD-GDD-ALIGN is skipped — noted per section: "CD-GDD-ALIGN skipped — solo mode"
+4. "May I write [section]?" asked after user reviews draft
+5. Section written after user approval
+6. No gate review at any stage
+
+**Assertions:**
+- [ ] "CD-GDD-ALIGN skipped — solo mode" noted for each section
+- [ ] Sections are written after user approval alone (no gate required)
+- [ ] Skill does NOT spawn any CD-GDD-ALIGN gate in solo mode
+- [ ] Full GDD is written with only user approval in solo mode
+
+---
+
+### Case 5: Director Gate — Empty sections not written to file
+
+**Fixture:**
+- GDD authoring in progress
+- User and skill discuss one section but do not produce any approved content
+ (e.g., discussion ends without a decision, or user says "skip for now")
+
+**Input:** `/design-system [system-name]`
+
+**Expected behavior:**
+1. Section discussion produces no approved content
+2. Skill does NOT write an empty or placeholder body to the section
+3. The section header remains in the skeleton file but the body stays empty
+4. Skill moves to the next section without writing the empty one
+5. At the end, incomplete sections are listed and user is reminded to return to them
+
+**Assertions:**
+- [ ] Empty or unapproved sections are NOT written to the file
+- [ ] Skeleton section header remains (preserves structure)
+- [ ] Skill tracks and lists incomplete sections at the end of the session
+- [ ] Skill does NOT write "TBD" or placeholder content without user approval
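The empty-body tracking behind this case can be sketched as follows. This is a hypothetical helper that assumes `## ` section headers in the skeleton file:

```python
import re

def incomplete_sections(gdd_text):
    """Return headers whose bodies are empty (skeleton sections never filled)."""
    parts = re.split(r"^## +(.+)$", gdd_text, flags=re.MULTILINE)
    # parts alternates: [preamble, header1, body1, header2, body2, ...]
    headers = parts[1::2]
    bodies = parts[2::2]
    return [h for h, b in zip(headers, bodies) if not b.strip()]
```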
+
+---
+
+## Protocol Compliance
+
+- [ ] Skeleton file created with all 8 headers before any content is written
+- [ ] CD-GDD-ALIGN runs in both full AND lean mode (not just full)
+- [ ] CD-GDD-ALIGN skipped only in solo mode — noted per section
+- [ ] "May I write [section]?" asked per section (not once for the whole document)
+- [ ] MAJOR REVISION from CD-GDD-ALIGN blocks section write until resolved
+- [ ] Only approved, non-empty sections are written to the file
+- [ ] Ends with next-step handoff: `/review-all-gdds` or `/map-systems next`
+
+---
+
+## Coverage Notes
+
+- The 8 required sections are validated against the project's design document
+ standards defined in `CLAUDE.md` — not re-enumerated here.
+- The skill's internal section-ordering logic (which section to author first) is
+ not independently tested — the order follows the standard GDD template.
+- Pillar alignment checking within CD-GDD-ALIGN is evaluated holistically by
+ the gate agent — specific pillar checks are not fixture-tested here.
diff --git a/CCGS Skill Testing Framework/skills/authoring/quick-design.md b/CCGS Skill Testing Framework/skills/authoring/quick-design.md
new file mode 100644
index 0000000..e6bd0dd
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/authoring/quick-design.md
@@ -0,0 +1,176 @@
+# Skill Test Spec: /quick-design
+
+## Skill Summary
+
+`/quick-design` produces a lightweight design spec for features too small to
+warrant a full 8-section GDD. The target scope is under 4 hours of design time
+for a single-system feature. The spec uses a streamlined 3-section format:
+Overview, Rules, and Acceptance Criteria.
+
+The skill has no director gates — adding gate overhead would defeat the purpose
+of a lightweight design tool. The skill asks "May I write" before writing the
+design note to `design/quick-notes/[name].md`. If the feature scope is too large
+for a quick-design, the skill redirects to `/design-system` instead.
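The scope check can be sketched as a simple routing rule. This is hypothetical; the skill's real heuristic is conversational (scoping questions), not a numeric formula:

```python
def route_design_request(estimated_hours, systems_touched):
    """Route to /quick-design only for sub-4h, single-system features."""
    if estimated_hours < 4 and systems_touched <= 1:
        return "/quick-design"
    return "/design-system"
```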
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: CREATED, BLOCKED, REDIRECTED
+- [ ] Contains "May I write" collaborative protocol language (for quick-note file)
+- [ ] Has a next-step handoff at the end
+- [ ] Explicitly notes: no director gates (lightweight skill by design)
+- [ ] Mentions scope check: redirects to `/design-system` if scope exceeds sub-4h threshold
+
+---
+
+## Director Gate Checks
+
+No director gates — this skill spawns no director gate agents. The lightweight
+nature of quick-design means director gate overhead is intentionally absent.
+Full GDD review is not needed for sub-4-hour single-system features.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Small UI change produces a 3-section spec
+
+**Fixture:**
+- No existing quick-note for the target feature
+- Feature is clearly scoped: a single UI element change with no cross-system impact
+
+**Input:** `/quick-design [feature-name]`
+
+**Expected behavior:**
+1. Skill asks scoping questions: what system, what change, what is the acceptance signal
+2. Skill determines scope is within the sub-4h threshold
+3. Skill drafts a 3-section spec: Overview, Rules, Acceptance Criteria
+4. Draft is shown to user
+5. "May I write `design/quick-notes/[name].md`?" is asked
+6. File is written after approval
+
+**Assertions:**
+- [ ] Spec contains exactly 3 sections: Overview, Rules, Acceptance Criteria
+- [ ] Draft is shown to user before "May I write" ask
+- [ ] "May I write `design/quick-notes/[name].md`?" is asked before writing
+- [ ] File is written to the correct path: `design/quick-notes/[name].md`
+- [ ] Verdict is CREATED after successful write
+
+---
+
+### Case 2: Failure Path — Scope check fails; redirected to /design-system
+
+**Fixture:**
+- Feature described spans multiple systems or would take more than 4 hours of design time
+ (e.g., "redesign the entire combat system" or "new progression mechanic affecting all classes")
+
+**Input:** `/quick-design [large-feature]`
+
+**Expected behavior:**
+1. Skill asks scoping questions
+2. Skill determines scope exceeds the sub-4h / single-system threshold
+3. Skill outputs: "This feature is too large for a quick-design. Use `/design-system [name]` for a full GDD."
+4. Skill does NOT write a quick-note file
+5. Verdict is REDIRECTED
+
+**Assertions:**
+- [ ] Skill detects the scope excess and stops before drafting
+- [ ] Message explicitly names `/design-system` as the correct alternative
+- [ ] No quick-note file is written
+- [ ] Verdict is REDIRECTED (not CREATED or BLOCKED)
+
+---
+
+### Case 3: Edge Case — File already exists; offered to update
+
+**Fixture:**
+- `design/quick-notes/[name].md` already exists from a previous session
+
+**Input:** `/quick-design [name]`
+
+**Expected behavior:**
+1. Skill detects existing quick-note file and reads its current content
+2. Skill asks: "[name].md already exists. Update it, or create a new version?"
+3. User selects update
+4. Skill shows the existing spec and asks which section to revise
+5. Updated spec is shown, "May I write?" asked, file updated after approval
+
+**Assertions:**
+- [ ] Skill detects and reads the existing file before offering to update
+- [ ] User is offered update or create-new options — not auto-overwritten
+- [ ] Only the revised section is updated (or the whole spec if user chooses full rewrite)
+- [ ] "May I write" is asked before overwriting the existing file
+
+---
+
+### Case 4: Edge Case — No argument provided
+
+**Fixture:**
+- `design/quick-notes/` directory may or may not exist
+
+**Input:** `/quick-design` (no argument)
+
+**Expected behavior:**
+1. Skill detects no argument is provided
+2. Skill outputs a usage error: "No feature name specified. Usage: /quick-design [feature-name]"
+3. Skill provides an example: `/quick-design pause-menu-settings`
+4. No file is created
+
+**Assertions:**
+- [ ] Skill outputs a usage error when no argument is given
+- [ ] A usage example is shown with the correct format
+- [ ] No quick-note file is written
+- [ ] Skill does NOT silently pick a feature name or default to any action
+
+---
+
+### Case 5: Director Gate — No gate spawned; explicitly noted for sub-4h features
+
+**Fixture:**
+- Feature is within scope for quick-design
+- `production/session-state/review-mode.txt` exists with `full`
+
+**Input:** `/quick-design [feature-name]`
+
+**Expected behavior:**
+1. Skill asks scoping questions and determines scope is within threshold
+2. Skill does NOT read `production/session-state/review-mode.txt`
+3. Skill does NOT spawn any director gate agent
+4. Spec is drafted, "May I write" asked, file written after approval
+5. Output explicitly notes: "No director gate review — quick-design is for sub-4h features"
+
+**Assertions:**
+- [ ] No director gate agents are spawned (no CD-, TD-, PR-, AD- prefixed gates)
+- [ ] Skill does NOT read `production/session-state/review-mode.txt`
+- [ ] Output contains a note explaining why no gate review is needed
+- [ ] Review mode has no effect on this skill's behavior
+- [ ] Full GDD review path (`/design-system`) is mentioned as the alternative for larger features
+
+---
+
+## Protocol Compliance
+
+- [ ] Scope check runs before drafting (redirects to `/design-system` if scope too large)
+- [ ] 3-section format used (Overview, Rules, Acceptance Criteria) — NOT the 8-section GDD format
+- [ ] Draft shown to user before "May I write" ask
+- [ ] "May I write `design/quick-notes/[name].md`?" asked before writing
+- [ ] No director gates — no review-mode.txt read
+- [ ] Ends with next-step handoff (e.g., proceed to implementation or `/dev-story`)
+
+---
+
+## Coverage Notes
+
+- The scope threshold heuristic (sub-4h, single-system) is a judgment call —
+ the skill's internal check is the authoritative definition and is not
+ independently tested by counting hours.
+- The `design/quick-notes/` directory is created automatically if it does not
+ exist — this filesystem behavior is not independently tested here.
+- Integration with the story pipeline (can a quick-design generate a story
+ directly?) is out of scope for this spec — quick-designs are standalone.
diff --git a/CCGS Skill Testing Framework/skills/authoring/ux-design.md b/CCGS Skill Testing Framework/skills/authoring/ux-design.md
new file mode 100644
index 0000000..afdc928
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/authoring/ux-design.md
@@ -0,0 +1,176 @@
+# Skill Test Spec: /ux-design
+
+## Skill Summary
+
+`/ux-design` is a guided, section-by-section UX spec authoring skill. It produces
+user flow diagrams (described textually), interaction state definitions, wireframe
+descriptions, and accessibility notes for a specified screen or HUD element. The
+skill follows the skeleton-first pattern: it creates the file with all section
+headers immediately, then fills each section through discussion and writes each
+section to disk after user approval.
+
+The skill has no inline director gates — `/ux-review` is the separate review step.
+Each section requires a "May I write section [N] to [filepath]?" ask. If a UX spec
+already exists for the named screen, the skill offers to retrofit individual sections
+rather than replace. Verdict is COMPLETE when all sections are written.
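+
+The skeleton-first loop can be sketched as (illustrative pseudocode — the skill
+body is authoritative):
+
+```
+create design/ux/[screen].md with all section headers (empty bodies)
+for each section (User Flows, Interaction States, Wireframe Description,
+                  Accessibility Notes):
+    discuss → draft → show to user
+    ask "May I write section [N] to design/ux/[screen].md?"
+    on approval: write that section
+when all sections are written: verdict COMPLETE → suggest /ux-review
+```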
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keyword: COMPLETE
+- [ ] Contains "May I write" language per section
+- [ ] Has a next-step handoff (e.g., `/ux-review` to validate the completed spec)
+
+---
+
+## Director Gate Checks
+
+None. `/ux-design` has no inline director gates. `/ux-review` is the separate
+review skill invoked after this skill completes.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — New HUD spec, all sections authored and written
+
+**Fixture:**
+- No existing HUD UX spec in `design/ux/`
+- Engine and rendering preferences configured
+
+**Input:** `/ux-design hud`
+
+**Expected behavior:**
+1. Skill creates a skeleton file `design/ux/hud.md` with all section headers
+2. Skill discusses and drafts each section: User Flows, Interaction States
+ (normal/hover/focus/disabled), Wireframe Description, Accessibility Notes
+3. After each section is drafted and user confirms, skill asks "May I write
+ section [N] to `design/ux/hud.md`?"
+4. Each section is written in sequence after approval
+5. After all sections are written, verdict is COMPLETE
+6. Skill suggests running `/ux-review` as the next step
+
+**Assertions:**
+- [ ] Skeleton file is created first (with empty section bodies)
+- [ ] "May I write section [N]" is asked per section (not once at the end)
+- [ ] All required sections are present: User Flows, Interaction States,
+ Wireframe Description, Accessibility Notes
+- [ ] Skill ends with a handoff to `/ux-review`
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 2: Existing UX Spec — Retrofit: user picks section to update
+
+**Fixture:**
+- `design/ux/hud.md` already exists with all sections populated
+- User wants to update only the Accessibility Notes section
+
+**Input:** `/ux-design hud`
+
+**Expected behavior:**
+1. Skill reads existing `design/ux/hud.md` and detects all sections are populated
+2. Skill reports: "UX spec already exists for HUD — offering to retrofit"
+3. Skill lists all sections and asks which to update
+4. User selects Accessibility Notes
+5. Skill drafts updated accessibility content and asks "May I write section
+ Accessibility Notes to `design/ux/hud.md`?"
+6. Only that section is updated; other sections are preserved; verdict is COMPLETE
+
+**Assertions:**
+- [ ] Existing spec is detected and retrofit is offered
+- [ ] User selects which section(s) to update
+- [ ] Only the selected section is updated — other sections unchanged
+- [ ] "May I write" is asked for the updated section
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 3: Dependency Gap — Spec references a system with no design doc
+
+**Fixture:**
+- User is authoring a UX spec for the inventory screen
+- `design/gdd/inventory.md` does not exist
+
+**Input:** `/ux-design inventory-screen`
+
+**Expected behavior:**
+1. Skill begins authoring the inventory screen UX spec
+2. During the User Flows section, skill attempts to reference inventory system rules
+3. Skill detects: "No GDD found for inventory system — UX spec has a DEPENDENCY GAP"
+4. The dependency gap is flagged in the spec (noted inline: "DEPENDENCY GAP: inventory GDD")
+5. Skill continues authoring with placeholder notes for the missing rules
+6. Verdict is COMPLETE with advisory note about the dependency gap
+
+**Assertions:**
+- [ ] DEPENDENCY GAP label appears in the spec for the missing system doc
+- [ ] Skill does NOT block on the missing GDD — it continues with placeholders
+- [ ] Dependency gap is also noted in the skill output (not just in the file)
+- [ ] Handoff suggests both `/ux-review` and writing the missing GDD
+
+---
+
+### Case 4: No Argument Provided — Usage error
+
+**Fixture:**
+- No argument provided with the skill invocation
+
+**Input:** `/ux-design`
+
+**Expected behavior:**
+1. Skill detects no screen name or argument provided
+2. Skill outputs a usage error: "Screen name required. Usage: `/ux-design [screen-name]`"
+3. Skill provides examples: `/ux-design hud`, `/ux-design main-menu`, `/ux-design inventory`
+4. No file is created; no "May I write" is asked
+
+**Assertions:**
+- [ ] Usage error is clearly stated
+- [ ] Example invocations are provided
+- [ ] No file is created
+- [ ] Skill does not attempt to proceed without an argument
+
+---
+
+### Case 5: Director Gate Check — No gate; ux-review is the separate review skill
+
+**Fixture:**
+- New screen spec with argument provided
+
+**Input:** `/ux-design settings-menu`
+
+**Expected behavior:**
+1. Skill authors all sections of the settings menu UX spec
+2. No director agents are spawned
+3. No gate IDs appear in output during authoring
+
+**Assertions:**
+- [ ] No director gate is invoked during ux-design
+- [ ] No gate skip messages appear
+- [ ] Verdict is COMPLETE without any gate check
+
+---
+
+## Protocol Compliance
+
+- [ ] Creates skeleton file with all section headers before discussing content
+- [ ] Discusses and drafts one section at a time
+- [ ] Asks "May I write section [N]" after each section is approved
+- [ ] Detects existing spec and offers retrofit path
+- [ ] Ends with handoff to `/ux-review`
+- [ ] Verdict is COMPLETE when all sections are written
+
+---
+
+## Coverage Notes
+
+- Interaction state enumeration (normal/hover/focus/disabled/error) is a core
+ requirement of each spec; the `/ux-review` skill checks for completeness.
+- Wireframe descriptions are text-only (no images); image references may be
+ added manually by a designer after the fact.
+- Responsive layout concerns (different screen sizes) are noted as optional
+ content and not assertion-tested here.
diff --git a/CCGS Skill Testing Framework/skills/authoring/ux-review.md b/CCGS Skill Testing Framework/skills/authoring/ux-review.md
new file mode 100644
index 0000000..101f073
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/authoring/ux-review.md
@@ -0,0 +1,176 @@
+# Skill Test Spec: /ux-review
+
+## Skill Summary
+
+`/ux-review` validates an existing UX spec or HUD design document against
+accessibility and interaction standards. It checks for required sections
+(User Flows, Interaction States, Wireframe Description, Accessibility Notes),
+completeness of interaction state definitions (hover, focus, disabled, error),
+accessibility compliance (keyboard navigation, color contrast notes, screen
+reader considerations), and consistency with the art bible or design system
+if those documents exist.
+
+The skill is read-only — it produces no file writes. Verdicts: APPROVED
+(all checks pass), NEEDS REVISION (fixable issues found), or MAJOR REVISION
+NEEDED (structural or accessibility failures). No director gates apply —
+`/ux-review` IS the review gate for UX specs.
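+
+The verdict logic implied by the cases below can be sketched as (illustrative
+pseudocode):
+
+```
+required sections = User Flows, Interaction States, Wireframe Description,
+                    Accessibility Notes
+required states   = normal, hover, focus, disabled, error
+
+if any required section is entirely absent → MAJOR REVISION NEEDED
+else if any section is empty, any state is undefined,
+     or accessibility coverage is incomplete → NEEDS REVISION (name each gap)
+else → APPROVED
+```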
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION NEEDED
+- [ ] Does NOT contain "May I write" language (skill is read-only)
+- [ ] Has a next-step handoff (e.g., back to `/ux-design` for revision, or proceed to implementation)
+
+---
+
+## Director Gate Checks
+
+None. `/ux-review` is itself the review gate for UX specs. No additional director
+gates are invoked within this skill.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Complete UX spec with all required sections, APPROVED
+
+**Fixture:**
+- `design/ux/hud.md` exists with all required sections populated:
+ - User Flows: complete player flow diagrams
+ - Interaction States: normal, hover, focus, disabled, error all defined
+ - Wireframe Description: layout described
+ - Accessibility Notes: keyboard nav, contrast ratios, screen reader notes
+
+**Input:** `/ux-review hud`
+
+**Expected behavior:**
+1. Skill reads `design/ux/hud.md`
+2. Skill checks all 4 required sections — all present and non-empty
+3. Skill checks interaction states — all 5 states defined
+4. Skill checks accessibility notes — keyboard, contrast, and screen reader covered
+5. Skill outputs: checklist of all passed checks
+6. Verdict is APPROVED
+
+**Assertions:**
+- [ ] All 4 required sections are checked
+- [ ] All 5 interaction states are verified present
+- [ ] Verdict is APPROVED
+- [ ] No files are written
+
+---
+
+### Case 2: Missing Accessibility Section — NEEDS REVISION
+
+**Fixture:**
+- `design/ux/hud.md` exists but the Accessibility Notes section is empty
+- All other sections are fully populated
+
+**Input:** `/ux-review hud`
+
+**Expected behavior:**
+1. Skill reads the file and checks all sections
+2. Accessibility Notes section is empty — check fails
+3. Skill outputs: "NEEDS REVISION — Accessibility Notes section is empty"
+4. Skill lists specific items to add: keyboard navigation, color contrast ratios,
+ screen reader labels
+5. Verdict is NEEDS REVISION
+6. Handoff suggests returning to `/ux-design hud` to fill in the section
+
+**Assertions:**
+- [ ] NEEDS REVISION verdict is returned (not APPROVED or MAJOR REVISION NEEDED)
+- [ ] Specific missing content items are listed
+- [ ] Handoff points back to `/ux-design hud` for revision
+- [ ] No files are written
+
+---
+
+### Case 3: Interaction States Incomplete — NEEDS REVISION
+
+**Fixture:**
+- `design/ux/settings-menu.md` exists
+- Interaction States section only defines: normal and hover
+- Missing: focus, disabled, error states
+
+**Input:** `/ux-review settings-menu`
+
+**Expected behavior:**
+1. Skill reads the file and checks interaction states
+2. Only 2 of 5 required states are defined
+3. Skill reports: "NEEDS REVISION — Interaction states incomplete: missing focus, disabled, error"
+4. Verdict is NEEDS REVISION with specific missing states named
+
+**Assertions:**
+- [ ] NEEDS REVISION verdict returned
+- [ ] All 3 missing states are named explicitly in the output
+- [ ] Skill does not return MAJOR REVISION NEEDED for a fixable gap
+- [ ] Handoff suggests returning to `/ux-design settings-menu`
+
+---
+
+### Case 4: File Not Found — Error with remediation
+
+**Fixture:**
+- `design/ux/inventory-screen.md` does not exist
+
+**Input:** `/ux-review inventory-screen`
+
+**Expected behavior:**
+1. Skill attempts to read `design/ux/inventory-screen.md` — file not found
+2. Skill outputs: "UX spec not found: design/ux/inventory-screen.md"
+3. Skill suggests running `/ux-design inventory-screen` to create the spec first
+4. No review is performed; no verdict is issued
+
+**Assertions:**
+- [ ] Error message names the missing file with full path
+- [ ] `/ux-design inventory-screen` is suggested as the remediation
+- [ ] No review checklist is produced
+- [ ] No verdict is issued (error state, not APPROVED/NEEDS REVISION)
+
+---
+
+### Case 5: Director Gate Check — No gate; ux-review is itself the review
+
+**Fixture:**
+- Valid UX spec file
+
+**Input:** `/ux-review hud`
+
+**Expected behavior:**
+1. Skill performs the review and issues a verdict
+2. No additional director agents are spawned
+3. No gate IDs appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No gate skip messages appear
+- [ ] Verdict is APPROVED, NEEDS REVISION, or MAJOR REVISION NEEDED — no gate verdict
+
+---
+
+## Protocol Compliance
+
+- [ ] Checks all 4 required sections (User Flows, Interaction States, Wireframe
+  Description, Accessibility Notes)
+- [ ] Checks all 5 interaction states (normal, hover, focus, disabled, error)
+- [ ] Checks accessibility coverage (keyboard nav, contrast, screen reader)
+- [ ] Does not write any files
+- [ ] Issues specific, actionable feedback when verdict is not APPROVED
+- [ ] Ends with next-step handoff to `/ux-design` for revision or implementation
+
+---
+
+## Coverage Notes
+
+- MAJOR REVISION NEEDED is triggered when structural sections are entirely
+ absent (not just empty) or when fundamental interaction flows are missing
+ entirely; not tested with a separate fixture here.
+- Art bible / design system consistency check (color palette alignment) is
+ mentioned as a capability but not separately fixture-tested.
+- The case where an existing spec was written for a now-renamed screen is
+ not tested; the skill would review the file by path regardless of the name.
diff --git a/tests/skills/gate-check.md b/CCGS Skill Testing Framework/skills/gate/gate-check.md
similarity index 73%
rename from tests/skills/gate-check.md
rename to CCGS Skill Testing Framework/skills/gate/gate-check.md
index 08c64bf..545bc8e 100644
--- a/tests/skills/gate-check.md
+++ b/CCGS Skill Testing Framework/skills/gate/gate-check.md
@@ -122,6 +122,62 @@ Verified automatically by `/skill-test static` — no fixture needed.
---
+
+### Case 5: Director Gate — full vs solo mode
+
+**Fixture:**
+- `production/session-state/review-mode.txt` exists (or equivalent state file)
+- All required artifacts for the target gate are present
+- `design/gdd/game-concept.md` exists
+
+**Case 5a — full mode:**
+- `review-mode.txt` contains `full`
+
+**Input:** `/gate-check systems-design` (with full mode active)
+
+**Expected behavior:**
+1. Skill reads review mode — determines `full`
+2. Skill spawns all 4 PHASE-GATE director prompts in parallel:
+ - CD-PHASE-GATE (creative-director)
+ - TD-PHASE-GATE (technical-director)
+ - PR-PHASE-GATE (producer)
+ - AD-PHASE-GATE (art-director)
+3. If any director returns CONCERNS → overall gate verdict is at minimum CONCERNS
+4. All 4 verdicts are collected before producing final output
+
+**Assertions (5a):**
+- [ ] Skill reads review-mode before deciding which directors to spawn
+- [ ] All 4 PHASE-GATE director prompts are spawned (not just 1 or 2)
+- [ ] Directors are spawned in parallel (simultaneous, not sequential)
+- [ ] A CONCERNS verdict from any one director propagates to overall verdict
+- [ ] Verdict is NOT auto-PASS if any director returns CONCERNS or REJECT
+
+**Case 5b — solo mode:**
+- `review-mode.txt` contains `solo`
+
+**Input:** `/gate-check systems-design` (with solo mode active)
+
+**Expected behavior:**
+1. Skill reads review mode — determines `solo`
+2. Each director is noted as skipped: "[CD-PHASE-GATE] skipped — Solo mode"
+3. Gate verdict is derived from artifact/quality checks only
+4. No director gates spawn
+
+**Assertions (5b):**
+- [ ] No director gates are spawned in solo mode
+- [ ] Each skipped gate is explicitly noted in output: "[GATE-ID] skipped — Solo mode"
+- [ ] Verdict is based on artifact and quality checks only
+
+**Note on Case 3 correction:**
+Case 3 asserts "Skill does not ask the user which gate to check if current
+stage is determinable." That assertion stands. However, the skill DOES use
+AskUserQuestion to confirm the auto-detected transition before running full
+checks — this is a confirmation step, not a gate selection. Case 3 assertions
+should not treat this confirmation as a failure.
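+
+The mode-resolution and verdict-aggregation behavior across Cases 5a/5b can be
+sketched as (illustrative pseudocode):
+
+```
+mode = read production/session-state/review-mode.txt
+if mode == solo:
+    note "[CD-PHASE-GATE] skipped — Solo mode" (and likewise for TD/PR/AD)
+    verdict = artifact and quality checks only
+else:  # full — and lean, since PHASE-GATEs still run in lean mode
+    spawn CD-, TD-, PR-, AD-PHASE-GATE in parallel; collect all 4 verdicts
+    verdict = worst of the 4   # REJECT > CONCERNS > PASS
+```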
+
+---
+
## Protocol Compliance
- [ ] Uses "May I write" before updating `production/stage.txt`
diff --git a/CCGS Skill Testing Framework/skills/pipeline/create-control-manifest.md b/CCGS Skill Testing Framework/skills/pipeline/create-control-manifest.md
new file mode 100644
index 0000000..f021843
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/pipeline/create-control-manifest.md
@@ -0,0 +1,175 @@
+# Skill Test Spec: /create-control-manifest
+
+## Skill Summary
+
+`/create-control-manifest` reads all Accepted ADRs from `docs/architecture/` and
+generates a control manifest — a summary document that captures all architectural
+constraints, required patterns, and forbidden patterns in one place. The manifest
+is the reference document that story authors use when writing story files, ensuring
+stories inherit the correct architectural rules without having to read all ADRs
+individually.
+
+The skill only includes Accepted ADRs; Proposed ADRs are excluded and noted. It
+has no director gates. The skill asks "May I write" before writing
+`docs/architecture/control-manifest.md`.
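+
+The extraction described above can be sketched as (illustrative pseudocode):
+
+```
+for each ADR in docs/architecture/:
+    if Status: Accepted:
+        extract Required Patterns, Forbidden Patterns, key constraints
+        tag each entry with its source ADR number
+    else (Proposed): add to excluded list
+if no ADRs found → BLOCKED, recommend /architecture-decision
+report excluded ADRs, show draft, ask "May I write control-manifest.md?"
+```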
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: CREATED, BLOCKED
+- [ ] Contains "May I write" collaborative protocol language (for control-manifest.md)
+- [ ] Has a next-step handoff at the end (`/create-epics` or `/create-stories`)
+- [ ] Documents that only Accepted ADRs are included (not Proposed)
+
+---
+
+## Director Gate Checks
+
+No director gates — this skill spawns no director gate agents. The control
+manifest is a mechanical extraction from Accepted ADRs; no creative or technical
+review gate is needed.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — 4 Accepted ADRs create a correct manifest
+
+**Fixture:**
+- `docs/architecture/` contains 4 ADR files, all with `Status: Accepted`
+- Each ADR has a "Required Patterns" and/or "Forbidden Patterns" section
+- No existing `docs/architecture/control-manifest.md`
+
+**Input:** `/create-control-manifest`
+
+**Expected behavior:**
+1. Skill reads all ADR files in `docs/architecture/`
+2. Extracts Required Patterns, Forbidden Patterns, and key constraints from each
+3. Drafts the manifest with correct section structure
+4. Shows the draft manifest to the user
+5. Asks "May I write `docs/architecture/control-manifest.md`?"
+6. Writes the manifest after approval
+
+**Assertions:**
+- [ ] All 4 Accepted ADRs are represented in the manifest
+- [ ] Manifest includes distinct sections for Required Patterns and Forbidden Patterns
+- [ ] Manifest includes the source ADR number for each constraint
+- [ ] "May I write" is asked before writing
+- [ ] Skill does NOT write without approval
+- [ ] Verdict is CREATED after writing
+
+---
+
+### Case 2: Failure Path — No ADRs found
+
+**Fixture:**
+- `docs/architecture/` directory exists but contains no ADR files
+
+**Input:** `/create-control-manifest`
+
+**Expected behavior:**
+1. Skill reads `docs/architecture/` and finds no ADR files
+2. Skill outputs: "No ADRs found. Run `/architecture-decision` to create ADRs before generating the control manifest."
+3. Skill exits without creating any file
+4. Verdict is BLOCKED
+
+**Assertions:**
+- [ ] Skill outputs a clear error when no ADRs are found
+- [ ] No control manifest file is written
+- [ ] Skill recommends `/architecture-decision` as the next action
+- [ ] Verdict is BLOCKED (not an error crash)
+
+---
+
+### Case 3: Mixed ADR Statuses — Only Accepted ADRs included
+
+**Fixture:**
+- `docs/architecture/` contains 3 Accepted ADRs and 2 Proposed ADRs
+
+**Input:** `/create-control-manifest`
+
+**Expected behavior:**
+1. Skill reads all ADR files and filters by Status: Accepted
+2. Manifest is drafted from the 3 Accepted ADRs only
+3. Output notes: "2 Proposed ADRs were excluded: [adr-NNN-name, adr-NNN-name]"
+4. User sees which ADRs were excluded before approving the write
+5. Asks "May I write `docs/architecture/control-manifest.md`?"
+
+**Assertions:**
+- [ ] Only the 3 Accepted ADRs appear in the manifest content
+- [ ] Excluded Proposed ADRs are listed by name in the output
+- [ ] User sees the exclusion list before approving the write
+- [ ] Skill does NOT silently omit Proposed ADRs without noting them
+
+---
+
+### Case 4: Edge Case — Manifest already exists
+
+**Fixture:**
+- `docs/architecture/control-manifest.md` already exists (version 1, dated last week)
+- `docs/architecture/` contains Accepted ADRs (some new since last manifest)
+
+**Input:** `/create-control-manifest`
+
+**Expected behavior:**
+1. Skill detects existing manifest and reads its version number / date
+2. Skill offers to regenerate: "control-manifest.md already exists (v1, [date]). Regenerate with current ADRs?"
+3. If user confirms: skill drafts updated manifest, increments version number
+4. Asks "May I write `docs/architecture/control-manifest.md`?" (overwrite)
+5. Writes updated manifest after approval
+
+**Assertions:**
+- [ ] Skill reads and reports the existing manifest version before offering to regenerate
+- [ ] User is offered a regenerate/skip choice — not auto-overwritten
+- [ ] Updated manifest has an incremented version number
+- [ ] "May I write" is asked before overwriting the existing file
+
+---
+
+### Case 5: Director Gate — No gate spawned; no review-mode.txt read
+
+**Fixture:**
+- 4 Accepted ADRs exist
+- `production/session-state/review-mode.txt` exists with `full`
+
+**Input:** `/create-control-manifest`
+
+**Expected behavior:**
+1. Skill reads ADRs and drafts manifest
+2. Skill does NOT read `production/session-state/review-mode.txt`
+3. No director gate agents are spawned at any point
+4. Skill proceeds directly to "May I write" after drafting
+5. Review mode setting has no effect on this skill's behavior
+
+**Assertions:**
+- [ ] No director gate agents are spawned (no CD-, TD-, PR-, AD- prefixed gates)
+- [ ] Skill does NOT read `production/session-state/review-mode.txt`
+- [ ] Output contains no "Gate: [GATE-ID]" or gate-skipped entries
+- [ ] The manifest is generated from ADRs alone, with no external gate review
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads all ADR files before drafting manifest
+- [ ] Only Accepted ADRs included — Proposed ones noted as excluded
+- [ ] Manifest draft shown to user before "May I write" ask
+- [ ] "May I write `docs/architecture/control-manifest.md`?" asked before writing
+- [ ] No director gates — no review-mode.txt read
+- [ ] Ends with next-step handoff: `/create-epics` or `/create-stories`
+
+---
+
+## Coverage Notes
+
+- The exact section structure of the generated manifest (constraint tables, pattern
+ lists) is defined by the skill body and not re-enumerated in test assertions.
+- The `version` field incrementing logic (v1 → v2) is exercised by Case 4, but
+  the exact version numbering format is not fixture-locked.
+- ADR parsing (extracting Required/Forbidden Patterns) depends on consistent ADR
+ structure — tested implicitly via Case 1's fixture.
diff --git a/CCGS Skill Testing Framework/skills/pipeline/create-epics.md b/CCGS Skill Testing Framework/skills/pipeline/create-epics.md
new file mode 100644
index 0000000..921eac1
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/pipeline/create-epics.md
@@ -0,0 +1,190 @@
+# Skill Test Spec: /create-epics
+
+## Skill Summary
+
+`/create-epics` reads all approved GDDs and translates them into EPIC.md files,
+one per system. Epics are organized by layer (Foundation → Core → Feature →
+Presentation) and processed in priority order within each layer. Each EPIC.md
+includes scope, governing ADRs, GDD requirements, engine risk level, and a
+Definition of Done. The skill asks "May I write" before creating each EPIC file.
+
+In `full` review mode, a PR-EPIC gate (producer) runs after drafting epics and
+before writing any files. In `lean` or `solo` mode, PR-EPIC is skipped and noted.
+Epics are written to `production/epics/[layer]/EPIC-[name].md`.
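+
+The layer-ordered flow can be sketched as (illustrative pseudocode — the skill
+body is authoritative):
+
+```
+mode = resolve review mode (full | lean | solo)
+for each layer in Foundation → Core → Feature → Presentation:
+    for each approved GDD in that layer, in priority order:
+        draft EPIC (scope, governing ADRs, GDD requirements, engine risk, DoD)
+if mode == full: run PR-EPIC on the drafts before any write ask
+else: note "PR-EPIC skipped — [mode] mode"
+for each epic: ask "May I write production/epics/[layer]/EPIC-[name].md?"
+```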
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: CREATED, BLOCKED
+- [ ] Contains "May I write" collaborative protocol language (per-epic approval)
+- [ ] Has a next-step handoff at the end (`/create-stories`)
+- [ ] Documents PR-EPIC gate behavior: runs in full mode; skipped in lean/solo
+
+---
+
+## Director Gate Checks
+
+In `full` mode: PR-EPIC (producer) gate runs after epics are drafted and before
+any epic file is written. If PR-EPIC returns CONCERNS, epics are revised before
+the "May I write" ask.
+
+In `lean` mode: PR-EPIC is skipped. Output notes: "PR-EPIC skipped — lean mode".
+
+In `solo` mode: PR-EPIC is skipped. Output notes: "PR-EPIC skipped — solo mode".
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Two approved GDDs create two EPIC files
+
+**Fixture:**
+- `design/gdd/systems-index.md` exists with 2 systems listed
+- Both systems have approved GDDs in `design/gdd/`
+- `docs/architecture/architecture.md` exists with matching modules
+- At least one Accepted ADR exists for each system
+- `production/session-state/review-mode.txt` contains `lean`
+
+**Input:** `/create-epics`
+
+**Expected behavior:**
+1. Skill reads systems index and both GDDs
+2. Drafts 2 EPIC definitions (layer, GDD path, ADRs, requirements, engine risk)
+3. PR-EPIC gate is skipped (lean mode) — noted in output
+4. For each epic: asks "May I write `production/epics/[layer]/EPIC-[name].md`?"
+5. After approval: writes both EPIC files
+6. Creates or updates `production/epics/index.md`
+
+**Assertions:**
+- [ ] Epic summary is shown before any write ask
+- [ ] "May I write" is asked per-epic (not once for all epics together)
+- [ ] Each EPIC.md contains: layer, GDD path, governing ADRs, requirements table, Definition of Done
+- [ ] PR-EPIC skip is noted in output
+- [ ] `production/epics/index.md` is updated after writing
+- [ ] Skill does NOT write EPIC files without per-epic approval
+
+---
+
+### Case 2: Failure Path — No approved GDDs found
+
+**Fixture:**
+- `design/gdd/systems-index.md` exists
+- No GDDs in `design/gdd/` have approved status (all are Draft or In Progress)
+
+**Input:** `/create-epics`
+
+**Expected behavior:**
+1. Skill reads systems index and attempts to find approved GDDs
+2. No approved GDDs found
+3. Skill outputs: "No approved GDDs to convert. GDDs must be Approved before creating epics."
+4. Skill suggests running `/design-system` and completing GDD approval first
+5. Skill exits without creating any EPIC files
+
+**Assertions:**
+- [ ] Skill stops cleanly with a clear message when no approved GDDs exist
+- [ ] No EPIC files are written
+- [ ] Skill recommends the correct next action
+- [ ] Verdict is BLOCKED
+
+---
+
+### Case 3: Director Gate — Full mode spawns PR-EPIC before writing
+
+**Fixture:**
+- 2 approved GDDs exist
+- `production/session-state/review-mode.txt` contains `full`
+
+**Full mode expected behavior:**
+1. Skill drafts both epics
+2. PR-EPIC gate spawns and reviews the epic drafts
+3. If PR-EPIC returns APPROVED: "May I write" ask proceeds normally
+4. Epic files are written after approval
+
+**Assertions (full mode):**
+- [ ] PR-EPIC gate appears in output as an active gate
+- [ ] PR-EPIC runs before any "May I write" ask
+- [ ] Epic files are NOT written before PR-EPIC completes
+
+**Fixture (lean mode):**
+- Same GDDs
+- `production/session-state/review-mode.txt` contains `lean`
+
+**Lean mode expected behavior:**
+1. Epics are drafted
+2. PR-EPIC is skipped — noted in output
+3. "May I write" ask proceeds directly
+
+**Assertions (lean mode):**
+- [ ] "PR-EPIC skipped — lean mode" appears in output
+- [ ] Skill proceeds to "May I write" without waiting for PR-EPIC
+
+---
+
+### Case 4: Edge Case — Epic already exists for a GDD
+
+**Fixture:**
+- `production/epics/[layer]/EPIC-[name].md` already exists for one of the approved GDDs
+- The other GDD has no existing EPIC file
+
+**Input:** `/create-epics`
+
+**Expected behavior:**
+1. Skill detects the existing EPIC file for the first system
+2. Skill offers to update rather than overwrite: "EPIC-[name].md already exists. Update it, or skip?"
+3. For the second system (no existing file): proceeds normally with "May I write"
+
+**Assertions:**
+- [ ] Skill detects existing EPIC files before writing
+- [ ] User is offered "update" or "skip" options — not auto-overwritten
+- [ ] The new system's EPIC is created normally without conflict
+
+---
+
+### Case 5: Director Gate — PR-EPIC returns CONCERNS
+
+**Fixture:**
+- 2 approved GDDs exist
+- `production/session-state/review-mode.txt` contains `full`
+- PR-EPIC gate returns CONCERNS (e.g., scope of one epic is too large)
+
+**Input:** `/create-epics`
+
+**Expected behavior:**
+1. PR-EPIC gate spawns and returns CONCERNS with specific feedback
+2. Skill surfaces the concerns to the user before any write ask
+3. User is given options: revise epics, accept concerns and proceed, or stop
+4. If user revises: updated epic drafts are shown before the "May I write" ask
+5. Skill does NOT write epics while CONCERNS are unaddressed
+
+**Assertions:**
+- [ ] CONCERNS from PR-EPIC are shown to the user before writing
+- [ ] Skill does NOT auto-write epics when CONCERNS are returned
+- [ ] User is given a clear choice to revise, proceed, or stop
+- [ ] Revised epic drafts are re-shown after revision before final approval
+
+---
+
+## Protocol Compliance
+
+- [ ] Epic drafts shown to user before any "May I write" ask
+- [ ] "May I write" asked per-epic, not once for the entire batch
+- [ ] PR-EPIC gate (if active) runs before write asks — not after
+- [ ] Skipped gates noted by name and mode in output
+- [ ] EPIC.md content sourced only from GDDs, ADRs, and architecture docs — nothing invented
+- [ ] Ends with next-step handoff: `/create-stories [epic-slug]` per created epic
+
+---
+
+## Coverage Notes
+
+- Processing of Core, Feature, and Presentation layers follows the same per-epic
+ pattern as Foundation — layer-specific ordering is not independently tested.
+- Engine risk level assignment (LOW/MEDIUM/HIGH) from governing ADRs is
+ validated implicitly via Case 1's fixture structure.
+- The `layer: [name]` and `[system-name]` argument modes follow the same approval
+ pattern as the default (all systems) mode.
diff --git a/CCGS Skill Testing Framework/skills/pipeline/create-stories.md b/CCGS Skill Testing Framework/skills/pipeline/create-stories.md
new file mode 100644
index 0000000..e2dbb89
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/pipeline/create-stories.md
@@ -0,0 +1,191 @@
+# Skill Test Spec: /create-stories
+
+## Skill Summary
+
+`/create-stories` breaks a single epic into developer-ready story files. It reads
+the EPIC.md, the corresponding GDD, governing ADRs, the control manifest, and the
+TR registry. Each story gets structured frontmatter including: Title, Epic, Layer,
+Priority, Status, TR-ID, ADR references, Acceptance Criteria, and Definition of
+Done. Stories are classified by type (Logic / Integration / Visual/Feel / UI /
+Config/Data), which determines the required test evidence path.
+
+In `full` review mode, a QL-STORY-READY check runs per story after creation. In
+`lean` or `solo` mode, QL-STORY-READY is skipped. The skill asks "May I write"
+before writing each story file. Stories are written to
+`production/epics/[layer]/story-[name].md`.
+
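+A story's frontmatter might look like the following (only the field names
+come from this spec; all values, and the `Type` key's placement, are
+hypothetical):
+
+```yaml
+Title: Player jump applies coyote-time grace
+Epic: EPIC-player-movement
+Layer: core
+Priority: P1
+Status: Ready
+Type: Logic          # Logic / Integration / Visual/Feel / UI / Config/Data
+TR-ID: TR-012        # resolved against docs/architecture/tr-registry.yaml
+ADR: ADR-004
+Acceptance Criteria:
+  - Given the player walks off a ledge, when jump is pressed within the
+    grace window, then the jump executes.
+Definition of Done:
+  - Automated test passes at the story's test evidence path.
+```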
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: COMPLETE, BLOCKED, NEEDS WORK
+- [ ] Contains "May I write" collaborative protocol language (per-story approval)
+- [ ] Has a next-step handoff at the end (`/story-readiness`, `/dev-story`)
+- [ ] Documents story Status: Blocked when governing ADR is Proposed
+- [ ] Documents QL-STORY-READY gate: active in full mode, skipped in lean/solo
+
+---
+
+## Director Gate Checks
+
+In `full` mode: QL-STORY-READY check runs per story after creation. Stories that
+fail the check are noted as NEEDS WORK before the "May I write" ask.
+
+In `lean` mode: QL-STORY-READY is skipped. Output notes:
+"QL-STORY-READY skipped — lean mode" per story.
+
+In `solo` mode: QL-STORY-READY is skipped with equivalent notes.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Epic with 3 stories, all ADRs Accepted
+
+**Fixture:**
+- `production/epics/[layer]/EPIC-[name].md` exists with 3 GDD requirements
+- Corresponding GDD exists with matching acceptance criteria
+- All governing ADRs have `Status: Accepted`
+- `docs/architecture/control-manifest.md` exists
+- `docs/architecture/tr-registry.yaml` has TR-IDs for all 3 requirements
+- `production/session-state/review-mode.txt` contains `lean`
+
+**Input:** `/create-stories [epic-name]`
+
+**Expected behavior:**
+1. Skill reads EPIC.md, GDD, governing ADRs, control manifest, and TR registry
+2. Classifies each requirement into a story type (Logic / Integration / Visual/Feel / UI / Config/Data)
+3. Drafts 3 story files with correct frontmatter schema
+4. QL-STORY-READY is skipped (lean mode) — noted in output
+5. Asks "May I write" before writing each story file
+6. Writes all 3 story files after approval
+
+**Assertions:**
+- [ ] Each story's frontmatter contains: Title, Epic, Layer, Priority, Status, TR-ID, ADR reference, Acceptance Criteria, DoD
+- [ ] Story types are correctly classified (at least one Logic type in fixture)
+- [ ] "May I write" is asked per story (not once for the entire batch)
+- [ ] QL-STORY-READY skip is noted in output
+- [ ] All 3 story files are written with correct naming: `story-[name].md`
+- [ ] Skill does NOT start implementation
+
+---
+
+### Case 2: Failure Path — No epic file found
+
+**Fixture:**
+- The epic path provided does not exist in `production/epics/`
+
+**Input:** `/create-stories nonexistent-epic`
+
+**Expected behavior:**
+1. Skill attempts to read the EPIC.md file
+2. File not found
+3. Skill outputs a clear error with the path it searched
+4. Skill suggests checking `production/epics/` or running `/create-epics` first
+5. No story files are created
+
+**Assertions:**
+- [ ] Skill outputs a clear error naming the missing file path
+- [ ] No story files are written
+- [ ] Skill recommends the correct next action (`/create-epics`)
+- [ ] Skill does NOT create stories without a valid EPIC.md
+
+---
+
+### Case 3: Blocked Story — ADR is Proposed
+
+**Fixture:**
+- EPIC.md exists with 2 requirements
+- Requirement 1 is covered by an Accepted ADR
+- Requirement 2 is covered by an ADR with `Status: Proposed`
+
+**Input:** `/create-stories [epic-name]`
+
+**Expected behavior:**
+1. Skill reads the ADR for Requirement 2 and finds Status: Proposed
+2. Story for Requirement 2 is drafted with `Status: Blocked`
+3. Blocking note references the specific ADR: "BLOCKED: ADR-NNN is Proposed"
+4. Story for Requirement 1 is drafted normally with `Status: Ready`
+5. Both stories are shown in the draft — user asked "May I write" for both
+
+**Assertions:**
+- [ ] Story 2 has `Status: Blocked` in its frontmatter
+- [ ] Blocking note names the specific ADR number and recommends `/architecture-decision`
+- [ ] Story 1 has `Status: Ready` — blocked status does not affect non-blocked stories
+- [ ] Blocked status is shown in the draft preview before writing
+- [ ] Both story files are written (blocked stories are still written — just flagged)
+
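+The blocking rule exercised in this case can be sketched as (a hypothetical
+helper, not the skill's actual implementation):
+
+```python
+def story_status(adr_status: str) -> tuple[str, str | None]:
+    """Map a governing ADR's status to the draft story's status."""
+    if adr_status == "Accepted":
+        return "Ready", None
+    # A Proposed (or otherwise non-Accepted) ADR blocks the story, but the
+    # story is still written — just flagged with a note naming the ADR.
+    return "Blocked", "BLOCKED: governing ADR is Proposed — run /architecture-decision"
+```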
+---
+
+### Case 4: Edge Case — No argument provided
+
+**Fixture:**
+- `production/epics/` directory exists with ≥2 epic subdirectories
+
+**Input:** `/create-stories` (no argument)
+
+**Expected behavior:**
+1. Skill detects no argument is provided
+2. Outputs a usage error: "No epic specified. Usage: /create-stories [epic-name]"
+3. Skill lists available epics from `production/epics/`
+4. No story files are created
+
+**Assertions:**
+- [ ] Skill outputs a usage error when no argument is given
+- [ ] Skill lists available epics to help the user choose
+- [ ] No story files are written
+- [ ] Skill does NOT silently pick an epic without user input
+
+---
+
+### Case 5: Director Gate — Full mode runs QL-STORY-READY; stories failing noted as NEEDS WORK
+
+**Fixture:**
+- EPIC.md exists with 2 requirements
+- Both governing ADRs are Accepted
+- `production/session-state/review-mode.txt` contains `full`
+- QL-STORY-READY check finds one story has ambiguous acceptance criteria
+
+**Input:** `/create-stories [epic-name]`
+
+**Expected behavior:**
+1. Both stories are drafted
+2. QL-STORY-READY check runs for each story
+3. Story 1 passes QL-STORY-READY
+4. Story 2 fails QL-STORY-READY — noted as NEEDS WORK with specific feedback
+5. Both stories are shown to user with pass/fail status before "May I write"
+6. User can proceed (story written as-is with NEEDS WORK note) or revise first
+
+**Assertions:**
+- [ ] QL-STORY-READY results appear per story in the output
+- [ ] Story 2 is flagged as NEEDS WORK with the specific failing criteria
+- [ ] Story 1 shows as passing QL-STORY-READY
+- [ ] User is given the choice to proceed or revise before writing
+- [ ] Skill does NOT auto-block writing of stories that fail QL-STORY-READY without user input
+
+---
+
+## Protocol Compliance
+
+- [ ] All context (EPIC, GDD, ADRs, manifest, TR registry) loaded before drafting stories
+- [ ] Story drafts shown in full before any "May I write" ask
+- [ ] "May I write" asked per story (not once for the entire batch)
+- [ ] Blocked stories flagged before write approval — not discovered after writing
+- [ ] TR-IDs reference the registry — requirement text is not embedded inline in story files
+- [ ] Control manifest rules quoted per-story from the manifest, not invented
+- [ ] Ends with next-step handoff: `/story-readiness` → `/dev-story`
+
+---
+
+## Coverage Notes
+
+- Integration story test evidence (playtest doc alternative) follows the same
+ approval pattern as Logic stories — not independently fixture-tested.
+- Story ordering (foundational first, UI last) is validated implicitly via
+ Case 1's multi-story fixture.
+- The story sizing rule (splitting large requirement groups) is not tested here
+ — it is addressed in the `/create-stories` skill's internal logic.
diff --git a/CCGS Skill Testing Framework/skills/pipeline/dev-story.md b/CCGS Skill Testing Framework/skills/pipeline/dev-story.md
new file mode 100644
index 0000000..ebe1789
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/pipeline/dev-story.md
@@ -0,0 +1,205 @@
+# Skill Test Spec: /dev-story
+
+## Skill Summary
+
+`/dev-story` reads a story file, loads all required context (referenced ADR,
+TR-ID from the registry, control manifest, engine preferences), implements the
+story, verifies that all acceptance criteria are met, and marks the story
+Complete. The skill routes implementation to the correct specialist agent based
+on the engine and file type — it does not write source code directly.
+
+In `full` review mode, an LP-CODE-REVIEW gate runs before marking the story
+Complete. In `lean` or `solo` mode, LP-CODE-REVIEW is skipped and the story is
+marked Complete after the user confirms all criteria are met. The skill asks
+"May I write" before updating story status and before writing code files.
+
+---
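+Specialist routing could be sketched roughly like this (the agent names and
+the engine/type keys are illustrative assumptions, not the framework's real
+routing table):
+
+```python
+# Sketch: pick a specialist agent from engine + story type.
+ROUTES = {
+    ("godot", "Logic"): "godot-gdscript-specialist",
+    ("godot", "UI"): "godot-ui-specialist",
+    ("unity", "Logic"): "unity-csharp-specialist",
+}
+
+def route(engine: str, story_type: str) -> str:
+    try:
+        return ROUTES[(engine.lower(), story_type)]
+    except KeyError:
+        raise ValueError(f"no specialist registered for {engine}/{story_type}")
+```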
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: COMPLETE, BLOCKED, IN PROGRESS, NEEDS CHANGES
+- [ ] Contains "May I write" collaborative protocol language (story status + code files)
+- [ ] Has a next-step handoff at the end (`/story-done`)
+- [ ] Documents LP-CODE-REVIEW gate: active in full mode, skipped in lean/solo
+- [ ] Notes that implementation is delegated to specialist agents (not done directly)
+
+---
+
+## Director Gate Checks
+
+In `full` mode: LP-CODE-REVIEW gate runs after implementation is complete and all
+criteria are verified, before marking the story Complete.
+
+In `lean` mode: LP-CODE-REVIEW is skipped. Output notes:
+"LP-CODE-REVIEW skipped — lean mode". Story is marked Complete after user confirms.
+
+In `solo` mode: LP-CODE-REVIEW is skipped with equivalent notes.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Story implemented and marked Complete (full mode)
+
+**Fixture:**
+- A story file exists at `production/epics/[layer]/story-[name].md` with:
+ - `Status: Ready`
+ - A TR-ID referencing a registered requirement
+ - At least 2 Given-When-Then acceptance criteria
+ - A test evidence path
+- Referenced ADR has `Status: Accepted`
+- `docs/architecture/control-manifest.md` exists
+- `.claude/docs/technical-preferences.md` has engine and language configured
+- `production/session-state/review-mode.txt` contains `full`
+
+**Input:** `/dev-story production/epics/[layer]/story-[name].md`
+
+**Expected behavior:**
+1. Skill reads the story file and all referenced context
+2. Skill verifies the ADR is Accepted (no block)
+3. Skill routes implementation to the correct specialist agent
+4. All acceptance criteria are verified as met
+5. LP-CODE-REVIEW gate spawns and returns APPROVED
+6. Skill asks "May I update story status to Complete?"
+7. Story status is updated to Complete
+
+**Assertions:**
+- [ ] Skill reads story before spawning any agent
+- [ ] ADR status is checked before implementation begins
+- [ ] Implementation is delegated to a specialist agent (not done inline)
+- [ ] All acceptance criteria are confirmed before LP-CODE-REVIEW
+- [ ] LP-CODE-REVIEW appears in output as a completed gate
+- [ ] Story status is updated to Complete only after gate approval and user consent
+- [ ] Test file is written as part of implementation (not deferred)
+
+---
+
+### Case 2: Failure Path — Referenced ADR is Proposed
+
+**Fixture:**
+- A story file exists with `Status: Ready`
+- The story's TR-ID points to a requirement covered by an ADR with `Status: Proposed`
+
+**Input:** `/dev-story production/epics/[layer]/story-[name].md`
+
+**Expected behavior:**
+1. Skill reads the story file
+2. Skill resolves the TR-ID and reads the governing ADR
+3. ADR status is Proposed — skill outputs a BLOCKED message
+4. Skill names the specific ADR blocking the story
+5. Skill recommends running `/architecture-decision` to advance the ADR
+6. Implementation does NOT begin
+
+**Assertions:**
+- [ ] Skill does NOT begin implementation with a Proposed ADR
+- [ ] BLOCKED message names the specific ADR number and title
+- [ ] Skill recommends `/architecture-decision` as the next action
+- [ ] Story status remains unchanged (not set to In Progress or Complete)
+
+---
+
+### Case 3: Ambiguous Acceptance Criteria — Skill asks for clarification
+
+**Fixture:**
+- A story file exists with `Status: Ready`
+- Referenced ADR is Accepted
+- One acceptance criterion is ambiguous (not Given-When-Then; uses subjective language like "feels responsive")
+
+**Input:** `/dev-story production/epics/[layer]/story-[name].md`
+
+**Expected behavior:**
+1. Skill reads the story and identifies the ambiguous criterion
+2. Before routing to the specialist, skill asks the user to clarify the criterion
+3. User provides a concrete, testable restatement
+4. Skill proceeds with implementation using the clarified criterion
+5. Skill does NOT guess at the intended behavior
+
+**Assertions:**
+- [ ] Skill surfaces the ambiguous criterion before implementation starts
+- [ ] Skill asks for user clarification (not auto-interpretation)
+- [ ] Implementation begins only after clarification is provided
+- [ ] Clarified criterion is used in the test (not the original vague version)
+
+---
+
+### Case 4: Edge Case — No argument; reads from session state
+
+**Fixture:**
+- No argument is provided
+- `production/session-state/active.md` references an active story file
+- That story file exists with `Status: In Progress`
+
+**Input:** `/dev-story` (no argument)
+
+**Expected behavior:**
+1. Skill detects no argument is provided
+2. Skill reads `production/session-state/active.md`
+3. Skill finds the active story reference
+4. Skill confirms with user: "Continuing work on [story title] — is that correct?"
+5. After confirmation, skill proceeds with that story
+
+**Assertions:**
+- [ ] Skill reads session state when no argument is provided
+- [ ] Skill confirms the active story with the user before proceeding
+- [ ] Skill does NOT silently assume the active story without confirmation
+- [ ] If session state has no active story, skill asks which story to implement
+
+---
+
+### Case 5: Director Gate — LP-CODE-REVIEW returns NEEDS CHANGES; lean mode skips gate
+
+**Fixture (full mode):**
+- Story is implemented and all criteria appear met
+- `production/session-state/review-mode.txt` contains `full`
+- LP-CODE-REVIEW gate returns NEEDS CHANGES with specific feedback
+
+**Full mode expected behavior:**
+1. LP-CODE-REVIEW gate spawns after implementation
+2. Gate returns NEEDS CHANGES with 2 specific issues
+3. Story status remains In Progress — NOT marked Complete
+4. User is shown the gate feedback and asked how to proceed
+
+**Assertions (full mode):**
+- [ ] Story is NOT marked Complete when LP-CODE-REVIEW returns NEEDS CHANGES
+- [ ] Gate feedback is shown to the user verbatim
+- [ ] Story status stays In Progress until issues are resolved and gate passes
+
+**Fixture (lean mode):**
+- Same story, `production/session-state/review-mode.txt` contains `lean`
+
+**Lean mode expected behavior:**
+1. Implementation completes
+2. LP-CODE-REVIEW gate is skipped — noted in output
+3. User is asked to confirm all criteria are met
+4. Story is marked Complete after user confirmation
+
+**Assertions (lean mode):**
+- [ ] "LP-CODE-REVIEW skipped — lean mode" appears in output
+- [ ] Story is marked Complete after user confirms criteria (no gate required)
+- [ ] Skill does NOT block on a gate that is skipped
+
+---
+
+## Protocol Compliance
+
+- [ ] Does NOT write source code directly — delegates to specialist agents
+- [ ] Reads all context (story, TR-ID, ADR, manifest, engine prefs) before implementation
+- [ ] "May I write" asked before updating story status and before writing code files
+- [ ] Skipped gates noted by name and mode in output
+- [ ] Updates `production/session-state/active.md` after story completion
+- [ ] Ends with next-step handoff: `/story-done`
+
+---
+
+## Coverage Notes
+
+- Engine routing logic (Godot vs Unity vs Unreal) is not tested per engine —
+ the routing pattern is consistent; engine selection is a config fact.
+- Visual/Feel and UI story types (no automated test required) have different
+ evidence requirements and are not covered in these cases.
+- Integration story type follows the same pattern as Logic but with a different
+ evidence path — not independently fixture-tested.
diff --git a/CCGS Skill Testing Framework/skills/pipeline/map-systems.md b/CCGS Skill Testing Framework/skills/pipeline/map-systems.md
new file mode 100644
index 0000000..2eda044
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/pipeline/map-systems.md
@@ -0,0 +1,196 @@
+# Skill Test Spec: /map-systems
+
+## Skill Summary
+
+`/map-systems` decomposes a game concept into a systems index. It reads the
+approved game concept and pillars, enumerates both explicit and implicit systems,
+maps dependencies between systems, assigns priority tiers (MVP / Vertical Slice /
+Alpha / Full Vision), and organizes systems into a layered design order
+(Foundation → Core → Feature → Presentation). The output is written to
+`design/systems-index.md` after user approval.
+
+This skill is required between game concept approval and per-system GDD creation
+— it is a mandatory gate in the pipeline. In `full` review mode, CD-SYSTEMS
+(creative-director) and TD-SYSTEM-BOUNDARY (technical-director) spawn in parallel
+after the decomposition is drafted. In `lean` or `solo` mode, both gates are
+skipped.
+
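+The layering step can be sketched as a dependency-depth computation (a
+minimal illustration under the assumption of a prebuilt dependency map — the
+real skill reasons over prose system descriptions):
+
+```python
+LAYERS = ["Foundation", "Core", "Feature", "Presentation"]
+
+def assign_layer(system: str, deps: dict[str, list[str]]) -> str:
+    """Deeper dependency chains push a system toward Presentation."""
+    def depth(s: str) -> int:
+        return 1 + max((depth(d) for d in deps.get(s, [])), default=-1)
+    return LAYERS[min(depth(system), len(LAYERS) - 1)]
+
+# e.g. deps = {"rendering": ["input"], "combat": ["rendering"]}
+# assign_layer("input", deps)  -> "Foundation"
+```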
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: COMPLETE, BLOCKED
+- [ ] Contains "May I write" collaborative protocol language (for systems-index.md)
+- [ ] Has a next-step handoff at the end (`/design-system`)
+- [ ] Documents gate behavior: CD-SYSTEMS + TD-SYSTEM-BOUNDARY in parallel in full mode
+
+---
+
+## Director Gate Checks
+
+In `full` mode: CD-SYSTEMS (creative-director) and TD-SYSTEM-BOUNDARY
+(technical-director) spawn in parallel after the systems decomposition is drafted
+and before `design/systems-index.md` is written.
+
+In `lean` mode: both gates are skipped. Output notes:
+"CD-SYSTEMS skipped — lean mode" and "TD-SYSTEM-BOUNDARY skipped — lean mode".
+
+In `solo` mode: both gates are skipped with equivalent notes.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Game concept exists, 5-8 systems identified
+
+**Fixture:**
+- `design/gdd/game-concept.md` exists with Core Mechanics and MVP Definition sections
+- `design/gdd/game-pillars.md` exists with ≥1 pillar defined
+- No `design/systems-index.md` exists yet
+- `production/session-state/review-mode.txt` contains `full`
+
+**Input:** `/map-systems`
+
+**Expected behavior:**
+1. Skill reads game-concept.md and game-pillars.md
+2. Identifies 5-8 systems (explicit + implicit)
+3. Maps dependencies between systems and assigns layers
+4. CD-SYSTEMS and TD-SYSTEM-BOUNDARY spawn in parallel and return APPROVED
+5. Asks "May I write `design/systems-index.md`?"
+6. Writes systems-index.md after approval
+7. Updates `production/session-state/active.md`
+
+**Assertions:**
+- [ ] Between 5 and 8 systems are identified (no fewer than 5; more than 8 only with an explanation)
+- [ ] CD-SYSTEMS and TD-SYSTEM-BOUNDARY spawn in parallel (not sequentially)
+- [ ] Both gates complete before the "May I write" ask
+- [ ] "May I write `design/systems-index.md`?" is asked before writing
+- [ ] systems-index.md is NOT written without approval
+- [ ] Session state is updated after writing
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 2: Failure Path — No game concept found
+
+**Fixture:**
+- `design/gdd/game-concept.md` does NOT exist
+- `design/gdd/` directory may be empty or absent
+
+**Input:** `/map-systems`
+
+**Expected behavior:**
+1. Skill attempts to read `design/gdd/game-concept.md`
+2. File not found
+3. Skill outputs: "No game concept found. Run `/brainstorm` to create one, then return to `/map-systems`."
+4. Skill exits without creating systems-index.md
+
+**Assertions:**
+- [ ] Skill outputs a clear error naming the missing file path
+- [ ] Skill recommends `/brainstorm` as the next action
+- [ ] No systems-index.md is created
+- [ ] Verdict is BLOCKED
+
+---
+
+### Case 3: Director Gate — CD-SYSTEMS returns CONCERNS (missing core system)
+
+**Fixture:**
+- Game concept exists
+- `production/session-state/review-mode.txt` contains `full`
+- CD-SYSTEMS gate returns CONCERNS: "The [core-system] is implied by the concept but not identified"
+
+**Input:** `/map-systems`
+
+**Expected behavior:**
+1. Systems are drafted (5-8 initial systems identified)
+2. CD-SYSTEMS gate returns CONCERNS naming the missing core system
+3. TD-SYSTEM-BOUNDARY returns APPROVED
+4. Skill surfaces CD-SYSTEMS concerns to user
+5. User is asked: revise systems list to add the missing system, or proceed as-is
+6. If revised: updated systems list shown before "May I write" ask
+
+**Assertions:**
+- [ ] CD-SYSTEMS concerns are shown to the user before writing
+- [ ] Skill does NOT auto-write systems-index.md while CONCERNS are unresolved
+- [ ] User is given the option to revise or proceed
+- [ ] Revised systems list is re-shown after revision before final "May I write"
+
+---
+
+### Case 4: Edge Case — systems-index.md already exists
+
+**Fixture:**
+- `design/gdd/game-concept.md` exists
+- `design/systems-index.md` already exists with N systems
+
+**Input:** `/map-systems`
+
+**Expected behavior:**
+1. Skill reads the existing systems-index.md and presents its current state
+2. Skill asks: "systems-index.md already exists with [N] systems. Update with new systems, or review and revise priorities?"
+3. User chooses an action
+4. Skill does NOT silently overwrite the existing index
+
+**Assertions:**
+- [ ] Skill detects and reads the existing systems-index.md before proceeding
+- [ ] User is offered update/review options — not auto-overwritten
+- [ ] Existing system count is presented to the user
+- [ ] Skill does NOT proceed with a full re-decomposition without user choosing to do so
+
+---
+
+### Case 5: Director Gate — Lean mode and solo mode both skip gates, noted
+
+**Fixture (lean mode):**
+- Game concept exists
+- `production/session-state/review-mode.txt` contains `lean`
+
+**Lean mode expected behavior:**
+1. Systems are decomposed and drafted
+2. Both CD-SYSTEMS and TD-SYSTEM-BOUNDARY are skipped
+3. Output notes: "CD-SYSTEMS skipped — lean mode" and "TD-SYSTEM-BOUNDARY skipped — lean mode"
+4. "May I write" ask proceeds directly
+
+**Assertions (lean mode):**
+- [ ] Both gate skip notes appear in output
+- [ ] Skill proceeds to "May I write" without gate approval
+- [ ] systems-index.md is written after user approval
+
+**Fixture (solo mode):**
+- Same game concept, `production/session-state/review-mode.txt` contains `solo`
+
+**Solo mode expected behavior:**
+1. Same decomposition workflow
+2. Both gates skipped — noted in output with "solo mode"
+3. "May I write" ask proceeds
+
+**Assertions (solo mode):**
+- [ ] Both skip notes appear with "solo mode" label
+- [ ] Behavior is otherwise identical to lean mode for this skill
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads game-concept.md and game-pillars.md before any decomposition
+- [ ] "May I write `design/systems-index.md`?" asked before writing
+- [ ] systems-index.md is NOT written without user approval
+- [ ] CD-SYSTEMS and TD-SYSTEM-BOUNDARY spawn in parallel in full mode
+- [ ] Skipped gates noted by name and mode in lean/solo output
+- [ ] Ends with next-step handoff: `/design-system [next-system]`
+
+---
+
+## Coverage Notes
+
+- Circular dependency detection (System A depends on System B which depends on A)
+ is part of the dependency mapping phase — not independently fixture-tested here.
+- Priority tier assignment (MVP heuristics) is evaluated as part of the Case 1
+ collaborative workflow rather than independently.
+- The `next` argument mode (handing off the highest-priority undesigned system to
+ `/design-system`) is not tested here — it is a post-index-creation convenience.
diff --git a/CCGS Skill Testing Framework/skills/pipeline/propagate-design-change.md b/CCGS Skill Testing Framework/skills/pipeline/propagate-design-change.md
new file mode 100644
index 0000000..26d0ef8
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/pipeline/propagate-design-change.md
@@ -0,0 +1,175 @@
+# Skill Test Spec: /propagate-design-change
+
+## Skill Summary
+
+`/propagate-design-change` handles GDD revision cascades. When a GDD is updated,
+the skill traces all downstream artifacts that reference it: ADRs, TR-registry
+entries, stories, and epics. It produces a structured impact report showing what
+needs to change and why. The skill does NOT automatically apply changes — it
+proposes edits for each affected artifact and asks "May I write" per artifact
+before making any modification.
+
+The skill is read-only during analysis and write-gated per artifact during the
+update phase. It has no director gates — the analysis itself is mechanical
+tracing, not a creative review.
+
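+The downstream trace is essentially a text scan, sketched here (the search
+roots and matching rule are assumptions for illustration):
+
+```python
+from pathlib import Path
+
+def find_affected(gdd_path: str, tr_ids: list[str]) -> list[Path]:
+    """Collect artifacts that mention the GDD path or any of its TR-IDs."""
+    affected = []
+    for root in ("docs/architecture", "production/epics"):
+        for f in Path(root).rglob("*.md"):
+            text = f.read_text(errors="ignore")
+            if gdd_path in text or any(tr in text for tr in tr_ids):
+                affected.append(f)
+    return affected
+```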
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: COMPLETE, BLOCKED, NO IMPACT
+- [ ] Contains "May I write" collaborative protocol language (per-artifact approval)
+- [ ] Has a next-step handoff at the end
+- [ ] Documents that changes are proposed, not applied automatically
+
+---
+
+## Director Gate Checks
+
+No director gates — this skill spawns no director gate agents during analysis.
+The impact report is a mechanical tracing operation; no creative or technical
+director review is required at the analysis stage.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — GDD revision affects 2 stories and 1 epic
+
+**Fixture:**
+- `design/gdd/[system].md` exists and has been recently revised (git diff shows changes)
+- `production/epics/[layer]/EPIC-[system].md` references this GDD
+- 2 story files reference TR-IDs from this GDD
+- The changed GDD section affects the acceptance criteria of both stories
+
+**Input:** `/propagate-design-change design/gdd/[system].md`
+
+**Expected behavior:**
+1. Skill reads the revised GDD and identifies what changed (git diff or content comparison)
+2. Skill scans ADRs, TR-registry, epics, and stories for references to this GDD
+3. Skill produces an impact report: 1 epic affected, 2 stories affected
+4. Skill shows the proposed change for each artifact
+5. For each artifact: asks "May I update [filepath]?" separately
+6. Applies changes only after per-artifact approval
+
+**Assertions:**
+- [ ] Impact report identifies all 3 affected artifacts (1 epic + 2 stories)
+- [ ] Each affected artifact's proposed change is shown before asking to write
+- [ ] "May I write" is asked per artifact (not once for all artifacts)
+- [ ] Skill does NOT apply any changes without per-artifact approval
+- [ ] Verdict is COMPLETE after all approved changes are applied
+
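+An impact report of roughly this shape would satisfy the assertions above (the
+layout is illustrative; the skill's own output format governs):
+
+```
+Impact Report: design/gdd/[system].md
+Changed: [section] (revision affecting acceptance criteria)
+
+Affected artifacts (3):
+  1. production/epics/[layer]/EPIC-[system].md (references this GDD)
+  2. [story-file-1] (acceptance criteria cite affected TR-IDs)
+  3. [story-file-2] (acceptance criteria cite affected TR-IDs)
+
+Proposed change for artifact 1: [shown in full]
+May I update production/epics/[layer]/EPIC-[system].md?
+```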
+---
+
+### Case 2: No Impact — Changed GDD has no downstream references
+
+**Fixture:**
+- `design/gdd/[system].md` exists and has been revised
+- No ADRs, stories, or epics reference this GDD's TR-IDs or GDD path
+
+**Input:** `/propagate-design-change design/gdd/[system].md`
+
+**Expected behavior:**
+1. Skill reads the revised GDD
+2. Skill scans all ADRs, stories, and epics for references
+3. No references found
+4. Skill outputs: "No downstream impact found for [system].md — no artifacts reference this GDD."
+5. No write operations are performed
+
+**Assertions:**
+- [ ] Skill outputs the "No downstream impact found" message
+- [ ] Verdict is NO IMPACT
+- [ ] No "May I write" asks are issued (nothing to update)
+- [ ] Skill does NOT error or crash when no references are found
+
+---
+
+### Case 3: In-Progress Story Warning — Referenced story is currently being developed
+
+**Fixture:**
+- A story referencing this GDD has `Status: In Progress`
+- The developer has already started implementing this story
+
+**Input:** `/propagate-design-change design/gdd/[system].md`
+
+**Expected behavior:**
+1. Skill identifies the In Progress story as an affected artifact
+2. Skill outputs an elevated warning: "CAUTION: [story-file] is currently In Progress — a developer may be working on this. Coordinate before updating."
+3. The warning appears in the impact report before the "May I write" ask for that story
+4. User can still approve or skip the update for that story
+
+**Assertions:**
+- [ ] In Progress story is flagged with an elevated warning (distinct from regular affected-artifact entries)
+- [ ] Warning appears before the "May I write" ask for that story
+- [ ] Skill still offers to update the story — the warning does not block the option
+- [ ] Other (non-In-Progress) artifacts are not affected by this warning
+
+---
+
+### Case 4: Edge Case — No argument provided
+
+**Fixture:**
+- Multiple GDDs exist in `design/gdd/`
+
+**Input:** `/propagate-design-change` (no argument)
+
+**Expected behavior:**
+1. Skill detects no argument is provided
+2. Skill outputs a usage error: "No GDD specified. Usage: /propagate-design-change design/gdd/[system].md"
+3. Skill lists recently modified GDDs as suggestions (git log)
+4. No analysis is performed
+
+**Assertions:**
+- [ ] Skill outputs a usage error when no argument is given
+- [ ] Usage example is shown with the correct path format
+- [ ] No impact analysis is performed without a target GDD
+- [ ] Skill does NOT silently pick a GDD without user input
+
+---
+
+### Case 5: Director Gate — No gate spawned regardless of review mode
+
+**Fixture:**
+- A GDD has been revised with downstream references
+- `production/session-state/review-mode.txt` exists with `full`
+
+**Input:** `/propagate-design-change design/gdd/[system].md`
+
+**Expected behavior:**
+1. Skill reads the GDD and traces downstream references
+2. Skill does NOT read `production/session-state/review-mode.txt`
+3. No director gate agents are spawned at any point
+4. Impact report is produced and per-artifact approval proceeds normally
+
+**Assertions:**
+- [ ] No director gate agents are spawned (no CD-, TD-, PR-, AD- prefixed gates)
+- [ ] Skill does NOT read `production/session-state/review-mode.txt`
+- [ ] Output contains no "Gate: [GATE-ID]" or gate-skipped entries
+- [ ] Review mode has no effect on this skill's behavior
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads revised GDD and all potentially affected artifacts before producing impact report
+- [ ] Impact report shown in full before any "May I write" ask
+- [ ] "May I write" asked per artifact — never for the entire set at once
+- [ ] In Progress stories flagged with elevated warning before their approval ask
+- [ ] No director gates — no review-mode.txt read
+- [ ] Ends with next-step handoff appropriate to verdict (COMPLETE or NO IMPACT)
+
+---
+
+## Coverage Notes
+
+- ADR impact (when a GDD change requires an ADR update or new ADR) follows the
+ same per-artifact approval pattern as story/epic updates — not independently
+ fixture-tested.
+- TR-registry impact (when changed GDD requires new or updated TR-IDs) is part
+ of the analysis phase but not independently fixture-tested.
+- The git diff comparison method (detecting what changed in the GDD) is a runtime
+ concern — fixtures use pre-arranged content differences.
diff --git a/tests/skills/story-done.md b/CCGS Skill Testing Framework/skills/readiness/story-done.md
similarity index 81%
rename from tests/skills/story-done.md
rename to CCGS Skill Testing Framework/skills/readiness/story-done.md
index 2a072bf..8aa87eb 100644
--- a/tests/skills/story-done.md
+++ b/CCGS Skill Testing Framework/skills/readiness/story-done.md
@@ -142,6 +142,50 @@ Verified automatically by `/skill-test static` — no fixture needed.
---
+
+### Case 5: Director Gate — LP-CODE-REVIEW behavior across review modes
+
+**Fixture:**
+- Story file at `production/epics/core/story-light-pickup.md`
+- All acceptance criteria verified, no GDD deviations
+- `production/session-state/review-mode.txt` exists
+
+**Case 5a — full mode:**
+- `review-mode.txt` contains `full`
+
+**Input:** `/story-done production/epics/core/story-light-pickup.md` (full mode)
+
+**Expected behavior:**
+1. Skill reads review mode — determines `full`
+2. After implementation verification, skill invokes LP-CODE-REVIEW gate
+3. Lead programmer reviews the implementation
+4. If LP verdict is NEEDS CHANGES → story cannot be marked Complete
+5. If LP verdict is APPROVED → skill proceeds to mark story Complete
+
+**Assertions (5a):**
+- [ ] Skill reads review mode before deciding whether to invoke LP-CODE-REVIEW
+- [ ] LP-CODE-REVIEW gate is invoked in full mode after implementation check
+- [ ] An LP NEEDS CHANGES verdict prevents story from being marked Complete
+- [ ] Gate result is noted in output: "Gate: LP-CODE-REVIEW — [result]"
+- [ ] Skill still asks "May I write" before updating story status even if LP approved
+
+**Case 5b — lean or solo mode:**
+- `review-mode.txt` contains `lean` or `solo`
+
+**Expected behavior:**
+1. Skill reads review mode — determines `lean` or `solo`
+2. LP-CODE-REVIEW gate is SKIPPED
+3. Output notes the skip: "[LP-CODE-REVIEW] skipped — Lean/Solo mode"
+4. Story completion proceeds based on acceptance criteria check only
+
+**Assertions (5b):**
+- [ ] LP-CODE-REVIEW gate does NOT spawn in lean or solo mode
+- [ ] Skip is explicitly noted in output
+- [ ] Skill still requires "May I write" approval before marking story Complete
+
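+An illustrative output fragment for 5b (the skip-note wording follows the
+director-gates convention; the surrounding lines are placeholders):
+
+```
+[LP-CODE-REVIEW] skipped — Lean mode
+Acceptance criteria: all verified
+May I write to production/epics/core/story-light-pickup.md (Status → Complete)?
+```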
+---
+
## Protocol Compliance
- [ ] Uses "May I write" before updating the story file
diff --git a/tests/skills/story-readiness.md b/CCGS Skill Testing Framework/skills/readiness/story-readiness.md
similarity index 79%
rename from tests/skills/story-readiness.md
rename to CCGS Skill Testing Framework/skills/readiness/story-readiness.md
index 946fe0d..7b3f523 100644
--- a/tests/skills/story-readiness.md
+++ b/CCGS Skill Testing Framework/skills/readiness/story-readiness.md
@@ -132,6 +132,48 @@ Verified automatically by `/skill-test static` — no fixture needed.
---
+
+### Case 5: Director Gate — QL-STORY-READY behavior across review modes
+
+**Fixture:**
+- Story file exists and is READY (all 4 dimensions pass, ADR Accepted, criteria present)
+- `production/session-state/review-mode.txt` exists
+
+**Case 5a — full mode:**
+- `review-mode.txt` contains `full`
+
+**Input:** `/story-readiness production/epics/core/story-light-pickup.md` (full mode)
+
+**Expected behavior:**
+1. Skill reads review mode — determines `full`
+2. After completing its own 4-dimension check, skill invokes QL-STORY-READY gate
+3. QA lead reviews the story for readiness
+4. If QA lead verdict is INADEQUATE → story verdict is BLOCKED regardless of 4-dimension result
+5. If QA lead verdict is ADEQUATE → verdict proceeds normally
+
+**Assertions (5a):**
+- [ ] Skill reads review mode before deciding whether to invoke QL-STORY-READY
+- [ ] QL-STORY-READY gate is invoked in full mode after the 4-dimension check completes
+- [ ] A QA lead INADEQUATE verdict overrides a READY 4-dimension result → final verdict BLOCKED
+- [ ] Gate invocation is noted in output: "Gate: QL-STORY-READY — [result]"
+
+**Case 5b — lean or solo mode:**
+- `review-mode.txt` contains `lean` or `solo`
+
+**Expected behavior:**
+1. Skill reads review mode — determines `lean` or `solo`
+2. QL-STORY-READY gate is SKIPPED
+3. Output notes the skip: "[QL-STORY-READY] skipped — Lean/Solo mode"
+4. Verdict is based on 4-dimension check only
+
+**Assertions (5b):**
+- [ ] QL-STORY-READY gate does NOT spawn in lean or solo mode
+- [ ] Skip is explicitly noted in output
+- [ ] Verdict is based on 4-dimension check alone
+
+---
+
## Protocol Compliance
- [ ] Does NOT use Write or Edit tools (read-only skill)
diff --git a/CCGS Skill Testing Framework/skills/review/architecture-review.md b/CCGS Skill Testing Framework/skills/review/architecture-review.md
new file mode 100644
index 0000000..99b21c3
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/review/architecture-review.md
@@ -0,0 +1,192 @@
+# Skill Test Spec: /architecture-review
+
+## Skill Summary
+
+`/architecture-review` is an Opus-tier skill that validates a technical architecture
+document against the project's 8 required architecture sections and checks that it
+is internally consistent, does not contradict existing ADRs, and correctly
+targets the pinned engine version. It produces a verdict of APPROVED /
+NEEDS REVISION / MAJOR REVISION NEEDED.
+
+In `full` review mode, the skill spawns two director gate agents in parallel:
+TD-ARCHITECTURE (technical-director) and LP-FEASIBILITY (lead-programmer). In
+`lean` or `solo` mode, both gates are skipped and noted. The skill is read-only —
+no files are written.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION NEEDED
+- [ ] Does NOT require "May I write" language (read-only skill)
+- [ ] Has a next-step handoff at the end
+- [ ] Documents gate behavior: TD-ARCHITECTURE + LP-FEASIBILITY in full mode; skipped in lean/solo
+
+---
+
+## Director Gate Checks
+
+In `full` mode: TD-ARCHITECTURE (technical-director) and LP-FEASIBILITY
+(lead-programmer) are spawned in parallel after the skill reads the architecture doc.
+
+In `lean` mode: both gates are skipped. Output notes:
+"TD-ARCHITECTURE skipped — lean mode" and "LP-FEASIBILITY skipped — lean mode".
+
+In `solo` mode: both gates are skipped with equivalent notes.
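+
+An illustrative parallel spawn for this skill, following the invocation pattern
+in `.claude/docs/director-gates.md` (context fields abbreviated):
+
+```
+# Apply mode check for each gate first, then spawn all that survive:
+Spawn `technical-director` via Task:
+  - Gate: TD-ARCHITECTURE (see .claude/docs/director-gates.md)
+Spawn `lead-programmer` via Task:
+  - Gate: LP-FEASIBILITY (see .claude/docs/director-gates.md)
+```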
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Complete architecture doc in full mode
+
+**Fixture:**
+- `docs/architecture/architecture.md` exists with all 8 required sections populated
+- All sections reference the correct engine version from `docs/engine-reference/`
+- No contradictions with existing Accepted ADRs in `docs/architecture/`
+- `production/session-state/review-mode.txt` contains `full`
+
+**Input:** `/architecture-review docs/architecture/architecture.md`
+
+**Expected behavior:**
+1. Skill reads the architecture document
+2. Skill reads existing ADRs for cross-reference
+3. Skill reads engine version reference
+4. TD-ARCHITECTURE and LP-FEASIBILITY gate agents spawn in parallel
+5. Both gates return APPROVED
+6. Skill outputs section-by-section completeness check (8/8 sections present)
+7. Verdict: APPROVED
+
+**Assertions:**
+- [ ] All 8 required sections are checked and reported
+- [ ] TD-ARCHITECTURE and LP-FEASIBILITY spawn in parallel (not sequentially)
+- [ ] Verdict is APPROVED when all sections are present and no conflicts exist
+- [ ] Skill does NOT write any files
+- [ ] Next-step handoff to `/create-control-manifest` or `/create-epics` is present
+
+---
+
+### Case 2: Failure Path — Missing required sections
+
+**Fixture:**
+- `docs/architecture/architecture.md` exists but is missing at least 2 required sections
+ (e.g., no data model section, no error handling section)
+- `production/session-state/review-mode.txt` contains `full`
+
+**Input:** `/architecture-review docs/architecture/architecture.md`
+
+**Expected behavior:**
+1. Skill reads the document and identifies missing sections
+2. Section completeness shows fewer than 8/8 sections present
+3. Missing sections are listed by name with specific remediation guidance
+4. Verdict: MAJOR REVISION NEEDED (≥2 missing sections)
+
+**Assertions:**
+- [ ] Verdict is MAJOR REVISION NEEDED (not APPROVED or NEEDS REVISION) for ≥2 missing sections
+- [ ] Each missing section is named explicitly in the output
+- [ ] Remediation guidance is specific (what to add, not just "add missing sections")
+- [ ] Skill does NOT pass a document missing required sections
+
+---
+
+### Case 3: Partial Path — Architecture contradicts an existing ADR
+
+**Fixture:**
+- `docs/architecture/architecture.md` exists with all 8 sections present
+- One Accepted ADR in `docs/architecture/` establishes a constraint that the architecture doc contradicts
+ (e.g., ADR-001 mandates ECS pattern; architecture.md describes a different pattern for the same system)
+
+**Input:** `/architecture-review docs/architecture/architecture.md`
+
+**Expected behavior:**
+1. Skill reads the architecture doc and all existing ADRs
+2. Conflict is detected between the architecture doc and the named ADR
+3. The conflict entry names the ADR number/title, the contradicting sections, and the impact
+4. Verdict: NEEDS REVISION (conflict exists but structure is otherwise sound)
+
+**Assertions:**
+- [ ] Verdict is NEEDS REVISION (not MAJOR REVISION NEEDED for a single contradiction)
+- [ ] The specific ADR number and title are named in the conflict entry
+- [ ] The contradicting sections in both documents are identified
+- [ ] Skill does NOT auto-resolve the contradiction
+
+---
+
+### Case 4: Edge Case — File not found
+
+**Fixture:**
+- The path provided does not exist in the project
+
+**Input:** `/architecture-review docs/architecture/nonexistent.md`
+
+**Expected behavior:**
+1. Skill attempts to read the file
+2. File not found
+3. Skill outputs a clear error naming the missing file
+4. Skill suggests checking `docs/architecture/` or running `/create-architecture`
+5. Skill does NOT produce a verdict
+
+**Assertions:**
+- [ ] Skill outputs a clear error when the file is not found
+- [ ] No verdict is produced (APPROVED / NEEDS REVISION / MAJOR REVISION NEEDED)
+- [ ] Skill suggests a corrective action
+- [ ] Skill does NOT crash or produce a partial report
+
+---
+
+### Case 5: Director Gate — Full mode spawns both gates; solo mode skips both
+
+**Fixture (full mode):**
+- `docs/architecture/architecture.md` exists with all 8 sections
+- `production/session-state/review-mode.txt` contains `full`
+
+**Full mode expected behavior:**
+1. TD-ARCHITECTURE gate spawns
+2. LP-FEASIBILITY gate spawns in parallel with TD-ARCHITECTURE
+3. Both gates complete before verdict is issued
+
+**Assertions (full mode):**
+- [ ] TD-ARCHITECTURE and LP-FEASIBILITY both appear in the output as completed gates
+- [ ] Both gates spawn in parallel (not one after the other)
+- [ ] Verdict reflects gate feedback
+
+**Fixture (solo mode):**
+- Same architecture doc
+- `production/session-state/review-mode.txt` contains `solo`
+
+**Solo mode expected behavior:**
+1. Skill reads the architecture doc
+2. Gates are NOT spawned
+3. Output notes: "TD-ARCHITECTURE skipped — solo mode" and "LP-FEASIBILITY skipped — solo mode"
+4. Verdict is based on structural checks only
+
+**Assertions (solo mode):**
+- [ ] Neither TD-ARCHITECTURE nor LP-FEASIBILITY appears as an active gate
+- [ ] Both skipped gates are noted in the output
+- [ ] Verdict is still produced based on the structural check alone
+
+---
+
+## Protocol Compliance
+
+- [ ] Does NOT write any files (read-only skill)
+- [ ] Presents section completeness check before issuing verdict
+- [ ] TD-ARCHITECTURE and LP-FEASIBILITY spawn in parallel in full mode
+- [ ] Skipped gates are noted by name and mode in lean/solo output
+- [ ] Verdict is one of exactly: APPROVED, NEEDS REVISION, MAJOR REVISION NEEDED
+- [ ] Ends with next-step handoff appropriate to verdict
+
+---
+
+## Coverage Notes
+
+- The 8 required architecture sections are project-specific; tests use the
+ section list defined in the skill body — not re-enumerated here.
+- Engine version compatibility checking (cross-referencing `docs/engine-reference/`)
+ is part of Case 1's happy path but not independently fixture-tested.
+- RTM (requirement traceability matrix) mode is a separate concern covered by
+ the `/architecture-review` skill's own `rtm` argument mode, not tested here.
diff --git a/tests/skills/design-review.md b/CCGS Skill Testing Framework/skills/review/design-review.md
similarity index 85%
rename from tests/skills/design-review.md
rename to CCGS Skill Testing Framework/skills/review/design-review.md
index ef6af82..c2c8e04 100644
--- a/tests/skills/design-review.md
+++ b/CCGS Skill Testing Framework/skills/review/design-review.md
@@ -125,6 +125,32 @@ Verified automatically by `/skill-test static` — no fixture needed.
---
+
+### Case 5: Director Gate — no gate spawned regardless of review mode
+
+**Fixture:**
+- `design/gdd/light-manipulation.md` exists with all 8 sections
+- `production/session-state/review-mode.txt` exists with `full` (most permissive mode)
+
+**Input:** `/design-review design/gdd/light-manipulation.md` (with full review mode active)
+
+**Expected behavior:**
+1. Skill reads the GDD document
+2. Skill does NOT read `review-mode.txt` — this skill has no director gates
+3. Skill produces the review output normally
+4. No director gate agents are spawned at any point
+5. Verdict is APPROVED (all 8 sections present in fixture)
+
+**Assertions:**
+- [ ] Skill does NOT spawn any director gate agent (CD-, TD-, PR-, AD- prefixed agents)
+- [ ] Skill does NOT read `review-mode.txt` or equivalent mode file
+- [ ] The `--review` flag or `full` mode state has NO effect on whether directors spawn
+- [ ] Output does not contain any "Gate: [GATE-ID]" entries
+- [ ] Skill IS the review — it does not delegate the review to a director
+
+---
+
## Protocol Compliance
- [ ] Does NOT use Write or Edit tools (read-only skill)
diff --git a/CCGS Skill Testing Framework/skills/review/review-all-gdds.md b/CCGS Skill Testing Framework/skills/review/review-all-gdds.md
new file mode 100644
index 0000000..07c5d8c
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/review/review-all-gdds.md
@@ -0,0 +1,178 @@
+# Skill Test Spec: /review-all-gdds
+
+## Skill Summary
+
+`/review-all-gdds` is an Opus-tier skill that performs a holistic cross-GDD review
+across all files in `design/gdd/`. It runs two complementary review phases in
+parallel: Phase 1 checks for consistency (contradictions, formula mismatches,
+stale references, competing ownership), and Phase 2 checks design theory (dominant
+strategies, pillar drift, cognitive overload, economic imbalance). Because the two
+phases are independent, they are spawned simultaneously to save time. The skill
+produces a CONSISTENT / MINOR ISSUES / MAJOR ISSUES verdict. It writes no
+files without explicit user approval.
+
+The skill is itself the holistic review gate in the pipeline. It is invoked after
+individual GDDs are complete and before architecture work begins. It does NOT spawn
+any director gate agents (it IS the director-level review).
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥5 phase headings (complex multi-phase skill)
+- [ ] Contains verdict keywords: CONSISTENT, MINOR ISSUES, MAJOR ISSUES
+- [ ] Does NOT require "May I write" language (read-only skill)
+- [ ] Has a next-step handoff at the end
+- [ ] Documents parallel phase spawning (Phase 1 and Phase 2 are independent)
+
+---
+
+## Director Gate Checks
+
+No director gates — this skill spawns no director gate agents. It IS the holistic
+review; delegating to a director gate would create a circular dependency.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Clean GDD set with no conflicts
+
+**Fixture:**
+- `design/gdd/` contains ≥3 system GDDs
+- All GDDs are internally consistent: no formula contradictions, no competing ownership, no stale references
+- All GDDs align with the pillars defined in `design/gdd/game-pillars.md`
+
+**Input:** `/review-all-gdds`
+
+**Expected behavior:**
+1. Skill reads all GDD files in `design/gdd/`
+2. Phase 1 (consistency scan) and Phase 2 (design theory check) spawn in parallel
+3. Phase 1 finds no contradictions, no formula mismatches, no ownership conflicts
+4. Phase 2 finds no pillar drift, no dominant strategies, no cognitive overload
+5. Skill outputs a structured findings table with 0 blocking issues
+6. Verdict: CONSISTENT
+
+**Assertions:**
+- [ ] Both review phases are spawned in parallel (not sequentially)
+- [ ] Output includes a findings table (even if empty — shows "No issues found")
+- [ ] Verdict is CONSISTENT when no conflicts are found
+- [ ] Skill does NOT write any files without user approval
+- [ ] Next-step handoff to `/architecture-review` or `/create-architecture` is present
+
+---
+
+### Case 2: Failure Path — Conflicting rules between two GDDs
+
+**Fixture:**
+- GDD-A defines a floor value (e.g. "minimum [output] is [N]")
+- GDD-B states a mechanic that bypasses that floor (e.g. "[mechanic] can reduce [output] to 0")
+- The two GDDs are otherwise complete and valid
+
+**Input:** `/review-all-gdds`
+
+**Expected behavior:**
+1. Phase 1 (consistency scan) detects the contradiction between GDD-A and GDD-B
+2. Conflict is reported with: both filenames, the specific conflicting rules, and severity HIGH
+3. Verdict: MAJOR ISSUES
+4. Handoff instructs user to resolve the conflict and re-run before proceeding
+
+**Assertions:**
+- [ ] Verdict is MAJOR ISSUES (not CONSISTENT or MINOR ISSUES)
+- [ ] Both GDD filenames are named in the conflict entry
+- [ ] The specific contradicting rules are quoted or described (not vague "conflict found")
+- [ ] Issue is classified as severity HIGH (blocking)
+- [ ] Skill does NOT auto-resolve the conflict
+
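+A findings entry of roughly this shape would satisfy the assertions above (the
+field layout is illustrative):
+
+```
+Severity: HIGH (blocking)
+Type:     Contradiction
+Files:    GDD-A, GDD-B
+Finding:  GDD-A sets a minimum [output] of [N]; GDD-B's [mechanic] can
+          reduce [output] to 0, bypassing that floor.
+```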
+---
+
+### Case 3: Partial Path — Single GDD with orphaned dependency reference
+
+**Fixture:**
+- GDD-A lists a dependency in its Dependencies section pointing to "system-B"
+- No GDD for system-B exists in `design/gdd/`
+- All other GDDs are consistent
+
+**Input:** `/review-all-gdds`
+
+**Expected behavior:**
+1. Phase 1 detects the orphaned dependency reference in GDD-A
+2. Issue is reported as: DEPENDENCY GAP — GDD-A references system-B which has no GDD
+3. No other conflicts found
+4. Verdict: MINOR ISSUES (dependency gap is advisory, not blocking by itself)
+
+**Assertions:**
+- [ ] Verdict is MINOR ISSUES (not MAJOR ISSUES for a single orphaned reference)
+- [ ] The specific GDD filename and the missing dependency name are reported
+- [ ] Skill suggests running `/design-system system-B` to resolve the gap
+- [ ] Skill does NOT skip or silently ignore the missing dependency
+
+---
+
+### Case 4: Edge Case — No GDD files found
+
+**Fixture:**
+- `design/gdd/` directory is empty or does not exist
+- No GDD files are present
+
+**Input:** `/review-all-gdds`
+
+**Expected behavior:**
+1. Skill attempts to read files in `design/gdd/`
+2. No files found — skill outputs an error with guidance
+3. Skill recommends running `/brainstorm` and `/design-system` before re-running
+4. Skill does NOT produce a verdict (CONSISTENT / MINOR ISSUES / MAJOR ISSUES)
+
+**Assertions:**
+- [ ] Skill outputs a clear error message when no GDDs are found
+- [ ] No verdict is produced when the directory is empty
+- [ ] Skill recommends the correct next action (`/brainstorm` or `/design-system`)
+- [ ] Skill does NOT crash or produce a partial report
+
+---
+
+### Case 5: Director Gate — No gate spawned regardless of review mode
+
+**Fixture:**
+- `design/gdd/` contains ≥2 consistent system GDDs
+- `production/session-state/review-mode.txt` exists with content `full`
+
+**Input:** `/review-all-gdds`
+
+**Expected behavior:**
+1. Skill reads all GDDs and runs the two review phases
+2. Skill does NOT read `review-mode.txt`
+3. Skill does NOT spawn any director gate agent (CD-, TD-, PR-, AD- prefixed)
+4. Skill completes and outputs its verdict normally
+5. Review mode setting has no effect on this skill's behavior
+
+**Assertions:**
+- [ ] No director gate agents are spawned at any point
+- [ ] Skill does NOT read `production/session-state/review-mode.txt`
+- [ ] Output does not contain any "Gate: [GATE-ID]" or "skipped" gate entries
+- [ ] The skill produces a verdict regardless of review mode
+- [ ] R4 metric: gate count for this skill = 0 in all modes
+
+---
+
+## Protocol Compliance
+
+- [ ] Phase 1 (consistency) and Phase 2 (design theory) spawned in parallel — not sequentially
+- [ ] Does NOT write any files without "May I write" approval
+- [ ] Findings table shown before any write ask
+- [ ] Verdict is one of exactly: CONSISTENT, MINOR ISSUES, MAJOR ISSUES
+- [ ] Ends with appropriate handoff: MAJOR ISSUES → fix and re-run; MINOR ISSUES → may proceed with awareness; CONSISTENT → `/create-architecture`
+
+---
+
+## Coverage Notes
+
+- Economic balance analysis (source/sink loops) requires cross-GDD resource data — covered
+ structurally by Case 2 (the conflict detection pattern is the same).
+- The design theory (Phase 2) checks, including dominant strategy detection and
+  cognitive overload, are not individually fixture-tested — they follow the same
+  pattern as the consistency checks and are validated via the pillar drift case structure.
+- The `since-last-review` scoping mode is not tested here — it is a runtime concern.
diff --git a/CCGS Skill Testing Framework/skills/sprint/changelog.md b/CCGS Skill Testing Framework/skills/sprint/changelog.md
new file mode 100644
index 0000000..b00c48f
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/sprint/changelog.md
@@ -0,0 +1,169 @@
+# Skill Test Spec: /changelog
+
+## Skill Summary
+
+`/changelog` is a Haiku-tier skill that auto-generates a developer-facing
+changelog by reading git commit history and closed sprint stories since the
+last release tag. It organizes entries into features, fixes, and known issues.
+No director gates are used. The skill asks "May I write to `docs/CHANGELOG.md`?"
+before persisting. Verdict is always COMPLETE.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keyword: COMPLETE
+- [ ] Contains "May I write" language (skill writes changelog)
+- [ ] Has a next-step handoff (e.g., run /patch-notes for player-facing version)
+
+---
+
+## Director Gate Checks
+
+None. Changelog generation is a fast compilation task; no gates are invoked.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Multiple sprints since last release tag
+
+**Fixture:**
+- Git history has a tag `v0.3.0` three sprints ago
+- Since that tag: 12 commits across sprints 006, 007, 008
+- Sprint story files reference task IDs matching commit messages
+- `docs/CHANGELOG.md` does not yet exist
+
+**Input:** `/changelog`
+
+**Expected behavior:**
+1. Skill reads git log since `v0.3.0` tag
+2. Skill reads sprint stories to cross-reference task IDs
+3. Skill compiles entries into Features, Fixes, and Known Issues sections
+4. Skill presents draft to user
+5. Skill asks "May I write to `docs/CHANGELOG.md`?"
+6. User approves; file written; verdict COMPLETE
+
+**Assertions:**
+- [ ] Changelog covers commits since the most recent git tag
+- [ ] Entries are organized into Features / Fixes / Known Issues sections
+- [ ] Sprint story references are used to enrich commit descriptions
+- [ ] "May I write" prompt appears before file write
+- [ ] Verdict is COMPLETE after write
+
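+A generated section of roughly this shape would satisfy the assertions above
+(the heading names and version label are illustrative):
+
+```
+## Unreleased (since v0.3.0)
+
+### Features
+- [task-id]: [feature summary] (sprint 007)
+
+### Fixes
+- [task-id]: [fix summary] (sprint 006)
+
+### Known Issues
+- [issue summary]
+```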
+---
+
+### Case 2: No Git Tags Found — All commits used, version baseline noted
+
+**Fixture:**
+- Git repository has commits but no tags exist
+- 20 commits in history across 3 sprints
+
+**Input:** `/changelog`
+
+**Expected behavior:**
+1. Skill checks for git tags — finds none
+2. Skill uses all commits in history as the baseline
+3. Skill notes in the output: "No version tag found — using full commit history; version baseline is unset"
+4. Skill still compiles organized changelog from available commits
+5. Skill asks "May I write" and writes on approval
+
+**Assertions:**
+- [ ] Skill does not error when no git tags exist
+- [ ] Output explicitly notes that no version baseline was found
+- [ ] Full commit history is used as the source
+- [ ] Changelog is still organized into sections despite missing tag
+
+---
+
+### Case 3: Commit Messages Without Task IDs — Grouped by date with note
+
+**Fixture:**
+- Git log since last tag has 8 commits
+- 5 commits have no task ID in the message (e.g., "fix typo", "tweak values")
+- 3 commits reference task IDs matching sprint stories
+
+**Input:** `/changelog`
+
+**Expected behavior:**
+1. Skill reads commits and sprint stories
+2. 3 commits are matched to sprint stories and placed in appropriate sections
+3. 5 untagged commits are grouped by date under a "Misc" or "Other Changes" section
+4. Output notes: "5 commits without task IDs — grouped by date"
+5. Skill writes changelog on approval
+
+**Assertions:**
+- [ ] Commits with task IDs are placed in appropriate sections (Features or Fixes)
+- [ ] Commits without task IDs are grouped separately with a note
+- [ ] Output flags the number of commits missing task references
+- [ ] No commits are silently dropped from the changelog
+
+---
+
+### Case 4: Existing CHANGELOG.md — New section prepended, old entries preserved
+
+**Fixture:**
+- `docs/CHANGELOG.md` already exists with sections for `v0.2.0` and `v0.3.0`
+- New commits exist since `v0.3.0` tag
+
+**Input:** `/changelog`
+
+**Expected behavior:**
+1. Skill detects that `docs/CHANGELOG.md` already exists
+2. Skill compiles new entries for the period since `v0.3.0`
+3. Skill presents draft with new section prepended above existing content
+4. Skill asks "May I write to `docs/CHANGELOG.md`?" (confirming prepend strategy)
+5. User approves; new content is prepended, old entries intact; verdict COMPLETE
+
+**Assertions:**
+- [ ] Skill reads existing changelog before writing to detect prior content
+- [ ] New section is prepended above existing entries (not appended, and no existing content is overwritten)
+- [ ] Old changelog entries for v0.2.0 and v0.3.0 are preserved in the written file
+- [ ] "May I write" prompt reflects the prepend operation
+
+---
+
+### Case 5: Gate Compliance — No gate; read-then-write with approval
+
+**Fixture:**
+- Git history has commits since last tag
+- `production/session-state/review-mode.txt` contains `full`
+
+**Input:** `/changelog`
+
+**Expected behavior:**
+1. Skill compiles changelog in full mode
+2. No director gate is invoked (changelog generation is compilation, not a delivery gate)
+3. Skill runs on Haiku model — fast compilation
+4. Skill asks user for approval and writes file on confirmation
+
+**Assertions:**
+- [ ] No director gate is invoked regardless of review mode
+- [ ] Output does not reference any gate result
+- [ ] Skill proceeds directly from compilation to "May I write" prompt
+- [ ] Verdict is COMPLETE
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads git log and sprint story files before compiling
+- [ ] Always asks "May I write" before writing changelog
+- [ ] No director gates are invoked
+- [ ] Verdict is always COMPLETE
+- [ ] Runs on Haiku model tier (fast, low-cost)
+
+---
+
+## Coverage Notes
+
+- The case where git is not initialized in the repository is not tested;
+ behavior would depend on git command failure handling.
+- Merge commits vs. squash commits are not explicitly differentiated in
+  these tests; this is an implementation detail of the git-log parsing phase.
+- The `/patch-notes` skill should be run after `/changelog` for player-facing
+ output; that handoff is verified in the patch-notes spec.
diff --git a/CCGS Skill Testing Framework/skills/sprint/milestone-review.md b/CCGS Skill Testing Framework/skills/sprint/milestone-review.md
new file mode 100644
index 0000000..6c0933f
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/sprint/milestone-review.md
@@ -0,0 +1,171 @@
+# Skill Test Spec: /milestone-review
+
+## Skill Summary
+
+`/milestone-review` generates a comprehensive review of a completed milestone:
+what shipped, velocity metrics, deferred items, risks surfaced, and retrospective
+seeds. In full mode the PR-MILESTONE director gate runs after the review is
+compiled (producer reviews scope delivery). In lean and solo modes the gate is
+skipped. The skill asks "May I write to `production/milestones/review-milestone-N.md`?"
+before persisting. Verdicts: MILESTONE COMPLETE or MILESTONE INCOMPLETE.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: MILESTONE COMPLETE, MILESTONE INCOMPLETE
+- [ ] Contains "May I write" language (skill writes review document)
+- [ ] Has a next-step handoff (what to do after review is written)
+
+---
+
+## Director Gate Checks
+
+| Gate ID | Trigger condition | Mode guard |
+|---------------|--------------------------------|-------------------------|
+| PR-MILESTONE | After review document compiled | full only (not lean/solo) |
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Nearly complete milestone with one deferred story
+
+**Fixture:**
+- `production/milestones/milestone-03.md` exists with 8 stories
+- 7 stories have `Status: Complete`
+- 1 story has `Status: Deferred` (deferred to milestone-04)
+- `review-mode.txt` contains `full`
+
+**Input:** `/milestone-review milestone-03`
+
+**Expected behavior:**
+1. Skill reads `milestone-03.md` and all referenced sprint files
+2. Skill compiles: 7 shipped, 1 deferred; velocity; no blockers
+3. Skill presents review draft to user
+4. PR-MILESTONE gate invoked; producer approves
+5. Skill asks "May I write to `production/milestones/review-milestone-03.md`?"
+6. User approves; file is written; verdict MILESTONE COMPLETE
+
+**Assertions:**
+- [ ] Deferred story is noted in the review with its target milestone
+- [ ] Verdict is MILESTONE COMPLETE despite the one deferred story
+- [ ] PR-MILESTONE gate is invoked after draft compilation in full mode
+- [ ] Skill asks "May I write" before writing review file
+- [ ] Review document path matches `production/milestones/review-milestone-03.md`
+
+---
+
+### Case 2: Blocked Milestone — Multiple blocked stories
+
+**Fixture:**
+- `production/milestones/milestone-03.md` exists with 5 stories
+- 2 stories have `Status: Complete`
+- 3 stories have `Status: Blocked` (named blockers listed in each story)
+- `review-mode.txt` contains `full`
+
+**Input:** `/milestone-review milestone-03`
+
+**Expected behavior:**
+1. Skill reads milestone and sprint files
+2. Skill finds 3 blocked stories; compiles blocker details
+3. Verdict is MILESTONE INCOMPLETE
+4. PR-MILESTONE gate runs; producer notes the unresolved blockers
+5. Review is written with blocker list on approval
+
+**Assertions:**
+- [ ] Verdict is MILESTONE INCOMPLETE when any stories are Blocked
+- [ ] Each blocked story's name and blocker reason are listed in the review
+- [ ] PR-MILESTONE gate is still invoked in full mode even for INCOMPLETE verdict
+- [ ] "May I write" prompt still appears before file write
+
+---
+
+### Case 3: Full Mode — PR-MILESTONE returns CONCERNS
+
+**Fixture:**
+- Milestone-03 has 6 complete stories but 2 were not in the original scope (added mid-sprint)
+- `review-mode.txt` contains `full`
+
+**Input:** `/milestone-review milestone-03`
+
+**Expected behavior:**
+1. Skill compiles review; notes 2 out-of-scope stories shipped
+2. PR-MILESTONE gate invoked; producer returns CONCERNS about scope drift
+3. Skill surfaces the CONCERNS to the user and adds a "scope drift" note to the review
+4. User approves revised review; file written as MILESTONE COMPLETE with caveat
+
+**Assertions:**
+- [ ] CONCERNS from PR-MILESTONE gate are shown to user before write
+- [ ] Scope drift is explicitly noted in the written review document
+- [ ] Verdict is MILESTONE COMPLETE (stories shipped) with CONCERNS annotation
+- [ ] Skill does not suppress gate feedback
+
+---
+
+### Case 4: Edge Case — No milestone file found for specified milestone
+
+**Fixture:**
+- User calls `/milestone-review milestone-07`
+- `production/milestones/milestone-07.md` does NOT exist
+
+**Input:** `/milestone-review milestone-07`
+
+**Expected behavior:**
+1. Skill attempts to read `production/milestones/milestone-07.md`
+2. File not found; skill outputs an error message
+3. Skill suggests checking available milestones in `production/milestones/`
+4. No gate is invoked; no file is written
+
+**Assertions:**
+- [ ] Skill does not crash when milestone file is absent
+- [ ] Output names the expected file path in the error message
+- [ ] Output suggests checking `production/milestones/` for valid milestone names
+- [ ] Verdict is BLOCKED (cannot review a non-existent milestone)
+
+---
+
+### Case 5: Lean/Solo Mode — PR-MILESTONE gate skipped
+
+**Fixture:**
+- `production/milestones/milestone-03.md` exists with 5 complete stories
+- `review-mode.txt` contains `solo`
+
+**Input:** `/milestone-review milestone-03`
+
+**Expected behavior:**
+1. Skill reads review mode — determines `solo`
+2. Skill compiles review draft
+3. PR-MILESTONE gate is skipped; output notes "[PR-MILESTONE] skipped — Solo mode"
+4. Skill asks user for direct approval of the review
+5. User approves; review file is written; verdict MILESTONE COMPLETE
+
+**Assertions:**
+- [ ] PR-MILESTONE gate is NOT invoked in solo (or lean) mode
+- [ ] Skip is explicitly noted in skill output
+- [ ] User direct approval is still required before write
+- [ ] Verdict is MILESTONE COMPLETE after successful write
+
+---
+
+## Protocol Compliance
+
+- [ ] Shows compiled review draft before invoking PR-MILESTONE or asking to write
+- [ ] Always asks "May I write" before writing review document
+- [ ] PR-MILESTONE gate only runs in full mode
+- [ ] Skip message appears in lean and solo output
+- [ ] Verdict is MILESTONE COMPLETE or MILESTONE INCOMPLETE, stated clearly
+
+---
+
+## Coverage Notes
+
+- The case where the milestone has zero stories is not tested; it follows the
+ MILESTONE INCOMPLETE pattern with a note suggesting the milestone may not
+ have been planned.
+- Velocity calculation specifics (story points vs. story count) are not
+ verified here; they are implementation details of the review compilation phase.
diff --git a/CCGS Skill Testing Framework/skills/sprint/patch-notes.md b/CCGS Skill Testing Framework/skills/sprint/patch-notes.md
new file mode 100644
index 0000000..ae2399d
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/sprint/patch-notes.md
@@ -0,0 +1,170 @@
+# Skill Test Spec: /patch-notes
+
+## Skill Summary
+
+`/patch-notes` is a Haiku-tier skill that generates player-facing patch notes
+from existing changelog content, stripping internal task IDs and technical
+jargon in favor of plain language. It filters entries to only those relevant
+to players (visible features and bug fixes; internal refactors are excluded).
+No director gates are used. The skill asks "May I write to
+`docs/patch-notes-vX.X.md`?" before persisting. Verdict is always COMPLETE.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keyword: COMPLETE
+- [ ] Contains "May I write" language (skill writes patch notes file)
+- [ ] Has a next-step handoff (e.g., share with community manager)
+
+---
+
+## Director Gate Checks
+
+None. Patch notes generation is a fast compilation task; no gates are invoked.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Changelog filtered to player-facing entries
+
+**Fixture:**
+- `docs/CHANGELOG.md` exists with 5 entries:
+ - "Add dual-wield melee system" (Features — player-facing)
+ - "Fix crash on level transition" (Fixes — player-facing)
+ - "Add enemy patrol AI" (Features — player-facing)
+ - "Refactor input handler to use event bus" (Fixes — internal only)
+ - "Update dependency: Godot 4.6" (internal only)
+- Version is `v0.4.0`
+
+**Input:** `/patch-notes v0.4.0`
+
+**Expected behavior:**
+1. Skill reads `docs/CHANGELOG.md`
+2. Skill filters to 3 player-facing entries; excludes 2 internal entries
+3. Skill rewrites entries in plain language (no task IDs, no tech jargon)
+4. Skill presents draft to user
+5. Skill asks "May I write to `docs/patch-notes-v0.4.0.md`?"
+6. User approves; file written; verdict COMPLETE
+
+**Assertions:**
+- [ ] Only 3 entries appear in the patch notes (2 internal entries excluded)
+- [ ] Entries are written in plain language without internal task IDs
+- [ ] File path matches `docs/patch-notes-v0.4.0.md`
+- [ ] "May I write" prompt appears before file write
+- [ ] Verdict is COMPLETE after write
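The filter-and-strip step can be sketched as below. The `player_facing` flag and the `GAME-NNN` task-ID pattern are illustrative assumptions; the real skill derives both from the changelog's categories and ID conventions:

```python
import re

# Hypothetical entry shape: (text, player_facing)
entries = [
    ("Add dual-wield melee system (GAME-101)", True),
    ("Fix crash on level transition (GAME-102)", True),
    ("Add enemy patrol AI (GAME-103)", True),
    ("Refactor input handler to use event bus (GAME-104)", False),
    ("Update dependency: Godot 4.6", False),
]

# Strip an internal task ID such as "(GAME-101)" from entry text.
TASK_ID = re.compile(r"\s*\(?[A-Z]+-\d+\)?")

def to_patch_notes(entries):
    """Keep player-facing entries and drop internal task IDs."""
    return [TASK_ID.sub("", text).strip() for text, facing in entries if facing]

notes = to_patch_notes(entries)
```

With the fixture above, `notes` holds the three player-facing entries with their IDs removed; the two internal entries are excluded entirely.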
+
+---
+
+### Case 2: No Changelog Found — Directed to run /changelog first
+
+**Fixture:**
+- `docs/CHANGELOG.md` does NOT exist
+
+**Input:** `/patch-notes v0.4.0`
+
+**Expected behavior:**
+1. Skill attempts to read `docs/CHANGELOG.md` — not found
+2. Skill outputs: "No changelog found — run /changelog first to generate one"
+3. No patch notes are generated; no file is written
+
+**Assertions:**
+- [ ] Skill does not crash when changelog is absent
+- [ ] Output explicitly directs user to run `/changelog`
+- [ ] No "May I write" prompt appears (nothing to write)
+- [ ] Verdict is BLOCKED (dependency not met)
+
+---
+
+### Case 3: Tone Guidance from Design Folder — Incorporated into output
+
+**Fixture:**
+- `docs/CHANGELOG.md` exists with player-facing entries
+- `design/community/tone-guide.md` exists with guidance: "upbeat, encouraging tone; avoid passive voice"
+
+**Input:** `/patch-notes v0.4.0`
+
+**Expected behavior:**
+1. Skill reads changelog
+2. Skill detects tone guide at `design/community/tone-guide.md`
+3. Skill applies tone guidance when rewriting entries in plain language
+4. Patch notes use upbeat, active-voice phrasing
+5. Skill presents draft, asks to write, writes on approval
+
+**Assertions:**
+- [ ] Skill checks `design/` for a community or tone guidance file
+- [ ] Tone guide content influences phrasing of patch note entries
+- [ ] Output reflects active voice and upbeat tone where applicable
+- [ ] Skill notes that tone guidance was applied
+
+---
+
+### Case 4: Patch Note Template Exists — Used instead of generated structure
+
+**Fixture:**
+- `.claude/docs/templates/patch-notes-template.md` exists with a structured header format
+- `docs/CHANGELOG.md` exists with player-facing entries
+
+**Input:** `/patch-notes v0.4.0`
+
+**Expected behavior:**
+1. Skill reads changelog and detects template exists
+2. Skill populates the template with player-facing entries
+3. Template header/footer structure is preserved in the output
+4. Skill asks "May I write" and writes on approval
+
+**Assertions:**
+- [ ] Skill checks for a patch notes template before generating from scratch
+- [ ] Template structure is used when found (not overridden by default format)
+- [ ] Player-facing entries are inserted into the correct template section
+- [ ] Output note confirms template was used
+
+---
+
+### Case 5: Gate Compliance — No gate; community-manager is separate
+
+**Fixture:**
+- `docs/CHANGELOG.md` exists with player-facing entries
+- `review-mode.txt` contains `full`
+
+**Input:** `/patch-notes v0.4.0`
+
+**Expected behavior:**
+1. Skill compiles patch notes in full mode
+2. No director gate is invoked (community review is a separate, manual step)
+3. Skill runs on Haiku model — fast compilation
+4. Skill notes in output: "Consider sharing draft with community manager before publishing"
+5. Skill asks user for approval and writes on confirmation
+
+**Assertions:**
+- [ ] No director gate is invoked regardless of review mode
+- [ ] Output suggests (but does not require) community manager review
+- [ ] Skill proceeds directly from compilation to "May I write" prompt
+- [ ] Verdict is COMPLETE
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads `docs/CHANGELOG.md` before generating patch notes
+- [ ] Filters entries to player-facing items only
+- [ ] Rewrites entries in plain language without internal IDs
+- [ ] Always asks "May I write" before writing patch notes file
+- [ ] No director gates are invoked
+- [ ] Runs on Haiku model tier (fast, low-cost)
+
+---
+
+## Coverage Notes
+
+- The case where all changelog entries are internal (zero player-facing items)
+ is not tested; behavior is an empty patch notes draft with a warning.
+- Version number parsing from the changelog header is an implementation detail
+ not verified here.
+- The community manager consultation noted in Case 5 is advisory; a separate
+ skill or manual review handles that step.
diff --git a/CCGS Skill Testing Framework/skills/sprint/retrospective.md b/CCGS Skill Testing Framework/skills/sprint/retrospective.md
new file mode 100644
index 0000000..b49ad28
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/sprint/retrospective.md
@@ -0,0 +1,169 @@
+# Skill Test Spec: /retrospective
+
+## Skill Summary
+
+`/retrospective` generates a structured sprint or milestone retrospective
+covering three categories: what went well, what didn't, and action items.
+It reads sprint files and session logs to compile observations, then produces
+a retrospective document. No director gates are used — retrospectives are
+team self-reflection artifacts. The skill asks "May I write to
+`production/retrospectives/retro-sprint-NNN.md`?" before persisting.
+Verdict is always COMPLETE (retrospective is structured output, not a pass/fail
+assessment).
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keyword: COMPLETE
+- [ ] Contains "May I write" language (skill writes retrospective document)
+- [ ] Has a next-step handoff (what to do after retrospective is written)
+
+---
+
+## Director Gate Checks
+
+None. Retrospectives are team self-reflection documents; no gates are invoked.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Sprint with mixed outcomes
+
+**Fixture:**
+- `production/sprints/sprint-005.md` exists with 6 stories (4 Complete, 1 Blocked, 1 Deferred)
+- `production/session-logs/` contains log entries for the sprint period
+- No prior retrospective exists for sprint-005
+
+**Input:** `/retrospective sprint-005`
+
+**Expected behavior:**
+1. Skill reads sprint-005 and session logs
+2. Skill compiles three retrospective categories: went well (4 stories shipped),
+ didn't (1 blocked, 1 deferred), and action items (address blocker root cause)
+3. Skill presents retrospective draft to user
+4. Skill asks "May I write to `production/retrospectives/retro-sprint-005.md`?"
+5. User approves; file is written; verdict COMPLETE
+
+**Assertions:**
+- [ ] Retrospective contains all three categories (went well / didn't / actions)
+- [ ] Blocked and deferred stories appear in the "what didn't" section
+- [ ] At least one action item is generated from the blocked story
+- [ ] Skill asks "May I write" before writing file
+- [ ] Verdict is COMPLETE after successful write
+
+---
+
+### Case 2: No Sprint Data — Manual input fallback
+
+**Fixture:**
+- User calls `/retrospective sprint-009`
+- `production/sprints/sprint-009.md` does NOT exist
+- No session logs reference sprint-009
+
+**Input:** `/retrospective sprint-009`
+
+**Expected behavior:**
+1. Skill attempts to read sprint-009 — not found
+2. Skill informs user that no sprint data was found for sprint-009
+3. Skill prompts user to provide retrospective input manually (went well, didn't, actions)
+4. User provides input; skill formats it into the retrospective structure
+5. Skill asks "May I write" and writes the document on approval
+
+**Assertions:**
+- [ ] Skill does not crash or produce an empty document when sprint file is absent
+- [ ] User is prompted to provide manual input
+- [ ] Manual input is formatted into the three-category structure
+- [ ] "May I write" prompt still appears before file write
+
+---
+
+### Case 3: Prior Retrospective Exists — Offer to append or replace
+
+**Fixture:**
+- `production/retrospectives/retro-sprint-005.md` already exists with content
+- User re-runs `/retrospective sprint-005` after changes
+
+**Input:** `/retrospective sprint-005`
+
+**Expected behavior:**
+1. Skill detects that `retro-sprint-005.md` already exists
+2. Skill presents user with choice: append new observations or replace existing file
+3. User selects "replace"; skill compiles fresh retrospective
+4. Skill asks "May I write to `production/retrospectives/retro-sprint-005.md`?" (confirming overwrite)
+5. File is overwritten; verdict COMPLETE
+
+**Assertions:**
+- [ ] Skill checks for existing retrospective file before compiling
+- [ ] User is offered append or replace choice — not silently overwritten
+- [ ] "May I write" prompt reflects the overwrite scenario
+- [ ] Verdict is COMPLETE after write regardless of append vs. replace
+
+---
+
+### Case 4: Edge Case — Unresolved action items from previous retrospective
+
+**Fixture:**
+- `production/retrospectives/retro-sprint-004.md` exists with 2 action items marked `[ ]` (not done)
+- User runs `/retrospective sprint-005`
+
+**Input:** `/retrospective sprint-005`
+
+**Expected behavior:**
+1. Skill reads the most recent prior retrospective (retro-sprint-004)
+2. Skill detects 2 unchecked action items from sprint-004
+3. Skill includes a "Carry-over from Sprint 004" section in the new retrospective
+4. The unresolved items are listed with a note that they were not followed up
+
+**Assertions:**
+- [ ] Skill reads the most recent prior retrospective to check for open action items
+- [ ] Unresolved action items appear in the new retrospective under a carry-over section
+- [ ] Carry-over items are distinct from newly generated action items
+- [ ] Output notes that these items were not followed up in the previous sprint
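The carry-over detection asserted above can be sketched like this, assuming action items use the standard markdown checkbox form shown in the fixture:

```python
import re

# Match unchecked markdown task-list items: "- [ ] item text"
UNCHECKED = re.compile(r"^\s*[-*]\s+\[ \]\s+(.*)$", re.MULTILINE)

def open_action_items(retro_text: str) -> list[str]:
    """Return the text of unchecked '- [ ]' action items."""
    return UNCHECKED.findall(retro_text)

prior = """## Action Items
- [x] Document the save-file format
- [ ] Profile the pathfinding hot loop
- [ ] Add a CI step for shader compilation
"""

carry_over = open_action_items(prior)
```

The checked `[x]` item is ignored; only the two open items would land in the "Carry-over from Sprint 004" section.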
+
+---
+
+### Case 5: Gate Compliance — No gate invoked in any mode
+
+**Fixture:**
+- `production/sprints/sprint-005.md` exists with complete stories
+- `production/session-state/review-mode.txt` contains `full`
+
+**Input:** `/retrospective sprint-005`
+
+**Expected behavior:**
+1. Skill compiles retrospective in full mode
+2. No director gate is invoked (retrospectives are team self-reflection, not delivery gates)
+3. Skill asks user for approval and writes file on confirmation
+4. Verdict is COMPLETE
+
+**Assertions:**
+- [ ] No director gate is invoked regardless of review mode
+- [ ] Output does not contain any gate invocation or gate result notation
+- [ ] Skill proceeds directly from compilation to "May I write" prompt
+- [ ] Review mode file content is irrelevant to this skill's behavior
+
+---
+
+## Protocol Compliance
+
+- [ ] Always shows retrospective draft before asking to write
+- [ ] Always asks "May I write" before writing retrospective file
+- [ ] No director gates are invoked
+- [ ] Verdict is always COMPLETE (not a pass/fail skill)
+- [ ] Checks prior retrospective for unresolved action items
+
+---
+
+## Coverage Notes
+
+- Milestone retrospectives (as opposed to sprint retrospectives) follow the
+ same pattern but read milestone files instead of sprint files; not
+ separately tested here.
+- The case where session logs are empty is similar to Case 2 (no data);
+ the skill falls back to manual input in both situations.
diff --git a/CCGS Skill Testing Framework/skills/sprint/sprint-plan.md b/CCGS Skill Testing Framework/skills/sprint/sprint-plan.md
new file mode 100644
index 0000000..b0c5aaa
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/sprint/sprint-plan.md
@@ -0,0 +1,177 @@
+# Skill Test Spec: /sprint-plan
+
+## Skill Summary
+
+`/sprint-plan` reads the current milestone file and backlog stories, then
+generates a new numbered sprint with stories prioritized by implementation layer
+and priority score. In full mode the PR-SPRINT director gate runs after the
+sprint draft is compiled (producer reviews the plan). In lean and solo modes
+the gate is skipped. The skill asks "May I write to `production/sprints/sprint-NNN.md`?"
+before persisting. Verdicts: COMPLETE (sprint generated and written) or
+BLOCKED (cannot proceed due to missing data or gate failure).
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: COMPLETE, BLOCKED
+- [ ] Contains "May I write" language (skill writes sprint file)
+- [ ] Has a next-step handoff (what to do after sprint is written)
+
+---
+
+## Director Gate Checks
+
+| Gate ID | Trigger condition | Mode guard |
+|-----------|--------------------------|--------------------|
+| PR-SPRINT | After sprint draft built | full only (not lean/solo) |
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Backlog with stories generates sprint
+
+**Fixture:**
+- `production/milestones/milestone-02.md` exists with capacity `10 story points`
+- Backlog contains 5 unstarted stories across 2 epics, mixed priorities
+- `production/session-state/review-mode.txt` contains `full`
+- Next sprint number is `003` (sprints 001 and 002 already exist)
+
+**Input:** `/sprint-plan`
+
+**Expected behavior:**
+1. Skill reads current milestone to obtain capacity and goals
+2. Skill reads all unstarted stories from backlog; sorts by layer + priority
+3. Skill drafts sprint-003 with stories fitting within capacity
+4. Skill presents draft to user before invoking gate
+5. Skill invokes PR-SPRINT gate (full mode); producer approves
+6. Skill asks "May I write to `production/sprints/sprint-003.md`?"
+7. User approves; file is written
+
+**Assertions:**
+- [ ] Stories are sorted by implementation layer before priority
+- [ ] Sprint draft is shown before any write or gate invocation
+- [ ] PR-SPRINT gate is invoked in full mode after draft is ready
+- [ ] Skill asks "May I write" before writing the sprint file
+- [ ] Written file path matches `production/sprints/sprint-003.md`
+- [ ] Verdict is COMPLETE after successful write
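The layer-then-priority ordering with a capacity cut can be sketched as follows. The field names (`layer`, `priority`, `points`) are assumptions for illustration; the actual story schema is defined by the backlog files:

```python
def plan_sprint(stories, capacity):
    """Sort by implementation layer (ascending), then priority
    (descending), and take stories until capacity is filled."""
    ordered = sorted(stories, key=lambda s: (s["layer"], -s["priority"]))
    picked, points = [], 0
    for story in ordered:
        if points + story["points"] <= capacity:
            picked.append(story)
            points += story["points"]
    return picked

backlog = [
    {"name": "Save system", "layer": 1, "priority": 3, "points": 3},
    {"name": "Enemy AI", "layer": 2, "priority": 5, "points": 5},
    {"name": "UI polish", "layer": 3, "priority": 2, "points": 4},
    {"name": "Input remap", "layer": 1, "priority": 5, "points": 2},
]

sprint = plan_sprint(backlog, capacity=10)
```

Note that layer wins over raw priority: "Input remap" (layer 1) is picked before the higher-visibility "Enemy AI" (layer 2), and "UI polish" is cut because it would exceed capacity.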
+
+---
+
+### Case 2: Blocked Path — Backlog is empty
+
+**Fixture:**
+- `production/milestones/milestone-02.md` exists
+- No unstarted stories exist in any epic backlog
+
+**Input:** `/sprint-plan`
+
+**Expected behavior:**
+1. Skill reads backlog — finds no unstarted stories
+2. Skill outputs "No unstarted stories in backlog"
+3. Skill suggests running `/create-stories` to populate the backlog
+4. No gate is invoked; no file is written
+
+**Assertions:**
+- [ ] Verdict is BLOCKED
+- [ ] Output contains "No unstarted stories" or equivalent message
+- [ ] Output recommends `/create-stories`
+- [ ] PR-SPRINT gate is NOT invoked
+- [ ] No write tool is called
+
+---
+
+### Case 3: Gate returns CONCERNS — Sprint overloaded, revised before write
+
+**Fixture:**
+- Backlog has 8 stories totalling 16 points; milestone capacity is 10 points
+- `review-mode.txt` contains `full`
+
+**Input:** `/sprint-plan`
+
+**Expected behavior:**
+1. Skill drafts sprint with all 8 stories (over capacity)
+2. PR-SPRINT gate runs; producer returns CONCERNS: sprint is overloaded
+3. Skill presents concern to user and asks which stories to defer
+4. User selects 3 stories to defer; sprint is revised to 5 stories / 10 points
+5. Skill asks "May I write" with revised sprint; writes on approval
+
+**Assertions:**
+- [ ] CONCERNS from PR-SPRINT gate surfaces to user before any write
+- [ ] Skill allows sprint to be revised after gate feedback
+- [ ] Revised sprint (not original) is written to file
+- [ ] Verdict is COMPLETE after revision and write
+
+---
+
+### Case 4: Lean Mode — PR-SPRINT gate skipped
+
+**Fixture:**
+- Backlog has 4 stories; milestone capacity is 8 points
+- `review-mode.txt` contains `lean`
+
+**Input:** `/sprint-plan`
+
+**Expected behavior:**
+1. Skill reads review mode — determines `lean`
+2. Skill drafts sprint and presents it to user
+3. PR-SPRINT gate is skipped; output notes "[PR-SPRINT] skipped — Lean mode"
+4. Skill asks user for direct approval of the sprint
+5. User approves; sprint file is written
+
+**Assertions:**
+- [ ] PR-SPRINT gate is NOT invoked in lean mode
+- [ ] Skip is explicitly noted in output
+- [ ] User approval is still required before write (gate skip ≠ approval skip)
+- [ ] Verdict is COMPLETE after write
+
+---
+
+### Case 5: Edge Case — Previous sprint still has open stories
+
+**Fixture:**
+- `production/sprints/sprint-002.md` exists with 2 stories still `Status: In Progress`
+- Backlog has 5 new unstarted stories
+- `review-mode.txt` contains `full`
+
+**Input:** `/sprint-plan`
+
+**Expected behavior:**
+1. Skill reads sprint-002 and detects 2 open (in-progress) stories
+2. Skill flags: "Sprint 002 has 2 open stories — confirm carry-over before planning sprint 003"
+3. Skill presents user with choice: carry stories over, defer them, or cancel
+4. User confirms carry-over; carried stories are prepended to new sprint with `[CARRY]` tag
+5. Sprint draft is built; PR-SPRINT gate runs; sprint is written on approval
+
+**Assertions:**
+- [ ] Skill checks the most recent sprint file for open stories
+- [ ] User is asked to confirm carry-over before sprint planning continues
+- [ ] Carried stories appear in the new sprint draft with a distinguishing label
+- [ ] Skill does not silently ignore open stories from the previous sprint
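The carry-over tagging in steps 3 and 4 might look like this sketch, using a hypothetical story shape:

```python
def carry_over(previous_sprint, new_stories):
    """Prepend still-open stories from the previous sprint, tagged."""
    carried = [
        {**s, "name": f"[CARRY] {s['name']}"}
        for s in previous_sprint
        if s["status"] == "In Progress"
    ]
    return carried + new_stories

prev = [
    {"name": "Save system", "status": "Complete"},
    {"name": "Enemy AI", "status": "In Progress"},
]
sprint = carry_over(prev, [{"name": "UI polish", "status": "Not Started"}])
```

The carried story keeps its identity but gains a visible `[CARRY]` label, so it cannot be mistaken for a freshly planned story.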
+
+---
+
+## Protocol Compliance
+
+- [ ] Shows draft sprint before invoking PR-SPRINT gate or asking to write
+- [ ] Always asks "May I write" before writing sprint file
+- [ ] PR-SPRINT gate only runs in full mode
+- [ ] Skip message appears in lean and solo mode output
+- [ ] Verdict is clearly stated at the end of the skill output
+
+---
+
+## Coverage Notes
+
+- The case where no milestone file exists is not explicitly tested; behavior
+ follows the BLOCKED pattern with a suggestion to run `/gate-check` for
+ milestone progression.
+- Solo mode behavior is equivalent to lean (gate skipped, user approval
+ required) and is not separately tested.
+- Parallel story selection algorithms are not tested here; those are unit
+ concerns for the sprint-plan subagent.
diff --git a/CCGS Skill Testing Framework/skills/sprint/sprint-status.md b/CCGS Skill Testing Framework/skills/sprint/sprint-status.md
new file mode 100644
index 0000000..93170bf
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/sprint/sprint-status.md
@@ -0,0 +1,167 @@
+# Skill Test Spec: /sprint-status
+
+## Skill Summary
+
+`/sprint-status` is a Haiku-tier read-only skill that reads the current active
+sprint file and the session state to produce a concise sprint health summary.
+It reports story counts by status (Complete / In Progress / Blocked / Not Started)
+and emits one of three sprint-health verdicts: ON TRACK, AT RISK, or BLOCKED.
+It never writes files and does not invoke any director gates. It is designed for
+fast, low-cost status checks during a session.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings or numbered check sections
+- [ ] Contains verdict keywords: ON TRACK, AT RISK, BLOCKED
+- [ ] Does NOT require "May I write" language (read-only skill)
+- [ ] Has a next-step handoff (what to do based on the verdict)
+
+---
+
+## Director Gate Checks
+
+None. `/sprint-status` is a read-only reporting skill; no gates are invoked.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Mixed sprint, AT RISK with named blocker
+
+**Fixture:**
+- `production/sprints/sprint-004.md` exists (active sprint, linked in `active.md`)
+- Sprint contains 6 stories:
+ - 3 with `Status: Complete`
+ - 2 with `Status: In Progress`
+ - 1 with `Status: Blocked` (blocker: "Waiting on physics ADR acceptance")
+- Sprint end date is 2 days away
+
+**Input:** `/sprint-status`
+
+**Expected behavior:**
+1. Skill reads `production/session-state/active.md` to find active sprint reference
+2. Skill reads `production/sprints/sprint-004.md`
+3. Skill counts stories by status: 3 Complete, 2 In Progress, 1 Blocked
+4. Skill detects a Blocked story and the approaching deadline
+5. Skill outputs AT RISK verdict with the blocker named explicitly
+
+**Assertions:**
+- [ ] Output includes story count breakdown by status
+- [ ] Output names the specific blocked story and its blocker reason
+- [ ] Verdict is AT RISK (not BLOCKED, not ON TRACK) when any story is Blocked
+- [ ] Skill does not write any files
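The count-and-classify behavior asserted above can be sketched as a simplified model: any Blocked story yields AT RISK, per this case. (The real skill also weighs the sprint deadline, which this sketch omits.)

```python
from collections import Counter

def sprint_health(stories):
    """Count stories by status and derive a coarse verdict."""
    counts = Counter(s["status"] for s in stories)
    if counts["Blocked"]:
        return counts, "AT RISK"
    if counts["In Progress"] or counts["Not Started"]:
        return counts, "ON TRACK"
    return counts, "SPRINT COMPLETE"

stories = (
    [{"status": "Complete"}] * 3
    + [{"status": "In Progress"}] * 2
    + [{"status": "Blocked"}]
)
counts, verdict = sprint_health(stories)
```

With the Case 1 fixture (3 Complete, 2 In Progress, 1 Blocked), the single Blocked story tips the verdict to AT RISK; the all-Complete fixture in Case 2 would yield SPRINT COMPLETE.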
+
+---
+
+### Case 2: All Stories Complete — Sprint COMPLETE verdict
+
+**Fixture:**
+- `production/sprints/sprint-004.md` exists
+- All 5 stories have `Status: Complete`
+
+**Input:** `/sprint-status`
+
+**Expected behavior:**
+1. Skill reads sprint file — all stories are Complete
+2. Skill outputs ON TRACK verdict or SPRINT COMPLETE label
+3. Skill suggests running `/milestone-review` or `/sprint-plan` as next steps
+
+**Assertions:**
+- [ ] Verdict is ON TRACK or SPRINT COMPLETE when all stories are Complete
+- [ ] Output notes that the sprint is fully done
+- [ ] Next-step suggestion references `/milestone-review` or `/sprint-plan`
+- [ ] No files are written
+
+---
+
+### Case 3: No Active Sprint File — Guidance to run /sprint-plan
+
+**Fixture:**
+- `production/session-state/active.md` does not reference an active sprint
+- `production/sprints/` directory is empty or absent
+
+**Input:** `/sprint-status`
+
+**Expected behavior:**
+1. Skill reads `active.md` — finds no active sprint reference
+2. Skill checks `production/sprints/` — finds no files
+3. Skill outputs an informational message: no active sprint detected
+4. Skill suggests running `/sprint-plan` to create one
+
+**Assertions:**
+- [ ] Skill does not error or crash when no sprint file exists
+- [ ] Output clearly states no active sprint was found
+- [ ] Output recommends `/sprint-plan` as the next action
+- [ ] No verdict keyword is emitted (no sprint to assess)
+
+---
+
+### Case 4: Edge Case — Stale In Progress Story (flagged)
+
+**Fixture:**
+- `production/sprints/sprint-004.md` exists
+- One story has `Status: In Progress` with a note in `active.md`:
+ `Last updated: 2026-03-30` (more than 2 days before today's session date)
+- No stories are Blocked
+
+**Input:** `/sprint-status`
+
+**Expected behavior:**
+1. Skill reads sprint file and session state
+2. Skill detects the story has been In Progress for >2 days without update
+3. Skill flags the story as "stale" in the output
+4. Verdict is AT RISK (stale in-progress stories indicate a hidden blocker)
+
+**Assertions:**
+- [ ] Skill compares story "last updated" metadata against session date
+- [ ] Stale In Progress story is flagged by name in the output
+- [ ] Verdict is AT RISK, not ON TRACK, when a stale story is detected
+- [ ] Output does not conflate "stale" with "Blocked" — the label is distinct
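
The >2-day staleness comparison above can be sketched with stdlib dates. The 2-day threshold comes from this case; `is_stale` and its ISO-string signature are illustrative assumptions, not the skill's actual code.

```python
from datetime import date

STALE_AFTER_DAYS = 2  # threshold from this spec: >2 days without update

def is_stale(last_updated: str, session_date: str) -> bool:
    """True when an In Progress story has gone >2 days without an update."""
    delta = date.fromisoformat(session_date) - date.fromisoformat(last_updated)
    return delta.days > STALE_AFTER_DAYS
```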
+
+---
+
+### Case 5: Gate Compliance — Read-only; no gate invocation
+
+**Fixture:**
+- `production/sprints/sprint-004.md` exists with 4 stories (2 Complete, 2 In Progress)
+- `production/review-mode.txt` contains `full`
+
+**Input:** `/sprint-status`
+
+**Expected behavior:**
+1. Skill reads sprint and produces status summary
+2. Skill does NOT invoke any director gate regardless of review mode
+3. Output is a plain status report with ON TRACK, AT RISK, or BLOCKED verdict
+4. Skill does not prompt for user approval or ask to write any file
+
+**Assertions:**
+- [ ] No director gate is invoked in any review mode
+- [ ] Output does not contain any "May I write" prompt
+- [ ] Skill completes and returns a verdict without user interaction
+- [ ] Review mode file is ignored (or confirmed irrelevant) by this skill
+
+---
+
+## Protocol Compliance
+
+- [ ] Does NOT use Write or Edit tools (read-only skill)
+- [ ] Presents story count breakdown before emitting verdict
+- [ ] Does not ask for approval
+- [ ] Ends with a recommended next step based on verdict
+- [ ] Runs on Haiku model tier (fast, low-cost)
+
+---
+
+## Coverage Notes
+
+- The case where multiple sprints are active simultaneously is not tested;
+ the skill reads whichever sprint `active.md` references.
+- Partial sprint completion percentages are not explicitly verified; the
+ count-by-status output implies them.
+- The `solo` mode review-mode variant is not separately tested; gate
+ behavior in Case 5 applies to all modes equally.
diff --git a/CCGS Skill Testing Framework/skills/team/team-audio.md b/CCGS Skill Testing Framework/skills/team/team-audio.md
new file mode 100644
index 0000000..5aafb00
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/team/team-audio.md
@@ -0,0 +1,210 @@
+# Skill Test Spec: /team-audio
+
+## Skill Summary
+
+Orchestrates the audio team through a four-step pipeline: audio direction
+(audio-director) → sound design + accessibility review in parallel (sound-designer
++ accessibility-specialist) → technical implementation + engine validation in
+parallel (technical-artist + primary engine specialist) → code integration
+(gameplay-programmer). Reads relevant GDDs, the sound bible (if present), and
+existing audio asset lists before spawning agents. Compiles all outputs into an
+audio design document saved to `design/gdd/audio-[feature].md`. Uses
+`AskUserQuestion` at each step transition. Verdict is COMPLETE when the audio
+design document is produced. Skips the engine specialist spawn gracefully when no
+engine is configured.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 step/phase headings
+- [ ] Contains verdict keywords: COMPLETE, BLOCKED
+- [ ] Contains "File Write Protocol" section
+- [ ] File writes are delegated to sub-agents — orchestrator does not write files directly
+- [ ] Sub-agents enforce "May I write to [path]?" before any write
+- [ ] Has a next-step handoff at the end (references `/dev-story`, `/asset-audit`)
+- [ ] Error Recovery Protocol section is present
+- [ ] `AskUserQuestion` is used at step transitions before proceeding
+- [ ] Step 2 explicitly spawns sound-designer and accessibility-specialist in parallel
+- [ ] Step 3 explicitly spawns technical-artist and engine specialist in parallel (when engine is configured)
+- [ ] Skill reads `design/gdd/sound-bible.md` during context gathering if it exists
+- [ ] Output document is saved to `design/gdd/audio-[feature].md`
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — All steps complete, audio design document saved
+
+**Fixture:**
+- GDD for the target feature exists at `design/gdd/combat.md`
+- Sound bible exists at `design/gdd/sound-bible.md`
+- Existing audio assets are listed in `assets/audio/`
+- Engine is configured in `.claude/docs/technical-preferences.md`
+- No accessibility gaps exist in the planned audio event list
+
+**Input:** `/team-audio combat`
+
+**Expected behavior:**
+1. Context gathering: orchestrator reads `design/gdd/combat.md`, `design/gdd/sound-bible.md`, and `assets/audio/` asset list before spawning any agent
+2. Step 1: audio-director is spawned; defines sonic identity, emotional tone, adaptive music direction, mix targets, and adaptive audio rules for combat
+3. `AskUserQuestion` presents audio direction; user approves before Step 2 begins
+4. Step 2: sound-designer and accessibility-specialist are spawned in parallel; sound-designer produces SFX specifications, audio event list with trigger conditions, and mixing groups; accessibility-specialist identifies critical gameplay audio events and specifies visual fallback and subtitle requirements
+5. `AskUserQuestion` presents SFX spec and accessibility requirements; user approves before Step 3 begins
+6. Step 3: technical-artist and primary engine specialist are spawned in parallel; technical-artist designs bus structure, middleware integration, memory budgets, and streaming strategy; engine specialist validates that the integration approach is idiomatic for the configured engine
+7. `AskUserQuestion` presents technical plan; user approves before Step 4 begins
+8. Step 4: gameplay-programmer is spawned; wires up audio events to gameplay triggers, implements adaptive music, sets up occlusion zones, writes unit tests for audio event triggers
+9. Orchestrator compiles all outputs into a single audio design document
+10. Subagent asks "May I write the audio design document to `design/gdd/audio-combat.md`?" before writing
+11. Summary output lists: audio event count, estimated asset count, implementation tasks, and any open questions
+12. Verdict: COMPLETE
+
+**Assertions:**
+- [ ] Sound bible is read during context gathering (before Step 1) when it exists
+- [ ] audio-director is spawned before sound-designer or accessibility-specialist
+- [ ] `AskUserQuestion` appears after Step 1 output and before Step 2 launch
+- [ ] sound-designer and accessibility-specialist Task calls are issued simultaneously in Step 2
+- [ ] technical-artist and engine specialist Task calls are issued simultaneously in Step 3
+- [ ] gameplay-programmer is not launched until Step 3 `AskUserQuestion` is approved
+- [ ] Audio design document is written to `design/gdd/audio-combat.md` (not another path)
+- [ ] Summary includes audio event count and estimated asset count
+- [ ] No files are written by the orchestrator directly
+- [ ] Verdict is COMPLETE after document delivery
+
+---
+
+### Case 2: Accessibility Gap — Critical gameplay audio event has no visual fallback
+
+**Fixture:**
+- GDD for the target feature exists
+- Step 1 and Step 2 are in progress
+- sound-designer's audio event list includes "EnemyNearbyAlert" — a spatial audio cue that warns the player an enemy is approaching from off-screen
+- accessibility-specialist reviews the event list and finds "EnemyNearbyAlert" has no visual fallback (no on-screen indicator, no subtitle, no controller rumble specified)
+
+**Input:** `/team-audio stealth` (Step 2 scenario)
+
+**Expected behavior:**
+1. Steps 1–2 proceed; accessibility-specialist and sound-designer are spawned in parallel
+2. accessibility-specialist returns its review with a BLOCKING concern: "`EnemyNearbyAlert` is a critical gameplay audio event (warns player of off-screen threat) with no visual fallback — hearing-impaired players cannot detect this threat. This is a BLOCKING accessibility gap."
+3. Orchestrator surfaces the concern immediately in conversation before presenting `AskUserQuestion`
+4. `AskUserQuestion` presents the accessibility concern as a BLOCKING issue with options:
+ - Add a visual indicator for EnemyNearbyAlert (e.g., directional arrow on HUD) and continue
+ - Add controller haptic feedback as the fallback and continue
+ - Stop here and resolve all accessibility gaps before proceeding to Step 3
+5. Step 3 (technical-artist + engine specialist) is not launched until the user resolves or explicitly accepts the gap
+6. The accessibility gap is included in the final audio design document under "Open Accessibility Issues" if unresolved
+
+**Assertions:**
+- [ ] Accessibility gap is labeled BLOCKING (not advisory) in the report
+- [ ] The specific event name ("EnemyNearbyAlert") and the nature of the gap are stated
+- [ ] `AskUserQuestion` surfaces the gap before Step 3 is launched
+- [ ] At least one resolution option is offered (add visual fallback, add haptic fallback)
+- [ ] Step 3 is not launched while the gap is unresolved without explicit user authorization
+- [ ] If the gap is carried forward unresolved, it is documented in the audio design doc as an open issue
+
+---
+
+### Case 3: No Argument — Usage guidance or design doc inference
+
+**Fixture:**
+- Any project state
+
+**Input:** `/team-audio` (no argument)
+
+**Expected behavior:**
+1. Skill detects no argument is provided
+2. Outputs usage guidance: e.g., "Usage: `/team-audio [feature or area]` — specify the feature or area to design audio for (e.g., `combat`, `main menu`, `forest biome`, `boss encounter`)"
+3. Skill exits without spawning any agents
+
+**Assertions:**
+- [ ] Skill does NOT spawn any agents when no argument is provided
+- [ ] Usage message includes the correct invocation format with argument examples
+- [ ] Skill does NOT attempt to infer a feature from existing design docs without user direction
+- [ ] No `AskUserQuestion` is used — output is direct guidance
+
+---
+
+### Case 4: Missing Sound Bible — Skill notes the gap and proceeds without it
+
+**Fixture:**
+- GDD for the target feature exists at `design/gdd/main-menu.md`
+- `design/gdd/sound-bible.md` does NOT exist
+- Engine is configured; other context files are present
+
+**Input:** `/team-audio main menu`
+
+**Expected behavior:**
+1. Context gathering: orchestrator reads `design/gdd/main-menu.md` and checks for `design/gdd/sound-bible.md`
+2. Sound bible is not found; orchestrator notes the gap in conversation: "Note: `design/gdd/sound-bible.md` not found — audio direction will proceed without a project-wide sonic identity reference. Consider creating a sound bible if this is an ongoing project."
+3. Pipeline proceeds normally through all four steps without the sound bible as input
+4. audio-director in Step 1 is informed that no sound bible exists and must establish sonic identity from the feature GDD alone
+5. The missing sound bible is mentioned in the final summary as a recommended next step
+
+**Assertions:**
+- [ ] Orchestrator checks for the sound bible during context gathering (before Step 1)
+- [ ] Missing sound bible is noted explicitly in conversation — not silently ignored
+- [ ] Pipeline does NOT halt due to the missing sound bible
+- [ ] audio-director is notified that no sound bible exists in its prompt context
+- [ ] Summary or Next Steps section recommends creating a sound bible
+- [ ] Verdict is still COMPLETE if all other steps succeed
+
+---
+
+### Case 5: Engine Not Configured — Engine specialist step skipped gracefully
+
+**Fixture:**
+- Engine is NOT configured in `.claude/docs/technical-preferences.md` (shows `[TO BE CONFIGURED]`)
+- GDD for the target feature exists
+- Sound bible may or may not exist
+
+**Input:** `/team-audio boss encounter`
+
+**Expected behavior:**
+1. Context gathering: orchestrator reads `.claude/docs/technical-preferences.md` and detects no engine is configured
+2. Steps 1–2 proceed normally (audio-director, sound-designer, accessibility-specialist)
+3. Step 3: technical-artist is spawned normally; engine specialist spawn is SKIPPED
+4. Orchestrator notes in conversation: "Engine specialist not spawned — no engine configured in technical-preferences.md. Engine integration validation will be deferred until an engine is selected."
+5. Step 4: gameplay-programmer proceeds with a note that engine-specific audio integration patterns could not be validated
+6. The engine specialist gap is included in the audio design document under "Deferred Validation"
+7. Verdict: COMPLETE (skip is graceful, not a blocker)
+
+**Assertions:**
+- [ ] Engine specialist is NOT spawned when no engine is configured
+- [ ] Skill does NOT error out due to the missing engine configuration
+- [ ] The skip is explicitly noted in conversation — not silently omitted
+- [ ] technical-artist is still spawned in Step 3 (skip applies only to the engine specialist)
+- [ ] gameplay-programmer proceeds in Step 4 with the deferred validation noted
+- [ ] Deferred engine validation is recorded in the audio design document
+- [ ] Verdict is COMPLETE (engine not configured is a known graceful case)
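
The unconfigured-engine detection this case relies on can be sketched as a check for the template marker. This is a hypothetical helper: passing the file's text directly (with `None` standing in for a missing file) elides the actual file read.

```python
from typing import Optional

def engine_configured(prefs_text: Optional[str]) -> bool:
    """False when technical-preferences.md is absent or still templated."""
    if prefs_text is None:
        return False  # file missing entirely
    return "[TO BE CONFIGURED]" not in prefs_text
```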
+
+---
+
+## Protocol Compliance
+
+- [ ] Context gathering (GDDs, sound bible, asset list) runs before any agent is spawned
+- [ ] `AskUserQuestion` is used after every step output before the next step launches
+- [ ] Parallel spawning: Step 2 (sound-designer + accessibility-specialist) and Step 3 (technical-artist + engine specialist) issue all Task calls before waiting for results
+- [ ] No files are written by the orchestrator directly — all writes are delegated to sub-agents
+- [ ] Each sub-agent enforces the "May I write to [path]?" protocol before any write
+- [ ] BLOCKED status from any agent is surfaced immediately — not silently skipped
+- [ ] A partial report is always produced when some agents complete and others block
+- [ ] Audio design document path follows the pattern `design/gdd/audio-[feature].md`
+- [ ] Verdict is exactly COMPLETE or BLOCKED — no other verdict values used
+- [ ] Next Steps handoff references `/dev-story` and `/asset-audit`
+
+---
+
+## Coverage Notes
+
+- The "Retry with narrower scope" and "Skip this agent" resolution paths from the Error
+ Recovery Protocol are not separately tested — they follow the same `AskUserQuestion`
+ + partial-report pattern validated in Cases 2 and 5.
+- Step 4 (gameplay-programmer) happy-path behavior is validated implicitly by Case 1.
+ Failure modes for this step follow the standard Error Recovery Protocol.
+- The accessibility-specialist's subtitle and caption requirements (beyond visual fallbacks)
+ are validated implicitly by Case 1. Case 2 focuses on the more severe case where a
+ critical gameplay event has no fallback at all.
+- Engine specialist validation logic (idiomatic integration, version-specific changes) is
+ tested only for the configured and unconfigured states. The specific content of the
+ engine specialist's output is out of scope for this behavioral spec.
diff --git a/CCGS Skill Testing Framework/skills/team/team-combat.md b/CCGS Skill Testing Framework/skills/team/team-combat.md
new file mode 100644
index 0000000..e73c303
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/team/team-combat.md
@@ -0,0 +1,180 @@
+# Skill Test Spec: /team-combat
+
+## Skill Summary
+
+Orchestrates the full combat team pipeline end-to-end for a single combat feature.
+Coordinates game-designer, gameplay-programmer, ai-programmer, technical-artist,
+sound-designer, the primary engine specialist, and qa-tester through six structured
+phases: Design → Architecture (with engine specialist validation) → Implementation
+(parallel) → Integration → Validation → Sign-off. Uses `AskUserQuestion` at each
+phase transition. Delegates all file writes to sub-agents. Produces a summary report
+with verdict COMPLETE / NEEDS WORK / BLOCKED and handoffs to `/code-review`,
+`/balance-check`, and `/team-polish`.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings (Phase 1 through Phase 6 are all present)
+- [ ] Contains verdict keywords: COMPLETE, NEEDS WORK, BLOCKED
+- [ ] Contains "May I write" or "File Write Protocol" — writes delegated to sub-agents, orchestrator does not write files directly
+- [ ] Has a next-step handoff at the end (references `/code-review`, `/balance-check`, `/team-polish`)
+- [ ] Error Recovery Protocol section is present with all four recovery steps
+- [ ] Uses `AskUserQuestion` at phase transitions for user approval before proceeding
+- [ ] Phase 3 is explicitly marked as parallel (gameplay-programmer, ai-programmer, technical-artist, sound-designer)
+- [ ] Phase 2 includes spawning the primary engine specialist (read from `.claude/docs/technical-preferences.md`)
+- [ ] Team Composition lists all seven roles (game-designer, gameplay-programmer, ai-programmer, technical-artist, sound-designer, engine specialist, qa-tester)
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — All agents succeed, full pipeline runs to completion
+
+**Fixture:**
+- `design/gdd/game-concept.md` exists and is populated
+- Engine is configured in `.claude/docs/technical-preferences.md` (Engine Specialists section filled)
+- No existing GDD for the requested combat feature
+
+**Input:** `/team-combat parry and riposte system`
+
+**Expected behavior:**
+1. Phase 1 — game-designer spawned; produces `design/gdd/parry-riposte.md` covering all 8 required sections (overview, player fantasy, rules, formulas, edge cases, dependencies, tuning knobs, acceptance criteria); asks user to approve design doc
+2. Phase 2 — gameplay-programmer + ai-programmer spawned; produce architecture sketch with class structure, interfaces, and file list; then primary engine specialist is spawned to validate idioms; engine specialist output incorporated; `AskUserQuestion` presented with architecture options before Phase 3 begins
+3. Phase 3 — gameplay-programmer, ai-programmer, technical-artist, sound-designer spawned in parallel; all four return outputs before Phase 4 begins
+4. Phase 4 — integration wires together all Phase 3 outputs; tuning knobs verified as data-driven; `AskUserQuestion` confirms integration before Phase 5
+5. Phase 5 — qa-tester spawned; writes test cases from acceptance criteria; verifies edge cases; performance impact checked against budget
+6. Phase 6 — summary report produced: design COMPLETE, all team members COMPLETE, test cases listed, verdict: COMPLETE
+7. Next steps listed: `/code-review`, `/balance-check`, `/team-polish`
+
+**Assertions:**
+- [ ] `AskUserQuestion` called at each phase gate (at minimum before Phase 3 and before Phase 5)
+- [ ] Phase 3 agents launched simultaneously — no sequential dependency between gameplay-programmer, ai-programmer, technical-artist, sound-designer
+- [ ] Engine specialist runs in Phase 2 before Phase 3 begins (output incorporated into architecture)
+- [ ] All file writes delegated to sub-agents (orchestrator never calls Write/Edit directly)
+- [ ] Verdict COMPLETE present in final report
+- [ ] Next steps include `/code-review`, `/balance-check`, `/team-polish`
+- [ ] Design doc covers all 8 required GDD sections
+
+---
+
+### Case 2: Blocked Agent — One subagent returns BLOCKED mid-pipeline
+
+**Fixture:**
+- `design/gdd/parry-riposte.md` exists (Phase 1 already complete)
+- ai-programmer agent returns BLOCKED because no AI system architecture ADR exists (ADR status is Proposed)
+
+**Input:** `/team-combat parry and riposte system`
+
+**Expected behavior:**
+1. Phase 1 — design doc found; game-designer confirms it is valid; phase approved
+2. Phase 2 — gameplay-programmer completes architecture sketch; ai-programmer returns BLOCKED: "ADR for AI behavior system is Proposed — cannot implement until ADR is Accepted"
+3. Error Recovery Protocol triggered: "ai-programmer: BLOCKED — AI behavior ADR is Proposed"
+4. `AskUserQuestion` presented with options: (a) Skip ai-programmer and note the gap; (b) Retry with narrower scope; (c) Stop here and run `/architecture-decision` first
+5. If user chooses (a): Phase 3 proceeds with gameplay-programmer, technical-artist, sound-designer only; ai-programmer gap noted in partial report
+6. Final report produced: partial implementation documented, ai-programmer section marked BLOCKED, overall verdict: BLOCKED
+
+**Assertions:**
+- [ ] BLOCKED surface message appears before any dependent phase continues
+- [ ] `AskUserQuestion` offers at minimum three options: skip / retry / stop
+- [ ] Partial report produced — completed agents' work is not discarded
+- [ ] Overall verdict is BLOCKED (not COMPLETE) when any agent is unresolved
+- [ ] Blocked reason references the ADR and suggests `/architecture-decision`
+- [ ] Orchestrator does not silently proceed past the blocked dependency
+
+---
+
+### Case 3: No Argument — Clear usage guidance shown
+
+**Fixture:**
+- Any project state
+
+**Input:** `/team-combat` (no argument)
+
+**Expected behavior:**
+1. Skill detects no argument provided
+2. Outputs usage message explaining the required argument (combat feature description)
+3. Provides an example invocation: `/team-combat [combat feature description]`
+4. Skill exits without spawning any subagents
+
+**Assertions:**
+- [ ] Skill does NOT spawn any subagents when no argument is given
+- [ ] Usage message includes the argument-hint format from frontmatter
+- [ ] Error message includes at least one example of a valid invocation
+- [ ] No file reads beyond what is needed to detect the missing argument
+- [ ] Verdict is NOT shown (pipeline never runs)
+
+---
+
+### Case 4: Parallel Phase Validation — Phase 3 agents run simultaneously
+
+**Fixture:**
+- `design/gdd/parry-riposte.md` exists and is complete
+- Architecture sketch has been approved
+- Engine specialist has validated architecture
+
+**Input:** `/team-combat parry and riposte system` (resuming from Phase 2 complete)
+
+**Expected behavior:**
+1. Phase 3 begins after architecture approval
+2. All four Task calls — gameplay-programmer, ai-programmer, technical-artist, sound-designer — are issued before any result is awaited
+3. Skill waits for all four agents to complete before proceeding to Phase 4
+4. If any single agent completes early, skill does not begin Phase 4 until all four have returned
+
+**Assertions:**
+- [ ] Four Task calls issued in a single batch (no sequential waiting between them)
+- [ ] Phase 4 does not begin until all four Phase 3 agents have returned results
+- [ ] Skill does not pass one Phase 3 agent's output as input to another Phase 3 agent (they are independent)
+- [ ] All four Phase 3 agent results referenced in the Phase 4 integration step
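
The batch-then-gather discipline these assertions check can be illustrated with an ordinary thread pool. This is an analogy only — real agent spawning is a batch of Task tool calls, not Python — and `spawn` is a placeholder for one such call.

```python
from concurrent.futures import ThreadPoolExecutor

PHASE_3_AGENTS = ["gameplay-programmer", "ai-programmer",
                  "technical-artist", "sound-designer"]

def spawn(agent: str) -> str:
    # Stand-in for one Task tool call; returns that agent's report.
    return f"{agent}: COMPLETE"

with ThreadPoolExecutor(max_workers=len(PHASE_3_AGENTS)) as pool:
    # Issue all four calls before awaiting any result ...
    futures = [pool.submit(spawn, agent) for agent in PHASE_3_AGENTS]
    # ... then block until every agent has returned (the Phase 4 gate).
    results = [f.result() for f in futures]
```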
+
+---
+
+### Case 5: Architecture Phase Engine Routing — Engine specialist receives correct context
+
+**Fixture:**
+- `.claude/docs/technical-preferences.md` has Engine Specialists section populated (e.g., Primary: godot-specialist)
+- Architecture sketch produced by gameplay-programmer is available
+- Engine version pinned in `docs/engine-reference/godot/VERSION.md`
+
+**Input:** `/team-combat parry and riposte system`
+
+**Expected behavior:**
+1. Phase 2 — gameplay-programmer produces architecture sketch
+2. Skill reads `.claude/docs/technical-preferences.md` Engine Specialists section to identify the primary engine specialist agent type
+3. Engine specialist is spawned with: the architecture sketch, the GDD path, the engine version from `VERSION.md`, and explicit instructions to check for deprecated APIs
+4. Engine specialist output (idiom notes, deprecated API warnings, native system recommendations) is returned to orchestrator
+5. Orchestrator incorporates engine notes into the architecture before presenting Phase 2 results to user
+6. `AskUserQuestion` includes engine specialist's notes alongside the architecture sketch
+
+**Assertions:**
+- [ ] Engine specialist agent type is read from `.claude/docs/technical-preferences.md` — not hardcoded
+- [ ] Engine specialist prompt includes the architecture sketch and GDD path
+- [ ] Engine specialist checks for deprecated APIs against the pinned engine version
+- [ ] Engine specialist output is incorporated before Phase 3 begins (not skipped or appended separately)
+- [ ] If no engine is configured, engine specialist step is skipped and a note is added to the report
+
+---
+
+## Protocol Compliance
+
+- [ ] `AskUserQuestion` used at each phase transition — user approves before pipeline advances
+- [ ] All file writes delegated to sub-agents via Task — orchestrator does not call Write or Edit directly
+- [ ] Error Recovery Protocol followed: surface → assess → offer options → partial report
+- [ ] Phase 3 agents launched in parallel per skill spec
+- [ ] Partial report always produced even when agents are BLOCKED
+- [ ] Verdict is one of COMPLETE / NEEDS WORK / BLOCKED
+- [ ] Next steps present at end of output: `/code-review`, `/balance-check`, `/team-polish`
+
+---
+
+## Coverage Notes
+
+- The NEEDS WORK verdict path (qa-tester finds failures in Phase 5) is not separately tested
+ here; it follows the same error recovery and partial report protocol as Case 2.
+- "Retry with narrower scope" error recovery option is listed in assertions but its full
+ recursive behavior (splitting via `/create-stories`) is covered by the `/create-stories` spec.
+- Phase 4 integration logic (wiring gameplay, AI, VFX, audio) is validated implicitly by
+ the Happy Path case; a dedicated integration test would require fixture code files.
+- Engine specialist unavailable (no engine configured) is partially covered in Case 5
+ assertions — a dedicated fixture for unconfigured engine state would strengthen coverage.
diff --git a/CCGS Skill Testing Framework/skills/team/team-level.md b/CCGS Skill Testing Framework/skills/team/team-level.md
new file mode 100644
index 0000000..2208d2d
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/team/team-level.md
@@ -0,0 +1,209 @@
+# Skill Test Spec: /team-level
+
+## Skill Summary
+
+Orchestrates the full level design team for a single level or area. Coordinates
+narrative-director, world-builder, level-designer, systems-designer, art-director,
+accessibility-specialist, and qa-tester through five sequential steps with one
+parallel phase (Step 4). Compiles all team outputs into a single level design
+document saved to `design/levels/[level-name].md`. Uses `AskUserQuestion` at each
+step transition. Delegates all file writes to sub-agents. Produces a summary report
+with verdict COMPLETE / BLOCKED and handoffs to `/design-review`, `/dev-story`,
+`/qa-plan`.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase/step headings (Step 1 through Step 5 are all present)
+- [ ] Contains verdict keywords: COMPLETE, BLOCKED
+- [ ] Contains "May I write" or "File Write Protocol" — writes delegated to sub-agents, orchestrator does not write files directly
+- [ ] Has a next-step handoff at the end (references `/design-review`, `/dev-story`, `/qa-plan`)
+- [ ] Error Recovery Protocol section is present with all four recovery steps
+- [ ] Uses `AskUserQuestion` at step transitions for user approval before proceeding
+- [ ] Step 4 is explicitly marked as parallel (art-director and accessibility-specialist run simultaneously)
+- [ ] Context gathering reads: `design/gdd/game-concept.md`, `design/gdd/game-pillars.md`, `design/levels/`, `design/narrative/`, and relevant world-building docs
+- [ ] Team Composition lists all seven roles (narrative-director, world-builder, level-designer, systems-designer, art-director, accessibility-specialist, qa-tester)
+- [ ] accessibility-specialist output includes severity ratings (BLOCKING / RECOMMENDED / NICE TO HAVE)
+- [ ] Final level design document saved to `design/levels/[level-name].md`
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — All team members produce outputs, document compiled and saved
+
+**Fixture:**
+- `design/gdd/game-concept.md` exists and is populated
+- `design/gdd/game-pillars.md` exists
+- `design/levels/` directory exists (may contain other level docs)
+- `design/narrative/` directory exists with relevant narrative docs
+
+**Input:** `/team-level forest dungeon`
+
+**Expected behavior:**
+1. Context gathering — orchestrator reads game-concept.md, game-pillars.md, existing level docs in `design/levels/`, narrative docs in `design/narrative/`, and world-building docs for the forest region
+2. Step 1 — narrative-director spawned: defines narrative purpose, key characters, dialogue triggers, emotional arc; world-builder spawned: provides lore context, environmental storytelling opportunities, world rules; `AskUserQuestion` confirms Step 1 outputs before Step 2
+3. Step 2 — level-designer spawned: designs spatial layout (critical path, optional paths, secrets), pacing curve, encounters, puzzles, entry/exit points and connections to adjacent areas; `AskUserQuestion` confirms layout before Step 3
+4. Step 3 — systems-designer spawned: specifies enemy compositions, loot tables, difficulty balance, area-specific mechanics, resource distribution; `AskUserQuestion` confirms systems before Step 4
+5. Step 4 — art-director and accessibility-specialist spawned in parallel; art-director: visual theme, color palette, lighting, asset list, VFX needs; accessibility-specialist: navigation clarity, colorblind safety, cognitive load check — each concern rated BLOCKING / RECOMMENDED / NICE TO HAVE; `AskUserQuestion` presents both outputs before Step 5
+6. Step 5 — qa-tester spawned: test cases for critical path, boundary/edge cases (sequence breaks, softlocks), playtest checklist, acceptance criteria
+7. Orchestrator compiles all team outputs into level design document format; sub-agent asked "May I write to `design/levels/forest-dungeon.md`?"; file saved
+8. Summary report: area overview, encounter count, estimated asset list, narrative beats, cross-team dependencies, verdict: COMPLETE
+9. Next steps listed: `/design-review design/levels/forest-dungeon.md`, `/dev-story`, `/qa-plan`
+
+**Assertions:**
+- [ ] All five sources read during context gathering before any agent is spawned
+- [ ] narrative-director and world-builder both spawned in Step 1 (may be sequential or parallel — both must complete before Step 2)
+- [ ] `AskUserQuestion` called at each step gate (minimum: after Step 1, Step 2, Step 3, Step 4)
+- [ ] Step 4 agents (art-director, accessibility-specialist) launched simultaneously
+- [ ] All file writes delegated to sub-agents — orchestrator does not write directly
+- [ ] Level doc saved to `design/levels/forest-dungeon.md` (slugified from argument)
+- [ ] Verdict COMPLETE in final summary report
+- [ ] Next steps include `/design-review`, `/dev-story`, `/qa-plan`
+- [ ] Summary report includes: area overview, encounter count, estimated asset list, narrative beats
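
The slug derivation asserted above ("forest dungeon" becoming `forest-dungeon.md`) can be sketched as follows. The exact transform is an assumption — lowercase with runs of non-alphanumerics collapsed to single hyphens — since the skill does not spell out its slug rule.

```python
import re

def slugify(level_name: str) -> str:
    """Lowercase and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", level_name.lower()).strip("-")
```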
+
+---
+
+### Case 2: Blocked Agent (world-builder) — Partial report produced with gap noted
+
+**Fixture:**
+- `design/gdd/game-concept.md` exists
+- World-building docs for the forest region do NOT exist
+- world-builder agent returns BLOCKED: "No world-building docs found for the forest region — cannot provide lore context"
+
+**Input:** `/team-level forest dungeon`
+
+**Expected behavior:**
+1. Context gathering completes; missing world-building docs noted
+2. Step 1 — narrative-director completes successfully; world-builder spawned and returns BLOCKED
+3. Error Recovery Protocol triggered: "world-builder: BLOCKED — no world-building docs for forest region"
+4. `AskUserQuestion` presented with options:
+ - (a) Skip world-builder and note the lore gap in the level doc
+ - (b) Retry with narrower scope (world-builder focuses only on what can be inferred from game-concept.md)
+ - (c) Stop here and create world-building docs first
+5. If user chooses (a): pipeline continues with Steps 2–5 using narrative-director context only; level doc compiled with a clearly marked gap section: "World-building context: NOT PROVIDED — see open dependency"
+6. Final report produced: partial outputs documented, world-builder section marked BLOCKED, overall verdict: BLOCKED
+
+**Assertions:**
+- [ ] BLOCKED message surfaces immediately when world-builder fails — Step 2 does not begin without user input
+- [ ] `AskUserQuestion` offers at minimum three options (skip / retry / stop)
+- [ ] Partial report produced — narrative-director's completed work is not discarded
+- [ ] Level doc (if compiled) contains an explicit gap notation for the missing world-building context
+- [ ] Overall verdict is BLOCKED (not COMPLETE) when world-builder remains unresolved
+- [ ] Skill does NOT silently fabricate lore content to fill the gap
+
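+The Error Recovery Protocol shape this case exercises, in the invocation-pattern
+style used by these docs (option wording is illustrative, not mandated):
+
+```
+On agent BLOCKED:
+1. Surface immediately: "[agent]: BLOCKED — [reason]"
+2. Assess: which downstream steps depend on the missing output?
+3. AskUserQuestion with at least three options: skip / retry narrower / stop
+4. Produce a partial report — completed agents' work is kept, the blocked
+   section is marked BLOCKED, and the overall verdict is BLOCKED
+```
+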
+---
+
+### Case 3: No Argument — Usage guidance shown
+
+**Fixture:**
+- Any project state
+
+**Input:** `/team-level` (no argument)
+
+**Expected behavior:**
+1. Skill detects no argument provided
+2. Outputs usage message explaining the required argument (level name or area to design)
+3. Provides example invocations: `/team-level tutorial`, `/team-level forest dungeon`, `/team-level final boss arena`
+4. Skill exits without reading any project files or spawning any subagents
+
+**Assertions:**
+- [ ] Skill does NOT spawn any subagents when no argument is given
+- [ ] Usage message includes the argument-hint format from frontmatter
+- [ ] At least one example of a valid invocation is shown
+- [ ] No GDD or level files read before failing
+- [ ] Verdict is NOT shown (pipeline never starts)
+
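+The no-argument guard asserted above, sketched in the same invocation-pattern
+style (message wording is illustrative):
+
+```
+# Before any file read or Task call:
+If no argument was provided:
+  Output usage guidance with the argument-hint and example invocations
+  Stop — no files read, no subagents spawned, no verdict shown
+```
+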
+---
+
+### Case 4: Accessibility Review Gate — Blocking concern surfaces before sign-off
+
+**Fixture:**
+- Steps 1–3 complete successfully
+- `design/accessibility-requirements.md` committed tier: Enhanced
+- accessibility-specialist (Step 4, parallel) flags a BLOCKING concern: the critical path through the forest dungeon requires players to distinguish between two environmental hazards (toxic pools vs. shallow water) using color alone — no shape, icon, or audio cue differentiates them
+
+**Input:** `/team-level forest dungeon`
+
+**Expected behavior:**
+1. Steps 1–3 complete; Step 4 parallel phase begins
+2. accessibility-specialist returns: BLOCKING concern — "Critical path hazard distinction relies on color only (toxic pools vs. shallow water). Shape, icon, or audio cue required per Enhanced accessibility tier."
+3. art-director returns Step 4 output (complete)
+4. Skill presents both Step 4 results via `AskUserQuestion` — BLOCKING concern highlighted prominently
+5. `AskUserQuestion` offers:
+ - (a) Return to level-designer + art-director to redesign hazard visual/audio language before Step 5
+ - (b) Document as a known accessibility gap and proceed to Step 5 with the concern logged
+6. Skill does NOT silently proceed past the BLOCKING concern
+7. If user chooses (a): level-designer and art-director revision spawned; re-run Step 4 accessibility check
+8. Final report includes BLOCKING concern and its resolution status regardless of user choice
+
+**Assertions:**
+- [ ] BLOCKING accessibility concern is not treated as advisory — it is surfaced as a blocker
+- [ ] `AskUserQuestion` presents the specific concern text (not just "accessibility issue found")
+- [ ] Step 5 (qa-tester) does NOT begin without user acknowledging the BLOCKING concern
+- [ ] Revision path offered: level-designer + art-director can be sent back before proceeding
+- [ ] Final report includes the accessibility concern and its resolution status
+- [ ] art-director's completed output is NOT discarded when accessibility-specialist blocks
+
+---
+
+### Case 5: Circular Level Reference — Adjacent area dependency flagged
+
+**Fixture:**
+- Steps 1–3 in progress
+- level-designer (Step 2) produces a layout that specifies entry/exit points connecting to "the crystal caves" (an adjacent area)
+- `design/levels/crystal-caves.md` does NOT exist — the crystal caves area has not been designed yet
+
+**Input:** `/team-level forest dungeon`
+
+**Expected behavior:**
+1. Step 2 — level-designer produces layout including: "West exit connects to crystal-caves entry point A"
+2. Orchestrator (or level-designer subagent) checks `design/levels/` for `crystal-caves.md`; file not found
+3. Dependency gap surfaced: "Level references crystal-caves as an adjacent area but `design/levels/crystal-caves.md` does not exist"
+4. `AskUserQuestion` presented with options:
+ - (a) Proceed with a placeholder reference — note the dependency in the level doc as UNRESOLVED
+ - (b) Pause and run `/team-level crystal caves` first to establish that area
+5. Skill does NOT invent crystal caves content to satisfy the reference
+6. If user chooses (a): level doc compiled with the west exit marked "→ crystal-caves (UNRESOLVED — area not yet designed)"; flagged in the open dependencies section of the summary report
+7. Final report includes open cross-level dependencies section
+
+**Assertions:**
+- [ ] Skill detects the missing adjacent area by checking `design/levels/` — does not assume it will be created later
+- [ ] Skill does NOT fabricate crystal caves content (lore, layout, connections) to resolve the reference
+- [ ] `AskUserQuestion` offers a "design crystal caves first" option referencing `/team-level`
+- [ ] If user proceeds with placeholder, level doc explicitly marks the west exit as UNRESOLVED
+- [ ] Summary report includes an open cross-level dependencies section listing unresolved references
+- [ ] Circular or forward references do not cause the skill to loop or crash
+
+---
+
+## Protocol Compliance
+
+- [ ] `AskUserQuestion` used at each step transition — user approves before pipeline advances
+- [ ] All file writes delegated to sub-agents via Task — orchestrator does not call Write or Edit directly
+- [ ] Error Recovery Protocol followed: surface → assess → offer options → partial report
+- [ ] Step 4 agents (art-director, accessibility-specialist) launched in parallel per skill spec
+- [ ] Partial report always produced even when agents are BLOCKED
+- [ ] Accessibility BLOCKING concerns surface before sign-off and require explicit user acknowledgment
+- [ ] Verdict is one of COMPLETE / BLOCKED
+- [ ] Next steps present at end: `/design-review`, `/dev-story`, `/qa-plan`
+
+---
+
+## Coverage Notes
+
+- narrative-director and world-builder in Step 1 may be sequential or parallel — the skill spec
+ spawns both but does not mandate simultaneous launch; coverage of parallel Step 1 would require
+ an explicit timing assertion fixture.
+- The "Retry with narrower scope" option offered in the blocked world-builder case (Case 2)
+  is not tested in depth; the retry path is analogous to the blocked-agent pattern covered
+  in other team-* specs.
+- systems-designer (Step 3) block scenarios are not separately tested; the same Error Recovery
+ Protocol applies and the pattern is validated by Case 2.
+- Step 4 parallel ordering (art-director completing before or after accessibility-specialist)
+ does not affect outcomes — both must return before Step 5 regardless of order.
+- The level doc slug convention (argument → filename) is implicitly tested by Case 1
+ (`forest dungeon` → `forest-dungeon.md`); multi-word slugification edge cases (special
+ characters, very long names) are not covered.
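+
+A minimal sketch of the slug rule Case 1 implies (an assumption — the skill
+spec only fixes the single example `forest dungeon` → `forest-dungeon.md`):
+
+```
+slugify(argument):
+  lowercase the argument
+  replace each run of non-alphanumeric characters with a single hyphen
+  trim leading and trailing hyphens
+# "forest dungeon" → "forest-dungeon" → design/levels/forest-dungeon.md
+```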
diff --git a/CCGS Skill Testing Framework/skills/team/team-live-ops.md b/CCGS Skill Testing Framework/skills/team/team-live-ops.md
new file mode 100644
index 0000000..9463e15
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/team/team-live-ops.md
@@ -0,0 +1,178 @@
+# Skill Test Spec: /team-live-ops
+
+## Skill Summary
+
+Orchestrates the live-ops team through a seven-phase planning pipeline to produce a
+season or event plan. Coordinates live-ops-designer, economy-designer,
+analytics-engineer, community-manager, narrative-director, and writer. Phases 3
+and 4 (economy design and analytics) run simultaneously. Ends with a consolidated
+season plan requiring user approval before handoff to production.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: COMPLETE, BLOCKED
+- [ ] Contains "May I write" language in the File Write Protocol section (delegated to sub-agents)
+- [ ] Has a File Write Protocol section stating that the orchestrator does not write files directly
+- [ ] Has a next-step handoff at the end referencing `/design-review`, `/sprint-plan`, and `/team-release`
+- [ ] Uses `AskUserQuestion` at phase transitions to capture user approval before proceeding
+- [ ] States explicitly that Phases 3 and 4 can run simultaneously (parallel spawning)
+- [ ] Error recovery section present (or implied through BLOCKED handling)
+- [ ] Output documents section specifies paths under `design/live-ops/seasons/`
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — All 7 phases complete, season plan produced
+
+**Fixture:**
+- `design/live-ops/economy-rules.md` exists with current economy configuration
+- `design/live-ops/ethics-policy.md` exists with the project ethics policy
+- Game concept document exists at its standard path
+- No existing season documents for the new season name being planned
+
+**Input:** `/team-live-ops "Season 2: The Frozen Wastes"`
+
+**Expected behavior:**
+1. Phase 1: Spawns `live-ops-designer` via Task; receives season brief with scope, content list, and retention mechanic; presents to user
+2. AskUserQuestion: user approves Phase 1 output before Phase 2 begins
+3. Phase 2: Spawns `narrative-director` via Task; reads the Phase 1 season brief; produces narrative framing document (theme, story hook, lore connections); presents to user
+4. Phase 3 and 4 (parallel): Spawns `economy-designer` and `analytics-engineer` simultaneously via two Task calls before waiting for either result; economy-designer reads `design/live-ops/economy-rules.md`
+5. Phase 5: Spawns `narrative-director` and `writer` in parallel to produce in-game narrative text and player-facing copy; both read Phase 2 narrative framing doc
+6. Phase 6: Spawns `community-manager` via Task; reads season brief, economy design, and narrative framing; produces communication calendar with draft copy
+7. Phase 7: Collects all phase outputs; presents consolidated season plan summary including economy health check, analytics readiness, ethics review, and open questions
+8. AskUserQuestion: user approves the full season plan
+9. Sub-agents ask "May I write to [path]?" for each of `design/live-ops/seasons/S2_The_Frozen_Wastes.md`, `...analytics.md`, and `...comms.md` before writing
+10. Verdict: COMPLETE — season plan produced and handed off for production
+
+**Assertions:**
+- [ ] All 7 phases execute in order; Phase 3 and 4 are issued as parallel Task calls
+- [ ] Phase 7 consolidated summary includes all six sections (season brief, narrative framing, economy design, analytics plan, content inventory, communication calendar)
+- [ ] Ethics review section in Phase 7 explicitly references `design/live-ops/ethics-policy.md`
+- [ ] Three output documents written to `design/live-ops/seasons/` with correct naming convention
+- [ ] File writes are delegated to sub-agents — orchestrator does not write directly
+- [ ] Verdict: COMPLETE appears in final output
+- [ ] Next steps reference `/design-review`, `/sprint-plan`, and `/team-release`
+
+---
+
+### Case 2: Ethics Violation Found — Reward element violates ethics policy
+
+**Fixture:**
+- All standard live-ops fixtures present (economy-rules.md, ethics-policy.md)
+- `design/live-ops/ethics-policy.md` explicitly prohibits loot boxes targeting players under 18
+- economy-designer (Phase 3) proposes a "Mystery Chest" mechanic with randomized premium rewards and no pity timer
+
+**Input:** `/team-live-ops "Season 3: Shadow Tournament"`
+
+**Expected behavior:**
+1. Phases 1–4 proceed normally; economy-designer proposes Mystery Chest mechanic
+2. Phase 7: Orchestrator reviews Phase 3 output against the ethics policy; identifies the Mystery Chest as violating its rule against opaque random premium rewards
+3. Ethics review section of the Phase 7 summary flags the violation explicitly: "ETHICS FLAG: Mystery Chest mechanic in Phase 3 economy design violates [policy rule]. Approval is blocked until this is resolved."
+4. AskUserQuestion presented with resolution options before season plan approval is offered
+5. Skill does NOT issue a COMPLETE verdict or write output documents until the ethics violation is resolved or explicitly waived by the user
+
+**Assertions:**
+- [ ] Phase 7 ethics review section explicitly names the violating element and the policy rule it breaks
+- [ ] Skill does not auto-approve the season plan when an ethics violation is present
+- [ ] AskUserQuestion is used to surface the violation and offer resolution options (revise economy design, override with documented rationale, cancel)
+- [ ] Output documents are NOT written while the violation is unresolved
+- [ ] If user chooses to revise: skill re-spawns economy-designer to produce a corrected design before returning to Phase 7 review
+- [ ] Verdict: COMPLETE is only issued after the ethics flag is cleared
+
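+The Phase 7 ethics gate these assertions describe, sketched (flag wording is
+illustrative; the rules themselves come from `design/live-ops/ethics-policy.md`):
+
+```
+# Phase 7, before offering season plan approval:
+Read design/live-ops/ethics-policy.md
+For each mechanic in the Phase 3 economy design:
+  If it violates a policy rule:
+    Flag: "ETHICS FLAG: [mechanic] violates [rule]. Approval is blocked."
+If any flag is open:
+  AskUserQuestion: revise economy design / override with rationale / cancel
+  Do NOT write output documents or issue COMPLETE until resolved or waived
+```
+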
+---
+
+### Case 3: No Argument — Usage guidance shown
+
+**Fixture:**
+- Any project state
+
+**Input:** `/team-live-ops` (no argument)
+
+**Expected behavior:**
+1. Phase 1: No argument detected
+2. Outputs: "Usage: `/team-live-ops [season name or event description]` — Provide the name or description of the season or live event to plan."
+3. Skill exits immediately without spawning any subagents
+
+**Assertions:**
+- [ ] Skill does NOT guess a season name or fabricate a scope
+- [ ] Error message includes the correct usage format with the argument-hint
+- [ ] No Task calls are issued before the argument check fails
+- [ ] No files are read or written
+
+---
+
+### Case 4: Parallel Phase Validation — Phases 3 and 4 run simultaneously
+
+**Fixture:**
+- All standard live-ops fixtures present
+- Phase 1 (season brief) and Phase 2 (narrative framing) already approved
+- Phase 3 (economy-designer) and Phase 4 (analytics-engineer) inputs are independent of each other
+
+**Input:** `/team-live-ops "Season 1: The First Thaw"` (observed at Phase 3/4 transition)
+
+**Expected behavior:**
+1. After Phase 2 is approved by the user, the orchestrator issues both Task calls (economy-designer and analytics-engineer) before awaiting either result
+2. Both agents receive the season brief as context; analytics-engineer does NOT wait for economy-designer output to begin
+3. Economy-designer output and analytics-engineer output are collected together before Phase 5 begins
+4. If one of the two parallel agents blocks, the other continues; a partial result is reported
+
+**Assertions:**
+- [ ] Both Task calls for Phase 3 and Phase 4 are issued before either result is awaited — they are not sequential
+- [ ] Analytics-engineer prompt does NOT include economy-designer output as a required input (the inputs are independent)
+- [ ] If economy-designer blocks but analytics-engineer succeeds, analytics output is preserved and the block is surfaced via AskUserQuestion
+- [ ] Phase 5 does not begin until BOTH Phase 3 and Phase 4 results are collected
+- [ ] Skill documentation explicitly states "Phases 3 and 4 can run simultaneously"
+
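+The parallel pattern under test, in the invocation-pattern style used by these
+docs (context fields are illustrative):
+
+```
+# Phase 3/4 — issue both Task calls before awaiting either result:
+Spawn `economy-designer` via Task:
+- Context: season brief (Phase 1), design/live-ops/economy-rules.md
+Spawn `analytics-engineer` via Task:
+- Context: season brief (Phase 1) — NOT economy-designer output
+
+Await both. If one returns BLOCKED, preserve the other's result and
+surface the block via AskUserQuestion (skip / retry / stop).
+```
+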
+---
+
+### Case 5: Missing Ethics Policy — `design/live-ops/ethics-policy.md` does not exist
+
+**Fixture:**
+- `design/live-ops/economy-rules.md` exists
+- `design/live-ops/ethics-policy.md` does NOT exist
+- All other fixtures are present
+
+**Input:** `/team-live-ops "Season 4: Desert Heat"`
+
+**Expected behavior:**
+1. Phases 1–4 proceed; economy-designer and analytics-engineer are given the ethics policy path, but the file does not exist
+2. Phase 7: Orchestrator attempts to run ethics review; detects that `design/live-ops/ethics-policy.md` is missing
+3. Phase 7 summary includes a gap flag: "ETHICS REVIEW SKIPPED: `design/live-ops/ethics-policy.md` not found. Economy design was not reviewed against an ethics policy. Recommend creating one before production begins."
+4. Skill still completes the season plan and reaches COMPLETE verdict, but the gap is prominently flagged in the output and in the season design document
+5. Next steps include a recommendation to create the ethics policy document
+
+**Assertions:**
+- [ ] Skill does NOT error out when the ethics policy file is missing
+- [ ] Skill does NOT fabricate ethics policy rules in the absence of the file
+- [ ] Phase 7 summary explicitly notes that ethics review was skipped and why
+- [ ] Verdict: COMPLETE is still reachable despite the missing file
+- [ ] Gap flag appears in the season design output document (not just in conversation)
+- [ ] Next steps recommend creating `design/live-ops/ethics-policy.md`
+
+---
+
+## Protocol Compliance
+
+- [ ] `AskUserQuestion` used at every phase transition — user approves before the next phase begins
+- [ ] Phases 3 and 4 are always spawned in parallel, not sequentially
+- [ ] File Write Protocol: orchestrator never calls Write/Edit directly — all writes are delegated to sub-agents
+- [ ] Each output document gets its own "May I write to [path]?" ask from the relevant sub-agent
+- [ ] Ethics review in Phase 7 always references the ethics policy file path explicitly
+- [ ] Error recovery: any BLOCKED agent is surfaced immediately with AskUserQuestion options (skip / retry / stop)
+- [ ] Partial reports are produced if any phase blocks — work is never discarded
+- [ ] Verdict: COMPLETE only after user approves the consolidated season plan; BLOCKED if any unresolved ethics violation exists
+- [ ] Next steps always include `/design-review`, `/sprint-plan`, and `/team-release`
+
+---
+
+## Coverage Notes
+
+- Phase 5 parallel spawning (narrative-director + writer) follows the same pattern as Phases 3/4 but is not separately tested here — it uses the same parallel Task protocol validated in Case 4.
+- The "economy-rules.md absent" edge case is not separately tested — it would surface as a BLOCKED result from economy-designer and follow the standard error recovery path tested implicitly in Case 4.
+- The full content writing pipeline (Phase 5 output validation) is validated implicitly by the Case 1 happy path consolidated summary check.
+- Community manager communication calendar format (pre-launch, launch day, mid-season, final week) is validated implicitly by Case 1; no separate edge case is needed.
diff --git a/CCGS Skill Testing Framework/skills/team/team-narrative.md b/CCGS Skill Testing Framework/skills/team/team-narrative.md
new file mode 100644
index 0000000..1892785
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/team/team-narrative.md
@@ -0,0 +1,209 @@
+# Skill Test Spec: /team-narrative
+
+## Skill Summary
+
+Orchestrates the narrative team through a five-phase pipeline: narrative direction
+(narrative-director) → world foundation + dialogue drafting (world-builder and writer
+in parallel) → level narrative integration (level-designer) → consistency review
+(narrative-director) → polish + localization compliance (writer, localization-lead,
+and world-builder in parallel). Uses `AskUserQuestion` at each phase transition to
+present proposals as selectable options. Produces a narrative summary report and
+delivers narrative documents via subagents that each enforce the "May I write?"
+protocol. Verdict is COMPLETE when all phases succeed, or BLOCKED when a dependency
+is unresolved.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: COMPLETE, BLOCKED
+- [ ] Contains "File Write Protocol" section
+- [ ] File writes are delegated to sub-agents — orchestrator does not write files directly
+- [ ] Sub-agents enforce "May I write to [path]?" before any write
+- [ ] Has a next-step handoff at the end (references `/design-review`, `/localize extract`, `/dev-story`)
+- [ ] Error Recovery Protocol section is present
+- [ ] `AskUserQuestion` is used at phase transitions before proceeding
+- [ ] Phase 2 explicitly spawns world-builder and writer in parallel
+- [ ] Phase 5 explicitly spawns writer, localization-lead, and world-builder in parallel
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — All five phases complete, narrative doc delivered
+
+**Fixture:**
+- A game concept and GDD exist for the target feature (e.g., `design/gdd/faction-intro.md`)
+- Character voice profiles exist (e.g., `design/narrative/characters/`)
+- Existing lore entries exist for cross-reference (e.g., `design/narrative/lore/`)
+- No lore contradictions exist between existing entries and the new content
+
+**Input:** `/team-narrative faction introduction cutscene for the Ironveil faction`
+
+**Expected behavior:**
+1. Phase 1: narrative-director is spawned; outputs a narrative brief defining the story beat, characters involved, emotional tone, and lore dependencies
+2. `AskUserQuestion` presents the narrative brief; user approves before Phase 2 begins
+3. Phase 2: world-builder and writer are spawned in parallel; world-builder produces lore entries for the Ironveil faction; writer drafts dialogue lines using character voice profiles
+4. `AskUserQuestion` presents world foundation and dialogue drafts; user approves before Phase 3 begins
+5. Phase 3: level-designer is spawned; produces environmental storytelling layout, trigger placement, and pacing plan
+6. `AskUserQuestion` presents level narrative plan; user approves before Phase 4 begins
+7. Phase 4: narrative-director reviews all dialogue against voice profiles, verifies lore consistency, confirms pacing; approves or flags issues
+8. `AskUserQuestion` presents review results; user approves before Phase 5 begins
+9. Phase 5: writer, localization-lead, and world-builder are spawned in parallel; writer performs final self-review; localization-lead validates i18n compliance; world-builder finalizes canon levels
+10. Final summary report is presented; subagent asks "May I write the narrative document to [path]?" before writing
+11. Verdict: COMPLETE
+
+**Assertions:**
+- [ ] narrative-director is spawned in Phase 1 before any other agents
+- [ ] `AskUserQuestion` appears after Phase 1 output and before Phase 2 launch
+- [ ] world-builder and writer Task calls are issued simultaneously in Phase 2 (not sequentially)
+- [ ] level-designer is not launched until Phase 2 `AskUserQuestion` is approved
+- [ ] narrative-director is re-spawned in Phase 4 for consistency review
+- [ ] Phase 5 spawns all three agents (writer, localization-lead, world-builder) simultaneously
+- [ ] Summary report includes: narrative brief status, lore entries created/updated, dialogue lines written, level narrative integration points, consistency review results
+- [ ] No files are written by the orchestrator directly
+- [ ] Verdict is COMPLETE after delivery
+
+---
+
+### Case 2: Lore Contradiction Found — world-builder finds conflict before writer proceeds
+
+**Fixture:**
+- Existing lore entry at `design/narrative/lore/ironveil-history.md` states the Ironveil faction was founded 200 years ago
+- The new narrative brief (from Phase 1) states the Ironveil were founded 50 years ago
+- The writer has been spawned in parallel with the world-builder in Phase 2
+
+**Input:** `/team-narrative ironveil faction introduction cutscene`
+
+**Expected behavior:**
+1. Phases 1–2 begin normally
+2. Phase 2 world-builder detects a factual contradiction between the narrative brief and existing lore: founding date conflict
+3. world-builder returns BLOCKED with reason: "Lore contradiction found — founding date conflicts with `design/narrative/lore/ironveil-history.md`"
+4. Orchestrator surfaces the contradiction immediately: "world-builder: BLOCKED — Lore contradiction: founding date in narrative brief (50 years ago) conflicts with existing canon (200 years ago in `ironveil-history.md`)"
+5. Orchestrator assesses dependency: the writer's dialogue depends on canon lore — the writer's draft cannot be finalized without resolving the contradiction
+6. `AskUserQuestion` presents options:
+ - Revise the narrative brief to match existing canon (200 years ago)
+ - Update the existing lore entry to reflect the new canon (50 years ago)
+ - Stop here and resolve the contradiction in the lore docs first
+7. Writer output is preserved but flagged as pending canon resolution — work is not discarded
+8. Orchestrator does NOT proceed to Phase 3 until the contradiction is resolved or user explicitly chooses to skip
+
+**Assertions:**
+- [ ] Contradiction is surfaced before Phase 3 begins
+- [ ] Orchestrator does not silently resolve the contradiction by picking one version
+- [ ] `AskUserQuestion` presents at least 3 options including "stop and resolve first"
+- [ ] Writer's draft output is preserved in the partial report, not discarded
+- [ ] Phase 3 (level-designer) is not launched until the user resolves the contradiction
+- [ ] Verdict is BLOCKED (not COMPLETE) if the user stops to resolve the contradiction
+
+---
+
+### Case 3: No Argument — Usage guidance shown
+
+**Fixture:**
+- Any project state
+
+**Input:** `/team-narrative` (no argument)
+
+**Expected behavior:**
+1. Skill detects no argument is provided
+2. Outputs usage guidance: e.g., "Usage: `/team-narrative [narrative content description]` — describe the story content, scene, or narrative area to work on (e.g., `boss encounter cutscene`, `faction intro dialogue`, `tutorial narrative`)"
+3. Skill exits without spawning any agents
+
+**Assertions:**
+- [ ] Skill does NOT spawn any agents when no argument is provided
+- [ ] Usage message includes the correct invocation format with an argument example
+- [ ] Skill does NOT attempt to guess or infer a narrative topic from project files
+- [ ] No `AskUserQuestion` is used — output is direct guidance
+
+---
+
+### Case 4: Localization Compliance — localization-lead flags a non-translatable string
+
+**Fixture:**
+- Phases 1–4 complete successfully
+- Phase 5 begins; writer and world-builder complete without issues
+- localization-lead finds a dialogue line that uses a hardcoded formatted date string (e.g., `"On March 12th, Year 3"`) that cannot survive locale-specific translation without a locale-aware formatter
+
+**Input:** `/team-narrative ironveil faction introduction cutscene` (Phase 5 scenario)
+
+**Expected behavior:**
+1. Phase 5 spawns writer, localization-lead, and world-builder in parallel
+2. localization-lead completes its review and flags: "String key `dialogue.ironveil.intro.003` contains a hardcoded date format (`March 12th, Year 3`) that will not localize correctly — requires a locale-aware date placeholder"
+3. Orchestrator surfaces the localization blocker in the summary report
+4. The localization issue is labeled as BLOCKING in the final report (not advisory)
+5. `AskUserQuestion` presents options:
+ - Fix the string now (writer revises the line)
+ - Note the gap and deliver the narrative doc with the issue flagged
+ - Stop and resolve before finalizing
+6. If the user chooses to proceed with the issue flagged, verdict is COMPLETE with noted localization debt; if user stops, verdict is BLOCKED
+
+**Assertions:**
+- [ ] localization-lead is spawned in Phase 5 simultaneously with writer and world-builder
+- [ ] Hardcoded date format is identified as a localization blocker (not silently passed)
+- [ ] The specific string key and reason are included in the issue report
+- [ ] `AskUserQuestion` offers the option to fix now vs. flag and proceed
+- [ ] Verdict notes the localization debt if the user proceeds without fixing
+- [ ] Skill does NOT automatically rewrite the offending line without user approval
+
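+A sketch of the checks localization-lead applies here, drawn from this case and
+the Phase 5 notes (not an exhaustive rule list):
+
+```
+For each dialogue entry (string key → text):
+  flag hardcoded locale-specific formats (dates, numbers) that need a
+    locale-aware placeholder        # e.g. "On March 12th, Year 3"
+  flag raw strings used where a string key is expected
+  flag lines over the length budget (120 characters per Phase 5 checks)
+Each flag names: string key, offending text, reason, and severity —
+BLOCKING (breaks localization) vs advisory (e.g. text-expansion warnings)
+```
+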
+---
+
+### Case 5: Writer Blocked — Missing character voice profiles
+
+**Fixture:**
+- Phase 1 narrative-director produces a narrative brief referencing two characters: Commander Varek and Advisor Selene
+- No character voice profiles exist in `design/narrative/characters/` for either character
+- Phase 2 begins; world-builder proceeds normally
+
+**Input:** `/team-narrative ironveil surrender negotiation scene`
+
+**Expected behavior:**
+1. Phase 1 completes; narrative brief lists Commander Varek and Advisor Selene as characters
+2. Phase 2: writer is spawned in parallel with world-builder
+3. writer returns BLOCKED: "Cannot produce dialogue — no voice profiles found for Commander Varek or Advisor Selene in `design/narrative/characters/`. Voice profiles required to match character tone and speech patterns."
+4. Orchestrator surfaces the blocker immediately: "writer: BLOCKED — Missing prerequisite: character voice profiles for Commander Varek and Advisor Selene"
+5. world-builder output is preserved; partial report is produced with lore entries
+6. `AskUserQuestion` presents options:
+ - Create voice profiles first (redirects to the narrative-director or design workflow)
+ - Provide minimal voice direction inline and retry the writer with that context
+ - Stop here and create voice profiles before proceeding
+7. Orchestrator does NOT proceed to Phase 3 (level-designer) without writer output
+
+**Assertions:**
+- [ ] Writer block is surfaced before Phase 3 begins
+- [ ] world-builder's completed lore output is preserved in the partial report
+- [ ] Missing prerequisite (voice profiles) is named specifically (character names and expected file path)
+- [ ] `AskUserQuestion` offers at least one option to resolve the missing prerequisite
+- [ ] Orchestrator does not fabricate voice profiles or invent character voices
+- [ ] Phase 3 is not launched while writer is BLOCKED without explicit user authorization
+
+---
+
+## Protocol Compliance
+
+- [ ] `AskUserQuestion` is used after every phase output before the next phase launches
+- [ ] Parallel spawning: Phase 2 (world-builder + writer) and Phase 5 (writer + localization-lead + world-builder) issue all Task calls before waiting for results
+- [ ] No files are written by the orchestrator directly — all writes are delegated to sub-agents
+- [ ] Each sub-agent enforces the "May I write to [path]?" protocol before any write
+- [ ] BLOCKED status from any agent is surfaced immediately — not silently skipped
+- [ ] A partial report is always produced when some agents complete and others block
+- [ ] Verdict is exactly COMPLETE or BLOCKED — no other verdict values used
+- [ ] Next Steps handoff references `/design-review`, `/localize extract`, and `/dev-story`
+
+---
+
+## Coverage Notes
+
+- Phase 3 (level-designer) and Phase 4 (narrative-director review) happy-path behavior are
+ validated implicitly by Case 1. Separate edge cases are not needed for these phases as
+ their failure modes follow the standard Error Recovery Protocol.
+- The "Retry with narrower scope" and "Skip this agent" resolution paths from the Error
+ Recovery Protocol are not separately tested — they follow the same `AskUserQuestion`
+ + partial-report pattern validated in Cases 2 and 5.
+- Localization concerns that are advisory (e.g., German/Finnish +30% expansion warnings)
+ vs. blocking (hardcoded formats) are distinguished in Case 4; advisory-only scenarios
+ follow the same pattern but do not change the verdict.
+- The writer's "all lines under 120 characters" and "string keys not raw strings" checks
+ in Phase 5 are covered implicitly by Case 4's localization compliance scenario.
diff --git a/CCGS Skill Testing Framework/skills/team/team-polish.md b/CCGS Skill Testing Framework/skills/team/team-polish.md
new file mode 100644
index 0000000..6b38d4b
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/team/team-polish.md
@@ -0,0 +1,218 @@
+# Skill Test Spec: /team-polish
+
+## Skill Summary
+
+Orchestrates the polish team through a six-phase pipeline: performance assessment
+(performance-analyst) → optimization (performance-analyst, optionally with
+engine-programmer when engine-level root causes are found) → visual polish
+(technical-artist, parallel with Phase 2) → audio polish (sound-designer, parallel
+with Phase 2) → hardening (qa-tester) → sign-off (orchestrator collects all results
+and issues READY FOR RELEASE or NEEDS MORE WORK). Uses `AskUserQuestion` at each
+phase transition. Engine-programmer is spawned only when Phase 1 identifies
+engine-level root causes.
+
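+A sketch of the phase flow these tests assume (illustrative shape only; the
+skill file itself is authoritative):
+
+```
+Phase 1  performance-analyst assessment        → AskUserQuestion
+Phase 2  performance-analyst optimization   \
+         (+ engine-programmer only when      |  launched in parallel
+          engine-level root causes found)    |
+Phase 3  technical-artist visual polish      |
+Phase 4  sound-designer audio polish        /   → AskUserQuestion
+Phase 5  qa-tester hardening                    → AskUserQuestion
+Phase 6  orchestrator sign-off → READY FOR RELEASE or NEEDS MORE WORK
+```
+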
+---
+
+## Static Assertions (Structural)
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: READY FOR RELEASE, NEEDS MORE WORK
+- [ ] Contains "File Write Protocol" section
+- [ ] File writes are delegated to sub-agents — orchestrator does not write files directly
+- [ ] Sub-agents enforce "May I write to [path]?" before any write
+- [ ] Has a next-step handoff at the end (references `/release-checklist`, `/sprint-plan update`, `/gate-check`)
+- [ ] Error Recovery Protocol section is present
+- [ ] `AskUserQuestion` is used at phase transitions before proceeding
+- [ ] Phase 3 (visual polish) and Phase 4 (audio polish) are explicitly run in parallel with Phase 2
+- [ ] engine-programmer is conditionally spawned in Phase 2 only when Phase 1 identifies engine-level root causes
+- [ ] Phase 6 sign-off compares metrics against budgets before issuing verdict
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Full pipeline completes, READY FOR RELEASE verdict
+
+**Fixture:**
+- Feature exists and is functionally complete (e.g., `combat` system)
+- Performance budgets are defined in technical-preferences.md (e.g., target 60fps, 16ms frame budget)
+- No frame budget violations exist before polishing begins
+- No audio events are missing; VFX assets are complete
+- No regressions are introduced by polish changes
+
+**Input:** `/team-polish combat`
+
+**Expected behavior:**
+1. Phase 1: performance-analyst is spawned; profiles the combat system, measures frame budget, checks memory usage; output: performance report showing all metrics within budget, no violations
+2. `AskUserQuestion` presents the performance report; user approves before Phases 2, 3, and 4 begin
+3. Phase 2: performance-analyst applies minor optimizations (e.g., draw call batching); no engine-programmer needed (no engine-level root causes identified)
+4. Phases 3 and 4 are launched in parallel alongside Phase 2:
+ - Phase 3: technical-artist reviews VFX for quality, optimizes particle systems, adds screen shake and visual juice
+ - Phase 4: sound-designer reviews audio events for completeness, checks mix levels, adds ambient audio layers
+5. All three parallel phases complete; `AskUserQuestion` presents results; user approves before Phase 5 begins
+6. Phase 5: qa-tester runs edge case tests, soak tests, stress tests, and regression tests; all pass
+7. `AskUserQuestion` presents test results; user approves before Phase 6
+8. Phase 6: orchestrator collects all results; compares before/after performance metrics against budgets; all metrics pass
+9. Sub-agent asks "May I write the polish report to `production/qa/evidence/polish-combat-[date].md`?" before writing
+10. Verdict: READY FOR RELEASE
+
+**Assertions:**
+- [ ] performance-analyst is spawned first in Phase 1 before any other agents
+- [ ] `AskUserQuestion` appears after Phase 1 output and before Phases 2/3/4 launch
+- [ ] Phases 3 and 4 Task calls are issued at the same time as Phase 2 (not after Phase 2 completes)
+- [ ] engine-programmer is NOT spawned when Phase 1 finds no engine-level root causes
+- [ ] qa-tester (Phase 5) is not launched until the parallel phases complete and user approves
+- [ ] Phase 6 verdict is based on comparison of metrics against defined budgets
+- [ ] Summary report includes: before/after performance metrics, visual polish changes, audio polish changes, test results
+- [ ] No files are written by the orchestrator directly
+- [ ] Verdict is READY FOR RELEASE
+
+---
+
+### Case 2: Performance Blocker — Frame budget violation cannot be fully resolved
+
+**Fixture:**
+- Feature being polished: `particle-storm` VFX system
+- Phase 1 identifies a frame budget violation: particle-storm costs 12ms on target hardware (budget is 6ms for this system)
+- Phase 2 performance-analyst applies optimizations reducing cost to 9ms — still over the 6ms budget
+- Phase 2 cannot fully resolve the violation without a fundamental design change
+
+**Input:** `/team-polish particle-storm`
+
+**Expected behavior:**
+1. Phase 1: performance-analyst identifies the 12ms frame cost vs. 6ms budget; reports "FRAME BUDGET VIOLATION: particle-storm costs 12ms, budget is 6ms"
+2. `AskUserQuestion` presents the violation; user chooses to proceed with optimization attempt
+3. Phase 2: performance-analyst applies optimizations; achieves 9ms — reduced but still over budget; reports "Optimization reduced cost to 9ms (was 12ms) — 3ms over budget. No further gains achievable without design changes."
+4. Phases 3 and 4 run in parallel with Phase 2 (visual and audio polish)
+5. Phase 5: qa-tester runs regression and edge case tests; all pass
+6. Phase 6: orchestrator collects results; frame budget violation (9ms vs 6ms budget) remains unresolved
+7. Verdict: NEEDS MORE WORK
+8. Report lists the specific unresolved issue: "particle-storm frame cost (9ms) exceeds budget (6ms) by 3ms — requires design scope reduction or budget renegotiation"
+9. Next Steps: schedule the remaining issue in `/sprint-plan update`; re-run `/team-polish` after fix
+
+**Assertions:**
+- [ ] Frame budget violation is flagged in Phase 1 with specific numbers (actual vs. budget)
+- [ ] Phase 2 reports the post-optimization metric explicitly (9ms achieved, 3ms still over)
+- [ ] Verdict is NEEDS MORE WORK (not READY FOR RELEASE) when a budget violation remains
+- [ ] The specific unresolved issue is listed by name with the remaining gap quantified
+- [ ] Next Steps references `/sprint-plan update` for scheduling the remaining fix
+- [ ] Phases 3 and 4 still run (polish work is not abandoned due to a Phase 2 partial resolution)
+- [ ] Phase 5 qa-tester still runs (regression testing is independent of the performance outcome)
+
+---
+
+### Case 3: No Argument — Usage guidance shown
+
+**Fixture:**
+- Any project state
+
+**Input:** `/team-polish` (no argument)
+
+**Expected behavior:**
+1. Skill detects no argument is provided
+2. Outputs usage guidance: e.g., "Usage: `/team-polish [feature or area]` — specify the feature or area to polish (e.g., `combat`, `main menu`, `inventory system`, `level-1`)"
+3. Skill exits without spawning any agents
+
+**Assertions:**
+- [ ] Skill does NOT spawn any agents when no argument is provided
+- [ ] Usage message includes the correct invocation format with argument examples
+- [ ] Skill does NOT attempt to guess a feature from project files
+- [ ] No `AskUserQuestion` is used — output is direct guidance
+
+---
+
+### Case 4: Engine-Level Bottleneck — engine-programmer spawned conditionally in Phase 2
+
+**Fixture:**
+- Feature being polished: `open-world` environment streaming
+- Phase 1 identifies a performance bottleneck with a root cause in the rendering pipeline: "draw call overhead is caused by the engine's scene tree traversal in the spatial indexer — this is an engine-level issue, not a game code issue"
+- Performance budgets are defined; the rendering overhead exceeds target frame budget
+
+**Input:** `/team-polish open-world`
+
+**Expected behavior:**
+1. Phase 1: performance-analyst profiles the environment; identifies frame budget violation; root cause analysis points to engine-level rendering pipeline (spatial indexer traversal overhead)
+2. Phase 1 output explicitly classifies the root cause as engine-level
+3. `AskUserQuestion` presents the performance report including the engine-level root cause; user approves before Phase 2
+4. Phase 2: performance-analyst is spawned for game-code-level optimizations AND engine-programmer is spawned in parallel for the engine-level rendering fix
+5. Phases 3 and 4 also run in parallel with Phase 2 (visual and audio polish)
+6. engine-programmer addresses the spatial indexer traversal; provides profiler validation showing the fix reduces overhead
+7. Phase 5: qa-tester runs regression tests including tests for the engine-level fix
+8. Phase 6: orchestrator collects all results; if metrics are now within budget, verdict is READY FOR RELEASE; if not, NEEDS MORE WORK
+
+**Assertions:**
+- [ ] engine-programmer is NOT spawned in Phase 2 unless Phase 1 explicitly identifies an engine-level root cause
+- [ ] engine-programmer is spawned in Phase 2 when Phase 1 identifies an engine-level root cause
+- [ ] engine-programmer and performance-analyst Task calls in Phase 2 are issued simultaneously (not sequentially)
+- [ ] Phases 3 and 4 also run in parallel with Phase 2 (not deferred until Phase 2 completes)
+- [ ] engine-programmer's output includes profiler validation of the fix
+- [ ] qa-tester in Phase 5 runs regression tests that cover the engine-level change
+- [ ] Verdict correctly reflects whether all metrics including the engine fix now meet budgets
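+
+The conditional spawn checked by these assertions can be sketched as (a sketch,
+not the skill's literal wording):
+
+```
+# After the Phase 1 report is approved:
+tasks = [performance-analyst (game-code optimization),
+         technical-artist (visual polish),
+         sound-designer (audio polish)]
+if Phase 1 classified any root cause as engine-level:
+    tasks += engine-programmer (engine fix, with profiler validation)
+Issue all Task calls in tasks simultaneously
+```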
+
+---
+
+### Case 5: Regression Found — Polish change broke an existing feature
+
+**Fixture:**
+- Feature being polished: `inventory-ui`
+- Phases 1–4 complete successfully; performance and polish changes are applied
+- Phase 5: qa-tester runs regression tests and finds that a shader optimization applied in Phase 3 broke the item highlight glow effect on hover — an existing feature that was working before the polish pass
+
+**Input:** `/team-polish inventory-ui` (Phase 5 scenario)
+
+**Expected behavior:**
+1. Phases 1–4 complete; polish changes include a shader optimization from technical-artist
+2. Phase 5: qa-tester runs regression tests and detects "Item highlight glow on hover no longer renders — regression introduced by shader optimization in Phase 3"
+3. qa-tester returns test results with the regression noted
+4. Orchestrator surfaces the regression immediately: "qa-tester: REGRESSION FOUND — `item-highlight-hover` glow broken by Phase 3 shader optimization"
+5. Sub-agent asks "May I write the bug report to `production/qa/evidence/bug-polish-inventory-ui-[date].md`?" before writing
+6. Bug report is written after approval; it includes: the broken behavior, the polish change that caused it, reproduction steps, and severity
+7. `AskUserQuestion` presents the regression with options:
+ - Revert the shader optimization and find an alternative approach
+ - Fix the shader optimization to preserve the glow effect
+ - Accept the regression and schedule a fix in the next sprint
+8. Verdict: NEEDS MORE WORK (the regression stands regardless of the chosen resolution path, unless the fix is applied and re-verified within the current session)
+
+**Assertions:**
+- [ ] Regression is surfaced before Phase 6 sign-off
+- [ ] The specific broken behavior and the responsible change are both named in the report
+- [ ] Sub-agent asks "May I write the bug report to [path]?" before filing
+- [ ] Bug report includes: broken behavior, causal change, reproduction steps, severity
+- [ ] `AskUserQuestion` offers options including revert, fix in place, and schedule later
+- [ ] Verdict is NEEDS MORE WORK when a regression is present and unresolved
+- [ ] Verdict may become READY FOR RELEASE only if the regression is fixed within the current polish session and qa-tester re-runs to confirm
+
+---
+
+## Protocol Compliance
+
+- [ ] Phase 1 (assessment) must complete before any other phase begins
+- [ ] `AskUserQuestion` is used after every phase output before the next phase launches
+- [ ] Phases 3 and 4 are always launched in parallel with Phase 2 (not deferred)
+- [ ] engine-programmer is only spawned when Phase 1 explicitly identifies engine-level root causes
+- [ ] No files are written by the orchestrator directly — all writes are delegated to sub-agents
+- [ ] Each sub-agent enforces the "May I write to [path]?" protocol before any write
+- [ ] BLOCKED status from any agent is surfaced immediately — not silently skipped
+- [ ] A partial report is always produced when some agents complete and others block
+- [ ] Verdict is exactly READY FOR RELEASE or NEEDS MORE WORK — no other verdict values used
+- [ ] NEEDS MORE WORK verdict always lists specific remaining issues with severity
+- [ ] Next Steps handoff references `/release-checklist` (on success) and `/sprint-plan update` + `/gate-check` (on failure)
+
+---
+
+## Coverage Notes
+
+- The tools-programmer optional agent (for content pipeline tool verification) is not
+ separately tested — it follows the same conditional spawn pattern as engine-programmer
+ and is invoked only when content authoring tools are involved in the area being polished.
+- The "Retry with narrower scope" and "Skip this agent" resolution paths from the Error
+ Recovery Protocol are not separately tested — they follow the same `AskUserQuestion`
+ + partial-report pattern validated in Cases 2 and 5.
+- Phase 6 sign-off logic (collecting and comparing all metrics) is validated implicitly
+ by Cases 1 and 2. The distinction between READY FOR RELEASE and NEEDS MORE WORK is
+ exercised in both directions across these cases.
+- Soak testing and stress testing (Phase 5) are validated implicitly by Case 1's
+ qa-tester output. Case 5 focuses on the regression detection aspect of Phase 5.
+- The "minimum spec hardware" test path in Phase 5 is not separately tested — it follows
+ the same qa-tester delegation pattern when the hardware is available.
diff --git a/CCGS Skill Testing Framework/skills/team/team-qa.md b/CCGS Skill Testing Framework/skills/team/team-qa.md
new file mode 100644
index 0000000..84b9d38
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/team/team-qa.md
@@ -0,0 +1,204 @@
+# Skill Test Spec: /team-qa
+
+## Skill Summary
+
+Orchestrates the QA team through a seven-phase structured testing cycle. Coordinates
+qa-lead (strategy, test plan, sign-off report) and qa-tester (test case writing,
+bug report writing). Covers scope detection, story classification, QA plan
+generation, smoke check gate, test case writing, manual QA execution with bug
+filing, and a final sign-off report with an APPROVED / APPROVED WITH CONDITIONS /
+NOT APPROVED verdict. Parallel qa-tester spawning is used in Phase 5 for
+independent stories.
+
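+The seven phases in sequence (illustrative sketch):
+
+```
+1 Scope detection → 2 Strategy (qa-lead) → 3 QA plan write
+→ 4 Smoke check (qa-lead; hard gate: FAIL stops the cycle)
+→ 5 Test case writing (qa-tester, parallel per story)
+→ 6 Manual QA execution and bug filing
+→ 7 Sign-off report (qa-lead): APPROVED | APPROVED WITH CONDITIONS | NOT APPROVED
+```
+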
+---
+
+## Static Assertions (Structural)
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: COMPLETE, BLOCKED
+- [ ] Contains verdict keywords for sign-off report: APPROVED, APPROVED WITH CONDITIONS, NOT APPROVED
+- [ ] Contains "May I write" language for both the QA plan and the sign-off report
+- [ ] Has an Error Recovery Protocol section
+- [ ] Uses `AskUserQuestion` at phase transitions to capture user approval before proceeding
+- [ ] Phase 4 (smoke check) is a hard gate: FAIL stops the cycle
+- [ ] Bug reports are written to `production/qa/bugs/` with `BUG-[NNN]-[short-slug].md` naming
+- [ ] Next-step guidance differs by verdict (APPROVED / APPROVED WITH CONDITIONS / NOT APPROVED)
+- [ ] Independent qa-tester tasks in Phase 5 are spawned in parallel
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — All stories pass manual QA, APPROVED verdict
+
+**Fixture:**
+- `production/sprints/sprint-03/` exists with 4 story files
+- Stories are a mix of types: 1 Logic, 1 Integration, 2 Visual/Feel
+- All stories have acceptance criteria populated
+- `tests/smoke/` contains a smoke test list; all items are verifiable
+- No existing bugs in `production/qa/bugs/`
+
+**Input:** `/team-qa sprint-03`
+
+**Expected behavior:**
+1. Phase 1: Reads all story files in `production/sprints/sprint-03/`; reads `production/stage.txt`; reports "Found 4 stories. Current stage: [stage]. Ready to begin QA strategy?"
+2. Phase 2: Spawns `qa-lead` via Task; produces strategy table classifying all 4 stories; no blockers flagged; presents to user; AskUserQuestion: user selects "Looks good — proceed to test plan"
+3. Phase 3: Produces QA plan document; asks "May I write the QA plan to `production/qa/qa-plan-sprint-03-[date].md`?"; writes after approval
+4. Phase 4: Spawns `qa-lead` via Task; reviews `tests/smoke/`; returns PASS; reports "Smoke check passed. Proceeding to test case writing."
+5. Phase 5: Spawns `qa-tester` via Task for each Visual/Feel and Integration story (3 stories here); spawned in parallel; presents test cases grouped by story; AskUserQuestion per group; user approves
+6. Phase 6: Walks through each approved story; user marks all as PASS; result summary: "Stories PASS: 4, FAIL: 0, BLOCKED: 0"
+7. Phase 7: Spawns `qa-lead` via Task to produce sign-off report; report shows all stories PASS; no bugs filed; Verdict: APPROVED; asks "May I write this QA sign-off report to `production/qa/qa-signoff-sprint-03-[date].md`?"; writes after approval
+8. Verdict: COMPLETE — QA cycle finished
+
+**Assertions:**
+- [ ] Phase 1 correctly counts and reports 4 stories with current stage
+- [ ] Strategy table in Phase 2 classifies all 4 stories with correct types
+- [ ] QA plan written only after "May I write?" approval
+- [ ] Smoke check PASS allows pipeline to continue without user intervention
+- [ ] Phase 5 qa-tester tasks for independent stories are issued in parallel
+- [ ] Sign-off report includes Test Coverage Summary table and Verdict: APPROVED
+- [ ] Sign-off report written only after "May I write?" approval
+- [ ] Verdict: COMPLETE appears in final output
+- [ ] Next step: "Run `/gate-check` to validate advancement."
+
+---
+
+### Case 2: Smoke Check Fail — QA cycle stops at Phase 4
+
+**Fixture:**
+- `production/sprints/sprint-04/` exists with 3 story files
+- `tests/smoke/` exists with 5 smoke test items; 2 items cannot be verified (e.g., build is unstable, core navigation broken)
+
+**Input:** `/team-qa sprint-04`
+
+**Expected behavior:**
+1. Phases 1–3 complete normally; QA plan is written
+2. Phase 4: Spawns `qa-lead` via Task; smoke check returns FAIL; two specific failures are identified
+3. Skill reports: "Smoke check failed. QA cannot begin until these issues are resolved: [list of 2 failures]. Fix them and re-run `/smoke-check`, or re-run `/team-qa` once resolved."
+4. Skill stops immediately after Phase 4 — no Phase 5, 6, or 7 is executed
+5. No sign-off report is produced; no "May I write?" for a sign-off is issued
+
+**Assertions:**
+- [ ] Smoke check FAIL causes the pipeline to halt at Phase 4 — Phases 5, 6, 7 are NOT executed
+- [ ] Failure list is shown to the user explicitly (not summarized vaguely)
+- [ ] Skill recommends `/smoke-check` and `/team-qa` re-run as remediation steps
+- [ ] No QA sign-off report is written or offered
+- [ ] Skill does NOT produce a COMPLETE verdict
+- [ ] Any QA plan already written in Phase 3 is preserved (not deleted)
+
+---
+
+### Case 3: Bug Found — Visual/Feel story fails manual QA, bug report filed
+
+**Fixture:**
+- `production/sprints/sprint-05/` exists with 2 story files: 1 Logic (passes automated tests), 1 Visual/Feel
+- `tests/smoke/` smoke check passes
+- The Visual/Feel story's animation timing is visibly wrong (acceptance criterion not met)
+- `production/qa/bugs/` directory exists (empty or with existing bugs)
+
+**Input:** `/team-qa sprint-05`
+
+**Expected behavior:**
+1. Phases 1–5 complete normally; test cases are written for the Visual/Feel story
+2. Phase 6: User marks Visual/Feel story as FAIL; AskUserQuestion collects failure description: "Animation plays at 2x speed — jitter visible on every loop"
+3. Phase 6: Spawns `qa-tester` via Task to write a formal bug report; bug report written to `production/qa/bugs/BUG-001-animation-speed-jitter.md` (or next increment if bugs exist); report includes severity field
+4. Result summary: "Stories PASS: 1, FAIL: 1 — bugs filed: BUG-001"
+5. Phase 7: Spawns `qa-lead` to produce sign-off report; Bugs Found table lists BUG-001 with severity and status Open; Verdict: NOT APPROVED (S1/S2 bug open, or FAIL without documented workaround)
+6. Sign-off report write is offered; writes after approval
+7. Next step: "Resolve S1/S2 bugs and re-run `/team-qa` or targeted manual QA before advancing."
+
+**Assertions:**
+- [ ] FAIL result in Phase 6 triggers AskUserQuestion to collect the failure description before the bug report is written
+- [ ] `qa-tester` is spawned via Task to write the bug report — orchestrator does not write it directly
+- [ ] Bug report follows naming convention: `BUG-[NNN]-[short-slug].md` in `production/qa/bugs/`
+- [ ] Bug report NNN is incremented correctly from existing bugs in the directory
+- [ ] Phase 7 sign-off report Bugs Found table includes the bug ID, story name, severity, and status
+- [ ] Verdict in sign-off report is NOT APPROVED
+- [ ] Next step explicitly mentions re-running `/team-qa`
+- [ ] Verdict: COMPLETE is still issued by the orchestrator (the QA cycle finished — the verdict is NOT APPROVED, but the skill completed its pipeline)
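+
+The BUG-[NNN] increment asserted above can be sketched as (illustrative, not
+the skill's literal algorithm):
+
+```
+existing = files in production/qa/bugs/ matching BUG-*.md
+next     = highest NNN among existing + 1, or 001 if none exist
+filename = BUG-[zero-padded next]-[short-slug].md
+```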
+
+---
+
+### Case 4: No Argument — Skill infers active sprint or asks user
+
+**Fixture (variant A — state files present):**
+- `production/session-state/active.md` exists and contains a reference to `sprint-06`
+- `production/sprint-status.yaml` exists and identifies `sprint-06` as active
+
+**Fixture (variant B — state files absent):**
+- `production/session-state/active.md` does NOT exist
+- `production/sprint-status.yaml` does NOT exist
+
+**Input:** `/team-qa` (no argument)
+
+**Expected behavior (variant A):**
+1. Phase 1: No argument provided; reads `production/session-state/active.md`; reads `production/sprint-status.yaml`
+2. Detects `sprint-06` as the active sprint from both sources
+3. Proceeds as if `/team-qa sprint-06` had been the input; reports "No sprint argument provided — inferred sprint-06 from session state. Found [N] stories."
+
+**Expected behavior (variant B):**
+1. Phase 1: No argument provided; attempts to read `production/session-state/active.md` — file missing; attempts to read `production/sprint-status.yaml` — file missing
+2. Cannot infer sprint; uses AskUserQuestion: "Which sprint or feature should QA cover?" with options to type a sprint identifier or cancel
+
+**Assertions:**
+- [ ] Skill does NOT default to a hardcoded sprint name when no argument is provided
+- [ ] Skill reads both `production/session-state/active.md` AND `production/sprint-status.yaml` before asking the user (variant A)
+- [ ] When both state files are absent, skill uses AskUserQuestion rather than guessing (variant B)
+- [ ] Inferred sprint is reported to the user before proceeding (variant A transparency)
+- [ ] Skill does NOT error out when state files are missing — it falls back to asking (variant B)
+
+---
+
+### Case 5: Mixed Results — Some PASS, one FAIL with S1 bug, one BLOCKED
+
+**Fixture:**
+- `production/sprints/sprint-07/` exists with 4 story files
+- Smoke check passes
+- Story A (Logic): automated test passes — PASS
+- Story B (UI): manual QA — PASS WITH NOTES (minor text overflow)
+- Story C (Visual/Feel): manual QA — FAIL; tester identifies S1 crash on ability activation
+- Story D (Integration): cannot test — BLOCKED (dependency system not yet implemented)
+
+**Input:** `/team-qa sprint-07`
+
+**Expected behavior:**
+1. Phases 1–5 proceed; Phase 5 test cases cover stories B, C, D
+2. Phase 6: Story A is recorded as PASS from its automated tests; user marks Story B: PASS WITH NOTES; Story C: FAIL; Story D: BLOCKED
+3. After Story C FAIL: qa-tester spawned to write bug report `BUG-001-crash-ability-activation.md` with S1 severity
+4. Result summary presented: "Stories PASS: 1, PASS WITH NOTES: 1, FAIL: 1, BLOCKED: 1 — bugs filed: BUG-001 (S1)"
+5. Phase 7: qa-lead produces sign-off report covering all 4 stories; BUG-001 listed as S1/Open; Story D listed as BLOCKED; Verdict: NOT APPROVED
+6. Sign-off report written after "May I write?" approval
+7. Next step: "Resolve S1/S2 bugs and re-run `/team-qa` or targeted manual QA before advancing."
+
+**Assertions:**
+- [ ] All 4 stories appear in the Phase 7 sign-off report Test Coverage Summary table — none are silently omitted
+- [ ] Story D (BLOCKED) is listed in the report with a BLOCKED status, not silently dropped
+- [ ] S1 bug causes Verdict: NOT APPROVED regardless of the other stories passing
+- [ ] PASS WITH NOTES stories do not downgrade to FAIL — they are tracked separately
+- [ ] BUG-001 severity is listed as S1 in the Bugs Found table
+- [ ] Partial results are preserved — the sign-off report is still produced even with failures and blocks
+- [ ] Verdict: COMPLETE is issued by the orchestrator (pipeline completed); sign-off verdict is NOT APPROVED
+
+---
+
+## Protocol Compliance
+
+- [ ] `AskUserQuestion` used at Phase 2 (strategy review), Phase 5 (test case approval per group), and Phase 6 (per-story manual QA result)
+- [ ] Phase 4 smoke check is a hard gate: FAIL halts the pipeline at Phase 4 with no exceptions
+- [ ] "May I write?" asked separately for QA plan (Phase 3) and sign-off report (Phase 7)
+- [ ] Bug reports are always written by `qa-tester` via Task — orchestrator does not write directly
+- [ ] Phase 5 qa-tester tasks for independent stories are issued in parallel where possible
+- [ ] Error recovery: any BLOCKED agent is surfaced immediately with AskUserQuestion options
+- [ ] Partial report always produced — no work is discarded because one story failed or blocked
+- [ ] Sign-off verdict rules are strictly applied: any S1/S2 bug open = NOT APPROVED; no exceptions
+- [ ] Orchestrator-level Verdict: COMPLETE is distinct from the sign-off report's APPROVED/NOT APPROVED verdict
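+
+The sign-off verdict rules above reduce to a small decision table (sketch):
+
+```
+any S1/S2 bug open                          → NOT APPROVED
+else, any S3/S4 bug or PASS WITH NOTES      → APPROVED WITH CONDITIONS
+else, all stories PASS                      → APPROVED
+```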
+
+---
+
+## Coverage Notes
+
+- The "APPROVED WITH CONDITIONS" verdict path (S3/S4 bugs, PASS WITH NOTES) is covered implicitly by Case 5's PASS WITH NOTES story (Story B) — if no S1/S2 bugs existed, that case would produce APPROVED WITH CONDITIONS. A dedicated case is not required as the verdict logic is table-driven.
+- The `feature: [system-name]` argument form is not separately tested — it follows the same Phase 1 logic as the sprint form, using glob instead of directory read. The no-argument inference path (Case 4) provides sufficient coverage of the detection logic.
+- Logic stories with passing automated tests do not need manual QA — this is validated implicitly by Case 5 (Story A) where the Logic story receives no manual QA phase.
+- Parallel qa-tester spawning in Phase 5 is validated implicitly by Case 1 (multiple Visual/Feel stories issued simultaneously); no dedicated parallelism case is required beyond the Static Assertions check.
diff --git a/CCGS Skill Testing Framework/skills/team/team-release.md b/CCGS Skill Testing Framework/skills/team/team-release.md
new file mode 100644
index 0000000..ed2bb13
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/team/team-release.md
@@ -0,0 +1,215 @@
+# Skill Test Spec: /team-release
+
+## Skill Summary
+
+Orchestrates the release team through a seven-phase pipeline from release candidate to
+deployment and post-release monitoring. Coordinates release-manager, qa-lead,
+devops-engineer, producer, security-engineer (optional; required for online
+features or player data), network-programmer (optional; required for multiplayer),
+analytics-engineer, and community-manager. Phase 3 agents run in parallel. Ends
+with a go/no-go decision; deployment (Phase 6) is skipped if the producer calls
+NO-GO. Closes with a post-release monitoring plan.
+
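+The pipeline and its single conditional branch, sketched:
+
+```
+1 Authorization (producer) → 2 Release candidate (release-manager)
+→ 3 Validation in parallel: qa-lead + devops-engineer
+    (+ security-engineer for online features or player data,
+     + network-programmer for multiplayer)
+→ 4 Compliance and telemetry checks → 5 Go/no-go (producer)
+    GO    → 6 Deployment, then 7 Post-release monitoring
+    NO-GO → Phase 6 skipped; partial report; Verdict: BLOCKED
+```
+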
+---
+
+## Static Assertions (Structural)
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: COMPLETE, BLOCKED
+- [ ] Contains "May I write" language in the File Write Protocol section (delegated to sub-agents)
+- [ ] Has a File Write Protocol section stating that the orchestrator does not write files directly
+- [ ] Has an Error Recovery Protocol section with four recovery options (surface / assess / offer options / partial report)
+- [ ] Has a next-step handoff referencing post-release monitoring, `/retrospective`, and `production/stage.txt`
+- [ ] Uses `AskUserQuestion` at phase transitions requiring user approval before proceeding
+- [ ] Phase 3 agents (qa-lead, devops-engineer, and optionally security-engineer, network-programmer) are explicitly stated to run in parallel
+- [ ] Phase 6 (Deployment) is conditional on a GO decision from Phase 5
+- [ ] security-engineer is described as conditional on online features / player data — not always spawned
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path (Single-Player) — All phases complete, version deployed
+
+**Fixture:**
+- `production/stage.txt` exists and contains a Production-or-later stage
+- Milestone acceptance criteria are all met (producer can confirm)
+- No online features, no multiplayer, no player data collection
+- All CI builds are clean on the current branch
+- No open S1/S2 bugs
+- `production/sprints/` contains the completed sprint stories for this milestone
+
+**Input:** `/team-release v1.0.0`
+
+**Expected behavior:**
+1. Phase 1: Spawns `producer` via Task; confirms all milestone acceptance criteria met; identifies any deferred scope; produces release authorization; presents to user; AskUserQuestion: user approves before Phase 2
+2. Phase 2: Spawns `release-manager` via Task; cuts release branch from agreed commit; bumps version numbers; invokes `/release-checklist`; freezes branch; output: branch name and checklist; AskUserQuestion: user approves before Phase 3
+3. Phase 3 (parallel): Issues Task calls simultaneously for `qa-lead` (regression suite, critical path sign-off) and `devops-engineer` (build artifacts, CI verification); security-engineer is NOT spawned (no online features); network-programmer is NOT spawned (no multiplayer); both complete successfully
+4. Phase 4: Verifies all localization strings are translated; `analytics-engineer` verifies telemetry fires correctly on the release build; performance benchmarks pass; sign-off produced
+5. Phase 5: Spawns `producer` via Task; collects sign-offs from qa-lead, release-manager, devops-engineer; no open blocking issues; producer declares GO; AskUserQuestion: user sees GO decision and confirms deployment
+6. Phase 6: Spawns `release-manager` + `devops-engineer` (parallel); tags release in version control; invokes `/changelog`; deploys to staging; smoke test passes; deploys to production; simultaneously spawns `community-manager` to finalize patch notes via `/patch-notes v1.0.0` and prepare launch announcement
+7. Phase 7: release-manager generates release report; producer updates milestone tracking; qa-lead begins monitoring for regressions; community-manager publishes communication; analytics-engineer confirms live dashboards healthy
+8. Verdict: COMPLETE — release executed and deployed
+
+**Assertions:**
+- [ ] Phase 3 qa-lead and devops-engineer Task calls are issued simultaneously, not sequentially
+- [ ] security-engineer is NOT spawned when the game has no online features, multiplayer, or player data
+- [ ] Phase 5 producer collects sign-offs from all required parties before declaring GO
+- [ ] Phase 6 deployment only begins after GO decision is confirmed by the user
+- [ ] `/changelog` is invoked by release-manager in Phase 6 (not written directly)
+- [ ] `/patch-notes v1.0.0` is invoked by community-manager in Phase 6
+- [ ] Phase 7 monitoring plan includes a 48-hour post-release monitoring commitment
+- [ ] Next steps recommend updating `production/stage.txt` to `Live` after successful deployment
+- [ ] Verdict: COMPLETE appears in the final output
+
+---
+
+### Case 2: Go/No-Go: NO — S1 bug found in Phase 3, deployment skipped
+
+**Fixture:**
+- Release candidate branch exists for v0.9.0
+- qa-lead discovers a previously unreported S1 crash in the main menu during Phase 3 regression testing
+- devops-engineer build is clean and artifacts are ready
+- producer is aware of the S1 bug
+
+**Input:** `/team-release v0.9.0`
+
+**Expected behavior:**
+1. Phases 1–2 complete normally; release candidate is cut
+2. Phase 3 (parallel): devops-engineer returns clean build sign-off; qa-lead returns with an S1 bug identified and regression suite failing; qa-lead declares quality gate: NOT PASSED
+3. Orchestrator surfaces the qa-lead result immediately: "QA-LEAD: S1 bug found — [crash description]. Quality gate: NOT PASSED."
+4. Phase 4 proceeds cautiously or is paused (AskUserQuestion: continue to Phase 4 or skip to Phase 5 for go/no-go?)
+5. Phase 5: Spawns `producer` via Task; producer receives qa-lead's NOT PASSED verdict; no S1 sign-off available; producer declares NO-GO with rationale: "S1 bug [ID] is open and unresolved. Releasing is not safe."
+6. AskUserQuestion: user is presented with the NO-GO decision and the S1 bug details; options: fix the bug and re-run, defer the release, or override (with documented rationale)
+7. Phase 6 (Deployment) is SKIPPED entirely — no branch tagging, no deploy to staging, no deploy to production
+8. community-manager is NOT spawned in Phase 6 (no deployment to announce)
+9. Skill ends with a partial report summarizing what was completed (Phases 1–5), what was skipped (Phase 6), and why
+10. Verdict: BLOCKED — release not deployed
+
+**Assertions:**
+- [ ] qa-lead S1 bug finding is surfaced to the user immediately after Phase 3 completes — not suppressed until Phase 5
+- [ ] producer's NO-GO decision explicitly references the S1 bug and the quality gate result
+- [ ] Phase 6 Deployment is completely skipped when producer declares NO-GO
+- [ ] community-manager is NOT spawned for patch notes or launch announcement on NO-GO
+- [ ] The partial report clearly states which phases completed and which were skipped, with reasons
+- [ ] Verdict: BLOCKED (not COMPLETE) when deployment is skipped due to NO-GO
+- [ ] AskUserQuestion offers the user resolution options (fix and re-run / defer / override with rationale)
+- [ ] Override path (if chosen) requires user to provide a documented rationale before proceeding to Phase 6
+
+---
+
+### Case 3: Security Audit for Online Game — security-engineer is spawned in Phase 3
+
+**Fixture:**
+- Game has multiplayer features and stores player account data
+- Release candidate exists for v2.1.0
+- qa-lead and devops-engineer both return clean sign-offs
+- security-engineer audit is required per team composition rules
+
+**Input:** `/team-release v2.1.0`
+
+**Expected behavior:**
+1. Phases 1–2 complete normally
+2. Phase 3 (parallel): Orchestrator detects that the game has online/multiplayer features and player data; issues Task calls simultaneously for `qa-lead`, `devops-engineer`, AND `security-engineer`; also spawns `network-programmer` for netcode stability sign-off
+3. security-engineer conducts pre-release security audit: reviews authentication flows, anti-cheat presence, data privacy compliance; returns sign-off
+4. network-programmer verifies lag compensation, reconnect handling, and bandwidth under load; returns sign-off
+5. All four Phase 3 agents complete; their results are collected before Phase 4 begins
+6. Phase 5: producer collects sign-offs from all four Phase 3 agents (qa-lead, devops-engineer, security-engineer, network-programmer) before making the go/no-go call
+7. Remaining phases proceed normally to COMPLETE
+
+**Assertions:**
+- [ ] security-engineer IS spawned in Phase 3 when the game has online features, multiplayer, or player data — this is not skipped
+- [ ] network-programmer IS spawned in Phase 3 when the game has multiplayer
+- [ ] All four Phase 3 Task calls (qa-lead, devops-engineer, security-engineer, network-programmer) are issued simultaneously
+- [ ] security-engineer audit covers authentication, anti-cheat, and data privacy compliance
+- [ ] Phase 5 producer sign-off collection includes security-engineer (four parties, not two)
+- [ ] Phase 6 deployment does not begin until security-engineer has signed off
+- [ ] Skill does NOT treat security-engineer as optional for a game with player data
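+
+The spawn decision this case exercises can be sketched as a pure function; the
+feature-flag names are illustrative assumptions, not fields defined by the skill
+spec:
+
+```python
+def phase3_agents(features: set[str]) -> list[str]:
+    """Decide which Phase 3 agents to spawn in parallel for a release."""
+    agents = ["qa-lead", "devops-engineer"]  # always spawned together
+    # security-engineer is mandatory whenever the game has online features,
+    # multiplayer, or player data -- never optional in those cases
+    if features & {"online", "multiplayer", "player-data"}:
+        agents.append("security-engineer")
+    # network-programmer signs off on netcode stability for multiplayer games
+    if "multiplayer" in features:
+        agents.append("network-programmer")
+    return agents
+```
+
+Every agent in the returned list is issued as a simultaneous Task call; the list
+only decides who participates, not the ordering of results.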
+
+---
+
+### Case 4: Localization Miss — Untranslated strings block the ship
+
+**Fixture:**
+- Release candidate exists for v1.2.0
+- Phase 3 (qa-lead, devops-engineer) complete with clean sign-offs
+- Phase 4: localization verification detects 47 untranslated strings in the French locale (a supported language in the game's localization scope)
+- localization-lead is available as a delegatable agent
+
+**Input:** `/team-release v1.2.0`
+
+**Expected behavior:**
+1. Phases 1–3 complete with clean sign-offs
+2. Phase 4: Localization verification step detects untranslated strings; identifies 47 strings in French locale; localization-lead (if available) is spawned to assess the severity
+3. Orchestrator surfaces: "LOCALIZATION MISS: 47 untranslated strings found in French locale. Localization sign-off is required before shipping."
+4. AskUserQuestion: options presented — (a) Fix translations and re-run Phase 4, (b) Remove French locale from this release, (c) Ship as-is with a known issues note
+5. If user selects (a): Phase 4 is re-run after translations are provided; skill waits for localization sign-off
+6. Phase 5 go/no-go does NOT proceed while localization sign-off is outstanding
+7. Ship is blocked (Phase 6 not entered) until localization issue is resolved or explicitly waived
+
+**Assertions:**
+- [ ] Localization verification in Phase 4 detects untranslated strings and counts them (not just "some strings missing")
+- [ ] Untranslated strings for a supported locale block the pipeline before Phase 5
+- [ ] AskUserQuestion is used to offer the user resolution choices — the skill does not auto-waive
+- [ ] Phase 5 go/no-go is NOT called while localization sign-off is pending
+- [ ] If user chooses to re-run Phase 4: the skill does not require restarting from Phase 1
+- [ ] If user explicitly waives (ships as-is): the waiver is documented in the release report (Phase 7) as a known issue
+- [ ] Skill does NOT fabricate translated strings to unblock itself
+
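+A minimal sketch of the locale check this case exercises, assuming string
+tables are flat key-to-value maps per locale (the storage format is an
+assumption, not something the skill spec prescribes):
+
+```python
+def untranslated(base: dict[str, str], locale: dict[str, str]) -> list[str]:
+    """Keys translated in the base locale but missing or empty in the target."""
+    return [key for key in base if not locale.get(key, "").strip()]
+```
+
+The 47 untranslated French strings in the fixture would surface as the length
+of this list for the `fr` locale, giving the exact count the assertion requires.
+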
+---
+
+### Case 5: No Argument — Skill infers version or asks
+
+**Fixture (variant A — milestone data present):**
+- `production/milestones/` exists with a milestone file; most recent milestone is "v1.1.0 — Gold"
+- `production/session-state/active.md` references a version or milestone
+
+**Fixture (variant B — no discoverable version):**
+- `production/milestones/` does not exist
+- `production/session-state/active.md` does not reference a version
+- No git tags are present from which to infer a version
+
+**Input:** `/team-release` (no argument)
+
+**Expected behavior (variant A):**
+1. Phase 1: No argument provided; reads `production/session-state/active.md`; reads most recent milestone file in `production/milestones/`
+2. Infers v1.1.0 as the target version; reports "No version argument provided — inferred v1.1.0 from milestone data."
+3. Confirms with AskUserQuestion before beginning Phase 1 proper: "Releasing v1.1.0. Is this correct?"
+4. Proceeds as if `/team-release v1.1.0` was the input
+
+**Expected behavior (variant B):**
+1. Phase 1: No argument provided; reads available state files — no version discoverable
+2. Uses AskUserQuestion: "What version number should be released? (e.g., v1.0.0)"
+3. Waits for user input before proceeding
+
+**Assertions:**
+- [ ] Skill does NOT default to a hardcoded version string when no argument is provided
+- [ ] Skill reads `production/session-state/active.md` and milestone files before asking (variant A)
+- [ ] Inferred version is confirmed with the user via AskUserQuestion before proceeding (variant A)
+- [ ] When no version is discoverable, AskUserQuestion is used — skill does not guess (variant B)
+- [ ] Skill does NOT error out when milestone files are absent — it falls back to asking (variant B)
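+
+The inference order in variants A and B amounts to a fallback chain; the sketch
+below is illustrative (the discovery callables stand in for reading session
+state, milestone files, and git tags):
+
+```python
+from typing import Callable, Optional
+
+def resolve_version(
+    argument: Optional[str],
+    sources: list[Callable[[], Optional[str]]],
+    ask_user: Callable[[], str],
+) -> str:
+    """Explicit argument first, then each discovery source in order,
+    else ask the user -- never a hardcoded default."""
+    if argument:
+        return argument
+    for discover in sources:
+        version = discover()
+        if version:
+            return version  # variant A: still confirmed via AskUserQuestion
+    return ask_user()       # variant B: no guessing
+```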
+
+---
+
+## Protocol Compliance
+
+- [ ] `AskUserQuestion` used at each phase transition gate (post-Phase 1, post-Phase 2, post-Phase 3/4 if issues, post-Phase 5 go/no-go)
+- [ ] Phase 3 agents are always issued as parallel Task calls — qa-lead and devops-engineer are never sequential
+- [ ] security-engineer is conditionally spawned based on game features — never silently skipped when features are present
+- [ ] File Write Protocol: orchestrator never calls Write/Edit directly — all writes are delegated to sub-agents or sub-skills
+- [ ] Phase 6 Deployment is strictly conditional on a GO verdict from Phase 5 — never auto-triggered
+- [ ] Error recovery: any BLOCKED agent is surfaced immediately before continuing to dependent phases
+- [ ] Partial reports are always produced if any phase fails or the pipeline is halted (Case 2)
+- [ ] Verdict: COMPLETE only when deployment completes; BLOCKED when go/no-go is NO or a hard blocker is unresolved
+- [ ] Next steps always include 48-hour post-release monitoring, `/retrospective` recommendation, and `production/stage.txt` update to `Live`
+
+---
+
+## Coverage Notes
+
+- Phase 7 post-release actions (release report, milestone tracking, community publishing, dashboard monitoring) are validated implicitly by Case 1. No separate edge case is required, as Phase 7 is non-gated and has no blocking failure mode.
+- The "devops-engineer build fails" path is not separately tested — it would surface as a BLOCKED result in Phase 3 and follow the standard error recovery protocol (surface → assess → AskUserQuestion options). This is validated structurally by the Static Assertions error recovery check.
+- The parallel Phase 4 path (localization + performance + analytics simultaneously with Phase 3) is a documented option in the skill ("can run in parallel with Phase 3 if resources available"). Case 4 tests Phase 4 as a sequential gate; the parallel variant is left to the skill's implementation judgment.
+- The `network-programmer` sign-off path for multiplayer is validated as part of Case 3 rather than a separate case, as it follows the same parallel-spawn pattern as security-engineer.
+- The "override NO-GO with documented rationale" path in Case 2 is referenced but not exhaustively tested — it is an escape hatch that the skill must support, and its existence is validated by the AskUserQuestion options assertion in Case 2.
diff --git a/CCGS Skill Testing Framework/skills/team/team-ui.md b/CCGS Skill Testing Framework/skills/team/team-ui.md
new file mode 100644
index 0000000..57a1237
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/team/team-ui.md
@@ -0,0 +1,201 @@
+# Skill Test Spec: /team-ui
+
+## Skill Summary
+
+Orchestrates the UI team through the full UX pipeline for a single UI feature.
+Coordinates ux-designer, ui-programmer, art-director, the engine UI specialist,
+and accessibility-specialist through five structured phases: Context Gathering +
+UX Spec (Phase 1a/1b) → UX Review Gate (Phase 1c) → Visual Design (Phase 2) →
+Implementation (Phase 3) → Review in parallel (Phase 4) → Polish (Phase 5).
+Uses `AskUserQuestion` at each phase transition. Delegates all file writes to
+sub-agents and sub-skills (`/ux-design`, `ui-programmer`). Produces a summary report
+with verdict COMPLETE / BLOCKED and handoffs to `/ux-review`, `/code-review`,
+`/team-polish`.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings (Phase 1a through Phase 5 are all present)
+- [ ] Contains verdict keywords: COMPLETE, BLOCKED
+- [ ] Contains "May I write" or "File Write Protocol" — writes delegated to sub-agents and sub-skills, orchestrator does not write files directly
+- [ ] Has a next-step handoff at the end (references `/ux-review`, `/code-review`, `/team-polish`)
+- [ ] Error Recovery Protocol section is present with all four recovery steps
+- [ ] Uses `AskUserQuestion` at phase transitions for user approval before proceeding
+- [ ] Phase 4 is explicitly marked as parallel (ux-designer, art-director, accessibility-specialist)
+- [ ] UX Review Gate (Phase 1c) is defined as a blocking gate — skill must not proceed to Phase 2 without APPROVED verdict
+- [ ] Team Composition lists all five roles (ux-designer, ui-programmer, art-director, engine UI specialist, accessibility-specialist)
+- [ ] References the interaction pattern library (`design/ux/interaction-patterns.md`) — ui-programmer must use existing patterns
+- [ ] Phase 1a reads `design/accessibility-requirements.md` before design begins
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Full pipeline from UX spec through polish succeeds
+
+**Fixture:**
+- `design/gdd/game-concept.md` exists with platform targets and intended audience
+- `design/player-journey.md` exists
+- `design/ux/interaction-patterns.md` exists with relevant patterns
+- `design/accessibility-requirements.md` exists with committed tier (e.g., Enhanced)
+- Engine UI specialist configured in `.claude/docs/technical-preferences.md`
+
+**Input:** `/team-ui inventory screen`
+
+**Expected behavior:**
+1. Phase 1a — orchestrator reads game-concept.md, player-journey.md, relevant GDD UI sections, interaction-patterns.md, accessibility-requirements.md; summarizes a brief for the ux-designer
+2. Phase 1b — `/ux-design inventory-screen` invoked (or ux-designer spawned directly); produces `design/ux/inventory-screen.md` using `ux-spec.md` template; `AskUserQuestion` confirms spec before review
+3. Phase 1c — `/ux-review design/ux/inventory-screen.md` invoked; returns APPROVED; gate passed, proceed to Phase 2
+4. Phase 2 — art-director spawned; reviews full UX spec (not only wireframes); applies visual treatment; verifies color contrast; produces visual design spec with asset manifest; `AskUserQuestion` confirms before Phase 3
+5. Phase 3 — engine UI specialist spawned first (read from technical-preferences.md); produces implementation notes for ui-programmer; ui-programmer spawned with UX spec + visual spec + engine notes; implementation produced; interaction-patterns.md updated if new patterns introduced
+6. Phase 4 — ux-designer, art-director, accessibility-specialist spawned in parallel; all three return results before Phase 5
+7. Phase 5 — review feedback addressed; animations verified skippable; UI sounds confirmed through audio event system; interaction-patterns.md final check; verdict: COMPLETE
+8. Summary report: UX spec APPROVED, visual design COMPLETE, implementation COMPLETE, accessibility COMPLIANT, all input methods supported, pattern library updated, verdict: COMPLETE
+
+**Assertions:**
+- [ ] Phase 1a reads all five sources before briefing ux-designer
+- [ ] UX Review Gate checked before Phase 2 — Phase 2 does NOT begin until APPROVED
+- [ ] Art-director in Phase 2 reviews full spec, not just wireframe images
+- [ ] Engine UI specialist spawned before ui-programmer in Phase 3
+- [ ] Phase 4 agents launched simultaneously (ux-designer, art-director, accessibility-specialist)
+- [ ] All file writes delegated to sub-agents and sub-skills
+- [ ] Verdict COMPLETE in final summary report
+- [ ] Next steps include `/ux-review`, `/code-review`, `/team-polish`
+
+---
+
+### Case 2: UX Review Gate — Spec fails review; skill halts before implementation
+
+**Fixture:**
+- `design/ux/inventory-screen.md` produced by Phase 1b
+- `/ux-review` returns verdict NEEDS REVISION with specific concerns flagged (e.g., gamepad navigation flow incomplete, contrast ratio below minimum)
+
+**Input:** `/team-ui inventory screen`
+
+**Expected behavior:**
+1. Phase 1a + 1b complete — UX spec produced
+2. Phase 1c — `/ux-review design/ux/inventory-screen.md` returns NEEDS REVISION
+3. Skill does NOT advance to Phase 2
+4. `AskUserQuestion` presented with the specific flagged concerns and options:
+ - (a) Return to ux-designer to address the issues and re-review
+ - (b) Accept the risk and proceed to Phase 2 anyway (conscious decision)
+5. If user chooses (a): ux-designer revises spec, `/ux-review` re-run; loop continues until APPROVED or user overrides
+6. If user chooses (b): skill proceeds with an explicit NEEDS REVISION note in the final report
+7. Skill does NOT silently proceed past the gate
+
+**Assertions:**
+- [ ] Phase 2 does NOT begin while UX review verdict is NEEDS REVISION
+- [ ] `AskUserQuestion` presents the specific flagged concerns before offering options
+- [ ] User must make a conscious choice to override — skill does not assume override
+- [ ] If user accepts risk, NEEDS REVISION concern is documented in the final report
+- [ ] Revision-and-re-review loop is offered (not just a one-shot failure)
+- [ ] Skill does NOT discard the produced UX spec on review failure
+
+---
+
+### Case 3: No Argument — Usage guidance shown
+
+**Fixture:**
+- Any project state
+
+**Input:** `/team-ui` (no argument)
+
+**Expected behavior:**
+1. Skill detects no argument provided
+2. Outputs usage message explaining the required argument (UI feature description)
+3. Provides an example invocation: `/team-ui [UI feature description]`
+4. Skill exits without spawning any subagents or reading any project files
+
+**Assertions:**
+- [ ] Skill does NOT spawn any subagents when no argument is given
+- [ ] Usage message includes the argument-hint format from frontmatter
+- [ ] At least one example of a valid invocation is shown
+- [ ] No UX spec files or GDDs read before failing
+- [ ] Verdict is NOT shown (pipeline never starts)
+
+---
+
+### Case 4: Accessibility Parallel Review — Phase 4 runs three streams simultaneously
+
+**Fixture:**
+- `design/ux/inventory-screen.md` exists (APPROVED)
+- Visual design spec complete
+- Implementation complete
+- `design/accessibility-requirements.md` committed tier: Enhanced
+
+**Input:** `/team-ui inventory screen` (resuming from Phase 3 complete)
+
+**Expected behavior:**
+1. Phase 4 begins after implementation is confirmed complete
+2. Three Task calls issued simultaneously: ux-designer, art-director, accessibility-specialist
+3. Each stream operates independently:
+ - ux-designer: verifies implementation matches wireframes, tests keyboard-only and gamepad-only navigation, checks accessibility features function
+ - art-director: verifies visual consistency with art bible at minimum and maximum supported resolutions
+ - accessibility-specialist: audits against the Enhanced accessibility tier in `design/accessibility-requirements.md`; any violation flagged as a blocker
+4. Skill waits for all three results before proceeding to Phase 5
+5. `AskUserQuestion` presents all three review results before Phase 5 begins
+
+**Assertions:**
+- [ ] All three Task calls issued before any result is awaited (parallel, not sequential)
+- [ ] Phase 5 does NOT begin until all three Phase 4 agents have returned
+- [ ] Accessibility-specialist explicitly reads `design/accessibility-requirements.md` for the committed tier
+- [ ] Accessibility violations flagged as BLOCKING (not merely advisory)
+- [ ] `AskUserQuestion` shows all three review streams' results together before Phase 5 approval
+- [ ] No Phase 4 agent's output is used as input for another Phase 4 agent
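+
+The fan-out/fan-in shape of Phase 4 can be sketched with threads standing in
+for parallel Task calls (illustrative only; the skill uses the Task tool, not a
+thread pool):
+
+```python
+from concurrent.futures import ThreadPoolExecutor
+
+REVIEWERS = ("ux-designer", "art-director", "accessibility-specialist")
+
+def run_phase4(spawn):
+    """Issue every review task before awaiting any result, then gather
+    all results; Phase 5 may only begin once this returns."""
+    with ThreadPoolExecutor() as pool:
+        futures = {name: pool.submit(spawn, name) for name in REVIEWERS}
+        return {name: future.result() for name, future in futures.items()}
+```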
+
+---
+
+### Case 5: Missing Interaction Pattern Library — Skill notes the gap rather than inventing patterns
+
+**Fixture:**
+- `design/ux/interaction-patterns.md` does NOT exist
+- All other required files present
+
+**Input:** `/team-ui settings menu`
+
+**Expected behavior:**
+1. Phase 1a — orchestrator attempts to read `design/ux/interaction-patterns.md`; file not found
+2. Skill surfaces the gap: "interaction-patterns.md does not exist — no existing patterns to reuse"
+3. `AskUserQuestion` presented with options:
+ - (a) Run `/ux-design patterns` first to establish the pattern library, then continue
+ - (b) Proceed without the pattern library — ux-designer will document new patterns as they are created
+4. Skill does NOT invent or assume patterns from other sources
+5. If user chooses (b): ui-programmer is explicitly instructed to treat all patterns created as new and to add each to a new `design/ux/interaction-patterns.md` at completion
+6. Final report notes that interaction-patterns.md was created (or is still absent if user skipped)
+
+**Assertions:**
+- [ ] Skill does NOT silently ignore the missing pattern library
+- [ ] Skill does NOT invent patterns by guessing from the feature name or GDD alone
+- [ ] `AskUserQuestion` offers a "create pattern library first" option (referencing `/ux-design patterns`)
+- [ ] If user proceeds without the library, ui-programmer is told to treat all patterns as new
+- [ ] Final report documents pattern library status (created / absent / updated)
+- [ ] Skill does NOT fail entirely — the gap is noted and user is given a choice
+
+---
+
+## Protocol Compliance
+
+- [ ] `AskUserQuestion` used at each phase transition — user approves before pipeline advances
+- [ ] UX Review Gate (Phase 1c) is blocking — Phase 2 cannot begin without APPROVED or explicit user override
+- [ ] All file writes delegated to sub-agents and sub-skills — orchestrator does not call Write or Edit directly
+- [ ] Phase 4 agents launched in parallel per skill spec
+- [ ] Error Recovery Protocol followed: surface → assess → offer options → partial report
+- [ ] Partial report always produced even when agents are BLOCKED
+- [ ] Verdict is one of COMPLETE / BLOCKED
+- [ ] Next steps present at end: `/ux-review`, `/code-review`, `/team-polish`
+
+---
+
+## Coverage Notes
+
+- The HUD-specific path (`/ux-design hud` + `hud-design.md` template + visual budget check in Phase 5)
+ is not separately tested here; it shares the same phase structure but uses different templates.
+- The "Update in place" path for interaction-patterns.md (new pattern added during implementation)
+ is exercised implicitly in Case 1 Step 5 — a dedicated fixture with a known new pattern would
+ strengthen coverage.
+- Engine UI specialist unavailable (no engine configured) — skill spec states "skip if no engine
+ configured"; this path is asserted in Case 1 but not given a dedicated fixture.
+- The NEEDS REVISION acceptance-risk override (Case 2 option b) requires the override to be
+ explicitly documented in the report; this is asserted but not further tested for downstream effects.
diff --git a/CCGS Skill Testing Framework/skills/utility/adopt.md b/CCGS Skill Testing Framework/skills/utility/adopt.md
new file mode 100644
index 0000000..5751319
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/adopt.md
@@ -0,0 +1,170 @@
+# Skill Test Spec: /adopt
+
+## Skill Summary
+
+`/adopt` performs brownfield onboarding: it reads an existing non-Claude-Code
+project's source files, detects the engine and language, and generates a
+matching CLAUDE.md stub plus a populated `technical-preferences.md` to bring
+the project under the Claude Code Game Studios framework. It may also produce
+skeleton GDD files if enough design intent can be inferred from the code.
+
+Each generated file is gated behind a "May I write" ask. If an existing CLAUDE.md
+or `technical-preferences.md` is detected, the skill offers to merge rather than
+overwrite. The skill has no director gates. Verdicts: COMPLETE (full analysis done
+and files written), PARTIAL (analysis complete but some fields are ambiguous),
+or BLOCKED (cannot proceed — no source code found or user declined all writes).
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: COMPLETE, PARTIAL, BLOCKED
+- [ ] Contains "May I write" collaborative protocol language before each file creation
+- [ ] Has a next-step handoff at the end (e.g., `/setup-engine` to refine or `/brainstorm`)
+
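+The flavor of check `/skill-test static` performs can be sketched as below; the
+frontmatter parsing is a simplification (a real implementation would use a YAML
+parser):
+
+```python
+import re
+
+REQUIRED_FIELDS = ("name", "description", "argument-hint", "user-invocable", "allowed-tools")
+
+def static_check(skill_text: str, verdicts: list[str]) -> list[str]:
+    """Return structural failures; an empty list means all checks pass."""
+    failures = []
+    # Frontmatter: a block delimited by --- lines at the top of the file
+    match = re.match(r"^---\n(.*?)\n---", skill_text, re.DOTALL)
+    frontmatter = match.group(1) if match else ""
+    for field in REQUIRED_FIELDS:
+        if f"{field}:" not in frontmatter:
+            failures.append(f"missing frontmatter field: {field}")
+    if len(re.findall(r"^#+ .*Phase", skill_text, re.MULTILINE)) < 2:
+        failures.append("fewer than 2 phase headings")
+    for verdict in verdicts:
+        if verdict not in skill_text:
+            failures.append(f"missing verdict keyword: {verdict}")
+    return failures
+```
+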
+---
+
+## Director Gate Checks
+
+None. `/adopt` is a brownfield onboarding utility. No director gates apply.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Existing Unity project with C# code detected
+
+**Fixture:**
+- `src/` contains `.cs` files with Unity-specific namespaces (`using UnityEngine;`)
+- No CLAUDE.md overrides, no `technical-preferences.md` beyond placeholders
+- Project has a recognizable folder structure (Assets/, Scripts/)
+
+**Input:** `/adopt`
+
+**Expected behavior:**
+1. Skill scans `src/` and detects C# files with Unity API imports
+2. Skill identifies engine as Unity, language as C#
+3. Skill produces a draft `technical-preferences.md` with engine/language fields populated
+4. Skill produces a draft CLAUDE.md stub with detected project structure
+5. Skill asks "May I write `technical-preferences.md`?" and then "May I write CLAUDE.md?"
+6. Files are written after approval; verdict is COMPLETE
+
+**Assertions:**
+- [ ] Engine detected as Unity (not Godot or Unreal)
+- [ ] Language detected as C#
+- [ ] Draft is shown to user before any "May I write" ask
+- [ ] "May I write" is asked separately for each file
+- [ ] Verdict is COMPLETE after both files are written
+
+---
+
+### Case 2: Mixed Languages — Partial analysis, asks user to clarify
+
+**Fixture:**
+- `src/` contains both `.gd` (GDScript) and `.cs` (C#) files
+- Engine cannot be definitively identified from the mix
+
+**Input:** `/adopt`
+
+**Expected behavior:**
+1. Skill scans source and detects conflicting language signals
+2. Skill reports: "Mixed language signals detected (GDScript + C#) — cannot auto-identify engine"
+3. Skill presents the ambiguous findings and asks the user to confirm: Godot with C# or Unity?
+4. After user clarifies, skill resumes analysis with confirmed engine
+5. Produces a PARTIAL analysis noting fields that required manual clarification
+
+**Assertions:**
+- [ ] Skill does NOT guess or silently pick an engine when signals conflict
+- [ ] Ambiguous findings are reported to the user explicitly
+- [ ] User choice is incorporated into the generated config
+- [ ] Verdict is PARTIAL (not COMPLETE) when manual clarification was required
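+
+The signal-detection logic this case exercises can be sketched as a pure
+classifier over file contents (markers are illustrative; a real scan would also
+look at project files such as `project.godot` or the `Assets/` layout):
+
+```python
+def detect_engine(files: dict[str, str]) -> str:
+    """Classify {relative path: contents} as 'unity', 'godot', or 'ambiguous'."""
+    gd = [p for p in files if p.endswith(".gd")]
+    cs = [p for p in files if p.endswith(".cs")]
+    if gd and cs:
+        return "ambiguous"  # mixed GDScript + C#: surface to the user, never guess
+    if any("using UnityEngine;" in files[p] for p in cs):
+        return "unity"
+    if gd:
+        return "godot"
+    return "ambiguous"      # no recognizable signals: fall back to asking
+```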
+
+---
+
+### Case 3: CLAUDE.md Already Exists — Offers merge rather than overwrite
+
+**Fixture:**
+- `CLAUDE.md` exists with custom content (project name, existing imports)
+- `technical-preferences.md` exists with some fields populated
+
+**Input:** `/adopt`
+
+**Expected behavior:**
+1. Skill reads existing CLAUDE.md and detects it is already populated
+2. Skill reports: "CLAUDE.md already exists — offering to merge, not overwrite"
+3. Skill presents a diff of new fields vs. existing content
+4. Skill asks "May I merge new fields into CLAUDE.md?" (not "May I write")
+5. If user approves: only new or changed fields are added; existing content preserved
+
+**Assertions:**
+- [ ] Skill does NOT overwrite existing CLAUDE.md without explicit user approval for a full replace
+- [ ] Merge option is offered when the file already exists
+- [ ] Diff is shown before the merge ask
+- [ ] Existing custom content is preserved in the merged output
+
+---
+
+### Case 4: No Source Code Found — Stops with error
+
+**Fixture:**
+- Repository has only documentation files (`.md`) and no source code in `src/`
+- No engine-identifiable files anywhere in the repo
+
+**Input:** `/adopt`
+
+**Expected behavior:**
+1. Skill scans `src/` and all likely code locations — finds nothing
+2. Skill outputs: "No source code detected — cannot perform brownfield analysis"
+3. Skill suggests alternatives: run `/start` for a new project, or point to a
+ different directory if source is located elsewhere
+4. No files are written
+
+**Assertions:**
+- [ ] Verdict is BLOCKED
+- [ ] Error message explicitly states no source code was found
+- [ ] Alternatives (`/start` or directory guidance) are provided
+- [ ] No "May I write" prompts appear (nothing to write)
+
+---
+
+### Case 5: Director Gate Check — No gate; adopt is a utility onboarding skill
+
+**Fixture:**
+- Existing project with detectable source code
+
+**Input:** `/adopt`
+
+**Expected behavior:**
+1. Skill completes full brownfield analysis and produces config files
+2. No director agents are spawned at any point
+3. No gate IDs (CD-*, TD-*, AD-*, PR-*) appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No gate skip messages appear
+- [ ] Skill reaches COMPLETE or PARTIAL without any gate verdict
+
+---
+
+## Protocol Compliance
+
+- [ ] Scans source before generating any config content
+- [ ] Shows draft config to user before asking to write
+- [ ] Asks "May I write" (or "May I merge") before each file operation
+- [ ] Detects existing files and offers merge path rather than silent overwrite
+- [ ] Ends with COMPLETE, PARTIAL, or BLOCKED verdict
+
+---
+
+## Coverage Notes
+
+- The Unreal Engine + Blueprint detection case (`.uasset`, `.umap` files)
+ follows the same happy path pattern as Case 1 and is not separately tested.
+- Multi-directory source layouts (monorepo style) are not tested; the skill
+ assumes a conventional single-project structure.
+- GDD skeleton generation from inferred design intent is noted as a capability
+ but not fixture-tested here — it follows from the PARTIAL analysis pattern.
diff --git a/CCGS Skill Testing Framework/skills/utility/asset-spec.md b/CCGS Skill Testing Framework/skills/utility/asset-spec.md
new file mode 100644
index 0000000..9e812c1
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/asset-spec.md
@@ -0,0 +1,179 @@
+# Skill Test Spec: /asset-spec
+
+## Skill Summary
+
+`/asset-spec` generates per-asset visual specification documents from design
+requirements. It reads the relevant GDD, art bible, and design system to produce
+a structured asset spec sheet that defines: dimensions, animation states (if
+applicable), color palette reference, style notes, technical constraints
+(format, file size budget), and deliverable checklist.
+
+Spec sheets are written to `assets/specs/[asset-name]-spec.md` after a "May I write"
+ask. If a spec already exists, the skill offers to update it. When multiple assets
+are requested in a single invocation, a "May I write" ask is made per asset. No
+director gates apply. The verdict is COMPLETE when all requested specs are written.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keyword: COMPLETE
+- [ ] Contains "May I write" collaborative protocol language (per asset)
+- [ ] Has a next-step handoff (e.g., assign to an artist, or `/asset-audit` later)
+
+---
+
+## Director Gate Checks
+
+None. `/asset-spec` is a design documentation utility. Technical artists may
+review specs separately, but that review is not a gate within this skill.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Enemy sprite spec with full GDD and art bible
+
+**Fixture:**
+- `design/gdd/enemies.md` exists with enemy variants defined
+- `design/art-bible.md` exists with color palette and style notes
+- No existing asset spec for "goblin-enemy"
+
+**Input:** `/asset-spec goblin-enemy`
+
+**Expected behavior:**
+1. Skill reads enemies GDD and art bible
+2. Skill generates a spec for the goblin enemy sprite:
+ - Dimensions: inferred from engine defaults or explicitly from GDD
+ - Animation states: idle, walk, attack, hurt, death
+ - Color palette reference: links to art-bible palette section
+ - Style notes: from art bible character design rules
+ - Technical constraints: format (PNG), size budget
+ - Deliverable checklist
+3. Skill asks "May I write to `assets/specs/goblin-enemy-spec.md`?"
+4. File written on approval; verdict is COMPLETE
+
+**Assertions:**
+- [ ] All 6 spec components are present (dimensions, animations, palette, style, tech, checklist)
+- [ ] Color palette reference links to art bible (not duplicated)
+- [ ] Animation states are drawn from GDD (not invented)
+- [ ] "May I write" is asked with the correct path
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 2: No Art Bible Found — Spec with Placeholder Style Notes, Dependency Flagged
+
+**Fixture:**
+- `design/gdd/player.md` exists
+- `design/art-bible.md` does NOT exist
+
+**Input:** `/asset-spec player-sprite`
+
+**Expected behavior:**
+1. Skill reads player GDD but cannot find the art bible
+2. Skill generates spec with placeholder style notes: "DEPENDENCY GAP: art bible
+ not found — style notes are placeholders"
+3. Color palette section uses: "TBD — see art bible when created"
+4. Skill asks "May I write to `assets/specs/player-sprite-spec.md`?"
+5. File written with placeholders and dependency flag; verdict is COMPLETE with advisory
+
+**Assertions:**
+- [ ] DEPENDENCY GAP is flagged for the missing art bible
+- [ ] Spec is still generated (not blocked)
+- [ ] Style notes contain placeholder markers, not invented styles
+- [ ] Verdict is COMPLETE with advisory note
+
+---
+
+### Case 3: Asset Spec Already Exists — Offers to Update
+
+**Fixture:**
+- `assets/specs/goblin-enemy-spec.md` already exists
+- GDD has been updated since the spec was written (new attack animation added)
+
+**Input:** `/asset-spec goblin-enemy`
+
+**Expected behavior:**
+1. Skill detects existing spec file
+2. Skill reports: "Asset spec already exists for goblin-enemy — checking for updates"
+3. Skill diffs GDD against existing spec and identifies: new "charge-attack" animation
+ state added in GDD but not in spec
+4. Skill presents the diff: "1 new animation state found — offering to update spec"
+5. Skill asks "May I update `assets/specs/goblin-enemy-spec.md`?" (not overwrite)
+6. Spec is updated; verdict is COMPLETE
+
+**Assertions:**
+- [ ] Existing spec is detected and "update" path is offered
+- [ ] Diff between GDD and existing spec is shown
+- [ ] "May I update" language is used (not "May I write")
+- [ ] Existing spec content is preserved; only the diff is applied
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 4: Multiple Assets Requested — May-I-Write Per Asset
+
+**Fixture:**
+- GDD and art bible exist
+- User requests specs for 3 assets: goblin-enemy, orc-enemy, treasure-chest
+
+**Input:** `/asset-spec goblin-enemy orc-enemy treasure-chest`
+
+**Expected behavior:**
+1. Skill generates all 3 specs in sequence
+2. For each asset, skill shows the draft and asks "May I write to
+ `assets/specs/[name]-spec.md`?" individually
+3. User can approve all 3 or skip individual assets
+4. All approved specs are written; verdict is COMPLETE
+
+**Assertions:**
+- [ ] "May I write" is asked 3 times (once per asset), not once for all
+- [ ] User can decline one asset without blocking the others
+- [ ] All 3 spec files are written for approved assets
+- [ ] Verdict is COMPLETE when all approved specs are written
+
+---
+
+### Case 5: Director Gate Check — No gate; asset-spec is a design utility
+
+**Fixture:**
+- GDD and art bible exist
+
+**Input:** `/asset-spec goblin-enemy`
+
+**Expected behavior:**
+1. Skill generates and writes the asset spec
+2. No director agents are spawned
+3. No gate IDs appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No gate skip messages appear
+- [ ] Verdict is COMPLETE without any gate check
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads GDD, art bible, and design system before generating spec
+- [ ] Includes all 6 spec components (dimensions, animations, palette, style, tech, checklist)
+- [ ] Flags missing dependencies (art bible, GDD) with DEPENDENCY GAP notes
+- [ ] Asks "May I write" (or "May I update") per asset
+- [ ] Handles multiple assets with individual write confirmations
+- [ ] Verdict is COMPLETE when all approved specs are written
+
+---
+
+## Coverage Notes
+
+- Audio asset specs (sound effects, music) follow the same structure with
+ different fields (duration, sample rate, looping) and are not separately tested.
+- UI asset specs (icons, button states) follow the same flow with interaction
+ state requirements aligned to the UX spec.
+- The case where GDD is also missing (neither GDD nor art bible exists) is not
+ separately tested; spec would be generated with both dependency gaps flagged.
diff --git a/CCGS Skill Testing Framework/skills/utility/brainstorm.md b/CCGS Skill Testing Framework/skills/utility/brainstorm.md
new file mode 100644
index 0000000..846ecf5
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/brainstorm.md
@@ -0,0 +1,189 @@
+# Skill Test Spec: /brainstorm
+
+## Skill Summary
+
+`/brainstorm` facilitates guided game concept ideation. It presents 2-4 concept
+options with pros/cons, lets the user choose and refine a concept, and produces
+a structured `design/gdd/game-concept.md` document. The skill is collaborative —
+it asks questions before proposing options and iterates until the user approves
+a concept direction.
+
+In `full` review mode, four director gates spawn in parallel after the concept
+is drafted: CD-PILLARS (creative-director), AD-CONCEPT-VISUAL (art-director),
+TD-FEASIBILITY (technical-director), and PR-SCOPE (producer). In `lean` mode,
+all 4 inline gates are skipped (lean mode only runs PHASE-GATEs, and brainstorm
+has none). In `solo` mode, all gates are skipped. The skill asks "May I write"
+before writing `design/gdd/game-concept.md`.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: APPROVED, REJECTED, CONCERNS
+- [ ] Contains "May I write" collaborative protocol language (for game-concept.md)
+- [ ] Has a next-step handoff at the end (`/map-systems`)
+- [ ] Documents 4 director gates in full mode: CD-PILLARS, AD-CONCEPT-VISUAL, TD-FEASIBILITY, PR-SCOPE
+- [ ] Documents that all 4 gates are skipped in lean and solo modes
+
+---
+
+## Director Gate Checks
+
+In `full` mode: CD-PILLARS, AD-CONCEPT-VISUAL, TD-FEASIBILITY, and PR-SCOPE
+spawn in parallel after the concept draft is approved by the user.
+
+In `lean` mode: all 4 inline gates are skipped (brainstorm has no PHASE-GATEs,
+so lean mode skips everything). Output notes all 4 as: "[GATE-ID] skipped — Lean mode".
+
+In `solo` mode: all 4 gates are skipped. Output notes all 4 as: "[GATE-ID] skipped — Solo mode".
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Full mode, 3 concepts, user picks one, all 4 directors approve
+
+**Fixture:**
+- No existing `design/gdd/game-concept.md`
+- `production/review-mode.txt` contains `full`
+
+**Input:** `/brainstorm`
+
+**Expected behavior:**
+1. Skill asks the user questions about genre, scope, and target feeling
+2. Skill presents 3 concept options with pros/cons each
+3. User selects one concept
+4. Skill elaborates the chosen concept into a structured draft
+5. All 4 director gates spawn in parallel: CD-PILLARS, AD-CONCEPT-VISUAL, TD-FEASIBILITY, PR-SCOPE
+6. All 4 return APPROVED
+7. Skill asks "May I write `design/gdd/game-concept.md`?"
+8. Concept written after approval
+
+**Assertions:**
+- [ ] Exactly 3 concept options are presented (not 1, not 5+)
+- [ ] All 4 director gates spawn in parallel (not sequentially)
+- [ ] All 4 gates complete before the "May I write" ask
+- [ ] "May I write `design/gdd/game-concept.md`?" is asked before writing
+- [ ] Concept file is NOT written without user approval
+- [ ] Next-step handoff to `/map-systems` is present
+
+---
+
+### Case 2: Failure Path — CD-PILLARS returns REJECTED
+
+**Fixture:**
+- Concept draft is complete
+- `production/review-mode.txt` contains `full`
+- CD-PILLARS gate returns REJECTED: "The concept has no identifiable creative pillar"
+
+**Input:** `/brainstorm`
+
+**Expected behavior:**
+1. CD-PILLARS gate returns REJECTED with specific feedback
+2. Skill surfaces the rejection to the user
+3. Concept is NOT written to file
+4. User is asked: rethink the concept direction, or override the rejection
+5. If rethinking: skill returns to the concept options phase
+
+**Assertions:**
+- [ ] Concept is NOT written when CD-PILLARS returns REJECTED
+- [ ] Rejection feedback is shown to the user verbatim
+- [ ] User is given the option to rethink or override
+- [ ] Skill returns to concept ideation phase if user chooses to rethink
+
+---
+
+### Case 3: Lean Mode — All 4 gates skipped; concept written after user confirms
+
+**Fixture:**
+- No existing game concept
+- `production/review-mode.txt` contains `lean`
+
+**Input:** `/brainstorm`
+
+**Expected behavior:**
+1. Concept options are presented and user selects one
+2. Concept is elaborated into a structured draft
+3. All 4 director gates are skipped — each noted: "[GATE-ID] skipped — Lean mode"
+4. Skill asks user to confirm the concept is ready to write
+5. "May I write `design/gdd/game-concept.md`?" asked after confirmation
+6. Concept written after approval
+
+**Assertions:**
+- [ ] All 4 gate skip notes appear: "CD-PILLARS skipped — Lean mode", "AD-CONCEPT-VISUAL skipped — Lean mode", "TD-FEASIBILITY skipped — Lean mode", "PR-SCOPE skipped — Lean mode"
+- [ ] Concept is written after user confirmation only (no director approval needed in lean)
+- [ ] "May I write" is still asked before writing
+
+---
+
+### Case 4: Solo Mode — All gates skipped; concept written with only user approval
+
+**Fixture:**
+- No existing game concept
+- `production/review-mode.txt` contains `solo`
+
+**Input:** `/brainstorm`
+
+**Expected behavior:**
+1. Concept options are presented and user selects one
+2. Concept draft is shown to user
+3. All 4 director gates are skipped — each noted with "Solo mode"
+4. "May I write `design/gdd/game-concept.md`?" asked
+5. Concept written after user approval
+
+**Assertions:**
+- [ ] All 4 skip notes appear with the "Solo mode" label
+- [ ] No director agents are spawned
+- [ ] Concept is written with only user approval
+- [ ] Behavior is otherwise equivalent to lean mode for this skill
+
+---
+
+### Case 5: Director Gate — PR-SCOPE returns CONCERNS (scope too large)
+
+**Fixture:**
+- Concept draft is complete
+- `production/review-mode.txt` contains `full`
+- PR-SCOPE gate returns CONCERNS: "The concept scope would require 18+ months for a solo developer"
+
+**Input:** `/brainstorm`
+
+**Expected behavior:**
+1. PR-SCOPE gate returns CONCERNS with specific scope feedback
+2. Skill surfaces the scope concerns to the user
+3. Scope concerns are documented in the concept draft before writing
+4. User is asked: reduce scope, accept concerns and document them, or rethink
+5. If concerns are accepted: concept is written with a "Scope Risk" note embedded
+
+**Assertions:**
+- [ ] PR-SCOPE concerns are shown to the user before the "May I write" ask
+- [ ] Skill does NOT write concept without surfacing scope concerns
+- [ ] If user accepts: scope concerns are documented in the concept file
+- [ ] Skill does NOT auto-reject a concept due to PR-SCOPE CONCERNS (user decides)
+
+---
+
+## Protocol Compliance
+
+- [ ] Presents 2-4 concept options with pros/cons before user commits
+- [ ] User confirms concept direction before director gates are invoked
+- [ ] All 4 director gates spawn in parallel in full mode
+- [ ] All 4 gates skipped in lean AND solo mode — each noted by name
+- [ ] "May I write `design/gdd/game-concept.md`?" asked before writing
+- [ ] Ends with next-step handoff: `/map-systems`
+
+---
+
+## Coverage Notes
+
+- AD-CONCEPT-VISUAL gate (art director feasibility) is grouped with the other
+ 3 gates in the parallel spawn — not independently fixture-tested.
+- The iterative concept refinement loop (user rejects all options, skill
+ generates new ones) is not fixture-tested — it follows the same pattern as
+ the option selection phase.
+- The game-concept.md document structure (required sections) is defined in the
+ skill body and not re-enumerated in test assertions.
diff --git a/CCGS Skill Testing Framework/skills/utility/bug-report.md b/CCGS Skill Testing Framework/skills/utility/bug-report.md
new file mode 100644
index 0000000..d514dce
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/bug-report.md
@@ -0,0 +1,174 @@
+# Skill Test Spec: /bug-report
+
+## Skill Summary
+
+`/bug-report` creates a structured bug report document from a user description.
+It produces a report with the following required fields: Title, Repro Steps,
+Expected Behavior, Actual Behavior, Severity (CRITICAL/HIGH/MEDIUM/LOW), Affected
+System(s), and Build/Version. If the user's initial description is missing any
+required field, the skill asks follow-up questions to fill the gaps before
+producing the draft.
+
+The skill checks for possibly duplicate reports (by comparing to existing files
+in `production/bugs/`) and offers to link rather than create a new report. Each
+report is written to `production/bugs/bug-[date]-[slug].md` after a "May I write"
+ask. No director gates are used — bug reporting is an operational utility.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keyword: COMPLETE
+- [ ] Contains "May I write" collaborative protocol language before writing the report
+- [ ] Has a next-step handoff (e.g., `/bug-triage` to reprioritize, `/hotfix` for critical)
+
+---
+
+## Director Gate Checks
+
+None. `/bug-report` is an operational documentation skill. No director gates apply.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — User describes a crash, full report produced
+
+**Fixture:**
+- `production/bugs/` directory exists and is empty
+- No similar existing reports
+
+**Input:** `/bug-report` (user describes: "Game crashes when player enters the boss arena")
+
+**Expected behavior:**
+1. Skill extracts: Title = "Game crashes when entering boss arena"
+2. Skill recognizes crash reports as CRITICAL severity
+3. Skill confirms repro steps, expected (no crash), actual (crash), affected system
+ (arena/boss), and build version with the user
+4. Skill drafts the full structured report
+5. Skill asks "May I write to `production/bugs/bug-2026-04-06-game-crashes-boss-arena.md`?"
+6. File is written on approval; verdict is COMPLETE
+
+**Assertions:**
+- [ ] All 7 required fields are present in the report
+- [ ] Severity is CRITICAL for a crash report
+- [ ] Filename follows the `bug-[date]-[slug].md` convention
+- [ ] "May I write" is asked with the full file path
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 2: Minimal Input — Skill asks follow-up questions for missing fields
+
+**Fixture:**
+- User provides: "Sometimes the audio cuts out"
+- No existing reports
+
+**Input:** `/bug-report`
+
+**Expected behavior:**
+1. Skill identifies missing required fields: repro steps, expected vs. actual,
+ severity, affected system, build
+2. Skill asks targeted follow-up questions for each missing field (one at a time
+ or in a structured prompt)
+3. User provides answers
+4. Skill compiles complete report from answers
+5. Skill asks "May I write?" and writes on approval
+
+**Assertions:**
+- [ ] At least 3 follow-up questions are asked to fill missing fields
+- [ ] Each required field is filled before the report is finalized
+- [ ] Report is not written until all required fields are present
+- [ ] Verdict is COMPLETE after all fields are filled and file is written
+
+---
+
+### Case 3: Possible Duplicate — Offers to link rather than create new
+
+**Fixture:**
+- `production/bugs/bug-2026-03-20-audio-cut-out.md` already exists with
+ similar title and MEDIUM severity
+
+**Input:** `/bug-report` (user describes: "Audio randomly stops working")
+
+**Expected behavior:**
+1. Skill scans existing reports and finds the similar audio bug
+2. Skill reports: "A similar bug report exists: bug-2026-03-20-audio-cut-out.md"
+3. Skill presents options: link as duplicate (add note to existing), create new anyway
+4. If user chooses link: skill adds a cross-reference note to the existing file
+ (asks "May I update the existing report?")
+5. If user chooses create new: normal report creation proceeds
+
+**Assertions:**
+- [ ] Existing similar report is surfaced before creating a new one
+- [ ] User is given the choice (not forced to link or create)
+- [ ] If linking: "May I update" is asked before modifying the existing file
+- [ ] Verdict is COMPLETE in either path
+
+---
+
+### Case 4: Multi-System Bug — Report created with multiple system tags
+
+**Fixture:**
+- No existing reports
+
+**Input:** `/bug-report` (user describes: "After finishing a level, the save system
+ freezes and the UI doesn't show the completion screen")
+
+**Expected behavior:**
+1. Skill identifies 2 affected systems from the description: Save System and UI
+2. Report is drafted with both systems listed under Affected System(s)
+3. Severity is assessed (likely HIGH — data loss risk from save freeze)
+4. Skill asks "May I write" with the appropriate filename
+5. Report is written with both systems tagged; verdict is COMPLETE
+
+**Assertions:**
+- [ ] Both affected systems are listed in the report
+- [ ] Single report is created (not one per system)
+- [ ] Severity reflects the most impactful component (save freeze → HIGH or CRITICAL)
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 5: Director Gate Check — No gate; bug reporting is operational
+
+**Fixture:**
+- Any bug description provided
+
+**Input:** `/bug-report`
+
+**Expected behavior:**
+1. Skill creates and writes the bug report
+2. No director agents are spawned
+3. No gate IDs appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No gate skip messages appear
+- [ ] Skill reaches COMPLETE without any gate check
+
+---
+
+## Protocol Compliance
+
+- [ ] Collects all 7 required fields before drafting the report
+- [ ] Asks follow-up questions for any missing required fields
+- [ ] Checks for similar existing reports before creating a new one
+- [ ] Asks "May I write to `production/bugs/bug-[date]-[slug].md`?" before writing
+- [ ] Verdict is COMPLETE when the report file is written
+
+---
+
+## Coverage Notes
+
+- The case where the user provides a severity that seems too low for the
+ described impact (e.g., LOW for a crash) is not tested; the skill may suggest
+ a higher severity but ultimately respects user input.
+- Build/version field is required but may be "unknown" if the user doesn't know —
+ this is accepted as a valid value and not tested separately.
+- Report slug generation (sanitizing the title into a filename) is an
+ implementation detail not assertion-tested here.
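+A minimal slug sketch, assuming a simple lowercase-and-hyphenate rule; the
+skill's actual sanitization logic is unspecified, so this is illustrative only:
+
```python
import re

def bug_slug(title):
    """Lower-case the title, collapse non-alphanumeric runs into hyphens,
    and trim stray hyphens -- one plausible slug rule, not the skill's."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

# Filenames follow the bug-[date]-[slug].md convention from Case 1.
print(f"bug-2026-04-06-{bug_slug('Game crashes when entering boss arena!')}.md")
# -> bug-2026-04-06-game-crashes-when-entering-boss-arena.md
```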
diff --git a/CCGS Skill Testing Framework/skills/utility/bug-triage.md b/CCGS Skill Testing Framework/skills/utility/bug-triage.md
new file mode 100644
index 0000000..980d178
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/bug-triage.md
@@ -0,0 +1,174 @@
+# Skill Test Spec: /bug-triage
+
+## Skill Summary
+
+`/bug-triage` reads all open bug reports in `production/bugs/` and produces a
+prioritized triage table sorted by severity (CRITICAL → HIGH → MEDIUM → LOW).
+It runs on the Haiku model (read-only, formatting/sorting task) and produces no
+file writes — the triage output is conversational. The skill flags bugs missing
+reproduction steps and identifies possible duplicates by comparing titles and
+affected systems.
+
+The verdict is always TRIAGED — the skill is advisory and informational. No
+director gates apply. The output is intended to help a producer or QA lead
+prioritize which bugs to address next.
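+The sort rule above reduces to a two-part key: severity rank first, then date
+within a rank. This sketch is illustrative only; the report field names and
+shapes are assumptions, not the skill's actual parsing logic.
+
```python
# Hypothetical triage ordering for /bug-triage: CRITICAL first, then
# HIGH, MEDIUM, LOW; oldest report first within the same severity.
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}

def triage_order(reports):
    """Sort bug-report dicts into triage-table order."""
    # ISO dates compare correctly as strings, so a plain tuple key works.
    return sorted(reports, key=lambda r: (SEVERITY_RANK[r["severity"]], r["date"]))

reports = [
    {"title": "vfx flicker", "severity": "HIGH", "date": "2026-03-16"},
    {"title": "audio crash", "severity": "CRITICAL", "date": "2026-03-10"},
    {"title": "score overflow", "severity": "HIGH", "date": "2026-03-12"},
]
print([r["title"] for r in triage_order(reports)])
# -> ['audio crash', 'score overflow', 'vfx flicker']
```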
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keyword: TRIAGED
+- [ ] Does NOT contain "May I write" language (skill is read-only)
+- [ ] Has a next-step handoff (e.g., `/bug-report` to create new reports, `/hotfix` for critical bugs)
+
+---
+
+## Director Gate Checks
+
+None. `/bug-triage` is a read-only advisory skill. No director gates apply.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — 5 bugs of varying severity, sorted table produced
+
+**Fixture:**
+- `production/bugs/` contains 5 bug report files:
+ - bug-2026-03-10-audio-crash.md (CRITICAL)
+ - bug-2026-03-12-score-overflow.md (HIGH)
+ - bug-2026-03-14-ui-overlap.md (MEDIUM)
+ - bug-2026-03-15-typo-tutorial.md (LOW)
+ - bug-2026-03-16-vfx-flicker.md (HIGH)
+
+**Input:** `/bug-triage`
+
+**Expected behavior:**
+1. Skill reads all 5 bug report files
+2. Skill extracts severity, title, system, and repro status from each
+3. Skill produces a triage table sorted: CRITICAL first, then HIGH, MEDIUM, LOW
+4. Within the same severity, bugs are ordered by date (oldest first)
+5. Verdict is TRIAGED
+
+**Assertions:**
+- [ ] Triage table has exactly 5 rows
+- [ ] CRITICAL bug appears before both HIGH bugs
+- [ ] HIGH bugs appear before MEDIUM and LOW bugs
+- [ ] Verdict is TRIAGED
+- [ ] No files are written
+
+---
+
+### Case 2: No Bug Reports Found — Guidance to run /bug-report
+
+**Fixture:**
+- `production/bugs/` directory exists but is empty (or does not exist)
+
+**Input:** `/bug-triage`
+
+**Expected behavior:**
+1. Skill scans `production/bugs/` and finds no reports
+2. Skill outputs: "No open bug reports found in production/bugs/"
+3. Skill suggests running `/bug-report` to create a bug report
+4. No triage table is produced
+
+**Assertions:**
+- [ ] Output explicitly states no bugs were found
+- [ ] `/bug-report` is suggested as the next step
+- [ ] Skill does not error out — it handles empty directory gracefully
+- [ ] Verdict is TRIAGED (with "no bugs found" context)
+
+---
+
+### Case 3: Bug Missing Reproduction Steps — Flagged as NEEDS REPRO INFO
+
+**Fixture:**
+- `production/bugs/` contains 3 bug reports; one has an empty "Repro Steps" section
+
+**Input:** `/bug-triage`
+
+**Expected behavior:**
+1. Skill reads all 3 reports
+2. Skill detects the report with no repro steps
+3. That bug appears in the triage table with a `NEEDS REPRO INFO` tag
+4. Other bugs are triaged normally
+5. Verdict is TRIAGED
+
+**Assertions:**
+- [ ] `NEEDS REPRO INFO` tag appears next to the bug missing repro steps
+- [ ] The flagged bug is still included in the table (not excluded)
+- [ ] Other bugs are unaffected
+- [ ] Verdict is TRIAGED
+
+---
+
+### Case 4: Possible Duplicate Bugs — Flagged in triage output
+
+**Fixture:**
+- `production/bugs/` contains 2 bug reports with similar titles:
+ - bug-2026-03-18-player-fall-through-floor.md
+ - bug-2026-03-20-player-clips-through-floor.md
+ - Both affect the "Physics" system with identical severity
+
+**Input:** `/bug-triage`
+
+**Expected behavior:**
+1. Skill reads both reports and detects similar title + same system + same severity
+2. Both bugs are included in the triage table
+3. Each is tagged with `POSSIBLE DUPLICATE` and cross-references the other report
+4. No bugs are merged or deleted — flagging is advisory
+5. Verdict is TRIAGED
+
+**Assertions:**
+- [ ] Both bugs appear in the table (not merged)
+- [ ] Both are tagged `POSSIBLE DUPLICATE`
+- [ ] Each cross-references the other (by filename or title)
+- [ ] Verdict is TRIAGED
+
+---
+
+### Case 5: Director Gate Check — No gate; triage is advisory
+
+**Fixture:**
+- `production/bugs/` contains any number of reports
+
+**Input:** `/bug-triage`
+
+**Expected behavior:**
+1. Skill produces the triage table
+2. No director agents are spawned
+3. No gate IDs appear in output
+4. No write tool is called
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No write tool is called
+- [ ] No gate skip messages appear
+- [ ] Verdict is TRIAGED without any gate check
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads all files in `production/bugs/` before generating the table
+- [ ] Sorts by severity (CRITICAL → HIGH → MEDIUM → LOW)
+- [ ] Flags bugs missing repro steps
+- [ ] Flags possible duplicates by title/system similarity
+- [ ] Does not write any files
+- [ ] Verdict is TRIAGED in all cases (even empty)
+
+---
+
+## Coverage Notes
+
+- The case where a bug report is malformed (missing severity field entirely)
+ is not fixture-tested; skill would flag it as `UNKNOWN SEVERITY` and sort it
+ last in the table.
+- Status transitions (marking bugs as resolved) are outside this skill's scope —
+ bug-triage is read-only.
+- The duplicate detection heuristic (title similarity + same system) is
+ approximate; exact matching logic is defined in the skill body.
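+One way to picture that heuristic, purely illustrative and not the skill's
+defined logic, is a title-similarity ratio gated on a same-system check:
+
```python
import difflib

def possible_duplicate(a, b, threshold=0.6):
    """Flag two bug-report dicts as possible duplicates when their titles
    are similar and they affect the same system. Threshold is arbitrary."""
    if a["system"] != b["system"]:
        return False
    ratio = difflib.SequenceMatcher(
        None, a["title"].lower(), b["title"].lower()).ratio()
    return ratio >= threshold

bug_a = {"title": "player fall through floor", "system": "Physics"}
bug_b = {"title": "player clips through floor", "system": "Physics"}
print(possible_duplicate(bug_a, bug_b))
# -> True
```
+
+Any pair at or above the (arbitrary) threshold with a matching system would
+earn the `POSSIBLE DUPLICATE` tag in the triage table.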
diff --git a/CCGS Skill Testing Framework/skills/utility/day-one-patch.md b/CCGS Skill Testing Framework/skills/utility/day-one-patch.md
new file mode 100644
index 0000000..f9c4881
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/day-one-patch.md
@@ -0,0 +1,175 @@
+# Skill Test Spec: /day-one-patch
+
+## Skill Summary
+
+`/day-one-patch` prepares a day-one patch plan for issues that are known at
+launch but deferred from the v1.0 release. It reads open bug reports in
+`production/bugs/`, deferred acceptance criteria from story files (stories
+marked `Status: Done` but with noted deferred ACs), and produces a prioritized
+patch plan with estimated fix timelines per issue.
+
+The patch plan is written to `production/releases/day-one-patch.md` after a
+"May I write" ask. If a P0 (critical post-ship) issue is discovered, the skill
+triggers guidance to run `/hotfix` before the patch. No director gates apply.
+The verdict is always COMPLETE.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keyword: COMPLETE
+- [ ] Contains "May I write" collaborative protocol language before writing the plan
+- [ ] Has a next-step handoff (e.g., `/hotfix` for P0 issues, `/release-checklist` for follow-up)
+
+---
+
+## Director Gate Checks
+
+None. `/day-one-patch` is a release planning utility. No director gates apply.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — 3 Known Issues, Patch Plan With Fix Estimates
+
+**Fixture:**
+- `production/bugs/` contains 3 open bugs with severities: 1 MEDIUM, 2 LOW
+- No deferred ACs in sprint stories
+- All bugs have repro steps and system identifications
+
+**Input:** `/day-one-patch`
+
+**Expected behavior:**
+1. Skill reads all 3 open bugs
+2. Skill assigns fix effort estimates: MEDIUM bug = 1-2 days, LOW bugs = 4 hours each
+3. Skill produces a patch plan prioritizing MEDIUM bug first
+4. Plan includes: priority order, estimated timeline, responsible system, fix description
+5. Skill asks "May I write to `production/releases/day-one-patch.md`?"
+6. File written; verdict is COMPLETE
+
+**Assertions:**
+- [ ] All 3 bugs appear in the plan
+- [ ] Bugs are prioritized by severity (MEDIUM before LOW)
+- [ ] Fix estimates are provided per issue
+- [ ] "May I write" is asked before writing
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 2: Critical Issue Discovered Post-Ship — P0, Triggers /hotfix Guidance
+
+**Fixture:**
+- A CRITICAL severity bug is found in `production/bugs/` after the v1.0 release
+- The bug causes data loss for all save files
+
+**Input:** `/day-one-patch`
+
+**Expected behavior:**
+1. Skill reads bugs and identifies the CRITICAL severity issue
+2. Skill escalates: "P0 ISSUE DETECTED — data loss bug requires immediate hotfix
+ before patch planning can proceed"
+3. Skill does NOT include the P0 issue in the patch plan timeline
+4. Skill explicitly directs: "Run `/hotfix` to resolve this issue first"
+5. After P0 guidance is issued: plan for remaining lower-severity bugs is still
+ generated and written; verdict is COMPLETE
+
+**Assertions:**
+- [ ] P0 escalation message appears prominently before the patch plan
+- [ ] `/hotfix` is explicitly recommended for the P0 issue
+- [ ] P0 issue is NOT scheduled in the patch plan timeline (it needs immediate action)
+- [ ] Non-P0 issues are still planned; verdict is COMPLETE
+
+---
+
+### Case 3: Deferred AC From Story-Done — Pulled Into Patch Plan Automatically
+
+**Fixture:**
+- `production/sprints/sprint-008.md` has a story with `Status: Done` and a note:
+ "DEFERRED AC: Gamepad vibration on damage — deferred to post-launch patch"
+- No open bugs for the same system
+
+**Input:** `/day-one-patch`
+
+**Expected behavior:**
+1. Skill reads sprint stories and detects the deferred AC note
+2. Deferred AC is automatically included in the patch plan as a work item
+3. Plan entry: "Deferred from sprint-008: Gamepad vibration on damage"
+4. Fix estimate is assigned; patch plan written after "May I write" approval
+5. Verdict is COMPLETE
+
+**Assertions:**
+- [ ] Deferred ACs from story files are automatically pulled into the plan
+- [ ] Deferred items are labeled by their source story (sprint-008)
+- [ ] Deferred AC gets a fix estimate like bug entries
+- [ ] Verdict is COMPLETE
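+The deferred-AC scan can be pictured as a marker search over story text. The
+`DEFERRED AC:` prefix comes from the fixture above; the function name and
+label format are assumptions for illustration:
+
```python
import re

def find_deferred_acs(story_text, source):
    """Collect 'DEFERRED AC: ...' notes from a story file's text and
    label each with its source sprint, as the patch plan entries do."""
    notes = re.findall(r"DEFERRED AC:\s*(.+)", story_text)
    return [f"Deferred from {source}: {note.strip()}" for note in notes]

story = "Status: Done\nDEFERRED AC: Gamepad vibration on damage\n"
print(find_deferred_acs(story, "sprint-008"))
# -> ['Deferred from sprint-008: Gamepad vibration on damage']
```
+
+In practice the story files would be gathered from `production/sprints/`
+before scanning.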
+
+---
+
+### Case 4: No Known Issues — Empty Plan With Template Note
+
+**Fixture:**
+- `production/bugs/` is empty
+- No stories have deferred ACs
+
+**Input:** `/day-one-patch`
+
+**Expected behavior:**
+1. Skill reads bugs — none found
+2. Skill reads story deferred ACs — none found
+3. Skill produces an empty patch plan with a note: "No known issues at launch"
+4. Template structure is preserved (headers intact) for future use
+5. Skill asks "May I write to `production/releases/day-one-patch.md`?"
+6. File written; verdict is COMPLETE
+
+**Assertions:**
+- [ ] "No known issues at launch" note appears in the written file
+- [ ] Template headers are present in the empty plan
+- [ ] Skill does NOT error out when there are no issues to plan
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 5: Director Gate Check — No gate; day-one-patch is a planning utility
+
+**Fixture:**
+- Known issues present in production/bugs/
+
+**Input:** `/day-one-patch`
+
+**Expected behavior:**
+1. Skill generates and writes the patch plan
+2. No director agents are spawned
+3. No gate IDs appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No gate skip messages appear
+- [ ] Verdict is COMPLETE without any gate check
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads open bugs from `production/bugs/` before generating the plan
+- [ ] Scans story files for deferred AC notes
+- [ ] Escalates CRITICAL (P0) bugs with explicit `/hotfix` guidance
+- [ ] Produces an empty plan with note when no issues exist (not an error)
+- [ ] Asks "May I write to `production/releases/day-one-patch.md`?" before writing
+- [ ] Verdict is COMPLETE in all paths
+
+---
+
+## Coverage Notes
+
+- The case where multiple CRITICAL bugs exist is handled the same as Case 2;
+ all P0 issues are escalated together.
+- Timeline estimation for the patch (e.g., "patch available in 3 days")
+ requires manual QA and build time estimates; the skill uses rough estimates
+ based on severity, not actual team velocity.
+- The player-facing patch notes document is produced by `/patch-notes`, a
+  separate skill invoked after the patch plan is executed.
diff --git a/CCGS Skill Testing Framework/skills/utility/help.md b/CCGS Skill Testing Framework/skills/utility/help.md
new file mode 100644
index 0000000..b7d127f
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/help.md
@@ -0,0 +1,172 @@
+# Skill Test Spec: /help
+
+## Skill Summary
+
+`/help` analyzes what has been done and what comes next in the project workflow.
+It runs on the Haiku model (read-only, formatting task) and reads `production/stage.txt`,
+the active sprint file, and recent session state to produce a concise situational
+guidance summary. The skill optionally accepts a context query (e.g., `/help testing`)
+to surface relevant skills for a specific topic.
+
+The output is always informational — no files are written and no director gates
+are invoked. The verdict is always HELP COMPLETE. The skill serves as a workflow
+navigator, suggesting 2-3 next skills based on the current project state.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keyword: HELP COMPLETE
+- [ ] Does NOT contain "May I write" language (skill is read-only)
+- [ ] Has a next-step handoff (suggests 2-3 relevant skills based on state)
+
+---
+
+## Director Gate Checks
+
+None. `/help` is a read-only navigation skill. No director gates apply.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Production stage with active sprint
+
+**Fixture:**
+- `production/stage.txt` contains `Production`
+- `production/sprints/sprint-004.md` exists with in-progress stories
+- `production/session-state/active.md` has a recent checkpoint
+
+**Input:** `/help`
+
+**Expected behavior:**
+1. Skill reads stage.txt and active sprint
+2. Skill identifies current sprint number and in-progress story count
+3. Skill outputs: current stage, sprint summary, and 3 suggested next skills
+ (e.g., `/sprint-status`, `/dev-story`, `/story-done`)
+4. Suggestions are ranked by relevance to current sprint state
+5. Verdict is HELP COMPLETE
+
+**Assertions:**
+- [ ] Current stage is shown (Production)
+- [ ] Active sprint number and story count are mentioned
+- [ ] 2-3 next-skill suggestions are given (not a list of all skills)
+- [ ] Suggestions are appropriate for Production stage
+- [ ] Verdict is HELP COMPLETE
+- [ ] No files are written
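+
+The ranking step in (4) can be pictured as a stage-keyed lookup followed by a
+state-aware cut-off. A pseudocode sketch (the mapping and the
+`rank_by_relevance` helper are illustrative, assembled from the cases in this
+spec, not taken from the skill body):
+
+```
+# Illustrative only — the real ranking lives in the skill body
+SUGGESTIONS = {
+    "Concept":    ["/brainstorm", "/map-systems"],
+    "Production": ["/sprint-status", "/dev-story", "/story-done"],
+}
+
+def suggest(stage, sprint_state):
+    candidates = SUGGESTIONS.get(stage, ["/start"])
+    ranked = rank_by_relevance(candidates, sprint_state)  # hypothetical helper
+    return ranked[:3]   # cap at 2-3 suggestions, never the full catalog
+```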
+
+---
+
+### Case 2: Concept Stage — Shows concept-to-systems-design workflow path
+
+**Fixture:**
+- `production/stage.txt` contains `Concept`
+- No sprint files, no GDD files
+- `technical-preferences.md` is configured (engine selected)
+
+**Input:** `/help`
+
+**Expected behavior:**
+1. Skill reads stage.txt — detects Concept stage
+2. Skill outputs the Concept-stage workflow: brainstorm → map-systems → design-system
+3. Suggested skills are `/brainstorm` first, then `/map-systems` once a concept exists
+4. Current progress is noted: "Engine configured, concept not yet created"
+
+**Assertions:**
+- [ ] Stage is identified as Concept
+- [ ] Workflow path shows the expected sequence for this stage
+- [ ] Suggestions do not include Production-stage skills (e.g., `/dev-story`)
+- [ ] Verdict is HELP COMPLETE
+
+---
+
+### Case 3: No stage.txt — Shows full workflow overview
+
+**Fixture:**
+- No `production/stage.txt`
+- No sprint files
+- `technical-preferences.md` has placeholders
+
+**Input:** `/help`
+
+**Expected behavior:**
+1. Skill cannot determine stage from stage.txt
+2. Skill runs project-stage-detect logic to infer stage from artifacts
+3. If stage cannot be inferred: outputs the full workflow overview from
+ Concept through Release as a reference map
+4. Primary suggestion is `/start` to begin configuration
+
+**Assertions:**
+- [ ] Skill does not crash when stage.txt is absent
+- [ ] Full workflow overview is shown when stage cannot be determined
+- [ ] `/start` or `/project-stage-detect` is a top suggestion
+- [ ] Verdict is HELP COMPLETE
+
+---
+
+### Case 4: Context Query — User asks for help with testing
+
+**Fixture:**
+- `production/stage.txt` contains `Production`
+- Active sprint has a story with `Status: In Review`
+
+**Input:** `/help testing`
+
+**Expected behavior:**
+1. Skill reads context query: "testing"
+2. Skill surfaces skills relevant to testing: `/qa-plan`, `/smoke-check`,
+ `/regression-suite`, `/test-setup`, `/test-evidence-review`
+3. Output is focused on testing workflow, not general sprint navigation
+4. Currently in-review story is highlighted as a testing candidate
+
+**Assertions:**
+- [ ] Context query is acknowledged in output ("Help topic: testing")
+- [ ] At least 3 testing-relevant skills are listed
+- [ ] General sprint skills (e.g., `/sprint-plan`) are not the primary suggestions
+- [ ] Verdict is HELP COMPLETE
+
+---
+
+### Case 5: Director Gate Check — No gate; help is read-only navigation
+
+**Fixture:**
+- Any project state
+
+**Input:** `/help`
+
+**Expected behavior:**
+1. Skill produces workflow guidance summary
+2. No director agents are spawned
+3. No gate IDs appear in output
+4. No write tool is called
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No write tool is called
+- [ ] No gate skip messages appear
+- [ ] Verdict is HELP COMPLETE without any gate check
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads stage, sprint, and session state before generating suggestions
+- [ ] Suggestions are specific to the current project state (not generic)
+- [ ] Context query (if provided) narrows the suggestion set
+- [ ] Does not write any files
+- [ ] Verdict is HELP COMPLETE in all cases
+
+---
+
+## Coverage Notes
+
+- The case where the active sprint is complete (all stories Done) is not
+ separately tested; the skill would suggest `/sprint-plan` for the next sprint.
+- The `/help` skill does not validate whether suggested skills are available —
+ it assumes standard skill catalog availability.
+- Stage detection fallback (when stage.txt is absent) delegates to the same
+ logic as `/project-stage-detect` and is not re-tested here in detail.
diff --git a/CCGS Skill Testing Framework/skills/utility/hotfix.md b/CCGS Skill Testing Framework/skills/utility/hotfix.md
new file mode 100644
index 0000000..27d5df3
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/hotfix.md
@@ -0,0 +1,173 @@
+# Skill Test Spec: /hotfix
+
+## Skill Summary
+
+`/hotfix` manages an emergency fix workflow: it creates a hotfix branch from
+main, applies a targeted fix to the identified file(s), runs `/smoke-check` to
+validate the fix doesn't introduce regressions, and prompts the user to confirm
+merge back to main. Each code change requires a "May I write to [filepath]?" ask.
+Git operations (branch creation, merge) are presented as Bash commands for user
+confirmation before execution.
+
+The skill is time-sensitive — director review is optional post-hoc, not a
+blocking gate. Verdicts: HOTFIX COMPLETE (fix applied, smoke check passed, merged)
+or HOTFIX BLOCKED (fix introduced regression or user declined).
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: HOTFIX COMPLETE, HOTFIX BLOCKED
+- [ ] Contains "May I write" language for code changes
+- [ ] Has a next-step handoff (e.g., `/bug-report` to document the issue, or version bump)
+
+---
+
+## Director Gate Checks
+
+None. Hotfixes are time-critical. Director review may follow separately as a
+post-hoc step. No gate is invoked within this skill.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Critical crash bug fixed, smoke check passes
+
+**Fixture:**
+- `main` branch is clean
+- Bug is identified in `src/gameplay/arena.gd` (crash on boss arena entry)
+- Repro steps are provided by user
+
+**Input:** `/hotfix` (user describes the crash and affected file)
+
+**Expected behavior:**
+1. Skill proposes creating a hotfix branch: `hotfix/boss-arena-crash`
+2. User confirms; Bash command for branch creation is shown and confirmed
+3. Skill identifies the fix location in `arena.gd` and drafts the change
+4. Skill asks "May I write to `src/gameplay/arena.gd`?" and applies fix on approval
+5. Skill runs `/smoke-check` — PASS
+6. Skill presents the merge command and asks user to confirm merge to `main`
+7. User confirms; merge executes; verdict is HOTFIX COMPLETE
+
+**Assertions:**
+- [ ] Hotfix branch is created before any code changes
+- [ ] "May I write" is asked before modifying any source file
+- [ ] `/smoke-check` runs after the fix is applied
+- [ ] Merge requires explicit user confirmation (not automatic)
+- [ ] Verdict is HOTFIX COMPLETE after successful merge
+
+---
+
+### Case 2: Smoke Check Fails — HOTFIX BLOCKED
+
+**Fixture:**
+- Fix has been applied to `src/gameplay/arena.gd`
+- `/smoke-check` returns FAIL: "Player health clamping regression detected"
+
+**Input:** `/hotfix`
+
+**Expected behavior:**
+1. Skill applies the fix and runs `/smoke-check`
+2. Smoke check returns FAIL with specific regression identified
+3. Skill reports: "HOTFIX BLOCKED — smoke check failed: [regression detail]"
+4. Skill presents options: attempt revised fix, revert changes, or merge with
+ known regression (user acknowledges risk)
+5. No automatic merge occurs when smoke check fails
+
+**Assertions:**
+- [ ] Verdict is HOTFIX BLOCKED
+- [ ] Smoke check failure is shown verbatim to user
+- [ ] Merge is NOT performed automatically when smoke check fails
+- [ ] User is given explicit options for how to proceed
+
+---
+
+### Case 3: Fix to Already-Released Build — Version tag noted, patch bump prompted
+
+**Fixture:**
+- Latest git tag is `v1.2.0`
+- Hotfix targets a bug in the v1.2.0 release
+
+**Input:** `/hotfix`
+
+**Expected behavior:**
+1. Skill detects that the current HEAD is a tagged release (v1.2.0)
+2. Skill notes: "Hotfix targeting tagged release v1.2.0"
+3. After smoke check passes, skill prompts: "Should version be bumped to v1.2.1?"
+4. If user confirms version bump: skill asks "May I write to VERSION or equivalent?"
+5. After version update and merge: verdict is HOTFIX COMPLETE with version noted
+
+**Assertions:**
+- [ ] Version tag context is detected and surfaced to user
+- [ ] Patch version bump is suggested (not required) after merge
+- [ ] Version bump requires its own "May I write" confirmation
+- [ ] Verdict is HOTFIX COMPLETE
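+
+The suggested bump follows standard semver patch rules, and the tagged-release
+detection in step 1 typically corresponds to something like
+`git describe --exact-match --tags HEAD`, which fails when HEAD is not on a tag.
+A pseudocode sketch of the version transformation (tag format assumed to be
+`vMAJOR.MINOR.PATCH`):
+
+```
+def bump_patch(tag):                      # "v1.2.0" -> "v1.2.1"
+    major, minor, patch = tag.lstrip("v").split(".")
+    return f"v{major}.{minor}.{int(patch) + 1}"
+```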
+
+---
+
+### Case 4: No Repro Steps — Skill Asks Before Applying Fix
+
+**Fixture:**
+- User invokes `/hotfix` with a vague description: "something is broken on level 3"
+- No repro steps provided
+
+**Input:** `/hotfix` (vague description)
+
+**Expected behavior:**
+1. Skill detects insufficient information to identify the fix location
+2. Skill asks: "Please provide reproduction steps and the affected file or system"
+3. Skill does NOT create a branch or modify any file until repro steps are provided
+4. After user provides repro steps: normal hotfix flow begins
+
+**Assertions:**
+- [ ] No branch is created without repro steps
+- [ ] No code changes are made without a clearly identified fix location
+- [ ] Repro step request is specific (not a generic "please provide more info")
+- [ ] Normal hotfix flow resumes after user provides repro steps
+
+---
+
+### Case 5: Director Gate Check — No gate; hotfixes are time-critical
+
+**Fixture:**
+- Critical bug with repro steps identified
+
+**Input:** `/hotfix`
+
+**Expected behavior:**
+1. Skill completes the hotfix workflow
+2. No director agents are spawned during execution
+3. No gate IDs appear in output
+4. Post-hoc director review (if needed) is a manual follow-up, not invoked here
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No gate skip messages appear
+- [ ] Verdict is HOTFIX COMPLETE or HOTFIX BLOCKED — no gate verdict
+
+---
+
+## Protocol Compliance
+
+- [ ] Creates hotfix branch before making any code changes
+- [ ] Asks "May I write" before modifying any source files
+- [ ] Runs `/smoke-check` after applying the fix
+- [ ] Requires explicit user confirmation before merging
+- [ ] HOTFIX BLOCKED when smoke check fails — no automatic merge
+- [ ] Verdict is HOTFIX COMPLETE or HOTFIX BLOCKED
+
+---
+
+## Coverage Notes
+
+- The case where multiple files need to be modified for one fix follows the same
+ "May I write" per-file pattern and is not separately tested.
+- The post-hotfix steps (create bug report, update changelog) are suggested in
+ the handoff but not tested as part of this skill's execution.
+- Conflict resolution during the merge (if main has diverged) is not tested;
+ the skill would surface the conflict and ask the user to resolve it manually.
diff --git a/CCGS Skill Testing Framework/skills/utility/launch-checklist.md b/CCGS Skill Testing Framework/skills/utility/launch-checklist.md
new file mode 100644
index 0000000..0063495
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/launch-checklist.md
@@ -0,0 +1,180 @@
+# Skill Test Spec: /launch-checklist
+
+## Skill Summary
+
+`/launch-checklist` generates and evaluates a complete launch readiness checklist
+covering: legal compliance (EULA, privacy policy, ESRB/PEGI ratings), platform
+certification status, store page completeness (screenshots, description, metadata),
+build validation (version tag, reproducible build), analytics and crash reporting
+configuration, and first-run experience verification.
+
+The skill produces a checklist report written to `production/launch/launch-checklist-[date].md`
+after a "May I write" ask. If a previous launch checklist exists, it compares the
+new results against the old to highlight newly resolved and newly blocked items. No
+director gates apply — `/team-release` orchestrates the full release pipeline. Verdicts:
+LAUNCH READY, LAUNCH BLOCKED, or CONCERNS.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: LAUNCH READY, LAUNCH BLOCKED, CONCERNS
+- [ ] Contains "May I write" collaborative protocol language before writing the checklist
+- [ ] Has a next-step handoff (e.g., `/team-release` or `/day-one-patch`)
+
+---
+
+## Director Gate Checks
+
+None. `/launch-checklist` is a readiness audit utility. The full release pipeline
+is managed by `/team-release`.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — All Checklist Items Verified, LAUNCH READY
+
+**Fixture:**
+- Legal docs present: EULA, privacy policy in `production/legal/`
+- Platform certification: marked as submitted and approved in production notes
+- Store page assets: screenshots, description, metadata all present in `production/store/`
+- Build: version tag `v1.0.0` exists, reproducible build confirmed
+- Crash reporting: configured in `technical-preferences.md`
+
+**Input:** `/launch-checklist`
+
+**Expected behavior:**
+1. Skill checks all checklist categories
+2. All items pass their verification checks
+3. Skill produces checklist report with all items marked PASS
+4. Skill asks "May I write to `production/launch/launch-checklist-2026-04-06.md`?"
+5. Report written on approval; verdict is LAUNCH READY
+
+**Assertions:**
+- [ ] All checklist categories are checked (legal, platform, store, build, analytics, UX)
+- [ ] All items appear in the report with PASS markers
+- [ ] Verdict is LAUNCH READY
+- [ ] "May I write" is asked with the correct dated filename
+
+---
+
+### Case 2: Platform Certification Not Submitted — LAUNCH BLOCKED
+
+**Fixture:**
+- All other checklist items pass
+- Platform certification section: "not submitted" (no submission record found)
+
+**Input:** `/launch-checklist`
+
+**Expected behavior:**
+1. Skill checks all items
+2. Platform certification check fails: no submission record
+3. Skill reports: "LAUNCH BLOCKED — Platform certification not submitted"
+4. Specific platform(s) missing certification are named
+5. Verdict is LAUNCH BLOCKED
+
+**Assertions:**
+- [ ] Verdict is LAUNCH BLOCKED (not CONCERNS)
+- [ ] Platform certification is identified as the blocking item
+- [ ] Missing platform names are specified
+- [ ] All other passing items are still shown in the report
+
+---
+
+### Case 3: Manual Check Required — CONCERNS Verdict
+
+**Fixture:**
+- All critical checklist items pass
+- First-run experience item: "MANUAL CHECK NEEDED — human must play the first 5
+ minutes and verify tutorial completion flow"
+- Store screenshots item: "MANUAL CHECK NEEDED — art team must verify screenshot
+ quality matches current build"
+
+**Input:** `/launch-checklist`
+
+**Expected behavior:**
+1. Skill checks all items
+2. Two items are flagged as requiring human verification
+3. Skill reports: "CONCERNS — 2 items require manual verification before launch"
+4. Both items are listed with instructions for what to manually verify
+5. Verdict is CONCERNS (not LAUNCH BLOCKED, since these are advisory)
+
+**Assertions:**
+- [ ] Verdict is CONCERNS (not LAUNCH READY or LAUNCH BLOCKED)
+- [ ] Both manual check items are listed with verification instructions
+- [ ] Skill does not auto-block on MANUAL CHECK items
+
+---
+
+### Case 4: Previous Checklist Exists — Delta Comparison
+
+**Fixture:**
+- `production/launch/launch-checklist-2026-03-25.md` exists with previous results:
+ - 2 items were BLOCKED (platform cert, crash reporting)
+ - 1 item had a MANUAL CHECK
+- New checklist: platform cert is now PASS, crash reporting is now PASS,
+ manual check still open; 1 new item flagged (EULA last updated date)
+
+**Input:** `/launch-checklist`
+
+**Expected behavior:**
+1. Skill finds the previous checklist and loads it for comparison
+2. Skill produces the new checklist and compares:
+ - Newly resolved: "Platform cert — was BLOCKED, now PASS"
+ - Newly resolved: "Crash reporting — was BLOCKED, now PASS"
+ - Still open: manual check (unchanged)
+ - New issue: EULA last updated date (not in previous checklist)
+3. Delta is shown prominently in the report
+4. Verdict is CONCERNS (manual check + new EULA question)
+
+**Assertions:**
+- [ ] Delta section shows newly resolved items
+- [ ] Delta section shows new issues (not present in previous checklist)
+- [ ] Still-open items from the previous checklist are noted as persistent
+- [ ] Verdict reflects the current state (not the previous state)
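+
+The delta reduces to comparing two status maps keyed by checklist item. A
+pseudocode sketch (item names and status labels are illustrative):
+
+```
+def delta(prev, curr):   # dicts: item -> "PASS" | "BLOCKED" | "MANUAL CHECK"
+    resolved   = [k for k in prev if prev[k] != "PASS" and curr.get(k) == "PASS"]
+    new_issues = [k for k in curr if k not in prev and curr[k] != "PASS"]
+    persistent = [k for k in prev if prev[k] != "PASS" and curr.get(k) == prev[k]]
+    return resolved, new_issues, persistent
+```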
+
+---
+
+### Case 5: Director Gate Check — No gate; launch-checklist is an audit utility
+
+**Fixture:**
+- All checklist dependencies present
+
+**Input:** `/launch-checklist`
+
+**Expected behavior:**
+1. Skill runs the full checklist and writes the report
+2. No director agents are spawned
+3. No gate IDs appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No gate skip messages appear
+- [ ] Verdict is LAUNCH READY, LAUNCH BLOCKED, or CONCERNS — no gate verdict
+
+---
+
+## Protocol Compliance
+
+- [ ] Checks all required categories (legal, platform, store, build, analytics, UX)
+- [ ] LAUNCH BLOCKED for hard failures (uncompleted certifications, missing legal docs)
+- [ ] CONCERNS for advisory items requiring manual verification
+- [ ] Compares against previous checklist when one exists
+- [ ] Asks "May I write" before creating the checklist report
+- [ ] Verdict is LAUNCH READY, LAUNCH BLOCKED, or CONCERNS
+
+---
+
+## Coverage Notes
+
+- Region-specific compliance (GDPR data handling, COPPA for under-13 audiences)
+ is checked but the specific requirements are not enumerated in test assertions.
+- The store page completeness check (screenshots, description) relies on the
+ presence of files in `production/store/`; it cannot verify visual quality.
+- Build reproducibility check validates the presence of a version tag and build
+ configuration but does not execute the build process.
diff --git a/CCGS Skill Testing Framework/skills/utility/localize.md b/CCGS Skill Testing Framework/skills/utility/localize.md
new file mode 100644
index 0000000..853220b
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/localize.md
@@ -0,0 +1,176 @@
+# Skill Test Spec: /localize
+
+## Skill Summary
+
+`/localize` manages the full localization pipeline: it extracts all player-facing
+strings from source files, manages translation files in `assets/localization/`,
+and validates completeness across all locale files. For new languages, it creates
+a locale file skeleton with all current strings as keys and empty values. For
+existing locale files, it produces a diff showing additions, removals, and
+changed keys.
+
+Translation files are written to `assets/localization/[locale-code].csv` (or
+engine-appropriate format) after a "May I write" ask. No director gates apply.
+Verdicts: LOCALIZATION COMPLETE (all locales are complete) or GAPS FOUND (at
+least one locale is missing string keys).
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: LOCALIZATION COMPLETE, GAPS FOUND
+- [ ] Contains "May I write" collaborative protocol language before writing locale files
+- [ ] Has a next-step handoff (e.g., send locale skeletons to translators)
+
+---
+
+## Director Gate Checks
+
+None. `/localize` is a pipeline utility. A localization lead agent may review
+the output separately, but no director agent is invoked within this skill.
+
+---
+
+## Test Cases
+
+### Case 1: New Language — String Extraction and Locale Skeleton Created
+
+**Fixture:**
+- Source code in `src/` contains player-facing strings (UI text, tutorial messages)
+- Existing locale: `assets/localization/en.csv`
+- No French locale exists
+
+**Input:** `/localize fr`
+
+**Expected behavior:**
+1. Skill extracts all player-facing strings from source files
+2. Skill finds the same strings in `en.csv` as a reference
+3. Skill generates `fr.csv` skeleton with all string keys and empty values
+4. Skill asks "May I write to `assets/localization/fr.csv`?"
+5. File written on approval; verdict is GAPS FOUND (file created but empty values)
+6. Skill notes: "fr.csv created — send to translator to fill values"
+
+**Assertions:**
+- [ ] All string keys from `en.csv` are present in `fr.csv`
+- [ ] All values in `fr.csv` are empty (not copied from English)
+- [ ] "May I write" is asked before creating the file
+- [ ] Verdict is GAPS FOUND (file is created but untranslated)
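+
+Skeleton creation copies keys only, never values. A pseudocode sketch (CSV
+columns assumed to be `key,value`):
+
+```
+def make_skeleton(reference_rows):        # rows parsed from en.csv
+    return [(key, "") for key, _value in reference_rows]
+```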
+
+---
+
+### Case 2: Existing Locale Diff — Additions, Removals, and Changes Listed
+
+**Fixture:**
+- `assets/localization/fr.csv` exists with 20 string keys translated
+- Source code has changed: 3 new strings added, 1 string removed, 2 strings
+ with changed English source text
+
+**Input:** `/localize fr`
+
+**Expected behavior:**
+1. Skill extracts current strings from source
+2. Skill diffs against existing `fr.csv`
+3. Skill produces diff report:
+ - 3 new keys (need translation — listed as empty in fr.csv)
+ - 1 removed key (marked as obsolete — suggest removal)
+ - 2 changed keys (English source changed — French may need update, flagged)
+4. Skill asks "May I update `assets/localization/fr.csv`?"
+5. File updated with new empty keys added, obsolete keys marked; verdict is GAPS FOUND
+
+**Assertions:**
+- [ ] New keys appear as empty in the updated file (not auto-translated)
+- [ ] Removed keys are flagged as obsolete (not silently deleted)
+- [ ] Changed source strings are flagged for translator review
+- [ ] Verdict is GAPS FOUND (new empty keys exist)
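+
+The diff is a three-way comparison between freshly extracted source strings and
+the existing locale file. Detecting a changed English source implies the locale
+file records the English text each translation was made from; the
+`source_at_last_sync` helper below stands in for that lookup and is
+illustrative, as are the names:
+
+```
+def diff_locale(source, locale):  # dicts: key -> English text / translation
+    added    = [k for k in source if k not in locale]        # need translation
+    obsolete = [k for k in locale if k not in source]        # flag, don't delete
+    changed  = [k for k in source if k in locale
+                and source[k] != source_at_last_sync(k)]     # translator review
+    return added, obsolete, changed
+```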
+
+---
+
+### Case 3: String Missing in One Locale — GAPS FOUND With Missing Key List
+
+**Fixture:**
+- 3 locale files exist: `en.csv`, `fr.csv`, `de.csv`
+- `de.csv` is missing 4 keys that exist in both `en.csv` and `fr.csv`
+
+**Input:** `/localize`
+
+**Expected behavior:**
+1. Skill reads all 3 locale files and cross-references keys
+2. `de.csv` is missing 4 keys
+3. Skill produces GAPS FOUND report listing the 4 missing keys by locale:
+ "de.csv missing: [key1], [key2], [key3], [key4]"
+4. Skill offers to add the missing keys as empty values to `de.csv`
+5. After approval: file updated; verdict remains GAPS FOUND (values still empty)
+
+**Assertions:**
+- [ ] Missing keys are listed explicitly (not just a count)
+- [ ] Missing keys are attributed to the specific locale file
+- [ ] Verdict is GAPS FOUND (not LOCALIZATION COMPLETE)
+- [ ] Missing keys are added as empty (not auto-translated from English)
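+
+Cross-locale validation is a set computation over the union of all keys. A
+pseudocode sketch:
+
+```
+def missing_keys(locales):        # dict: "de" -> {key: translation, ...}
+    all_keys = set().union(*(loc.keys() for loc in locales.values()))
+    return {name: sorted(all_keys - loc.keys())
+            for name, loc in locales.items()
+            if all_keys - loc.keys()}
+```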
+
+---
+
+### Case 4: Translation File Has Syntax Error — Error With Line Reference
+
+**Fixture:**
+- `assets/localization/fr.csv` has a malformed line at line 47
+ (missing quote closure)
+
+**Input:** `/localize fr`
+
+**Expected behavior:**
+1. Skill reads `fr.csv` and encounters a parse error at line 47
+2. Skill outputs: "Parse error in fr.csv at line 47: [error detail]"
+3. Skill cannot diff or validate the file until the error is fixed
+4. Skill does NOT attempt to overwrite or auto-fix the malformed file
+5. Skill suggests fixing the file manually and re-running `/localize`
+
+**Assertions:**
+- [ ] Error message includes line number (line 47)
+- [ ] Error detail describes the nature of the parse error
+- [ ] Skill does NOT overwrite or modify the malformed file
+- [ ] Manual fix + re-run is suggested as remediation
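+
+Line-accurate reporting is the core requirement here. If the skill parses with
+Python's `csv` module, the reader's `line_num` attribute supports exactly this
+kind of message; a pseudocode sketch:
+
+```
+reader = csv.reader(open("assets/localization/fr.csv", newline=""))
+try:
+    rows = list(reader)                   # parse fully before diffing
+except csv.Error as err:
+    print(f"Parse error in fr.csv at line {reader.line_num}: {err}")
+    # Never overwrite or auto-fix; ask the user to repair and re-run /localize
+```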
+
+---
+
+### Case 5: Director Gate Check — No gate; localization is a pipeline utility
+
+**Fixture:**
+- Source code with player-facing strings
+
+**Input:** `/localize fr`
+
+**Expected behavior:**
+1. Skill extracts strings and manages locale files
+2. No director agents are spawned
+3. No gate IDs appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No gate skip messages appear
+- [ ] Verdict is LOCALIZATION COMPLETE or GAPS FOUND — no gate verdict
+
+---
+
+## Protocol Compliance
+
+- [ ] Extracts strings from source before operating on locale files
+- [ ] Creates new locale files with all keys as empty values (not auto-translated)
+- [ ] Diffs existing locale files against current source strings
+- [ ] Flags missing keys by locale and by key name
+- [ ] Asks "May I write" before creating or updating any locale file
+- [ ] Verdict is LOCALIZATION COMPLETE (all locales fully translated) or GAPS FOUND
+
+---
+
+## Coverage Notes
+
+- LOCALIZATION COMPLETE is only achievable when all locale files have all keys
+ with non-empty values; new-language skeleton creation always results in GAPS FOUND.
+- Engine-specific locale formats (Godot `.translation`, Unity `.po` files) are
+ handled by the skill body; `.csv` is used as the canonical format in tests.
+- The case where source strings change at a very high rate (continuous integration
+  of new UI text) is not tested; the same diff logic is expected to handle it.
diff --git a/CCGS Skill Testing Framework/skills/utility/onboard.md b/CCGS Skill Testing Framework/skills/utility/onboard.md
new file mode 100644
index 0000000..3c1c3f4
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/onboard.md
@@ -0,0 +1,179 @@
+# Skill Test Spec: /onboard
+
+## Skill Summary
+
+`/onboard` generates a contextual project onboarding summary tailored for a new
+team member. It reads CLAUDE.md, `technical-preferences.md`, the active sprint
+file, recent git commits, and `production/stage.txt` to produce a structured
+orientation document. The skill runs on the Haiku model (read-only, formatting
+task) and produces no file writes — all output is conversational.
+
+The skill optionally accepts a role argument (e.g., `/onboard artist`) to tailor
+the summary to a specific discipline. When the project is in an early stage or
+unconfigured, the output adapts to reflect what little is known. The verdict is
+always ONBOARDING COMPLETE — the skill is purely informational.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keyword: ONBOARDING COMPLETE
+- [ ] Does NOT contain "May I write" language (skill is read-only)
+- [ ] Has a next-step handoff suggesting a relevant follow-on skill
+
+---
+
+## Director Gate Checks
+
+None. `/onboard` is a read-only orientation skill. No director gates apply.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Configured project in Production stage with active sprint
+
+**Fixture:**
+- `production/stage.txt` contains `Production`
+- `technical-preferences.md` has engine, language, and specialists populated
+- `production/sprints/sprint-005.md` exists with stories in progress
+- Git log contains 5 recent commits
+
+**Input:** `/onboard`
+
+**Expected behavior:**
+1. Skill reads stage.txt, technical-preferences.md, active sprint, and git log
+2. Skill produces an onboarding summary with sections: Project Overview, Tech Stack,
+ Current Stage, Active Sprint Summary, Recent Activity
+3. Summary is formatted for readability (headers, bullet points)
+4. Next-step suggestions are appropriate for Production stage (e.g., `/sprint-status`,
+ `/dev-story`)
+5. Verdict ONBOARDING COMPLETE is stated
+
+**Assertions:**
+- [ ] Output includes current stage name from stage.txt
+- [ ] Output includes engine and language from technical-preferences.md
+- [ ] Active sprint stories are summarized (not just the sprint file name)
+- [ ] Recent commit context is present
+- [ ] Verdict is ONBOARDING COMPLETE
+- [ ] No files are written
+
+---
+
+### Case 2: Fresh Project — No engine, no sprint, suggests /start
+
+**Fixture:**
+- `technical-preferences.md` contains only placeholders (`[TO BE CONFIGURED]`)
+- No `production/stage.txt`
+- No sprint files
+- No CLAUDE.md overrides beyond defaults
+
+**Input:** `/onboard`
+
+**Expected behavior:**
+1. Skill reads all config files and detects unconfigured state
+2. Skill produces a minimal summary: "This project has not been configured yet"
+3. Output explains the onboarding workflow: `/start` → `/setup-engine` → `/brainstorm`
+4. Skill suggests running `/start` as the immediate next step
+5. Verdict is ONBOARDING COMPLETE (informational, not a failure)
+
+**Assertions:**
+- [ ] Output explicitly mentions the project is not yet configured
+- [ ] `/start` is recommended as the next step
+- [ ] Skill does NOT error out — it gracefully handles an empty project state
+- [ ] Verdict is still ONBOARDING COMPLETE
+
+---
+
+### Case 3: No CLAUDE.md Found — Error with remediation
+
+**Fixture:**
+- `CLAUDE.md` file does not exist (deleted or never created)
+- All other files may or may not exist
+
+**Input:** `/onboard`
+
+**Expected behavior:**
+1. Skill attempts to read CLAUDE.md and fails
+2. Skill outputs an error: "CLAUDE.md not found — cannot generate onboarding summary"
+3. Skill provides remediation: "Run `/start` to initialize the project configuration"
+4. No partial summary is generated
+
+**Assertions:**
+- [ ] Error message clearly identifies the missing file as CLAUDE.md
+- [ ] Remediation step (`/start`) is explicitly named
+- [ ] Skill does NOT produce a partial output when the root config is missing
+- [ ] Verdict is ONBOARDING COMPLETE (with error context, not a crash)
+
+---
+
+### Case 4: Role-Specific Onboarding — User specifies "artist" role
+
+**Fixture:**
+- Fully configured project in Production stage
+- `art-bible.md` exists in `design/`
+- Active sprint has visual story types (animation, VFX)
+
+**Input:** `/onboard artist`
+
+**Expected behavior:**
+1. Skill reads all standard files plus any art-relevant docs (art bible, asset specs)
+2. Summary is tailored to the artist role: art bible overview, asset pipeline,
+ current visual stories in the active sprint
+3. Technical architecture details (code structure, ADRs) are de-emphasized
+4. Specialist agents for art/audio are highlighted in the summary
+5. Verdict is ONBOARDING COMPLETE
+
+**Assertions:**
+- [ ] Role argument is acknowledged in the output ("Onboarding for: Artist")
+- [ ] Art bible summary is included if the file exists
+- [ ] Current visual stories from the active sprint are shown
+- [ ] Technical implementation details are not the primary focus
+- [ ] Verdict is ONBOARDING COMPLETE
+
+---
+
+### Case 5: Director Gate Check — No gate; onboard is read-only orientation
+
+**Fixture:**
+- Any configured project state
+
+**Input:** `/onboard`
+
+**Expected behavior:**
+1. Skill completes the full onboarding summary
+2. No director agents are spawned at any point
+3. No gate IDs appear in the output
+4. No "May I write" prompts appear
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No write tool is called
+- [ ] No gate skip messages appear
+- [ ] Verdict is ONBOARDING COMPLETE without any gate check
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads all source files before generating output (no hallucinated project state)
+- [ ] Adapts output to project stage (Production ≠ Concept)
+- [ ] Respects role argument when provided
+- [ ] Does not write any files
+- [ ] Ends with ONBOARDING COMPLETE verdict in all paths
+
+---
+
+## Coverage Notes
+
+- The case where `technical-preferences.md` is missing entirely (as opposed to
+ having placeholders) is not separately tested; behavior follows the graceful
+ error pattern of Case 3.
+- Git history reading is assumed available; offline/no-git scenarios are not
+ tested here.
+- Discipline roles beyond "artist" (e.g., programmer, designer, producer) follow
+ the same tailoring pattern as Case 4 and are not separately tested.
diff --git a/CCGS Skill Testing Framework/skills/utility/playtest-report.md b/CCGS Skill Testing Framework/skills/utility/playtest-report.md
new file mode 100644
index 0000000..9a11b05
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/playtest-report.md
@@ -0,0 +1,178 @@
+# Skill Test Spec: /playtest-report
+
+## Skill Summary
+
+`/playtest-report` generates a structured playtest report from session notes or
+user input. The report is organized into four sections: Feel/Accessibility,
+Bugs Observed, Design Feedback, and Next Steps. When multiple testers participated,
+the skill aggregates feedback and distinguishes majority opinions from minority
+ones. The skill links to existing bug reports when a reported bug matches a file
+in `production/bugs/`.
+
+Reports are written to `production/qa/playtest-[date].md` after a "May I write"
+ask. No director gates apply here — the CD-PLAYTEST director gate (if needed) is
+a separate invocation. The verdict is COMPLETE when the report is written.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keyword: COMPLETE
+- [ ] Contains "May I write" collaborative protocol language before writing the report
+- [ ] Has a next-step handoff (e.g., `/bug-report` for new issues found, `/design-review` for feedback)
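
The frontmatter assertion above can be sketched as a small checker. This is a minimal sketch assuming YAML-style frontmatter delimited by `---` lines; the real `/skill-test static` checker may parse differently.

```python
# Minimal frontmatter field check, assuming "---"-delimited frontmatter.
# The required field names come from the checklist above.
REQUIRED = {"name", "description", "argument-hint", "user-invocable", "allowed-tools"}

def missing_fields(skill_md: str) -> set[str]:
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return set(REQUIRED)  # no frontmatter block at all
    found = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        if ":" in line:
            found.add(line.split(":", 1)[0].strip())
    return REQUIRED - found
```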
+
+---
+
+## Director Gate Checks
+
+None. `/playtest-report` is a documentation utility. The CD-PLAYTEST gate is a
+separate invocation and not part of this skill.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — User provides playtest notes, structured report produced
+
+**Fixture:**
+- User provides typed playtest notes from a single session
+- Notes cover: game feel, one bug (framerate drop), and a design concern
+ (tutorial too long)
+- `production/bugs/` exists but is empty (bug not yet reported)
+
+**Input:** `/playtest-report` (user pastes session notes)
+
+**Expected behavior:**
+1. Skill reads the provided notes and structures them into the 4-section template
+2. Feel/Accessibility: extracts feel observations
+3. Bugs: notes the framerate drop with available repro details
+4. Design Feedback: notes the tutorial length concern
+5. Next Steps: suggests `/bug-report` for the framerate issue and `/design-review`
+ for the tutorial feedback
+6. Skill asks "May I write to `production/qa/playtest-2026-04-06.md`?"
+7. Report is written on approval; verdict is COMPLETE
+
+**Assertions:**
+- [ ] All 4 sections are present in the report
+- [ ] Bug is listed in the Bugs section (not the Design Feedback section)
+- [ ] Next Steps are appropriate (bug report for the framerate drop, design review for the tutorial feedback)
+- [ ] "May I write" is asked before writing
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 2: Empty Input — Guided prompting through each section
+
+**Fixture:**
+- No notes provided by user at invocation
+
+**Input:** `/playtest-report`
+
+**Expected behavior:**
+1. Skill detects empty input
+2. Skill prompts through each section:
+ a. "Describe the overall feel and any accessibility observations"
+ b. "Were any bugs observed? Describe them"
+ c. "What design feedback did testers provide?"
+3. User answers each prompt
+4. Skill compiles report from answers and asks "May I write"
+5. Report written on approval; verdict is COMPLETE
+
+**Assertions:**
+- [ ] At least 3 guiding questions are asked (one per main section)
+- [ ] Report is not created until all sections have input (or user explicitly skips one)
+- [ ] Verdict is COMPLETE after file is written
+
+---
+
+### Case 3: Multiple Testers — Aggregated feedback with majority/minority notes
+
+**Fixture:**
+- User provides notes from 3 testers
+- 2/3 testers found the controls "intuitive"
+- 1/3 tester found the UI font too small
+- All 3 noted the same bug (player stuck on ledge)
+
+**Input:** `/playtest-report` (3-tester session)
+
+**Expected behavior:**
+1. Skill identifies 3 distinct tester perspectives in the input
+2. Control intuitiveness → noted as "Majority (2/3): controls intuitive"
+3. Font size → noted as "Minority (1/3): UI font size concern"
+4. Stuck-on-ledge bug → noted as "All testers: player stuck on ledge (confirmed)"
+5. Skill generates aggregated report with majority/minority labels
+6. Report written after "May I write" approval; verdict is COMPLETE
+
+**Assertions:**
+- [ ] Majority opinion (2/3) is labeled as majority
+- [ ] Minority opinion (1/3) is labeled as minority
+- [ ] Unanimously reported bug is noted as confirmed by all testers
+- [ ] Verdict is COMPLETE
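
The majority/minority labelling asserted above can be sketched as follows. The thresholds (strict majority, unanimous = confirmed) are assumptions, not the skill's exact wording.

```python
from collections import Counter

# Sketch of majority/minority labelling for aggregated playtest feedback.
# Opinions are (tester_id, observation) pairs.
def label_feedback(opinions: list[tuple[str, str]], n_testers: int) -> dict[str, str]:
    counts = Counter(obs for _, obs in opinions)
    labels = {}
    for obs, n in counts.items():
        if n == n_testers:
            labels[obs] = f"All testers ({n}/{n_testers}): {obs} (confirmed)"
        elif n > n_testers / 2:
            labels[obs] = f"Majority ({n}/{n_testers}): {obs}"
        else:
            labels[obs] = f"Minority ({n}/{n_testers}): {obs}"
    return labels
```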
+
+---
+
+### Case 4: Bug Matches Existing Report — Links to existing file
+
+**Fixture:**
+- `production/bugs/bug-2026-03-30-player-stuck-ledge.md` exists
+- User's playtest notes describe "player gets stuck on ledges near walls"
+
+**Input:** `/playtest-report`
+
+**Expected behavior:**
+1. Skill structures the report and identifies the stuck-on-ledge bug
+2. Skill scans `production/bugs/` and finds `bug-2026-03-30-player-stuck-ledge.md`
+3. In the Bugs section, the report includes: "See existing report:
+ production/bugs/bug-2026-03-30-player-stuck-ledge.md"
+4. Skill does NOT suggest creating a new bug report for this issue
+5. Report written; verdict is COMPLETE
+
+**Assertions:**
+- [ ] Existing bug report is found and linked in the playtest report
+- [ ] `/bug-report` is NOT suggested for the already-reported issue
+- [ ] Cross-reference to existing file appears in the Bugs section
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 5: Director Gate Check — No gate; CD-PLAYTEST is a separate invocation
+
+**Fixture:**
+- Playtest notes provided
+
+**Input:** `/playtest-report`
+
+**Expected behavior:**
+1. Skill generates and writes the playtest report
+2. No director agents are spawned (CD-PLAYTEST is not invoked here)
+3. No gate IDs appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No CD-PLAYTEST gate skip message appears
+- [ ] Verdict is COMPLETE without any gate check
+
+---
+
+## Protocol Compliance
+
+- [ ] Structures output into all 4 sections (Feel, Bugs, Design Feedback, Next Steps)
+- [ ] Labels majority vs. minority opinions when multiple testers are involved
+- [ ] Cross-references existing bug reports when bugs match
+- [ ] Asks "May I write to `production/qa/playtest-[date].md`?" before writing
+- [ ] Verdict is COMPLETE when report is written
+
+---
+
+## Coverage Notes
+
+- The CD-PLAYTEST director gate (creative director reviews playtest insights
+ for design implications) is a separate invocation and is not tested here.
+- Video recording or screenshot attachments are not tested; the report is a
+ text-only document.
+- The case where a tester's identity is unknown (anonymous feedback) follows
+ the same aggregation pattern as Case 3 without tester labels.
diff --git a/CCGS Skill Testing Framework/skills/utility/project-stage-detect.md b/CCGS Skill Testing Framework/skills/utility/project-stage-detect.md
new file mode 100644
index 0000000..b5c4575
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/project-stage-detect.md
@@ -0,0 +1,183 @@
+# Skill Test Spec: /project-stage-detect
+
+## Skill Summary
+
+`/project-stage-detect` automatically analyzes project artifacts to determine
+the current development stage. It runs on the Haiku model (read-only) and
+examines `production/stage.txt` (if present), design documents in `design/`,
+source code in `src/`, sprint and milestone files in `production/`, and the
+presence of engine configuration to classify the project into one of seven
+stages: Concept, Systems Design, Technical Setup, Pre-Production, Production,
+Polish, or Release.
+
+The skill is advisory — it never writes `stage.txt`. That file is only updated
+when `/gate-check` passes and the user confirms advancement. The skill reports
+its confidence level (HIGH if stage.txt was read directly, MEDIUM if inferred
+from artifacts, LOW if conflicting signals were found).
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains all seven stage names: Concept, Systems Design, Technical Setup, Pre-Production, Production, Polish, Release
+- [ ] Does NOT contain "May I write" language (skill is detection-only)
+- [ ] Has a next-step handoff (e.g., `/gate-check` to formally advance stage)
+
+---
+
+## Director Gate Checks
+
+None. `/project-stage-detect` is a read-only detection utility. No director
+gates apply.
+
+---
+
+## Test Cases
+
+### Case 1: stage.txt Exists — Reads directly and cross-checks artifacts
+
+**Fixture:**
+- `production/stage.txt` contains `Production`
+- `design/gdd/` has 4 GDD files
+- `src/` has source code files
+- `production/sprints/sprint-002.md` exists
+
+**Input:** `/project-stage-detect`
+
+**Expected behavior:**
+1. Skill reads `production/stage.txt` — detects stage `Production`
+2. Skill cross-checks artifacts: GDDs present, source code present, sprint present
+3. Artifacts are consistent with Production stage
+4. Skill reports: Stage = Production, Confidence = HIGH (from stage.txt, confirmed by artifacts)
+5. Next step: continue with `/sprint-plan` or `/dev-story`
+
+**Assertions:**
+- [ ] Detected stage is Production
+- [ ] Confidence is reported as HIGH when stage.txt is present
+- [ ] Cross-check result (consistent vs. discrepant) is noted
+- [ ] No files are written
+- [ ] Verdict clearly states the detected stage
+
+---
+
+### Case 2: No stage.txt but GDDs and Epics Exist — Infers Production
+
+**Fixture:**
+- No `production/stage.txt`
+- `design/gdd/` has 3 GDD files
+- `production/epics/` has 2 epic files
+- `src/` has source code files
+- `production/sprints/sprint-001.md` exists
+
+**Input:** `/project-stage-detect`
+
+**Expected behavior:**
+1. Skill finds no stage.txt — switches to artifact inference mode
+2. Skill finds GDDs (Systems Design complete), epics (Pre-Production complete),
+ source code and sprints (Production active)
+3. Skill infers: Stage = Production
+4. Confidence is MEDIUM (inferred from artifacts, not from stage.txt)
+5. Skill recommends running `/gate-check` to formalize and write stage.txt
+
+**Assertions:**
+- [ ] Inferred stage is Production
+- [ ] Confidence is MEDIUM (not HIGH, since stage.txt is absent)
+- [ ] Recommendation to run `/gate-check` is present
+- [ ] No stage.txt is written by this skill
+
+---
+
+### Case 3: No stage.txt, No Docs, No Source — Infers Concept
+
+**Fixture:**
+- No `production/stage.txt`
+- `design/` directory exists but is empty
+- `src/` exists but contains no code files
+- `technical-preferences.md` has placeholders only
+
+**Input:** `/project-stage-detect`
+
+**Expected behavior:**
+1. Skill finds no stage.txt
+2. Artifact scan: no GDDs, no source, no epics, no sprints, engine unconfigured
+3. Skill infers: Stage = Concept
+4. Confidence is MEDIUM
+5. Skill suggests `/start` to begin the onboarding workflow
+
+**Assertions:**
+- [ ] Inferred stage is Concept
+- [ ] Output lists the artifacts that were checked (and found absent)
+- [ ] `/start` is suggested as the next step
+- [ ] No files are written
+
+---
+
+### Case 4: Discrepancy — stage.txt says Production but no source code
+
+**Fixture:**
+- `production/stage.txt` contains `Production`
+- `design/gdd/` has GDD files
+- `src/` directory exists but contains no source code files
+- No sprint files exist
+
+**Input:** `/project-stage-detect`
+
+**Expected behavior:**
+1. Skill reads stage.txt — detects `Production`
+2. Cross-check finds: no source code, no sprints — inconsistent with Production
+3. Skill flags discrepancy: "stage.txt says Production but no source code or sprints found"
+4. Skill reports detected stage as Production (honoring stage.txt) but
+ confidence drops to LOW due to artifact mismatch
+5. Skill suggests reviewing stage.txt manually or running `/gate-check`
+
+**Assertions:**
+- [ ] Discrepancy is flagged explicitly in the output
+- [ ] Confidence is LOW when artifacts contradict stage.txt
+- [ ] stage.txt value is not silently overridden
+- [ ] User is advised to verify the discrepancy manually
+
+---
+
+### Case 5: Director Gate Check — No gate; detection is advisory
+
+**Fixture:**
+- Any project state with or without stage.txt
+
+**Input:** `/project-stage-detect`
+
+**Expected behavior:**
+1. Skill completes full stage detection
+2. No director agents are spawned at any point
+3. No gate IDs appear in output
+4. No write tool is called
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No write tool is called
+- [ ] Detection output is purely advisory
+- [ ] Verdict names the detected stage without triggering any gate
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads stage.txt if present; falls back to artifact inference if absent
+- [ ] Always reports a confidence level (HIGH / MEDIUM / LOW)
+- [ ] Cross-checks stage.txt against artifacts and flags discrepancies
+- [ ] Does not write stage.txt (that is `/gate-check`'s responsibility)
+- [ ] Ends with a next-step recommendation appropriate to the detected stage
+
+---
+
+## Coverage Notes
+
+- The Technical Setup stage (engine configured, no GDDs yet) and Pre-Production
+ stage (GDDs complete, no epics yet) follow the same artifact-inference pattern
+ as Cases 2 and 3 and are not separately fixture-tested.
+- The Polish and Release stages are not fixture-tested here; they follow the
+ same high-confidence (stage.txt present) or inference logic.
+- Confidence levels are advisory — the skill does not gate any actions on them.
diff --git a/CCGS Skill Testing Framework/skills/utility/prototype.md b/CCGS Skill Testing Framework/skills/utility/prototype.md
new file mode 100644
index 0000000..9b83ebf
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/prototype.md
@@ -0,0 +1,178 @@
+# Skill Test Spec: /prototype
+
+## Skill Summary
+
+`/prototype` manages a rapid prototyping workflow for validating a game mechanic
+before committing to full production implementation. Prototypes are created in
+`prototypes/[mechanic-name]/` and are intentionally disposable — coding standards
+are relaxed (no ADR required, AC can be minimal, hardcoded values acceptable).
+After implementation, the skill produces a findings document summarizing what
+was learned and recommending next steps.
+
+The skill asks "May I write to `prototypes/[name]/`?" before creating files. If a
+prototype already exists, the skill offers to extend, replace, or archive. No
+director gates apply. Verdicts: PROTOTYPE COMPLETE (prototype built and findings
+documented) or PROTOTYPE ABANDONED (mechanic found to be unworkable).
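
The findings document and verdict split can be sketched as below. The section names come from Case 1 of this spec; the verdict rule (keying on an "abandon" recommendation) is a toy assumption, not the skill's actual decision logic.

```python
# Sketch of the /prototype findings.md scaffold and verdict rule.
FINDINGS_TEMPLATE = """# Findings: {name}

## What was tested
{tested}

## What worked
{worked}

## What didn't work
{failed}

## Recommendation
{recommendation}
"""

def verdict(recommendation: str) -> str:
    # Toy rule: an "abandon" recommendation yields PROTOTYPE ABANDONED.
    if "abandon" in recommendation.lower():
        return "PROTOTYPE ABANDONED"
    return "PROTOTYPE COMPLETE"
```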
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: PROTOTYPE COMPLETE, PROTOTYPE ABANDONED
+- [ ] Contains "May I write" language before creating prototype files
+- [ ] Has a next-step handoff (e.g., `/design-system` to formalize, or archive)
+
+---
+
+## Director Gate Checks
+
+None. Prototypes are throwaway validation artifacts. No director gates apply.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Mechanic concept prototyped, findings documented
+
+**Fixture:**
+- `prototypes/` directory exists
+- No existing prototype for "grapple-hook"
+
+**Input:** `/prototype grapple-hook`
+
+**Expected behavior:**
+1. Skill asks "May I write to `prototypes/grapple-hook/`?"
+2. After approval: creates `prototypes/grapple-hook/` directory and basic
+ implementation skeleton (main scene, player controller extension)
+3. Skill implements a minimal grapple hook mechanic (intentionally rough — no
+ polish, hardcoded values acceptable)
+4. Skill produces `prototypes/grapple-hook/findings.md` with:
+ - What was tested
+ - What worked
+ - What didn't work
+ - Recommendation (proceed / abandon / revise concept)
+5. Verdict is PROTOTYPE COMPLETE
+
+**Assertions:**
+- [ ] "May I write to `prototypes/grapple-hook/`?" is asked before any files are created
+- [ ] Implementation is isolated to `prototypes/` (not `src/`)
+- [ ] `findings.md` is created with at minimum: tested/worked/didn't-work/recommendation
+- [ ] Verdict is PROTOTYPE COMPLETE
+
+---
+
+### Case 2: Prototype Already Exists — Offers Extend, Replace, or Archive
+
+**Fixture:**
+- `prototypes/grapple-hook/` already exists from a previous prototype session
+- It contains a basic implementation and a findings.md
+
+**Input:** `/prototype grapple-hook`
+
+**Expected behavior:**
+1. Skill detects existing `prototypes/grapple-hook/` directory
+2. Skill reports: "Prototype already exists for grapple-hook"
+3. Skill presents 3 options:
+ - Extend: add new features to the existing prototype
+ - Replace: start fresh (asks "May I replace `prototypes/grapple-hook/`?")
+ - Archive: move to `prototypes/archive/grapple-hook/` and start fresh
+4. User selects; skill proceeds accordingly
+
+**Assertions:**
+- [ ] Existing prototype is detected and reported
+- [ ] Exactly 3 options are presented (extend, replace, archive)
+- [ ] Replace path includes a "May I replace" confirmation
+- [ ] Archive path moves (not deletes) the existing prototype
+
+---
+
+### Case 3: Prototype Validates Mechanic — Recommends Proceeding to Production
+
+**Fixture:**
+- Prototype implementation complete
+- Findings: grapple hook mechanic is fun and technically feasible
+
+**Input:** `/prototype grapple-hook` (prototype session complete)
+
+**Expected behavior:**
+1. After prototype is built and tested, findings are summarized
+2. Recommendation in findings.md: "Mechanic validated — recommend proceeding
+ to `/design-system` for full specification"
+3. Skill handoff message explicitly suggests `/design-system grapple-hook`
+4. Verdict is PROTOTYPE COMPLETE
+
+**Assertions:**
+- [ ] `findings.md` contains an explicit recommendation
+- [ ] Recommendation references `/design-system` when mechanic is validated
+- [ ] Handoff message echoes the recommendation
+- [ ] Verdict is PROTOTYPE COMPLETE (not PROTOTYPE ABANDONED)
+
+---
+
+### Case 4: Prototype Reveals Mechanic is Unworkable — PROTOTYPE ABANDONED
+
+**Fixture:**
+- Prototype implemented for "procedural-dialogue"
+- After testing: the mechanic creates incoherent dialogue trees and is
+ frustrating to play
+
+**Input:** `/prototype procedural-dialogue`
+
+**Expected behavior:**
+1. Prototype is built
+2. Findings document the failure: incoherent output, player confusion, technical complexity
+3. Recommendation in findings.md: "Mechanic not viable — abandoning"
+4. `findings.md` documents the specific reasons the mechanic failed
+5. Skill suggests alternatives in the handoff (e.g., curated dialogue instead)
+6. Verdict is PROTOTYPE ABANDONED
+
+**Assertions:**
+- [ ] Verdict is PROTOTYPE ABANDONED (not PROTOTYPE COMPLETE)
+- [ ] `findings.md` documents specific failure reasons (not vague)
+- [ ] Alternative approaches are suggested in the handoff
+- [ ] Prototype files are retained (not deleted) for reference
+
+---
+
+### Case 5: Director Gate Check — No gate; prototypes are validation artifacts
+
+**Fixture:**
+- Mechanic concept provided
+
+**Input:** `/prototype wall-jump`
+
+**Expected behavior:**
+1. Skill creates and documents the prototype
+2. No director agents are spawned
+3. No gate IDs appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No gate skip messages appear
+- [ ] Verdict is PROTOTYPE COMPLETE or PROTOTYPE ABANDONED — no gate verdict
+
+---
+
+## Protocol Compliance
+
+- [ ] Asks "May I write to `prototypes/[name]/`?" before creating any files
+- [ ] Creates all files under `prototypes/` (not `src/`)
+- [ ] Produces `findings.md` with tested/worked/didn't-work/recommendation
+- [ ] Notes that production coding standards are intentionally relaxed
+- [ ] Offers extend/replace/archive when prototype already exists
+- [ ] Verdict is PROTOTYPE COMPLETE or PROTOTYPE ABANDONED
+
+---
+
+## Coverage Notes
+
+- Prototype implementation quality (code style) is intentionally not tested —
+ prototypes are throwaway artifacts and quality standards do not apply.
+- The archiving mechanism is mentioned in Case 2 but the archive format is
+ not assertion-tested in detail.
+- Engine-specific prototype scaffolding (GDScript scenes vs. C# MonoBehaviour)
+ follows the same flow with engine-appropriate file types.
diff --git a/CCGS Skill Testing Framework/skills/utility/qa-plan.md b/CCGS Skill Testing Framework/skills/utility/qa-plan.md
new file mode 100644
index 0000000..0b0ec22
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/qa-plan.md
@@ -0,0 +1,175 @@
+# Skill Test Spec: /qa-plan
+
+## Skill Summary
+
+`/qa-plan` generates a structured QA test plan for a feature or sprint milestone.
+It reads story files for the specified sprint, extracts acceptance criteria from
+each story, cross-references test standards from `coding-standards.md` to assign
+the appropriate test type (unit, integration, visual, UI, or config/data), and
+produces a prioritized QA plan document.
+
+The skill asks "May I write to `production/qa/qa-plan-sprint-NNN.md`?" before
+persisting the output. If an existing test plan for the same sprint is found, the
+skill offers to update rather than replace. The verdict is COMPLETE when the plan
+is written. No director gates are used — gate-level story readiness is handled by
+`/story-readiness`.
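
The type-to-test assignment can be sketched as a lookup table. The mapping below mirrors Case 1 of this spec; in the real skill the table lives in `coding-standards.md` and is read at run time, not hardcoded, and the config/data gate level here is an assumption.

```python
# Sketch of /qa-plan test-type assignment per story type.
TEST_MATRIX = {
    "logic":       ("Unit test", "BLOCKING"),
    "integration": ("Integration test", "BLOCKING"),
    "visual":      ("Screenshot + lead sign-off", "ADVISORY"),
    "ui":          ("Manual walkthrough doc", "ADVISORY"),
    "config":      ("Smoke check", "ADVISORY"),  # gate level assumed
}

def assign(story_type: str, acceptance_criteria: list[str]) -> tuple[str, str]:
    # Stories with no AC are flagged rather than silently skipped.
    if not acceptance_criteria:
        return ("UNTESTABLE", "Acceptance Criteria required")
    return TEST_MATRIX[story_type]
```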
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keyword: COMPLETE
+- [ ] Contains "May I write" collaborative protocol language before writing the plan
+- [ ] Has a next-step handoff (e.g., `/smoke-check` or `/story-readiness`)
+
+---
+
+## Director Gate Checks
+
+None. `/qa-plan` is a planning utility. Story readiness gates are separate.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Sprint with 4 stories generates full test plan
+
+**Fixture:**
+- `production/sprints/sprint-003.md` lists 4 stories with defined acceptance criteria
+- Stories span types: 1 logic (formula), 1 integration, 1 visual, 1 UI
+- `coding-standards.md` is present with test evidence table
+
+**Input:** `/qa-plan sprint-003`
+
+**Expected behavior:**
+1. Skill reads sprint-003.md and identifies 4 stories
+2. Skill reads each story's acceptance criteria
+3. Skill assigns test types per coding-standards.md table:
+ - Logic story → Unit test (BLOCKING)
+ - Integration story → Integration test (BLOCKING)
+ - Visual story → Screenshot + lead sign-off (ADVISORY)
+ - UI story → Manual walkthrough doc (ADVISORY)
+4. Skill drafts QA plan with story-by-story test type breakdown
+5. Skill asks "May I write to `production/qa/qa-plan-sprint-003.md`?"
+6. File is written on approval; verdict is COMPLETE
+
+**Assertions:**
+- [ ] All 4 stories are included in the plan
+- [ ] Test type is assigned per coding-standards.md (not guessed)
+- [ ] Gate level (BLOCKING vs ADVISORY) is noted for each story
+- [ ] "May I write" is asked with the correct file path
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 2: Story With No Acceptance Criteria — Flagged as UNTESTABLE
+
+**Fixture:**
+- `production/sprints/sprint-004.md` lists 3 stories; one story has empty
+ acceptance criteria section
+
+**Input:** `/qa-plan sprint-004`
+
+**Expected behavior:**
+1. Skill reads all 3 stories
+2. Skill detects the story with no AC
+3. Story is flagged as `UNTESTABLE — Acceptance Criteria required` in the plan
+4. Other 2 stories receive normal test type assignments
+5. Plan is written with the UNTESTABLE story flagged; verdict is COMPLETE
+
+**Assertions:**
+- [ ] UNTESTABLE label appears for the story with no AC
+- [ ] Plan is not blocked — the other stories are still planned
+- [ ] Output suggests adding AC to the flagged story (next step)
+- [ ] Verdict is COMPLETE (the plan is still generated)
+
+---
+
+### Case 3: Existing Test Plan Found — Offers update rather than replace
+
+**Fixture:**
+- `production/qa/qa-plan-sprint-003.md` already exists from a previous run
+- Sprint-003 has 2 new stories added since the last plan
+
+**Input:** `/qa-plan sprint-003`
+
+**Expected behavior:**
+1. Skill reads sprint-003.md and detects 2 stories not in the existing plan
+2. Skill reports: "Existing QA plan found for sprint-003 — offering to update"
+3. Skill presents the 2 new stories and their proposed test assignments
+4. Skill asks "May I update `production/qa/qa-plan-sprint-003.md`?" (not overwrite)
+5. Updated plan is written on approval
+
+**Assertions:**
+- [ ] Skill detects the existing plan file
+- [ ] "update" language is used (not "overwrite")
+- [ ] Only new stories are proposed for addition — existing entries preserved
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 4: No Stories Found for Sprint — Error with guidance
+
+**Fixture:**
+- `production/sprints/sprint-007.md` does not exist
+- No other sprint file matching sprint-007
+
+**Input:** `/qa-plan sprint-007`
+
+**Expected behavior:**
+1. Skill attempts to read sprint-007.md — file not found
+2. Skill outputs: "No sprint file found for sprint-007"
+3. Skill suggests running `/sprint-plan` to create the sprint first
+4. No plan is written; no "May I write" is asked
+
+**Assertions:**
+- [ ] Error message names the missing sprint file
+- [ ] `/sprint-plan` is suggested as the remediation step
+- [ ] No write tool is called
+- [ ] Verdict is not COMPLETE (error state)
+
+---
+
+### Case 5: Director Gate Check — No gate; QA planning is a utility
+
+**Fixture:**
+- Sprint with valid stories and AC
+
+**Input:** `/qa-plan sprint-003`
+
+**Expected behavior:**
+1. Skill generates and writes QA plan
+2. No director agents are spawned
+3. No gate IDs appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No gate skip messages appear
+- [ ] Skill reaches COMPLETE without any gate check
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads coding-standards.md test evidence table before assigning test types
+- [ ] Assigns BLOCKING or ADVISORY gate level per story type
+- [ ] Flags stories with no AC as UNTESTABLE (does not silently skip them)
+- [ ] Detects existing plan and offers update path
+- [ ] Asks "May I write" before creating or updating the plan file
+- [ ] Verdict is COMPLETE when plan is written
+
+---
+
+## Coverage Notes
+
+- The case where `coding-standards.md` is missing (skill cannot assign test types)
+ is not fixture-tested; behavior would follow the BLOCKED pattern with a note
+ to restore the standards file.
+- Multi-sprint planning (spanning 2 sprints) is not tested; the skill is designed
+ for one sprint at a time.
+- Config/data story type (balance tuning → smoke check) follows the same
+ assignment pattern as other types in Case 1 and is not separately tested.
diff --git a/CCGS Skill Testing Framework/skills/utility/regression-suite.md b/CCGS Skill Testing Framework/skills/utility/regression-suite.md
new file mode 100644
index 0000000..1a339fb
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/regression-suite.md
@@ -0,0 +1,172 @@
+# Skill Test Spec: /regression-suite
+
+## Skill Summary
+
+`/regression-suite` maps test coverage to GDD requirements: it reads the
+acceptance criteria from story files in the current sprint (or a specified epic),
+then scans `tests/` for corresponding test files and checks whether each AC has
+a matching assertion. It produces a coverage report identifying which ACs are
+fully covered, partially covered, or untested, and which test files have no
+matching AC (orphan tests).
+
+The skill may write a coverage report to `production/qa/` after a "May I write"
+ask. No director gates apply. Verdicts: FULL COVERAGE (all ACs have tests),
+GAPS FOUND (some ACs are untested), or CRITICAL GAPS (a critical-priority AC
+has no test).
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: FULL COVERAGE, GAPS FOUND, CRITICAL GAPS
+- [ ] Contains "May I write" language (skill may write coverage report)
+- [ ] Has a next-step handoff (e.g., `/test-setup` if framework missing, `/qa-plan` if plan missing)
+
+---
+
+## Director Gate Checks
+
+None. `/regression-suite` is a QA analysis utility. No director gates apply.
+
+---
+
+## Test Cases
+
+### Case 1: Full Coverage — All ACs in sprint have corresponding tests
+
+**Fixture:**
+- `production/sprints/sprint-004.md` lists 3 stories with 2 ACs each (6 total)
+- `tests/unit/` and `tests/integration/` contain test files that match all 6 ACs
+ (by system name and scenario description)
+
+**Input:** `/regression-suite sprint-004`
+
+**Expected behavior:**
+1. Skill reads all 6 ACs from sprint-004 stories
+2. Skill scans test files and matches each AC to at least one test assertion
+3. All 6 ACs have coverage
+4. Skill produces coverage report: "6/6 ACs covered"
+5. Skill asks "May I write to `production/qa/regression-sprint-004.md`?"
+6. File is written on approval; verdict is FULL COVERAGE
+
+**Assertions:**
+- [ ] All 6 ACs appear in the coverage report
+- [ ] Each AC is marked as covered with the matching test file referenced
+- [ ] Verdict is FULL COVERAGE
+- [ ] "May I write" is asked before writing the report
+
+---
+
+### Case 2: Gaps Found — 3 ACs have no tests
+
+**Fixture:**
+- Sprint has 5 stories with 8 total ACs
+- Tests exist for 5 of the 8 ACs; 3 ACs have no corresponding test file or assertion
+
+**Input:** `/regression-suite`
+
+**Expected behavior:**
+1. Skill reads all 8 ACs
+2. Skill scans tests — 5 matched, 3 unmatched
+3. Coverage report lists the 3 untested ACs by story and AC text
+4. Skill asks "May I write to `production/qa/regression-[sprint]-[date].md`?"
+5. Report is written; verdict is GAPS FOUND
+
+**Assertions:**
+- [ ] The 3 untested ACs are listed by name in the report
+- [ ] Matched ACs are also shown (not only the gaps)
+- [ ] Verdict is GAPS FOUND (not FULL COVERAGE)
+- [ ] Report is written after "May I write" approval
+
+---
+
+### Case 3: Critical AC Untested — CRITICAL GAPS verdict, flagged prominently
+
+**Fixture:**
+- Sprint has 4 stories; one story is Priority: Critical with 2 ACs
+- One of the critical-priority ACs has no test
+
+**Input:** `/regression-suite`
+
+**Expected behavior:**
+1. Skill reads all stories and ACs, noting which stories are critical priority
+2. Skill scans tests — the critical AC has no match
+3. Report prominently flags: "CRITICAL GAP: [AC text] — no test found (Critical priority story)"
+4. Skill recommends blocking story completion until test is added
+5. Verdict is CRITICAL GAPS
+
+**Assertions:**
+- [ ] Verdict is CRITICAL GAPS (not GAPS FOUND)
+- [ ] Critical priority AC is flagged more prominently than normal gaps
+- [ ] Recommendation to block story completion is included
+- [ ] Non-critical gaps (if any) are also listed
+
+---
+
+### Case 4: Orphan Tests — Test file has no matching AC
+
+**Fixture:**
+- `tests/unit/save_system_test.gd` exists with assertions for scenarios
+ not present in any current story's AC list
+- Current sprint stories do not reference save system
+
+**Input:** `/regression-suite`
+
+**Expected behavior:**
+1. Skill scans tests and cross-references ACs
+2. `save_system_test.gd` assertions do not match any current AC
+3. Test file is flagged as ORPHAN TEST in the coverage report
+4. Report notes: "Orphan tests may belong to a past or future sprint, or AC was renamed"
+5. Verdict is FULL COVERAGE or GAPS FOUND depending on overall AC coverage
+   (orphan tests are advisory and do not affect the verdict)
+
+**Assertions:**
+- [ ] Orphan test is flagged in the report
+- [ ] Orphan flag includes the filename and suggestion (past sprint / renamed AC)
+- [ ] Orphan tests do not cause a GAPS FOUND verdict on their own
+- [ ] Overall verdict reflects AC coverage only
+
+---
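
The orphan check in Case 4 is the inverse of AC coverage: scan the test files for systems that no current AC mentions. A minimal sketch, assuming the `<system>_test.gd` naming convention; the skill body's real matching may also use assertion text:

```python
def find_orphan_tests(test_files, acs):
    """Flag test files whose system name appears in no current AC.

    Assumes the `<system>_test.gd` naming convention; matching in the
    skill body may be richer (assertion text, scenario keywords).
    """
    orphans = []
    for path in test_files:
        name = path.rsplit("/", 1)[-1]
        # "save_system_test.gd" -> "save system"
        system = name.removesuffix("_test.gd").replace("_", " ")
        if not any(system in ac.lower() for ac in acs):
            orphans.append(path)
    return orphans
```

Orphans are reported as advisory only; the verdict in Case 4 still depends solely on AC coverage.

---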
+
+### Case 5: Director Gate Check — No gate; regression-suite is a QA utility
+
+**Fixture:**
+- Sprint with stories and test files
+
+**Input:** `/regression-suite`
+
+**Expected behavior:**
+1. Skill produces coverage report and writes it
+2. No director agents are spawned
+3. No gate IDs appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No gate skip messages appear
+- [ ] Verdict is FULL COVERAGE, GAPS FOUND, or CRITICAL GAPS — no gate verdict
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads story ACs from sprint files before scanning tests
+- [ ] Matches ACs to tests by system name and scenario (not file name alone)
+- [ ] Flags critical-priority untested ACs as CRITICAL GAPS
+- [ ] Flags orphan tests (exist in tests/ but no AC matches)
+- [ ] Asks "May I write" before persisting the coverage report
+- [ ] Verdict is FULL COVERAGE, GAPS FOUND, or CRITICAL GAPS
+
+---
+
+## Coverage Notes
+
+- The heuristic for matching an AC to a test (by system name + scenario keywords)
+ is approximate; exact matching logic is defined in the skill body.
+- Integration test coverage is mapped the same way as unit test coverage; no
+ distinction in verdicts is made between the two.
+- This skill does not run the tests — it maps AC text to test assertions. Test
+ execution is handled by the CI pipeline.
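
The approximate matching heuristic noted above (system name plus scenario keywords) could be sketched as follows; the tokenisation rules and the 0.5 overlap threshold are illustrative assumptions, not the skill's defined logic:

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "when", "and", "with"}

def keywords(text):
    """Lowercase word tokens, dropping stop-words and very short tokens."""
    return {w for w in re.findall(r"[a-z_]+", text.lower())
            if w not in STOP_WORDS and len(w) > 2}

def match_acs_to_tests(acs, tests, threshold=0.5):
    """Map each AC to the first test whose name + body shares enough keywords.

    `tests` is a list of {"file": ..., "body": ...} dicts. Returns
    (covered, gaps) where covered maps AC text -> matching test file.
    """
    covered, gaps = {}, []
    for ac in acs:
        ac_kw = keywords(ac)
        hit = None
        for test in tests:
            overlap = ac_kw & keywords(test["file"] + " " + test["body"])
            if ac_kw and len(overlap) / len(ac_kw) >= threshold:
                hit = test["file"]
                break
        if hit:
            covered[ac] = hit
        else:
            gaps.append(ac)
    return covered, gaps
```

With 6 ACs and full matches this yields a "6/6 ACs covered" report; anything left in `gaps` drives the GAPS FOUND verdict.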
diff --git a/CCGS Skill Testing Framework/skills/utility/release-checklist.md b/CCGS Skill Testing Framework/skills/utility/release-checklist.md
new file mode 100644
index 0000000..8581985
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/release-checklist.md
@@ -0,0 +1,177 @@
+# Skill Test Spec: /release-checklist
+
+## Skill Summary
+
+`/release-checklist` generates an internal release readiness checklist covering:
+sprint story completion, open bug severity, QA sign-off status, build stability,
+and changelog readiness. It is an internal gate — not a platform/store checklist
+(that is `/launch-checklist`). When a previous release checklist exists, it shows
+a delta of resolved and newly introduced issues.
+
+The skill writes its checklist report to `production/releases/release-checklist-[date].md`
+after a "May I write" ask. No director gates apply — `/gate-check` handles
+formal phase gate logic. Verdicts: RELEASE READY, RELEASE BLOCKED, or CONCERNS.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: RELEASE READY, RELEASE BLOCKED, CONCERNS
+- [ ] Contains "May I write" collaborative protocol language before writing the report
+- [ ] Has a next-step handoff (e.g., `/launch-checklist` for external or `/gate-check` for phase)
+
+---
+
+## Director Gate Checks
+
+None. `/release-checklist` is an internal audit utility. Formal phase advancement
+is managed by `/gate-check`.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — All Sprint Stories Complete, QA Passed, RELEASE READY
+
+**Fixture:**
+- `production/sprints/sprint-008.md` — all stories are `Status: Done`
+- No open bugs with severity HIGH or CRITICAL in `production/bugs/`
+- `production/qa/qa-plan-sprint-008.md` has QA sign-off annotation
+- Changelog entry for this version exists
+- `production/stage.txt` contains `Polish`
+
+**Input:** `/release-checklist`
+
+**Expected behavior:**
+1. Skill reads sprint-008: all stories Done
+2. Skill reads bugs: no HIGH or CRITICAL open bugs
+3. Skill confirms QA plan has sign-off
+4. Skill confirms changelog entry exists
+5. All checks pass; skill asks "May I write to
+ `production/releases/release-checklist-2026-04-06.md`?"
+6. Report written; verdict is RELEASE READY
+
+**Assertions:**
+- [ ] All 4 check categories are evaluated (stories, bugs, QA, changelog)
+- [ ] All items appear with PASS markers
+- [ ] Verdict is RELEASE READY
+- [ ] "May I write" is asked before writing
+
+---
+
+### Case 2: Open HIGH Severity Bugs — RELEASE BLOCKED
+
+**Fixture:**
+- All sprint stories are Done
+- `production/bugs/` contains 2 open bugs with severity HIGH
+
+**Input:** `/release-checklist`
+
+**Expected behavior:**
+1. Skill reads sprint — stories complete
+2. Skill reads bugs — 2 HIGH severity bugs open
+3. Skill reports: "RELEASE BLOCKED — 2 open HIGH severity bugs must be resolved"
+4. Both bug filenames are listed in the report
+5. Verdict is RELEASE BLOCKED
+
+**Assertions:**
+- [ ] Verdict is RELEASE BLOCKED (not CONCERNS)
+- [ ] Both bug filenames are listed explicitly
+- [ ] Skill makes clear HIGH severity bugs are blocking (not advisory)
+
+---
+
+### Case 3: Changelog Not Generated — CONCERNS
+
+**Fixture:**
+- All stories Done, no HIGH/CRITICAL bugs
+- No changelog entry found for the current version/sprint
+
+**Input:** `/release-checklist`
+
+**Expected behavior:**
+1. Skill checks all items
+2. Changelog check fails: no changelog entry found
+3. Skill reports: "CONCERNS — Changelog not generated for this release"
+4. Skill suggests running `/changelog` to generate it
+5. Verdict is CONCERNS (advisory — not a hard block)
+
+**Assertions:**
+- [ ] Verdict is CONCERNS (not RELEASE BLOCKED — changelog is advisory)
+- [ ] `/changelog` is suggested as the remediation
+- [ ] Other passing checks are shown in the report
+- [ ] Missing changelog is described as advisory, not blocking
+
+---
+
+### Case 4: Previous Release Checklist Exists — Delta From Last Release
+
+**Fixture:**
+- `production/releases/release-checklist-2026-03-20.md` exists
+- Previous: 1 story was incomplete, 1 HIGH bug open
+- Current: all stories Done, HIGH bug resolved, but now 1 MEDIUM bug appeared
+
+**Input:** `/release-checklist`
+
+**Expected behavior:**
+1. Skill finds the previous checklist and loads it
+2. New checklist is generated and compared:
+  - Newly resolved: "Story [X] — was incomplete, now Done"
+ - Newly resolved: "HIGH bug [filename] — was open, now closed"
+ - New item: "1 MEDIUM bug appeared (advisory)"
+3. Delta section shows all changes prominently
+4. Verdict is CONCERNS (MEDIUM bug is advisory, not blocking)
+
+**Assertions:**
+- [ ] Delta section appears in the report with resolved and new items
+- [ ] Newly resolved items from the previous checklist are noted
+- [ ] New items not present in the previous checklist are highlighted
+- [ ] Verdict reflects current state (not previous state)
+
+---
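
The delta in Case 4 reduces to set arithmetic over the open-item identifiers of the two checklists. A hedged sketch, with illustrative item labels:

```python
def checklist_delta(previous_open, current_open):
    """Compare open-item sets from the previous and current checklists."""
    prev, curr = set(previous_open), set(current_open)
    return {
        "resolved": sorted(prev - curr),   # was open, now gone
        "new": sorted(curr - prev),        # appeared since last checklist
        "unchanged": sorted(prev & curr),
    }
```

The verdict is then computed from the current open items alone, so it reflects current state rather than the previous checklist.

---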
+
+### Case 5: Director Gate Check — No gate; release-checklist is an internal audit
+
+**Fixture:**
+- Active sprint with stories and bug reports
+
+**Input:** `/release-checklist`
+
+**Expected behavior:**
+1. Skill runs the full checklist and writes the report
+2. No director agents are spawned
+3. No gate IDs appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No gate skip messages appear
+- [ ] Verdict is RELEASE READY, RELEASE BLOCKED, or CONCERNS — no gate verdict
+
+---
+
+## Protocol Compliance
+
+- [ ] Checks sprint story completion status
+- [ ] Checks open bug severity (CRITICAL/HIGH = BLOCKED; MEDIUM/LOW = CONCERNS)
+- [ ] Checks QA plan sign-off status
+- [ ] Checks changelog existence
+- [ ] Compares against previous checklist when one exists
+- [ ] Asks "May I write" before writing the report
+- [ ] Verdict is RELEASE READY, RELEASE BLOCKED, or CONCERNS
+
+---
+
+## Coverage Notes
+
+- Build stability verification (no failed CI runs) is listed as a check category
+ but relies on external CI system state; the skill notes this as a MANUAL CHECK
+ if CI integration is not configured.
+- CRITICAL bugs always result in RELEASE BLOCKED regardless of other items;
+  the behavior mirrors the HIGH severity case in Case 2.
+- Stories with `Status: In Review` (not Done) are treated as incomplete
+ and result in RELEASE BLOCKED; this edge case follows the same pattern
+ as the HIGH bug case.
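
The severity rules from the compliance list can be condensed into a small decision function. This is a sketch under assumed field names, and it omits the QA sign-off and CI checks for brevity:

```python
def release_verdict(stories, open_bugs, changelog_exists):
    """stories: [{"status": ...}], open_bugs: [{"severity": ...}]."""
    blocked = (
        any(s["status"] != "Done" for s in stories)            # In Review blocks too
        or any(b["severity"] in ("CRITICAL", "HIGH") for b in open_bugs)
    )
    if blocked:
        return "RELEASE BLOCKED"
    # MEDIUM/LOW bugs and a missing changelog are advisory, not blocking
    if open_bugs or not changelog_exists:
        return "CONCERNS"
    return "RELEASE READY"
```

Note that any non-Done story status blocks, matching the In Review edge case above.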
diff --git a/CCGS Skill Testing Framework/skills/utility/reverse-document.md b/CCGS Skill Testing Framework/skills/utility/reverse-document.md
new file mode 100644
index 0000000..8f9ca90
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/reverse-document.md
@@ -0,0 +1,180 @@
+# Skill Test Spec: /reverse-document
+
+## Skill Summary
+
+`/reverse-document` generates design or architecture documentation from existing
+source code. It reads the specified source file(s), infers design intent from
+class structure, method names, constants, and comments, and produces either a
+GDD skeleton (for gameplay systems) or an architecture overview (for technical
+systems). The output is a best-effort inference — magic numbers and undocumented
+logic may result in a PARTIAL verdict.
+
+The skill asks "May I write to [inferred path]?" before creating the document.
+No director gates apply. Verdicts: COMPLETE (clean inference), PARTIAL (some
+fields are ambiguous and need human review).
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: COMPLETE, PARTIAL
+- [ ] Contains "May I write" collaborative protocol language before writing the doc
+- [ ] Has a next-step handoff (e.g., `/design-review` to validate the generated doc)
+
+---
+
+## Director Gate Checks
+
+None. `/reverse-document` is a documentation utility. No director gates apply.
+
+---
+
+## Test Cases
+
+### Case 1: Well-Structured Source — Accurate design doc skeleton produced
+
+**Fixture:**
+- `src/gameplay/health_system.gd` exists with:
+ - `@export var max_health: int = 100`
+ - `func take_damage(amount: int)` with clamping logic
+ - `signal health_changed(new_value: int)`
+ - Docstrings on all public methods
+
+**Input:** `/reverse-document src/gameplay/health_system.gd`
+
+**Expected behavior:**
+1. Skill reads the source file and identifies the health system
+2. Skill infers design intent: max health, take_damage behavior, health signal
+3. Skill produces GDD skeleton for health system with 8 required sections:
+ Overview, Player Fantasy, Detailed Rules, Formulas, Edge Cases, Dependencies,
+ Tuning Knobs, Acceptance Criteria
+4. Formulas section includes the inferred clamping formula
+5. Tuning Knobs notes `max_health = 100` as a configurable value
+6. Skill asks "May I write to `design/gdd/health-system.md`?"
+7. File written; verdict is COMPLETE
+
+**Assertions:**
+- [ ] All 8 required GDD sections are present in the output
+- [ ] `max_health = 100` appears as a Tuning Knob
+- [ ] Clamping formula is captured in the Formulas section
+- [ ] "May I write" is asked with the inferred path
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 2: Ambiguous Source — Magic Numbers, PARTIAL Verdict
+
+**Fixture:**
+- `src/gameplay/enemy_ai.gd` exists with:
+ - Inline magic numbers: `if distance < 150:`, `speed = 3.5`
+ - No comments or docstrings
+ - Complex state machine logic that is not self-explanatory
+
+**Input:** `/reverse-document src/gameplay/enemy_ai.gd`
+
+**Expected behavior:**
+1. Skill reads the file and detects magic numbers with no context
+2. Skill produces a GDD skeleton with notes: "AMBIGUOUS VALUE: 150 (unknown units —
+ is this pixels, world units, or tiles?)"
+3. Skill marks the Formulas and Tuning Knobs sections as requiring human review
+4. Skill asks "May I write to `design/gdd/enemy-ai.md`?" with PARTIAL advisory
+5. File written with PARTIAL markers; verdict is PARTIAL
+
+**Assertions:**
+- [ ] AMBIGUOUS VALUE annotations appear for magic numbers
+- [ ] Sections needing human review are marked explicitly
+- [ ] Verdict is PARTIAL (not COMPLETE)
+- [ ] File is still written — PARTIAL is not a blocking failure
+
+---
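
One plausible way to surface the AMBIGUOUS VALUE markers from Case 2 is a naive scan for numeric literals outside exported variables. The regex approximation below is illustrative; the skill body defines the real inference:

```python
import re

def find_ambiguous_values(source):
    """Flag numeric literals that are not bound to an exported variable."""
    flags = []
    for lineno, line in enumerate(source.splitlines(), 1):
        if "@export" in line or line.lstrip().startswith("#"):
            continue  # exported vars and comments document themselves
        for num in re.findall(r"(?<![\w.])\d+(?:\.\d+)?(?![\w.])", line):
            flags.append(f"AMBIGUOUS VALUE: {num} (line {lineno})")
    return flags
```

Each flag would be expanded in the report with a units question ("pixels, world units, or tiles?") and the affected sections marked for human review.

---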
+
+### Case 3: Multiple Interdependent Files — Cross-System Overview Produced
+
+**Fixture:**
+- User provides 2 source files: `combat_system.gd` and `damage_resolver.gd`
+- The files reference each other (combat calls damage_resolver)
+
+**Input:** `/reverse-document src/gameplay/combat_system.gd src/gameplay/damage_resolver.gd`
+
+**Expected behavior:**
+1. Skill reads both files and detects the dependency relationship
+2. Skill produces a cross-system architecture overview (not individual GDDs)
+3. Overview describes: Combat System → Damage Resolver interaction, shared
+ interfaces, data flow between the two
+4. Skill asks "May I write to `docs/architecture/combat-damage-overview.md`?"
+5. Overview written after approval; verdict is COMPLETE (or PARTIAL if ambiguous)
+
+**Assertions:**
+- [ ] Both files are analyzed together (not as two separate docs)
+- [ ] Cross-system dependency is documented in the output
+- [ ] Output file is written to `docs/architecture/` (not `design/gdd/`)
+- [ ] Verdict is COMPLETE or PARTIAL
+
+---
+
+### Case 4: Source File Not Found — Error
+
+**Fixture:**
+- `src/gameplay/inventory_system.gd` does not exist
+
+**Input:** `/reverse-document src/gameplay/inventory_system.gd`
+
+**Expected behavior:**
+1. Skill attempts to read the specified file — not found
+2. Skill outputs: "Source file not found: src/gameplay/inventory_system.gd"
+3. Skill suggests checking the path or running `/map-systems` to identify
+ the correct source file
+4. No document is created
+
+**Assertions:**
+- [ ] Error message names the missing file with the full path
+- [ ] Alternative suggestion (check path or `/map-systems`) is provided
+- [ ] No write tool is called
+- [ ] No verdict is issued (error state)
+
+---
+
+### Case 5: Director Gate Check — No gate; reverse-document is a utility
+
+**Fixture:**
+- Well-structured source file exists
+
+**Input:** `/reverse-document src/gameplay/health_system.gd`
+
+**Expected behavior:**
+1. Skill generates and writes the design doc
+2. No director agents are spawned
+3. No gate IDs appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No gate skip messages appear
+- [ ] Verdict is COMPLETE or PARTIAL — no gate verdict involved
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads source file(s) before generating any content
+- [ ] Produces all 8 required GDD sections when target is a gameplay system
+- [ ] Annotates ambiguous values with AMBIGUOUS VALUE markers
+- [ ] Produces cross-system overview (not individual GDDs) for multiple files
+- [ ] Asks "May I write" before creating any output file
+- [ ] Verdict is COMPLETE (clean inference) or PARTIAL (ambiguous fields)
+
+---
+
+## Coverage Notes
+
+- Architecture overview format (for technical/infrastructure systems) differs
+ from GDD format; the inferred output type is determined by the nature of the
+ source file (gameplay logic → GDD; engine/infra code → architecture doc).
+- The case where a source file is readable but contains only auto-generated
+ boilerplate with no meaningful logic is not tested; skill would likely produce
+ a near-empty skeleton with a PARTIAL verdict.
+- C# and Blueprint source files follow the same inference pattern as GDScript;
+ language-specific differences are handled in the skill body.
diff --git a/CCGS Skill Testing Framework/skills/utility/setup-engine.md b/CCGS Skill Testing Framework/skills/utility/setup-engine.md
new file mode 100644
index 0000000..0f5254c
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/setup-engine.md
@@ -0,0 +1,182 @@
+# Skill Test Spec: /setup-engine
+
+## Skill Summary
+
+`/setup-engine` configures the project's engine, language, rendering backend,
+physics engine, specialist agent assignments, and naming conventions by
+populating `technical-preferences.md`. It accepts an optional engine argument
+(e.g., `/setup-engine godot`) to skip the engine-selection step. For each
+section of `technical-preferences.md`, the skill presents a draft and asks
+"May I write to `technical-preferences.md`?" before updating.
+
+The skill also populates the specialist routing table (file extension → agent
+mappings) based on the chosen engine. It has no director gates — configuration
+is a technical utility task. The verdict is always COMPLETE when the file is
+fully written.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keyword: COMPLETE
+- [ ] Contains "May I write" collaborative protocol language before updating technical-preferences.md
+- [ ] Has a next-step handoff (e.g., `/brainstorm` or `/start` depending on flow)
+
+---
+
+## Director Gate Checks
+
+None. `/setup-engine` is a technical configuration skill. No director gates apply.
+
+---
+
+## Test Cases
+
+### Case 1: Godot 4 + GDScript — Full engine configuration
+
+**Fixture:**
+- `technical-preferences.md` contains only placeholders
+- Engine argument provided: `godot`
+
+**Input:** `/setup-engine godot`
+
+**Expected behavior:**
+1. Skill skips engine-selection step (argument provided)
+2. Skill presents language options for Godot: GDScript or C#
+3. User selects GDScript
+4. Skill drafts all engine sections: engine/language/rendering/physics fields,
+ naming conventions (snake_case for GDScript), specialist assignments
+ (godot-specialist, gdscript-specialist, godot-shader-specialist, etc.)
+5. Skill populates the routing table: `.gd` → gdscript-specialist, `.gdshader` →
+ godot-shader-specialist, `.tscn` → godot-specialist
+6. Skill asks "May I write to `technical-preferences.md`?"
+7. File is written after approval; verdict is COMPLETE
+
+**Assertions:**
+- [ ] Engine field is set to Godot 4 (not a placeholder)
+- [ ] Language field is set to GDScript
+- [ ] Naming conventions are GDScript-appropriate (snake_case)
+- [ ] Routing table includes `.gd`, `.gdshader`, and `.tscn` entries
+- [ ] Specialists are assigned (not placeholders)
+- [ ] "May I write" is asked before writing
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 2: Unity + C# — Unity-specific configuration
+
+**Fixture:**
+- `technical-preferences.md` contains only placeholders
+- Engine argument provided: `unity`
+
+**Input:** `/setup-engine unity`
+
+**Expected behavior:**
+1. Skill sets engine to Unity, language to C#
+2. Naming conventions are C#-appropriate (PascalCase for classes, camelCase for fields)
+3. Specialist assignments reference unity-specialist, csharp-specialist
+4. Routing table: `.cs` → csharp-specialist, `.asmdef` → unity-specialist,
+ `.unity` (scene) → unity-specialist
+5. Skill asks "May I write to `technical-preferences.md`?" and writes on approval
+
+**Assertions:**
+- [ ] Engine field is set to Unity (not Godot or Unreal)
+- [ ] Language field is set to C#
+- [ ] Naming conventions reflect C# conventions
+- [ ] Routing table includes `.cs` and `.unity` entries
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 3: Unreal + Blueprint — Unreal-specific configuration
+
+**Fixture:**
+- `technical-preferences.md` contains only placeholders
+- Engine argument provided: `unreal`
+
+**Input:** `/setup-engine unreal`
+
+**Expected behavior:**
+1. Skill sets engine to Unreal Engine 5, primary language to Blueprint (Visual Scripting)
+2. Specialist assignments reference unreal-specialist, blueprint-specialist
+3. Routing table: `.uasset` → blueprint-specialist or unreal-specialist,
+ `.umap` → unreal-specialist
+4. Performance budgets are pre-set with Unreal defaults (e.g., higher draw call budget)
+5. Skill asks "May I write" and writes on approval; verdict is COMPLETE
+
+**Assertions:**
+- [ ] Engine field is set to Unreal Engine 5
+- [ ] Routing table includes `.uasset` and `.umap` entries
+- [ ] Blueprint specialist is assigned
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 4: Engine Already Configured — Offers to reconfigure specific sections
+
+**Fixture:**
+- `technical-preferences.md` has engine set to Godot 4 with all fields populated
+- No engine argument provided
+
+**Input:** `/setup-engine`
+
+**Expected behavior:**
+1. Skill reads `technical-preferences.md` and detects fully configured engine (Godot 4)
+2. Skill reports: "Engine already configured as Godot 4 + GDScript"
+3. Skill presents options: reconfigure all, reconfigure specific section only
+ (Engine/Language, Naming Conventions, Specialists, Performance Budgets)
+4. User selects "Reconfigure Performance Budgets only"
+5. Only the performance budget section is updated; all other fields unchanged
+6. Skill asks "May I write to `technical-preferences.md`?" and writes on approval
+
+**Assertions:**
+- [ ] Skill does NOT overwrite all fields when only a section update was requested
+- [ ] User is offered section-specific reconfiguration
+- [ ] Only the selected section is modified in the written file
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 5: Director Gate Check — No gate; setup-engine is a utility skill
+
+**Fixture:**
+- Fresh project with no engine configured
+
+**Input:** `/setup-engine godot`
+
+**Expected behavior:**
+1. Skill completes full engine configuration
+2. No director agents are spawned at any point
+3. No gate IDs appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No gate skip messages appear
+- [ ] Verdict is COMPLETE without any gate check
+
+---
+
+## Protocol Compliance
+
+- [ ] Presents draft configuration before asking to write
+- [ ] Asks "May I write to `technical-preferences.md`?" before writing
+- [ ] Respects engine argument when provided (skips selection step)
+- [ ] Detects existing config and offers partial reconfigure
+- [ ] Routing table is populated for all key file types for the chosen engine
+- [ ] Verdict is COMPLETE after file is written
+
+---
+
+## Coverage Notes
+
+- Godot 4 + C# (instead of GDScript) follows the same flow as Case 1 with
+ different naming conventions and the godot-csharp-specialist assignment.
+ This variant is not separately tested.
+- The engine-version-specific guidance (e.g., Godot 4.6 knowledge gap warning
+ from VERSION.md) is surfaced by the skill but not assertion-tested here.
+- Performance budget defaults per engine are noted as engine-specific but
+ exact default values are not assertion-tested.
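
The routing tables from Cases 1-3 amount to a per-engine extension map. The sketch below mirrors the agent names used in the cases; the authoritative mapping lives in `technical-preferences.md`:

```python
# Extension -> specialist routing, keyed by engine. Entries mirror the
# test cases above; the skill populates the real table at setup time.
ROUTING = {
    "godot": {".gd": "gdscript-specialist",
              ".gdshader": "godot-shader-specialist",
              ".tscn": "godot-specialist"},
    "unity": {".cs": "csharp-specialist",
              ".asmdef": "unity-specialist",
              ".unity": "unity-specialist"},
    "unreal": {".uasset": "blueprint-specialist",
               ".umap": "unreal-specialist"},
}

def route(engine, path):
    """Map a file path to its specialist agent, or None if unrouted."""
    ext = "." + path.rsplit(".", 1)[-1] if "." in path else ""
    return ROUTING.get(engine, {}).get(ext)
```

An unrouted extension returning `None` would fall back to whatever default the skill defines.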
diff --git a/CCGS Skill Testing Framework/skills/utility/skill-improve.md b/CCGS Skill Testing Framework/skills/utility/skill-improve.md
new file mode 100644
index 0000000..459aff1
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/skill-improve.md
@@ -0,0 +1,185 @@
+# Skill Test Spec: /skill-improve
+
+## Skill Summary
+
+`/skill-improve` runs an automated test-fix-retest improvement loop on a skill
+file. It invokes `/skill-test static` (and optionally `/skill-test category`) to
+establish a baseline score, diagnoses the failing checks, proposes targeted fixes
+to the SKILL.md file, asks "May I write the improvements to [skill path]?", applies
+the fixes, and re-runs the tests to confirm improvement.
+
+If the proposed fix makes the skill worse (regression), the fix is reverted (with
+user confirmation) rather than applied. If the target skill is already perfect
+(0 failures), `/skill-improve` exits without making changes. No director gates apply. Verdicts:
+IMPROVED (score went up), NO CHANGE (no improvements possible or user declined), or
+REVERTED (fix was applied but caused regression and was reverted).
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: IMPROVED, NO CHANGE, REVERTED
+- [ ] Contains "May I write" collaborative protocol language before applying fixes
+- [ ] Has a next-step handoff (e.g., run `/skill-test spec` to validate behavioral compliance)
+
+---
+
+## Director Gate Checks
+
+None. `/skill-improve` is a meta-utility skill. No director gates apply.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Skill With 2 Static Failures, Both Fixed, IMPROVED
+
+**Fixture:**
+- `.claude/skills/some-skill/SKILL.md` has 2 static failures:
+ - Check 4: no "May I write" language despite having Write in allowed-tools
+ - Check 5: no next-step handoff at the end
+
+**Input:** `/skill-improve some-skill`
+
+**Expected behavior:**
+1. Skill runs `/skill-test static some-skill` — baseline: 5/7 checks pass
+2. Skill diagnoses the 2 failing checks (4 and 5)
+3. Skill proposes fixes:
+ - Add "May I write" language to the appropriate phase
+ - Add a next-step handoff section at the end
+4. Skill asks "May I write improvements to `.claude/skills/some-skill/SKILL.md`?"
+5. Fixes applied; `/skill-test static some-skill` re-run — now 7/7 checks pass
+6. Verdict is IMPROVED (5→7)
+
+**Assertions:**
+- [ ] Baseline score is established before any changes (5/7)
+- [ ] Both failing checks are diagnosed and addressed in the proposed fix
+- [ ] "May I write" is asked before applying the fix
+- [ ] Re-test confirms improvement (7/7)
+- [ ] Verdict is IMPROVED with before/after score shown
+
+---
+
+### Case 2: Fix Causes Regression — Score Comparison Shows Regression, REVERTED
+
+**Fixture:**
+- `.claude/skills/some-skill/SKILL.md` has 1 static failure (missing handoff)
+- Proposed fix inadvertently removes the verdict keywords section
+ (introducing a new failure)
+
+**Input:** `/skill-improve some-skill`
+
+**Expected behavior:**
+1. Baseline: 6/7 checks pass (1 failure: missing handoff)
+2. Skill proposes fix and asks "May I write improvements?"
+3. Fix is applied; re-test runs
+4. Re-test result: 5/7 (fixed the handoff but broke verdict keywords)
+5. Skill detects regression: score went DOWN
+6. Skill asks user: "Fix caused a regression (6→5). May I revert the changes?"
+7. User confirms; changes are reverted; verdict is REVERTED
+
+**Assertions:**
+- [ ] Re-test score is compared to baseline before finalizing
+- [ ] Regression is detected when score decreases
+- [ ] User is asked to confirm revert (not automatic)
+- [ ] File is reverted on user confirmation
+- [ ] Verdict is REVERTED
+
+---
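
Cases 1 and 2 both hinge on comparing the re-test score to the baseline. A minimal sketch of that comparison, with scores as `(passed, total)` tuples; treating a declined revert as NO CHANGE is an assumption:

```python
def improvement_verdict(baseline, retest, user_confirmed_revert=True):
    """Compare /skill-test scores before and after a fix.

    baseline/retest are (passed, total) tuples. The revert branch fires
    only on user confirmation, per Case 2.
    """
    if retest[0] > baseline[0]:
        return "IMPROVED"
    if retest[0] < baseline[0]:
        return "REVERTED" if user_confirmed_revert else "NO CHANGE"
    return "NO CHANGE"
```

The output should always show the before/after pair (e.g. "5/7 -> 7/7") alongside the verdict.

---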
+
+### Case 3: Skill With Category Assignment — Baseline Captures Both Scores
+
+**Fixture:**
+- `.claude/skills/gate-check/SKILL.md` is a gate skill with 1 static failure
+ and 2 category (G-criteria) failures
+- `tests/skills/quality-rubric.md` has Gate Skills section
+
+**Input:** `/skill-improve gate-check`
+
+**Expected behavior:**
+1. Skill runs both static and category tests for the baseline:
+ - Static: 6/7 checks pass
+ - Category: 3/5 G-criteria pass
+2. Combined baseline: 9/12
+3. Skill diagnoses all 3 failures and proposes fixes
+4. "May I write improvements to `.claude/skills/gate-check/SKILL.md`?"
+5. Fixes applied; both test types re-run
+6. Re-test: static 7/7, category 5/5 = 12/12
+7. Verdict is IMPROVED (9→12)
+
+**Assertions:**
+- [ ] Both static and category scores are captured in the baseline
+- [ ] Combined score is used for comparison (not just one type)
+- [ ] All 3 failures are addressed in the proposed fix
+- [ ] Re-test confirms improvement in both score types
+- [ ] Verdict is IMPROVED with combined before/after
+
+---
+
+### Case 4: Skill Already Perfect — No Improvements Needed
+
+**Fixture:**
+- `.claude/skills/brainstorm/SKILL.md` has no static failures
+- Category score is also 5/5 (if applicable)
+
+**Input:** `/skill-improve brainstorm`
+
+**Expected behavior:**
+1. Skill runs `/skill-test static brainstorm` — 7/7 checks pass
+2. If category applies: 5/5 criteria pass
+3. Skill outputs: "No improvements needed — brainstorm is fully compliant"
+4. Skill exits without proposing any changes
+5. No "May I write" is asked; no files are modified
+6. Verdict is NO CHANGE
+
+**Assertions:**
+- [ ] Skill exits immediately after confirming 0 failures
+- [ ] "No improvements needed" message is shown
+- [ ] No changes are proposed
+- [ ] No "May I write" is asked
+- [ ] Verdict is NO CHANGE
+
+---
+
+### Case 5: Director Gate Check — No gate; skill-improve is a meta utility
+
+**Fixture:**
+- Skill with at least 1 static failure
+
+**Input:** `/skill-improve some-skill`
+
+**Expected behavior:**
+1. Skill runs the test-fix-retest loop
+2. No director agents are spawned
+3. No gate IDs appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No gate skip messages appear
+- [ ] Verdict is IMPROVED, NO CHANGE, or REVERTED — no gate verdict
+
+---
+
+## Protocol Compliance
+
+- [ ] Always establishes a baseline score before proposing any changes
+- [ ] Shows before/after score comparison in the output
+- [ ] Asks "May I write" before applying any fix
+- [ ] Detects regressions by comparing re-test score to baseline
+- [ ] Asks for user confirmation before reverting (not automatic)
+- [ ] Ends with IMPROVED, NO CHANGE, or REVERTED verdict
+
+---
+
+## Coverage Notes
+
+- The improvement loop is designed to run only one fix-retest cycle per
+ invocation; running multiple iterations requires re-invoking `/skill-improve`.
+- Behavioral compliance (spec-mode test results) is not included in the
+ improvement loop — only structural (static) and category scores are automated.
+- The case where the skill file cannot be read (permissions error or missing file)
+ is not tested; this would result in an error before the baseline is established.
diff --git a/CCGS Skill Testing Framework/skills/utility/skill-test.md b/CCGS Skill Testing Framework/skills/utility/skill-test.md
new file mode 100644
index 0000000..9687bae
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/skill-test.md
@@ -0,0 +1,188 @@
+# Skill Test Spec: /skill-test
+
+## Skill Summary
+
+`/skill-test` validates skill files for structural correctness, behavioral
+compliance, and category-rubric scoring. It operates in three modes:
+
+- **static**: Checks a single skill file for structural requirements
+ (frontmatter fields, phase headings, verdict keywords, "May I write" language,
+ next-step handoff) without needing a fixture. Produces a per-check PASS/FAIL
+ table.
+- **spec**: Reads a test spec file from `tests/skills/` and evaluates the skill
+ against each test case assertion, producing a case-by-case verdict.
+- **audit**: Produces a coverage table of all skills in `.claude/skills/` and
+ all agents in `.claude/agents/`, showing which have spec files and which do not.
+
+An additional **category** mode reads the quality rubric for a skill category
+(e.g., gate skills) and scores the skill against rubric criteria. The verdict
+system differs by mode.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdicts: COMPLIANT, NON-COMPLIANT, WARNINGS (static mode); PASS, FAIL, PARTIAL (spec mode); COMPLETE (audit mode)
+- [ ] Does NOT contain "May I write" language (skill is read-only in all modes)
+- [ ] Has a next-step handoff (e.g., `/skill-improve` to fix issues found)
+
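The structural checks above lend themselves to simple text inspection. A minimal sketch of how a checker could evaluate them — the function name, the frontmatter parsing, and the all-checks-pass verdict rule are illustrative assumptions, not the skill's actual implementation:

```python
import re

def static_check(skill_text: str) -> dict:
    """Hypothetical evaluator for a subset of the structural checks."""
    results = {}
    # Required frontmatter fields between the opening '---' pair (assumed layout)
    fm_match = re.search(r"^---\n(.*?)\n---", skill_text, re.DOTALL)
    frontmatter = fm_match.group(1) if fm_match else ""
    required = ["name", "description", "argument-hint", "user-invocable", "allowed-tools"]
    results["frontmatter"] = all(f"{field}:" in frontmatter for field in required)
    # At least two phase headings (assumed '## Phase N' style)
    results["phases"] = len(re.findall(r"^## Phase", skill_text, re.MULTILINE)) >= 2
    # Read-only skills must not contain "May I write" language
    results["no_write_ask"] = "May I write" not in skill_text
    verdict = "COMPLIANT" if all(results.values()) else "NON-COMPLIANT"
    return {"checks": results, "verdict": verdict}
```

The real skill runs seven checks; this sketch covers three to show the per-check table shape.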
+---
+
+## Director Gate Checks
+
+None. `/skill-test` is a meta-utility skill. No director gates apply.
+
+---
+
+## Test Cases
+
+### Case 1: Static Mode — Well-formed skill, all 7 checks pass, COMPLIANT
+
+**Fixture:**
+- `.claude/skills/brainstorm/SKILL.md` exists and is well-formed:
+ - Has all required frontmatter fields
+ - Has ≥2 phase headings
+ - Has verdict keywords
+ - Has "May I write" language
+ - Has a next-step handoff
+ - Documents director gates
+ - Documents gate mode behavior (lean/solo skips)
+
+**Input:** `/skill-test static brainstorm`
+
+**Expected behavior:**
+1. Skill reads `.claude/skills/brainstorm/SKILL.md`
+2. Skill runs all 7 structural checks
+3. All 7 checks pass
+4. Skill outputs a PASS/FAIL table with all 7 checks marked PASS
+5. Verdict is COMPLIANT
+
+**Assertions:**
+- [ ] Exactly 7 structural checks are reported
+- [ ] All 7 are marked PASS
+- [ ] Verdict is COMPLIANT
+- [ ] No files are written
+
+---
+
+### Case 2: Static Mode — Skill Missing "May I Write" Despite Write Tool in allowed-tools
+
+**Fixture:**
+- `.claude/skills/some-skill/SKILL.md` has `Write` in `allowed-tools` frontmatter
+- The skill body has no "May I write" or "May I update" language
+
+**Input:** `/skill-test static some-skill`
+
+**Expected behavior:**
+1. Skill reads `some-skill/SKILL.md`
+2. Check 4 (collaborative write protocol) fails: `Write` in allowed-tools but no
+ "May I write" language found
+3. All other checks may pass
+4. Verdict is NON-COMPLIANT with Check 4 as the failing assertion
+5. Output lists Check 4 as FAIL with explanation
+
+**Assertions:**
+- [ ] Check 4 is marked FAIL
+- [ ] Explanation identifies the specific mismatch (Write tool without "May I write" language)
+- [ ] Verdict is NON-COMPLIANT
+- [ ] Other passing checks are shown (not only the failure)
+
+---
+
+### Case 3: Spec Mode — gate-check Skill Evaluated Against Spec
+
+**Fixture:**
+- `tests/skills/gate-check.md` exists with 5 test cases
+- `.claude/skills/gate-check/SKILL.md` exists
+
+**Input:** `/skill-test spec gate-check`
+
+**Expected behavior:**
+1. Skill reads both the skill file and the spec file
+2. Skill evaluates each of the 5 test case assertions against the skill's behavior
+3. For each case: PASS if skill behavior matches spec assertions, FAIL if not
+4. Skill produces a case-by-case result table
+5. Overall verdict: PASS (all 5 cases pass), PARTIAL (some cases fail), or FAIL (a majority of cases fail)
+
+**Assertions:**
+- [ ] All 5 test cases from the spec are evaluated
+- [ ] Each case has an individual PASS/FAIL result
+- [ ] Overall verdict is PASS, PARTIAL, or FAIL based on case results
+- [ ] No files are written
+
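One plausible reading of that overall-verdict rule, sketched as code — the majority threshold is an assumption inferred from the wording, not a documented constant:

```python
def overall_verdict(case_results: list[str]) -> str:
    """Map per-case PASS/FAIL results to an overall spec-mode verdict."""
    failures = case_results.count("FAIL")
    if failures == 0:
        return "PASS"        # every case passed
    if failures * 2 > len(case_results):
        return "FAIL"        # a majority of cases failed
    return "PARTIAL"         # some, but not most, cases failed
```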
+---
+
+### Case 4: Audit Mode — Coverage Table of All Skills and Agents
+
+**Fixture:**
+- `.claude/skills/` contains 72+ skill directories
+- `.claude/agents/` contains 49+ agent files
+- `tests/skills/` contains spec files for a subset of skills
+
+**Input:** `/skill-test audit`
+
+**Expected behavior:**
+1. Skill enumerates all skills in `.claude/skills/` and all agents in `.claude/agents/`
+2. Skill checks `tests/skills/` for a corresponding spec file for each
+3. Skill produces a coverage table:
+ - Each skill/agent listed
+ - "Has Spec" column: YES or NO
+ - Summary: "X of Y skills have specs; A of B agents have specs"
+4. Verdict is COMPLETE
+
+**Assertions:**
+- [ ] All skill directories are enumerated (not just a sample)
+- [ ] "Has Spec" column is accurate for each entry
+- [ ] Summary counts are correct
+- [ ] Verdict is COMPLETE
+
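The enumeration step can be sketched directly from the directory layout in the fixture. The slug-to-spec-file naming rule (`tests/skills/<skill>.md`) is an assumption for illustration:

```python
from pathlib import Path

def audit_coverage(root: Path) -> dict:
    """Hypothetical coverage pass over skills only; agents follow the same pattern."""
    skills = sorted(p.name for p in (root / ".claude/skills").iterdir() if p.is_dir())
    specs = {p.stem for p in (root / "tests/skills").glob("*.md")}
    # One row per skill, with a YES/NO-style "Has Spec" flag
    rows = [(name, name in specs) for name in skills]
    covered = sum(1 for _, has_spec in rows if has_spec)
    return {"rows": rows, "summary": f"{covered} of {len(rows)} skills have specs"}
```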
+---
+
+### Case 5: Category Mode — Gate Skill Evaluated Against Quality Rubric
+
+**Fixture:**
+- `tests/skills/quality-rubric.md` exists with a "Gate Skills" section defining
+ criteria G1-G5 (e.g., G1: has mode guard, G2: has verdict table, etc.)
+- `.claude/skills/gate-check/SKILL.md` is a gate skill
+
+**Input:** `/skill-test category gate-check`
+
+**Expected behavior:**
+1. Skill reads `quality-rubric.md` and identifies the Gate Skills section
+2. Skill evaluates `gate-check/SKILL.md` against criteria G1-G5
+3. Each criterion is scored: PASS, PARTIAL, or FAIL
+4. Overall category score is computed (e.g., 4/5 criteria pass)
+5. Verdict is COMPLIANT (all pass), WARNINGS (some PARTIAL, none FAIL), or NON-COMPLIANT (at least one FAIL)
+
+**Assertions:**
+- [ ] All gate criteria (G1-G5) from quality-rubric.md are evaluated
+- [ ] Each criterion has an individual score
+- [ ] Overall verdict reflects the score distribution
+- [ ] No files are written
+
+---
+
+## Protocol Compliance
+
+- [ ] Static mode checks exactly 7 structural assertions
+- [ ] Spec mode evaluates each test case from the spec file individually
+- [ ] Audit mode covers all skills AND agents (not just one category)
+- [ ] Category mode reads quality-rubric.md to get criteria (not hardcoded)
+- [ ] Does not write any files in any mode
+- [ ] Suggests `/skill-improve` as the next step when issues are found
+
+---
+
+## Coverage Notes
+
+- The skill-test skill is self-referential (it can test itself). The static
+ mode case for skill-test's own SKILL.md is not separately fixture-tested to
+ avoid infinite recursion in test design.
+- The specific 7 structural checks are defined in the skill body; only Check 4
+ (May I write) is individually tested here because it has the most nuanced logic.
+- Audit mode counts are approximate — the exact number of skills and agents will
+ change as the system grows; assertions use "all" rather than fixed counts.
diff --git a/CCGS Skill Testing Framework/skills/utility/smoke-check.md b/CCGS Skill Testing Framework/skills/utility/smoke-check.md
new file mode 100644
index 0000000..89b510a
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/smoke-check.md
@@ -0,0 +1,169 @@
+# Skill Test Spec: /smoke-check
+
+## Skill Summary
+
+`/smoke-check` runs the critical path smoke test checklist for a build. It reads
+the QA plan from `production/qa/` and checks each critical path item against the
+acceptance criteria defined in the current sprint's stories. Items that can be
+evaluated analytically are assessed; items that require runtime verification or
+visual inspection are flagged as NEEDS MANUAL CHECK.
+
+The skill produces no file writes — output is conversational. No director gates
+apply. Verdicts: PASS (all critical items verified), FAIL (at least one critical
+item fails), or NEEDS MANUAL CHECK (critical items exist that require human verification).
+
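The verdict precedence implied here (any critical failure wins, then unresolved manual checks, then PASS) can be sketched as follows; the item shape is an assumption, and only critical items influence the verdict, matching the BLOCKING vs. ADVISORY rule:

```python
def smoke_verdict(items: list[dict]) -> str:
    """Each item: {"critical": bool, "result": "PASS" | "FAIL" | "MANUAL"}."""
    if any(i["critical"] and i["result"] == "FAIL" for i in items):
        return "FAIL"
    if any(i["critical"] and i["result"] == "MANUAL" for i in items):
        return "NEEDS MANUAL CHECK"   # human action required, so not PASS
    return "PASS"
```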
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: PASS, FAIL, NEEDS MANUAL CHECK
+- [ ] Does NOT contain "May I write" language (skill is read-only)
+- [ ] Has a next-step handoff (e.g., `/bug-report` on FAIL, `/release-checklist` on PASS)
+
+---
+
+## Director Gate Checks
+
+None. `/smoke-check` is a QA utility skill. No director gates apply.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — All critical path items verifiable, PASS
+
+**Fixture:**
+- `production/qa/qa-plan-sprint-005.md` exists with 4 critical path items
+- All 4 items are logic or integration type (analytically assessable)
+- Corresponding story ACs are defined and met per sprint stories
+
+**Input:** `/smoke-check`
+
+**Expected behavior:**
+1. Skill reads the QA plan and identifies 4 critical path items
+2. Skill evaluates each item against the story's acceptance criteria
+3. All 4 items pass
+4. Skill outputs a checklist: each item with a PASS marker
+5. Verdict is PASS with summary: "4/4 critical path items verified"
+
+**Assertions:**
+- [ ] All 4 items appear in the checklist output
+- [ ] Each item is marked PASS
+- [ ] Verdict is PASS
+- [ ] No files are written
+
+---
+
+### Case 2: Failure Path — One critical item fails, FAIL verdict
+
+**Fixture:**
+- QA plan has 3 critical path items
+- Item 2 ("Player health does not go below 0") fails — story AC indicates
+ clamping logic was not implemented
+
+**Input:** `/smoke-check`
+
+**Expected behavior:**
+1. Skill evaluates all 3 items
+2. Item 1 and Item 3 pass; Item 2 fails
+3. Skill outputs checklist with specific failure: "Item 2 FAIL — Health clamping not verified"
+4. Verdict is FAIL
+5. Skill suggests running `/bug-report` for the failing item
+
+**Assertions:**
+- [ ] Verdict is FAIL (not PARTIAL or NEEDS MANUAL CHECK)
+- [ ] Failing item is identified by name/description
+- [ ] Passing items are also shown (not hidden)
+- [ ] `/bug-report` is suggested for the failure
+
+---
+
+### Case 3: Visual Item Cannot Be Auto-Verified — NEEDS MANUAL CHECK
+
+**Fixture:**
+- QA plan has 3 items: 2 logic items (PASS) and 1 visual item
+ ("Explosion VFX triggers correctly on enemy death" — ADVISORY, visual type)
+
+**Input:** `/smoke-check`
+
+**Expected behavior:**
+1. Skill evaluates the 2 logic items — both pass
+2. Skill evaluates the visual item — cannot be verified analytically
+3. Visual item is marked NEEDS MANUAL CHECK with a note: "Visual quality requires
+ human verification — see production/qa/evidence/"
+4. Verdict is NEEDS MANUAL CHECK (not PASS, because human action is required)
+5. Guidance on how to perform manual check is provided
+
+**Assertions:**
+- [ ] Verdict is NEEDS MANUAL CHECK (not PASS or FAIL)
+- [ ] Visual item is marked with explicit NEEDS MANUAL CHECK tag
+- [ ] Guidance for manual verification process is included
+- [ ] Logic items are still shown as PASS
+
+---
+
+### Case 4: No Smoke Test Plan — Guidance to run /qa-plan
+
+**Fixture:**
+- `production/qa/` directory exists but contains no QA plan file for the
+ current sprint
+- Current sprint is sprint-006
+
+**Input:** `/smoke-check`
+
+**Expected behavior:**
+1. Skill looks for QA plan for the current sprint — not found
+2. Skill outputs: "No smoke test plan found for sprint-006"
+3. Skill suggests running `/qa-plan sprint-006` first
+4. No checklist is produced
+
+**Assertions:**
+- [ ] Error message names the missing sprint's plan
+- [ ] `/qa-plan` is suggested with the correct sprint argument
+- [ ] Skill does not produce a checklist when no plan exists
+- [ ] Verdict is not PASS (error state, no checklist evaluated)
+
+---
+
+### Case 5: Director Gate Check — No gate; smoke-check is a QA utility
+
+**Fixture:**
+- Valid QA plan with assessable items
+
+**Input:** `/smoke-check`
+
+**Expected behavior:**
+1. Skill runs the smoke check and produces a verdict
+2. No director agents are spawned
+3. No gate IDs appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No write tool is called
+- [ ] Verdict is PASS, FAIL, or NEEDS MANUAL CHECK — no gate verdict involved
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads QA plan before evaluating any items
+- [ ] Evaluates each item explicitly (no silent skips)
+- [ ] Visual/feel items are always flagged NEEDS MANUAL CHECK (not auto-passed)
+- [ ] FAIL verdict triggers on the first critical failure (advisory items never trigger FAIL)
+- [ ] Verdict is PASS, FAIL, or NEEDS MANUAL CHECK — no other verdicts
+
+---
+
+## Coverage Notes
+
+- The case where the QA plan exists but has no critical path items (all items
+ are ADVISORY) is not tested; PASS would be returned with a note that no
+ critical items were checked.
+- The skill relies on the BLOCKING vs. ADVISORY gate-level distinction defined in
+  coding-standards.md to determine which items can produce a FAIL.
+- Build-specific failures (runtime crashes) that occur during manual testing are
+ outside the scope of this skill — use `/bug-report` for those.
diff --git a/CCGS Skill Testing Framework/skills/utility/soak-test.md b/CCGS Skill Testing Framework/skills/utility/soak-test.md
new file mode 100644
index 0000000..adc3ff5
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/soak-test.md
@@ -0,0 +1,178 @@
+# Skill Test Spec: /soak-test
+
+## Skill Summary
+
+`/soak-test` generates a structured soak test protocol — an extended runtime
+test plan designed to surface memory leaks, performance drift, and stability
+issues that only appear under sustained gameplay. The skill produces a document
+specifying the test duration, system under test, monitoring checkpoints (e.g.,
+memory sample every 30 minutes), pass/fail thresholds, and conditions for early
+termination.
+
+The skill asks "May I write to `production/qa/soak-[slug]-[date].md`?" before
+persisting. If a previous soak test for the same system exists, the skill offers
+to extend the duration or add new conditions. No director gates apply. The verdict
+is COMPLETE when the soak test protocol is written.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keyword: COMPLETE
+- [ ] Contains "May I write" collaborative protocol language before writing the protocol
+- [ ] Has a next-step handoff (e.g., `/regression-suite` or `/release-checklist`)
+
+---
+
+## Director Gate Checks
+
+None. `/soak-test` is a QA planning utility. No director gates apply.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Online gameplay feature, 2-hour soak protocol
+
+**Fixture:**
+- User specifies: system = "online multiplayer lobby", duration = "2 hours"
+- `technical-preferences.md` has engine configured
+
+**Input:** `/soak-test online-lobby 2h`
+
+**Expected behavior:**
+1. Skill generates a 2-hour soak test protocol for the online lobby system
+2. Protocol includes: monitoring checkpoints every 30 minutes, metrics to track
+ (memory usage, connection count, packet loss), pass thresholds, early termination
+ conditions (crash or >20% memory growth)
+3. Networking-specific checks are included (session drop rate, reconnect handling)
+4. Skill asks "May I write to `production/qa/soak-online-lobby-2026-04-06.md`?"
+5. File is written on approval; verdict is COMPLETE
+
+**Assertions:**
+- [ ] Protocol duration matches the requested 2 hours
+- [ ] Monitoring checkpoints are at reasonable intervals (e.g., every 30 minutes)
+- [ ] Network-specific checks are included (not just generic memory checks)
+- [ ] "May I write" is asked with the correct file path
+- [ ] Verdict is COMPLETE
+
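The checkpoint schedule in this case can be generated mechanically from duration and interval. A sketch, assuming checkpoints start at the first interval (baseline capture at t=0 is presumed to happen separately):

```python
def checkpoints(duration_minutes: int, interval_minutes: int = 30) -> list[str]:
    """Emit one monitoring checkpoint per interval across the soak duration."""
    return [f"t+{t}m: sample memory, connection count, packet loss"
            for t in range(interval_minutes, duration_minutes + 1, interval_minutes)]
```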
+---
+
+### Case 2: No Target Defined — Prompts for system, duration, and conditions
+
+**Fixture:**
+- No arguments provided
+- No soak test config in session state
+
+**Input:** `/soak-test`
+
+**Expected behavior:**
+1. Skill detects no target system or duration specified
+2. Skill asks: "What system or feature should be soak-tested?"
+3. After user responds with system: Skill asks: "What duration? (e.g., 1h, 4h, 8h)"
+4. After user responds with duration: Skill asks for specific conditions or
+ uses defaults (normal gameplay loop, default player count)
+5. Skill generates protocol from collected inputs and asks "May I write"
+
+**Assertions:**
+- [ ] At minimum 2 follow-up questions are asked (system + duration)
+- [ ] Default conditions are applied when the user doesn't specify custom ones
+- [ ] Protocol is not generated until system and duration are known
+- [ ] Verdict is COMPLETE after file is written
+
+---
+
+### Case 3: Previous Soak Test Exists — Offers to extend or add conditions
+
+**Fixture:**
+- `production/qa/soak-online-lobby-2026-03-15.md` exists with a 1-hour protocol
+- User wants to extend to 4 hours with new memory threshold conditions
+
+**Input:** `/soak-test online-lobby 4h`
+
+**Expected behavior:**
+1. Skill finds existing soak test for online-lobby
+2. Skill reports: "Previous soak test found: soak-online-lobby-2026-03-15.md (1h)"
+3. Skill presents options: create new protocol (4h standalone), or extend the
+ existing protocol to 4h and add new conditions
+4. User selects extend; existing checkpoints are preserved, new ones added
+5. Skill asks "May I write to `production/qa/soak-online-lobby-2026-04-06.md`?"
+ (new file, not overwriting old one)
+
+**Assertions:**
+- [ ] Existing soak test is surfaced and referenced
+- [ ] User is offered extend vs. new options
+- [ ] New file is created (old file is not overwritten)
+- [ ] Extended protocol includes both old and new checkpoints
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 4: Mobile Target Platform — Memory-specific checkpoints added
+
+**Fixture:**
+- `technical-preferences.md` specifies target platform: Mobile
+- User requests soak test for "gameplay session" at 30 minutes
+
+**Input:** `/soak-test gameplay 30m`
+
+**Expected behavior:**
+1. Skill reads `technical-preferences.md` and detects mobile target platform
+2. Soak test protocol includes mobile-specific memory checkpoints:
+ - Check heap memory growth vs. device baseline
+ - Check texture memory at checkpoint intervals
+ - Add warning threshold at 300MB (mobile ceiling)
+3. Protocol also includes thermal/battery drain advisory notes
+4. Skill asks "May I write?" and writes on approval; verdict is COMPLETE
+
+**Assertions:**
+- [ ] Mobile platform is detected from technical-preferences.md
+- [ ] Memory checkpoints include mobile-appropriate thresholds (not desktop)
+- [ ] Thermal/battery notes are present in the protocol
+- [ ] Verdict is COMPLETE
+
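Platform-dependent thresholds like these could be looked up from a small table. The 300MB mobile ceiling comes from the case above; the desktop figure and the table itself are invented placeholders for illustration:

```python
# Hypothetical warning ceilings per platform (desktop value is a placeholder)
MEMORY_WARNING_MB = {"mobile": 300, "desktop": 2048}

def memory_checkpoint_note(platform: str) -> str:
    """Build the memory line for a checkpoint, adding mobile-only extras."""
    ceiling = MEMORY_WARNING_MB.get(platform.lower(), MEMORY_WARNING_MB["desktop"])
    note = f"warn if heap exceeds {ceiling}MB"
    if platform.lower() == "mobile":
        note += "; also record texture memory and thermal state"
    return note
```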
+---
+
+### Case 5: Director Gate Check — No gate; soak-test is a planning utility
+
+**Fixture:**
+- Valid system and duration provided
+
+**Input:** `/soak-test combat 1h`
+
+**Expected behavior:**
+1. Skill generates and writes the soak test protocol
+2. No director agents are spawned
+3. No gate IDs appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No gate skip messages appear
+- [ ] Skill reaches COMPLETE without any gate check
+
+---
+
+## Protocol Compliance
+
+- [ ] Collects system, duration, and conditions before generating protocol
+- [ ] Includes monitoring checkpoints at regular intervals
+- [ ] Includes pass/fail thresholds and early termination conditions
+- [ ] Adapts checkpoints to target platform (mobile vs. desktop)
+- [ ] Asks "May I write" before creating the protocol file
+- [ ] Verdict is COMPLETE when file is written
+
+---
+
+## Coverage Notes
+
+- Soak tests for specific engine subsystems (rendering pipeline, physics
+ simulation) follow the same protocol structure and are not separately tested.
+- The case where the user provides a duration shorter than the minimum useful
+ soak period (e.g., 5 minutes) is not tested; the skill would note this is
+ too short for meaningful results.
+- Automated execution of the soak test protocol is outside this skill's scope —
+ this skill generates the plan, not the runner.
diff --git a/CCGS Skill Testing Framework/skills/utility/start.md b/CCGS Skill Testing Framework/skills/utility/start.md
new file mode 100644
index 0000000..a3f19b3
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/start.md
@@ -0,0 +1,173 @@
+# Skill Test Spec: /start
+
+## Skill Summary
+
+`/start` is the first-time onboarding skill for new projects. It guides the
+user through naming the project, choosing a game engine, and setting up the
+initial directory structure. It creates stub configuration files (CLAUDE.md,
+technical-preferences.md) and then routes to `/setup-engine` with the chosen
+engine as an argument. Each file or directory created is gated behind a
+"May I write" ask, following the collaborative protocol.
+
+The skill detects whether a project is already configured and whether a
+partial setup exists, offering to resume or restart as appropriate. It has
+no director gates — it is a utility setup skill that runs before any agent
+hierarchy exists.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keywords: COMPLETE, BLOCKED
+- [ ] Contains "May I write" collaborative protocol language for each config file
+- [ ] Has a next-step handoff at the end (routes to `/setup-engine`)
+
+---
+
+## Director Gate Checks
+
+None. `/start` is a utility setup skill. No director agents exist yet at the
+point this skill runs.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Fresh repo, no engine, full onboarding flow
+
+**Fixture:**
+- Empty repository: no CLAUDE.md overrides, no `production/stage.txt`, no
+ `technical-preferences.md` content beyond placeholders
+- No existing design docs or source code
+
+**Input:** `/start`
+
+**Expected behavior:**
+1. Skill detects no existing configuration and begins fresh onboarding
+2. Skill asks for project name
+3. Skill presents 3 engine options: Godot 4, Unity, Unreal Engine 5
+4. User selects an engine
+5. Skill asks "May I write the initial directory structure?"
+6. Skill creates all directories defined in `directory-structure.md`
+7. Skill asks "May I write CLAUDE.md stub?" and writes it on approval
+8. Skill routes to `/setup-engine [chosen-engine]` to complete technical config
+
+**Assertions:**
+- [ ] Project name is captured before any file is written
+- [ ] Exactly 3 engine options are presented
+- [ ] "May I write" is asked for each config file individually
+- [ ] No file is written without explicit user approval
+- [ ] Handoff to `/setup-engine` occurs at the end with the chosen engine argument
+- [ ] Verdict is COMPLETE after all files are written and handoff is issued
+
+---
+
+### Case 2: Already Configured — Detects existing config, offers to skip or reconfigure
+
+**Fixture:**
+- `technical-preferences.md` has engine already set (not placeholder)
+- `production/stage.txt` exists with `Concept`
+
+**Input:** `/start`
+
+**Expected behavior:**
+1. Skill reads `technical-preferences.md` and detects configured engine
+2. Skill reports: "This project is already configured with [engine]"
+3. Skill presents options: skip (exit), reconfigure engine, or reconfigure specific sections
+4. If user selects skip: skill exits cleanly with a summary of current config
+5. If user selects reconfigure: skill proceeds to the engine-selection step
+
+**Assertions:**
+- [ ] Skill does NOT overwrite existing config without user choosing reconfigure
+- [ ] Detected engine name is shown to the user in the status message
+- [ ] User is offered at least 2 options (skip or reconfigure)
+- [ ] Verdict is COMPLETE whether user skips or reconfigures
+
+---
+
+### Case 3: Engine Choice — User picks Godot 4, routes to /setup-engine godot
+
+**Fixture:**
+- Fresh repo — no existing configuration
+
+**Input:** `/start`
+
+**Expected behavior:**
+1. Skill presents engine options and user selects Godot 4
+2. Skill writes initial stubs (directory structure, CLAUDE.md) after approval
+3. Skill explicitly routes to `/setup-engine godot` as the next step
+4. Handoff message clearly names the engine and the next skill invocation
+
+**Assertions:**
+- [ ] Handoff command is `/setup-engine godot` (not generic `/setup-engine`)
+- [ ] Handoff is issued after all initial stubs are written, not before
+- [ ] Engine choice is echoed back to user before writing begins
+
+---
+
+### Case 4: Interrupted Setup — Partial config detected, offers resume or restart
+
+**Fixture:**
+- Directory structure exists (was created) but `technical-preferences.md` is
+ still all placeholders (engine was never chosen — setup was interrupted)
+- No `production/stage.txt`
+
+**Input:** `/start`
+
+**Expected behavior:**
+1. Skill detects partial state: directories exist but engine is unconfigured
+2. Skill reports: "A partial setup was detected — directories exist but engine is not configured"
+3. Skill offers: resume from engine selection, or restart from scratch
+4. If resume: skill skips directory creation, proceeds to engine choice
+5. If restart: skill asks "May I overwrite existing structure?" before proceeding
+
+**Assertions:**
+- [ ] Partial state is correctly identified (directories present, engine absent)
+- [ ] User is offered resume vs. restart choice — not forced into one path
+- [ ] Resume path skips re-creating directories (no redundant "May I write" for structure)
+- [ ] Restart path asks for permission to overwrite before touching any files
+
+---
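The three-way state detection (fresh, partial, configured) can be sketched as a pair of file probes. The `[placeholder]` marker is an assumed convention for an unconfigured `technical-preferences.md`, not necessarily the file's real contents:

```python
from pathlib import Path

def setup_state(root: Path) -> str:
    """Classify the repo as fresh, partially set up, or fully configured."""
    prefs = root / "technical-preferences.md"
    dirs_exist = (root / "production").is_dir()
    engine_set = prefs.exists() and "[placeholder]" not in prefs.read_text()
    if dirs_exist and engine_set:
        return "configured"
    if dirs_exist and not engine_set:
        return "partial"   # directories created, but engine never chosen
    return "fresh"
```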
+
+### Case 5: Director Gate Check — No gate; start is a utility setup skill
+
+**Fixture:**
+- Any fixture
+
+**Input:** `/start`
+
+**Expected behavior:**
+1. Skill completes full onboarding flow
+2. No director agents are spawned at any point
+3. No gate IDs (CD-*, TD-*, AD-*, PR-*) appear in the output
+
+**Assertions:**
+- [ ] No director gate is invoked during the skill execution
+- [ ] No gate skip messages appear (gates are absent, not suppressed)
+- [ ] Skill reaches COMPLETE without any gate verdict
+
+---
+
+## Protocol Compliance
+
+- [ ] Asks for project name before any file is written
+- [ ] Presents engine options as a structured choice (not free text)
+- [ ] Asks "May I write" separately for directory structure and for CLAUDE.md stub
+- [ ] Ends with a handoff to `/setup-engine` with the engine name as argument
+- [ ] Verdict is clearly stated (COMPLETE or BLOCKED) at end of output
+
+---
+
+## Coverage Notes
+
+- The case where the user rejects all engine options and provides a custom
+ engine name is not tested — the skill is designed for the three supported
+ engines only.
+- Git initialization (if any) is not tested here; that is an infrastructure
+ concern outside the skill boundary.
+- Solo vs. lean mode behavior is not applicable — this skill has no gates and
+ mode selection is irrelevant.
diff --git a/CCGS Skill Testing Framework/skills/utility/test-helpers.md b/CCGS Skill Testing Framework/skills/utility/test-helpers.md
new file mode 100644
index 0000000..a79c1a9
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/test-helpers.md
@@ -0,0 +1,175 @@
+# Skill Test Spec: /test-helpers
+
+## Skill Summary
+
+`/test-helpers` generates engine-specific test helper utilities for the project's
+test suite. Helpers include factory functions (for creating test entities with
+known state), fixture loaders, assertion helpers, and mock stubs for external
+dependencies. Generated helpers follow the naming and structure conventions in
+`coding-standards.md` and are written to `tests/helpers/`.
+
+Each helper file is gated behind a "May I write" ask. If a helper file already
+exists, the skill offers to extend it rather than replace. No director gates
+apply. The verdict is COMPLETE when helper files are written.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keyword: COMPLETE
+- [ ] Contains "May I write" collaborative protocol language before writing helpers
+- [ ] Has a next-step handoff (e.g., write a test using the generated helper)
+
+---
+
+## Director Gate Checks
+
+None. `/test-helpers` is a scaffolding utility. No director gates apply.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Player factory helper generated for Godot/GDScript
+
+**Fixture:**
+- `technical-preferences.md` has engine Godot 4, language GDScript
+- `tests/` directory exists (test-setup has been run)
+- `design/gdd/player.md` exists with defined player properties
+- No existing helpers in `tests/helpers/`
+
+**Input:** `/test-helpers player-factory`
+
+**Expected behavior:**
+1. Skill reads engine (Godot 4 / GDScript) and player GDD for property context
+2. Skill generates a deterministic `PlayerFactory` helper in GDScript:
+ - `create_player(health: int = 100, speed: float = 200.0)` function
+ - Returns a player node pre-configured to a known state
+ - Uses dependency injection (no singletons)
+3. Skill asks "May I write to `tests/helpers/player_factory.gd`?"
+4. File is written on approval; verdict is COMPLETE
+
+**Assertions:**
+- [ ] Generated helper is in GDScript (not C# or Blueprint)
+- [ ] Factory function parameters use defaults matching GDD values
+- [ ] Helper uses dependency injection (no Autoload/singleton references)
+- [ ] Filename follows snake_case convention for GDScript
+- [ ] Verdict is COMPLETE
+
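As a language-neutral illustration of the factory shape this case expects, here is a Python stand-in for the GDScript helper. The `Player` type is hypothetical; the defaults mirror the GDD values cited in the case:

```python
from dataclasses import dataclass

@dataclass
class Player:
    health: int
    speed: float

def create_player(health: int = 100, speed: float = 200.0) -> Player:
    # Returns a player in a known state; any dependencies would be injected
    # by the caller rather than pulled from a singleton/Autoload.
    return Player(health=health, speed=speed)
```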
+---
+
+### Case 2: No Test Setup Exists — Redirects to /test-setup
+
+**Fixture:**
+- `tests/` directory does not exist
+
+**Input:** `/test-helpers player-factory`
+
+**Expected behavior:**
+1. Skill checks for `tests/` directory — not found
+2. Skill reports: "Test directory not found — test framework must be set up first"
+3. Skill suggests running `/test-setup` before generating helpers
+4. No helper file is created
+
+**Assertions:**
+- [ ] Error message identifies the missing tests/ directory
+- [ ] `/test-setup` is suggested as the prerequisite step
+- [ ] No write tool is called
+- [ ] Verdict is not COMPLETE (blocked state)
+
+---
+
+### Case 3: Helper Already Exists — Offers to extend rather than replace
+
+**Fixture:**
+- `tests/helpers/player_factory.gd` already exists with a `create_player()` function
+- User requests a new `create_enemy()` function be added to the factory
+
+**Input:** `/test-helpers enemy-factory`
+
+**Expected behavior:**
+1. Skill finds an existing `player_factory.gd` and checks if it's the right file
+ to extend (or if a separate `enemy_factory.gd` should be created)
+2. Skill presents options: add `create_enemy()` to existing factory or create
+ `tests/helpers/enemy_factory.gd`
+3. User selects extend; skill drafts the `create_enemy()` function
+4. Skill asks "May I extend `tests/helpers/player_factory.gd`?"
+5. Function is added on approval; verdict is COMPLETE
+
+**Assertions:**
+- [ ] Existing helper is detected and surfaced
+- [ ] User is given extend vs. new file choice
+- [ ] "May I extend" language is used (not "May I write" for replacement)
+- [ ] Existing `create_player()` is preserved in the extended file
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 4: System Has No GDD — Notes missing design context in helper
+
+**Fixture:**
+- `technical-preferences.md` has Godot 4 / GDScript
+- `tests/` exists
+- User requests a helper for the "inventory system" but no `design/gdd/inventory.md` exists
+
+**Input:** `/test-helpers inventory-factory`
+
+**Expected behavior:**
+1. Skill looks for `design/gdd/inventory.md` — not found
+2. Skill notes: "No GDD found for inventory — generating helper with placeholder defaults"
+3. Skill generates an `inventory_factory.gd` with generic placeholder values
+ (item_count = 0, max_capacity = 20) and a comment: "# TODO: align defaults
+ with inventory GDD when written"
+4. Skill asks "May I write to `tests/helpers/inventory_factory.gd`?"
+5. File is written; verdict is COMPLETE with advisory note
+
+**Assertions:**
+- [ ] Skill proceeds without GDD (does not block)
+- [ ] Generated helper has placeholder defaults with TODO comment
+- [ ] Missing GDD is noted in the output (advisory warning)
+- [ ] Verdict is COMPLETE
+
+---
+
+### Case 5: Director Gate Check — No gate; test-helpers is a scaffolding utility
+
+**Fixture:**
+- Engine configured, tests/ exists
+
+**Input:** `/test-helpers player-factory`
+
+**Expected behavior:**
+1. Skill generates and writes the helper file
+2. No director agents are spawned
+3. No gate IDs appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No gate skip messages appear
+- [ ] Verdict is COMPLETE without any gate check
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads engine before generating any helper (helpers are engine-specific)
+- [ ] Reads GDD for default values when available
+- [ ] Notes missing GDD context rather than blocking
+- [ ] Detects existing helper files and offers extend rather than replace
+- [ ] Asks "May I write" (or "May I extend") before any file operation
+- [ ] Verdict is COMPLETE when helper is written
+
+---
+
+## Coverage Notes
+
+- Mock/stub helper generation (for dependencies like save systems or audio buses)
+ follows the same pattern as factory helpers and is not separately tested.
+- Unity C# helper generation (using NSubstitute or custom mocks) follows the
+ same logic as Case 1 with language-appropriate output.
+- The case where the requested helper type is not recognized is not tested;
+ the skill would ask the user to clarify the helper type.
diff --git a/CCGS Skill Testing Framework/skills/utility/test-setup.md b/CCGS Skill Testing Framework/skills/utility/test-setup.md
new file mode 100644
index 0000000..60f62d1
--- /dev/null
+++ b/CCGS Skill Testing Framework/skills/utility/test-setup.md
@@ -0,0 +1,173 @@
+# Skill Test Spec: /test-setup
+
+## Skill Summary
+
+`/test-setup` scaffolds the test framework for the project based on the
+configured engine. It creates the `tests/` directory structure defined in
+`coding-standards.md` (unit/, integration/, performance/, playtest/) and
+generates the appropriate test runner configuration for the detected engine:
+GdUnit4 config for Godot, Unity Test Runner asmdef for Unity, or Unreal headless
+runner for Unreal Engine.
+
+Each file or directory created is gated behind a "May I write" ask. If the test
+framework already exists, the skill verifies the configuration rather than
+reinitializing. No director gates apply. The verdict is COMPLETE when the
+scaffold is in place.
+
+---
+
+## Static Assertions (Structural)
+
+Verified automatically by `/skill-test static` — no fixture needed.
+
+- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
+- [ ] Has ≥2 phase headings
+- [ ] Contains verdict keyword: COMPLETE
+- [ ] Contains "May I write" collaborative protocol language before creating files
+- [ ] Has a next-step handoff (e.g., `/test-helpers` to generate helper utilities)
+
+---
+
+## Director Gate Checks
+
+None. `/test-setup` is a scaffolding utility. No director gates apply.
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — Godot project, scaffolds GdUnit4 test structure
+
+**Fixture:**
+- `technical-preferences.md` has engine set to Godot 4, language GDScript
+- `tests/` directory does not exist yet
+
+**Input:** `/test-setup`
+
+**Expected behavior:**
+1. Skill reads engine from `technical-preferences.md` → Godot 4 + GDScript
+2. Skill drafts the test directory structure: tests/unit/, tests/integration/,
+ tests/performance/, tests/playtest/, and a GdUnit4 runner config file
+3. Skill asks "May I write the tests/ directory structure?"
+4. Directories and GdUnit4 runner script created on approval
+5. Skill confirms the runner script matches the CI command in coding-standards.md:
+ `godot --headless --script tests/gdunit4_runner.gd`
+6. Verdict is COMPLETE
+
+**Assertions:**
+- [ ] All 4 subdirectories (unit/, integration/, performance/, playtest/) are created
+- [ ] GdUnit4 runner config is generated
+- [ ] Runner script path matches coding-standards.md CI command
+- [ ] "May I write" is asked before creating any files
+- [ ] Verdict is COMPLETE
+
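+The expected scaffold can be sketched as shell commands (directory names come
+from coding-standards.md; the runner filename must match the CI command):
+
+```
+mkdir -p tests/unit tests/integration tests/performance tests/playtest
+# CI entry point the generated runner config must match:
+godot --headless --script tests/gdunit4_runner.gd
+```
+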
+---
+
+### Case 2: Unity Project — Scaffolds Unity Test Runner with asmdef
+
+**Fixture:**
+- `technical-preferences.md` has engine set to Unity, language C#
+- `tests/` directory does not exist
+
+**Input:** `/test-setup`
+
+**Expected behavior:**
+1. Skill reads engine → Unity + C#
+2. Skill drafts a `Tests/` directory layout following Unity conventions (capitalized)
+3. Skill drafts `Tests/Tests.asmdef` and `Tests/Editor/EditorTests.asmdef`
+4. EditMode and PlayMode runner modes are configured in the draft
+5. Skill asks "May I write the Tests/ directory structure?"
+6. Files are written on approval; verdict is COMPLETE
+
+**Assertions:**
+- [ ] Unity-specific `Tests/` structure is created (not the Godot structure)
+- [ ] `.asmdef` files are generated
+- [ ] EditMode and PlayMode runner config is present
+- [ ] Verdict is COMPLETE
+
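+For reference, a minimal test asmdef following the Unity Test Framework shape
+(the reference names are standard; treat the exact file contents as illustrative):
+
+```
+{
+    "name": "Tests",
+    "references": ["UnityEngine.TestRunner", "UnityEditor.TestRunner"],
+    "overrideReferences": true,
+    "precompiledReferences": ["nunit.framework.dll"],
+    "defineConstraints": ["UNITY_INCLUDE_TESTS"]
+}
+```
+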
+---
+
+### Case 3: Test Framework Already Exists — Verifies config, not re-initialized
+
+**Fixture:**
+- `tests/unit/`, `tests/integration/` exist
+- GdUnit4 runner script exists (Godot project)
+
+**Input:** `/test-setup`
+
+**Expected behavior:**
+1. Skill detects existing tests/ structure
+2. Skill reports: "Test framework already exists — verifying configuration"
+3. Skill checks: runner script path, directory completeness, CI command alignment
+4. If all checks pass: reports "Configuration verified — no changes needed"
+5. If checks fail (e.g., missing tests/performance/): reports specific gap and
+ asks "May I add the missing directories?"
+
+**Assertions:**
+- [ ] Skill does NOT reinitialize when framework exists
+- [ ] Verification checks are performed on existing structure
+- [ ] Only missing parts trigger a "May I write" ask
+- [ ] Verdict is COMPLETE whether everything was OK or gaps were fixed
+
+---
+
+### Case 4: No Engine Configured — Redirects to /setup-engine
+
+**Fixture:**
+- `technical-preferences.md` contains only placeholders (engine not set)
+
+**Input:** `/test-setup`
+
+**Expected behavior:**
+1. Skill reads `technical-preferences.md` and finds engine placeholder
+2. Skill reports: "Engine not configured — cannot scaffold engine-specific test framework"
+3. Skill suggests running `/setup-engine` first
+4. No directories or files are created
+
+**Assertions:**
+- [ ] Error message explicitly states engine is not configured
+- [ ] `/setup-engine` is suggested as the next step
+- [ ] No write tool is called
+- [ ] Verdict is not COMPLETE (blocked state)
+
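+The placeholder detection can be sketched as follows (the placeholder pattern
+and file location are assumptions; match whatever `technical-preferences.md`
+actually uses):
+
+```
+# Treat an unset or still-bracketed engine value as "not configured".
+if grep -qiE '^engine:\s*(\[.*\])?\s*$' technical-preferences.md; then
+  echo "Engine not configured; run /setup-engine first"
+fi
+```
+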
+---
+
+### Case 5: Director Gate Check — No gate; test-setup is a scaffolding utility
+
+**Fixture:**
+- Engine configured, tests/ does not exist
+
+**Input:** `/test-setup`
+
+**Expected behavior:**
+1. Skill scaffolds and writes all test framework files
+2. No director agents are spawned
+3. No gate IDs appear in output
+
+**Assertions:**
+- [ ] No director gate is invoked
+- [ ] No gate skip messages appear
+- [ ] Verdict is COMPLETE without any gate check
+
+---
+
+## Protocol Compliance
+
+- [ ] Reads engine from `technical-preferences.md` before generating any scaffold
+- [ ] Generates engine-appropriate test runner config (not generic)
+- [ ] Creates all 4 subdirectories from coding-standards.md
+- [ ] Asks "May I write" before creating files
+- [ ] Detects existing framework and offers verification (not reinitialization)
+- [ ] Verdict is COMPLETE when scaffold is in place
+
+---
+
+## Coverage Notes
+
+- Unreal Engine test scaffolding (headless runner with `-nullrhi`) follows the
+ same pattern as Cases 1 and 2 and is not separately fixture-tested.
+- CI integration file generation (e.g., `.github/workflows/test.yml`) is
+ referenced but not assertion-tested here — it may be a separate skill concern.
+- The case where tests/ exists but is from a different engine (e.g., Unity tests
+ in a now-Godot project) is not tested; the skill would detect the mismatch
+ and offer to reconcile.
diff --git a/CCGS Skill Testing Framework/templates/agent-test-spec.md b/CCGS Skill Testing Framework/templates/agent-test-spec.md
new file mode 100644
index 0000000..875edbd
--- /dev/null
+++ b/CCGS Skill Testing Framework/templates/agent-test-spec.md
@@ -0,0 +1,150 @@
+# Agent Spec: [agent-name]
+
+> **Tier**: [directors | leads | specialists | godot | unity | unreal | operations | creative]
+> **Category**: [director | lead | specialist | engine | operations | creative]
+> **Spec written**: [YYYY-MM-DD]
+
+## Agent Summary
+
+[One paragraph describing this agent's domain, what decisions it owns, and what it
+delegates vs. handles directly. Include which gates it triggers (if any).]
+
+**Domain**: [files/directories this agent owns]
+**Escalates to**: [parent agent — e.g., creative-director for design conflicts]
+**Delegates to**: [sub-agents this agent typically spawns]
+
+---
+
+## Static Assertions
+
+- [ ] Agent file exists at `.claude/agents/[name].md`
+- [ ] Frontmatter has `name`, `description`, `model`, `tools` fields
+- [ ] Domain clearly stated
+- [ ] Escalation path documented
+- [ ] Does not make decisions outside its domain
+
+---
+
+## Test Cases
+
+### Case 1: In-Domain Request — [brief name]
+
+**Scenario**: A request that is clearly within this agent's domain.
+
+**Fixture**:
+- [relevant project state]
+- [input provided to agent]
+
+**Expected behavior**:
+1. Agent accepts the request
+2. Agent produces [specific output type]
+3. Agent asks before writing files (if applicable)
+
+**Assertions**:
+- [ ] Agent handles request within its domain without escalating
+- [ ] Output format matches expected structure
+- [ ] Collaborative protocol followed (ask → draft → approve)
+
+**Case Verdict**: PASS / FAIL / PARTIAL
+
+---
+
+### Case 2: Out-of-Domain Redirect — [brief name]
+
+**Scenario**: A request that falls outside this agent's domain.
+
+**Fixture**:
+- [request that belongs to a different agent]
+
+**Expected behavior**:
+1. Agent identifies the request is out of domain
+2. Agent redirects to the correct agent
+3. Agent does NOT attempt to handle it
+
+**Assertions**:
+- [ ] Agent declines and redirects (does not silently handle cross-domain work)
+- [ ] Correct agent named in redirect
+
+**Case Verdict**: PASS / FAIL / PARTIAL
+
+---
+
+### Case 3: Gate Verdict — [brief name]
+
+**Scenario**: Agent is invoked as part of a director gate check.
+
+**Fixture**:
+- [project state presented for review]
+- [gate ID: e.g., CD-PHASE-GATE]
+
+**Expected behavior**:
+1. Agent reads the relevant documents
+2. Agent produces a PASS / CONCERNS / FAIL verdict
+3. Agent does not auto-advance on CONCERNS or FAIL
+
+**Assertions**:
+- [ ] Verdict keyword present in output (PASS, CONCERNS, FAIL)
+- [ ] Reasoning provided for verdict
+- [ ] On CONCERNS/FAIL: work is blocked, not silently continued
+
+**Case Verdict**: PASS / FAIL / PARTIAL
+
+---
+
+### Case 4: Conflict Escalation — [brief name]
+
+**Scenario**: This agent's domain conflicts with another agent's decision.
+
+**Fixture**:
+- [conflicting decisions from two agents at same tier]
+
+**Expected behavior**:
+1. Agent identifies the conflict
+2. Agent escalates to the shared parent (or creative-director / technical-director)
+3. Agent does NOT unilaterally resolve cross-domain conflicts
+
+**Assertions**:
+- [ ] Conflict surfaced explicitly
+- [ ] Correct escalation path followed
+- [ ] No unilateral cross-domain changes made
+
+**Case Verdict**: PASS / FAIL / PARTIAL
+
+---
+
+### Case 5: Context Pass-Through — [brief name]
+
+**Scenario**: Agent receives a task with full context from a parent agent.
+
+**Fixture**:
+- [context block passed from parent]
+- [specific sub-task to execute]
+
+**Expected behavior**:
+1. Agent reads and uses the provided context
+2. Agent completes the sub-task
+3. Agent returns result to parent (does not prompt user unnecessarily)
+
+**Assertions**:
+- [ ] Agent uses provided context rather than re-asking for it
+- [ ] Result is scoped to the sub-task, not expanded beyond it
+- [ ] Output format suitable for parent agent consumption
+
+**Case Verdict**: PASS / FAIL / PARTIAL
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain — no unilateral cross-domain changes
+- [ ] Escalates conflicts to correct parent
+- [ ] Uses `"May I write"` before file writes (or is read-only)
+- [ ] Presents findings before requesting approval
+- [ ] Does not skip tiers in the delegation hierarchy
+
+---
+
+## Coverage Notes
+
+[Any gaps in coverage, known edge cases not tested, or behaviors that require
+a live agent invocation to verify.]
diff --git a/CCGS Skill Testing Framework/templates/skill-test-spec.md b/CCGS Skill Testing Framework/templates/skill-test-spec.md
new file mode 100644
index 0000000..f2342c7
--- /dev/null
+++ b/CCGS Skill Testing Framework/templates/skill-test-spec.md
@@ -0,0 +1,142 @@
+# Skill Spec: /[skill-name]
+
+> **Category**: [gate | review | authoring | readiness | pipeline | analysis | team | sprint | utility]
+> **Priority**: [critical | high | medium | low]
+> **Spec written**: [YYYY-MM-DD]
+
+## Skill Summary
+
+[One paragraph describing what this skill does, what inputs it takes, and what outputs it produces.]
+
+---
+
+## Static Assertions
+
+These should pass before any behavioral testing:
+
+- [ ] Frontmatter has all required fields (`name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`)
+- [ ] 2+ phase headings found
+- [ ] At least one verdict keyword present (`PASS`, `FAIL`, `CONCERNS`, `APPROVED`, `BLOCKED`, `COMPLETE`, `READY`)
+- [ ] If `allowed-tools` includes Write/Edit: `"May I write"` language present
+- [ ] Next-step handoff section present at end
+
+---
+
+## Director Gate Checks
+
+[Describe which director gates this skill triggers (if any), and under what review mode conditions.]
+
+- **Full mode**: [gates triggered — e.g., CD-PHASE-GATE, TD-PHASE-GATE, PR-PHASE-GATE, AD-PHASE-GATE]
+- **Lean mode**: [phase gates only — e.g., CD-PHASE-GATE only, or none]
+- **Solo mode**: [no gates — skill runs without director review]
+- **N/A**: [if this skill never triggers gates, explain why]
+
+---
+
+## Test Cases
+
+### Case 1: Happy Path — [brief name]
+
+**Fixture** (assumed project state):
+- [file/condition 1]
+- [file/condition 2]
+
+**Expected behavior**:
+1. [Step 1]
+2. [Step 2]
+3. [Step 3]
+
+**Assertions**:
+- [ ] [Assertion 1]
+- [ ] [Assertion 2]
+- [ ] [Assertion 3]
+
+**Case Verdict**: PASS / FAIL / PARTIAL
+
+---
+
+### Case 2: Failure / Blocked — [brief name]
+
+**Fixture**:
+- [missing or invalid condition]
+
+**Expected behavior**:
+1. [Skill detects the problem]
+2. [Skill reports FAIL/BLOCKED]
+3. [Skill does NOT proceed]
+
+**Assertions**:
+- [ ] Skill stops early and does not produce its normal output artifact
+- [ ] Correct error/block message displayed
+- [ ] No files written without user approval
+
+**Case Verdict**: PASS / FAIL / PARTIAL
+
+---
+
+### Case 3: Mode Variant — [brief name]
+
+**Fixture**:
+- [standard project state]
+- [specific mode or flag set]
+
+**Expected behavior**:
+1. [Behavior differs from happy path because of mode]
+
+**Assertions**:
+- [ ] [Mode-specific assertion]
+- [ ] [Output differs correctly from Case 1]
+
+**Case Verdict**: PASS / FAIL / PARTIAL
+
+---
+
+### Case 4: Edge Case — [brief name]
+
+**Fixture**:
+- [unusual or boundary condition]
+
+**Expected behavior**:
+1. [Skill handles gracefully]
+
+**Assertions**:
+- [ ] [Edge case handled without crash or silent failure]
+- [ ] [Correct output or message]
+
+**Case Verdict**: PASS / FAIL / PARTIAL
+
+---
+
+### Case 5: Director Gate — [brief name]
+
+**Fixture**:
+- [project state that triggers a gate check]
+- Review mode: [full | lean | solo]
+
+**Expected behavior**:
+1. [Gate fires / does not fire based on mode]
+2. [Correct director agents spawned or skipped]
+
+**Assertions**:
+- [ ] In full mode: [specific gates spawn]
+- [ ] In lean mode: [phase gates only, or skip]
+- [ ] In solo mode: no director gates spawn
+- [ ] Skill does not auto-advance past a CONCERNS or FAIL verdict
+
+**Case Verdict**: PASS / FAIL / PARTIAL
+
+---
+
+## Protocol Compliance
+
+- [ ] Uses `"May I write"` before any file writes (or is read-only and skips this)
+- [ ] Presents findings/draft to user before requesting approval
+- [ ] Ends with a recommended next step or follow-up action
+- [ ] Does not auto-create files without user approval
+
+---
+
+## Coverage Notes
+
+[Any gaps in coverage, known edge cases not tested, or conditions that would require
+a live skill run to verify.]
diff --git a/README.md b/README.md
index 1bee058..682f9de 100644
--- a/README.md
+++ b/README.md
@@ -3,14 +3,14 @@
Turn a single Claude Code session into a full game development studio.
- 48 agents. 67 skills. One coordinated AI team.
+ 49 agents. 70 skills. One coordinated AI team.
-
-
+
+
@@ -50,11 +50,11 @@ The result: you still make every decision, but now you have a team that asks the
| Category | Count | Description |
|----------|-------|-------------|
-| **Agents** | 48 | Specialized subagents across design, programming, art, audio, narrative, QA, and production |
-| **Skills** | 68 | Slash commands for every workflow phase (`/start`, `/design-system`, `/create-epics`, `/create-stories`, `/dev-story`, `/story-done`, etc.) |
+| **Agents** | 49 | Specialized subagents across design, programming, art, audio, narrative, QA, and production |
+| **Skills** | 70 | Slash commands for every workflow phase (`/start`, `/design-system`, `/create-epics`, `/create-stories`, `/dev-story`, `/story-done`, etc.) |
| **Hooks** | 12 | Automated validation on commits, pushes, asset changes, session lifecycle, agent audit trail, and gap detection |
| **Rules** | 11 | Path-scoped coding standards enforced when editing gameplay, engine, AI, UI, network code, and more |
-| **Templates** | 37 | Document templates for GDDs, UX specs, ADRs, sprint plans, HUD design, accessibility, and more |
+| **Templates** | 39 | Document templates for GDDs, UX specs, ADRs, sprint plans, HUD design, accessibility, and more |
## Studio Hierarchy
@@ -92,7 +92,7 @@ The template includes agent sets for all three major engines. Use the set that m
## Slash Commands
-Type `/` in Claude Code to access all 67 skills:
+Type `/` in Claude Code to access all 70 skills:
**Onboarding & Navigation**
`/start` `/help` `/project-stage-detect` `/setup-engine` `/adopt`
@@ -100,6 +100,9 @@ Type `/` in Claude Code to access all 67 skills:
**Game Design**
`/brainstorm` `/map-systems` `/design-system` `/quick-design` `/review-all-gdds` `/propagate-design-change`
+**Art & Assets**
+`/art-bible` `/asset-spec` `/asset-audit`
+
**UX & Interface Design**
`/ux-design` `/ux-review`
@@ -110,10 +113,10 @@ Type `/` in Claude Code to access all 67 skills:
`/create-epics` `/create-stories` `/dev-story` `/sprint-plan` `/sprint-status` `/story-readiness` `/story-done` `/estimate`
**Reviews & Analysis**
-`/design-review` `/code-review` `/balance-check` `/asset-audit` `/content-audit` `/scope-check` `/perf-profile` `/tech-debt` `/gate-check` `/consistency-check`
+`/design-review` `/code-review` `/balance-check` `/content-audit` `/scope-check` `/perf-profile` `/tech-debt` `/gate-check` `/consistency-check`
**QA & Testing**
-`/qa-plan` `/smoke-check` `/soak-test` `/regression-suite` `/test-setup` `/test-helpers` `/test-evidence-review` `/test-flakiness` `/skill-test`
+`/qa-plan` `/smoke-check` `/soak-test` `/regression-suite` `/test-setup` `/test-helpers` `/test-evidence-review` `/test-flakiness` `/skill-test` `/skill-improve`
**Production**
`/milestone-review` `/retrospective` `/bug-report` `/bug-triage` `/reverse-document` `/playtest-report`
@@ -171,12 +174,13 @@ CLAUDE.md # Master configuration
.claude/
settings.json # Hooks, permissions, safety rules
-  agents/              # 48 agent definitions (markdown + YAML frontmatter)
+  agents/              # 49 agent definitions (markdown + YAML frontmatter)
- skills/ # 68 slash commands (subdirectory per skill)
+ skills/ # 70 slash commands (subdirectory per skill)
hooks/ # 12 hook scripts (bash, cross-platform)
rules/ # 11 path-scoped coding standards
+ statusline.sh # Status line script (context%, model, stage, epic breadcrumb)
docs/
workflow-catalog.yaml # 7-phase pipeline definition (read by /help)
- templates/ # 37 document templates
+ templates/ # 39 document templates
src/ # Game source code
assets/ # Art, audio, VFX, shaders, data files
design/ # GDDs, narrative docs, level designs
@@ -217,18 +221,20 @@ You stay in control. The agents provide structure and expertise, not autonomy.
| Hook | Trigger | What It Does |
|------|---------|--------------|
-| `validate-commit.sh` | `git commit` | Checks for hardcoded values, TODO format, JSON validity, design doc sections |
-| `validate-push.sh` | `git push` | Warns on pushes to protected branches |
-| `validate-assets.sh` | File writes in `assets/` | Validates naming conventions and JSON structure |
-| `session-start.sh` | Session open | Loads sprint context and recent git activity |
-| `detect-gaps.sh` | Session open | Detects fresh projects (suggests `/start`) and missing documentation when code/prototypes exist |
+| `validate-commit.sh` | PreToolUse (Bash) | Checks for hardcoded values, TODO format, JSON validity, design doc sections — exits early if the command is not `git commit` |
+| `validate-push.sh` | PreToolUse (Bash) | Warns on pushes to protected branches — exits early if the command is not `git push` |
+| `validate-assets.sh` | PostToolUse (Write/Edit) | Validates naming conventions and JSON structure — exits early if the file is not in `assets/` |
+| `session-start.sh` | Session open | Shows current branch and recent commits for orientation |
+| `detect-gaps.sh` | Session open | Detects fresh projects (suggests `/start`) and missing design docs when code or prototypes exist |
| `pre-compact.sh` | Before compaction | Preserves session progress notes |
| `post-compact.sh` | After compaction | Reminds Claude to restore session state from `active.md` |
| `notify.sh` | Notification event | Shows Windows toast notification via PowerShell |
-| `session-stop.sh` | Session close | Logs accomplishments |
+| `session-stop.sh` | Session close | Archives `active.md` to session log and records git activity |
| `log-agent.sh` | Agent spawned | Audit trail start — logs subagent invocation |
| `log-agent-stop.sh` | Agent stops | Audit trail stop — completes subagent record |
-| `validate-skill-change.sh` | Skill file written | Advises running `/skill-test` after any `.claude/skills/` change |
+| `validate-skill-change.sh` | PostToolUse (Write/Edit) | Advises running `/skill-test` after any `.claude/skills/` change |
+
+> **Note**: `validate-commit.sh`, `validate-push.sh`, `validate-assets.sh`, and `validate-skill-change.sh` fire on every Bash/Write tool call and exit immediately (exit 0) when the command or file path is not relevant. This is normal hook behavior — not a performance concern.
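+
+The early-exit pattern those hooks use can be sketched as a minimal fragment
+(assuming the standard Claude Code hook stdin JSON with `tool_input.command`):
+
+```
+#!/bin/bash
+# Bail fast when the Bash command is not a git commit.
+cmd=$(jq -r '.tool_input.command // empty')
+case "$cmd" in
+  "git commit"*) : ;;   # relevant: run the validation below
+  *) exit 0 ;;          # anything else: no-op
+esac
+```
+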
**Permission rules** in `settings.json` auto-allow safe operations (git status, test runs) and block dangerous ones (force push, `rm -rf`, reading `.env` files).
@@ -280,7 +286,7 @@ Tested on **Windows 10** with Git Bash. All hooks use POSIX-compatible patterns
---
-*This project is under active development. The agent architecture, skills, and coordination system are solid and usable today — but there's more coming.*
+*Built for Claude Code. Maintained and extended — contributions welcome via [GitHub Discussions](https://github.com/Donchitos/Claude-Code-Game-Studios/discussions).*
## License
diff --git a/UPGRADING.md b/UPGRADING.md
index 2d77bea..406d624 100644
--- a/UPGRADING.md
+++ b/UPGRADING.md
@@ -14,6 +14,7 @@ Or check `README.md` for the version badge.
## Table of Contents
- [Upgrade Strategies](#upgrade-strategies)
+- [v0.4.x → v1.0](#v04x--v10)
- [v0.4.0 → v0.4.1](#v040--v041)
- [v0.3.0 → v0.4.0](#v030--v040)
- [v0.2.0 → v0.3.0](#v020--v030)
@@ -79,6 +80,152 @@ Best when: you didn't use git to set up the template (just downloaded a zip).
---
+## v0.4.1
+
+**Released:** 2026-04-02
+**Key themes:** Art direction integration, asset specification pipeline
+
+### What Changed
+
+| Category | Changes |
+|----------|---------|
+| **New skill** | `/art-bible` — guided section-by-section visual identity authoring (9 sections). Mandatory art-director Task spawn per section. AD-ART-BIBLE sign-off gate. Required at Technical Setup phase. |
+| **New skill** | `/asset-spec` — per-asset visual spec and AI generation prompt generator. Reads art bible + GDD/level/character docs. Writes `design/assets/specs/` files and `design/assets/asset-manifest.md`. Full/lean/solo modes. |
+| **New director gates (3)** | `AD-CONCEPT-VISUAL` (brainstorm Phase 4), `AD-ART-BIBLE` (art bible sign-off), `AD-PHASE-GATE` (gate-check panel) |
+| **`/brainstorm` update** | Added `Task` to allowed-tools (was missing — blocked all director spawning). Art-director now spawns in parallel with creative-director after pillars lock. Visual Identity Anchor written to game-concept.md. |
+| **`/gate-check` update** | Art-director added as 4th parallel director (AD-PHASE-GATE). Visual artifact checks: Visual Identity Anchor (Concept gate), art bible (Technical Setup gate), AD-ART-BIBLE sign-off + character visual profiles (Pre-Production gate). |
+| **`/team-level` update** | Art-director added to Step 1 parallel spawn (visual direction before layout). Level-designer now receives art-director targets as explicit constraints. Step 4 art-director role corrected to production-concepts only. |
+| **`/team-narrative` update** | Art-director added to Phase 2 parallel spawn (character visual design, environmental storytelling, cinematic tone). |
+| **`/design-system` update** | Routing table expanded with art-director + technical-artist for Combat, UI, Dialogue, Animation/VFX, Character categories. Visual/Audio section now mandatory (with art-director Task spawn) for 7 system categories. |
+| **`workflow-catalog.yaml`** | `/art-bible` added to Technical Setup (required). `/asset-spec` added to Pre-Production (optional, repeatable). |
+
+### Files: Safe to Overwrite
+
+**New files to add:**
+```
+.claude/skills/art-bible/SKILL.md
+.claude/skills/asset-spec/SKILL.md
+.claude/docs/director-gates.md
+```
+
+**Existing files to overwrite (no user content):**
+```
+.claude/skills/brainstorm/SKILL.md
+.claude/skills/gate-check/SKILL.md
+.claude/skills/team-level/SKILL.md
+.claude/skills/team-narrative/SKILL.md
+.claude/skills/design-system/SKILL.md
+.claude/docs/workflow-catalog.yaml
+README.md
+UPGRADING.md
+```
+
+### Files: Merge Carefully
+
+None — all changes are to infrastructure files with no user content.
+
+---
+
+## v0.4.x → v1.0
+
+**Released:** 2026-03-29
+**Commit range:** `6c041ac..HEAD`
+**Key themes:** Director gates system, gate intensity modes, Godot C# specialist
+
+### What Changed
+
+| Category | Changes |
+|----------|---------|
+| **New system** | Director gates — named review checkpoints shared across all workflow skills. Defined in `.claude/docs/director-gates.md` |
+| **New feature** | Gate intensity modes: `full` (all director gates), `lean` (phase gates only), `solo` (no directors). Set globally via `production/review-mode.txt` during `/start`, or override per-run with `--review [mode]` on any gate-using skill |
+| **New agent** | `godot-csharp-specialist` — C# code quality in Godot 4 projects |
+| **Skill updates (13)** | All gate-using skills now parse `--review [full\|lean\|solo]` and include it in their argument-hint: `brainstorm`, `map-systems`, `design-system`, `architecture-decision`, `create-architecture`, `create-epics`, `create-stories`, `sprint-plan`, `milestone-review`, `playtest-report`, `prototype`, `story-done`, `gate-check` |
+| **`/start` update** | Added Phase 3b — sets review mode during onboarding, writes `production/review-mode.txt` |
+| **`/setup-engine` update** | Language selection step for Godot (GDScript vs C#) |
+| **Docs** | `director-gates.md` — full gate catalog; `WORKFLOW-GUIDE.md` — Director Review Modes section; `README.md` — review intensity customization |
+
+---
+
+### Files: Safe to Overwrite
+
+**New files to add:**
+```
+.claude/agents/godot-csharp-specialist.md
+.claude/docs/director-gates.md
+```
+
+**Existing files to overwrite (no user content):**
+```
+.claude/skills/brainstorm/SKILL.md
+.claude/skills/map-systems/SKILL.md
+.claude/skills/design-system/SKILL.md
+.claude/skills/architecture-decision/SKILL.md
+.claude/skills/create-architecture/SKILL.md
+.claude/skills/create-epics/SKILL.md
+.claude/skills/create-stories/SKILL.md
+.claude/skills/sprint-plan/SKILL.md
+.claude/skills/milestone-review/SKILL.md
+.claude/skills/playtest-report/SKILL.md
+.claude/skills/prototype/SKILL.md
+.claude/skills/story-done/SKILL.md
+.claude/skills/gate-check/SKILL.md
+.claude/skills/start/SKILL.md
+.claude/skills/quick-design/SKILL.md
+.claude/skills/setup-engine/SKILL.md
+README.md
+docs/WORKFLOW-GUIDE.md
+UPGRADING.md
+```
+
+---
+
+### Files: Merge Carefully
+
+No files require manual merging in this release. All changes are to infrastructure files with no user content.
+
+---
+
+### New Features
+
+#### Director Gates System
+
+All major workflow skills now reference named gate checkpoints defined in
+`.claude/docs/director-gates.md`. Gates are identified by domain prefix and name
+(e.g., `CD-CONCEPT`, `TD-ARCHITECTURE`, `LP-CODE-REVIEW`). Each gate defines
+which director to spawn, what inputs to pass, what verdicts mean, and how
+lean/solo modes affect it.
+
+Skills spawn gates using `Task` with the gate ID and documented inputs, rather
+than embedding director prompts inline. This keeps skill bodies clean and makes
+gate behavior consistent across all workflow phases.
+
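+In a skill body, a gate spawn follows the invocation pattern documented in
+director-gates.md (bracketed fields are placeholders filled in per gate):
+
+```
+Spawn `[agent-name]` via Task:
+- Gate: [GATE-ID] (see .claude/docs/director-gates.md)
+- Context: [fields listed under that gate]
+```
+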
+#### Gate Intensity Modes
+
+Three modes let you control how much director review you get:
+
+- **`full`** — all director gates run at every review checkpoint
+- **`lean`** (default) — per-skill director reviews are skipped; phase gates at `/gate-check` still run
+- **`solo`** — no director gates anywhere; `/gate-check` checks artifact existence only
+
+Set globally during `/start` (writes `production/review-mode.txt`). Override any
+individual run with `--review [mode]` on any gate-using skill:
+
+```
+/design-system combat --review lean
+/gate-check concept --review full
+/brainstorm my-game-idea --review solo
+```
+
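+The precedence skills apply can be sketched as a shell fragment (illustrative
+only; `REVIEW_FLAG` and `DEFAULT_MODE` are stand-ins, since skills implement
+this resolution as prose instructions rather than a script):
+
+```
+mode="$REVIEW_FLAG"                              # 1. --review override wins
+if [ -z "$mode" ] && [ -f production/review-mode.txt ]; then
+  mode="$(cat production/review-mode.txt)"       # 2. project-wide setting
+fi
+mode="${mode:-$DEFAULT_MODE}"                    # 3. fall back to the default
+```
+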
+---
+
+### After Upgrading
+
+1. Run `/start` once to set your preferred review mode — or create `production/review-mode.txt` manually with `full`, `lean`, or `solo`.
+2. If you're mid-project, review `.claude/docs/director-gates.md` to understand which gates apply to your current phase.
+3. Run `/skill-test static all` to verify all skills pass structural checks.
+
+---
+
## v0.4.0 → v0.4.1
**Released:** 2026-03-26
diff --git a/tests/skills/_fixtures/incomplete-gdd.md b/tests/skills/_fixtures/incomplete-gdd.md
deleted file mode 100644
index 4764657..0000000
--- a/tests/skills/_fixtures/incomplete-gdd.md
+++ /dev/null
@@ -1,51 +0,0 @@
-# GDD: Light Manipulation System
-
-## Overview
-
-The light manipulation system allows players to interact with bioluminescent
-organisms and ancient light conduits to redirect beams of light. Light beams
-illuminate dark areas, power ancient mechanisms, and reveal hidden surfaces.
-
-## Player Fantasy
-
-The player should feel like a puzzle archaeologist — discovering the logic of
-an alien but internally consistent technology. The "aha" moment when a complex
-light path clicks into place should feel earned and satisfying.
-
-## Detailed Rules
-
-- Players can pick up portable light sources (max 3 carried at once)
-- Stationary conduits redirect beams at fixed angles (45°/90°/135°/180°)
-- Light beams are blocked by solid terrain and most objects
-- Living bioluminescent organisms pulse light on a 3-second cycle
-- Ancient mirrors rotate freely and redirect any light beam that touches them
-- A beam must reach a receptor to activate a mechanism
-
-## Formulas
-
-[SECTION MISSING — not yet authored]
-
-## Edge Cases
-
-[SECTION MISSING — not yet authored]
-
-## Dependencies
-
-- **Oxygen System**: Light sources consume no oxygen but picking them up takes
- time (opportunity cost with oxygen drain)
-- **Cave Navigation**: Illuminated paths reveal branching routes not visible
- in darkness
-- Player Inventory System (not yet designed)
-
-## Tuning Knobs
-
-[SECTION MISSING — not yet authored]
-
-## Acceptance Criteria
-
-[SECTION MISSING — not yet authored]
-
----
-
-*Status: Draft — 4/8 required sections populated*
-*Last updated: 2026-03-13*
diff --git a/tests/skills/_fixtures/minimal-game-concept.md b/tests/skills/_fixtures/minimal-game-concept.md
deleted file mode 100644
index ea86346..0000000
--- a/tests/skills/_fixtures/minimal-game-concept.md
+++ /dev/null
@@ -1,62 +0,0 @@
-# Game Concept: Echoes of the Deep
-
-## Overview
-
-Echoes of the Deep is a single-player atmospheric puzzle-platformer set in
-a bioluminescent underwater cave network. Players control a deep-sea diver
-exploring ancient ruins while managing oxygen supplies and manipulating light
-sources to reveal hidden paths and solve environmental puzzles.
-
-## Player Fantasy
-
-The player should feel like a lone explorer uncovering a lost civilization,
-experiencing wonder at beautiful environments, and the satisfying "aha" moment
-when a clever puzzle clicks into place. The oxygen mechanic creates gentle
-pressure without punishing failure harshly.
-
-## Core Loop
-
-1. **Explore** — navigate branching cave sections using light and movement
-2. **Discover** — find oxygen caches, light sources, and ancient mechanisms
-3. **Solve** — manipulate light and environment to unlock new areas
-4. **Progress** — unlock deeper cave sections with escalating complexity
-
-## Game Pillars
-
-1. **Wonder** — every area should contain something visually or mechanically surprising
-2. **Accessibility** — the game should be completable without frustration; oxygen
- manages pacing, not punishment
-3. **Environmental Storytelling** — the ruins tell a story without text exposition
-
-## Target Audience
-
-Casual-to-midcore players who enjoy relaxed exploration games (Subnautica,
-Journey, ABZÛ) and puzzle games that reward observation over reflexes.
-Target age: 16+. Target sessions: 30–90 minutes.
-
-## Unique Selling Points
-
-- Bioluminescent light manipulation as the core puzzle mechanic
-- No enemies — tension comes from environment and resource management
-- Procedurally decorated (handcrafted levels, procedural detail pass)
-
-## Technical Scope
-
-- **Engine**: Godot 4.6
-- **Platform**: PC (Steam), with console ports post-launch
-- **Team size**: Solo developer
-- **Target completion**: 12-month development cycle
-- **Scope**: 4–6 hours main story, 8–12 hours completionist
-
-## Art Direction
-
-Darkly atmospheric with vibrant bioluminescence providing the primary color
-palette. Deep blues, purples, and blacks punctuated by greens, teals, and
-ambers from living organisms and ancient technology.
-
-## Fun Hypothesis
-
-Players will feel rewarded by the combination of visual beauty and the
-satisfying moment of discovering how light manipulation solves each puzzle.
-The oxygen system will create just enough pressure to make exploration feel
-meaningful without making death feel punishing.
diff --git a/tests/skills/catalog.yaml b/tests/skills/catalog.yaml
deleted file mode 100644
index 9134de0..0000000
--- a/tests/skills/catalog.yaml
+++ /dev/null
@@ -1,548 +0,0 @@
-version: 1
-last_updated: "2026-03-26"
-skills:
- # Critical — gate skills that control phase transitions
- - name: gate-check
- spec: tests/skills/gate-check.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: critical
-
- - name: design-review
- spec: tests/skills/design-review.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: critical
-
- - name: story-readiness
- spec: tests/skills/story-readiness.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: critical
-
- - name: story-done
- spec: tests/skills/story-done.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: critical
-
- - name: review-all-gdds
- spec: tests/skills/review-all-gdds.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: critical
-
- - name: architecture-review
- spec: tests/skills/architecture-review.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: critical
-
- # High — pipeline-critical skills
- - name: create-epics
- spec: tests/skills/create-epics.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
-
- - name: create-stories
- spec: tests/skills/create-stories.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
-
- - name: dev-story
- spec: tests/skills/dev-story.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: high
-
- - name: create-control-manifest
- spec: tests/skills/create-control-manifest.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: high
-
- - name: propagate-design-change
- spec: tests/skills/propagate-design-change.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: high
-
- - name: architecture-decision
- spec: tests/skills/architecture-decision.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: high
-
- - name: map-systems
- spec: tests/skills/map-systems.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: high
-
- - name: design-system
- spec: tests/skills/design-system.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: high
-
- - name: consistency-check
- spec: tests/skills/consistency-check.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: high
-
- # Medium — team and sprint management skills
- - name: sprint-plan
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: medium
-
- - name: sprint-status
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: medium
-
- - name: team-ui
- spec: tests/skills/team-ui.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: medium
-
- - name: team-combat
- spec: tests/skills/team-combat.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: medium
-
- - name: team-narrative
- spec: tests/skills/team-narrative.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: medium
-
- - name: team-audio
- spec: tests/skills/team-audio.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: medium
-
- - name: team-level
- spec: tests/skills/team-level.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: medium
-
- - name: team-polish
- spec: tests/skills/team-polish.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: medium
-
- - name: team-release
- spec: tests/skills/team-release.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: medium
-
- - name: team-live-ops
- spec: tests/skills/team-live-ops.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: medium
-
- # Low — analysis, reporting, utility skills
- - name: skill-test
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: medium
-
- - name: start
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: help
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: brainstorm
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: project-stage-detect
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: setup-engine
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: quick-design
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: ux-design
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: ux-review
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: code-review
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: balance-check
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: asset-audit
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: reverse-document
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: create-architecture
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: content-audit
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: bug-report
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: hotfix
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: prototype
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: playtest-report
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: perf-profile
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: tech-debt
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: scope-check
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: estimate
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: milestone-review
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: retrospective
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: changelog
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: patch-notes
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: onboard
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: localize
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: launch-checklist
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: release-checklist
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: adopt
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: smoke-check
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: soak-test
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: skill-improve
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: test-setup
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: test-evidence-review
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: test-flakiness
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: test-helpers
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: regression-suite
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: qa-plan
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: bug-triage
- spec: ""
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: low
-
- - name: team-qa
- spec: tests/skills/team-qa.md
- last_static: ""
- last_static_result: ""
- last_spec: ""
- last_spec_result: ""
- priority: medium