diff --git a/.claude/skills/architecture-review/SKILL.md b/.claude/skills/architecture-review/SKILL.md
index 0a907e4..993e5bd 100644
--- a/.claude/skills/architecture-review/SKILL.md
+++ b/.claude/skills/architecture-review/SKILL.md
@@ -21,6 +21,11 @@ and Pre-Production.
 - **`consistency`**: Cross-ADR conflict detection only
 - **`engine`**: Engine compatibility audit only
 - **`single-gdd [path]`**: Review architecture coverage for one specific GDD
+- **`rtm`**: Requirements Traceability Matrix — extends the standard matrix
+  to include story file paths and test file paths; outputs
+  `docs/architecture/requirements-traceability.md` with the full
+  GDD requirement → ADR → Story → Test chain. Use in the Production phase when
+  stories and tests exist.
 
 ---
 
@@ -154,6 +159,60 @@ Count the totals: X covered, Y partial, Z gaps.
 
 ---
 
+## Phase 3b: Story and Test Linkage (RTM mode only)
+
+*Skip this phase unless the argument is `rtm` or `full` with stories present.*
+
+This phase extends the Phase 3 matrix to include the story that implements
+each requirement and the test that verifies it — producing the full
+Requirements Traceability Matrix (RTM).
+
+### Step 3b-1 — Load stories
+
+Glob `production/epics/**/*.md` (excluding EPIC.md index files). For each
+story file:
+- Extract `TR-ID` from the story's Context section
+- Extract story file path, title, Status
+- Extract `## Test Evidence` section — the stated test file path
+
+### Step 3b-2 — Load test files
+
+Glob `tests/unit/**/*_test.*` and `tests/integration/**/*_test.*`.
+Build an index: system → [test file paths].
+
+For each test file path from Step 3b-1, confirm via Glob whether the file
+actually exists. Note MISSING if the stated path does not exist.
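The index-and-verify logic of Steps 3b-1 and 3b-2 can be sketched in Python. This is a minimal illustration only, assuming the glob patterns and directory layout described above; `build_test_index` and `stated_test_status` are hypothetical helper names, not part of the skill contract:

```python
from pathlib import Path

def build_test_index(root="."):
    """Index test files by system, e.g. tests/unit/combat/... -> "combat"."""
    index = {}
    for pattern in ("tests/unit/**/*_test.*", "tests/integration/**/*_test.*"):
        for path in Path(root).glob(pattern):
            rel = path.relative_to(root)
            # parts: ("tests", "unit", "<system>", ..., "<file>_test.<ext>")
            system = rel.parts[2] if len(rel.parts) > 3 else "(root)"
            index.setdefault(system, []).append(str(rel))
    return index

def stated_test_status(stated_path, root="."):
    """COVERED if the path stated in Test Evidence exists on disk,
    MISSING if stated but absent, NONE if the story states no test path."""
    if not stated_path:
        return "NONE"
    return "COVERED" if (Path(root) / stated_path).is_file() else "MISSING"
```

The NO STORY case is decided later, in Step 3b-3, when a TR-ID has no story referencing it at all.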
+
+### Step 3b-3 — Build the extended RTM
+
+For each TR-ID in the Phase 3 matrix, add:
+- **Story**: the story file path(s) that reference this TR-ID (may be multiple)
+- **Test File**: the test file path stated in the story's Test Evidence section
+- **Test Status**: COVERED (test file exists) / MISSING (path stated but not
+  found) / NONE (no test path stated, story type may be Visual/Feel/UI) /
+  NO STORY (requirement has no story yet — pre-production gap)
+
+Extended matrix format:
+
+```
+## Requirements Traceability Matrix (RTM)
+
+| TR-ID | GDD | Requirement | ADR | Story | Test File | Test Status |
+|-------|-----|-------------|-----|-------|-----------|-------------|
+| TR-combat-001 | combat.md | Hitbox < 1 frame | ADR-0003 | story-001-hitbox.md | tests/unit/combat/hitbox_test.gd | COVERED |
+| TR-combat-002 | combat.md | Combo window | — | story-002-combo.md | — | NONE (Visual/Feel) |
+| TR-inventory-001 | inventory.md | Persistent storage | ADR-0005 | — | — | NO STORY |
+```
+
+RTM coverage summary:
+- COVERED: [N] — requirements with ADR + story + passing test
+- MISSING test: [N] — story exists but test file not found
+- NO STORY: [N] — requirements with ADR but no story yet
+- NO ADR: [N] — requirements without architectural coverage (from Phase 3 gaps)
+- Full chain complete (COVERED): [N/total] ([%])
+
+---
+
 ## Phase 4: Cross-ADR Conflict Detection
 
 Compare every ADR against every other ADR to detect contradictions. A conflict
@@ -392,6 +451,67 @@ Ask: "May I write this review to `docs/architecture/architecture-review-[date].m
 
 Also ask: "May I update `docs/architecture/architecture-traceability.md` with
 the current matrix? This is the living index that future reviews update
 incrementally."
 
+### RTM Output (rtm mode only)
+
+For `rtm` mode, additionally ask: "May I write the full Requirements Traceability
+Matrix to `docs/architecture/requirements-traceability.md`?"
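The header and summary percentages for that file can be tallied mechanically before writing. A hedged sketch, assuming each Phase 3b matrix row is represented with a `test_status` field; the dict shape is illustrative, not prescribed by this skill:

```python
from collections import Counter

def rtm_coverage(rows):
    """Count Test Status values and compute full-chain coverage."""
    counts = Counter(r["test_status"] for r in rows)
    total = len(rows)
    covered = counts.get("COVERED", 0)
    pct = round(100 * covered / total) if total else 0
    return counts, f"Full chain complete (COVERED): {covered}/{total} ({pct}%)"
```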
+
+RTM file format:
+
+```markdown
+# Requirements Traceability Matrix (RTM)
+
+> Last Updated: [date]
+> Mode: /architecture-review rtm
+> Coverage: [N]% full chain complete (GDD → ADR → Story → Test)
+
+## How to read this matrix
+
+| Column | Meaning |
+|--------|---------|
+| TR-ID | Stable requirement ID from tr-registry.yaml |
+| GDD | Source design document |
+| ADR | Architectural decision governing implementation |
+| Story | Story file that implements this requirement |
+| Test File | Automated test file path |
+| Test Status | COVERED / MISSING / NONE / NO STORY |
+
+## Full Traceability Matrix
+
+| TR-ID | GDD | Requirement | ADR | Story | Test File | Status |
+|-------|-----|-------------|-----|-------|-----------|--------|
+[Full matrix rows from Phase 3b]
+
+## Coverage Summary
+
+| Status | Count | % |
+|--------|-------|---|
+| COVERED — full chain complete | [N] | [%] |
+| MISSING test — story exists, no test | [N] | [%] |
+| NO STORY — ADR exists, not yet implemented | [N] | [%] |
+| NO ADR — architectural gap | [N] | [%] |
+| **Total requirements** | **[N]** | **100%** |
+
+## Uncovered Requirements (Priority Fix List)
+
+Requirements where the full chain is broken, prioritised by layer:
+
+### Foundation layer gaps
+[list with suggested action per gap]
+
+### Core layer gaps
+[list]
+
+### Feature / Presentation layer gaps
+[list — lower priority]
+
+## History
+
+| Date | Full Chain % | Notes |
+|------|-------------|-------|
+| [date] | [%] | Initial RTM |
+```
+
 ### TR Registry Update
 
 Also ask: "May I update `docs/architecture/tr-registry.yaml` with new requirement
diff --git a/.claude/skills/bug-triage/SKILL.md b/.claude/skills/bug-triage/SKILL.md
new file mode 100644
index 0000000..fb7333a
--- /dev/null
+++ b/.claude/skills/bug-triage/SKILL.md
@@ -0,0 +1,242 @@
+---
+name: bug-triage
+description: "Read all open bugs in production/qa/bugs/, re-evaluate priority
+  vs. severity, assign to sprints, surface systemic trends, and produce a
+  triage report. Run at sprint start or when the bug count grows enough to
+  need re-prioritisation."
+argument-hint: "[sprint | full | trend]"
+user-invocable: true
+allowed-tools: Read, Glob, Grep, Write, Edit
+context: fork
+---
+
+# Bug Triage
+
+This skill processes the open bug backlog into a prioritised, sprint-assigned
+action list. It distinguishes between **severity** (how bad is the impact?) and
+**priority** (how urgently must we fix it?), detects systemic trends, and
+ensures no critical bug is lost between sprints.
+
+**Output:** `production/qa/bug-triage-[date].md`
+
+**When to run:**
+- Sprint start — assign open bugs to the new sprint or backlog
+- After `/team-qa` completes and new bugs have been filed
+- When the bug count crosses 10+ open items
+
+---
+
+## 1. Parse Arguments
+
+**Modes:**
+- `/bug-triage sprint` — triage against the current sprint; assign fixable bugs
+  to the sprint backlog; defer the rest
+- `/bug-triage full` — full triage of all bugs regardless of sprint scope
+- `/bug-triage trend` — trend analysis only (no assignment); read-only report
+- No argument — run sprint mode if a current sprint exists, else full mode
+
+---
+
+## 2. Load Bug Backlog
+
+### Step 2a — Discover bug files
+
+Glob for bug reports in priority order:
+1. `production/qa/bugs/*.md` — individual bug report files (preferred format)
+2. `production/qa/bugs.md` — single consolidated bug log (fallback)
+3. Any `production/qa/qa-plan-*.md` "Bugs Found" table (last resort)
+
+If no bug files are found:
+> "No bug files found in `production/qa/bugs/`. If bugs are tracked in a
+> different location, adjust the glob pattern. If no bugs exist yet, there is
+> nothing to triage."
+
+Stop and report. Do not proceed if no bugs exist.
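The fallback order above amounts to a first-match search. Sketched here in Python under the stated paths; `discover_bug_sources` is an illustrative name, not part of the skill contract:

```python
from pathlib import Path

def discover_bug_sources(root="."):
    """Return bug files in priority order: individual reports first, then the
    consolidated log, then QA-plan fallbacks. Empty list: nothing to triage."""
    # Priority 1: individual bug report files (preferred format)
    individual = sorted(Path(root).glob("production/qa/bugs/*.md"))
    if individual:
        return individual
    # Priority 2: single consolidated bug log (fallback)
    consolidated = Path(root) / "production" / "qa" / "bugs.md"
    if consolidated.is_file():
        return [consolidated]
    # Priority 3: QA plans whose "Bugs Found" tables would need parsing
    return sorted(Path(root).glob("production/qa/qa-plan-*.md"))
```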
+
+### Step 2b — Load sprint context
+
+Read the most recently modified file in `production/sprints/` to understand:
+- Current sprint number / name
+- Stories in scope (for assignment target)
+- Sprint capacity constraints (if noted)
+
+If no sprint file exists: note "No sprint plan found — assigning to backlog only."
+
+### Step 2c — Load severity reference
+
+Read `.claude/docs/coding-standards.md` for severity/priority definitions if they
+exist. If they do not exist, use the standard definitions in Step 3.
+
+---
+
+## 3. Classify Each Bug
+
+For each bug, extract or infer:
+
+### Severity (impact of the bug)
+
+| Severity | Definition |
+|----------|-----------|
+| **S1 — Critical** | Game crashes, data loss, or complete feature failure. Cannot proceed past this point. |
+| **S2 — High** | Major feature broken but game is still playable. Significant wrong behaviour. |
+| **S3 — Medium** | Feature degraded but a workaround exists. Minor wrong behaviour. |
+| **S4 — Low** | Visual glitch, cosmetic issue, typo. No gameplay impact. |
+
+### Priority (urgency of the fix)
+
+| Priority | Definition |
+|----------|-----------|
+| **P1 — Fix this sprint** | Blocks QA, blocks release, or is a regression from the last sprint |
+| **P2 — Fix soon** | Should be resolved before the next major milestone |
+| **P3 — Backlog** | Would be good to fix, but no active blocking impact |
+| **P4 — Won't fix / Deferred** | Accepted risk, or out of scope for the current product |
+
+### Assignment
+
+For each P1/P2 bug in `sprint` mode:
+- Identify which story or epic the fix belongs to
+- Check whether the current sprint has remaining capacity
+- If capacity exists: assign to sprint (`Sprint: [current]`)
+- If capacity is full: flag as `Priority overflow — consider pulling from sprint`
+
+For `full` mode: assign all P1 to the current sprint, P2 to the next sprint
+(estimated), P3+ to backlog.
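The assignment rules can be sketched as a small loop. An illustrative Python sketch under assumptions this skill does not mandate: numeric 1–4 priority/severity fields, and capacity counted in bugs (the real capacity unit is whatever the sprint plan notes):

```python
def assign_bugs(bugs, capacity, current_sprint):
    """Assign urgent (P1/P2) bugs to the sprint while capacity remains."""
    # Most urgent first: lower priority number, then lower severity number
    for bug in sorted(bugs, key=lambda b: (b["priority"], b["severity"])):
        if bug["priority"] <= 2 and capacity > 0:
            bug["sprint"] = current_sprint
            capacity -= 1
        elif bug["priority"] <= 2:
            # Never silently drop urgent bugs: flag overflow for the sprint owner
            bug["sprint"] = "Priority overflow — consider pulling from sprint"
        else:
            bug["sprint"] = "backlog"
    return bugs
```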
+
+### Deviation check
+
+Flag bugs that suggest **systematic problems**:
+- 3+ bugs from the same system in the same sprint → "Potential design or
+  implementation quality issue in [system]"
+- 2+ S1/S2 bugs in the same story → "Story may need to be reopened and
+  re-reviewed before shipping"
+- Bug filed against a story marked Complete → "Regression in completed story —
+  story should be re-opened in sprint tracking"
+
+---
+
+## 4. Trend Analysis
+
+After classifying all bugs, generate trend metrics:
+
+### Volume trends
+- Total open bugs: [N]
+- Opened this sprint: [N]
+- Closed this sprint: [N]
+- Net change: [+N / -N]
+
+### System hot spots
+- Which system has the most open bugs?
+- Which system has the highest S1/S2 ratio?
+
+### Age analysis
+- How many bugs are older than 2 sprints?
+- Are any S1/S2 bugs unassigned (sprint = none)?
+
+### Regression indicator
+- Any bugs filed against previously completed stories?
+- Count: [N] regression bugs (story reopened implied)
+
+---
+
+## 5. Generate Triage Report
+
+```markdown
+# Bug Triage Report
+
+> **Date**: [date]
+> **Mode**: [sprint | full | trend]
+> **Generated by**: /bug-triage
+> **Open bugs processed**: [N]
+> **Sprint in scope**: [sprint name, or "N/A"]
+
+---
+
+## Triage Summary
+
+| Priority | Count | Notes |
+|----------|-------|-------|
+| P1 — Fix this sprint | [N] | [N] assigned to sprint, [N] overflow |
+| P2 — Fix soon | [N] | Scheduled for next sprint |
+| P3 — Backlog | [N] | Deferred |
+| P4 — Won't fix | [N] | Accepted risk |
+
+**Critical (S1/S2) unfixed count**: [N]
+
+---
+
+## P1 Bugs — Fix This Sprint
+
+| ID | System | Severity | Summary | Assigned to | Story |
+|----|--------|----------|---------|-------------|-------|
+| BUG-NNN | [system] | S[1-4] | [one-line description] | [sprint] | [story path] |
+
+---
+
+## P2 Bugs — Fix Soon
+
+| ID | System | Severity | Summary | Target Sprint |
+|----|--------|----------|---------|---------------|
+| BUG-NNN | [system] | S[1-4] | [one-line description] | Sprint [N+1] |
+
+---
+
+## P3/P4 Bugs — Backlog / Won't Fix
+
+| ID | System | Severity | Summary | Disposition |
+|----|--------|----------|---------|-------------|
+| BUG-NNN | [system] | S4 | [one-line description] | Backlog |
+
+---
+
+## Systemic Issues Flagged
+
+[List any patterns from the Step 3 deviation check, or "None identified."]
+
+---
+
+## Trend Analysis
+
+**Volume**: [N] open / [+N] net change this sprint
+**Hot spot**: [system with most bugs]
+**Regressions**: [N] bugs against completed stories
+**Aged bugs (>2 sprints old)**: [N]
+
+[If N aged S1/S2 bugs > 0:]
+> ⚠️ [N] high-severity bugs have been open for more than 2 sprints without
+> assignment. These represent accepted risk that should be explicitly reviewed.
+
+---
+
+## Recommended Actions
+
+1. [Most urgent action — usually "fix P1 bugs before QA hand-off"]
+2. [Second action — usually "investigate [hot spot system] quality"]
+3. [Third action — optional improvement]
+```
+
+---
+
+## 6. Write and Gate
+
+Present the report in conversation, then ask:
+
+"May I write this triage report to `production/qa/bug-triage-[date].md`?"
+
+Write only after approval.
+
+After writing:
+- If any S1 bugs are unassigned: "S1 bugs must be assigned before the sprint
+  can be considered healthy. Run `/sprint-status` to see current capacity."
+- If regression bugs exist: "Regressions found — consider re-opening the
+  affected stories in sprint tracking and running `/smoke-check` to re-gate."
+- If no P1 bugs exist: "No P1 bugs — build is in good shape for QA hand-off."
+
+---
+
+## Collaborative Protocol
+
+- **Never close or mark bugs Won't Fix without user approval** — surface them
+  as P4 candidates and ask: "Are these acceptable as Won't Fix?"
+- **Never auto-assign to a sprint at capacity** — flag overflow and let the
+  sprint owner decide what to pull
+- **Severity is objective; priority is a team decision** — present severity
+  classifications as recommendations, not mandates
+- **Trend data is informational** — do not block work on trend findings alone;
+  surface them as observations
diff --git a/.claude/skills/regression-suite/SKILL.md b/.claude/skills/regression-suite/SKILL.md
new file mode 100644
index 0000000..749fd2f
--- /dev/null
+++ b/.claude/skills/regression-suite/SKILL.md
@@ -0,0 +1,249 @@
+---
+name: regression-suite
+description: "Map test coverage to GDD critical paths, identify fixed bugs
+  without regression tests, flag coverage drift from new features, and
+  maintain tests/regression-suite.md. Run after implementing a bug fix or
+  before a release gate."
+argument-hint: "[update | audit | report]"
+user-invocable: true
+allowed-tools: Read, Glob, Grep, Write, Edit
+context: fork
+---
+
+# Regression Suite
+
+This skill ensures that every bug fix is backed by a test that would have
+caught the original bug — and that the regression suite stays current as the
+game evolves. It also detects when new features have been added without
+corresponding regression coverage.
+
+A regression suite is not a new test category — it is a **curated list of
+tests already in `tests/`** that collectively cover the game's critical paths
+and known failure points. This skill maintains that list.
+
+**Output:** `tests/regression-suite.md`
+
+**When to run:**
+- After fixing a bug (confirm a regression test was written, or identify the gap)
+- Before a release gate (`/gate-check polish` requires the regression suite to exist)
+- As part of sprint close to detect coverage drift
+
+---
+
+## 1. Parse Arguments
+
+**Modes:**
+- `/regression-suite update` — scan new bug fixes this sprint and check
+  for regression test presence; add new tests to the suite manifest
+- `/regression-suite audit` — full audit of all GDD critical paths vs.
+  existing test coverage; flag paths with no regression test
+- `/regression-suite report` — read-only status report (no writes); suitable
+  for sprint reviews
+- No argument — run `update` if a sprint is active, else `audit`
+
+---
+
+## 2. Load Context
+
+### Step 2a — Load existing regression suite
+
+Read `tests/regression-suite.md` if it exists. Extract:
+- Total registered regression tests
+- Last updated date
+- Any tests flagged as `STALE` or `QUARANTINED`
+
+If it does not exist: note "No regression suite found — will create one."
+
+### Step 2b — Load test inventory
+
+Glob all test files:
+```
+tests/unit/**/*_test.*
+tests/integration/**/*_test.*
+tests/regression/**/*
+```
+
+For each file, note the system (from the directory path) and file name.
+Do not read test file contents unless needed for name-to-test mapping.
+
+### Step 2c — Load GDD critical paths
+
+For `audit` mode: read `design/gdd/systems-index.md` to get all systems.
+For each MVP-tier system, read its GDD and extract:
+- Acceptance Criteria (these define the critical paths)
+- Formulas section (formulas must have regression tests)
+- Edge Cases section (known edge cases should have regression tests)
+
+For `update` mode: skip the full GDD scan. Instead read the current sprint plan
+and story files to find stories with Status: Complete this sprint.
+
+### Step 2d — Load closed bugs
+
+Glob `production/qa/bugs/*.md` and filter for bugs with a `Status: Closed`
+or `Status: Fixed` field. Note:
+- Which story or system the bug was in
+- Whether a regression test was mentioned in the fix description
+
+---
+
+## 3. Map Coverage — Critical Paths
+
+For `audit` mode only:
+
+For each GDD acceptance criterion, determine whether a test exists:
+
+1. Grep `tests/unit/[system]/` and `tests/integration/[system]/` for file names
+   and function names related to the criterion's key noun/verb
+2. Assign coverage:
+
+| Status | Meaning |
+|--------|---------|
+| **COVERED** | A test file exists that targets this criterion's logic |
+| **PARTIAL** | A test exists but doesn't cover all cases (e.g. happy path only) |
+| **MISSING** | No test found for this critical path |
+| **EXEMPT** | Visual/Feel or UI criterion — not automatable by design |
+
+3. Elevate MISSING items that correspond to formulas or state machines to
+   **HIGH PRIORITY** gaps — these are the most likely regression sources.
+
+---
+
+## 4. Map Coverage — Fixed Bugs
+
+For each closed bug:
+
+1. Extract the system slug from the bug's metadata
+2. Grep `tests/unit/[system]/` and `tests/integration/[system]/` for a test
+   that references the bug ID or the specific failure scenario
+3. Assign:
+   - **HAS REGRESSION TEST** — a test was found that would catch this bug
+   - **MISSING REGRESSION TEST** — bug was fixed but no test guards against recurrence
+
+For MISSING REGRESSION TEST items:
+- Flag them as regression gaps
+- Suggest the test file path: `tests/unit/[system]/[bug-slug]_regression_test.[ext]`
+- Note: "Without this test, this bug can silently return in a future sprint."
+
+---
+
+## 5. Detect Coverage Drift
+
+Coverage drift occurs when the game grows but the regression suite doesn't.
+
+Check for drift indicators:
+- Stories completed this sprint with no corresponding test files in `tests/`
+- New systems added to `systems-index.md` since the last regression-suite update
+- GDD sections added or revised since the regression suite was last updated
+  (use Grep on GDD file modification hints if available, or ask the user)
+- `tests/regression-suite.md` last-updated date vs. the current date — if the
+  gap is greater than 2 sprints, flag as likely stale
+
+---
+
+## 6. Generate Report and Suite Manifest
+
+### Report format (in conversation)
+
+```
+## Regression Suite Status
+
+**Mode**: [update | audit | report]
+**Existing registered tests**: [N]
+**Test files scanned**: [N]
+
+### Critical Path Coverage (audit mode only)
+| System | Total ACs | Covered | Partial | Missing | Exempt |
+|--------|-----------|---------|---------|---------|--------|
+| [name] | [N] | [N] | [N] | [N] | [N] |
+
+**Coverage rate (non-exempt)**: [N]%
+
+### Bug Regression Coverage
+| Bug ID | System | Severity | Has Regression Test? |
+|--------|--------|----------|----------------------|
+| BUG-NNN | [system] | S[N] | YES / NO ⚠ |
+
+**Bugs without regression tests**: [N]
+
+### Coverage Drift Indicators
+[List new systems or stories with no test coverage, or "None detected."]
+
+### Recommended New Regression Tests
+| Priority | System | Suggested Test File | Covers |
+|----------|--------|---------------------|--------|
+| HIGH | [system] | `tests/unit/[system]/[slug]_regression_test.[ext]` | BUG-NNN / AC-[N] |
+| MEDIUM | [system] | `tests/unit/[system]/[slug]_test.[ext]` | [criterion] |
+```
+
+### Suite manifest format (`tests/regression-suite.md`)
+
+The manifest is a curated index — not the tests themselves, but a registry
+of which tests should always pass before a release:
+
+```markdown
+# Regression Suite Manifest
+
+> Last Updated: [date]
+> Total registered tests: [N]
+> Coverage: [N]% of GDD critical paths
+
+## How to run
+
+[Engine-specific command to run all regression tests]
+
+## Registered Regression Tests
+
+### [System Name]
+
+| Test File | Test Function (if known) | Covers | Added |
+|-----------|--------------------------|--------|-------|
+| `tests/unit/[system]/[file]_test.[ext]` | `test_[scenario]` | AC-N / BUG-NNN | [date] |
+
+## Known Gaps
+
+Tests that should exist but don't yet:
+
+| Priority | System | Suggested Path | Covers | Reason Not Yet Written |
+|----------|--------|----------------|--------|------------------------|
+| HIGH | [system] | `tests/unit/[system]/[path]` | BUG-NNN | Bug fixed without test |
+
+## Quarantined Tests
+
+Tests that are flaky or disabled (do not run in CI):
+
+| Test File | Function | Reason | Quarantined Since |
+|-----------|----------|--------|-------------------|
+| (none) | | | |
+```
+
+---
+
+## 7. Write Output
+
+Ask: "May I write/update `tests/regression-suite.md` with the current
+regression suite manifest?"
+
+For `update` mode: append new entries; never remove existing entries
+(use `Edit` with targeted insertions).
+For `audit` mode: rewrite the full manifest with updated coverage data.
+For `report` mode: do not write anything.
+
+After writing (if approved):
+
+- For each HIGH priority gap: "Consider creating the missing regression test
+  before the next sprint. Run `/test-helpers` to scaffold the test file."
+- If bug regression gaps > 0: "These bugs can silently return without regression
+  tests. The next sprint should include a story to write the missing tests."
+- If coverage drift is detected: "Regression suite may be drifting. Consider
+  running `/regression-suite audit` at the next sprint boundary."
+
+---
+
+## Collaborative Protocol
+
+- **Never remove existing regression tests from the manifest** without
+  explicit user approval — removing a test that was deliberately written is a
+  regression risk itself
+- **Gaps are advisory, not blocking** — surface them clearly but do not prevent
+  other work from proceeding (except at the release gate, where the regression
+  suite is required)
+- **Quarantine is not deletion** — tests with intermittent failures should be
+  quarantined (noted in the manifest) but not removed; they should be fixed by
+  `/test-flakiness`
+- **Ask before writing** — always confirm before creating or updating the manifest
diff --git a/.claude/skills/smoke-check/SKILL.md b/.claude/skills/smoke-check/SKILL.md
index f1df419..0c0a0c7 100644
--- a/.claude/skills/smoke-check/SKILL.md
+++ b/.claude/skills/smoke-check/SKILL.md
@@ -1,7 +1,7 @@
 ---
 name: smoke-check
 description: "Run the critical path smoke test gate before QA hand-off. Executes the automated test suite, verifies core functionality, and produces a PASS/FAIL report. Run after a sprint's stories are implemented and before manual QA begins. A failed smoke check means the build is not ready for QA."
-argument-hint: "[sprint | quick]"
+argument-hint: "[sprint | quick | --platform pc|console|mobile|all]"
 user-invocable: true
 allowed-tools: Read, Glob, Grep, Bash, Write
 ---
@@ -20,6 +20,27 @@
 Handing a broken build to QA wastes their time and demoralises the team.
 
 ---
 
+## Parse Arguments
+
+Arguments can be combined: `/smoke-check sprint --platform console`
+
+**Base mode** (first argument, default: `sprint`):
+- `sprint` — full smoke check against the current sprint's stories
+- `quick` — skip the coverage scan (Phase 3) and Batch 3; use for rapid re-checks
+
+**Platform flag** (`--platform`, default: none):
+- `--platform pc` — add PC-specific checks (keyboard, mouse, windowed mode)
+- `--platform console` — add console-specific checks (gamepad, TV safe zones,
+  platform certification requirements)
+- `--platform mobile` — add mobile-specific checks (touch, portrait/landscape,
+  battery/thermal behaviour)
+- `--platform all` — add all platform variants; output a per-platform verdict table
+
+If `--platform` is provided, Phase 4 adds platform-specific batches and
+Phase 5 outputs a per-platform verdict table in addition to the overall verdict.
+
+---
+
 ## Phase 1: Detect Test Setup
 
 Before running anything, understand the environment:
@@ -196,6 +217,50 @@ options:
 
 Record each response verbatim for the Phase 5 report.
+**Platform Batches** *(run only if the `--platform` argument was provided)*:
+
+**PC platform** (`--platform pc` or `--platform all`):
+```
+question: "Smoke check — PC Platform: Verify platform-specific behaviour:"
+options:
+  - "Keyboard controls work correctly across all menus and gameplay — PASS"
+  - "Keyboard controls — FAIL: [describe issue]"
+  - "Mouse input and cursor visibility correct in all states — PASS"
+  - "Mouse input — FAIL: [describe issue]"
+  - "Windowed and fullscreen modes function without graphical issues — PASS"
+  - "Windowed/fullscreen — FAIL: [describe issue]"
+  - "Resolution changes apply correctly — PASS"
+  - "Resolution changes — FAIL: [describe issue]"
+```
+
+**Console platform** (`--platform console` or `--platform all`):
+```
+question: "Smoke check — Console Platform: Verify platform-specific behaviour:"
+options:
+  - "Gamepad input works correctly for all actions — PASS"
+  - "Gamepad input — FAIL: [describe issue]"
+  - "UI fits within TV safe zone margins (no text clipped) — PASS"
+  - "TV safe zone — FAIL: [describe what is clipped]"
+  - "No keyboard/mouse-only fallbacks shown to gamepad user — PASS"
+  - "Input prompt inconsistency — FAIL: [describe]"
+  - "Game boots correctly from cold start (no prior save) — PASS"
+  - "Cold start — FAIL: [describe issue]"
+```
+
+**Mobile platform** (`--platform mobile` or `--platform all`):
+```
+question: "Smoke check — Mobile Platform: Verify platform-specific behaviour:"
+options:
+  - "Touch controls work correctly for all primary actions — PASS"
+  - "Touch controls — FAIL: [describe issue]"
+  - "Game handles orientation change (portrait ↔ landscape) correctly — PASS"
+  - "Orientation change — FAIL: [describe what breaks]"
+  - "Background / foreground transitions (home button) handled gracefully — PASS"
+  - "Background/foreground — FAIL: [describe issue]"
+  - "No visible performance issues on target device (no thermal throttling signs) — PASS"
+  - "Mobile performance — FAIL: [describe issue]"
+```
+
 ---
 
 ## Phase 5: Generate Report
@@ -262,6 +327,20 @@ Stories that must have test evidence before they can be marked COMPLETE via
 
 ---
 
+### Platform-Specific Results *(only if `--platform` was provided)*
+
+| Platform | Checks Run | Passed | Failed | Platform Verdict |
+|----------|-----------|--------|--------|-----------------|
+| PC | [N] | [N] | [N] | PASS / FAIL |
+| Console | [N] | [N] | [N] | PASS / FAIL |
+| Mobile | [N] | [N] | [N] | PASS / FAIL |
+
+**Platform notes**: [any platform-specific observations not captured in pass/fail]
+
+Any platform with one or more FAIL checks contributes to the overall FAIL verdict.
+
+---
+
 ### Verdict: [PASS | PASS WITH WARNINGS | FAIL]
 
 [Verdict rules — first matching rule wins:]
diff --git a/.claude/skills/soak-test/SKILL.md b/.claude/skills/soak-test/SKILL.md
new file mode 100644
index 0000000..7a20e56
--- /dev/null
+++ b/.claude/skills/soak-test/SKILL.md
@@ -0,0 +1,284 @@
+---
+name: soak-test
+description: "Generate a soak test protocol for extended play sessions.
+  Defines what to observe, measure, and log during long play sessions to
+  surface slow leaks, fatigue effects, and edge cases that only appear after
+  sustained play. Primarily used in Polish and Release phases."
+argument-hint: "[duration: 30m | 1h | 2h | 4h] [focus: memory | stability | balance | all]"
+user-invocable: true
+allowed-tools: Read, Glob, Grep, Write
+context: fork
+---
+
+# Soak Test
+
+A soak test (also called an endurance test) is an extended play session run
+with specific observation goals. Unlike a smoke check (broad critical path,
+~10 min) or a single-feature playtest (~30 min), a soak test runs for **30
+minutes to several hours** to surface:
+
+- **Memory leaks** — gradual heap growth that only appears after scene transitions
+- **Performance drift** — frame time degradation that worsens over time
+- **State accumulation bugs** — issues that only appear after N repetitions
+  of a mechanic (inventory full, score overflow, AI state corruption)
+- **Fun fatigue** — mechanics that feel good in a first session but grow
+  repetitive over extended play
+- **Content exhaustion** — the point where players run out of novel content
+
+**This skill generates the observation protocol and analysis harness — the
+human does the actual playing.**
+
+**Output:** `production/qa/soak-test-[date]-[duration].md`
+
+**When to run:**
+- Polish phase — before `/gate-check release`
+- After fixing a memory or stability issue (regression soak)
+- When extended play has not been formally tracked
+
+---
+
+## 1. Parse Arguments
+
+**Duration** (default: `1h`):
+- `30m` — short soak; suitable for testing a single mechanic or scene
+- `1h` — standard soak; covers most common leak categories
+- `2h` — extended soak; recommended for the first full Polish soak
+- `4h` — deep soak; required for games with long session design (RPGs, sims)
+
+**Focus** (default: `all`):
+- `memory` — focus on heap size, object count, leak patterns
+- `stability` — focus on crash/freeze/hang detection
+- `balance` — focus on fun fatigue, content exhaustion, difficulty perception
+- `all` — all of the above
+
+---
+
+## 2. Load Context
+
+Read:
+- `.claude/docs/technical-preferences.md` — engine (for engine-specific memory
+  monitoring guidance), performance budgets (memory ceiling, target FPS)
+- `design/gdd/game-concept.md` — intended session length (for comparison against
+  soak duration), core loop description
+- Most recent file in `production/playtests/` — prior playtest findings
+  (to avoid re-documenting known issues)
+- Most recent `production/qa/qa-plan-*.md` file — current sprint test coverage
+  (to understand what has been formally tested vs. what the soak covers)
+
+Note any performance budget targets from technical-preferences.md:
+- Memory ceiling: [N MB, or "not set"]
+- Target FPS: [N, or "not set"]
+- Frame budget: [N ms, or "not set"]
+
+---
+
+## 3. Define Observation Checkpoints
+
+Based on duration, generate timed checkpoints:
+
+**30m soak**: T+0, T+10, T+20, T+30
+**1h soak**: T+0, T+15, T+30, T+45, T+60
+**2h soak**: T+0, T+20, T+40, T+60, T+80, T+100, T+120
+**4h soak**: T+0, T+30, T+60, T+90, T+120, T+180, T+240
+
+At each checkpoint, the observer records the observation items defined in
+Phase 4.
+
+---
+
+## 4. Generate the Soak Test Protocol
+
+### Memory / Stability observation items (if focus = memory or all)
+
+Engine-specific monitoring guidance:
+
+**Godot 4:**
+- Open Debugger → Monitors tab; track `Memory → Static Memory` and
+  `Object Count → Objects` across checkpoints
+- Record: Static Memory (KB), Object Count, Orphan Nodes count
+- Alert threshold: Memory growth > 20% from T+0 after the first 15 minutes
+  (some growth on load is expected; sustained growth indicates a leak)
+- Note: `Performance.get_monitor(Performance.MEMORY_STATIC)` returns bytes
+  in Godot 4.6
+
+**Unity:**
+- Open the Memory Profiler (Window → Analysis → Memory Profiler)
+- Record: Total Reserved Memory (MB), GC Allocated (MB), Object Count at each checkpoint
+- Alert threshold: GC Allocated growing monotonically across 3+ checkpoints
+
+**Unreal Engine:**
+- Use the `stat memory` console command at each checkpoint
+- Record: Physical Memory Used (MB), Physical Memory Available
+- Alert threshold: Physical Memory Used growth > 50MB over the full soak
+
+### Stability observation items (if focus = stability or all)
+
+At each checkpoint, note:
+- [ ] No crash, hang, or freeze occurred since last checkpoint
+- [ ] Frame rate still within target budget ([target FPS] fps)
+- [ ] Audio still playing correctly (no desync or silence)
+- [ ] All HUD elements still rendering correctly
+- [ ] Input responding as expected (no input loss or lag spike)
+
+### Balance / fatigue observation items (if focus = balance or all)
+
+Collect subjective observations at each checkpoint:
+- [ ] Core mechanic still feels rewarding (Y/N)
+- [ ] Perceived difficulty level: [too easy / appropriate / too hard]
+- [ ] Any "I've seen this before" moments since last checkpoint? (novel content exhaustion)
+- [ ] Any moment of frustration since last checkpoint? Note cause.
+- [ ] Any moment of peak engagement since last checkpoint? Note cause.
+
+---
+
+## 5. Generate the Protocol Document
+
+```markdown
+# Soak Test Protocol
+
+> **Date**: [date]
+> **Duration**: [duration]
+> **Focus**: [memory | stability | balance | all]
+> **Engine**: [engine]
+> **Generated by**: /soak-test
+
+---
+
+## Pre-Session Setup
+
+Before starting the soak:
+
+- [ ] Game is running from a **fresh launch** (not resumed from a prior session)
+- [ ] All background applications closed (minimise OS memory interference)
+- [ ] Performance monitoring tool open and recording:
+  - **Godot**: Debugger → Monitors tab → Memory section visible
+  - **Unity**: Memory Profiler window open
+  - **Unreal**: `stat memory` ready in console
+- [ ] Soak target confirmed: [session design intent from game concept]
+- [ ] Prior known issues to watch for: [from most recent playtest / qa-plan]
+
+---
+
+## Baseline (T+0) — Record Before Playing
+
+| Metric | Baseline Value |
+|--------|---------------|
+| Memory / Heap | [record before first frame of gameplay] |
+| Object Count | [record] |
+| FPS (first 30 seconds) | [record] |
+| [Engine-specific metric] | [record] |
+
+---
+
+## Checkpoint Log
+
+### T+[N] minutes
+
+**Memory / Stability** *(if applicable)*:
+
+| Metric | Value | Δ from Baseline | Alert? |
+|--------|-------|-----------------|--------|
+| Memory / Heap | | | |
+| Object Count | | | |
+| FPS | | | |
+| Crashes / Hangs | | | |
+
+**Stability checks**:
+- [ ] No crash or hang since last checkpoint
+- [ ] Frame rate within budget ([N] fps target)
+- [ ] Audio correct
+- [ ] HUD rendering correctly
+- [ ] Input responding correctly
+
+**Balance / Fatigue** *(if applicable)*:
+- Core mechanic still rewarding: Y / N
+- Difficulty perception: too easy / appropriate / too hard
+- Notable moments: [note any peak engagement or frustration]
+- Content exhaustion signs: Y / N — [describe]
+
+**Free observations**:
+*(Note anything unexpected observed since the last checkpoint)*
+
+---
+
+[Repeat Checkpoint Log section for each timed checkpoint]
+
+---
+
+## Post-Session Analysis
+
+### Memory Trend
+
+| Checkpoint | Memory | Δ/hr extrapolated |
+|------------|--------|-------------------|
+| T+0 | | |
+| [T+N] | | |
+
+**Leak detected?** Y / N
+**Estimated time to OOM at current rate**: [N hours / not applicable]
+
+### Stability Summary
+
+Total crashes: [N]
+Total hangs: [N]
+Worst FPS observed: [N] fps at [checkpoint]
+Performance degradation: stable / mild / severe
+
+### Balance / Fatigue Summary
+
+Fun curve: [engaged throughout / fatigue onset at T+N / repetitive from start]
+Content exhaustion point: [never / at T+N / early]
+Difficulty arc: [appropriate / too easy throughout / difficulty spike at T+N]
+
+### Issues Found
+
+| ID | Severity | Checkpoint | Description |
+|----|----------|------------|-------------|
+| SOAK-001 | S[1-4] | T+[N] | [description] |
+
+---
+
+## Verdict: PASS / PASS WITH CONCERNS / FAIL
+
+**PASS**: No leaks detected, stability maintained, fun factor consistent
+**PASS WITH CONCERNS**: Minor drift or fatigue noted; addressable in Polish
+**FAIL**: Memory leak confirmed, stability breach, or severe fun fatigue
+
+---
+
+## Sign-Off
+
+- **Tester**: [name] — [date]
+- **QA Lead review**: [name] — [date]
+```
+
+---
+
+## 6.
Write Output + +Present the protocol summary in conversation, then ask: + +"May I write this soak test protocol to +`production/qa/soak-test-[date]-[duration].md`?" + +Write only after approval. + +After writing: + +"Protocol written. To run the soak: +1. Open the file and follow the Pre-Session Setup checklist +2. Record each checkpoint as you play +3. Complete the Post-Session Analysis section when done +4. File bugs from 'Issues Found' to `production/qa/bugs/` +5. Run `/bug-triage sprint` after the session to integrate any S1/S2 issues + +If the verdict is FAIL, run `/smoke-check` again after fixing the issues." + +--- + +## Collaborative Protocol + +- **This skill generates a protocol — humans run it** — never attempt to + run a soak test automatically. The observations require a human observer. +- **Duration should match the game's session design** — a 5-minute game + doesn't need a 4h soak; a city-builder might. Use judgment and ask if unclear. +- **First soak should be `all` focus** — narrow focus (memory-only) is for + regression soaks after a specific fix, not the first pass +- **Ask before writing** — always confirm before creating the protocol file diff --git a/.claude/skills/test-evidence-review/SKILL.md b/.claude/skills/test-evidence-review/SKILL.md new file mode 100644 index 0000000..c3455cf --- /dev/null +++ b/.claude/skills/test-evidence-review/SKILL.md @@ -0,0 +1,249 @@ +--- +name: test-evidence-review +description: "Quality review of test files and manual evidence documents. Goes beyond existence checks — evaluates assertion coverage, edge case handling, naming conventions, and evidence completeness. Produces ADEQUATE/INCOMPLETE/MISSING verdict per story. Run before QA sign-off or on demand." +argument-hint: "[story-path | sprint | system-name]" +user-invocable: true +allowed-tools: Read, Glob, Grep, Write +context: fork +--- + +# Test Evidence Review + +`/smoke-check` verifies that test files **exist** and **pass**. 
This skill +goes further — it reviews the **quality** of those tests and evidence documents. +A test file that exists and passes may still leave critical behaviour uncovered. +A manual evidence doc that exists may lack the sign-offs required for closure. + +**Output:** Summary report (in conversation) + optional `production/qa/evidence-review-[date].md` + +**When to run:** +- Before QA hand-off sign-off (`/team-qa` Phase 5) +- On any story where test quality is in question +- As part of milestone review for Logic and Integration story quality audit + +--- + +## 1. Parse Arguments + +**Modes:** +- `/test-evidence-review [story-path]` — review a single story's evidence +- `/test-evidence-review sprint` — review all stories in the current sprint +- `/test-evidence-review [system-name]` — review all stories in an epic/system +- No argument — ask which scope: "Single story", "Current sprint", "A system" + +--- + +## 2. Load Stories in Scope + +Based on the argument: + +**Single story**: Read the story file directly. Extract: Story Type, Test +Evidence section, story slug, system name. + +**Sprint**: Read the most recently modified file in `production/sprints/`. +Extract the list of story file paths from the sprint plan. Read each story file. + +**System**: Glob `production/epics/[system-name]/story-*.md`. Read each. + +For each story, collect: +- `Type:` field (Logic / Integration / Visual/Feel / UI / Config/Data) +- `## Test Evidence` section — the stated expected test file path or evidence doc +- Story slug (from file name) +- System name (from directory path) +- Acceptance Criteria list (all checkbox items) + +--- + +## 3. 
Locate Evidence Files + +For each story, find the evidence: + +**Logic stories**: Glob `tests/unit/[system]/[story-slug]_test.*` + - If not found, also try: Grep in `tests/unit/[system]/` for files + containing the story slug + +**Integration stories**: Glob `tests/integration/[system]/[story-slug]_test.*` + - Also check `production/session-logs/` for playtest records mentioning the story + +**Visual/Feel and UI stories**: Glob `production/qa/evidence/[story-slug]-evidence.*` + +**Config/Data stories**: Glob `production/qa/smoke-*.md` (any smoke check report) + +Note what was found (path) or not found (gap) for each story. + +--- + +## 4. Review Automated Test Quality (Logic / Integration) + +For each test file found, read it and evaluate: + +### Assertion coverage + +Count the number of distinct assertions (lines containing assert, expect, +check, verify, or engine-specific assertion patterns). Low assertion count is +a quality signal — a test that makes only 1 assertion per test function may +not cover the range of expected behaviour. + +Thresholds: +- **3+ assertions per test function** → normal +- **1-2 assertions per test function** → note as potentially thin +- **0 assertions** (test exists but no asserts) → flag as BLOCKING — the + test passes vacuously and proves nothing + +### Edge case coverage + +For each acceptance criterion in the story that contains a number, threshold, +or "when X happens" conditional: check whether a test function name or +test body references that specific case. + +Heuristics: +- Grep test file for "zero", "max", "null", "empty", "min", "invalid", + "boundary", "edge" — presence of any is a positive signal +- If the story has a Formulas section with specific bounds: check whether + tests exercise at minimum/maximum values + +### Naming quality + +Test function names should describe: the scenario + the expected result. 
+Pattern: `test_[scenario]_[expected_outcome]` + +Flag functions named generically (`test_1`, `test_run`, `testBasic`) as +**naming issues** — they make failures harder to diagnose. + +### Formula traceability + +For Logic stories where the GDD has a Formulas section: check that the test +file contains at least one test whose name or comment references the formula +name or a formula value. A test that exercises a formula without mentioning +it by name is harder to maintain when the formula changes. + +--- + +## 5. Review Manual Evidence Quality (Visual/Feel / UI) + +For each evidence document found, read it and evaluate: + +### Criterion linkage + +The evidence doc should reference each acceptance criterion from the story. +Check: does the evidence doc contain each criterion (or a clear rephrasing)? +Missing criteria mean a criterion was never verified. + +### Sign-off completeness + +Check for three sign-off lines (or equivalent fields): +- Developer sign-off +- Designer / art-lead sign-off (for Visual/Feel) +- QA lead sign-off + +If any are missing or blank: flag as INCOMPLETE — the story cannot be fully +closed without all required sign-offs. + +### Screenshot / artefact completeness + +For Visual/Feel stories: check whether screenshot file paths are referenced +in the evidence doc. If referenced, Glob for them to confirm they exist. + +For UI stories: check whether a walkthrough sequence (step-by-step interaction +log) is present. + +### Date coverage + +Evidence doc should have a date. If the date is earlier than the story's +last major change (heuristic: compare against sprint start date from the sprint +plan), flag as POTENTIALLY STALE — the evidence may not cover the final +implementation. + +--- + +## 6. 
Build the Review Report + +For each story, assign a verdict: + +| Verdict | Meaning | +|---------|---------| +| **ADEQUATE** | Test/evidence exists, passes quality checks, all criteria covered | +| **INCOMPLETE** | Test/evidence exists but has quality gaps (thin assertions, missing sign-offs) | +| **MISSING** | No test or evidence found for a story type that requires it | + +The overall sprint/system verdict is the worst story verdict present. + +```markdown +## Test Evidence Review + +> **Date**: [date] +> **Scope**: [single story path | Sprint [N] | [system name]] +> **Stories reviewed**: [N] +> **Overall verdict**: ADEQUATE / INCOMPLETE / MISSING + +--- + +### Story-by-Story Results + +#### [Story Title] — [Type] — [ADEQUATE/INCOMPLETE/MISSING] + +**Test/evidence path**: `[path]` (found) / (not found) + +**Automated test quality** *(Logic/Integration only)*: +- Assertion coverage: [N per function on average] — [adequate / thin / none] +- Edge cases: [covered / partial / not found] +- Naming: [consistent / [N] generic names flagged] +- Formula traceability: [yes / no — formula names not referenced in tests] + +**Manual evidence quality** *(Visual/Feel/UI only)*: +- Criterion linkage: [N/M criteria referenced] +- Sign-offs: [Developer ✓ | Designer ✗ | QA Lead ✗] +- Artefacts: [screenshots present / missing / N/A] +- Freshness: [dated [date] — current / potentially stale] + +**Issues**: +- BLOCKING: [description] *(prevents story-done)* +- ADVISORY: [description] *(should fix before release)* + +--- + +### Summary + +| Story | Type | Verdict | Issues | +|-------|------|---------|--------| +| [title] | Logic | ADEQUATE | None | +| [title] | Integration | INCOMPLETE | Thin assertions (avg 1.2/function) | +| [title] | Visual/Feel | INCOMPLETE | QA lead sign-off missing | +| [title] | Logic | MISSING | No test file found | + +**BLOCKING items** (must resolve before story can be closed): [N] +**ADVISORY items** (should address before release): [N] +``` + +--- + +## 7. 
Write Output (Optional) + +Present the report in conversation. + +Ask: "May I write this test evidence review to +`production/qa/evidence-review-[date].md`?" + +This is optional — the report is useful standalone. Write only if the user +wants a persistent record. + +After the report: + +- For BLOCKING items: "These must be resolved before `/story-done` can mark the + story Complete. Would you like to address any of them now?" +- For thin assertions: "Consider running `/test-helpers [system]` to see + scaffolded assertion patterns for common cases." +- For missing sign-offs: "Manual sign-off is required from [role]. Share + `[evidence-path]` with them to complete sign-off." + +--- + +## Collaborative Protocol + +- **Report quality issues, do not fix them** — this skill reads and evaluates; + it does not modify test files or evidence documents +- **ADEQUATE means adequate for shipping, not perfect** — avoid nitpicking + tests that are functioning and comprehensive enough to give confidence +- **BLOCKING vs. ADVISORY distinction is important** — only flag BLOCKING when + the gap leaves a story criterion genuinely unverified +- **Ask before writing** — the report file is optional; always confirm before writing diff --git a/.claude/skills/test-flakiness/SKILL.md b/.claude/skills/test-flakiness/SKILL.md new file mode 100644 index 0000000..8f22831 --- /dev/null +++ b/.claude/skills/test-flakiness/SKILL.md @@ -0,0 +1,211 @@ +--- +name: test-flakiness +description: "Detect non-deterministic (flaky) tests by reading CI run logs or test result history. Aggregates pass rates per test, identifies intermittent failures, recommends quarantine or fix, and maintains a flaky test registry. Best run during Polish phase or after multiple CI runs." 
+argument-hint: "[ci-log-path | scan | registry]" +user-invocable: true +allowed-tools: Read, Glob, Grep, Write, Edit, Bash +context: fork +--- + +# Test Flakiness Detection + +A flaky test is one that sometimes passes and sometimes fails without any code +change. Flaky tests are worse than no tests in some ways — they train the team +to ignore red CI runs, masking genuine failures. This skill identifies them, +explains likely causes, and recommends whether to quarantine or fix each one. + +**Output:** Updated `tests/regression-suite.md` quarantine section + optional +`production/qa/flakiness-report-[date].md` + +**When to run:** +- Polish phase (tests have had many runs; statistical signal is reliable) +- When developers start dismissing CI failures as "probably flaky" +- After `/regression-suite` identifies quarantined tests that need diagnosis + +--- + +## 1. Parse Arguments + +**Modes:** +- `/test-flakiness [ci-log-path]` — analyse a specific CI run log file +- `/test-flakiness scan` — scan all available CI logs in `.github/` or + standard log output directories +- `/test-flakiness registry` — read existing regression-suite.md quarantine + section and provide remediation guidance for already-known flaky tests +- No argument — auto-detect: run `scan` if CI logs are accessible, else + `registry` + +--- + +## 2. Locate CI Log Data + +### Option A — GitHub Actions (preferred) + +Check for test result artifacts: +```bash +ls -t .github/ 2>/dev/null +ls -t test-results/ 2>/dev/null +``` + +For Godot projects: GdUnit4 outputs XML results compatible with JUnit format. +Check `test-results/` for `.xml` files. + +For Unity projects: game-ci test runner outputs NUnit XML to `test-results/` +by default. + +For Unreal projects: automation logs go to `Saved/Logs/`. Grep for +`Result: Success` and `Result: Fail` patterns. + +### Option B — Local log files + +If a path argument is provided, read that file directly. 
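+
+As a sketch of the discovery step behind Options A and B — assuming
+JUnit-style XML results land under `test-results/` or `.github/` as this
+skill's conventions suggest (an assumption, not a fixed CI contract):
+
+```python
+# Sketch: collect candidate JUnit-style XML result files, newest first.
+# The default roots mirror this skill's conventions (assumptions).
+from pathlib import Path
+
+def find_result_files(roots=("test-results", ".github")):
+    """Return .xml result files under the given roots, newest first."""
+    found = []
+    for root in roots:
+        base = Path(root)
+        if base.is_dir():                      # skip roots that don't exist
+            found.extend(base.rglob("*.xml"))  # CI may nest per-run directories
+    return sorted(found, key=lambda p: p.stat().st_mtime, reverse=True)
+```
+
+Newest-first ordering matters when only the latest N runs should feed the
+flakiness statistics.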
+
+### Option C — No log data available
+
+If no logs found:
+> "No CI log data found. To detect flaky tests, this skill needs test result
+> history from multiple runs. Options:
+> 1. Run the test suite at least 3 times and collect the output logs
+> 2. Check CI pipeline output and save a log to `test-results/`
+> 3. Run `/test-flakiness registry` to review tests already flagged as flaky
+> in `tests/regression-suite.md`"
+
+Stop and ask the user which option to pursue.
+
+---
+
+## 3. Parse Test Results
+
+For each CI log or result file found, parse:
+
+**JUnit XML format** (GdUnit4 / Unity):
+- Grep for `<testcase>` elements; extract the test name and suite/class name
+- A nested `<failure>` or `<error>` element marks that run as a failure;
+  otherwise count the run as a pass
+
+**Unreal automation logs**:
+- Grep for `Result: Success` / `Result: Fail` lines and the associated test name
+
+Build a per-test history across all runs: test name → [pass, fail, pass, ...].
+
+---
+
+## 4. Classify Flakiness
+
+A test is flaky when it both passes and fails across runs with no code change
+in between. Classify each flaky test by failure rate:
+
+- **High flakiness**: Fails in >25% of runs — quarantine immediately
+- **Moderate flakiness**: Fails in 5–25% of runs — investigate and fix soon
+- **Low/suspected flakiness**: Fails in 1–5% of runs — monitor; may be
+  genuinely rare failure
+
+For each flaky test, classify the likely cause:
+
+### Cause classification
+
+| Cause | Symptoms | Fix direction |
+|-------|----------|---------------|
+| **Timing / async** | Fails after awaiting signals or timers; pass rate correlates with system load | Add explicit await/synchronisation; avoid time-based delays |
+| **Order dependency** | Fails when run after specific other tests; passes in isolation | Add proper setup/teardown; ensure test isolation |
+| **Random seed** | Fails intermittently with no pattern; involves RNG | Pass explicit seed; don't use `randf()` in tests |
+| **Resource leak** | Fails more often later in a test run | Fix cleanup in teardown; check orphan nodes (Godot) or object disposal (Unity) |
+| **External state** | Fails when a file, scene, or global exists from a prior test | Isolate test from file system; use in-memory mocks |
+| **Floating point** | Fails on comparisons like `== 0.5` | Use epsilon comparison (`is_equal_approx`, `Assert.AreApproximatelyEqual`) |
+| **Scene/prefab load race** | Fails when scenes are not yet ready | Await one frame after instantiation; use `await get_tree().process_frame` |
+
+Use Grep to check the test file for timing calls, `randf()`, global state
+access, or equality comparisons on floats to narrow down the cause.
+
+---
+
+## 5. Recommend Action
+
+For each flaky test:
+
+**Quarantine (High flakiness):**
+> "Quarantine this test immediately. Disable it in CI with the framework's
+> skip annotation (`[Ignore]` in NUnit; GdUnit4 and the Unreal automation
+> framework have equivalent skip/exclude mechanisms). Log it in
+> `tests/regression-suite.md` quarantine section. The test is now opt-in only.
+> Fix the root cause before removing quarantine."
+
+**Investigate and fix soon (Moderate):**
+> "This test is intermittently unreliable. Root cause appears to be [cause].
+> Suggested fix: [specific fix based on cause classification]. Do not quarantine
+> yet — fix the test directly."
+
+**Monitor (Low/suspected):**
+> "This test shows suspected flakiness. Collect more run data before
+> quarantining. Note it as 'suspected' in the regression suite."
+
+---
+
+## 6. Generate Reports
+
+### In-conversation summary
+
+```
+## Flakiness Detection Results
+
+**Runs analysed**: [N]
+**Tests tracked**: [N]
+
+### Flaky Tests Found
+
+| Test | System | Fail Rate | Likely Cause | Recommendation |
+|------|--------|-----------|--------------|----------------|
+| [test_name] | [system] | [N]% | Timing | Quarantine + fix async |
+| [test_name] | [system] | [N]% | Float comparison | Fix: use epsilon compare |
+| [test_name] | [system] | [N]% | Order dependency | Investigate teardown |
+
+### Clean Tests (no flakiness detected)
+
+[N] tests ran across [N] runs with consistent results — no flakiness detected.
+
+### Data Limitations
+
+[Note if fewer than 5 runs were available — fewer runs = less statistical confidence]
+```
+
+---
+
+## 7. Update Regression Suite + Optional Report File
+
+Ask: "May I update the quarantine section of `tests/regression-suite.md`
+with the flaky tests found?"
+
+If yes: use `Edit` to append entries to the Quarantined Tests table.
+Never remove existing quarantine entries — only add new ones.
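+
+As a sketch of the failure-rate classification that feeds this quarantine
+list — assuming a per-test history of pass/fail booleans has already been
+parsed from the CI logs (the history format here is an assumption):
+
+```python
+# Sketch: map each test's run history to a flakiness verdict.
+# Thresholds mirror the classification described earlier in this skill.
+def classify_flakiness(history):
+    """history: {test_name: [True/False per run]} -> {test_name: verdict}."""
+    verdicts = {}
+    for test, runs in history.items():
+        fails = runs.count(False)
+        if fails == 0 or fails == len(runs):
+            # Consistent pass or consistent fail is not flaky:
+            # a test that always fails is a genuine failure, not flakiness.
+            verdicts[test] = "stable"
+            continue
+        rate = fails / len(runs)
+        if rate > 0.25:
+            verdicts[test] = "quarantine"   # high: fails in >25% of runs
+        elif rate >= 0.05:
+            verdicts[test] = "fix-soon"     # moderate: 5-25%
+        else:
+            verdicts[test] = "monitor"      # low/suspected: <5%
+    return verdicts
+```
+
+Only tests that both pass and fail across runs enter the flaky buckets, which
+matches the "sometimes passes and sometimes fails" definition above.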
+ +Ask (separately): "May I write a full flakiness report to +`production/qa/flakiness-report-[date].md`?" + +The full report includes per-test analysis with cause details and +engine-specific fix snippets. + +After writing: + +- For each quarantined test: "Add the engine-specific skip annotation to + disable this test in CI. Re-enable after the root cause is fixed." +- For fix-eligible tests: "The fix for [test] is straightforward — + change the equality comparison on line [N] to use `is_equal_approx`." +- Summary: "Once all quarantine annotations are applied, CI should run green. + Schedule fix work for the [N] quarantined tests before the release gate." + +--- + +## Collaborative Protocol + +- **Never delete test files** — quarantine means annotate + list, not remove +- **Statistical confidence matters** — with < 3 runs, flag findings as + "suspected" not "confirmed"; ask if more run data is available +- **Fix is always the goal** — quarantine is temporary; surface the fix + direction even when recommending quarantine +- **Ask before writing** — both the regression-suite update and the report + file require explicit approval +- **Flakiness in CI is a team problem** — surface the list and recommended + actions clearly; do not just silently quarantine without the team knowing diff --git a/.claude/skills/test-helpers/SKILL.md b/.claude/skills/test-helpers/SKILL.md new file mode 100644 index 0000000..5499582 --- /dev/null +++ b/.claude/skills/test-helpers/SKILL.md @@ -0,0 +1,389 @@ +--- +name: test-helpers +description: "Generate engine-specific test helper libraries for the project's test suite. Reads existing test patterns and produces tests/helpers/ with assertion utilities, factory functions, and mock objects tailored to the project's systems. Reduces boilerplate in new test files." 
+argument-hint: "[system-name | all | scaffold]" +user-invocable: true +allowed-tools: Read, Glob, Grep, Write +context: fork +--- + +# Test Helpers + +Writing test cases is faster and more consistent when common setup, teardown, +and assertion patterns are abstracted into helpers. This skill generates a +`tests/helpers/` library tailored to the project's actual engine, language, +and systems — so every developer writes less boilerplate and more assertions. + +**Output:** `tests/helpers/` directory with engine-specific helper files + +**When to run:** +- After `/test-setup` scaffolds the framework (first time) +- When multiple test files repeat the same setup boilerplate +- When starting to write tests for a new system + +--- + +## 1. Parse Arguments + +**Modes:** +- `/test-helpers [system-name]` — generate helpers for a specific system + (e.g., `/test-helpers combat`) +- `/test-helpers all` — generate helpers for all systems with test files +- `/test-helpers scaffold` — generate only the base helper library (no + system-specific helpers); use this on first run +- No argument — run `scaffold` if no helpers exist, else `all` + +--- + +## 2. Detect Engine and Language + +Read `.claude/docs/technical-preferences.md` and extract: +- `Engine:` value +- `Language:` value +- `Framework:` from the Testing section + +If engine is not configured: "Engine not configured. Run `/setup-engine` first." + +--- + +## 3. 
Load Existing Test Patterns
+
+Scan the test directory for patterns already in use:
+
+```
+Glob pattern="tests/**/*_test.*" (all test files)
+```
+
+For a representative sample (up to 5 files), read the test files and extract:
+- Setup patterns (how `before_each` / `setUp` / fixtures are written)
+- Common assertion patterns (what is being asserted most often)
+- Object creation patterns (how game objects or scenes are instantiated in tests)
+- Mock/stub patterns (how dependencies are replaced)
+
+This ensures generated helpers match the project's existing style, not a
+generic template.
+
+Also read:
+- `design/gdd/systems-index.md` — to know which systems exist
+- In-scope GDD(s) — to understand what data types and values need testing
+- `docs/architecture/tr-registry.yaml` — to map requirements to tested systems
+
+---
+
+## 4. Generate Engine-Specific Helpers
+
+### Godot 4 (GdUnit4 / GDScript)
+
+**Base helper** (`tests/helpers/game_assertions.gd`):
+
+```gdscript
+## Game-specific assertion utilities for [Project Name] tests.
+## Extends GdUnitAssertions with domain-specific helpers.
+##
+## Usage (all methods are static; note that `assert` is a reserved word in
+## GDScript and cannot be used as a variable name):
+##   GameAssertions.assert_in_range(entity.health, 0, entity.max_health, "health")
+
+class_name GameAssertions
+extends RefCounted
+
+## Assert a value is within the inclusive range [min_val, max_val].
+## Use for any formula output that has defined bounds in a GDD.
+static func assert_in_range(
+    value: float,
+    min_val: float,
+    max_val: float,
+    label: String = "value"
+) -> void:
+    assert(
+        value >= min_val and value <= max_val,
+        "%s %.2f is outside expected range [%.2f, %.2f]" % [label, value, min_val, max_val]
+    )
+
+## Assert a signal was emitted during a callable block.
+## Usage: assert_signal_emitted(entity, "health_changed", func(): entity.take_damage(10)) +static func assert_signal_emitted( + obj: Object, + signal_name: String, + action: Callable +) -> void: + var emitted := false + obj.connect(signal_name, func(_args): emitted = true) + action.call() + assert(emitted, "Expected signal '%s' to be emitted, but it was not." % signal_name) + +## Assert that a callable does NOT emit a signal. +static func assert_signal_not_emitted( + obj: Object, + signal_name: String, + action: Callable +) -> void: + var emitted := false + obj.connect(signal_name, func(_args): emitted = true) + action.call() + assert(not emitted, "Expected signal '%s' NOT to be emitted, but it was." % signal_name) + +## Assert a node exists at path within a parent. +static func assert_node_exists(parent: Node, path: NodePath) -> void: + assert( + parent.has_node(path), + "Expected node at path '%s' to exist." % str(path) + ) +``` + +**Factory helper** (`tests/helpers/game_factory.gd`): + +```gdscript +## Factory functions for creating test game objects. +## Returns minimal objects configured for unit testing (no scene tree required). +## +## Usage: var player = GameFactory.make_player(health: 100) + +class_name GameFactory +extends RefCounted + +## Create a minimal player-like object for testing. +## Override fields as needed. +static func make_player(health: int = 100) -> Node: + var player = Node.new() + player.set_meta("health", health) + player.set_meta("max_health", health) + return player +``` + +**Scene helper** (`tests/helpers/scene_runner_helper.gd`): + +```gdscript +## Utilities for scene-based integration tests. +## Wraps GdUnitSceneRunner for common patterns. + +class_name SceneRunnerHelper +extends GdUnitTestSuite + +## Load a scene and wait one frame for _ready() to complete. 
+func load_scene_and_wait(scene_path: String) -> Node:
+    var scene = load(scene_path).instantiate()
+    add_child(scene)
+    await get_tree().process_frame
+    return scene
+```
+
+---
+
+### Unity (NUnit / C#)
+
+**Base helper** (`tests/helpers/GameAssertions.cs`):
+
+```csharp
+using NUnit.Framework;
+using UnityEngine;
+
+/// <summary>
+/// Game-specific assertion utilities for [Project Name] tests.
+/// Extends NUnit's Assert with domain-specific helpers.
+/// </summary>
+public static class GameAssertions
+{
+    /// <summary>
+    /// Assert a value is within an inclusive range [min, max].
+    /// Use for any formula output defined in GDD Formulas sections.
+    /// </summary>
+    public static void AssertInRange(float value, float min, float max, string label = "value")
+    {
+        Assert.That(value, Is.InRange(min, max),
+            $"{label} ({value:F2}) is outside expected range [{min:F2}, {max:F2}]");
+    }
+
+    /// <summary>Assert a UnityEvent or C# event was raised during an action.</summary>
+    public static void AssertEventRaised(ref bool wasCalled, System.Action action, string eventName)
+    {
+        wasCalled = false;
+        action();
+        Assert.IsTrue(wasCalled, $"Expected event '{eventName}' to be raised, but it was not.");
+    }
+
+    /// <summary>Assert a component of type T exists on a GameObject.</summary>
+    public static void AssertHasComponent<T>(GameObject obj) where T : Component
+    {
+        var component = obj.GetComponent<T>();
+        Assert.IsNotNull(component,
+            $"Expected GameObject '{obj.name}' to have component {typeof(T).Name}.");
+    }
+}
+```
+
+**Factory helper** (`tests/helpers/GameFactory.cs`):
+
+```csharp
+using UnityEngine;
+
+/// <summary>
+/// Factory methods for creating minimal test objects without loading scenes.
+/// </summary>
+public static class GameFactory
+{
+    /// <summary>Create a minimal named GameObject for testing.</summary>
+    public static GameObject MakeGameObject(string name = "TestObject")
+    {
+        var go = new GameObject(name);
+        return go;
+    }
+
+    /// <summary>
+    /// Create a ScriptableObject of type T for data-driven tests.
+    /// Dispose with Object.DestroyImmediate after test.
+    /// </summary>
+    public static T MakeScriptableObject<T>() where T : ScriptableObject
+    {
+        return ScriptableObject.CreateInstance<T>();
+    }
+}
+```
+
+---
+
+### Unreal Engine (C++)
+
+**Base helper** (`tests/helpers/GameTestHelpers.h`):
+
+```cpp
+#pragma once
+
+#include "CoreMinimal.h"
+#include "Misc/AutomationTest.h"
+
+/**
+ * Game-specific assertion macros and helpers for [Project Name] automation tests.
+ * Include in any test file that needs domain-specific assertions.
+ *
+ * Usage:
+ *   GAME_TEST_ASSERT_IN_RANGE(TestName, DamageValue, 10.0f, 50.0f, TEXT("Damage"));
+ */
+
+// Assert a float value is within inclusive range [Min, Max]
+#define GAME_TEST_ASSERT_IN_RANGE(TestName, Value, Min, Max, Label) \
+    TestTrue( \
+        FString::Printf(TEXT("%s (%.2f) in range [%.2f, %.2f]"), Label, Value, Min, Max), \
+        (Value) >= (Min) && (Value) <= (Max) \
+    )
+
+// Assert a UObject pointer is valid (not null, not garbage collected)
+#define GAME_TEST_ASSERT_VALID(TestName, Ptr, Label) \
+    TestTrue( \
+        FString::Printf(TEXT("%s is valid"), Label), \
+        IsValid(Ptr) \
+    )
+
+// Assert an Actor is in the world (spawned successfully)
+#define GAME_TEST_ASSERT_SPAWNED(TestName, ActorPtr, ClassName) \
+    TestNotNull( \
+        FString::Printf(TEXT("Spawned actor of class %s"), TEXT(#ClassName)), \
+        ActorPtr \
+    )
+
+/**
+ * Helper to create a minimal test world.
+ * Remember to call World->DestroyWorld(false) in teardown.
+ */
+namespace GameTestHelpers
+{
+    inline UWorld* CreateTestWorld(const FString& WorldName = TEXT("TestWorld"))
+    {
+        UWorld* World = UWorld::CreateWorld(EWorldType::Game, false, FName(*WorldName));
+        FWorldContext& WorldContext = GEngine->CreateNewWorldContext(EWorldType::Game);
+        WorldContext.SetCurrentWorld(World);
+        return World;
+    }
+}
+```
+
+---
+
+## 5. 
Generate System-Specific Helpers + +For `[system-name]` or `all` modes, generate a helper per system: + +Read the system's GDD to extract: +- Data types (entity types, component names) +- Formula variables and their bounds +- Common test scenarios mentioned in Edge Cases + +Generate `tests/helpers/[system]_factory.[ext]` with factory functions +specific to that system's objects. + +Example pattern for a `combat` system (Godot/GDScript): + +```gdscript +## Factory and assertion helpers for Combat system tests. +## Generated by /test-helpers combat on [date]. +## Based on: design/gdd/combat.md + +class_name CombatTestFactory +extends RefCounted + +const DAMAGE_MIN := 0 +const DAMAGE_MAX := 999 # From GDD: damage formula upper bound + +## Create a minimal attacker object for damage formula tests. +static func make_attacker(attack: float = 10.0, crit_chance: float = 0.0) -> Node: + var attacker = Node.new() + attacker.set_meta("attack", attack) + attacker.set_meta("crit_chance", crit_chance) + return attacker + +## Create a minimal target object for damage receive tests. +static func make_target(defense: float = 0.0, health: float = 100.0) -> Node: + var target = Node.new() + target.set_meta("defense", defense) + target.set_meta("health", health) + target.set_meta("max_health", health) + return target + +## Assert damage output is within GDD-specified bounds. +static func assert_damage_in_bounds(damage: float) -> void: + GameAssertions.assert_in_range(damage, DAMAGE_MIN, DAMAGE_MAX, "damage") +``` + +--- + +## 6. Write Output + +Present a summary of what will be created: + +``` +## Test Helpers to Create + +Base helpers (engine: [engine]): +- tests/helpers/game_assertions.[ext] +- tests/helpers/game_factory.[ext] +[engine-specific extras] + +System helpers ([mode]): +- tests/helpers/[system]_factory.[ext] ← from [system] GDD +``` + +Ask: "May I write these helper files to `tests/helpers/`?" 
+ +**Never overwrite existing files.** If a file already exists, report: +"Skipping `[path]` — already exists. Remove the file manually if you want it +regenerated." + +After writing: + +"Helper files created. To use them in a test: +- Godot: `class_name` is auto-imported — no explicit import needed +- Unity: Add `using` directive or reference the test assembly +- Unreal: `#include \"tests/helpers/GameTestHelpers.h\"`" + +--- + +## Collaborative Protocol + +- **Never overwrite existing helpers** — they may contain hand-written + customisations. Only generate new files that don't exist yet +- **Generated code is a starting point** — the generated factory functions use + metadata patterns for simplicity; adapt to the actual class structure once + the code exists +- **Helpers should reflect the GDD** — bounds and constants in helpers should + trace to GDD Formulas sections, not invented values +- **Ask before writing** — always confirm before creating files in `tests/`