- Add CCGS Skill Testing Framework: self-contained QA layer with 72 skill specs, 49 agent specs, catalog.yaml, quality-rubric.md, templates, README, CLAUDE.md - Add /skill-improve: test-fix-retest loop covering static + category checks - Add 4 missing skills: /art-bible, /asset-spec, /day-one-patch, /security-audit - Add /skill-test category mode (Phase 2D) with quality rubric evaluation - Extend /skill-test audit to cover agent specs alongside skill specs - Update all skill-test and skill-improve path refs to CCGS Skill Testing Framework/ - Remove stale tests/skills/ directory (superseded by CCGS Skill Testing Framework) - Add director gate intensity modes (full/lean/solo) to gate-check and related skills Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
6.4 KiB
Agent Test Spec: analytics-engineer
Agent Summary
- Domain: Telemetry architecture and event schema design, A/B test framework design, player behavior analysis methodology, analytics dashboard specification, event naming conventions, data pipeline design (schema → ingestion → dashboard)
- Does NOT own: Game implementation of event tracking (appropriate programmer), economy design decisions informed by analytics (economy-designer), live ops event design (live-ops-designer)
- Model tier: Sonnet
- Gate IDs: None; produces schemas and test designs; defers implementation to programmers
Static Assertions (Structural)
description:field is present and domain-specific (references telemetry, A/B testing, event tracking, analytics)allowed-tools:list matches the agent's role (Read/Write for design/analytics/ and documentation; no game source or CI tools)- Model tier is Sonnet (default for operations specialists)
- Agent definition does not claim authority over game implementation, economy design, or live ops scheduling
Test Cases
Case 1: In-domain request — tutorial event tracking design
Input: "Design the analytics event tracking for our tutorial. We want to know where players drop off and which steps they complete." Expected behavior:
- Produces a structured event schema for each tutorial step: at minimum,
event_name,properties(step_id, step_name, player_id, session_id, timestamp), andtrigger_condition(when exactly the event fires — on step start, on step complete, on step skip) - Includes a funnel-completion event and a drop-off event (e.g.,
tutorial_step_abandonedif the player exits during a step) - Specifies the event naming convention: snake_case, prefixed by domain (e.g.,
tutorial_step_started,tutorial_step_completed,tutorial_abandoned) - Does NOT produce implementation code — marks implementation as [TO BE IMPLEMENTED BY PROGRAMMER]
- Output is a schema table or structured list, not a narrative description
Case 2: Out-of-domain request — implement the event tracking in code
Input: "Now that the event schema is designed, write the GDScript code to fire these events in our Godot tutorial scene." Expected behavior:
- Does not produce GDScript or any implementation code
- States clearly: "Telemetry implementation in game code is handled by the appropriate programmer (gameplay-programmer or systems-programmer); I provide the event schema and integration requirements"
- Optionally produces an integration spec: what the programmer needs to know to implement correctly (event name, properties, when to fire, what analytics SDK or endpoint to use)
Case 3: Domain boundary — A/B test design for a UI change
Input: "We want to A/B test two versions of our HUD: the current version and a minimal version with only a health bar. Design the test." Expected behavior:
- Produces a complete A/B test design document:
- Hypothesis: The minimal HUD will increase player engagement (measured by session length) by reducing UI cognitive load
- Primary metric: Average session length per player
- Secondary metrics: Tutorial completion rate, Day 1 retention
- Sample size: Calculated estimate based on expected effect size (or notes that exact calculation requires baseline data) — does NOT skip this field
- Duration: Minimum duration (e.g., "at least 2 weeks to capture weekly player behavior patterns")
- Randomization unit: Player ID (not session ID, to prevent players seeing both versions)
- Output is structured as a formal test design, not a bullet list of ideas
Case 4: Conflict — overlapping A/B test player segments
Input: "We have two A/B tests running simultaneously: Test A (HUD variants) affects all players, and Test B (tutorial variants) also affects all players." Expected behavior:
- Flags the overlap as a mutual exclusion violation: if both tests affect the same player, their results are confounded — neither test produces clean data
- Identifies the problem precisely: players in both tests will have HUD and tutorial variants interacting, making it impossible to attribute outcome differences to either variable alone
- Proposes resolution options: (a) run tests sequentially, (b) split the player population into exclusive segments (50% in Test A, 50% in Test B, 0% in both), or (c) run a factorial design if the interaction effect is also of interest (more complex, requires larger sample)
- Does NOT recommend continuing both tests on overlapping populations
Case 5: Context pass — new events consistent with existing schema
Input context: Existing event schema uses the naming convention: [domain]_[object]_[action] in snake_case. Example events: combat_enemy_killed, inventory_item_equipped, tutorial_step_completed.
Input: "Design event tracking for our new crafting system: players gather materials, open the crafting menu, and craft items."
Expected behavior:
- Produces events following the exact naming convention from the provided schema:
crafting_material_gathered,crafting_menu_opened,crafting_item_crafted - Does NOT invent a different naming pattern (e.g.,
gatherMaterial,craftingOpened) even if it might seem natural - Properties follow the same structure as existing events:
player_id,session_id,timestampas standard fields; domain-specific fields (material_type, item_id, crafting_time_seconds) as additional properties - Output explicitly references the provided naming convention as the standard being followed
Protocol Compliance
- Stays within declared domain (event schema design, A/B test design, analytics methodology)
- Redirects implementation requests to appropriate programmers with an integration spec, not code
- Produces complete A/B test designs (hypothesis, metric, sample size, duration, randomization unit) — never partial
- Flags mutual exclusion violations in overlapping A/B tests as data quality blockers
- Follows provided naming conventions exactly; does not invent alternative conventions
Coverage Notes
- Case 3 (A/B test design completeness) is a quality gate — an incomplete test design wastes experiment budget
- Case 4 (mutual exclusion) is a data integrity test — overlapping tests produce unusable results; this must be caught
- Case 5 is the most important context-awareness test; naming convention drift across schemas causes dashboard breakage
- No automated runner; review manually or via
/skill-test