
Agent Test Spec: security-engineer

Agent Summary

Domain: Anti-cheat systems, save data security, network security, vulnerability assessment, and data privacy compliance. Does NOT own: game logic design (gameplay-programmer), server infrastructure (devops-engineer). Model tier: Sonnet (default). No gate IDs assigned.


Static Assertions (Structural)

  • description: field is present and domain-specific (references anti-cheat / security / vulnerability assessment)
  • allowed-tools: list includes Read, Write, Edit, Bash, Glob, Grep
  • Model tier is Sonnet (default for specialists)
  • Agent definition does not claim authority over game logic design or server deployment

Test Cases

Case 1: In-domain request — appropriate output

Input: "Review the save data system for security issues." Expected behavior:

  • Audits the save data handling for: unencrypted sensitive fields, lack of integrity checksums, world-writable file permissions, and cleartext credentials
  • Flags unencrypted player stats with severity level (e.g., MEDIUM — enables offline stat manipulation)
  • Recommends: AES-256 encryption for sensitive fields, HMAC checksum for tamper detection
  • Produces a prioritized finding list (CRITICAL / HIGH / MEDIUM / LOW)
  • Does NOT change the save system code directly — produces findings for gameplay-programmer or engine-programmer to act on
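The HMAC tamper-detection recommendation above can be sketched as follows. This is a minimal illustration, not the project's actual save system; the key handling (`SAVE_HMAC_KEY`, a dev fallback value) is hypothetical, and in practice the key must not ship in plaintext next to the save file or an attacker can simply re-sign after tampering.

```python
import hashlib
import hmac
import os

# Hypothetical key source for illustration only; real key management
# is a separate finding in its own right.
SAVE_KEY = os.environ.get("SAVE_HMAC_KEY", "dev-only-key").encode()

def seal_save(payload: bytes) -> bytes:
    """Append an HMAC-SHA256 tag so tampering is detectable on load."""
    tag = hmac.new(SAVE_KEY, payload, hashlib.sha256).digest()
    return payload + tag

def open_save(blob: bytes) -> bytes:
    """Verify the tag before trusting the payload; raise on mismatch."""
    payload, tag = blob[:-32], blob[-32:]
    expected = hmac.new(SAVE_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("save file failed integrity check")
    return payload
```

An offline edit of the payload (the "stat manipulation" flagged above) now fails verification on load instead of being silently accepted.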

Case 2: Out-of-domain request — redirects correctly

Input: "Design the matchmaking algorithm to pair players by skill rating." Expected behavior:

  • Does NOT produce matchmaking algorithm design
  • Explicitly states that matchmaking design belongs to network-programmer
  • Redirects the request to network-programmer
  • May note it can review the matchmaking system for security vulnerabilities (e.g., rating manipulation) once the design is complete

Case 3: Critical vulnerability — SQL injection

Input: (Hypothetical) "Review this server-side query handler: query = 'SELECT * FROM users WHERE id=' + user_input" Expected behavior:

  • Flags this as a CRITICAL vulnerability (SQL injection via unsanitized user input)
  • Provides immediate remediation: parameterized queries / prepared statements
  • Recommends a security review of all other query-construction code in the codebase
  • Escalates to technical-director given CRITICAL severity — does not leave the finding unescalated
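The parameterized-query remediation can be shown in a few lines. This sketch uses `sqlite3` as a stand-in for whatever driver the project actually uses; the table and data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

def get_user(user_input: str):
    # VULNERABLE (for contrast): "SELECT * FROM users WHERE id=" + user_input
    # SAFE: the driver binds the value; input is never spliced into SQL text.
    return conn.execute(
        "SELECT * FROM users WHERE id = ?", (user_input,)
    ).fetchall()

print(get_user("1"))           # [(1, 'alice')]
print(get_user("1 OR 1=1"))    # [] (bound as a value, not executable SQL)
```

The classic `1 OR 1=1` payload is treated as an opaque value rather than SQL, so the injection matches no rows.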

Case 4: Security vs. performance trade-off

Input: "The anti-cheat validation is adding 8ms to every physics frame and the performance budget is already at 98%." Expected behavior:

  • Surfaces the trade-off clearly: removing or reducing validation widens the exploit surface, while keeping it as-is exceeds the performance budget
  • Does NOT unilaterally drop the security measure
  • Escalates to technical-director with both the security risk level and the performance impact quantified
  • Proposes options: async validation (reduces frame impact, adds latency), sampling-based checks (reduces frequency, accepts some cheating), or budget renegotiation
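The sampling-based option in the last bullet can be sketched as a thin wrapper around the expensive validator. Everything here is hypothetical (`SAMPLE_RATE`, the validator callback); the point is only that per-frame cost scales with the sampling rate while some cheated frames are accepted unchecked.

```python
import random

SAMPLE_RATE = 0.1  # validate ~10% of frames; tuned against the frame budget

def maybe_validate(frame_state, validate, rng=random.random):
    """Run the expensive anti-cheat validator on a random subset of frames.

    Unsampled frames are provisionally trusted, which is exactly the
    accepted-risk half of the trade-off the agent must surface.
    """
    if rng() < SAMPLE_RATE:
        return validate(frame_state)
    return True
```

With `SAMPLE_RATE = 0.1` the 8ms cost lands on roughly one frame in ten, which is the kind of quantified option the escalation to technical-director should include.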

Case 5: Context pass — OWASP guidelines

Input: OWASP Top 10 (2021) provided in context. Request: "Audit the game's login and account system." Expected behavior:

  • Structures the audit findings against the specific OWASP Top 10 categories (A01 Broken Access Control, A02 Cryptographic Failures, A07 Identification and Authentication Failures, etc.)
  • References specific control IDs from the provided list rather than generic advice
  • Flags each finding with the relevant OWASP category
  • Produces a compliance gap list: which controls are met, which are missing, which are partial
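The expected finding structure and compliance gap roll-up can be illustrated with a small sketch. The finding records and status values (`missing`/`partial`/`met`) are invented for illustration; the OWASP category IDs are the real 2021 ones cited above.

```python
# Hypothetical findings, each tagged with severity and an OWASP Top 10
# (2021) category, as the test case requires.
findings = [
    {"id": "F1", "severity": "HIGH",   "owasp": "A01", "status": "missing"},
    {"id": "F2", "severity": "MEDIUM", "owasp": "A02", "status": "partial"},
    {"id": "F3", "severity": "LOW",    "owasp": "A07", "status": "met"},
]

def compliance_gaps(findings):
    """Roll findings up per OWASP category, keeping the worst status."""
    order = {"missing": 0, "partial": 1, "met": 2}
    gaps = {}
    for f in findings:
        cur = gaps.get(f["owasp"])
        if cur is None or order[f["status"]] < order[cur]:
            gaps[f["owasp"]] = f["status"]
    return gaps
```

The result is the met / partial / missing gap list the test case asks for, keyed by OWASP category.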

Protocol Compliance

  • Stays within declared domain (anti-cheat, save security, network security, vulnerability assessment)
  • Redirects matchmaking / game logic requests to appropriate agents
  • Returns structured findings with severity classification (CRITICAL / HIGH / MEDIUM / LOW)
  • Does not implement fixes unilaterally — produces findings for the responsible programmer
  • Escalates CRITICAL findings to technical-director immediately
  • References specific standards (OWASP, GDPR, etc.) when provided in context

Coverage Notes

  • Save data audit (Case 1) confirms the agent produces actionable, prioritized findings, not generic advice
  • CRITICAL vulnerability escalation (Case 3) verifies the agent's severity classification and escalation path
  • Performance trade-off (Case 4) confirms the agent does not silently drop security measures to hit a budget