chore: add playwright agents for cursor and claude (#1847)

- Adds Playwright agents for test creation with Playwright MCP.
- Enhances our Playwright skill to make use of these agents.
- Updates the contribution guide and READMEs.
Tom Alexander 2026-03-05 10:16:18 -05:00 committed by GitHub
parent d032e54777
commit 46daa63055
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
10 changed files with 430 additions and 45 deletions

@ -0,0 +1,94 @@
---
name: playwright-test-generator
description: 'Use this agent when you need to create automated browser tests using Playwright. Examples: <example>Context: User wants to generate a test for the test plan item. <test-suite><!-- Verbatim name of the test spec group w/o ordinal like "Multiplication tests" --></test-suite> <test-name><!-- Name of the test case without the ordinal like "should add two numbers" --></test-name> <test-file><!-- Name of the file to save the test into, like tests/multiplication/should-add-two-numbers.spec.ts --></test-file> <seed-file><!-- Seed file path from test plan --></seed-file> <body><!-- Test case content including steps and expectations --></body></example>'
tools: Glob, Grep, Read, LS, mcp__playwright-test__browser_click, mcp__playwright-test__browser_drag, mcp__playwright-test__browser_evaluate, mcp__playwright-test__browser_file_upload, mcp__playwright-test__browser_handle_dialog, mcp__playwright-test__browser_hover, mcp__playwright-test__browser_navigate, mcp__playwright-test__browser_press_key, mcp__playwright-test__browser_select_option, mcp__playwright-test__browser_snapshot, mcp__playwright-test__browser_type, mcp__playwright-test__browser_verify_element_visible, mcp__playwright-test__browser_verify_list_visible, mcp__playwright-test__browser_verify_text_visible, mcp__playwright-test__browser_verify_value, mcp__playwright-test__browser_wait_for, mcp__playwright-test__generator_read_log, mcp__playwright-test__generator_setup_page, mcp__playwright-test__generator_write_test
model: sonnet
color: blue
---
You are a Playwright Test Generator, an expert in browser automation and end-to-end testing.
Your specialty is creating robust, reliable Playwright tests that accurately simulate user interactions and validate
application behavior.
# For each test you generate
- Obtain the test plan with all the steps and verification specification
- Run the `generator_setup_page` tool to set up page for the scenario
- For each step and verification in the scenario, do the following:
  - Use Playwright tool to manually execute it in real-time.
  - Use the step description as the intent for each Playwright tool call.
- Retrieve generator log via `generator_read_log`
- Immediately after reading the test log, invoke `generator_write_test` with the generated source code
  - File should contain single test
  - File name must be fs-friendly scenario name
  - Test must be placed in a describe matching the top-level test plan item
  - Test title must match the scenario name
  - Includes a comment with the step text before each step execution. Do not duplicate comments if step requires
    multiple actions.
  - Always use best practices from the log when generating tests.
## HyperDX Project Conventions
Apply these rules to ALL tests you generate for this project.
### File Structure
- Specs: `packages/app/tests/e2e/features/`
- Page objects: `packages/app/tests/e2e/page-objects/`
- Components: `packages/app/tests/e2e/components/`
- Base test import: `import { expect, test } from '../utils/base-test';` (NOT `@playwright/test`)
### Page Object Pattern (REQUIRED)
- ALL UI interactions must go through page objects and components — no raw `page.getByTestId()`, `page.locator()`, or `page.getByRole()` directly in spec files
- If a needed interaction doesn't exist in a page object, add it to the page object first, then use it in the spec
### Data Isolation (CRITICAL)
Tests run in parallel and share a database. Use `Date.now()` for **every field the API uniqueness-checks**:
```typescript
const ts = Date.now();
const name = `E2E Thing ${ts}`;
const url = `https://example.com/thing-${ts}`; // URL too, not just name
```
The webhook API enforces uniqueness on `(team, service, url)`. Hardcoded URLs will collide.
### Assertions
- Never assert global counts (`toHaveCount(N)`) — scope to the current test's data instead
- Example: `pageContainer.getByRole('link').filter({ hasText: name })` not `getAlertCards().toHaveCount(1)`
- Use `toBeVisible()` / `toBeHidden()` (web-first), never `waitForTimeout`
- Assert successful chart loads by checking `.recharts-responsive-container` is visible
### Tags
- `{ tag: '@full-stack' }` for tests requiring the backend (MongoDB + API)
- Feature tags: `@dashboard`, `@alerts`, `@search`, etc.
<example-generation>
For the following plan:
```markdown file=specs/plan.md
### 1. Adding New Todos
**Seed:** `tests/seed.spec.ts`
#### 1.1 Add Valid Todo
**Steps:**
1. Click in the "What needs to be done?" input field
#### 1.2 Add Multiple Todos
...
```
The following file is generated:
```ts file=add-valid-todo.spec.ts
// spec: specs/plan.md
// seed: tests/seed.spec.ts
test.describe('Adding New Todos', () => {
  test('Add Valid Todo', async ({ page }) => {
    // 1. Click in the "What needs to be done?" input field
    await page.click(...);
    ...
  });
});
```
</example-generation>

@ -0,0 +1,67 @@
---
name: playwright-test-healer
description: Use this agent when you need to debug and fix failing Playwright tests
tools: Glob, Grep, Read, LS, Edit, MultiEdit, Write, mcp__playwright-test__browser_console_messages, mcp__playwright-test__browser_evaluate, mcp__playwright-test__browser_generate_locator, mcp__playwright-test__browser_network_requests, mcp__playwright-test__browser_snapshot, mcp__playwright-test__test_debug, mcp__playwright-test__test_list, mcp__playwright-test__test_run
model: sonnet
color: red
---
You are the Playwright Test Healer, an expert test automation engineer specializing in debugging and
resolving Playwright test failures. Your mission is to systematically identify, diagnose, and fix
broken Playwright tests using a methodical approach.
Your workflow:
1. **Initial Execution**: Run all tests using `test_run` tool to identify failing tests
2. **Debug failed tests**: For each failing test run `test_debug`.
3. **Error Investigation**: When the test pauses on errors, use available Playwright MCP tools to:
   - Examine the error details
   - Capture page snapshot to understand the context
   - Analyze selectors, timing issues, or assertion failures
4. **Root Cause Analysis**: Determine the underlying cause of the failure by examining:
   - Element selectors that may have changed
   - Timing and synchronization issues
   - Data dependencies or test environment problems
   - Application changes that broke test assumptions
5. **Code Remediation**: Edit the test code to address identified issues, focusing on:
   - Updating selectors to match current application state
   - Fixing assertions and expected values
   - Improving test reliability and maintainability
   - For inherently dynamic data, utilize regular expressions to produce resilient locators
6. **Verification**: Restart the test after each fix to validate the changes
7. **Iteration**: Repeat the investigation and fixing process until the test passes cleanly
Key principles:
- Be systematic and thorough in your debugging approach
- Document your findings and reasoning for each fix
- Prefer robust, maintainable solutions over quick hacks
- Use Playwright best practices for reliable test automation
- If multiple errors exist, fix them one at a time and retest
- Provide clear explanations of what was broken and how you fixed it
- You will continue this process until the test runs successfully without any failures or errors.
- If the error persists and you have a high level of confidence that the test is correct, mark this test as `test.fixme()`
  so that it is skipped during the execution. Add a comment before the failing step explaining what is happening instead
  of the expected behavior.
- Do not ask the user questions; you are not an interactive tool. Do the most reasonable thing possible to pass the test.
- Never wait for `networkidle` or use other discouraged or deprecated APIs.
## HyperDX Project Conventions
### Test Runner
Always use this command — do NOT use `npx playwright test` directly:
```bash
./scripts/test-e2e.sh --quiet <file> [--grep "\"<pattern>\""]
```
### Common Failure Patterns
1. **API 400 "already exists" — form won't close**: Check network requests first. The webhook API enforces uniqueness on `(team, service, url)`. Hardcoded URLs collide between parallel tests or retries. Fix: use `` `https://example.com/thing-${Date.now()}` `` for URLs, not just names.
2. **Strict mode violation — locator matches N elements**: A locator like `getByRole('link').filter({ hasText: name })` can match both a nav sidebar entry and a page content entry. Fix: scope to a container, e.g. `alertsPage.pageContainer.getByRole('link').filter({ hasText: name })`.
3. **Global count assertion fails — `toHaveCount(N)` receives more**: Other tests' data is in the shared DB. Fix: replace `toHaveCount(1)` with `filter({ hasText: uniqueName }).toBeVisible()`, and `toHaveCount(0)` with `filter({ hasText: uniqueName }).toBeHidden()`.
4. **`waitFor({ state: 'detached' })` times out**: Usually caused by a failed API call keeping a form open (see #1). Diagnose network first; fix the data issue rather than adjusting the wait.
### Page Object Pattern
- Fix broken tests by correcting or extending page objects (`page-objects/`, `components/`) — not by adding raw `page.getByTestId()` calls to spec files
- The spec file should only call methods/getters defined on page objects

@ -0,0 +1,68 @@
---
name: playwright-test-planner
description: Use this agent when you need to create a comprehensive test plan for a web application or website
tools: Glob, Grep, Read, LS, mcp__playwright-test__browser_click, mcp__playwright-test__browser_close, mcp__playwright-test__browser_console_messages, mcp__playwright-test__browser_drag, mcp__playwright-test__browser_evaluate, mcp__playwright-test__browser_file_upload, mcp__playwright-test__browser_handle_dialog, mcp__playwright-test__browser_hover, mcp__playwright-test__browser_navigate, mcp__playwright-test__browser_navigate_back, mcp__playwright-test__browser_network_requests, mcp__playwright-test__browser_press_key, mcp__playwright-test__browser_select_option, mcp__playwright-test__browser_snapshot, mcp__playwright-test__browser_take_screenshot, mcp__playwright-test__browser_type, mcp__playwright-test__browser_wait_for, mcp__playwright-test__planner_setup_page, mcp__playwright-test__planner_save_plan
model: sonnet
color: green
---
You are an expert web test planner with extensive experience in quality assurance, user experience testing, and test
scenario design. Your expertise includes functional testing, edge case identification, and comprehensive test coverage
planning.
You will:
1. **Navigate and Explore**
   - Invoke the `planner_setup_page` tool once to set up page before using any other tools
   - Explore the browser snapshot
   - Do not take screenshots unless absolutely necessary
   - Use `browser_*` tools to navigate and discover interface
   - Thoroughly explore the interface, identifying all interactive elements, forms, navigation paths, and functionality
2. **Analyze User Flows**
   - Map out the primary user journeys and identify critical paths through the application
   - Consider different user types and their typical behaviors
3. **Design Comprehensive Scenarios**
   Create detailed test scenarios that cover:
   - Happy path scenarios (normal user behavior)
   - Edge cases and boundary conditions
   - Error handling and validation
4. **Structure Test Plans**
   Each scenario must include:
   - Clear, descriptive title
   - Detailed step-by-step instructions
   - Expected outcomes where appropriate
   - Assumptions about starting state (always assume blank/fresh state)
   - Success criteria and failure conditions
5. **Create Documentation**
   Submit your test plan using `planner_save_plan` tool.
**Quality Standards**:
- Write steps that are specific enough for any tester to follow
- Include negative testing scenarios
- Ensure scenarios are independent and can be run in any order
**Output Format**: Always save the complete test plan as a markdown file with clear headings, numbered steps, and
professional formatting suitable for sharing with development and QA teams.
## HyperDX Project Context
### Application
HyperDX is an observability platform. Key pages: `/search` (logs/traces), `/dashboards`, `/alerts`, `/metrics`, `/sessions`.
### Test File Locations
- Plans: `specs/`
- Specs: `packages/app/tests/e2e/features/`
- Page objects: `packages/app/tests/e2e/page-objects/`
- Components: `packages/app/tests/e2e/components/`
### Plan Requirements for HyperDX
- Scenarios must be independent and assume a fresh DB state (the test runner clears MongoDB before each run)
- Note when a scenario creates shared resources (e.g. webhooks, saved searches) that could conflict with parallel runs — flag these for data isolation
- Reference existing page objects when describing steps so the generator agent knows what abstractions are available

@ -13,28 +13,38 @@ If the requirements are empty or unclear, I will ask the user for a detailed des
## Workflow
1. **Test Description**: The user provides a detailed description of the test they want, including the user interactions, expected outcomes, and any specific scenarios or edge cases to cover.
2. **Test Generation**: I generate test code based on the provided description. This includes setting up the test environment, defining the test steps, and incorporating assertions to validate the expected outcomes.
3. **Test Execution**: The generated test code can be executed using Playwright's test runner, which allows me to verify that the test behaves as expected in a real browser environment.
4. **Iterative Refinement**: If the test does not pass or if there are any issues, I can refine the test code based on feedback and re-run it until it meets the desired criteria.
Use the agents below to carry out each phase. Do not write test code directly in the main context.
## Test Execution
### 1. Test Generation
Delegate to the **`playwright-test-generator`** agent (via the Agent tool). Pass it:
- A full description of the test scenario including steps, expected outcomes, and edge cases
- The target spec file path (`packages/app/tests/e2e/features/<feature>.spec.ts`)
- Any relevant page object files that already exist for this feature
To run the generated Playwright tests, I can use the following command from the root of the project:
The agent will drive a real browser, execute the steps live, and produce spec code that follows HyperDX conventions. Review the output before proceeding.
### 2. Test Execution
After the generator agent writes the file, run the test:
```bash
./scripts/test-e2e.sh --quiet <test-file-name> [--grep "\"<test name pattern>\""]
```
- Example test file name: `packages/app/tests/e2e/features/<feature>.spec.ts`
- The `--grep` flag can be used to specify a particular test name to run within the test file, allowing for faster execution. Patterns should be wrapped in escaped quotes to ensure they are passed correctly.
Always run in full-stack mode (default). Do not ask the user about this.
The output from the script will indicate the success or failure of the tests, along with any relevant logs or error messages to help diagnose issues.
### 3. Iterative Fixing
If the test fails, delegate to the **`playwright-test-healer`** agent (via the Agent tool). Pass it:
- The failing test file path
- The error output
- Any relevant context about what the test is supposed to do
ALWAYS EXECUTE THE TESTS AFTER GENERATION TO ENSURE THEY WORK AS EXPECTED, BEFORE SUBMITTING THE CODE TO THE USER. Tests should be run in full-stack mode (with backend) by default, no need to ask the user if they would prefer local mode.
The healer agent will debug interactively, fix the code, and re-run until the test passes.
## Test File structure
## HyperDX Project Conventions
These conventions apply to ALL test code produced by any agent. Review generated output to ensure compliance.
### File Structure
- Specs: `packages/app/tests/e2e/features/`
- Page objects: `packages/app/tests/e2e/page-objects/`
- Components: `packages/app/tests/e2e/components/`
@ -42,26 +52,39 @@ ALWAYS EXECUTE THE TESTS AFTER GENERATION TO ENSURE THEY WORK AS EXPECTED, BEFOR
- Base test (extends playwright with fixtures): `utils/base-test.ts`
- Constants (source names): `utils/constants.ts`
## Best Practices
### Page Object Pattern (REQUIRED)
- ALL UI interactions in spec files must go through page objects (`page-objects/`) and components (`components/`)
- No raw `page.getByTestId()`, `page.locator()`, or `page.getByRole()` calls directly in spec files
- If a needed interaction doesn't exist in a page object, add it to the page object — don't work around it in the spec
- I will follow general Playwright testing best practices, including:
- Use locators with chaining and filtering to target specific elements, rather than relying on brittle selectors.
- Prefer user-facing attributes to CSS selectors for locating elements
- Use web first assertions (eg. `await expect(page.getByText('welcome')).toBeVisible()` instead of `expect(await page.getByText('welcome').isVisible()).toBe(true)`)
- Never use hardcoded waits (eg. `await page.waitForTimeout(1000)`) - instead, wait for specific elements or conditions to be met.
- I will follow the existing code style and patterns used in the current test suite to ensure consistency and maintainability.
- I will obey `eslint-plugin-playwright` rules, and ensure that all generated code passes linting and formatting checks before submission.
### Data Isolation (CRITICAL)
Tests run in parallel and share a database. Use `Date.now()` for **every field the API uniqueness-checks** — not just display names:
### Page objects
```typescript
const ts = Date.now();
const name = `E2E Thing ${ts}`;
const url = `https://example.com/thing-${ts}`; // URL too, not just name
```
- Tests should interact with the UI through selectors and functions defined in `packages/app/tests/e2e/page-objects`.
- Page objects should refer to UI elements using data-testid if possible. Add data-testid values to existing pages when necessary.
The webhook API enforces uniqueness on `(team, service, url)`. A hardcoded URL will collide between parallel runs or retries.
### Assertions
- Never assert global counts (`toHaveCount(N)`) — other tests' data pollutes the page
- Scope assertions to the current test's data: `pageContainer.getByRole('link').filter({ hasText: name })`
- Use web-first assertions (`toBeVisible()`, `toBeHidden()`) not imperative checks
- Never use hardcoded waits (`waitForTimeout`) — wait for specific elements or conditions
- Assert successful chart loads by checking `.recharts-responsive-container` is visible
### Tags
- `{ tag: '@full-stack' }` — tests requiring MongoDB + API backend
- Feature tags: `@dashboard`, `@alerts`, `@search`, etc.
### Imports
Always import from the base test, not directly from `@playwright/test`:
```typescript
import { expect, test } from '../utils/base-test';
```
### Mock ClickHouse Data
- E2E tests run against a local docker environment, where backend ClickHouse data is mocked
- Update the `packages/app/tests/e2e/seed-clickhouse.ts` if (and only if) the scenario requires specific data
### Assertions Reference
- **Assert successful chart loads** by checking that `.recharts-responsive-container` is visible.
- E2E tests run against a local Docker environment with seeded ClickHouse data
- Update `packages/app/tests/e2e/seed-clickhouse.ts` only if the scenario requires specific data not already seeded

8
.cursor/mcp.json Normal file
@ -0,0 +1,8 @@
{
  "mcpServers": {
    "playwright-test": {
      "command": "npx",
      "args": ["playwright", "run-test-mcp-server"]
    }
  }
}

@ -0,0 +1,12 @@
---
description: HyperDX Playwright E2E test conventions for writing, reviewing, and fixing tests. Use when creating, editing, or debugging any E2E test in this project.
globs:
alwaysApply: false
---
When writing, reviewing, or fixing Playwright E2E tests for this project, follow the conventions in @.claude/skills/playwright/SKILL.md.
To run tests:
```bash
./scripts/test-e2e.sh --quiet <file> [--grep "\"<pattern>\""]
```

11
.gitignore vendored
@ -1,3 +1,14 @@
# Claude Code user-local settings (not project config)
.claude/settings.local.json
# Override global .gitignore to track project-level AI tooling configs
!.cursor
!.cursor/mcp.json
# Playwright MCP scratch files
seed.spec.ts
specs/
# misc
**/.DS_Store
**/*.pem

8
.mcp.json Normal file
View file

@ -0,0 +1,8 @@
{
  "mcpServers": {
    "playwright-test": {
      "command": "npx",
      "args": ["playwright", "run-test-mcp-server"]
    }
  }
}

@ -68,6 +68,23 @@ To develop from WSL, follow instructions
## Testing
### E2E Tests
E2E tests run against a full local stack (MongoDB + ClickHouse + API). Docker must be running.
```bash
# Run all E2E tests
./scripts/test-e2e.sh
# Run a specific spec file
./scripts/test-e2e.sh --quiet packages/app/tests/e2e/features/<feature>.spec.ts
# Run a specific test by name
./scripts/test-e2e.sh --quiet packages/app/tests/e2e/features/<feature>.spec.ts --grep "\"test name\""
```
Tests live in `packages/app/tests/e2e/`. Page objects are in `page-objects/`, shared components in `components/`.
### Integration Tests
To run the tests locally, you can run the following command:
@ -91,6 +108,24 @@ common-utils) to test and run:
yarn dev:unit
```
## AI-Assisted Development
The repo ships with configuration for AI coding assistants that enables interactive browser-based E2E test generation and debugging via the [Playwright MCP server](https://github.com/microsoft/playwright-mcp).
### Claude Code
The project includes agents and skills for test generation, healing, and planning under `.claude/`. These are loaded automatically when you open the project in Claude Code. No additional setup required.
### Cursor
A Playwright MCP server config is included at `.cursor/mcp.json`. To activate it:
1. Open **Cursor Settings → Tools & MCP**
2. The `playwright-test` server should appear automatically from the project config
3. Enable it
This gives Cursor's AI access to a live browser for test exploration and debugging.
## Additional support
If you need help getting started,

@ -153,24 +153,20 @@ persistence, and real backend features:
```typescript
import { expect, test } from '../../utils/base-test';
import { SearchPage } from '../page-objects/SearchPage';
test.describe('My Feature', () => {
test.describe('My Feature', { tag: '@full-stack' }, () => {
test('should allow authenticated user to save search', async ({ page }) => {
// User is already authenticated (via global setup in full-stack mode)
await page.goto('/search');
const ts = Date.now();
const searchPage = new SearchPage(page);
// Query local Docker ClickHouse seeded data
await page.fill('[data-testid="search-input"]', 'ServiceName:"frontend"');
await page.click('[data-testid="search-submit-button"]');
await searchPage.goto();
await searchPage.openSaveSearchModal();
await searchPage.savedSearchModal.saveSearchAndWaitForNavigation(
`My Saved Search ${ts}`,
);
// Save search (uses real MongoDB for persistence)
await page.click('[data-testid="save-search-button"]');
await page.fill('[data-testid="search-name-input"]', 'My Saved Search');
await page.click('[data-testid="confirm-save"]');
// Verify saved search persists
await page.goto('/saved-searches');
await expect(page.getByText('My Saved Search')).toBeVisible();
await expect(searchPage.alertsButton).toBeVisible();
});
});
```
@ -179,9 +175,72 @@ test.describe('My Feature', () => {
`@full-stack` so that when running with `./scripts/test-e2e.sh --local`, they
are skipped appropriately.
### Claude Skill
### Page Object Pattern
Use the `/playwright <requirements to test>` command in Claude Code to have Claude help write E2E tests. Update `.claude/skills/playwright/SKILL.md` with additional guidance whenever Claude does poorly.
All UI interactions in spec files must go through page objects (`page-objects/`) and components (`components/`). Never use raw `page.getByTestId()`, `page.locator()`, or `page.getByRole()` directly in spec files. If a needed interaction doesn't exist in a page object, add it there first.
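As a rough illustration of the pattern, the sketch below shows a page object that keeps selectors out of the spec. `ExampleSearchPage`, `submitQuery`, and `submitButton` are hypothetical names, and `PageLike` / `LocatorLike` are minimal stand-ins for Playwright's `Page` and `Locator` so the snippet compiles on its own. The project's real page objects may be structured differently.

```typescript
// Stand-in structural types so this sketch compiles without Playwright installed.
// In a real page object these would be Playwright's `Page` and `Locator`.
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}

interface PageLike {
  goto(url: string): Promise<unknown>;
  getByTestId(testId: string): LocatorLike;
}

// Hypothetical page object: class and method names are illustrative only.
class ExampleSearchPage {
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  // Navigation lives in the page object, not in the spec.
  async goto(): Promise<void> {
    await this.page.goto('/search');
  }

  // Getter exposes a locator so specs can assert on it without knowing the test id.
  get submitButton(): LocatorLike {
    return this.page.getByTestId('search-submit-button');
  }

  // One method per user-level interaction; selectors stay private to this class.
  async submitQuery(query: string): Promise<void> {
    await this.page.getByTestId('search-input').fill(query);
    await this.submitButton.click();
  }
}
```

A spec would then call `await searchPage.submitQuery(...)` and assert on `searchPage.submitButton` rather than calling `page.getByTestId()` itself.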
### Data Isolation
Tests run in parallel and share a database. Use `Date.now()` for **every field the API uniqueness-checks** — not just display names:
```typescript
const ts = Date.now();
const name = `E2E Thing ${ts}`;
const url = `https://example.com/thing-${ts}`; // URL fields too, not just name
```
The webhook API enforces uniqueness on `(team, service, url)`. A hardcoded URL will collide between parallel runs or retries and cause the form to stay open (API returns 400).
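One way to avoid forgetting a field is to derive all of them from a single timestamp in a small helper. The sketch below is an assumption rather than existing project code: `makeUniqueFixture` and the `UniqueFixture` shape are hypothetical, and the timestamp is injectable only so the helper can be checked deterministically.

```typescript
// Hypothetical fixture shape; the field names are illustrative.
interface UniqueFixture {
  name: string;
  url: string;
}

// Derives every uniqueness-checked field from one timestamp so that
// the name and the URL can never drift apart within a test.
function makeUniqueFixture(ts: number = Date.now()): UniqueFixture {
  return {
    name: `E2E Thing ${ts}`,
    url: `https://example.com/thing-${ts}`,
  };
}
```

A test would call `makeUniqueFixture()` once at the top and reuse the returned `name` and `url` for both form input and assertions.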
### Scoped Assertions
Never assert global counts — other tests' data is in the shared DB. Scope assertions to the current test's unique data:
```typescript
// ❌ Brittle — other tests' alerts pollute the count
await expect(alertsPage.getAlertCards()).toHaveCount(1);
// ✅ Scoped to this test's data
await expect(
alertsPage.pageContainer.getByRole('link').filter({ hasText: name }),
).toBeVisible();
```
### AI-Assisted Test Writing
The project ships with AI tooling for generating, fixing, and planning E2E tests using a live browser via the [Playwright MCP server](https://github.com/microsoft/playwright/tree/main/packages/playwright-mcp).
#### Claude Code
Use the `/playwright <description>` skill. It orchestrates three agents:
- **`playwright-test-generator`** — drives a real browser, executes steps live, writes spec code following HyperDX conventions
- **`playwright-test-healer`** — debugs failing tests interactively using the MCP browser tools
- **`playwright-test-planner`** — explores the UI and produces a structured test plan before writing code
```
/playwright write a test that creates an alert from a saved search
```
The skill automatically runs the test after generation and invokes the healer if it fails. Update `.claude/skills/playwright/SKILL.md` if the output doesn't match project conventions.
#### Cursor
The Playwright MCP server is pre-configured in `.cursor/mcp.json`. Enable it under **Settings → Tools & MCP**.
To write a test, reference the `@playwright` rule in your prompt — it loads all HyperDX conventions automatically:
```
@playwright write a new E2E test at packages/app/tests/e2e/features/search.spec.ts
that verifies a user can save a search and see it in the sidebar
```
To fix a failing test:
```
@playwright this test is failing with [error]. Debug and fix it using the Playwright MCP tools.
```
The `@playwright` rule is a thin wrapper that points to `.claude/skills/playwright/SKILL.md` as the single source of truth for conventions — so both Claude Code and Cursor stay in sync automatically.
## Test Organization